HyEm: Query-Adaptive Hyperbolic Retrieval for Biomedical Ontologies via Euclidean Vector Indexing

Abstract

Retrieval-augmented generation (RAG) for biomedical knowledge faces a hierarchy-aware ontology grounding challenge: resources like HPO, DO, and MeSH use deep ``is-a" taxonomies, yet production stacks rely on Euclidean embeddings and ANN indexes. While hyperbolic embeddings suit hierarchical representation, they face two barriers: (i) lack of native vector database support, and (ii) risk of underperforming on entity-centric queries where hierarchy is irrelevant. We present HyEm, a lightweight retrieval layer integrating hyperbolic ontology embeddings into existing Euclidean ANN infrastructure. HyEm learns radius-controlled hyperbolic embeddings, stores origin log-mapped vectors in standard Euclidean databases for candidate retrieval, then applies exact hyperbolic reranking. A query-adaptive gate outputs continuous mixing weights, combining Euclidean semantic similarity with hyperbolic hierarchy distance at reranking time. Our bi-Lipschitz analysis under radius constraints provides practical guidance for ANN oversampling and dimensionality.Experiments on biomedical ontology subsets demonstrate HyEm preserves 94-98% of Euclidean baseline performance on entity-centric queries while substantially improving hierarchy-navigation and mixed-intent queries, maintaining indexability at moderate oversampling.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…