What is semantic search?

Jonathan Bree

Keyword search finds documents that contain the words you typed. Semantic search finds documents that mean what you meant. Here is how it works, where it excels, and where it falls short.
Semantic search is a retrieval approach that finds information based on meaning rather than exact word matches. A search for "how to reduce latency" will surface results about "improving response time" and "optimising performance" even though none of those words appear in the query. The system recognises that these concepts are related. It does not require the vocabulary to match.
This is a meaningful advance over keyword search for many retrieval problems. It is also not sufficient on its own for others. To see why, it helps to understand how it works.
How semantic search works
At the core of semantic search is the embedding: a mathematical representation of text as a point in a high-dimensional vector space. Words, sentences, and documents that mean similar things end up close together in that space. Things that mean different things end up further apart.
When you run a semantic search, two things happen. First, the query is embedded: converted into a vector using the same embedding model used to index the corpus. Second, the system finds the vectors in the index that are closest to the query vector, typically measured by cosine similarity or dot product. The closest vectors correspond to the most semantically similar documents or passages, which are returned as results.
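The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production retriever: the three-dimensional vectors are made up by hand to stand in for real embeddings, and the names cosine_similarity, search, INDEX and QUERY_VECTOR are hypothetical. A real system would get its vectors from an embedding model and would usually use an approximate nearest-neighbour index instead of scanning every document.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def search(query_vector, index, top_k=2):
    """Return the top_k (doc_id, score) pairs, highest similarity first."""
    scored = [(doc_id, cosine_similarity(query_vector, vec))
              for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hand-made 3-dimensional vectors (assumption for illustration only);
# real embedding models emit hundreds or thousands of dimensions.
INDEX = {
    "improving response time": [0.9, 0.1, 0.0],
    "optimising performance": [0.8, 0.3, 0.1],
    "office holiday schedule": [0.0, 0.1, 0.9],
}

# Stands in for the embedding of the query "how to reduce latency".
QUERY_VECTOR = [1.0, 0.2, 0.0]

results = search(QUERY_VECTOR, INDEX)
for doc_id, score in results:
    print(f"{score:.3f}  {doc_id}")
```

With these toy vectors the two performance-related documents rank first, even though neither shares a word with the query, while the unrelated document scores close to zero.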
The embedding model is what makes this work. Models like OpenAI's text-embedding-ada-002, Cohere's embed models, or open-source models from the sentence-transformers library learn to represent meaning in vector space by training on large amounts of text. The quality of the embeddings determines the quality of the retrieval.
Where semantic search excels
Vocabulary variation. If one document describes something as "fast" and another describes it as "low latency", a query using either term can surface both. Semantic search handles paraphrase, synonyms, and varied vocabulary naturally. Keyword search requires the exact term.
Intent matching. Queries express intent, not just keywords. "How do I make my agent remember things" is a question about agent memory infrastructure. Semantic search can surface relevant results even when the documents use different terminology to describe the solution.
Cross-lingual retrieval. Multilingual embedding models can retrieve documents in one language using a query in another, because meaning is represented in a shared vector space regardless of the language it was written in.
Long-tail queries. Rare or highly specific queries that might not match any exact keyword in a document can still surface relevant results if the meaning is represented in the embedding space.
Where semantic search falls short
Precision on specific terms. Semantic search optimises for conceptual similarity, which means it trades exactness for coverage. A query containing a specific error code, a version number, a proper noun, or an exact identifier may surface results that are conceptually related but do not contain the specific term. Keyword search would find the exact term immediately. See semantic search vs keyword search for AI agents for a full treatment of when each approach is appropriate.
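One common way to close this precision gap is to add an exact-term check on top of the semantically ranked candidates. The sketch below is an assumption for illustration, not a description of any particular product: tokens and exact_match_filter are hypothetical helpers, and the candidate passages are invented.

```python
import re

def tokens(text):
    """Lowercase tokens, keeping identifiers such as 'err_conn_reset' or 'v2.3.1' whole."""
    return set(re.findall(r"[a-z0-9]+(?:[._-][a-z0-9]+)*", text.lower()))

def exact_match_filter(candidates, required_terms):
    """Keep only the candidates that literally contain every required term."""
    required = {term.lower() for term in required_terms}
    return [doc for doc in candidates if required <= tokens(doc)]

# Invented passages that a semantic search might plausibly return for a
# query about a connection error (illustrative data only).
CANDIDATES = [
    "Connections can drop when the network is unstable.",
    "ERR_CONN_RESET appears when the peer closes the socket in v2.3.1.",
    "Timeouts are often caused by slow upstream services.",
]

matches = exact_match_filter(CANDIDATES, ["ERR_CONN_RESET"])
print(matches)
```

All three passages are conceptually about connection problems, but only one contains the identifier the user typed. The filter is deliberately strict: a query for "v2.3" does not match a passage that mentions "v2.3.1".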
Scale degradation. As the corpus grows, the embedding space becomes more crowded. Concepts that are semantically adjacent but factually distinct start occupying similar regions. Retrieval quality degrades in ways that are difficult to detect because similarity scores remain high. This is semantic collapse: the point at which similarity becomes an unreliable guide to relevance.
No temporal awareness. Embeddings do not have an inherent relationship with time. A fact stored six months ago is as retrievable as a fact stored yesterday, with no signal about which is current. For static corpora this does not matter. For evolving knowledge bases it produces memory drift: outdated facts are retrieved alongside current ones, with nothing to tell them apart.
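A simple way to add a time signal is to multiply each similarity score by a weight that decays with age. The sketch below uses exponential decay with a half-life. Both the decay shape and the 30-day default are assumptions chosen for illustration, and recency_weight and time_aware_score are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

def recency_weight(stored_at, now, half_life_days=30.0):
    """Weight that halves every half_life_days; 1.0 for something stored right now."""
    age_days = max((now - stored_at).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)

def time_aware_score(similarity, stored_at, now, half_life_days=30.0):
    """Similarity discounted by age, so newer facts win near-ties."""
    return similarity * recency_weight(stored_at, now, half_life_days)

# Fixed reference time so the example is reproducible.
NOW = datetime(2025, 1, 1, tzinfo=timezone.utc)

# Two facts with almost identical similarity to the query (invented values).
old_fact = time_aware_score(0.90, NOW - timedelta(days=180), NOW)
new_fact = time_aware_score(0.88, NOW - timedelta(days=1), NOW)

print(f"six months old: {old_fact:.3f}")
print(f"one day old:    {new_fact:.3f}")
```

On similarity alone the older fact would rank first. With the decay applied, the fact stored yesterday ranks well above it. The right half-life depends on how quickly the knowledge base changes, and a static corpus may not want decay at all.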
No contradiction detection. When two documents make conflicting claims, semantic search has no mechanism for identifying the conflict or determining which is more reliable. Both may surface with similar scores on a relevant query.
Sensitivity to embedding quality. The retrieval is only as good as the embedding model. Domain-specific terminology, rare concepts, and highly technical content may be poorly represented in general-purpose embedding models, producing retrieval failures that are hard to diagnose.
Semantic search in agent memory
For agent memory specifically, semantic search is a necessary component but not a sufficient one. M-1's retrieval architecture treats semantic similarity as one signal among several rather than the primary retrieval mechanism.
The scoring function combines semantic similarity with lexical precision, temporal salience, importance scoring, and cross-memory coherence. Queries are decomposed into parallel retrieval passes rather than run as a single vector search. Results are re-ranked for coherence before being assembled into context.
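To make the idea of combining signals concrete, here is a minimal sketch of a weighted multi-signal score followed by a re-rank. It is not M-1's scoring function, which is not described here: the Candidate fields, the WEIGHTS values, and the example scores are all assumptions, and the sketch leaves out cross-memory coherence and query decomposition entirely.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    semantic: float    # embedding similarity, 0..1
    lexical: float     # exact-term overlap, 0..1
    recency: float     # temporal salience, 0..1
    importance: float  # importance score, 0..1

# Illustrative weights only (assumption); a real system would tune these.
WEIGHTS = {"semantic": 0.4, "lexical": 0.3, "recency": 0.2, "importance": 0.1}

def combined_score(candidate, weights=WEIGHTS):
    """Weighted sum of whichever signals appear in weights."""
    return sum(w * getattr(candidate, name) for name, w in weights.items())

def rank(candidates, weights=WEIGHTS):
    """Order candidates by combined score, best first."""
    return sorted(candidates, key=lambda c: combined_score(c, weights), reverse=True)

# Invented memories with hand-assigned signal values.
candidates = [
    Candidate("User prefers dark mode (stored last year)", 0.92, 0.2, 0.1, 0.5),
    Candidate("User switched to light mode (stored yesterday)", 0.85, 0.6, 0.95, 0.5),
]

ranked = rank(candidates)
semantic_only = rank(candidates, {"semantic": 1.0})

print("multi-signal:", ranked[0].text)
print("semantic only:", semantic_only[0].text)
```

Ranked on semantic similarity alone, the stale preference comes first. Once lexical and recency signals contribute, the more recent memory moves to the top.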
This multi-signal approach is what produces M-1's benchmark results. Systems that rely primarily on semantic search, even with larger and more expensive models, score consistently lower on LongMemEval's harder categories, particularly temporal reasoning and multi-session synthesis. Semantic similarity alone cannot address those problems, regardless of how good the embeddings are.
Semantic search is the foundation. Production agent memory requires the full architecture on top of it. See why a vector database is not a memory system for what that architecture looks like.
Exabase's Memory API implements hybrid retrieval combining semantic and lexical signals with temporal and coherence scoring. See the docs to get started.
