One platform, operational data + vectors
Atlas AI search and retrieval unifies your documents and their vector embeddings in a single database. There is no separate vector store to stand up, operate, or keep in sync with the source of truth — which removes an entire class of dual-write and staleness bugs that plague bolt-on vector stacks.
- Vectors next to your data. Embeddings, documents, full-text indexes, and metadata live together, queried through one API and governed by one access model.
- No sync layer. The retrieval index is built on the same documents your application writes, not on a downstream copy you have to reconcile each time a document changes.
- Deploy anywhere. Atlas runs on AWS, Azure, and Google Cloud, so retrieval sits in the same cloud and region as the rest of your stack.
- Framework-native. Works with AI orchestration frameworks including LangChain, LlamaIndex, and CrewAI, so the retrieval layer drops into the agent and RAG code you are already writing.
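To make "vectors next to your data" concrete, one document can carry the source text, its metadata, and its embedding, with an Atlas Vector Search index defined over the embedding field. The sketch below is illustrative only: the field names (`text`, `tenant_id`, `embedding`), the helper functions, and the 4-dimensional toy vector are assumptions, not a real schema. Real embeddings have hundreds or thousands of dimensions.

```python
from typing import Any


def make_chunk_document(doc_id: str, text: str, embedding: list[float],
                        tenant_id: str) -> dict[str, Any]:
    """Build one document holding text, metadata, and vector together."""
    return {
        "_id": doc_id,
        "text": text,
        "tenant_id": tenant_id,   # hypothetical metadata field, usable as a filter
        "embedding": embedding,   # the vector lives in the same document
    }


def make_vector_index_definition(num_dimensions: int) -> dict[str, Any]:
    """Atlas Vector Search index definition over the assumed `embedding` field."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "numDimensions": num_dimensions,
                "similarity": "cosine",
            },
            # Declaring a filter field allows pre-filtering by metadata.
            {"type": "filter", "path": "tenant_id"},
        ]
    }


doc = make_chunk_document(
    "policy-001#0",
    "Claims must be filed within 30 days.",
    [0.1, -0.2, 0.3, 0.4],  # toy vector for illustration
    "acme",
)
index_definition = make_vector_index_definition(len(doc["embedding"]))
```

Because the embedding is just another field, the same access rules, backups, and deletes that apply to the document also apply to its vector.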
Atlas Vector Search
Atlas Vector Search runs semantic similarity search over vector embeddings. Rather than matching exact keywords, it finds the documents whose meaning is closest to the query, which is the retrieval primitive underneath modern AI search.
- Powers RAG. Retrieve the most relevant chunks for a query and feed them, with citations, to the generation step.
- Powers agentic retrieval. Give agents a semantic memory and knowledge layer they can query mid-task, not just a static prompt.
- Powers semantic search. Search products, documents, tickets, and knowledge bases by intent, not vocabulary — robust to paraphrasing and synonyms.
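A query of this kind is expressed as a `$vectorSearch` aggregation stage. The sketch below only builds the pipeline as plain Python data, so it runs without a database. The index name `vector_index`, the `embedding` and `text` field names, the helper name, and the default of ten candidates per returned result are assumptions for illustration.

```python
from typing import Any, Optional


def build_vector_search_pipeline(
    query_vector: list[float],
    k: int = 10,
    num_candidates: Optional[int] = None,
    index_name: str = "vector_index",   # assumed index name
    path: str = "embedding",            # assumed vector field
    filter_: Optional[dict[str, Any]] = None,
) -> list[dict[str, Any]]:
    """Return an aggregation pipeline that performs approximate nearest-neighbour search."""
    if k <= 0:
        raise ValueError("k must be positive")
    stage: dict[str, Any] = {
        "index": index_name,
        "path": path,
        "queryVector": list(query_vector),
        # Assumed rule of thumb: consider 10x more candidates than results returned.
        "numCandidates": num_candidates if num_candidates is not None else k * 10,
        "limit": k,
    }
    if filter_ is not None:
        stage["filter"] = filter_
    return [
        {"$vectorSearch": stage},
        # Surface the similarity score next to the assumed `text` field.
        {"$project": {"text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3], k=5)
```

With PyMongo, a pipeline like this would be passed to `collection.aggregate(pipeline)`. The query vector must come from the same embedding model used to index the documents.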
Hybrid search (vector + full-text)
Pure semantic search misses queries where exact terms carry the meaning — a code identifier, a policy number, a proper noun. Pure keyword search misses queries where intent matters more than the words used. Hybrid search fuses both.
- Atlas Search provides Lucene-based full-text search on the same data, with mature, BM25-style lexical relevance ranking.
- Atlas Vector Search provides the semantic half — similarity over embeddings.
- Hybrid retrieval combines vector (semantic) with full-text (lexical/BM25) results, so both kinds of query land the right documents. On most enterprise corpora, hybrid retrieval returns more relevant results than either approach alone, and it is the default humaineeti reaches for in production RAG.
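One common way to fuse the two result lists is reciprocal rank fusion (RRF), which scores each document by the sum of 1 / (k + rank) across the lists it appears in. The text above does not say which fusion method is used, so treat RRF here as an illustrative choice. The function name and the constant k = 60 (a conventional default) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document ids; higher fused score means more relevant."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by id for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


semantic = ["a", "b", "c"]   # ids ranked by vector similarity
lexical = ["c", "a", "d"]    # ids ranked by full-text relevance
fused = reciprocal_rank_fusion([semantic, lexical])
fused_ids = [doc_id for doc_id, _ in fused]
```

RRF uses only ranks, not raw scores, so it avoids having to normalise a cosine similarity against a BM25 score. Documents that rank well in both lists rise to the top.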
Voyage AI embeddings & reranking
Voyage AI provides MongoDB's embedding and reranking models — state-of-the-art vectorisation and relevance scoring. humaineeti picks the right model per corpus and language, then layers reranking on top to sharpen the final ranking.
voyage-4-large
Best general-purpose & multilingual retrieval quality.
- The default when retrieval accuracy is paramount.
- The voyage-4 series shares one embedding space, so models mix cleanly.
voyage-context-3
Contextualised chunk embeddings.
- Each chunk is embedded with its surrounding document context.
- Higher accuracy on long, chunked documents; multilingual.
voyage-multimodal-3.5
Rich multimodal embeddings.
- Vectorises interleaved text and visuals with one model.
- PDF screenshots, slides, tables, figures, and video.
rerank-2.5
Generalist, instruction-following reranker.
- Re-scores candidates by true query relevance; multilingual.
- Works over embedding or lexical (BM25/TF-IDF) results.
The family extends further: voyage-code-3 for code retrieval, with support for lower-dimensional and quantized embeddings; domain-specialised models for finance and law; and a lighter rerank-lite-1 reranker. Quantized embeddings cut storage and latency cost. Multimodal embeddings extend retrieval to images and video, including intelligent video search that pinpoints the exact moment a query refers to. (Voyage AI on MongoDB Atlas is a Preview capability.)
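To see why quantized embeddings cut storage, consider generic symmetric int8 scalar quantization: each 4-byte float becomes a 1-byte integer plus one shared scale factor. This is a textbook sketch, not a description of how Voyage AI models produce their quantized outputs. The function names and the toy vector are assumptions.

```python
import struct


def quantize_int8(vector: list[float]) -> tuple[list[int], float]:
    """Map floats to integers in [-127, 127] using one scale per vector."""
    max_abs = max(abs(x) for x in vector) or 1.0  # avoid division by zero
    scale = max_abs / 127.0
    codes = [round(x / scale) for x in vector]
    return codes, scale


def dequantize_int8(codes: list[int], scale: float) -> list[float]:
    """Approximate the original floats from the int8 codes."""
    return [code * scale for code in codes]


vector = [0.12, -0.5, 0.33, 0.0]
codes, scale = quantize_int8(vector)
restored = dequantize_int8(codes, scale)

# Compare serialized sizes: float32 uses 4 bytes per value, int8 uses 1.
float32_bytes = len(struct.pack(f"{len(vector)}f", *vector))
int8_bytes = len(struct.pack(f"{len(codes)}b", *codes))
max_error = max(abs(a - b) for a, b in zip(vector, restored))
```

The reconstruction error per component is bounded by half the scale, which is the trade made for the fourfold storage reduction.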
How humaineeti engineers RAG on Atlas
The model is the easy part. The retrieval layer is where accuracy is won or lost, and it is what humaineeti builds and evaluates end to end on Atlas.
- Chunking. Split source documents into retrieval units sized for the corpus and the questions users actually ask — not arbitrary fixed windows.
- Embedding. Choose the right Voyage AI model per corpus (general voyage-4-large, contextual voyage-context-3, code, multimodal, finance, legal), with quantized embeddings where cost and latency matter.
- Hybrid retrieval. Combine Atlas Vector Search and Atlas Search so both semantic and exact-term queries land the right documents.
- Reranking. Apply a Voyage AI reranker (rerank-2.5) over the candidate set to lift the most relevant results to the top before generation.
- Citation-grounded answers. Every generated answer carries its sources, so responses are traceable and auditable rather than free-floating.
- Evaluation. Measure retrieval and answer quality with Ragas and custom scorers via Eval@Core — on your corpus and queries, not vendor benchmarks — so changes are proven, not assumed.
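As one illustration of the chunking step, the sketch below packs whole paragraphs into chunks up to a size budget instead of cutting at fixed character offsets. It is a minimal example under stated assumptions, not humaineeti's actual chunker: the function name, the blank-line paragraph delimiter, and the character budget are all hypothetical choices.

```python
def chunk_paragraphs(text: str, max_chars: int = 500) -> list[str]:
    """Group consecutive paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            # Still within budget, or starting a fresh chunk.
            current = candidate
        else:
            # Budget exceeded: close the current chunk and start a new one.
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa\n\nbbbb\n\ncccc"
sample_chunks = chunk_paragraphs(sample, max_chars=10)
```

In practice the budget is usually tuned per corpus, and measured in tokens rather than characters, against the evaluation step.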
Below, the shape of a retrieval-then-generate loop — conceptual, engine-agnostic on the model side:
# pseudocode — retrieval on Atlas, model-agnostic generation
q_vec = voyage.embed(query) # Voyage AI embedding
sem = atlas.vector_search(q_vec, k=50) # Atlas Vector Search
lex = atlas.text_search(query, k=50) # Atlas Search (BM25)
cands = fuse(sem, lex) # hybrid retrieval
top = voyage.rerank(query, cands, k=8) # Voyage AI reranker (rerank-2.5)
answer = llm.generate(query, context=top) # cite every source
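The final step of the loop, generating with citations, can be approximated by numbering the retrieved chunks in the prompt and then mapping the citation markers in the answer back to their sources. This is a sketch under stated assumptions: the prompt wording, the `source` and `text` keys, the `[n]` marker format, and both function names are illustrative.

```python
import re


def build_grounded_prompt(query: str, chunks: list[dict]) -> str:
    """Number each retrieved chunk so the model can cite it as [n]."""
    lines = ["Answer using only the numbered sources below and cite them as [n].", ""]
    for n, chunk in enumerate(chunks, start=1):
        lines.append(f"[{n}] ({chunk['source']}) {chunk['text']}")
    lines.extend(["", f"Question: {query}"])
    return "\n".join(lines)


def cited_sources(answer: str, chunks: list[dict]) -> list[str]:
    """Map [n] markers in an answer back to source ids, ignoring invalid numbers."""
    cited: list[str] = []
    for match in re.findall(r"\[(\d+)\]", answer):
        index = int(match) - 1
        if 0 <= index < len(chunks) and chunks[index]["source"] not in cited:
            cited.append(chunks[index]["source"])
    return cited


chunks = [
    {"source": "policy.pdf#p4", "text": "Claims must be filed within 30 days."},
    {"source": "faq.md#returns", "text": "Late claims need manager approval."},
]
prompt = build_grounded_prompt("What is the claims deadline?", chunks)
sources = cited_sources("Claims are due within 30 days [1]; see also [7].", chunks)
```

Checking citations after generation catches answers that cite a source number that was never supplied, such as `[7]` in the example.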
Governance, residency & evaluation
Retrieval touches your most sensitive data, so humaineeti builds the governance in from the start — especially for India-based and regulated workloads.
- India data residency. Atlas deploys on AWS, Azure, and Google Cloud, each of which operates cloud regions in India, so retrieval can sit in an Indian region with your primary data.
- DPDP-aware governance. Access controls, provenance metadata on retrieved chunks, and deletion that reaches the embeddings — not just the primary documents.
- Evaluation as a control. Ragas and custom scorers via Eval@Core keep retrieval quality measured over time, so regressions surface before users notice them.
- Citation grounding. Traceable answers make review, audit, and accountability practical rather than aspirational.
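To give a sense of what a custom retrieval scorer can measure, the sketch below computes two standard metrics, recall@k and reciprocal rank, from a ranked result list and a set of known-relevant document ids. It is plain Python for illustration and does not use or represent the Ragas or Eval@Core APIs. The function names and sample ids are assumptions.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


retrieved = ["d3", "d1", "d7"]   # ranked ids returned for one query
relevant = {"d1", "d9"}          # ids a reviewer judged relevant
recall = recall_at_k(retrieved, relevant, k=3)
rr = reciprocal_rank(retrieved, relevant)
```

Averaging these per-query values over a fixed question set, and re-running after every change to chunking, embedding, or fusion, is what turns evaluation into a regression check.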
The stack
A single, coherent retrieval stack humaineeti stands up on your operational data:
- Database + vectors. MongoDB Atlas — operational data and embeddings in one place, on AWS, Azure, or Google Cloud.
- Retrieval. Atlas Vector Search (semantic) + Atlas Search (Lucene full-text), fused into hybrid retrieval.
- Embeddings & reranking. Voyage AI — voyage-4-large, voyage-context-3, voyage-multimodal-3.5, voyage-code-3, finance/legal models; rerankers rerank-2.5 and rerank-lite-1.
- Orchestration. LangChain, LlamaIndex, CrewAI for the agent and RAG application layer.
- Use cases. RAG applications, fraud prevention (real-time analytics with semantic search), claims-processing modernization, code search, and legal/financial retrieval.
Frequently asked questions
What is MongoDB Atlas Vector Search?
Semantic similarity search over vector embeddings stored alongside your operational data. It finds the documents whose meaning is closest to a query, making it the retrieval layer that powers RAG, agentic retrieval, and semantic search. Because vectors live in the same database as your documents, there is no separate vector store to run or sync.
What is hybrid search?
Hybrid search combines Atlas Vector Search (semantic) with Atlas Search (Lucene-based full-text, BM25 lexical matching) and fuses the results. It catches queries where meaning matters and queries where exact terms matter. On most enterprise corpora it is more relevant than either approach alone.
What is Voyage AI?
MongoDB's embedding and reranking models. Featured embeddings include voyage-4-large (general-purpose, multilingual), voyage-context-3 (contextualised chunks), and voyage-multimodal-3.5 (text + visuals, incl. video); voyage-code-3 and finance/legal models extend the family. Rerankers rerank-2.5 (generalist, instruction-following) and the lighter rerank-lite-1 re-score query-document relevance over embedding or lexical (BM25/TF-IDF) results.
Do I need a separate vector database?
Not with Atlas. It unifies operational data and vector embeddings in one database, so there is no separate vector store to provision, operate, or synchronise. That removes dual-write and staleness bugs and simplifies governance, backup, and access control.