RAG that actually grounds
Retrieval, reranking, citations, refusal. The pattern that stops hallucination.
Most RAG demos hallucinate within 10 minutes of real use. Production RAG shouldn't. The difference is process: hybrid search, reranking, structure-aware chunking, citation-required prompts, and refusal patterns. We ship this on every chatbot engagement.
1. Hybrid search beats vector-only
Vector search misses exact-match queries — names, error codes, SKU numbers. Keyword search misses semantic intent. Hybrid (BM25 + vector) wins.
We run each query through both retrievers, normalize the two score sets to a common scale, weight them, and merge. The weight is tuned per domain. Most of our projects land at roughly 70/30 vector/BM25.
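A minimal sketch of the normalize, weight, and merge step, assuming each retriever returns a dict of document ID to raw score. The function names, the min-max normalization, and the sample scores are illustrative assumptions, not a fixed recipe; other fusion schemes work too.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale raw scores to [0, 1]. If all scores are equal, map each to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_merge(
    vector_scores: Dict[str, float],
    bm25_scores: Dict[str, float],
    vector_weight: float = 0.7,  # 70/30 vector/BM25; tune per domain
) -> List[Tuple[str, float]]:
    """Merge two score sets into one ranked list of (doc_id, score)."""
    v = min_max_normalize(vector_scores)
    b = min_max_normalize(bm25_scores)
    merged = {
        doc_id: vector_weight * v.get(doc_id, 0.0)
        + (1.0 - vector_weight) * b.get(doc_id, 0.0)
        for doc_id in set(v) | set(b)
    }
    # Sort by score descending, then by ID so ties are deterministic.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical scores: cosine similarity on one side, BM25 on the other.
ranked = hybrid_merge(
    {"a": 0.9, "b": 0.5, "c": 0.1},
    {"b": 12.0, "d": 3.0},
)
```

A document found by only one retriever gets 0.0 from the other, so documents that both retrievers agree on tend to rise.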
2. Reranking is the cheapest precision lift
Retrieve top 50 candidates. Rerank to top 5 with Cohere Rerank or a cross-encoder. The model only sees the top 5.
This adds ~150ms of latency and filters out most of the irrelevant chunks that first-stage retrieval lets through. We've never shipped production RAG without reranking and don't plan to.
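The retrieve-wide, rerank-narrow step can be sketched as below. The `score_fn` parameter is a hypothetical hook where a hosted reranker or a cross-encoder would plug in; `overlap_score` is only a toy stand-in so the example runs without a model.

```python
from typing import Callable, List, Sequence, Tuple

# Takes a query and candidate texts, returns one relevance score per candidate.
ScoreFn = Callable[[str, Sequence[str]], List[float]]


def rerank(
    query: str,
    candidates: Sequence[str],
    score_fn: ScoreFn,
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Score every candidate against the query and keep the top_k."""
    scores = score_fn(query, candidates)
    if len(scores) != len(candidates):
        raise ValueError("score_fn must return one score per candidate")
    ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
    return ranked[:top_k]


def overlap_score(query: str, docs: Sequence[str]) -> List[float]:
    """Toy scorer: fraction of query tokens present in each document.

    A real system would replace this with a learned reranking model.
    """
    q = set(query.lower().split())
    return [len(q & set(d.lower().split())) / max(len(q), 1) for d in docs]


candidates = [
    "reset your password in settings",
    "billing cycle explained",
    "password reset error E42",
]
top = rerank("password reset error", candidates, overlap_score, top_k=2)
```

Only the returned `top` list would be passed to the model; the remaining candidates are dropped.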
3. Chunk smartly, not uniformly
Fixed-size chunking is fine for v1, but it cuts through sentences, tables, and functions, and retrieval quality suffers for it. For docs, chunk by heading. For codebases, chunk by function. For conversations, chunk by turn.
Always include the chunk's parent heading or context in the embedded text. Otherwise the retriever matches, and the model reads, text stripped of its context.
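A minimal sketch of heading-based chunking, assuming the docs are Markdown with `#`-style headings. The field names and the " > " breadcrumb format are illustrative choices. It does not handle `#` lines inside fenced code blocks, which a real splitter would need to skip.

```python
import re
from typing import Dict, List, Tuple

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_heading(markdown_text: str) -> List[Dict[str, str]]:
    """Split a Markdown document at headings and prefix each chunk
    with its full heading path so the embedded text carries context."""
    chunks: List[Dict[str, str]] = []
    path: List[Tuple[int, str]] = []  # stack of (level, title)
    body: List[str] = []

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            breadcrumb = " > ".join(title for _, title in path)
            embed = f"{breadcrumb}\n\n{text}" if breadcrumb else text
            chunks.append({"heading_path": breadcrumb, "embed_text": embed})
        body.clear()

    for line in markdown_text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush()  # close the chunk under the previous heading
            level = len(match.group(1))
            # Pop headings at the same or deeper level before pushing.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            body.append(line)
    flush()
    return chunks


doc = (
    "# Billing\nIntro text.\n"
    "## Refunds\nRefunds take 5 days.\n"
    "## Invoices\nSent monthly."
)
chunks = chunk_by_heading(doc)
```

Here the "Refunds" chunk is embedded as "Billing > Refunds" followed by its body, so a query about billing refunds can match it even though the body never mentions billing.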
4. Citation-required prompting
The system prompt instructs the model to cite a source for every fact and to refuse when it can't cite one. Users see the citations as inline footnote links.
In our experience, hallucination drops 80%+ with citation-required prompting. The eval suite catches the long tail that remains.
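A minimal sketch of a citation-required prompt plus a post-check on the answer. The prompt wording, the `[n]` citation format, and the refusal string are assumptions for illustration; the check only verifies that cited source numbers exist, not that the cited text supports the claim.

```python
import re
from typing import List

SYSTEM_PROMPT = (
    "Answer using only the numbered sources below. "
    "Cite a source like [1] after every factual sentence. "
    "If the sources do not support an answer, reply exactly: I don't know."
)

CITATION_RE = re.compile(r"\[(\d+)\]")


def build_prompt(question: str, chunks: List[str]) -> str:
    """Number the retrieved chunks so the model can cite them as [n]."""
    sources = "\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    return f"{SYSTEM_PROMPT}\n\nSources:\n{sources}\n\nQuestion: {question}"


def citations_valid(answer: str, num_sources: int) -> bool:
    """True if the answer cites at least one source and every cited
    number refers to a source that was actually provided."""
    cited = [int(n) for n in CITATION_RE.findall(answer)]
    return bool(cited) and all(1 <= n <= num_sources for n in cited)


prompt = build_prompt(
    "How long do refunds take?",
    ["Refunds take 5 days.", "Invoices are sent monthly."],
)
```

An answer that fails `citations_valid` can be retried or routed to refusal instead of being shown to the user.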
5. Refusal patterns
If retrieval confidence is low, refuse and route. Don't make it up. The agent that says 'I don't know — let me get someone' beats the agent that confidently fabricates.
We tune this with two signals: a confidence score on retrieval and an LLM-as-judge confidence score on the generated answer. When a score falls below its threshold, the refusal triggers the escalation path.
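Threshold-based refusal can be sketched as a small gate. The threshold values, the 0-to-1 score scale, and the rule that either low score triggers escalation are assumptions here; in practice they come out of tuning against an eval set.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str  # "answer" or "escalate"
    reason: str


def decide(
    retrieval_score: float,
    judge_score: float,
    retrieval_min: float = 0.5,  # hypothetical threshold
    judge_min: float = 0.7,      # hypothetical threshold
) -> Decision:
    """Refuse and escalate if either confidence signal is too low."""
    if retrieval_score < retrieval_min:
        return Decision("escalate", "low retrieval confidence")
    if judge_score < judge_min:
        return Decision("escalate", "low answer confidence")
    return Decision("answer", "both checks passed")


weak_retrieval = decide(retrieval_score=0.3, judge_score=0.9)
weak_answer = decide(retrieval_score=0.8, judge_score=0.5)
confident = decide(retrieval_score=0.8, judge_score=0.9)
```

Recording the `reason` alongside each refusal gives QA a concrete list of refusal paths to test against.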
Common failure mode
Skipping reranking 'to keep it simple'. The retriever returns the wrong chunks, the model hallucinates from them, and you blame RAG.
Common skip
Documenting your refusal patterns. If your QA team can't list them, your users will find them.