
RAG that actually grounds

Retrieval, reranking, citations, refusal. The pattern that stops hallucination.

Most RAG demos hallucinate within 10 minutes of real use. Production RAG doesn't. The difference is process: hybrid search, reranking, citation-required prompts, and refusal patterns. We ship this on every chatbot engagement.

1. Hybrid search beats vector-only

Vector search misses exact-match queries: names, error codes, SKU numbers. Keyword search misses semantic intent. Hybrid (BM25 + vector) wins.

We score each query with both retrievers, normalize the two score sets to a common range, weight them, and merge. The weight is tuned per domain. Most projects land around 70/30 vector/BM25.
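The merge step can be sketched as below. This is a minimal illustration, not the exact production code: min-max normalization is an assumption (the text does not say which normalization is used), the function names are hypothetical, and the scores are made up for a query like "error E4012 on checkout".

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw scores to [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_merge(
    vector_scores: Dict[str, float],
    bm25_scores: Dict[str, float],
    vector_weight: float = 0.7,  # 70/30 vector/BM25 starting point; tune per domain
) -> List[Tuple[str, float]]:
    """Weighted sum of normalized scores. A doc missing from one list scores 0 there."""
    v = min_max_normalize(vector_scores)
    b = min_max_normalize(bm25_scores)
    merged = {
        doc_id: vector_weight * v.get(doc_id, 0.0)
        + (1.0 - vector_weight) * b.get(doc_id, 0.0)
        for doc_id in set(v) | set(b)
    }
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)


# Made-up scores: cosine similarity from the vector index, raw BM25 from keyword search.
vector_scores = {"doc_a": 0.82, "doc_b": 0.79, "doc_c": 0.40}
bm25_scores = {"doc_c": 12.5, "doc_b": 3.1}
ranked = hybrid_merge(vector_scores, bm25_scores)
```

Normalizing before weighting matters because BM25 scores are unbounded while cosine similarities are not. Without it the weight would not mean what you think it means.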

2. Reranking is the cheapest precision lift

Retrieve the top 50 candidates, then rerank down to the top 5 with Cohere Rerank or a cross-encoder. The model sees only those 5.

Adds ~150ms latency. Removes most retrieval misses. We've never shipped production RAG without reranking and don't plan to.
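The retrieve-then-rerank shape looks roughly like this. It is a sketch under stated assumptions: `rerank` and `token_overlap` are hypothetical names, and `token_overlap` is only a toy stand-in so the example runs without a model. In a real system the `score_fn` argument would wrap a cross-encoder or a hosted reranker.

```python
import re
from typing import Callable, List, Sequence, Tuple


def rerank(
    query: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Score every (query, candidate) pair and keep the top_n highest."""
    scored = [(text, score_fn(query, text)) for text in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def token_overlap(query: str, passage: str) -> float:
    """Toy stand-in scorer: fraction of query words that also appear in the passage."""
    q = set(re.findall(r"\w+", query.lower()))
    p = set(re.findall(r"\w+", passage.lower()))
    return len(q & p) / len(q) if q else 0.0


# In practice this list would hold the ~50 candidates from hybrid retrieval.
candidates = [
    "Refunds are processed within five business days.",
    "To reset your password, open Settings and choose Security.",
    "Password rules: at least twelve characters.",
]
top = rerank("how do I reset my password", candidates, token_overlap, top_n=2)
```

Keeping the scorer behind a plain callable makes it easy to swap rerankers and compare them on the same eval set.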

3. Chunk smartly, not uniformly

Fixed-size chunking is fine for v1 but you'll hit walls. For docs, chunk by heading. For codebases, chunk by function. For conversations, chunk by turn.

Always include the chunk's parent heading or context in the embedded text. Otherwise the retriever returns text stripped of its context, and the model has to guess what it refers to.
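For docs, heading-based chunking with the parent heading carried into the embedded text might look like this. The sketch assumes Markdown-style `#` headings, which is an assumption about the source documents. The function name and the `embed_text` field are hypothetical.

```python
import re
from typing import Dict, List

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_heading(markdown: str) -> List[Dict[str, str]]:
    """Split on headings and prefix each chunk with its heading trail."""
    chunks: List[Dict[str, str]] = []
    path: List[str] = []  # current heading trail, e.g. ["Billing", "Refunds"]
    body: List[str] = []  # lines collected under the current heading

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            trail = " > ".join(path)
            embed_text = f"{trail}\n\n{text}" if trail else text
            chunks.append({"heading": trail, "embed_text": embed_text})
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the chunk that belongs to the previous heading
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this depth or deeper
            path.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return chunks


doc = """# Billing
## Refunds
Processed within five business days.
## Invoices
Sent on the first of each month.
"""
chunks = chunk_by_heading(doc)
```

Here the first chunk embeds as "Billing > Refunds" followed by its body, so a query about refunds matches even though the body text never uses the word.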

4. Citation-required prompting

The system prompt instructs the model to cite a source for every fact and to refuse to answer if it can't cite one. Users see the citations as inline footnote links.

Hallucination drops 80%+ with citation-required prompting. The rest is the eval suite catching the long tail.
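One way to enforce the citation rule is to number the retrieved sources in the prompt and check the answer before showing it. The prompt wording, the `[n]` citation format, and both function names below are assumptions for illustration, not the exact prompt we ship.

```python
import re
from typing import List

SYSTEM_PROMPT = (
    "Answer using only the numbered sources below. "
    "Put a citation like [1] inside every factual sentence. "
    "If the sources do not contain the answer, reply exactly: I don't know."
)


def build_prompt(question: str, sources: List[str]) -> str:
    """Number the sources so the model can cite them as [1], [2], ..."""
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(sources, start=1))
    return f"{SYSTEM_PROMPT}\n\nSources:\n{numbered}\n\nQuestion: {question}"


def citations_valid(answer: str, num_sources: int) -> bool:
    """True if every sentence cites a source and every cited number exists."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences:
        return False
    for sentence in sentences:
        cited = [int(n) for n in re.findall(r"\[(\d+)\]", sentence)]
        if not cited or any(n < 1 or n > num_sources for n in cited):
            return False
    return True


sources = ["Refunds are processed within five business days.",
           "Invoices are sent on the first of each month."]
prompt = build_prompt("How long do refunds take?", sources)
ok = citations_valid("Refunds take five business days [1].", len(sources))
bad = citations_valid("Refunds take five days [1]. Invoices are free.", len(sources))
```

The check is deliberately strict and purely syntactic. It confirms a citation is present and points at a real source, not that the source supports the claim, which is what the eval suite is for. Handle the explicit "I don't know" reply before running it, since a refusal carries no citations.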

5. Refusal patterns

If retrieval confidence is low, refuse and route. Don't make it up. The agent that says 'I don't know, let me get someone' beats the agent that confidently fabricates.

We tune this with two signals: a confidence score on retrieval and an LLM-as-judge confidence score on the generated answer. A score below its threshold triggers a refusal and the escalation path.
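A threshold gate over the two confidence signals can be as small as the sketch below. The threshold values, the refusal wording, and the rule that either low score triggers a refusal are all assumptions for illustration. Real values come from tuning against an eval set.

```python
from dataclasses import dataclass

# Hypothetical thresholds; tune them against your own eval set.
RETRIEVAL_MIN = 0.5
ANSWER_MIN = 0.7
REFUSAL_TEXT = "I don't know. Let me get someone who can help."


@dataclass
class Decision:
    text: str       # what the user sees
    escalate: bool  # True routes the conversation to the escalation path
    reason: str     # logged so every refusal pattern can be listed later


def decide(retrieval_conf: float, answer_conf: float, draft: str) -> Decision:
    """Refuse and escalate if either confidence is below its threshold."""
    if retrieval_conf < RETRIEVAL_MIN:
        return Decision(REFUSAL_TEXT, True, "low_retrieval_confidence")
    if answer_conf < ANSWER_MIN:
        return Decision(REFUSAL_TEXT, True, "low_answer_confidence")
    return Decision(draft, False, "ok")


weak = decide(0.31, 0.90, "Refunds take five days [1].")
good = decide(0.84, 0.92, "Refunds take five days [1].")
```

Recording a `reason` on every decision gives QA a concrete list of refusal patterns to document and test.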

Common failure mode

Skipping reranking 'to keep it simple'. The retriever surfaces the wrong chunks, the model hallucinates from them, and you blame RAG.

Common skip

Documenting your refusal patterns. If your QA team can't list them, your users will find them.