RAG vs fine-tuning: which one your e-commerce business actually needs
When to use retrieval, when to fine-tune, and when to do both. With a flowchart your team can actually use.
- rag
- fine-tuning
- ecommerce
The default answer is RAG
If you're an e-commerce business asking "should we fine-tune a model on our catalog?", the answer is almost always: not yet. Start with retrieval-augmented generation (RAG). It's cheaper, faster to iterate, and handles the most common cases — product Q&A, sizing, fit, recommendations grounded in actual inventory.
When to use RAG (most cases)
- Product catalog Q&A — "does this jacket come in tall sizes?"
- Policy and shipping — "what's your return window for sale items in EU?"
- Recommendations grounded in inventory — "what would go with this dress that's in stock in size 8?"
- Comparisons — "what's the difference between these two SKUs?"
RAG works because the answers already live in your own data: inventory records, policy docs, product descriptions. You don't need the model to learn anything new; you need it to retrieve, ground, and explain.
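To make "retrieve, ground, and explain" concrete, here is a minimal sketch of the first two steps. Everything in it is an assumption for illustration: the toy corpus, the document IDs, and the word-overlap scoring, which stands in for whatever embedding or keyword search you would actually use. The resulting prompt would then go to the model of your choice.

```python
import re
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    text: str


# Hypothetical toy corpus; in practice this comes from your catalog and policy docs.
CORPUS = [
    Doc("sku-1042", "Trail jacket. Sizes: S, M, L, XL. Tall sizes available in M and L."),
    Doc("policy-returns-eu", "EU orders: sale items can be returned within 14 days of delivery."),
    Doc("sku-2210", "Linen dress. Sizes: 6, 8, 10. Size 8 in stock."),
]


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, corpus: list[Doc], k: int = 2) -> list[Doc]:
    """Rank docs by shared-word count (a stand-in for real search) and keep the top k."""
    q = tokens(query)
    scored = [(len(q & tokens(d.text)), d) for d in corpus]
    scored = [(s, d) for s, d in scored if s > 0]  # drop docs with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def build_prompt(query: str, docs: list[Doc]) -> str:
    """Ground the model: give it only retrieved sources and ask for cited answers."""
    context = "\n".join(f"[{d.doc_id}] {d.text}" for d in docs)
    return (
        "Answer using only the sources below and cite source IDs in brackets. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )


question = "Does this jacket come in tall sizes?"
hits = retrieve(question, CORPUS)
prompt = build_prompt(question, hits)
print(prompt)
```

The point of the design is that the model never has to memorize the catalog: when inventory changes, you update the corpus and the next answer reflects it.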
When to fine-tune
- Brand voice on long-form content: product descriptions, editorial, post-purchase emails. Fine-tuning teaches the model to write like you, which prompting can't fully replicate at scale.
- Visual classification: defective vs OK photos, style tags, color naming. Vision fine-tuning typically outperforms prompting alone on these tasks.
- Speed and cost at scale: once your prompt + RAG setup works well, fine-tuning a smaller model on its outputs (distillation) can cut inference cost by roughly 5–10x.
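For the distillation case, most of the work is turning logged prompt + RAG answers into training data. The sketch below assumes a hypothetical log format with a `rating` field as the quality signal, and a generic prompt/completion JSONL shape. Both are placeholders: check what your fine-tuning provider actually expects before using anything like this.

```python
import json

# Hypothetical logged interactions from a working prompt + RAG setup.
# "rating" is an assumed quality signal (e.g. human review on a 1-5 scale).
LOGGED = [
    {"prompt": "Does the trail jacket come in tall sizes?",
     "answer": "Yes, tall sizes are available in M and L [sku-1042].",
     "rating": 5},
    {"prompt": "Is the linen dress in stock in size 8?",
     "answer": "I think so.",
     "rating": 2},
]


def to_training_lines(records, min_rating=4):
    """Keep only well-rated answers and serialize one JSON object per line."""
    return [
        json.dumps({"prompt": r["prompt"], "completion": r["answer"]})
        for r in records
        if r["rating"] >= min_rating
    ]


for line in to_training_lines(LOGGED):
    print(line)
```

Filtering before training matters here: the smaller model learns whatever you feed it, including the weak answers.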
When to do both
The mature setup is fine-tuning plus retrieval. You fine-tune for voice and structure; you retrieve for current data. Your catalog changes every day: that's RAG. Your brand voice rarely changes: that's fine-tuning.
The flowchart
- Is the data you need static or changing daily? → static = fine-tune candidate, dynamic = RAG.
- Is the task generation in your voice or visual classification, or is it fact retrieval? → voice or classification = fine-tune, fact retrieval = RAG.
- Are you paying $$$/month in inference at scale? → distill a fine-tune later, after RAG works.
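If it helps to hand the flowchart to engineers, the three questions can be sketched as a small function. This is one possible reading, not a definitive rule set: the field names and return labels are made up for illustration, and treating classification as a fine-tune case follows the "When to fine-tune" list above.

```python
from dataclasses import dataclass

TASKS = ("voice", "classification", "fact_retrieval")  # assumed task labels


@dataclass
class UseCase:
    data_changes_daily: bool
    task: str                  # one of TASKS
    high_inference_cost: bool
    rag_already_works: bool


def recommend(uc: UseCase) -> list[str]:
    """Walk the three flowchart questions and collect recommendations in order."""
    if uc.task not in TASKS:
        raise ValueError(f"unknown task: {uc.task!r}")
    recs = []
    # Question 1 and 2: dynamic data or fact retrieval points to RAG.
    if uc.data_changes_daily or uc.task == "fact_retrieval":
        recs.append("rag")
    # Question 2: voice generation (and classification) points to fine-tuning.
    if uc.task in ("voice", "classification"):
        recs.append("fine-tune")
    # Question 3: distillation only makes sense once RAG already works.
    if uc.high_inference_cost:
        recs.append("distill" if uc.rag_already_works else "distill-later")
    return recs


# Product copy in brand voice over a catalog that changes daily: do both.
print(recommend(UseCase(True, "voice", False, False)))
```

Note that a daily-changing catalog combined with a voice task yields both recommendations, which matches the "do both" setup.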
Build sequence we recommend
- Phase 1 (4–6 weeks): RAG over catalog + policy. Citation-required. Ship.
- Phase 2 (2–4 weeks): If you're seeing voice drift on long-form content, fine-tune a small model on your best-performing product copy.
- Phase 3 (ongoing): Once inference cost starts to matter, distill a smaller model from your prompt + RAG outputs.
Most e-commerce clients never reach Phase 3. That's fine.
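One note on the "citation-required" rule in Phase 1: it is only useful if you check it. A minimal sketch of such a check, assuming answers cite sources as bracketed IDs like `[sku-1042]` (a convention invented for this example, not a standard):

```python
import re

# Assumed citation convention: source IDs in square brackets, e.g. [sku-1042].
CITATION = re.compile(r"\[([A-Za-z0-9_-]+)\]")


def check_citations(answer: str, retrieved_ids: set[str]) -> tuple[bool, set[str]]:
    """Return (ok, unknown_ids): ok means at least one citation and none unrecognized."""
    cited = set(CITATION.findall(answer))
    unknown = cited - retrieved_ids
    ok = bool(cited) and not unknown
    return ok, unknown


retrieved = {"sku-1042", "sku-2210"}
print(check_citations("Tall sizes come in M and L [sku-1042].", retrieved))
print(check_citations("Yes, it does.", retrieved))
print(check_citations("See [sku-9999].", retrieved))
```

An answer that fails the check can be retried or replaced with a fallback message rather than shown to the customer.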