Apr 7, 2026 · 7 min read

RAG vs fine-tuning: which one your e-commerce business actually needs

When to use retrieval, when to fine-tune, and when to do both. With a flowchart your team can actually use.

rag
fine-tuning
ecommerce

The default answer is RAG

If you're an e-commerce business asking "should we fine-tune a model on our catalog?", the answer is almost always: not yet. Start with retrieval-augmented generation (RAG). It's cheaper, faster to iterate, and handles the most common cases — product Q&A, sizing, fit, recommendations grounded in actual inventory.

When to use RAG (most cases)

Product catalog Q&A — "does this jacket come in tall sizes?"
Policy and shipping — "what's your return window for sale items in EU?"
Recommendations grounded in inventory — "what would go with this dress that's in stock in size 8?"
Comparisons — "what's the difference between these two SKUs?"

RAG works because the answers live in structured data (inventory, policy docs, product descriptions). You don't need the model to learn anything new — you need it to retrieve, ground, and explain.
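To make "retrieve, ground, and explain" concrete, here is a minimal sketch in Python. It assumes a tiny in-memory catalog and a naive word-overlap score; the SKUs, the texts, and the function names (`retrieve`, `build_prompt`) are all made up for illustration. A production system would more likely use embeddings and a vector index, but the shape is the same: find the relevant rows, paste them into the prompt, and tell the model to answer only from them.

```python
import re

# Hypothetical catalog and policy rows (invented for this sketch).
CATALOG = [
    {"sku": "JKT-001", "text": "Field jacket, available in regular and tall sizes, colors olive and navy."},
    {"sku": "DRS-014", "text": "Wrap dress, sizes 4 to 12, in stock in size 8."},
    {"sku": "POL-EU", "text": "EU return window for sale items is 14 days from delivery."},
]

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, docs, k=2):
    """Return up to k docs sharing the most words with the question."""
    q = tokenize(question)
    scored = [(len(q & tokenize(d["text"])), d) for d in docs]
    scored = [pair for pair in scored if pair[0] > 0]  # drop docs with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]

def build_prompt(question, docs):
    """Ground the model: it may only answer from the retrieved rows."""
    context = "\n".join(f"[{d['sku']}] {d['text']}" for d in docs)
    return (
        "Answer using only the sources below. "
        "Cite the SKU in square brackets for every claim.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

question = "does this jacket come in tall sizes?"
hits = retrieve(question, CATALOG)
prompt = build_prompt(question, hits)
```

The prompt string is what you would send to whichever model you use. Nothing about the model changes when the catalog does, which is the whole point.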

When to fine-tune

Brand voice on long-form content — product descriptions, editorial, post-purchase emails. Fine-tuning teaches the model to write like you, which prompting can't fully replicate at scale.
Visual classification — defective vs OK photos, style tags, color naming. On narrow, well-labeled tasks like these, vision fine-tuning typically outperforms prompting alone.
Speed/cost at scale — once you've nailed prompt+RAG, fine-tuning a smaller model on your distilled outputs can cut inference cost by roughly 5–10x.
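The distillation step mentioned above mostly comes down to data preparation: take logged prompt+RAG outputs, keep the good ones, and write them out as training pairs. Here is a rough sketch, assuming each log row carries a hypothetical `rating` field (say, 1 to 5 from human review) and that your fine-tuning provider accepts JSONL with `prompt` and `completion` keys. Both are assumptions; check your provider's actual format.

```python
import json

def to_training_records(logged_rows, min_rating=4):
    """Keep only well-rated outputs and reshape them into prompt/completion pairs."""
    records = []
    for row in logged_rows:
        if row["rating"] < min_rating:
            continue  # don't teach the small model your bad answers
        records.append({"prompt": row["prompt"], "completion": row["output"]})
    return records

def to_jsonl(records):
    """Serialize one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

# Hypothetical log rows, invented for this sketch.
logs = [
    {"prompt": "Describe SKU JKT-001", "output": "A field jacket built for...", "rating": 5},
    {"prompt": "Describe SKU DRS-014", "output": "dress. it is a dress.", "rating": 2},
]
records = to_training_records(logs)
jsonl = to_jsonl(records)
```

The filter matters more than the format. A small model trained on unfiltered outputs inherits every mistake the large one made.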

When to do both

The mature setup is fine-tune + retrieval. You fine-tune for voice and structure; you retrieve for current data. The catalog changes every day, so that's RAG. Brand voice rarely changes, so that's fine-tuning.

The flowchart

Is the data you need static or changing daily? → static = fine-tune candidate, dynamic = RAG.
Is the task generation in your voice or visual classification, or is it fact retrieval? → voice or classification = fine-tune, fact retrieval = RAG.
Are you paying $$$/month in inference at scale? → distill a fine-tune later, after RAG works.
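If your team prefers code to diagrams, the three questions can be sketched as a small Python function. The `UseCase` fields and the returned labels are names chosen for this example, not an established schema, and classification is mapped to fine-tuning to match the "When to fine-tune" section.

```python
from dataclasses import dataclass

@dataclass
class UseCase:
    data_changes_daily: bool    # question 1: static or dynamic data?
    task: str                   # question 2: "voice", "classification", or "fact_retrieval"
    inference_cost_hurts: bool  # question 3: is the monthly bill a problem?
    rag_in_production: bool     # distilling only makes sense once RAG works

def recommend(uc: UseCase) -> list[str]:
    """Walk the three flowchart questions and collect recommendations in order."""
    steps = []

    # Question 1: changing data points to retrieval, static data to fine-tuning.
    steps.append("rag" if uc.data_changes_daily else "fine-tune candidate")

    # Question 2: voice and classification are fine-tune work; facts are retrieval.
    if uc.task in ("voice", "classification"):
        if "fine-tune" not in steps:
            steps.append("fine-tune")
    elif uc.task == "fact_retrieval":
        if "rag" not in steps:
            steps.append("rag")
    else:
        raise ValueError(f"unknown task: {uc.task!r}")

    # Question 3: distill later, and only after RAG is working.
    if uc.inference_cost_hurts:
        steps.append("distill" if uc.rag_in_production else "get RAG working before distilling")

    return steps

# Brand-voice product copy over a catalog that changes daily: do both.
both = recommend(UseCase(True, "voice", False, False))
```

A use case that returns both `"rag"` and `"fine-tune"` is the "do both" setup described earlier.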

Build sequence we recommend

Phase 1 (4–6 weeks): RAG over catalog + policy. Citation-required. Ship.
Phase 2 (2–4 weeks): If you're seeing voice drift on long-form, fine-tune a small model on your best-performing product copy.
Phase 3 (ongoing): Once inference cost starts to matter, distill a smaller model from your prompt+RAG outputs.
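"Citation-required" in Phase 1 can be enforced with a cheap check before an answer reaches the customer. The sketch below assumes the model has been told to cite sources as bracketed IDs like `[JKT-001]`; that convention, the regex, and the function name are choices made for this example. It rejects answers that cite nothing, and answers that cite an ID which was not among the retrieved sources.

```python
import re

# Matches bracketed source IDs such as [JKT-001] (assumed citation convention).
CITATION = re.compile(r"\[([A-Za-z0-9_-]+)\]")

def check_citations(answer, retrieved_ids):
    """True only if the answer cites at least one source and every cited ID was retrieved."""
    cited = set(CITATION.findall(answer))
    allowed = set(retrieved_ids)
    return bool(cited) and cited <= allowed

retrieved = ["JKT-001", "DRS-014"]
ok = check_citations("Yes, it comes in tall sizes [JKT-001].", retrieved)
uncited = check_citations("Yes, it comes in tall sizes.", retrieved)
invented = check_citations("Yes, see [XYZ-999].", retrieved)
```

This only verifies that citations exist and point at real retrieved rows. It does not verify that the cited row actually supports the claim, which still needs spot checks or an evaluation set.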

Most e-commerce clients never reach Phase 3. That's fine.