RAG in Production: What Breaks First
Dec 2, 2023 • 6 min read
Retrieval-augmented generation (RAG) systems are powerful, but answer quality degrades quickly in production unless you instrument the pipeline and keep retrieval clean.
Failure mode 1: stale embeddings
When source documents change but their embeddings are not regenerated, the index keeps serving the old content. Nothing errors out, so answer quality drops silently.
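One way to make this failure visible is to store a content hash next to each embedding and compare it against the current document text. The sketch below assumes documents are available as plain strings keyed by ID. The names `content_hash`, `find_stale`, and `indexed_hashes` are hypothetical, not part of any particular vector store.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return a stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_stale(documents: dict[str, str], indexed_hashes: dict[str, str]) -> list[str]:
    """Return IDs whose current text does not match the hash stored at embedding time.

    Documents that were never embedded have no stored hash, so they are
    reported as stale too.
    """
    stale = []
    for doc_id, text in documents.items():
        if indexed_hashes.get(doc_id) != content_hash(text):
            stale.append(doc_id)
    return sorted(stale)


# Illustrative data: the refund policy changed after it was embedded.
documents = {
    "refund-policy": "Refunds within 30 days.",
    "shipping": "Ships in 2 days.",
}
indexed_hashes = {
    "refund-policy": content_hash("Refunds within 14 days."),
    "shipping": content_hash("Ships in 2 days."),
}
stale_ids = find_stale(documents, indexed_hashes)
```

Running a check like this on a schedule, and alerting on the count of stale IDs, turns a silent quality drop into a metric you can watch.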
Failure mode 2: irrelevant top-k
Similarity search alone can pull in content that is semantically adjacent to the query but inaccurate with respect to policy. Filter candidates on metadata and rerank them before they reach the prompt.
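A minimal sketch of that two-step cleanup, assuming each retrieved chunk carries a metadata dictionary. The `Chunk` type, the `status` field, and the `term_overlap` scorer are hypothetical. The scorer is a toy stand-in for whatever reranking model you actually use.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Chunk:
    text: str
    similarity: float  # score from the vector search
    metadata: dict[str, str] = field(default_factory=dict)


def filter_and_rerank(
    query: str,
    candidates: list[Chunk],
    required: dict[str, str],
    score: Callable[[str, str], float],
    top_k: int = 3,
) -> list[Chunk]:
    """Drop chunks whose metadata fails the filter, then reorder the rest by `score`."""
    allowed = [
        c for c in candidates
        if all(c.metadata.get(key) == value for key, value in required.items())
    ]
    ranked = sorted(allowed, key=lambda c: score(query, c.text), reverse=True)
    return ranked[:top_k]


def term_overlap(query: str, text: str) -> float:
    """Toy reranker: fraction of query words that appear in the chunk."""
    query_terms = set(query.lower().split())
    text_terms = set(text.lower().split())
    return len(query_terms & text_terms) / len(query_terms) if query_terms else 0.0


# Illustrative data: the archived chunk has the highest vector similarity.
candidates = [
    Chunk("refund window is 30 days for annual plans", 0.80, {"status": "current"}),
    Chunk("refund window was 14 days", 0.92, {"status": "archived"}),
    Chunk("annual plans renew automatically", 0.85, {"status": "current"}),
]
results = filter_and_rerank(
    "refund window annual plans", candidates, {"status": "current"}, term_overlap
)
```

In this example the archived chunk would win on similarity alone. The metadata filter removes it, and the reranker then puts the chunk that actually answers the query ahead of the one that only mentions annual plans.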
Failure mode 3: cost creep
The amount of context packed into each prompt tends to grow over time, and cost grows with it. Enforce token budgets and track spend per workflow and per customer segment.
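A rough sketch of both controls follows. It assumes chunks arrive already ranked. `count_tokens` is a crude whitespace approximation that you would replace with your model's tokenizer. The `SpendTracker` class and the price used in the example are hypothetical.

```python
from collections import defaultdict


def count_tokens(text: str) -> int:
    """Crude stand-in: counts whitespace-separated words, not real model tokens."""
    return len(text.split())


def fit_to_budget(chunks: list[str], max_tokens: int) -> list[str]:
    """Keep chunks in rank order until the next one would exceed the budget."""
    kept: list[str] = []
    used = 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > max_tokens:
            break  # stop here so lower-ranked chunks never displace higher-ranked ones
        kept.append(chunk)
        used += cost
    return kept


class SpendTracker:
    """Accumulates token usage per (workflow, customer segment) pair."""

    def __init__(self, price_per_1k_tokens: float) -> None:
        self.price_per_1k_tokens = price_per_1k_tokens  # hypothetical flat price
        self.tokens: dict[tuple[str, str], int] = defaultdict(int)

    def record(self, workflow: str, segment: str, tokens: int) -> None:
        self.tokens[(workflow, segment)] += tokens

    def spend(self) -> dict[tuple[str, str], float]:
        return {
            key: total / 1000 * self.price_per_1k_tokens
            for key, total in self.tokens.items()
        }


# Illustrative usage with made-up numbers.
context = fit_to_budget(["a b c", "d e", "f g h i"], max_tokens=5)
tracker = SpendTracker(price_per_1k_tokens=2.0)
tracker.record("support", "enterprise", 1500)
tracker.record("support", "enterprise", 500)
report = tracker.spend()
```

Stopping at the first chunk that does not fit keeps the prompt aligned with the ranking. Skipping it and trying smaller chunks would pack the budget more tightly, at the cost of promoting lower-ranked content.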