Corrective RAG and Self-RAG: Self-Correcting Retrieval Patterns for Fewer Hallucinations
Corrective RAG (CRAG) and Self-RAG are self-correcting retrieval patterns. CRAG assesses the relevance of retrieved results and switches to a web-search fallback when quality is poor. Self-RAG lets the model decide for itself, via reflection tokens, whether to retrieve at all and whether its own answer is supported by the sources.
Key Takeaways
- CRAG adds a retrieval evaluator to the RAG pipeline: results are rated as correct, ambiguous or incorrect; when quality is poor, a web-search fallback steps in instead of blindly generating on weak context.
- Self-RAG (Asai et al. 2023, arXiv:2310.11511) trains the model on reflection tokens that let it decide for itself whether retrieval is needed and check its own answer against the sources (faithfulness self-critique).
- Both patterns address the core problem of context-unfaithful generation: hallucinations despite RAG, where the model cites sources but deviates from them in substance (anti-pattern AP9 of the research).
- They are special cases of agentic RAG (Singh et al. 2025): retrieval becomes a reflectively and iteratively invoked tool rather than static preprocessing.
- The trade-off is real: additional LLM calls for assessment and self-critique increase latency and cost and carry the risk of uncontrolled tool loops; worthwhile above all when hallucination risk is high and knowledge bases are incomplete.
- A pragmatic starting point for most DACH projects: first exhaust hybrid search, re-ranking and faithfulness guardrails (RAGAS), then add CRAG/Self-RAG selectively.
Corrective RAG (CRAG) and Self-RAG are two self-correcting retrieval patterns that extend the classic RAG approach with a layer of quality control. CRAG assesses the relevance of the retrieved results and switches to a web-search fallback when retrieval quality is poor. Self-RAG lets the language model decide for itself, via reflection tokens, whether to retrieve at all, and then checks whether its own answer is supported by the sources. Both target the same problem: hallucinations that arise despite RAG.
- CRAG in one sentence: A retrieval evaluator classifies results as correct, ambiguous or incorrect and triggers a corrective measure when context is poor, instead of generating blindly.
- Self-RAG in one sentence: Via trained reflection tokens, the model itself controls when it retrieves and whether its answer fits the sources.
- Shared core: Both belong to the family of agentic RAG patterns and treat retrieval not as static preprocessing but as a checkable, controllable step.
Why classic RAG is not enough
Standard RAG (Naive RAG, per the taxonomy of Gao et al. 2023, arXiv:2312.10997) follows a fixed sequence: embed, retrieve, insert into the prompt, generate. The problem: the pipeline blindly trusts the top-k results. If they are semantically similar but substantively irrelevant, or if the relevant document is simply not in the index at all, the model still generates an answer and bases it on unsuitable sources.
The research for this pillar identifies two relevant anti-patterns. In anti-pattern AP9, context-unfaithful generation, the model cites sources but deviates from them in substance. The recommended countermeasures are faithfulness guardrails, self-critique and citation-forcing prompts. In anti-pattern AP3, re-ranking is missing, so that top-k results are similar but not relevant. It is precisely this gap that CRAG and Self-RAG address: they make retrieval quality explicitly checkable and respond to it.
An important point of context: the biggest lever against hallucinations still lies in retrieval quality itself. Anthropic's Contextual Retrieval reduces the top-20 retrieval error rate by 49 per cent (from 5.7 to 2.9 per cent), and in combination with reranking by 67 per cent (to 1.9 per cent). These are vendor benchmarks from Anthropic, as of September 2024. Self-correction is the second line of defence, not the first.
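The percentages quoted above are relative reductions of the error rate, not percentage points. A quick check in Python, using only the figures from the paragraph above (the helper name `relative_reduction` is purely illustrative):

```python
def relative_reduction(before: float, after: float) -> float:
    """Relative reduction between two error rates, in per cent."""
    return (before - after) / before * 100


baseline = 5.7  # top-20 retrieval error rate in per cent, as quoted above

contextual = relative_reduction(baseline, 2.9)   # Contextual Retrieval
with_rerank = relative_reduction(baseline, 1.9)  # plus reranking

print(round(contextual), round(with_rerank))
```

Rounded to whole numbers, this reproduces the 49 and 67 per cent figures.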
Corrective RAG (CRAG): assess results, then correct
CRAG adds a retrieval evaluator to the pipeline. After retrieval but before generation, this evaluator assesses the relevance of the documents found and sorts them into three confidence classes:
- Correct (high confidence): The results are relevant. They are refined (for example via decompose-recompose, to remove noise) and passed to the generator.
- Incorrect (low confidence): The results are unsuitable. Instead of generating on poor context, a fallback is triggered, typically a web search, to load current external content.
- Ambiguous (medium confidence): Uncertain. Both routes are combined: internal context plus web-search results.
The decisive mechanism is the web-search fallback when internal retrieval quality is weak. With it, CRAG compensates for an incomplete or outdated knowledge base without the model lapsing into hallucination. CRAG is model-agnostic: it can be inserted as an additional node into any existing RAG pipeline, between retrieval and generation, without retraining the generator LLM. (CRAG goes back to Yan et al. 2024; the pattern is established general knowledge and is not named in detail in the pillar dossier itself.)
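The three routes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the reference implementation: the threshold values, the use of the best per-document score as the decision signal, and the `score` and `web_search` callables are all hypothetical placeholders for whatever evaluator and search backend a project actually uses.

```python
from enum import Enum
from typing import Callable, List


class Verdict(Enum):
    CORRECT = "correct"
    AMBIGUOUS = "ambiguous"
    INCORRECT = "incorrect"


def classify(scores: List[float], upper: float = 0.7, lower: float = 0.3) -> Verdict:
    """Map per-document relevance scores to one of three confidence classes.

    The thresholds are illustrative assumptions and need tuning per project.
    """
    best = max(scores, default=0.0)  # no documents at all counts as incorrect
    if best >= upper:
        return Verdict.CORRECT
    if best < lower:
        return Verdict.INCORRECT
    return Verdict.AMBIGUOUS


def build_context(
    query: str,
    docs: List[str],
    score: Callable[[str, str], float],
    web_search: Callable[[str], List[str]],
) -> List[str]:
    """Return the context handed to the generator, following the three routes."""
    verdict = classify([score(query, doc) for doc in docs])
    if verdict is Verdict.CORRECT:
        return docs                   # refinement step omitted in this sketch
    if verdict is Verdict.INCORRECT:
        return web_search(query)      # drop internal results, use the fallback
    return docs + web_search(query)   # ambiguous: combine both sources
```

Because the evaluator and the fallback are passed in as plain callables, the generator stays untouched, which mirrors the model-agnostic property described above.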
Self-RAG: the model critiques itself
Self-RAG (Asai et al., University of Washington, arXiv:2310.11511, October 2023) shifts control into the model. It is trained to produce so-called reflection tokens, with which it steers and assesses its own process. In simplified terms, three decisions take place:
- Retrieve? The model decides per request (or per segment) whether retrieval is needed at all. Trivial questions, or ones covered by the model's knowledge, require no retrieval.
- Is Supported? After retrieval, the model checks whether the generated statement is supported by the retrieved passage — a built-in faithfulness self-critique.
- Is Useful? The model assesses how usefully the answer addresses the question.
In the pillar dossier, Self-RAG is listed as an early agentic RAG pattern with self-reflection tokens (source Q5). The appeal: the model only retrieves when it deems it necessary itself (saving costs on simple questions), and it aborts or refuses answers when the sources do not support them. The price: the original Self-RAG implementation presupposes a model specifically trained on reflection tokens. In practice, many teams approximate the behaviour via prompting and a self-critique loop with a standard model — cheaper to implement, but less reliable than a model trained for the purpose.
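The prompt-based approximation mentioned above might look roughly like the following sketch. It is not the trained Self-RAG model: the four callables (`needs_retrieval`, `retrieve`, `generate`, `is_supported`) are hypothetical stand-ins for prompts against a standard LLM and a retriever, the Is Useful check is left out for brevity, and the attempt cap is an assumed safeguard against runaway loops.

```python
from typing import Callable, List

REFUSAL = "I cannot answer this from the available sources."


def self_rag_answer(
    question: str,
    needs_retrieval: Callable[[str], bool],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    is_supported: Callable[[str, List[str]], bool],
    max_attempts: int = 2,
) -> str:
    """Prompt-level approximation of the Retrieve? and Is Supported? decisions."""
    if not needs_retrieval(question):
        # Retrieve? = no: answer from the model's own knowledge, skip retrieval.
        return generate(question, [])

    passages = retrieve(question)
    for _ in range(max_attempts):  # hard cap keeps the critique loop bounded
        draft = generate(question, passages)
        if is_supported(draft, passages):
            return draft
    # No draft was supported by the passages: refuse instead of guessing.
    return REFUSAL
```

With a standard model, each of these callables is typically one extra LLM call, which is where the additional latency and cost of the approximation come from.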
Comparison: CRAG vs. Self-RAG vs. standard RAG
| Dimension | Standard RAG (Naive) | Corrective RAG (CRAG) | Self-RAG |
|---|---|---|---|
| Control instance | none | external retrieval evaluator | the model itself (reflection tokens) |
| Retrieval decision | always | always, then assessed | model-controlled (on demand) |
| Response to poor context | none, generates anyway | web-search fallback / refinement | self-critique, refusal if needed |
| Faithfulness check | external (e.g. RAGAS) required | via evaluator | built in (Is-Supported token) |
| Model training required | no | no | yes (original variant) |
| Implementation effort | low | medium | high (or medium as a prompt approximation) |
| Latency / cost per request | baseline | higher: at least one extra evaluator call, plus web search in the fallback case | variable: extra self-critique calls, but retrieval can be skipped on simple questions |
| Typical failure mode | chunk mismatch, AP9 | misjudgement by the evaluator | uncontrolled loops, cost |
Both patterns belong to the agentic RAG stage of the RAG evolution taxonomy (Naive, Advanced, Modular, Agentic). Singh et al. 2025 (arXiv:2501.09136) describe agentic RAG as patterns in which retrieval functions as a dynamically and iteratively invoked tool of an agent, with reflection, planning and tool use. Self-RAG is regarded here as one of the early examples. It should be acknowledged that, as of 2026, the term agentic RAG is still not sharply defined.
Concrete example: a support knowledge base with gaps
Suppose an agency operates a RAG-supported support assistant over an internal knowledge base for a B2B client. A user asks about a feature that was published only yesterday in a release note that has not yet been indexed.
- Standard RAG: Fetches the three most similar legacy documents, fails to find the answer, and hallucinates a plausible-sounding but incorrect configuration. Classic AP9.
- CRAG: The retrieval evaluator classifies the results as incorrect (low relevance confidence), triggers a web search on the public product documentation, finds the release note and generates a correct answer with a source.
- Self-RAG: Via the Is-Supported token, the model checks whether its answer is supported by the retrieved chunks, recognises the lack of support, and refuses an invented answer or escalates to a human.
A rough calculation of the trade-off: if a standard RAG request costs one LLM call, CRAG adds at least one evaluator call plus, in the fallback case, the processing of the web-search results; a Self-RAG-style self-critique loop can add further calls depending on the design. On simple questions, Self-RAG can even save the retrieval step. As a general rule, self-correction is not a free upgrade but a deliberate trade of additional latency and cost against lower hallucination risk. The research explicitly names uncontrolled tool loops and cost explosion as a typical failure mode of agentic RAG.
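To make this trade-off tangible, the expected number of LLM calls per request can be estimated with a small helper. All numbers here are illustrative assumptions, not measurements: one evaluator call per request, a fallback rate of 20 per cent, and one extra LLM call to process web-search results.

```python
def expected_llm_calls(
    base_calls: float = 1.0,
    evaluator_calls: float = 1.0,
    fallback_rate: float = 0.0,
    fallback_calls: float = 1.0,
) -> float:
    """Expected LLM calls per request for a CRAG-style pipeline."""
    if not 0.0 <= fallback_rate <= 1.0:
        raise ValueError("fallback_rate must be between 0 and 1")
    # The fallback cost is only paid on the share of requests that trigger it.
    return base_calls + evaluator_calls + fallback_rate * fallback_calls


standard = 1.0                                # one generation call, no checks
crag = expected_llm_calls(fallback_rate=0.2)  # assumed 20 per cent fallbacks

print(standard, crag)
```

Under these assumptions, CRAG costs a little more than twice the LLM calls of the standard pipeline. Plugging in a measured fallback rate from production logs turns the sketch into a usable budget estimate.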
When these patterns are sensible
Self-correcting patterns are worthwhile when at least one of these points applies:
- The knowledge base is incomplete or outdated, and a web or third-party source fallback adds genuine value (CRAG).
- Incorrect answers are expensive: law, finance, regulated support, anything with a liability dimension.
- Refusing to answer under uncertainty is desired and accepted (Self-RAG).
They are less sensible when the knowledge base is complete and cleanly indexed, when latency is the priority, or when the budget per request is tight. In these cases, the cheaper measures from the Advanced RAG stage usually deliver the better ratio: hybrid search (dense plus BM25 against missing exact codes), cross-encoder re-ranking (in combination with Contextual Retrieval, up to 67 per cent fewer retrieval errors according to Anthropic), Contextual Retrieval against lost-in-the-chunks (AP4), as well as a faithfulness guardrail via RAGAS in the CI pipeline. Only after that is it worthwhile to add CRAG or Self-RAG selectively for the critical paths.
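The article does not say how the dense and BM25 result lists are merged in hybrid search. One common option, shown here purely as an assumption, is reciprocal rank fusion (RRF), which combines rankings without needing comparable scores. The constant `k = 60` is a conventional default, and the document ids are made up for the example.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids into one ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents near the top of any list contribute the most.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id for stability.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["a", "b", "c"]  # hypothetical ids from the vector index
bm25_hits = ["c", "a", "d"]   # hypothetical ids from the keyword index

print(reciprocal_rank_fusion([dense_hits, bm25_hits]))
```

Documents that appear in both lists ("a" and "c") move ahead of those found by only one retriever, which is the behaviour hybrid search is meant to deliver for exact codes and identifiers.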
For agencies and B2B decision-makers
For agencies, the message to clients is clear: self-correcting retrieval patterns are not a standard feature to be activated across the board, but a targeted investment for use cases with high hallucination risk. The pragmatic route runs through a measurable baseline — capturing retrieval quality and faithfulness with RAGAS — and adds CRAG or Self-RAG only where the numbers justify it. For DACH B2B projects, an additional point applies: a web-search fallback (CRAG) must be aligned with data-protection and source governance, since external content suddenly enters the answer chain. Blck Alpaca designs RAG architectures so that self-correction sits where it makes the difference — and does not inflate cost and latency everywhere.
FAQ
What is the difference between Corrective RAG and Self-RAG?
Do self-correcting RAG patterns really reduce hallucinations?
How much implementation effort do CRAG and Self-RAG require?
When are these patterns worthwhile and when not?
Are CRAG and Self-RAG the same as agentic RAG?