
Corrective RAG and Self-RAG: Self-Correcting Retrieval Patterns for Fewer Hallucinations

Blck Alpaca
Definition

Corrective RAG (CRAG) and Self-RAG are self-correcting retrieval patterns. CRAG assesses the relevance of retrieved results and switches to a web-search fallback when quality is poor. Self-RAG lets the model decide for itself, via reflection tokens, whether to retrieve at all and whether its own answer is supported by the sources.

Key Takeaways

  • CRAG adds a retrieval evaluator to the RAG pipeline: results are rated as correct, ambiguous or incorrect; when quality is poor, a web-search fallback steps in instead of blindly generating on weak context.
  • Self-RAG (Asai et al. 2023, arXiv:2310.11511) trains the model on reflection tokens that let it decide for itself whether retrieval is needed and check its own answer against the sources (faithfulness self-critique).
  • Both patterns address the core problem of context-unfaithful generation: hallucinations despite RAG, where the model cites sources but deviates from them in substance (anti-pattern AP9 of the research).
  • They are special cases of agentic RAG (Singh et al. 2025): retrieval becomes a reflectively and iteratively invoked tool rather than static preprocessing.
  • The trade-off is real: additional LLM calls for assessment and self-critique increase latency and cost and carry the risk of uncontrolled tool loops; worthwhile above all when hallucination risk is high and knowledge bases are incomplete.
  • A pragmatic starting point for most DACH projects: first exhaust hybrid search, re-ranking and faithfulness guardrails (RAGAS), then add CRAG/Self-RAG selectively.

Corrective RAG (CRAG) and Self-RAG are two self-correcting retrieval patterns that extend the classic RAG approach with a layer of quality control. CRAG assesses the relevance of the retrieved results and switches to a web-search fallback when retrieval quality is poor. Self-RAG lets the language model decide for itself, via reflection tokens, whether to retrieve at all, and then checks whether its own answer is supported by the sources. Both target the same problem: hallucinations that arise despite RAG.

  • CRAG in one sentence: A retrieval evaluator classifies results as correct, ambiguous or incorrect and triggers a corrective measure when context is poor, instead of blindly generating on.
  • Self-RAG in one sentence: Via trained reflection tokens, the model itself controls when it retrieves and whether its answer fits the sources.
  • Shared core: Both belong to the family of agentic RAG patterns and treat retrieval not as static preprocessing but as a checkable, controllable step.

Why classic RAG is not enough

Standard RAG (Naive RAG, per the taxonomy of Gao et al. 2023, arXiv:2312.10997) follows a fixed sequence: embed, retrieve, insert into the prompt, generate. The problem: the pipeline blindly trusts the top-k results. If they are semantically similar but substantively irrelevant, or simply not present at all, the model still generates an answer and bases it on unsuitable sources.
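The fixed sequence described above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: word overlap stands in for embedding similarity, and the names `tokenize`, `retrieve_top_k` and `naive_rag`, as well as the `generate` callable (a placeholder for any LLM call), are assumptions made for this example.

```python
from typing import Callable


def tokenize(text: str) -> set[str]:
    """Lowercased whitespace tokens; a toy stand-in for an embedding."""
    return set(text.lower().split())


def retrieve_top_k(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents with the largest word overlap with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(query_tokens & tokenize(d)), reverse=True)
    return ranked[:k]  # always k results, however weak the match


def naive_rag(query: str, docs: list[str], generate: Callable[[str], str], k: int = 2) -> str:
    """Fixed sequence: retrieve, insert into the prompt, generate. No quality check."""
    context = retrieve_top_k(query, docs, k)
    prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + query
    return generate(prompt)
```

Note that `retrieve_top_k` returns k documents even when none of them overlaps with the query at all. That is the blind trust in top-k that the self-correcting patterns are meant to remove.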

The research for this pillar identifies two relevant anti-patterns. In anti-pattern AP9, context-unfaithful generation, the model cites sources but deviates from them in substance. The recommended countermeasures are faithfulness guardrails, self-critique and citation-forcing prompts. In anti-pattern AP3, re-ranking is missing, so that top-k results are similar but not relevant. It is precisely this gap that CRAG and Self-RAG address: they make retrieval quality explicitly checkable and respond to it.

An important point of context: the biggest lever against hallucinations still lies in retrieval quality itself. Anthropic's Contextual Retrieval reduces the top-20 retrieval failure rate by 49 per cent (from 5.7 to 2.9 per cent), and by 67 per cent (to 1.9 per cent) when combined with reranking. These are vendor benchmarks from Anthropic, as of September 2024. Self-correction is the second line of defence, not the first.
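The percentages are relative reductions of the failure rate, not percentage-point differences. A short check using only the figures quoted above (the helper name `relative_reduction` is chosen for this example):

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction in per cent between two error rates."""
    return (baseline - improved) / baseline * 100


baseline_rate = 5.7      # top-20 retrieval failure rate without Contextual Retrieval (per cent)
contextual_rate = 2.9    # with Contextual Retrieval
reranked_rate = 1.9      # with Contextual Retrieval plus reranking

print(round(relative_reduction(baseline_rate, contextual_rate)))  # 49
print(round(relative_reduction(baseline_rate, reranked_rate)))    # 67
```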

Corrective RAG (CRAG): assess results, then correct

CRAG adds a retrieval evaluator to the pipeline. After retrieval but before generation, this evaluator assesses the relevance of the documents found and sorts them into three confidence classes:

  • Correct (high confidence): The results are relevant. They are refined (for example via decompose-recompose, to remove noise) and passed to the generator.
  • Incorrect (low confidence): The results are unsuitable. Instead of generating on poor context, a fallback is triggered, typically a web search, to load current external content.
  • Ambiguous (medium confidence): Uncertain. Both routes are combined: internal context plus web-search results.

The decisive mechanism is the web-search fallback when internal retrieval quality is weak. With it, CRAG compensates for an incomplete or outdated knowledge base without the model lapsing into hallucination. CRAG is model-agnostic: the evaluator can be inserted as an additional node between retrieval and generation in any existing RAG pipeline, without retraining the generator LLM. (CRAG goes back to Yan et al. 2024, arXiv:2401.15884; the pattern is established general knowledge and is not named in detail in the pillar dossier itself.)
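The three-way routing can be sketched as follows. This is a simplified illustration under stated assumptions, not the reference implementation: the thresholds `upper` and `lower` are arbitrary example values, and `score`, `web_search` and `refine` are hypothetical callables standing in for the retrieval evaluator, the search fallback and the knowledge refinement step.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    CORRECT = "correct"
    AMBIGUOUS = "ambiguous"
    INCORRECT = "incorrect"


def classify(scores: list[float], upper: float = 0.7, lower: float = 0.3) -> Verdict:
    """Map per-document relevance scores to one confidence class.

    Assumed rule: one strong document is enough for CORRECT;
    INCORRECT only if every document falls below the lower threshold.
    """
    best = max(scores, default=0.0)  # no documents at all counts as INCORRECT
    if best >= upper:
        return Verdict.CORRECT
    if best < lower:
        return Verdict.INCORRECT
    return Verdict.AMBIGUOUS


def build_context(
    query: str,
    docs: list[str],
    score: Callable[[str, str], float],
    web_search: Callable[[str], list[str]],
    refine: Callable[[str], str] = lambda doc: doc,
) -> tuple[Verdict, list[str]]:
    """Choose the context for generation based on the evaluator's verdict."""
    verdict = classify([score(query, doc) for doc in docs])
    if verdict is Verdict.CORRECT:
        return verdict, [refine(doc) for doc in docs]
    if verdict is Verdict.INCORRECT:
        return verdict, web_search(query)  # discard internal results
    return verdict, [refine(doc) for doc in docs] + web_search(query)
```

The generator only ever sees the list returned by `build_context`, which is why this step needs no change to the generator model itself.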

Self-RAG: the model critiques itself

Self-RAG (Asai et al., University of Washington, arXiv:2310.11511, October 2023) shifts control into the model. It is trained to produce so-called reflection tokens, with which it steers and assesses its own process. In simplified terms, three decisions take place:

  • Retrieve? The model decides per request (or per segment) whether retrieval is needed at all. Trivial questions, or ones covered by the model's knowledge, require no retrieval.
  • Is Supported? After retrieval, the model checks whether the generated statement is supported by the retrieved passage — a built-in faithfulness self-critique.
  • Is Useful? The model assesses how usefully the answer addresses the question.

In the pillar dossier, Self-RAG is listed as an early agentic RAG pattern with self-reflection tokens (source Q5). The appeal: the model only retrieves when it deems it necessary itself (saving costs on simple questions), and it aborts or refuses answers when the sources do not support them. The price: the original Self-RAG implementation presupposes a model specifically trained on reflection tokens. In practice, many teams approximate the behaviour via prompting and a self-critique loop with a standard model — cheaper to implement, but less reliable than a model trained for the purpose.
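The prompt-based approximation mentioned above might look roughly like this. It is a sketch of the self-critique loop with a standard model, not the trained reflection-token mechanism of the original Self-RAG. All four callables (`needs_retrieval`, `retrieve`, `generate`, `is_supported`) are hypothetical placeholders for LLM or retriever calls, the usefulness check is omitted for brevity, and `max_attempts` is an assumed guard against unbounded loops.

```python
from typing import Callable

REFUSAL = "I cannot answer this reliably from the available sources."


def self_critique_answer(
    question: str,
    needs_retrieval: Callable[[str], bool],
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str, list[str]], str],
    is_supported: Callable[[str, list[str]], bool],
    max_attempts: int = 2,
) -> str:
    """Approximate the Retrieve? and Is Supported? decisions via separate calls."""
    # Retrieve? Skip retrieval when the model judges it unnecessary.
    if not needs_retrieval(question):
        return generate(question, [])

    passages = retrieve(question)
    # Bounded loop: each attempt costs one generation plus one critique call.
    for _ in range(max_attempts):
        answer = generate(question, passages)
        # Is Supported? Only return answers backed by the retrieved passages.
        if is_supported(answer, passages):
            return answer
    return REFUSAL  # refuse rather than return an unsupported answer
```

Capping the number of attempts keeps the cost per request predictable; without such a cap, a critique that never passes would loop indefinitely.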

Comparison: CRAG vs. Self-RAG vs. standard RAG

| Dimension | Standard RAG (Naive) | Corrective RAG (CRAG) | Self-RAG |
|---|---|---|---|
| Control instance | none | external retrieval evaluator | the model itself (reflection tokens) |
| Retrieval decision | always | always, then assessed | model-controlled (on demand) |
| Response to poor context | none, generates anyway | web-search fallback / refinement | self-critique, refusal if needed |
| Faithfulness check | external (e.g. RAGAS) required | via evaluator | built in (Is-Supported token) |
| Model training required | no | no | yes (original variant) |
| Implementation effort | low | medium | high (or medium as a prompt approximation) |
| Latency / cost per request | baseline | + evaluator, possibly web search | + reflection / critique steps |
| Typical failure mode | chunk mismatch, AP9 | misjudgement by the evaluator | uncontrolled loops, cost |

Both patterns belong to the agentic RAG stage of the RAG evolution taxonomy (Naive, Advanced, Modular, Agentic). Singh et al. 2025 (arXiv:2501.09136) describe agentic RAG as patterns in which retrieval functions as a dynamically and iteratively invoked tool of an agent, with reflection, planning and tool use. Self-RAG is regarded here as one of the early examples. One caveat applies: as of 2026, the term agentic RAG is not yet sharply defined.

Concrete example: a support knowledge base with gaps

Suppose an agency operates a RAG-supported support assistant over an internal knowledge base for a B2B client. A user asks about a feature that was published only yesterday in a release note that has not yet been indexed.

  • Standard RAG: Fetches the three most similar legacy documents, fails to find the answer, and hallucinates a plausible-sounding but incorrect configuration. Classic AP9.
  • CRAG: The retrieval evaluator classifies the results as incorrect (low relevance confidence), triggers a web search on the public product documentation, finds the release note and generates a correct answer with a source.
  • Self-RAG: Via the Is-Supported token, the model checks whether its answer is supported by the retrieved chunks, recognises the lack of support, and refuses an invented answer or escalates to a human.

The trade-off in numbers: if a standard RAG request costs one LLM call, CRAG adds at least one evaluator call plus, in the fallback case, the processing of the web-search results. A Self-RAG-style self-critique loop can add further calls depending on the design, although on simple questions Self-RAG can skip the retrieval step entirely. As a general rule, self-correction is not a free upgrade but a deliberate trade of additional latency and cost for lower hallucination risk. The research explicitly names uncontrolled tool loops and cost explosion as a typical failure mode of agentic RAG.
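To make the trade-off tangible, the expected number of LLM calls per request can be modelled roughly as below. Every number here is hypothetical: the fallback rate, the retrieval rate and the average number of attempts are illustrative inputs, not measurements, and the cost model itself (one call each for evaluation, retrieval decision, generation and critique) is an assumption about a prompt-based design.

```python
def standard_rag_calls() -> float:
    """Baseline: one generation call per request."""
    return 1.0


def crag_expected_calls(p_fallback: float, fallback_calls: float = 1.0) -> float:
    """One evaluator call, one generation call, plus fallback processing when triggered."""
    return 2.0 + p_fallback * fallback_calls


def self_rag_expected_calls(p_retrieve: float, avg_attempts: float) -> float:
    """One retrieval-decision call, then either a direct answer (one call)
    or avg_attempts rounds of generation plus critique (two calls each)."""
    return 1.0 + (1.0 - p_retrieve) * 1.0 + p_retrieve * 2.0 * avg_attempts


# Hypothetical inputs: 20 % of requests hit the web fallback,
# 50 % need retrieval, 1.5 generate-and-critique rounds on average.
print(standard_rag_calls())                # 1.0
print(crag_expected_calls(0.2))            # 2.2
print(self_rag_expected_calls(0.5, 1.5))   # 3.0
```

Under these assumed inputs, both patterns at least double the number of LLM calls per request compared with the baseline. Measured rates from a real workload should replace the placeholders before drawing conclusions.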

When these patterns are sensible

Self-correcting patterns are worthwhile when at least one of these points applies:

  • The knowledge base is incomplete or outdated, and a web or third-party source fallback adds genuine value (CRAG).
  • Incorrect answers are expensive: law, finance, regulated support, anything with a liability dimension.
  • Refusing to answer under uncertainty is desired and accepted (Self-RAG).

They are less sensible when the knowledge base is complete and cleanly indexed, when latency is the priority, or when the budget per request is tight. In these cases, the cheaper measures from the Advanced RAG stage usually deliver the better ratio: hybrid search (dense plus BM25, which catches exact codes that dense retrieval misses), cross-encoder re-ranking, Contextual Retrieval against lost-in-the-chunks (AP4; according to Anthropic, up to 67 per cent fewer retrieval errors when combined with re-ranking), and a faithfulness guardrail via RAGAS in the CI pipeline. Only after that is it worthwhile to add CRAG or Self-RAG selectively for the critical paths.
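Hybrid search needs a way to merge the dense ranking and the BM25 ranking into one list. The text does not prescribe a fusion method; the sketch below uses reciprocal rank fusion (RRF), one common choice, purely as an illustration. The document IDs are made up, and `k = 60` is the smoothing constant commonly used with RRF, taken here as an assumption.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one.

    Each document scores 1 / (k + rank) per list it appears in;
    documents ranked well by several retrievers rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


dense_ranking = ["a", "b", "c"]   # hypothetical output of a dense retriever
bm25_ranking = ["c", "a", "d"]    # hypothetical output of BM25

print(reciprocal_rank_fusion([dense_ranking, bm25_ranking]))  # ['a', 'c', 'b', 'd']
```

Documents "a" and "c" appear in both lists and therefore outrank "b" and "d", which only one retriever found. The fused list would then typically be passed on to a re-ranker.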

For agencies and B2B decision-makers

For agencies, the message to clients is clear: self-correcting retrieval patterns are not a standard feature to be activated across the board, but a targeted investment for use cases with high hallucination risk. The pragmatic route runs through a measurable baseline — capturing retrieval quality and faithfulness with RAGAS — and adds CRAG or Self-RAG only where the numbers justify it. For DACH B2B projects, an additional point applies: a web-search fallback (CRAG) must be aligned with data-protection and source governance, since external content suddenly enters the answer chain. Blck Alpaca designs RAG architectures so that self-correction sits where it makes the difference — and does not inflate cost and latency everywhere.

FAQ

What is the difference between Corrective RAG and Self-RAG?
CRAG places a separate retrieval evaluator before generation: it assesses the relevance of the retrieved results and triggers a web-search fallback or knowledge refinement when quality is poor. Self-RAG shifts control into the model itself: via trained reflection tokens, the LLM decides whether to retrieve at all and then assesses whether its own answer is supported by the sources. CRAG therefore corrects the retrieval step, whereas Self-RAG corrects the entire decision and generation process.
Do self-correcting RAG patterns really reduce hallucinations?
They specifically address the anti-pattern of context-unfaithful generation, i.e. hallucinations despite RAG. By detecting poor context (CRAG) or checking the model's own answer against the sources (Self-RAG), they reduce the risk that the model generates freely on irrelevant results. The biggest lever, however, remains clean retrieval quality: according to Anthropic, Contextual Retrieval combined with re-ranking reduces retrieval errors by up to 67 per cent, before self-correction even has to kick in.
How much implementation effort do CRAG and Self-RAG require?
CRAG can be implemented with moderate effort via existing frameworks (LangGraph, LlamaIndex, Haystack, as of 2026) as an additional evaluator node in the pipeline and requires no model adaptation. Self-RAG in its original form (Asai et al. 2023) presupposes a model trained on reflection tokens, which is considerably more involved; practice-oriented variants approximate the behaviour via prompting and a self-critique loop with a standard model.
When are these patterns worthwhile and when not?
They are worthwhile when hallucination risk is high, when knowledge bases are incomplete or outdated, and when an incorrect answer is expensive (law, finance, support with liability). They are not sensible when the knowledge base is complete and well indexed, when latency is critical, or when the budget per request is tight: every additional assessment and self-critique step costs further LLM calls.
Are CRAG and Self-RAG the same as agentic RAG?
They are special cases of it. Agentic RAG (Singh et al. 2025, arXiv:2501.09136) describes, in general terms, patterns in which retrieval is treated as a dynamically and iteratively invoked tool of an agent, with reflection, planning and tool use. Self-RAG is regarded as an early agentic RAG pattern; CRAG adds reflection on retrieval quality. The term agentic RAG, however, is not yet sharply consolidated in 2026.
