4.9 · Advanced · 7 min

Graph RAG: When Relationships Matter More Than Similarity

Blck Alpaca
Definition

Graph RAG is a retrieval-augmented generation approach that stores knowledge not (only) as vectors, but as a knowledge graph of entities and their relationships. Instead of purely semantic similarity, the system uses the graph structure to answer multi-hop questions and connect information across many documents.

Key Takeaways

  • Graph RAG extracts entities and relationships from documents and builds a knowledge graph from them, complementing or replacing vector RAG.
  • Microsoft's GraphRAG approach additionally uses community detection and pre-generated community summaries to answer even global overview questions across the entire corpus.
  • The added value emerges with multi-hop questions and relationship questions, where pure vector similarity (top-k) fails to establish the necessary connections.
  • Graph RAG is more expensive and more complex: LLM-assisted graph extraction incurs high one-off indexing costs and ongoing maintenance effort.
  • For most standard RAG use cases, vector or hybrid RAG with re-ranking remains the cost-rational choice; Graph RAG is a targeted add-on for relationship-intensive domains.
  • In DACH contexts, the same GDPR obligations apply as with classic RAG: entities in the graph must be treated as addressable, deletable records (as of 2026, informational, not legal advice).

Classic RAG answers a question via semantic similarity: it searches for the text passages closest to the query. Graph RAG instead follows the explicit connections between entities, which makes it strong wherever relationships and multi-step inferences (multi-hop) matter more than pure text similarity.

  • What it is: A RAG variant that uses a knowledge graph (entities plus relations) as the knowledge source — either on its own or in addition to the vector index.
  • When it helps: With multi-hop questions, relationship questions and global overview questions across an entire corpus.
  • What it costs: Higher indexing and maintenance costs than vector RAG, because graph extraction is LLM-assisted.

Why pure similarity reaches its limits

Classic RAG works in two stages. In the indexing path, documents are parsed, split into chunks, converted into vectors by an embedding model and written into a vector database. In the query path, the question is likewise embedded and the most similar chunks (top-k) are retrieved, optionally combined with BM25 via hybrid search and refined by a re-ranker; the selected chunks then go into the generator's prompt. This pattern is robust and sufficient for most knowledge questions. Anthropic's Contextual Retrieval, for instance, reports that retrieval failures can be reduced by up to 67% through chunk-specific context headers combined with re-ranking (Anthropic, as of 09/2024).
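The similarity step of the query path can be sketched in a few lines. This is a minimal illustration under stated assumptions: the three-dimensional vectors and the chunk names are invented, and a real system would use an embedding model and a vector database instead of a dictionary.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equally long vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=2):
    """Return the ids of the k chunks most similar to the query."""
    scored = [(cosine(query_vec, vec), chunk_id) for chunk_id, vec in index.items()]
    scored.sort(reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]

# Hypothetical toy "embeddings"; real models use hundreds of dimensions.
index = {
    "chunk_pricing": [0.9, 0.1, 0.0],
    "chunk_support": [0.1, 0.9, 0.1],
    "chunk_legal": [0.0, 0.2, 0.9],
}
query = [0.8, 0.3, 0.0]
result = top_k(query, index, k=2)
print(result)
```

Each chunk is scored in isolation against the query. Nothing in this step knows how two chunks relate to each other, which is the limitation the following section describes.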

The blind spot arises with two question types:

  • Multi-hop questions: "Which suppliers of our largest customer are themselves affected by sanctions?" requires a chain of inferences — customer -> suppliers -> sanction status — that is not contained in any single chunk. Top-k similarity here often delivers only fragments without establishing the connection.
  • Global overview questions: "What are the five central themes across all 2,000 support tickets?" cannot be answered by retrieving five to ten similar chunks. The answer requires aggregation across the entire corpus.

This is exactly where Graph RAG comes in: instead of finding isolated pieces of text, it navigates an explicit knowledge structure.
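The supplier question above can be made concrete with a toy graph. The sketch below is an illustration only: the company names, the relation labels and the triple layout are assumptions, not taken from any real dataset or library.

```python
# Edges as (subject, relation, object) triples; all names are invented.
triples = [
    ("NordParts", "supplies_to", "AcmeCorp"),
    ("BoltWorks", "supplies_to", "AcmeCorp"),
    ("OtherSupplier", "supplies_to", "SomeoneElse"),
    ("BoltWorks", "has_status", "sanctioned"),
    ("OtherSupplier", "has_status", "sanctioned"),
]

def subjects(relation, obj):
    """All subjects that are linked to `obj` via `relation`."""
    return {s for s, r, o in triples if r == relation and o == obj}

def sanctioned_suppliers(customer):
    """Two hops: customer -> suppliers -> sanction status."""
    suppliers = subjects("supplies_to", customer)        # hop 1
    sanctioned = subjects("has_status", "sanctioned")    # hop 2
    return sorted(suppliers & sanctioned)

print(sanctioned_suppliers("AcmeCorp"))
```

The two facts about BoltWorks could come from different documents. Because they are attached to the same node, the chain can be followed even though no single chunk contains it.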

Building a knowledge graph instead of (or in addition to) a vector index

The core of Graph RAG is indexing. A graph is extracted from the source documents with the help of an LLM:

  1. Entity extraction: The LLM identifies entities (people, organisations, products, locations, concepts) and describes them.
  2. Relationship extraction: It recognises relationships between these entities ("supplies to", "is a subsidiary of", "causes") including a description and, where applicable, a weight.
  3. Graph construction: Entities become nodes, relationships become edges. Entities mentioned multiple times are merged.
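The three steps above can be sketched as follows. The extraction results are hard-coded here in place of the LLM output, and the `KnowledgeGraph` class, its `merge` method and the name-based `normalise` key are hypothetical helpers for illustration; real pipelines need more careful entity resolution than case folding.

```python
from collections import defaultdict

def normalise(name):
    """Naive entity key: case-folded with collapsed whitespace (an assumption)."""
    return " ".join(name.casefold().split())

class KnowledgeGraph:
    def __init__(self):
        self.nodes = {}                   # key -> {"name", "descriptions"}
        self.edges = defaultdict(float)   # (source, relation, target) -> weight

    def merge(self, entities, relationships):
        """Add extracted entities and relationships, merging repeated mentions."""
        for name, description in entities:
            key = normalise(name)
            node = self.nodes.setdefault(key, {"name": name, "descriptions": []})
            if description not in node["descriptions"]:
                node["descriptions"].append(description)
        for source, relation, target, weight in relationships:
            self.edges[(normalise(source), relation, normalise(target))] += weight

graph = KnowledgeGraph()
# Hard-coded stand-ins for what an LLM might extract from two documents.
graph.merge(
    entities=[("Acme Corp", "Customer"), ("Bolt Works", "Supplier")],
    relationships=[("Bolt Works", "supplies to", "Acme Corp", 1.0)],
)
graph.merge(
    entities=[("ACME  corp", "Machine builder"), ("Bolt Works", "Supplier")],
    relationships=[("bolt works", "supplies to", "acme corp", 1.0)],
)
print(len(graph.nodes), dict(graph.edges))
```

Both documents mention the same two companies, so the graph ends up with two nodes and one edge whose weight reflects that the relationship was seen twice.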

Microsoft's open-source GraphRAG approach (Microsoft Research) extends this basic pattern with two key steps (as of 2026):

  • Community detection: Via graph clustering (e.g. the Leiden algorithm), closely connected nodes are grouped into hierarchical "communities" — thematic subgraphs.
  • Community summaries: For each community, an LLM generates a summary at index time. These summaries are the key to global questions: an overview question is answered map-reduce style from community summaries rather than from individual chunks.
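To show the shape of these two steps, the sketch below groups nodes and attaches one summary per group. Note the substitution: it uses plain connected components as a deliberately simple stand-in for Leiden clustering, it is not hierarchical, and `summarise` is a placeholder for the LLM call. All node names are invented.

```python
from collections import defaultdict

def connected_components(edges):
    """Group mutually reachable nodes. A simple stand-in for Leiden, not Leiden."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, communities = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(neighbours[node] - members)
        seen |= members
        communities.append(sorted(members))
    return communities

def summarise(members):
    """Placeholder for the LLM call that writes a community summary."""
    return "Community of " + ", ".join(members)

edges = [("billing", "invoice"), ("invoice", "refund"), ("login", "password")]
communities = connected_components(edges)
summaries = {i: summarise(members) for i, members in enumerate(communities)}
print(summaries)
```

The point of the design is that the summaries are computed once at index time, so a later overview question only has to read a handful of summaries instead of the whole corpus.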

This results in two query modes: Local Search answers focused questions about a specific entity and its neighbourhood in the graph; Global Search answers corpus-wide thematic questions via the community summaries.

Pseudocode: indexing and querying

```text
# Indexing (offline, LLM-intensive)
for document in corpus:
    entities, relationships = LLM_extract(document)
    graph.merge(entities, relationships)

communities = cluster(graph)               # e.g. Leiden
for c in communities:
    c.summary = LLM_summarise(c)           # community summary

# Query (online)
if question_is_global(question):
    partial_answers = [LLM(question, c.summary) for c in relevant_communities]
    answer = LLM_reduce(question, partial_answers)            # Global Search
else:
    subgraph = graph.traverse(start=entities_from(question), hops=2)
    answer = LLM(question, subgraph + associated_chunks)      # Local Search
```
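For readers who want something executable, here is a toy version of the query side of the pseudocode above. It is a sketch under stated assumptions: the graph, the community summaries and all function names are invented, and `fake_llm` only echoes its prompt instead of calling a model.

```python
from collections import deque

# Toy graph as adjacency lists; all entity names are invented.
GRAPH = {
    "AcmeCorp": ["BoltWorks", "NordParts"],
    "BoltWorks": ["AcmeCorp", "SteelMine"],
    "NordParts": ["AcmeCorp"],
    "SteelMine": ["BoltWorks", "OreTrader"],
    "OreTrader": ["SteelMine"],
}
COMMUNITY_SUMMARIES = ["Supply chain around AcmeCorp.", "Raw material trading."]

def traverse(start, hops):
    """Breadth-first search that stops expanding after `hops` edges."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == hops:
            continue
        for neighbour in GRAPH.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return sorted(seen)

def fake_llm(prompt):
    """Stand-in for a real LLM call: it only echoes a shortened prompt."""
    return "[answer based on: " + prompt[:60] + "]"

def local_search(question, entity):
    """Answer from the neighbourhood of one entity."""
    subgraph = traverse(entity, hops=2)
    return fake_llm(question + " | context: " + ", ".join(subgraph))

def global_search(question):
    """Map over community summaries, then reduce the partial answers."""
    partial = [fake_llm(question + " | " + s) for s in COMMUNITY_SUMMARIES]
    return fake_llm(question + " | " + " ".join(partial))

print(traverse("AcmeCorp", hops=2))
print(global_search("What are the central themes?"))
```

With `hops=2`, the traversal from AcmeCorp reaches SteelMine but not OreTrader. The hop limit is the main lever for trading context size against coverage in Local Search.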

Graph RAG vs vector RAG in direct comparison

The choice is not either/or; it depends on question type and budget. The following matrix summarises the differences.

| Dimension | Vector / Hybrid RAG | Graph RAG |
|---|---|---|
| Retrieval principle | Semantic similarity (top-k), optionally BM25 | Traversal of entities and relationships |
| Strength | Fact retrieval, broad coverage, faster roll-out | Multi-hop reasoning, relationship questions, global overviews |
| Weakness | Poor at linking facts across documents | Overkill for simple lookups |
| Indexing costs | Low–medium (embedding per chunk) | High (LLM extraction + community summaries) |
| Maintenance on updates | Incremental upserts, simple | Graph re-construction sometimes needed, more demanding |
| Latency | Medium (~100–800 ms hybrid + rerank) | Higher, especially with Global Search (map-reduce) |
| Maturity | Very mature, broad tooling landscape | Younger, in flux (as of 2026) |
| Source attribution | Native via chunk IDs | Via entities, relationships and source chunks |

The latency and hybrid figures for classic RAG come from the Blck Alpaca research base (RAG architectures comparison matrix). The Graph RAG characteristics are based on established general knowledge about the GraphRAG project and about knowledge graph indices in frameworks such as LlamaIndex.

Costs and complexity calculated honestly

The biggest difference lies in indexing. With vector RAG, the most expensive one-off operation is the embedding per chunk. As an order of magnitude, the research base cites indexing costs of around 0.02–0.13 US dollars per one million tokens; for Contextual Retrieval with Anthropic prompt caching, the figure is roughly 1.02 US dollars per million document tokens (as of 09/2024). Graph RAG additionally requires several LLM passes: entity and relationship extraction for each document, plus a summary for each community. This multiplies both indexing cost and indexing time.

A concrete calculation example

Suppose an agency indexes a knowledge corpus of 50,000 documents for a client:

  • Vector RAG: One-off embedding plus hybrid index. The main costs are the embedding API and vector DB hosting. Updates are made incrementally via stable doc IDs.
  • Graph RAG: Each document incurs several LLM calls for extraction, followed by LLM calls per community for the summaries. Even at medium corpus size, this results in a multiple of the indexing costs of vector RAG — plus additional operational effort for the graph schema, deduplication of entities and re-indexing on larger changes.
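The comparison can be turned into a small back-of-the-envelope calculator. Only the document count and the embedding price range come from this article. The tokens per document, the LLM price and the number of extraction passes are placeholder assumptions chosen for illustration, so the printed figures show the mechanics of the calculation and are not a cost estimate. Output tokens and community summaries are left out entirely.

```python
def cost_usd(total_tokens, usd_per_million_tokens, passes=1):
    """Token volume times unit price, multiplied by the number of passes."""
    return round(total_tokens / 1_000_000 * usd_per_million_tokens * passes, 2)

DOCUMENTS = 50_000                      # from the example above
EMBEDDING_PRICE_RANGE = (0.02, 0.13)    # USD per million tokens, as cited above
TOKENS_PER_DOCUMENT = 2_000             # ASSUMPTION, not from the source
LLM_PRICE = 1.00                        # ASSUMPTION: USD per million input tokens
EXTRACTION_PASSES = 3                   # ASSUMPTION: LLM passes over each document

total_tokens = DOCUMENTS * TOKENS_PER_DOCUMENT
vector_low = cost_usd(total_tokens, EMBEDDING_PRICE_RANGE[0])
vector_high = cost_usd(total_tokens, EMBEDDING_PRICE_RANGE[1])
graph_extraction = cost_usd(total_tokens, LLM_PRICE, passes=EXTRACTION_PASSES)

print(f"Vector RAG embedding: {vector_low:.2f} to {vector_high:.2f} USD")
print(f"Graph RAG extraction (input tokens only): {graph_extraction:.2f} USD")
```

Replacing the three assumed constants with real values for your corpus and your model pricing is what makes the comparison meaningful; the structure of the calculation stays the same.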

The rule of thumb: Graph RAG pays off when the added value of relationship-based answers justifies the additional effort, and it should not be the default. The Blck Alpaca research generally warns against the anti-pattern of "RAG as a silver bullet"; this applies even more strongly to Graph RAG, because its total cost of ownership is considerably higher.

When Graph RAG, when not

Suitable when:

  • Answers need to connect information across many documents (supply chains, org structures, compliance networks, research and patent landscapes).
  • Explicit relationships between entities are central to the subject matter.
  • Global overview and thematic questions are to be answered across an entire corpus.

Less suitable when:

  • The questions predominantly concern individual facts or narrowly defined passages — hybrid RAG with re-ranking is sufficient here.
  • The corpus is small or frequently rewritten in full (graph maintenance becomes the bottleneck).
  • The budget cannot support LLM-intensive indexing.

In practice, a hybrid architecture is usually the most viable option: vector or hybrid retrieval delivers broad semantic coverage, while the knowledge graph is brought in specifically for relationship and multi-hop questions. This keeps Graph RAG a precise add-on rather than an expensive complete replacement.
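One way to picture such a hybrid setup is a router that decides per question which retrieval path to use. The keyword lists and route names below are purely hypothetical; a production system would more likely use a classifier or an LLM for this decision, so treat this as a sketch of the idea.

```python
# Hypothetical hint lists; tune or replace them for a real domain.
OVERVIEW_HINTS = ("across all", "overall", "main themes", "central themes")
RELATION_HINTS = ("supplier", "subsidiary", "connected", "related", "owns", "depends")

def route(question):
    """Pick a retrieval path based on simple keyword hints."""
    q = question.lower()
    if any(hint in q for hint in OVERVIEW_HINTS):
        return "graph_global"     # corpus-wide question: community summaries
    if any(hint in q for hint in RELATION_HINTS):
        return "graph_local"      # relationship question: graph traversal
    return "vector_hybrid"        # default: similarity search plus re-ranking

questions = [
    "What are the five central themes across all 2,000 support tickets?",
    "Which suppliers of our largest customer are themselves affected by sanctions?",
    "What is the notice period in the standard contract?",
]
for question in questions:
    print(route(question), "<-", question)
```

The default branch matters most: anything that does not clearly need the graph stays on the cheaper vector path.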

DACH note: data protection applies in the graph too

For DACH companies, Graph RAG changes nothing about the GDPR principles. Entities and relationships involving personal data must, like embeddings and chunks in classic RAG, be treated as addressable, deletable records. Purpose limitation and data minimisation (Art. 5), a legal basis for processing (Art. 6) and the right to erasure (Art. 17) apply. Tenant separation, a roles/permissions concept and EU-region hosting remain mandatory. The German Data Protection Conference (Datenschutzkonferenz) addresses these requirements in its guidance on RAG (as of 2024/2025). This note is informational and does not constitute legal advice.
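What "addressable, deletable" can mean technically is sketched below. The `GraphStore` class and its methods are hypothetical and held in memory; the sketch shows the bookkeeping only and is not a statement about what the GDPR requires in a given case. Pre-generated community summaries that mention the erased entity may also need to be regenerated, which this sketch does not cover.

```python
class GraphStore:
    """Minimal in-memory store that keeps entities traceable to their sources."""

    def __init__(self):
        self.nodes = {}      # entity_id -> attributes
        self.edges = []      # (source_id, relation, target_id)
        self.mentions = {}   # entity_id -> set of chunk ids mentioning the entity

    def add_entity(self, entity_id, attributes, chunk_ids):
        self.nodes[entity_id] = attributes
        self.mentions.setdefault(entity_id, set()).update(chunk_ids)

    def add_edge(self, source_id, relation, target_id):
        self.edges.append((source_id, relation, target_id))

    def erase_entity(self, entity_id):
        """Delete the node and all its edges; return chunk ids that need review."""
        affected_chunks = self.mentions.pop(entity_id, set())
        self.nodes.pop(entity_id, None)
        self.edges = [e for e in self.edges if entity_id not in (e[0], e[2])]
        return sorted(affected_chunks)

store = GraphStore()
store.add_entity("person:42", {"name": "Jane Example"}, {"chunk-7", "chunk-9"})
store.add_entity("org:acme", {"name": "Acme Corp"}, {"chunk-7"})
store.add_edge("person:42", "works_for", "org:acme")

chunks_to_review = store.erase_entity("person:42")
print(chunks_to_review, store.nodes, store.edges)
```

Keeping the mapping from entity to source chunks is the design choice that makes erasure tractable: without it, the chunks and embeddings that still contain the person's data cannot be found from the graph.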

For agencies and B2B decision-makers

Graph RAG is not a hype replacement for vector RAG, but a specialised tool for relationship-intensive knowledge. For agencies, this means: start with hybrid RAG plus re-ranking as a solid standard base and introduce Graph RAG only where multi-hop or overview questions deliver demonstrable business value. For B2B decision-makers, the central question is not "Which technology is newer?", but "What questions do our users really ask — and do those answers require relationships or just similarity?". Blck Alpaca supports you with this distinction, calculates the TCO honestly and builds GDPR-compliant RAG architectures that grow with the actual need — from a lean vector index to a hybrid knowledge graph system.

FAQ

What is the difference between Graph RAG and classic vector RAG?
Vector RAG finds text passages via semantic similarity (embeddings, top-k) and passes them to the LLM. Graph RAG additionally builds a knowledge graph of entities and relationships and retrieves knowledge along these connections. This means it answers multi-hop and relationship questions better, but is more expensive and more complex to build.

What is GraphRAG by Microsoft?
GraphRAG is an open-source approach published by Microsoft Research that uses an LLM to extract entities and relationships from documents, builds a knowledge graph from them, forms thematic clusters via community detection, and pre-generates summaries for these clusters. These enable both local detail questions and global overview questions across the entire corpus (as of 2026).

When is Graph RAG worthwhile compared to vector RAG?
Graph RAG is worthwhile when answers need to connect information across multiple documents (multi-hop), when explicit relationships between entities are central (e.g. supply chains, organisational charts, compliance networks), or when global overview questions are asked across an entire corpus. For narrowly defined fact retrieval, hybrid RAG with re-ranking is usually sufficient.

Is Graph RAG more expensive than vector RAG?
Yes. Indexing is significantly more expensive, because each document requires LLM calls for entity and relationship extraction as well as for generating community summaries. Ongoing maintenance (graph updates when new documents are added) and the operational know-how are also more demanding than with a pure vector pipeline.

Can Graph RAG and vector RAG be combined?
Yes, this is the most common practical deployment. Hybrid architectures use vector or hybrid retrieval for broad semantic coverage and the knowledge graph specifically for relationship and multi-hop questions. Frameworks such as LlamaIndex offer knowledge graph indices that can be combined with classic RAG.

How does Graph RAG relate to GDPR and DACH requirements?
The same principles apply as with any RAG system: entities and relationships involving personal data must be treated as addressable records that are deletable (Art. 17), purpose-bound and tenant-separated. EU-region hosting and a roles/permissions concept are recommended. This is an informational note, not legal advice (as of 2026).
