Graph RAG: When Relationships Matter More Than Similarity
Graph RAG is a retrieval-augmented generation approach that stores knowledge not (only) as vectors, but as a knowledge graph of entities and their relationships. Instead of purely semantic similarity, the system uses the graph structure to answer multi-hop questions and connect information across many documents.
Key Takeaways
- Graph RAG extracts entities and relationships from documents and builds a knowledge graph from them, complementing or replacing vector RAG.
- Microsoft's GraphRAG approach additionally uses community detection and pre-generated community summaries to answer even global overview questions across the entire corpus.
- The added value emerges with multi-hop questions and relationship questions, where pure vector similarity (top-k) fails to establish the necessary connections.
- Graph RAG is more expensive and more complex: LLM-assisted graph extraction incurs high one-off indexing costs and ongoing maintenance effort.
- For most standard RAG use cases, vector or hybrid RAG with re-ranking remains the cost-rational choice; Graph RAG is a targeted add-on for relationship-intensive domains.
- In DACH contexts, the same GDPR obligations apply as with classic RAG: entities in the graph must be treated as addressable, deletable records (as of 2026, informational, not legal advice).
While classic RAG answers a question via semantic similarity, searching for the text passages closest to the query, Graph RAG follows the explicit connections between entities. This makes it strong wherever relationships and multi-step inferences (multi-hop) matter more than pure text similarity.
- What it is: A RAG variant that uses a knowledge graph (entities plus relations) as the knowledge source — either on its own or in addition to the vector index.
- When it helps: With multi-hop questions, relationship questions and global overview questions across an entire corpus.
- What it costs: Higher indexing and maintenance costs than vector RAG, because graph extraction is LLM-assisted.
Why pure similarity reaches its limits
Classic RAG works in two stages. In the indexing path, documents are parsed, split into chunks, converted into vectors by an embedding model and written into a vector database. In the query path, the question is likewise embedded and the most similar chunks (top-k) are retrieved, optionally combined with BM25 via hybrid search and reordered by a re-ranker; the result is then passed into the generator's prompt. This pattern is robust and sufficient for most knowledge questions. Anthropic's Contextual Retrieval, for instance, shows that retrieval failures can be reduced by up to 67% when chunk-specific context headers are combined with reranking (Anthropic, as of 09/2024).
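The top-k step of the query path can be illustrated with a minimal sketch. The three-dimensional vectors, the chunk ids and the helper names `cosine` and `top_k` are assumptions made for illustration; a real system would obtain embeddings from an embedding model and store them in a vector database.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, index, k=2):
    """Return the ids of the k chunks most similar to the query vector."""
    ranked = sorted(index.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]

# Toy "embeddings" (hypothetical values, not produced by a real model).
index = {
    "chunk_customer": [0.9, 0.1, 0.0],
    "chunk_supplier": [0.1, 0.9, 0.0],
    "chunk_sanction": [0.0, 0.1, 0.9],
}
query = [0.8, 0.3, 0.0]
print(top_k(query, index, k=2))
```

Only the chunks closest to the query vector are returned; anything further away in the embedding space never reaches the prompt, regardless of how it relates to the retrieved chunks.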
The blind spot arises with two question types:
- Multi-hop questions: "Which suppliers of our largest customer are themselves affected by sanctions?" requires a chain of inferences — customer -> suppliers -> sanction status — that is not contained in any single chunk. Top-k similarity here often delivers only fragments without establishing the connection.
- Global overview questions: "What are the five central themes across all 2,000 support tickets?" cannot be answered by retrieving five to ten similar chunks. The answer requires aggregation across the entire corpus.
This is exactly where Graph RAG comes in: instead of finding isolated pieces of text, it navigates an explicit knowledge structure.
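The supplier example above can be sketched as a three-hop traversal over an explicit structure. The company names, relation labels and the helper `follow` are hypothetical and serve only to show the chain customer -> suppliers -> sanction status.

```python
# Hypothetical toy graph: (subject, relation) -> list of objects.
edges = {
    ("Acme", "largest_customer"): ["Globex"],
    ("Globex", "supplied_by"): ["Initech", "Umbrella"],
    ("Initech", "sanction_status"): ["sanctioned"],
    ("Umbrella", "sanction_status"): ["clear"],
}

def follow(nodes, relation):
    """Follow one relation from every node in `nodes` and collect the targets."""
    return [target for node in nodes for target in edges.get((node, relation), [])]

def sanctioned_suppliers_of_largest_customer(company):
    customers = follow([company], "largest_customer")   # hop 1
    suppliers = follow(customers, "supplied_by")         # hop 2
    # hop 3: keep only suppliers whose status node says "sanctioned"
    return [s for s in suppliers if "sanctioned" in follow([s], "sanction_status")]

print(sanctioned_suppliers_of_largest_customer("Acme"))
```

Each hop uses the result of the previous one, which is the step that isolated top-k chunks do not provide.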
Building a knowledge graph instead of (or in addition to) a vector index
The core of Graph RAG is indexing. A graph is extracted from the source documents with the help of an LLM:
- Entity extraction: The LLM identifies entities (people, organisations, products, locations, concepts) and describes them.
- Relationship extraction: It recognises relationships between these entities ("supplies to", "is a subsidiary of", "causes") including a description and, where applicable, a weight.
- Graph construction: Entities become nodes, relationships become edges. Entities mentioned multiple times are merged.
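The construction step, including the merging of entities mentioned multiple times, might look roughly like the following sketch. The class `KnowledgeGraph`, the naive merge key in `normalise` and the weight accumulation are assumptions for illustration; production systems typically need more careful entity resolution than a case-insensitive name match.

```python
from collections import defaultdict

def normalise(name):
    """Naive merge key: case-folded name with collapsed whitespace (an assumption)."""
    return " ".join(name.casefold().split())

class KnowledgeGraph:
    def __init__(self):
        self.nodes = {}                   # merge key -> {"name", "descriptions"}
        self.edges = defaultdict(float)   # (source, relation, target) -> weight

    def add_entity(self, name, description):
        """Create the node or merge the description into an existing one."""
        key = normalise(name)
        node = self.nodes.setdefault(key, {"name": name, "descriptions": []})
        if description not in node["descriptions"]:
            node["descriptions"].append(description)
        return key

    def add_relationship(self, source, relation, target, weight=1.0):
        """Repeated mentions of the same relationship accumulate weight."""
        self.edges[(normalise(source), relation, normalise(target))] += weight

graph = KnowledgeGraph()
graph.add_entity("Globex GmbH", "Customer mentioned in contract A")
graph.add_entity("globex  gmbh", "Buyer named in invoice B")   # same entity, other spelling
graph.add_entity("Initech", "Component supplier")
graph.add_relationship("Initech", "supplies to", "Globex GmbH")
graph.add_relationship("Initech", "supplies to", "GLOBEX GMBH")
print(len(graph.nodes), dict(graph.edges))
```

In a real pipeline the arguments to `add_entity` and `add_relationship` would come from the LLM extraction output rather than being typed in by hand.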
Microsoft's open-source GraphRAG approach (Microsoft Research) extends this basic pattern with two key steps (as of 2026):
- Community detection: Via graph clustering (e.g. the Leiden algorithm), closely connected nodes are grouped into hierarchical "communities" — thematic subgraphs.
- Community summaries: For each community, an LLM generates a summary already at index time. These summaries are the key to global questions: for an overview question, not individual chunks but community summaries are used as building blocks of a map-reduce answer.
This results in two query modes: Local Search answers focused questions about a specific entity and its neighbourhood in the graph; Global Search answers corpus-wide thematic questions via the community summaries.
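The interplay of communities, index-time summaries and the map-reduce answer can be sketched as follows. Note the substitutions: connected components stand in for Leiden clustering, and `summarise` and `global_search` are string-building placeholders for LLM calls. None of this is the GraphRAG project's actual API.

```python
from collections import defaultdict

def connected_components(edge_list):
    """Group nodes into connected components (a crude stand-in for Leiden)."""
    neighbours = defaultdict(set)
    for a, b in edge_list:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(sorted(component))
    return components

def summarise(community):
    """Placeholder for the index-time LLM call that writes a community summary."""
    return "Community of " + ", ".join(community)

def global_search(question, summaries):
    """Map: one partial answer per summary. Reduce: combine the partial answers."""
    partial_answers = [f"[{s}]" for s in summaries]          # stand-in for LLM(question, summary)
    return f"{question} -> " + " + ".join(partial_answers)   # stand-in for LLM_reduce

edge_list = [("Globex", "Initech"), ("Initech", "Umbrella"), ("Ticket-17", "Login bug")]
communities = connected_components(edge_list)
summaries = [summarise(c) for c in communities]   # computed once, at index time
print(global_search("Central themes?", summaries))
```

The design point is that the summaries are produced before any question arrives, so a global question only touches a handful of summaries instead of every chunk.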
Pseudocode: indexing and querying
```text
# Indexing (offline, LLM-intensive)
for document in corpus:
    entities, relationships = LLM_extract(document)
    graph.merge(entities, relationships)
communities = cluster(graph)  # e.g. Leiden
for c in communities:
    c.summary = LLM_summarise(c)  # community summary

# Query (online)
if question_is_global(question):
    partial_answers = [LLM(question, c.summary) for c in relevant_communities]
    answer = LLM_reduce(question, partial_answers)  # Global Search
else:
    subgraph = graph.traverse(start=entities_from(question), hops=2)
    answer = LLM(question, subgraph + associated_chunks)  # Local Search
```
Graph RAG vs vector RAG in direct comparison
The decision is not an either/or question, but a question of question type and budget. The following matrix summarises the differences.
| Dimension | Vector / Hybrid RAG | Graph RAG |
|---|---|---|
| Retrieval principle | Semantic similarity (top-k), optionally BM25 | Traversal of entities and relationships |
| Strength | Fact retrieval, broad coverage, faster roll-out | Multi-hop reasoning, relationship questions, global overviews |
| Weakness | Poor at linking facts across documents | Overkill for simple lookups |
| Indexing costs | Low–medium (embedding per chunk) | High (LLM extraction + community summaries) |
| Maintenance on updates | Incremental upserts, simple | Graph re-construction sometimes needed, more demanding |
| Latency | Medium (~100–800 ms hybrid + rerank) | Higher, especially with Global Search (map-reduce) |
| Maturity | Very mature, broad tooling landscape | Younger, in flux (as of 2026) |
| Source attribution | Native via chunk IDs | Via entities, relationships and source chunks |
The latency and hybrid figures for classic RAG come from the Blck Alpaca research base (RAG architectures comparison matrix). The Graph RAG characteristics are based on established general knowledge about the GraphRAG project and about knowledge graph indices in frameworks such as LlamaIndex.
Costs and complexity calculated honestly
The biggest difference lies in indexing. With vector RAG, the most expensive one-off operation is the embedding per chunk. As an order of magnitude, the research base cites indexing costs of around 0.02–0.13 US dollars per million tokens; for Contextual Retrieval with Anthropic prompt caching, the figure is roughly 1.02 US dollars per million document tokens (as of 09/2024). Graph RAG additionally requires several LLM passes: entity extraction and relationship extraction for each document, followed by a summary for each community. This multiplies both the indexing costs and the indexing duration.
A concrete calculation example
Suppose an agency indexes a knowledge corpus of 50,000 documents for a client:
- Vector RAG: One-off embedding plus hybrid index. The main costs are the embedding API and vector DB hosting. Updates are made incrementally via stable doc IDs.
- Graph RAG: Each document incurs several LLM calls for extraction, followed by LLM calls per community for the summaries. Even at medium corpus size, this results in a multiple of the indexing costs of vector RAG — plus additional operational effort for the graph schema, deduplication of entities and re-indexing on larger changes.
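To make the order of magnitude tangible, the comparison can be run as a back-of-the-envelope calculation. Only the 50,000 documents and the embedding price range come from the text; the tokens per document, the LLM price and the number of passes are loudly hypothetical assumptions, and output tokens as well as hosting are ignored.

```python
def indexing_cost_usd(documents, tokens_per_document, usd_per_million_tokens, passes=1):
    """Cost of reading every document `passes` times at a flat per-token price."""
    total_tokens = documents * tokens_per_document * passes
    return total_tokens / 1_000_000 * usd_per_million_tokens

DOCS = 50_000             # corpus size from the example
TOKENS_PER_DOC = 2_000    # ASSUMPTION: average document length
EMBEDDING_PRICE = 0.13    # upper end of the cited 0.02-0.13 USD per 1M tokens
LLM_PRICE = 1.00          # ASSUMPTION: hypothetical LLM input price per 1M tokens
LLM_PASSES = 3            # ASSUMPTION: entities, relationships, summaries

vector_cost = indexing_cost_usd(DOCS, TOKENS_PER_DOC, EMBEDDING_PRICE)
# ASSUMPTION: the graph variant keeps the vector index and adds LLM passes on top.
graph_cost = vector_cost + indexing_cost_usd(DOCS, TOKENS_PER_DOC, LLM_PRICE, LLM_PASSES)

print(f"vector RAG: {vector_cost:.2f} USD")
print(f"graph RAG:  {graph_cost:.2f} USD ({graph_cost / vector_cost:.0f}x)")
```

Under these assumptions the one-off indexing bill rises from about 13 to about 313 US dollars. The absolute figures matter less than the ratio, which shifts with every assumption listed above and should be recalculated with real prices and corpus statistics.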
The rule of thumb: Graph RAG pays off when the added value of relationship-based answers justifies the additional effort — not as a default. The Blck Alpaca research generally warns against the anti-pattern of "RAG as a silver bullet"; this applies to Graph RAG in an intensified form, because here the total cost of ownership is considerably higher.
When Graph RAG, when not
Suitable when:
- Answers need to connect information across many documents (supply chains, org structures, compliance networks, research and patent landscapes).
- Explicit relationships between entities are central to the subject matter.
- Global overview and thematic questions are to be answered across an entire corpus.
Rather not when:
- The questions predominantly concern individual facts or narrowly defined passages — hybrid RAG with re-ranking is sufficient here.
- The corpus is small or frequently rewritten in full (graph maintenance becomes the bottleneck).
- The budget cannot support LLM-intensive indexing.
In practice, the hybrid architecture is the most viable: vector or hybrid retrieval delivers broad semantic coverage, while the knowledge graph is brought in specifically for relationship and multi-hop questions. This keeps Graph RAG a precise add-on rather than an expensive complete replacement.
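A hybrid setup of this kind could be wired together roughly as follows. The keyword list in `RELATION_HINTS`, the function names and the two stub retrievers are assumptions for illustration; a real router would more likely use a trained classifier or an LLM call than keyword matching.

```python
RELATION_HINTS = ("supplier", "subsidiary", "connected", "related to", "across all")

def needs_graph(question):
    """Placeholder classifier: keyword hints only (an assumption for illustration)."""
    q = question.lower()
    return any(hint in q for hint in RELATION_HINTS)

def retrieve(question, vector_search, graph_search):
    """Always run vector retrieval; add graph context only when the question calls for it."""
    context = list(vector_search(question))
    if needs_graph(question):
        context += list(graph_search(question))
    return context

# Stub retrievers standing in for a vector index and a knowledge graph.
def vector_stub(question):
    return ["chunk-1"]

def graph_stub(question):
    return ["Initech -[supplies to]-> Globex"]

print(retrieve("What is our refund policy?", vector_stub, graph_stub))
print(retrieve("Which suppliers of Globex are sanctioned?", vector_stub, graph_stub))
```

Keeping vector retrieval as the default path means the graph is queried, and paid for, only on the questions that need it.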
DACH note: data protection applies in the graph too
For DACH companies, Graph RAG changes nothing about the GDPR principles. Entities and relationships involving personal data must — like embeddings and chunks in classic RAG — be treated as addressable, deletable records. Purpose limitation and data minimisation (Art. 5), a legal basis for processing (Art. 6) and the right to erasure (Art. 17) apply. Tenant separation, a roles/permissions concept and EU-region hosting remain mandatory. The German Data Protection Conference (Datenschutzkonferenz) addresses these requirements in its guidance on RAG (as of 2024/2025). This information is informational and does not constitute legal advice.
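What "addressable, deletable" means for a graph store can be sketched technically. The store layout, the ids and the function `erase_entity` are hypothetical, and deleting whole chunks rather than redacting them is an assumption; the sketch shows the data-structure side only and is not a statement about what erasure legally requires.

```python
def erase_entity(store, entity_id):
    """Delete an entity node, every edge touching it and every chunk that mentions it.

    Returns the affected chunk ids so that derived artefacts (embeddings,
    community summaries) can be deleted or regenerated by the caller.
    """
    store["nodes"].pop(entity_id, None)
    store["edges"] = [e for e in store["edges"] if entity_id not in (e[0], e[2])]
    affected = sorted(cid for cid, ents in store["chunk_entities"].items() if entity_id in ents)
    for cid in affected:
        del store["chunk_entities"][cid]
    return affected

# Hypothetical store: nodes, (source, relation, target) edges, chunk -> entity ids.
store = {
    "nodes": {"person:42": {"name": "Jane Doe"}, "org:1": {"name": "Globex"}},
    "edges": [("person:42", "works_for", "org:1")],
    "chunk_entities": {"doc7#3": {"person:42", "org:1"}, "doc9#1": {"org:1"}},
}
print(erase_entity(store, "person:42"))
```

The returned chunk ids matter because pre-generated community summaries may still contain the erased information and would have to be rebuilt for the affected communities.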
For agencies and B2B decision-makers
Graph RAG is not a hype replacement for vector RAG, but a specialised tool for relationship-intensive knowledge. For agencies, this means: start with hybrid RAG plus re-ranking as a solid standard base and introduce Graph RAG only where multi-hop or overview questions deliver demonstrable business value. For B2B decision-makers, the central question is not "Which technology is newer?", but "What questions do our users really ask — and do those answers require relationships or just similarity?". Blck Alpaca supports you with this distinction, calculates the TCO honestly and builds GDPR-compliant RAG architectures that grow with the actual need — from a lean vector index to a hybrid knowledge graph system.