Embedding Models 2026 Compared: text-embedding-3, Cohere, BGE-M3, Voyage & Jina
An embedding model comparison evaluates models such as OpenAI text-embedding-3, Cohere Embed v4, BGE-M3, Voyage and Jina by dimensions, context length, MTEB/MMTEB benchmarks, multilinguality, cost, self-hosting and licence. For German-language RAG systems, what counts is not the English MTEB rank but the demonstrated quality on MMTEB, MIRACL and MTEB-DE.
Key Takeaways
- The English MTEB rank is not a German rank: on MMTEB, MIRACL-de and MTEB-DE, English-optimised models often lose 5-15 nDCG@10 points on compound words, technical language and long-word-heavy retrieval.
- For German-language RAG systems, Cohere Embed v4 (best proprietary German), BGE-M3 (best open-source multilingual, MIT) and Jina v3/v4 (Berlin, Apache 2.0) are the leaders for 2026.
- Self-hosting determines sovereignty: BGE-M3, Jina v3/v4 and mxbai are downloadable and can therefore be hosted in a GDPR-compliant way in the DACH region; OpenAI and Voyage remain API-only under US jurisdiction.
- Matryoshka models (text-embedding-3, Cohere v4, BGE, Voyage 4) allow dimension truncation: going from 3,072 to 512-1,024 dims costs only around 1-3% retrieval quality but saves 3-6x storage and latency.
- A cross-encoder reranker is the single most effective measure: it typically adds 5-15 percentage points of recall@5; according to Anthropic, contextual embeddings + contextual BM25 + reranker reduce failed retrievals by up to 67%.
- Embeddings derived from personal data are most probably personal data themselves - for regulated DACH workloads, only sovereignly hostable models (BGE-M3, Jina, mxbai, Cohere on STACKIT) are permissible.
An embedding model converts text into numerical vectors that make semantic similarity measurable - the foundation of every RAG system. A well-founded embedding model comparison evaluates the candidates by dimensions, context length, MTEB/MMTEB benchmarks, multilinguality, cost, self-hosting capability and licence. For the DACH region, the rule is: the English MTEB rank is not a German rank. What matters is MMTEB, MIRACL and MTEB-DE.
- Best German (proprietary): Cohere Embed v4 - multilingual leader, sovereignly hostable on STACKIT (as of 2026).
- Best open-source multilingual: BGE-M3 (MIT) - SOTA on MIRACL, combining dense, sparse and multi-vector in a single model.
- DACH-native & multimodal: Jina v3/v4 (Berlin, Apache 2.0) - downloadable, and v4 also processes visual documents.
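To make "semantic similarity becomes measurable" concrete, here is a minimal sketch of cosine similarity, the comparison most vector databases apply to embeddings. The 4-dimensional vectors are invented for illustration; real models emit hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


# Toy vectors, NOT the output of any real embedding model.
query = [0.9, 0.1, 0.0, 0.2]
docs = {
    "doc_close": [0.8, 0.2, 0.1, 0.3],
    "doc_far": [0.0, 0.9, 0.8, 0.1],
}

# Rank documents by similarity to the query, most similar first.
ranking = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True)
print(ranking)
```

In a real RAG system the vectors would come from one of the models compared below, and the sort would be replaced by an approximate nearest-neighbour index.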
Why the English MTEB rank is misleading
MTEB v1 is dominated by English tasks. The correct DACH references are MMTEB (the Massive Multilingual Text Embedding Benchmark with over 500 tasks across 250+ languages), MIRACL (18-language monolingual retrieval) and German subsets such as MTEB-DE, GermanQuAD-Retrieval and MIRACL-de. These benchmarks regularly reorder the leaderboard: models that shine in English often lose 5-15 nDCG@10 points in German - on compound words, technical language, legal and medical terminology, as well as tokenisation-heavy long-word retrieval.
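Since the gaps above are quoted in nDCG@10 points, a short sketch of the metric may help. It assumes graded relevance labels with linear gain (some benchmark implementations use exponential gain, which is identical for binary labels); the function names are ours, not those of any benchmark toolkit.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain: later ranks count less (log2 discount)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """nDCG@k for one query.

    ranked_relevances: relevance label of each returned document, in ranked order.
    all_relevances:    labels of every judged document for the query (defines the ideal).
    """
    ideal_dcg = dcg_at_k(sorted(all_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# One relevant document exists; the model returns it at rank 2 instead of rank 1.
score = ndcg_at_k([0, 1, 0], all_relevances=[1], k=10)
print(f"nDCG@10 = {score:.4f} -> {100 * score:.1f} points")
```

Benchmarks average this value over all queries and usually report it on a 0-100 scale, so a "10-point" gap means a difference of 0.10 in mean nDCG@10.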
For German B2B corpora, a further complication is that named entities, product codes, paragraph numbers, ICD codes, SAP material numbers, IBANs and case reference numbers are precisely the tokens where pure dense retrieval models struggle. The choice of model is therefore always a decision about German language quality - not about the global leaderboard.
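A common way to handle such exact-match tokens is to run a lexical search (for example BM25) alongside the dense search and merge the two result lists. The sketch below uses reciprocal rank fusion (RRF), a technique we bring in for illustration; the article itself does not prescribe a fusion method. The document IDs and both rankings are invented, and `k=60` is the commonly used default constant.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) per list it appears in; scores are summed.
    Ties are broken alphabetically so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + position)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for a query containing a material number.
dense_ranking = ["doc_policy", "doc_faq", "doc_invoice"]        # semantic neighbours
lexical_ranking = ["doc_invoice", "doc_policy", "doc_manual"]   # exact token matches

fused = reciprocal_rank_fusion([dense_ranking, lexical_ranking])
print(fused)
```

RRF only needs ranks, not comparable scores, which is why it is a convenient default when dense and lexical scores live on different scales.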
The most important embedding models of 2026 compared
The following table summarises the key selection criteria. All figures come from our internal research, as of 2026.
| Model | Dimensions | Context | Licence / Hosting | German signal | Matryoshka | Multimodal | Sovereignty |
|---|---|---|---|---|---|---|---|
| OpenAI text-embedding-3-large | 3,072 (truncatable) | 8,192 | API + Azure OpenAI (EU regions) | solid EN, weaker on DE technical tasks | yes | no | US jurisdiction, no on-prem |
| OpenAI text-embedding-3-small | 1,536 (truncatable) | 8,192 | as above | similar pattern | yes | no | US jurisdiction |
| Cohere Embed v4 | 256-1,536 | 128k | API, Bedrock EU, Azure EU, STACKIT | top-tier German, MTEB v2 ~65 | yes | yes | sovereign on STACKIT |
| BGE-M3 (BAAI) | 1,024 + sparse + multi-vec | 8,192 | MIT, self-host | top-tier multilingual, SOTA MIRACL | | no | fully sovereign |
| Jina Embeddings v3 (Berlin) | 1,024 (task LoRA) | 8,192 | Apache 2.0, self-host | strong, beats OpenAI/Cohere on MTEB at 570M params | yes | no | DACH-native |
| Jina Embeddings v4 | up to 2,048 / multi-vec | 32k | Apache 2.0, self-host | MMTEB 66.49; ViDoRe 90.17 | yes | yes | DACH-native |
| jina-embeddings-v2-base-de | 768 | 8,192 | Apache 2.0, self-host | bilingual DE/EN, CPU-capable (322 MB) | no | no | fully sovereign |
| Voyage-3.5 / voyage-4 | variable | up to 32k | API only (MongoDB/Voyage) | EN/finance/legal/code-focused | partial | yes (multimodal-3/3.5) | US jurisdiction |
| mxbai-embed-large-v1 (Berlin) | 1,024 | 512 | Apache 2.0, self-host | EU-developed, EN-heavy | yes | no | DACH-native |
| Qwen3-Embedding-8B | variable | 32k | Apache 2.0, self-host | MTEB v2 ~70.58, very strong | yes | no | fully sovereign |
| multilingual-e5-large-instruct | 1,024 | 514 | MIT, self-host | solid multilingual | partial | no | fully sovereign |
Proprietary APIs: OpenAI, Cohere, Voyage
OpenAI text-embedding-3-large delivers solid English scores with 3,072 dimensions and 8,192 tokens of context, but falls noticeably behind Cohere and BGE on German specialist tasks. It remains API-only (including via Azure OpenAI in EU regions such as Sweden Central or Switzerland North), which means US jurisdiction persists. Cohere Embed v4 is the proprietary multilingual leader with the best German signal, up to 128k context and Matryoshka truncation from 256 to 1,536 dimensions - and it is sovereignly hostable via STACKIT. Voyage (part of MongoDB since February 2025) is specialised in English, finance, legal and code; for German-language retrieval it is not the benchmark.
Open-source leaders: BGE-M3, Jina, Qwen3
BGE-M3 (BAAI, MIT) is the open-source standard for DACH multilingual RAG: 1,024 dimensions, 8,192 tokens of context, SOTA on MIRACL and - uniquely - dense, sparse and multi-vector embeddings in a single model. Jina v3 and v4 from Berlin (Apache 2.0) are the DACH-native favourites; v4, with 32k context, additionally processes visually rich documents (tables, charts, diagrams) and reaches ViDoRe 90.17 in multi-vector mode. For GPU-less mid-market stacks, jina-embeddings-v2-base-de at 322 MB is the pragmatic bilingual choice. Beware of licences: NV-Embed-v2 (NVIDIA) is CC-BY-NC and therefore excluded for commercial DACH deployments.
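To illustrate what "dense, sparse and multi-vector in a single model" means at query time, the sketch below scores one query-document pair in all three ways and blends the results. It is a simplified illustration on invented toy data, not the FlagEmbedding API; the blend weights are hypothetical and would need tuning on your own corpus.

```python
def dense_score(q_vec, d_vec):
    """Dense signal: dot product of two single vectors (cosine if both are normalised)."""
    return sum(x * y for x, y in zip(q_vec, d_vec))


def sparse_score(q_weights, d_weights):
    """Sparse (lexical) signal: sum of weight products over tokens both sides share."""
    return sum(w * d_weights[tok] for tok, w in q_weights.items() if tok in d_weights)


def multi_vector_score(q_token_vecs, d_token_vecs):
    """Multi-vector signal: for each query token take its best-matching document
    token (MaxSim), then average over the query tokens."""
    if not q_token_vecs or not d_token_vecs:
        return 0.0
    total = sum(max(dense_score(q, d) for d in d_token_vecs) for q in q_token_vecs)
    return total / len(q_token_vecs)


def combined_score(dense, sparse, multi, weights=(0.4, 0.2, 0.4)):
    """Weighted blend of the three signals. The weights are illustrative assumptions."""
    w_dense, w_sparse, w_multi = weights
    return w_dense * dense + w_sparse * sparse + w_multi * multi


# Toy inputs: a shared token such as an ICD code drives the sparse signal.
s_dense = dense_score([1.0, 0.0], [0.6, 0.8])
s_sparse = sparse_score({"j45.9": 0.5, "asthma": 1.0}, {"j45.9": 0.4})
s_multi = multi_vector_score([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]])
print(combined_score(s_dense, s_sparse, s_multi))
```

The sparse signal is what rescues product codes and reference numbers that the dense vector blurs, which is why a model that emits all three is attractive for German B2B corpora.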
Practical example: storage costs and Matryoshka
A concrete calculation example for 10 million vectors at 1,024 dimensions with an HNSW index:
- float32 (baseline): raw vectors 40 GB, plus 50-100% HNSW overhead -> effective working set 60-80 GB.
- halfvec (float16): around 30-40 GB with negligible recall loss.
- Scalar Quantization (SQ8): around 10-20 GB with only 1-3% recall loss.
- Binary + rescore: around 5-10 GB - but only with full-vector rescoring of the top-N, otherwise 30-60% recall loss.
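The raw-vector part of these figures is simple arithmetic and can be reproduced with a few lines. The sketch uses decimal gigabytes and applies the 50-100% HNSW overhead only to the float32 baseline; how much index overhead the quantised variants carry depends on the vector store and is not modelled here.

```python
def raw_vector_gb(num_vectors, dims, bytes_per_dim):
    """Raw storage for the vectors alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_vectors * dims * bytes_per_dim / 1e9


def with_hnsw_overhead(raw_gb, low=0.5, high=1.0):
    """Apply the 50-100% HNSW overhead range quoted above; returns (low, high) in GB."""
    return raw_gb * (1 + low), raw_gb * (1 + high)


NUM_VECTORS = 10_000_000
DIMS = 1_024
BYTES_PER_DIM = {
    "float32": 4,      # baseline
    "float16": 2,      # halfvec
    "int8": 1,         # scalar quantisation (SQ8)
    "binary": 1 / 8,   # one bit per dimension
}

raw_sizes = {name: raw_vector_gb(NUM_VECTORS, DIMS, b) for name, b in BYTES_PER_DIM.items()}
float32_low, float32_high = with_hnsw_overhead(raw_sizes["float32"])

for name, gb in raw_sizes.items():
    print(f"{name:>8}: {gb:6.2f} GB raw")
print(f"float32 incl. HNSW overhead: {float32_low:.1f}-{float32_high:.1f} GB")
```

The float32 result of roughly 41 GB raw and 61-82 GB with overhead matches the rounded 40 GB and 60-80 GB in the list.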
Matryoshka truncation adds further savings: text-embedding-3-large with 3,072 dimensions is oversized for most enterprise RAG cases. Truncating to 1,024 or 512 dimensions costs only around 1-3% retrieval quality but cuts storage and ANN latency by a factor of 3-6 - at zero training cost. The same applies to Cohere Embed v4, BGE, mxbai-2d and Voyage 4. For greenfield builds, the recommendation is: choose a Matryoshka-capable model and store at 512-1,024 dimensions.
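Client-side truncation is a two-step operation: keep the leading dimensions, then re-normalise so that cosine and dot-product scores stay comparable. A minimal sketch, assuming the vector comes from a Matryoshka-trained model (on other models, cutting dimensions this way typically degrades quality far more):

```python
import math


def truncate_and_normalize(vec, dims):
    """Keep the first `dims` components and rescale the result to unit length."""
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("truncated vector is all zeros")
    return [x / norm for x in head]


# Toy 3-dimensional vector standing in for a 3,072-dimensional embedding.
short = truncate_and_normalize([3.0, 4.0, 12.0], dims=2)
print(short)

# Storage scales linearly with dimensions, hence the 3-6x range.
print(3072 / 1024, 3072 / 512)
```

Some hosted APIs can do this server-side; OpenAI's text-embedding-3 models, for example, accept a `dimensions` parameter. The whole corpus and all queries must use the same target dimension.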
Don't forget the reranker
The choice of model is only half the battle. A cross-encoder reranker after the first retrieval stage is the single most effective measure in the pipeline - it typically lifts recall@5 by 5-15 percentage points. The Anthropic study on Contextual Retrieval reports that contextual embeddings plus contextual BM25 reduce failed retrievals by around 49%, and with an additional reranker by up to 67%. Sovereignly self-hostable options are BGE Reranker M3 (MIT), Jina Reranker v2/v3 (Apache 2.0) and mxbai-rerank-large-v2; Cohere Rerank Multilingual is available as a premium variant via STACKIT.
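The two-stage pattern and the recall@k metric can be sketched without any model at all. In the example below, `toy_overlap_scorer` is a deliberately crude stand-in for a cross-encoder, and the candidates and relevance labels are invented; only the pipeline shape is the point.

```python
def recall_at_k(ranked_ids, relevant_ids, k=5):
    """Share of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def rerank(query, candidates, score_pair, top_n=5):
    """Second stage: re-score every (query, text) pair and keep the best top_n.

    candidates: list of (doc_id, text) from the first retrieval stage.
    score_pair: callable(query, text) -> float; in production, a cross-encoder.
    """
    scored = [(score_pair(query, text), doc_id) for doc_id, text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep stage-1 order
    return [doc_id for _, doc_id in scored[:top_n]]


def toy_overlap_scorer(query, text):
    """Placeholder scorer (word overlap). NOT a real reranker."""
    q_words, t_words = set(query.lower().split()), set(text.lower().split())
    return len(q_words & t_words) / len(q_words) if q_words else 0.0


query = "notice period rental contract"
first_stage = [  # hypothetical first-stage output, best guess first
    ("d1", "general terms of the rental agreement"),
    ("d2", "holiday policy"),
    ("d3", "notice period for a rental contract"),
]
relevant = {"d3"}

before = recall_at_k([doc_id for doc_id, _ in first_stage], relevant, k=2)
reranked = rerank(query, first_stage, toy_overlap_scorer, top_n=2)
after = recall_at_k(reranked, relevant, k=2)
print(before, reranked, after)
```

With a real model, `score_pair` would wrap one of the rerankers named above, for instance through the `CrossEncoder.predict` method of the sentence-transformers library. The first stage should over-fetch (for example 50-100 candidates) so the reranker has something to promote.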
Recommendation for DACH and multilingual use cases
The concrete ranking for German-language enterprise RAG (as of 2026) is: 1. Cohere Embed v4 (best German, proprietary, sovereign on STACKIT), 2. BGE-M3 (best open-source multilingual, MIT, fully sovereign), 3. Jina v3/v4 (DACH-native, multimodal), 4. BGE-multilingual-gemma2 (heavier, but SOTA on several splits), 5. Qwen3-Embedding-8B (Apache 2.0, MTEB v2 ~70.58). OpenAI text-embedding-3-large ranks well behind on German tasks.
The most sovereign stack: BGE-M3 or Jina v4, self-hosted on a single L4/A10G GPU, complemented by BGE Reranker M3 - all models downloadable, all layers operable on DACH sovereign clouds (STACKIT, IONOS, OTC, Hetzner). Cohere Embed v4 on STACKIT comes into play where the slightly better German quality justifies the commercial commitment. Important for compliance: embeddings derived from personal data are most probably considered personal data in their own right (EDPB Opinion 28/2024). For regulated workloads, therefore, only sovereignly hostable models are permissible.
For agencies and B2B decision-makers
The choice of embedding model determines the retrieval quality, storage costs and GDPR compliance of your RAG system in equal measure. Anyone building for the DACH market should select not by English MTEB rank but by demonstrated German quality and sovereignty. As a Vienna-based agency, Blck Alpaca designs and implements sovereign, multilingual RAG and AI agent stacks - from model selection through self-hosting to reranker integration. Talk to us if you want to set up a GDPR-compliant embedding setup for German-language content or optimise an existing system for German retrieval quality.