4.3 · Intermediate · 7 min

Embedding Models 2026 Compared: text-embedding-3, Cohere, BGE-M3, Voyage & Jina

Blck Alpaca
Definition

An embedding model comparison evaluates models such as OpenAI text-embedding-3, Cohere Embed v4, BGE-M3, Voyage and Jina by dimensions, context length, MTEB/MMTEB benchmarks, multilinguality, cost, self-hosting and licence. For German-language RAG systems, what counts is not the English MTEB rank but the demonstrated quality on MMTEB, MIRACL and MTEB-DE.

Key Takeaways

  • The English MTEB rank is not a German rank: on MMTEB, MIRACL-de and MTEB-DE, English-optimised models often lose 5-15 nDCG@10 points on compound words, technical language and long-word-heavy retrieval.
  • For German-language RAG systems, Cohere Embed v4 (best proprietary German), BGE-M3 (best open-source multilingual, MIT) and Jina v3/v4 (Berlin, Apache 2.0) are the leaders for 2026.
  • Self-hosting determines sovereignty: BGE-M3, Jina v3/v4 and mxbai are downloadable and can therefore run entirely on GDPR-governed DACH infrastructure; OpenAI and Voyage remain API-only under US jurisdiction.
  • Matryoshka models (text-embedding-3, Cohere v4, BGE, Voyage 4) allow dimension truncation: going from 3,072 to 512-1,024 dims costs only around 1-3% retrieval quality but saves 3-6x storage and latency.
  • A cross-encoder reranker is the single most effective measure: it typically adds 5-15 points of recall@5; according to Anthropic, contextual embeddings + contextual BM25 + reranker reduce failed retrievals by up to 67%.
  • Embeddings derived from personal data are most probably personal data themselves - for regulated DACH workloads, only sovereignly hostable models (BGE-M3, Jina, mxbai, Cohere on STACKIT) are permissible.

An embedding model converts text into numerical vectors that make semantic similarity measurable - the foundation of every RAG system. A well-founded embedding model comparison evaluates the candidates by dimensions, context length, MTEB/MMTEB benchmarks, multilinguality, cost, self-hosting capability and licence. For the DACH region, the rule is: the English MTEB rank is not a German rank. What matters is MMTEB, MIRACL and MTEB-DE.

  • Best German (proprietary): Cohere Embed v4 - multilingual leader, sovereignly hostable on STACKIT (as of 2026).
  • Best open-source multilingual: BGE-M3 (MIT) - SOTA on MIRACL, combining dense, sparse and multi-vector in a single model.
  • DACH-native & multimodal: Jina v3/v4 (Berlin, Apache 2.0) - downloadable, and v4 also processes visual documents.
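To make "semantic similarity measurable" concrete, here is a minimal sketch of cosine similarity, the comparison most commonly applied to embedding vectors. The three 4-dimensional vectors are invented for illustration and do not come from any of the models discussed; real models emit hundreds to thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for a query and two document chunks.
query = [0.9, 0.1, 0.0, 0.2]
chunk_related = [0.8, 0.2, 0.1, 0.3]
chunk_unrelated = [0.0, 0.9, 0.7, 0.0]

sim_related = cosine_similarity(query, chunk_related)
sim_unrelated = cosine_similarity(query, chunk_unrelated)
print(f"related: {sim_related:.3f}, unrelated: {sim_unrelated:.3f}")
```

A retriever ranks chunks by this score, so the chunk pointing in nearly the same direction as the query is returned first.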

Why the English MTEB rank is misleading

MTEB v1 is dominated by English tasks. The correct DACH references are MMTEB (the Massive Multilingual Text Embedding Benchmark with over 1,000 tasks and 250+ languages), MIRACL (18-language monolingual retrieval) and German subsets such as MTEB-DE, GermanQuAD-Retrieval and MIRACL-de. These benchmarks regularly reorder the leaderboard: models that shine in English often lose 5-15 nDCG@10 points in German - on compound words, technical language, legal and medical terminology, as well as tokenisation-heavy long-word retrieval.
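Since the comparison above is expressed in nDCG@10 points, a short sketch of the metric may help. It assumes the common formulation with a log2 rank discount and, as a simplification, that the input list holds the relevance label of every judged document in the order the model ranked them. The two example rankings are invented.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k positions (rank 1 has discount log2(2) = 1)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=10):
    """DCG of the actual ranking divided by the DCG of the ideal ordering."""
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# Hypothetical single query with one relevant document.
model_a = [1, 0, 0, 0, 0]  # relevant document ranked first
model_b = [0, 0, 1, 0, 0]  # relevant document ranked third

print(ndcg_at_k(model_a), ndcg_at_k(model_b))
```

Benchmarks average this value over many queries and usually report it multiplied by 100, which is where "points" comes from.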

For German B2B corpora, a further complication is that named entities, product codes, paragraph numbers, ICD codes, SAP material numbers, IBANs and case reference numbers are precisely the tokens where pure dense retrieval models struggle. The choice of model is therefore always a decision about German language quality - not about the global leaderboard.

The most important embedding models of 2026 compared

The following table summarises the key selection criteria. All figures come from our internal research and reflect the state as of 2026.

| Model | Dimensions | Context (tokens) | Licence / Hosting | German signal | Matryoshka | Multimodal | Sovereignty |
| --- | --- | --- | --- | --- | --- | --- | --- |
| OpenAI text-embedding-3-large | 3,072 (truncatable) | 8,192 | API + Azure OpenAI (EU regions) | solid EN, weaker on DE technical tasks | yes | no | US jurisdiction, no on-prem |
| OpenAI text-embedding-3-small | 1,536 (truncatable) | 8,192 | as above | similar pattern | yes | no | US jurisdiction |
| Cohere Embed v4 | 256-1,536 | 128k | API, Bedrock EU, Azure EU, STACKIT | top-tier German, MTEB v2 ~65 | yes | yes | sovereign on STACKIT |
| BGE-M3 (BAAI) | 1,024 + sparse + multi-vec | 8,192 | MIT, self-host | top-tier multilingual, SOTA MIRACL | (not stated) | no | fully sovereign |
| Jina Embeddings v3 (Berlin) | 1,024 (task LoRA) | 8,192 | Apache 2.0, self-host | strong, beats OpenAI/Cohere on MTEB at 570M params | yes | no | DACH-native |
| Jina Embeddings v4 | up to 2,048 / multi-vec | 32k | Apache 2.0, self-host | MMTEB 66.49; ViDoRe 90.17 | yes | yes | DACH-native |
| jina-embeddings-v2-base-de | 768 | 8,192 | Apache 2.0, self-host | bilingual DE/EN, CPU-capable (322 MB) | no | no | fully sovereign |
| Voyage-3.5 / voyage-4 | variable | up to 32k | API only (MongoDB/Voyage) | EN/finance/legal/code-focused | partial | yes (multimodal-3/3.5) | US jurisdiction |
| mxbai-embed-large-v1 (Berlin) | 1,024 | 512 | Apache 2.0, self-host | EU-developed, EN-heavy | yes | no | DACH-native |
| Qwen3-Embedding-8B | variable | 32k | Apache 2.0, self-host | MTEB v2 ~70.58, very strong | yes | no | fully sovereign |
| multilingual-e5-large-instruct | 1,024 | 514 | MIT, self-host | solid multilingual | partial | no | fully sovereign |

Proprietary APIs: OpenAI, Cohere, Voyage

OpenAI text-embedding-3-large delivers solid English scores with 3,072 dimensions and 8,192 tokens of context, but falls noticeably behind Cohere and BGE on German specialist tasks. It remains API-only (including via Azure OpenAI in EU regions such as Sweden Central or Switzerland North), which means US jurisdiction persists. Cohere Embed v4 is the proprietary multilingual leader with the best German signal, up to 128k context and Matryoshka truncation from 256 to 1,536 dimensions - and it is sovereignly hostable via STACKIT. Voyage (part of MongoDB since February 2025) specialises in English, finance, legal and code; for German-language retrieval it is not the reference choice.

Open-source leaders: BGE-M3, Jina, Qwen3

BGE-M3 (BAAI, MIT) is the open-source standard for DACH multilingual RAG: 1,024 dimensions, 8,192 tokens of context, SOTA on MIRACL and - uniquely - dense, sparse and multi-vector embeddings in a single model. Jina v3 and v4 from Berlin (Apache 2.0) are the DACH-native favourites; v4, with 32k context, additionally processes visually rich documents (tables, charts, diagrams) and reaches ViDoRe 90.17 in multi-vector mode. For GPU-less mid-market stacks, jina-embeddings-v2-base-de at 322 MB is the pragmatic bilingual choice. Beware of licences: NV-Embed-v2 (NVIDIA) is CC-BY-NC and therefore excluded for commercial DACH deployments.

Practical example: storage costs and Matryoshka

A concrete calculation example for 10 million vectors at 1,024 dimensions with an HNSW index:

  • float32 (baseline): raw vectors 40 GB, plus 50-100% HNSW overhead -> effective working set 60-80 GB.
  • halfvec (float16): around 30-40 GB with negligible recall loss.
  • Scalar Quantization (SQ8): around 10-20 GB with only 1-3% recall loss.
  • Binary + rescore: around 5-10 GB - but only with full-vector rescoring of the top-N, otherwise 30-60% recall loss.
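The arithmetic behind these figures can be reproduced with a few lines. The sketch below computes raw vector sizes per precision and, as a simplifying assumption, applies the 50-100% HNSW overhead quoted for float32 uniformly to every precision. It does not model full vectors kept for rescoring, so the raw bit-vector size it prints for binary is smaller than the 5-10 GB quoted above.

```python
NUM_VECTORS = 10_000_000
DIMS = 1_024

# Bytes needed per dimension at each storage precision.
BYTES_PER_DIM = {
    "float32": 4.0,
    "float16": 2.0,
    "int8 (SQ8)": 1.0,
    "binary": 1 / 8,
}


def raw_size_gb(num_vectors, dims, bytes_per_dim):
    """Raw vector payload in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_vectors * dims * bytes_per_dim / 1e9


def with_index_overhead(raw_gb, low=0.5, high=1.0):
    """Working-set range, assuming the index adds 50-100% on top of the raw vectors."""
    return raw_gb * (1 + low), raw_gb * (1 + high)


raw_gb = {name: raw_size_gb(NUM_VECTORS, DIMS, b) for name, b in BYTES_PER_DIM.items()}

for name, gb in raw_gb.items():
    lo, hi = with_index_overhead(gb)
    print(f"{name:>11}: raw {gb:6.2f} GB, with overhead {lo:6.2f}-{hi:6.2f} GB")
```

For float32 this yields about 41 GB raw and roughly 61-82 GB with overhead, in line with the 60-80 GB working set stated in the first bullet.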

Matryoshka truncation adds a further saving: text-embedding-3-large with 3,072 dimensions is oversized for most enterprise RAG cases. Truncating to 1,024 or 512 dimensions costs only around 1-3% retrieval quality but cuts storage and ANN latency by a factor of 3-6 - at zero training cost. The same applies to Cohere Embed v4, BGE, mxbai-2d and Voyage 4. For greenfield builds, the recommendation is: choose a Matryoshka-capable model and store at 512-1,024 dimensions.
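Mechanically, Matryoshka truncation means keeping the leading dimensions and re-normalising the result. The following is a minimal sketch, assuming the vector comes from a Matryoshka-trained model; the 8-dimensional input is a made-up stand-in for a real 3,072-dimensional embedding.

```python
import math


def truncate_and_normalize(vector, dims):
    """Keep the first `dims` components and rescale the result to unit length."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and the vector length")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in head]


# Hypothetical toy embedding; a real one would have e.g. 3,072 components.
full = [0.5, -0.5, 0.5, 0.1, -0.3, 0.2, 0.3, 0.0]
short = truncate_and_normalize(full, 4)

# Storage shrinks linearly with the dimension count.
for target in (1_024, 512):
    print(f"3,072 -> {target}: {3_072 / target:.0f}x smaller")
```

Document and query vectors must be truncated to the same length, otherwise their similarity scores are not comparable.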

Don't forget the reranker

The choice of model is only half the battle. A cross-encoder reranker after the first retrieval stage is the single most effective measure in the pipeline - it typically lifts recall@5 by 5-15 percentage points. The Anthropic study on Contextual Retrieval reports that contextual embeddings plus contextual BM25 reduce failed retrievals by around 49%, and with an additional reranker by up to 67%. Sovereignly self-hostable options are BGE Reranker M3 (MIT), Jina Reranker v2/v3 (Apache 2.0) and mxbai-rerank-large-v2; Cohere Rerank Multilingual is available as a premium variant via STACKIT.
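The shape of such a pipeline can be sketched as follows. The text does not say how the dense and BM25 result lists are merged; reciprocal rank fusion with k=60 is used here purely as one common option. The document IDs, the hit lists and the `toy_scores` lookup standing in for a cross-encoder are all hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked ID lists; each hit contributes 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def rerank(query, candidates, score_fn, top_n=5):
    """Re-order candidates by a (query, document) relevance score and keep the top_n."""
    scored = [(score_fn(query, doc_id), doc_id) for doc_id in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:top_n]]


# Hypothetical first-stage results for one query.
dense_hits = ["d3", "d1", "d7"]
bm25_hits = ["d3", "d9", "d1"]
fused = reciprocal_rank_fusion([dense_hits, bm25_hits])

# Stand-in for a cross-encoder: a fixed lookup table instead of a real model call.
toy_scores = {"d3": 0.2, "d1": 0.9, "d9": 0.1, "d7": 0.5}
final = rerank("example query", fused, lambda q, d: toy_scores[d], top_n=2)
print(fused, final)
```

The first stage stays cheap and recall-oriented, while the more expensive pairwise scoring only runs on the short fused candidate list.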

Recommendation for DACH and multilingual use cases

The concrete ranking for German-language enterprise RAG (as of 2026) is: 1. Cohere Embed v4 (best German, proprietary, sovereign on STACKIT), 2. BGE-M3 (best open-source multilingual, MIT, fully sovereign), 3. Jina v3/v4 (DACH-native, multimodal), 4. BGE-multilingual-gemma2 (heavier, but SOTA on several splits), 5. Qwen3-Embedding-8B (Apache 2.0, MTEB v2 ~70.58). OpenAI text-embedding-3-large ranks well behind on German tasks.

The most sovereign stack: BGE-M3 or Jina v4, self-hosted on a single L4/A10G GPU, complemented by BGE Reranker M3 - all models downloadable, all layers operable on DACH sovereign clouds (STACKIT, IONOS, OTC, Hetzner). Cohere Embed v4 on STACKIT comes into play where the slightly better German quality justifies the commercial commitment. Important for compliance: embeddings derived from personal data are most probably considered personal data in their own right (EDPB Opinion 28/2024). For regulated workloads, therefore, only sovereignly hostable models are permissible.

For agencies and B2B decision-makers

The choice of embedding model determines the retrieval quality, storage costs and GDPR compliance of your RAG system in equal measure. Anyone building for the DACH market should select not by English MTEB rank but by demonstrated German quality and sovereignty. As a Vienna-based agency, Blck Alpaca designs and implements sovereign, multilingual RAG and AI agent stacks - from model selection through self-hosting to reranker integration. Talk to us if you want to build a GDPR-compliant embedding setup for German-language content or optimise an existing system for German retrieval quality.

FAQ

Which embedding model is the best for German in 2026?
For German-language enterprise RAG, MMTEB, MIRACL and MTEB-DE evaluations show Cohere Embed v4 leading among the proprietary models, BGE-M3 (MIT licence) among the open-source models, and Jina v3/v4 (Berlin, Apache 2.0) among the DACH-native options. OpenAI text-embedding-3-large lies materially behind on German technical tasks.
What do dimensions and context length mean for embedding models?
The dimension count is the length of the vector per text chunk (e.g. 1,024 or 3,072) and determines storage and latency costs. Context length is the maximum number of tokens per input text: BGE-M3 and Jina v3 reach 8,192 tokens, Jina v4 32k, and Cohere Embed v4 even 128k - relevant for late chunking of long documents.
Can I self-host embedding models?
Yes, with open-source models. BGE-M3 (MIT), Jina v3/v4 (Apache 2.0), multilingual-e5, mxbai-embed-large and Qwen3-Embedding are downloadable and run on your own infrastructure or DACH sovereign clouds such as STACKIT, IONOS or OTC. OpenAI and Voyage are available exclusively as an API.
What is Matryoshka in embeddings and why does it save costs?
Matryoshka embeddings encode the most important information in the leading vector dimensions, so you can truncate the vector after it has been computed, provided document and query vectors are cut to the same length. Truncating text-embedding-3-large from 3,072 to 1,024 or 512 dims costs around 1-3% retrieval quality but cuts storage and ANN latency by a factor of 3-6 - at zero training cost.
Are embeddings personal data under the GDPR?
Most probably yes, if they were derived from personal data - subject to the required re-identification risk assessment in line with EDPB Opinion 28/2024 and Guidelines 01/2025 (as of 2026). Inversion attacks reconstruct up to 92% of 32-token inputs. Embeddings of personal data therefore belong in sovereign infrastructure.
