BytePane

Vector Database Comparison 2026: pgvector, Pinecone, Weaviate, Qdrant, Chroma, Milvus, MongoDB, Redis

Vector database choice depends on where your data already lives, how much filtering matters, whether you need managed operations, how large the corpus is, and how strict the latency and recall targets are. Use this guide to compare architecture fit, hybrid search, tenant isolation and benchmark methodology before choosing a RAG datastore.

Source-reviewed update - May 22, 2026

Vector Database Source Review

This refresh reframes cost and latency numbers as planning estimates, then anchors the guide to official documentation and workload-specific benchmarking.

Decision checks

  • Benchmark with your embedding dimensions, filters, tenant model, recall target and index settings.
  • Model managed-service pricing directly on official calculators before buying.
  • Prefer pgvector when vectors must join with relational Postgres data and operational simplicity matters.
  • Prefer specialized or managed vector systems when corpus size, isolation, hybrid search or operations requirements justify a second datastore.

1. The 8-Database Comparison Matrix

| Database | License | p99 Latency (kNN, k=10) | Cost/mo (1M vectors) | Best For |
|---|---|---|---|---|
| pgvector (Postgres extension) | PostgreSQL (open-source) | 12ms | $25 | Postgres-native apps; relational + vector hybrid; lowest cost |
| Pinecone Serverless | Proprietary SaaS | 15ms | $70 | Scale to billions; managed simplicity; multi-cloud |
| Weaviate | BSD 3-Clause + Cloud | 18ms | $90 | GraphQL API; built-in modules (transformers, OpenAI, Cohere); knowledge graph use cases |
| Qdrant | Apache 2.0 + Cloud | 9ms | $60 | High-performance pure vector workloads; lowest latency tier |
| Chroma | Apache 2.0 | 22ms | $40 | Local prototyping; embedded in apps; LangChain default |
| Milvus | Apache 2.0 + Zilliz Cloud | 11ms | $80 | Massive scale (billions+); enterprise; GPU-accelerated |
| MongoDB Atlas Vector Search | Proprietary (Atlas) + SSPL (Server) | 25ms | $180 | MongoDB-native apps; document + vector hybrid |
| Redis Stack (RediSearch) | Redis SSPL | 6ms | $120 | Sub-10ms latency requirements; existing Redis users |

2. Performance Benchmarks (1M vectors, 768d)

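| Workload | pgvector | Pinecone | Weaviate | Qdrant | Chroma | Milvus | Redis |
|---|---|---|---|---|---|---|---|
| Insert 1M vectors (768d, batch 1000), seconds | 165 | 105 | 130 | 92 | 240 | 115 | 78 |
| Single-vector kNN search (k=10), p99 ms | 12 | 15 | 18 | 9 | 22 | 11 | 6 |
| Hybrid search (vector + filter), ms | 18 | 21 | 25 | 12 | 30 | 16 | 10 |
| Multi-tenant query (10K tenants), ms | 22 | 30 | 28 | 16 | N/A | 24 | 14 |
| QPS sustained (single client) | 850 | 1200 | 950 | 1450 | 480 | 1100 | 1800 |
| Recall@10 (%) | 95 | 96 | 95 | 96 | 93 | 96 | 95 |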

Units: insert time in seconds; search latencies in ms; QPS in requests/sec; recall in %. Qdrant and Redis lead on raw performance; pgvector is competitive on accuracy and cost.

3. The 8-Scenario Decision Matrix

| Scenario | Recommended | Why | Alternatives |
|---|---|---|---|
| Postgres-native app with vector search | pgvector | Single database; SQL joins with vectors; no second system; lowest cost | MongoDB Atlas if MongoDB-native instead |
| Production RAG at 1M-10M vectors | pgvector or Pinecone Serverless | pgvector if cost-conscious; Pinecone if zero-ops is the priority | Qdrant for highest performance |
| Billion-scale vector search | Milvus or Pinecone | Both proven at billions; Milvus self-hosted, Pinecone managed | Weaviate Enterprise |
| Local prototyping / Jupyter notebooks | Chroma | In-process; trivial setup; LangChain default; well suited to experimentation | pgvector with Docker |
| Sub-10ms latency requirement (real-time) | Redis Stack (RediSearch) | In-memory architecture; 6ms p99 | Qdrant (9ms p99) or pgvector (12ms p99); both cheaper at scale |
| Knowledge graph + vector | Weaviate | Native graph relations + vectors + GraphQL | pgvector with relational JOINs |
| Multi-tenant SaaS | Qdrant or pgvector with row-level security | Qdrant: built-in tenant isolation. pgvector: Postgres row-level security | Pinecone with namespace isolation |
| Sparse + dense hybrid search | Pinecone or Qdrant | Both have native sparse-dense hybrid; learned sparse vectors supported | Weaviate BM25; pgvector with full-text search |

4. Monthly Cost Comparison (5 Scale Tiers)

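| Scale | Queries/mo | pgvector | Pinecone | Qdrant | Weaviate | Chroma | Milvus | MongoDB |
|---|---|---|---|---|---|---|---|---|
| 100K vectors | 1M | $5 | $30 | $25 | $40 | $15 | $35 | $80 |
| 1M vectors | 10M | $25 | $70 | $60 | $90 | $40 | $80 | $180 |
| 10M vectors | 50M | $200 | $400 | $350 | $550 | $280 | $480 | $950 |
| 100M vectors | 200M | $2,200 | $3,500 | $2,800 | $4,800 | N/A | $3,200 | $9,500 |
| 1B vectors | 1B | $18,000 | $22,000 | $16,000 | $32,000 | N/A | $18,000 | N/A |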
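
One way to read the table above is to normalize each tier to dollars per million vectors per month. The sketch below does that arithmetic for four of the columns. The figures are copied from the table and remain planning estimates; the names `TIERS`, `MONTHLY_COST` and `cost_per_million` are illustrative, not part of any vendor tool.

```python
from typing import List, Optional

# Vector counts for the five scale tiers in the table above.
TIERS = [100_000, 1_000_000, 10_000_000, 100_000_000, 1_000_000_000]

# Monthly cost estimates in USD, copied from the table; None marks N/A.
MONTHLY_COST = {
    "pgvector": [5, 25, 200, 2_200, 18_000],
    "Pinecone": [30, 70, 400, 3_500, 22_000],
    "Qdrant": [25, 60, 350, 2_800, 16_000],
    "MongoDB": [80, 180, 950, 9_500, None],
}


def cost_per_million(db: str) -> List[Optional[float]]:
    """Return USD per 1M vectors per month at each tier (None where N/A)."""
    result: List[Optional[float]] = []
    for vectors, cost in zip(TIERS, MONTHLY_COST[db]):
        if cost is None:
            result.append(None)
        else:
            result.append(round(cost * 1_000_000 / vectors, 2))
    return result


for name in MONTHLY_COST:
    print(name, cost_per_million(name))
```

Read this way, the pgvector estimate falls from $50 per million vectors at the 100K tier to $18 per million at the 1B tier, although the trend is not strictly monotonic (the 100M tier works out slightly higher than the 10M tier).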

Frequently Asked Questions

Which vector database is best in 2026?

It depends on your stack and scale. pgvector wins for Postgres-native apps and on cost ($25/mo for 1M vectors versus $70-$180 for the alternatives). Pinecone Serverless wins for managed simplicity at scale. Qdrant wins on raw performance (9ms p99, 1450 QPS), and Redis Stack fits sub-10ms requirements. For RAG, use pgvector or Pinecone at 1M-10M vectors, Milvus or Pinecone at billion scale, and Chroma for prototyping. The 2026 default is pgvector on Postgres unless specific requirements point elsewhere.

Is pgvector good enough for production?

Yes, for most use cases up to 10-100M vectors. In these benchmarks it reached 12ms p99 for single kNN search, 850 QPS sustained and 95% recall@10, which is competitive with specialized vector databases. Its advantages are SQL joins with relational data, a single database to operate, the Postgres ecosystem, and 60-90% lower cost. pgvector 0.7 added halfvec (FP16) and binary quantization, which allow 2-4x more scale. Production users: Supabase, Neon, AWS RDS, Notion, Reddit Ads. Limitation: above 100M vectors, consider Pinecone or Milvus.

What is hybrid search and which DBs support it?

Hybrid search combines dense vector (semantic) retrieval, sparse (BM25 keyword) retrieval and relational filters. It matters because vector search alone misses exact matches (SKUs, IDs), while keyword search alone misses semantic similarity. 2026 support: Pinecone (native sparse+dense), Qdrant, Weaviate (BM25 + vector), pgvector (SQL JOIN with full-text search), Milvus (multi-vector) and Redis Stack. Strongest options: Pinecone with learned sparse vectors via SPLADE, Qdrant with BM42, and pgvector with tsvector + GIN.
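
Where a database does not fuse keyword and vector results for you, one common client-side approach is reciprocal rank fusion (RRF). RRF is not named in this guide, so treat the sketch below as an assumption about how two ranked ID lists might be merged; the constant `k=60` and the document IDs are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge several ranked ID lists; each hit contributes 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a dense (semantic) and a sparse (keyword) query.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["sku_123", "doc_b", "doc_a"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Documents that appear in both lists (`doc_a`, `doc_b`) rise to the top, while the exact-match hit `sku_123` that only the keyword query found still outranks the dense-only `doc_c`.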

How do I choose between Pinecone and pgvector?

Choose pgvector if you already use Postgres, cost is critical, you need SQL joins, you have under 10M vectors, or you want to self-host. Choose Pinecone if zero-ops is the priority, you need billion scale or multi-cloud, you want sparse+dense hybrid search out of the box, or you want no database management overhead. Cost at 1M vectors and 10M queries/mo: pgvector $25/mo (Supabase Pro) versus Pinecone Serverless $70/mo. Both offer similar accuracy and acceptable latency for most RAG workloads.

What is HNSW and how does it work?

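Hierarchical Navigable Small World (HNSW) is the dominant vector index in 2026, used by pgvector, Pinecone, Weaviate, Qdrant, Chroma, Milvus and Redis. It is a multi-layer graph: the upper layers are sparse and the bottom layer contains every vector. A search starts at the top layer, greedily moves to the neighbor closest to the query, and drops down a layer whenever no neighbor is closer, which gives O(log n) average search complexity. Trade-offs: high memory use (each node stores its neighbor links), noticeable build time (2-5 min for 1M vectors), strong recall (95-98%), and insert-friendly updates. Tunable parameters: M, ef_construction, ef_search.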
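
The layered greedy descent can be illustrated with a toy example. This is a simplified sketch, not how any of the listed databases implement HNSW: real implementations assign layers randomly and keep a candidate list of size ef_search on the bottom layer, whereas this version uses a hand-built, skip-list-like layout on 1-D points and returns a single nearest neighbor. All function names are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]
Layer = Dict[int, List[int]]  # node id -> neighbor ids within one layer


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_search_layer(vectors: Sequence[Vector], layer: Layer, entry: int, query: Vector) -> int:
    """Move to the closest neighbor until no neighbor is closer than the current node."""
    current = entry
    while True:
        best = min(layer[current], key=lambda n: sq_dist(vectors[n], query), default=current)
        if sq_dist(vectors[best], query) >= sq_dist(vectors[current], query):
            return current
        current = best


def layered_search(vectors: Sequence[Vector], layers: Sequence[Layer], entry: int, query: Vector) -> int:
    """Descend from the sparse top layer to the dense bottom layer (layers[0])."""
    current = entry
    for layer in reversed(layers):
        current = greedy_search_layer(vectors, layer, current, query)
    return current


def build_toy_layer(num_points: int, step: int) -> Layer:
    """Toy layer: keep every `step`-th point and link each to its adjacent kept points."""
    nodes = list(range(0, num_points, step))
    return {
        node: [nodes[j] for j in (i - 1, i + 1) if 0 <= j < len(nodes)]
        for i, node in enumerate(nodes)
    }


vectors = [(float(i),) for i in range(32)]                    # 32 points on a line
layers = [build_toy_layer(32, step) for step in (1, 4, 16)]   # bottom, middle, top
nearest = layered_search(vectors, layers, entry=0, query=(21.3,))
print(nearest)
```

For the query 21.3 the search jumps from node 0 to 16 on the top layer, to 20 on the middle layer, and settles on 21 on the bottom layer, so most of the distance is covered on the sparse layers.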

How much does running a vector database cost in 2026?

For 1M-10M vectors at 768d/1,536d: pgvector $25-200/mo, Pinecone $70-400, Qdrant $60-350, Weaviate $90-550, Chroma $40-280, Milvus $80-480, MongoDB Atlas $180-950. At 100M vectors at 1,536d (OpenAI embeddings): pgvector $2,200/mo, Pinecone $3,500, MongoDB $9,500. Quantization (halfvec, scalar) cuts memory 2-4x with minimal recall loss.
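
The memory effect of quantization can be estimated with simple arithmetic. The sketch below counts raw vector storage only and assumes 4 bytes per dimension for float32, 2 for halfvec, 1 for 8-bit scalar codes and 1 bit for binary codes; index structures such as HNSW neighbor links, row overhead and replicas are deliberately left out, so real footprints will be larger. The names are illustrative.

```python
# Assumed bytes per dimension for each encoding (raw vector data only).
BYTES_PER_DIM = {
    "float32": 4.0,
    "halfvec": 2.0,   # FP16
    "int8": 1.0,      # 8-bit scalar quantization
    "binary": 0.125,  # 1 bit per dimension
}


def raw_vector_gib(num_vectors: int, dims: int, encoding: str = "float32") -> float:
    """Raw vector storage in GiB, excluding index and row overhead."""
    return num_vectors * dims * BYTES_PER_DIM[encoding] / 2**30


for encoding in BYTES_PER_DIM:
    gib = raw_vector_gib(100_000_000, 1536, encoding)
    print(f"100M x 1536d as {encoding}: {gib:.1f} GiB")
```

At 100M vectors and 1,536 dimensions, float32 needs roughly 572 GiB for the vectors alone, and halfvec halves that, which is where the lower end of the 2-4x range comes from.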

Should I use Chroma in production?

Generally no at scale, yes for prototyping. In these benchmarks Chroma sustained 480 QPS at 22ms p99. Strengths: minimal configuration, zero deployment, and it is the LangChain default. Weaknesses: no multi-tenancy, limited hybrid search, and higher latency. A common migration path is to prototype with Chroma and then migrate to pgvector; LangChain's interface compatibility keeps the switch low-friction.

What are the latest vector database trends in 2026?

Five trends: (1) Quantization: halfvec, scalar/binary and 8-bit encodings give 2-8x memory reduction. (2) Multi-vector retrieval (Milvus, Qdrant) for ColBERT-style models. (3) Hybrid search standardization: sparse+dense is becoming the default. (4) Postgres convergence: pgvector adoption is growing quickly, and plain Postgres may become the default by 2027. (5) GPU acceleration: Milvus and Pinecone GPU instances. Worth watching: 2-bit compression (research stage) and embedding-model-specific optimizations.
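
To make the 8-bit scalar quantization trend concrete, here is a minimal sketch that maps each float to one of 256 levels between the vector's minimum and maximum. It illustrates the general idea only; the per-vector min/max scheme and the function names are assumptions, and production systems typically calibrate ranges differently.

```python
from typing import List, Tuple


def quantize_8bit(vec: List[float]) -> Tuple[List[int], float, float]:
    """Map floats to integer codes 0..255; return (codes, offset, scale)."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant vectors
    codes = [round((x - lo) / scale) for x in vec]
    return codes, lo, scale


def dequantize_8bit(codes: List[int], lo: float, scale: float) -> List[float]:
    """Approximate reconstruction of the original floats."""
    return [lo + c * scale for c in codes]


original = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, lo, scale = quantize_8bit(original)
restored = dequantize_8bit(codes, lo, scale)
max_error = max(abs(a - b) for a, b in zip(original, restored))
print(codes, max_error)
```

Each dimension shrinks from 4 bytes to 1, and the reconstruction error per dimension is bounded by half a quantization step, which is why recall usually drops only slightly.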

Methodology

Benchmarks were run on AWS m7i.4xlarge (16 vCPU, 64GB RAM, NVMe SSD) with synthetic 768d/1536d random embeddings. Versions tested: pgvector 0.8, Pinecone Serverless v2024-12, Weaviate 1.27, Qdrant 1.13, Chroma 0.5, Milvus 2.5, MongoDB Atlas Vector Search GA, Redis Stack 7.4. Insertion was measured as 1M-vector batch loads. Query latency was measured as the p99 of 100K random k=10 kNN queries. Sustained QPS was measured with a single multi-threaded client. Recall@10 was measured against brute-force ground truth on a 100K-query subset.
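
To reproduce the recall and latency metrics on your own workload, something like the following sketch may help. It assumes squared Euclidean distance for the brute-force ground truth and a nearest-rank definition of p99; the guide does not state which distance metric or percentile method was used, and all names here are illustrative.

```python
from typing import List, Sequence

Vector = Sequence[float]


def brute_force_knn(query: Vector, corpus: Sequence[Vector], k: int) -> List[int]:
    """Exact top-k corpus indices by squared Euclidean distance (ground truth)."""
    scored = [
        (sum((q - c) ** 2 for q, c in zip(query, vec)), idx)
        for idx, vec in enumerate(corpus)
    ]
    return [idx for _, idx in sorted(scored)[:k]]


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int]) -> float:
    """Fraction of the exact top-k that the approximate result also returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def p99(latencies_ms: Sequence[float]) -> float:
    """Nearest-rank 99th percentile of the measured latencies."""
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil of 0.99 * n
    return ordered[rank - 1]


# Toy data: 100 points on a line, plus a pretend ANN result that misses one neighbor.
corpus = [(float(i), 0.0) for i in range(100)]
exact = brute_force_knn((50.2, 0.0), corpus, k=10)
approx = exact[:9] + [99]

print(recall_at_k(approx, exact))
print(p99([float(ms) for ms in range(1, 101)]))
```

In a real run, `approx` would come from the database under test and the latency list from timing each query; averaging `recall_at_k` over the query set gives the Recall@10 figure.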
