
Vector Database Comparison

ChromaDB vs Weaviate vs Pinecone: hands-on setup, performance benchmarks, and when to choose which.

January 25, 2026 | 10 min read

Quick Comparison

| Feature | ChromaDB | Weaviate | Pinecone |
|---|---|---|---|
| Type | Embedded/Server | Self-hosted/Cloud | Managed Cloud |
| Setup | pip install | Docker | API key |
| Price | Free | Free (self-hosted) | From $70/mo |
| Max vectors | Unlimited* | Unlimited* | Tier-dependent |
| Best for | Prototyping, smaller projects | Production, hybrid search | Enterprise, zero-ops |

* Limited by available disk/RAM

What is a Vector Database?

A vector database stores and searches high-dimensional vectors (embeddings). Where relational databases use exact matches, vector databases use similarity search to find the nearest neighbors to a query vector.

Use cases include:

  • RAG (Retrieval Augmented Generation)
  • Semantic search
  • Recommendation systems
  • Image similarity
  • Anomaly detection
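Under the hood, similarity search amounts to finding the stored vectors closest to the query, typically by cosine similarity. A minimal brute-force sketch in NumPy of what an index like HNSW approximates at scale (the toy 3-dimensional vectors are purely illustrative):

```python
import numpy as np

def cosine_top_k(query: np.ndarray, vectors: np.ndarray, k: int = 2) -> list[int]:
    """Return indices of the k vectors most similar to the query (cosine similarity)."""
    # Normalize so that a dot product equals cosine similarity
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q
    # argsort is ascending: take the last k and reverse for best-first order
    return np.argsort(scores)[-k:][::-1].tolist()

vectors = np.array([
    [1.0, 0.0, 0.0],   # doc0
    [0.9, 0.1, 0.0],   # doc1 (close to doc0)
    [0.0, 0.0, 1.0],   # doc2
])
print(cosine_top_k(np.array([1.0, 0.05, 0.0]), vectors))  # [0, 1]
```

Real databases avoid this O(n) scan with approximate nearest-neighbor indexes, trading a little recall for much lower latency.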

ChromaDB

ChromaDB is the simplest option. It runs embedded in your Python process or as a standalone server, making it perfect for prototyping and smaller projects.

Installation

terminal
pip install chromadb

Basic Usage

chromadb_example.py
import chromadb
from chromadb.utils import embedding_functions

# Create client (in-memory by default)
client = chromadb.Client()

# Or persistent storage
client = chromadb.PersistentClient(path="./chroma_db")

# Use OpenAI embeddings (or the default all-MiniLM-L6-v2)
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-key",
    model_name="text-embedding-3-small"
)

# Create collection
collection = client.create_collection(
    name="documents",
    embedding_function=openai_ef
)

# Add documents (embeddings are generated automatically)
collection.add(
    documents=[
        "Python is a programming language",
        "JavaScript runs in the browser",
        "Rust has memory safety"
    ],
    ids=["doc1", "doc2", "doc3"],
    metadatas=[
        {"category": "backend"},
        {"category": "frontend"},
        {"category": "systems"}
    ]
)

# Query
results = collection.query(
    query_texts=["Which language is safe?"],
    n_results=2
)

print(results['documents'])
# [['Rust has memory safety', 'Python is a programming language']]

ChromaDB Pros/Cons

Pros

  • + Zero config - just pip install
  • + Embedded mode - no server required
  • + Automatic embedding generation
  • + Great for prototyping and small datasets

Cons

  • - Does not scale to millions of vectors
  • - No hybrid search (keyword + vector)
  • - Limited query syntax
  • - Single-node only

Weaviate

Weaviate is a powerful open-source vector database with hybrid search, a GraphQL API, and a modular architecture. It runs self-hosted or via Weaviate Cloud.

Installation (Docker)

terminal
# docker-compose.yml
docker compose up -d

# Or quick start
docker run -p 8080:8080 -p 50051:50051 semitechnologies/weaviate:latest

Basic Usage

weaviate_example.py
import weaviate
from weaviate.classes.config import Configure, Property, DataType

# Connect to local instance
client = weaviate.connect_to_local()

# Create collection with vectorizer
collection = client.collections.create(
    name="Document",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
    properties=[
        Property(name="content", data_type=DataType.TEXT),
        Property(name="category", data_type=DataType.TEXT),
    ]
)

# Add documents
collection.data.insert_many([
    {"content": "Python is a programming language", "category": "backend"},
    {"content": "JavaScript runs in the browser", "category": "frontend"},
    {"content": "Rust has memory safety", "category": "systems"},
])

# Semantic search
response = collection.query.near_text(
    query="Which language is safe?",
    limit=2
)

for obj in response.objects:
    print(obj.properties["content"])

# Hybrid search (keyword + vector)
response = collection.query.hybrid(
    query="safe programming",
    limit=2,
    alpha=0.5  # 0 = pure keyword, 1 = pure vector
)

client.close()
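To build intuition for the alpha parameter, here is a simplified sketch of score fusion in pure Python. It illustrates the idea of blending normalized keyword and vector scores with a weight; Weaviate's actual fusion algorithms (ranked and relative-score fusion) differ in detail, and the document IDs and scores below are made up:

```python
def hybrid_fuse(bm25_scores: dict[str, float],
                vector_scores: dict[str, float],
                alpha: float = 0.5) -> list[tuple[str, float]]:
    """Blend keyword and vector scores: alpha=0 -> pure keyword, alpha=1 -> pure vector."""
    def normalize(scores: dict[str, float]) -> dict[str, float]:
        # Min-max normalize each score set to [0, 1] so the scales are comparable
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {k: (v - lo) / span for k, v in scores.items()}

    bm25 = normalize(bm25_scores)
    vec = normalize(vector_scores)
    ids = set(bm25) | set(vec)
    fused = {i: (1 - alpha) * bm25.get(i, 0.0) + alpha * vec.get(i, 0.0) for i in ids}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

bm25 = {"doc1": 3.0, "doc2": 2.5, "doc3": 0.5}   # keyword winner: doc1
vec = {"doc1": 0.5, "doc2": 0.9, "doc3": 0.95}   # vector winner: doc3
print(hybrid_fuse(bm25, vec, alpha=0.5)[0][0])   # doc2
```

Note how doc2 wins the blended ranking without leading either list: hybrid search rewards documents that score decently on both signals.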

Weaviate Pros/Cons

Pros

  • + Hybrid search (BM25 + vector)
  • + Scales horizontally
  • + GraphQL API
  • + Multi-tenancy support
  • + Open source

Cons

  • - Requires Docker/Kubernetes
  • - More complex setup
  • - Higher resource consumption
  • - Schema-first approach

Pinecone

Pinecone is a fully managed vector database. You get an API key and can start immediately, with no infrastructure to maintain.

Installation

terminal
pip install pinecone-client

Basic Usage

pinecone_example.py
from pinecone import Pinecone, ServerlessSpec
from openai import OpenAI

# Initialize clients
pc = Pinecone(api_key="your-pinecone-key")
openai = OpenAI()

# Create index
pc.create_index(
    name="documents",
    dimension=1536,  # text-embedding-3-small dimension
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)

index = pc.Index("documents")

# Generate embeddings
def get_embedding(text: str) -> list[float]:
    response = openai.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

# Upsert documents
documents = [
    {"id": "doc1", "text": "Python is a programming language", "category": "backend"},
    {"id": "doc2", "text": "JavaScript runs in the browser", "category": "frontend"},
    {"id": "doc3", "text": "Rust has memory safety", "category": "systems"},
]

vectors = [
    {
        "id": doc["id"],
        "values": get_embedding(doc["text"]),
        "metadata": {"text": doc["text"], "category": doc["category"]}
    }
    for doc in documents
]

index.upsert(vectors=vectors)

# Query
query_embedding = get_embedding("Which language is safe?")
results = index.query(
    vector=query_embedding,
    top_k=2,
    include_metadata=True
)

for match in results.matches:
    print(f"{match.score:.3f}: {match.metadata['text']}")
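Pinecone queries can also be restricted by metadata using Mongo-style filter operators ($eq, $in, and friends). A small sketch of building such a filter; the helper function is our own illustration, not part of the Pinecone SDK:

```python
def category_filter(categories: list[str]) -> dict:
    """Build a Pinecone metadata filter matching any of the given categories."""
    if len(categories) == 1:
        return {"category": {"$eq": categories[0]}}
    return {"category": {"$in": categories}}

# Usage with the index from above:
# results = index.query(
#     vector=query_embedding,
#     top_k=2,
#     filter=category_filter(["systems", "backend"]),
#     include_metadata=True,
# )
print(category_filter(["systems"]))  # {'category': {'$eq': 'systems'}}
```

Filtering happens server-side during the vector search, so you still get the top-k among matching documents rather than post-filtering a larger result set.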

Pinecone Pros/Cons

Pros

  • + Zero infrastructure
  • + Automatic scaling
  • + High availability built-in
  • + Enterprise support
  • + Fast setup

Cons

  • - Expensive at high volume
  • - Vendor lock-in
  • - No built-in hybrid search (sparse-dense vectors require extra work)
  • - You generate embeddings yourself
  • - Data lives in the cloud (compliance)

Performance Benchmark

Here is a simple benchmark with 100K vectors (dimension 1536):

benchmark.py
import time
import numpy as np

def benchmark_query(db_client, query_vector, n_queries=100):
    times = []
    for _ in range(n_queries):
        start = time.perf_counter()
        db_client.query(query_vector, top_k=10)
        times.append(time.perf_counter() - start)

    return {
        "mean_ms": np.mean(times) * 1000,
        "p99_ms": np.percentile(times, 99) * 1000,
        "qps": n_queries / sum(times)
    }

# Results (100K vectors, 1536 dim, local machine):
# ChromaDB: mean=12ms, p99=25ms, QPS=83
# Weaviate: mean=8ms, p99=15ms, QPS=125
# Pinecone: mean=45ms, p99=80ms, QPS=22 (network latency)

Note: Pinecone's higher latency is due to the network round-trip. For applications where latency is critical, consider self-hosted options.
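Latency is only half the story: approximate indexes trade recall for speed, so a fair benchmark also measures how many of the true nearest neighbors the database returns. A sketch for computing recall@k against exact brute-force cosine search, assuming `ann_ids` holds the ID lists returned by the database under test (the random data below is synthetic):

```python
import numpy as np

def recall_at_k(ann_ids: list[list[int]], vectors: np.ndarray,
                queries: np.ndarray, k: int = 10) -> float:
    """Fraction of true top-k neighbors (exact cosine) that the ANN results recovered."""
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    exact = np.argsort(q @ v.T, axis=1)[:, -k:]  # true top-k per query
    hits = sum(len(set(approx[:k]) & set(truth))
               for approx, truth in zip(ann_ids, exact.tolist()))
    return hits / (k * len(queries))

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 32))
queries = rng.normal(size=(5, 32))
# Sanity check with a perfect "ANN" that echoes the exact neighbors back
v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
perfect = np.argsort(q @ v.T, axis=1)[:, -10:].tolist()
print(recall_at_k(perfect, vectors, queries))  # 1.0
```

Production systems typically tune index parameters until recall@10 exceeds 0.95 or so, then compare latency at that fixed recall level.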

When to Choose Which?

Choose ChromaDB if:

  • You are prototyping or building a POC
  • Your dataset is under 100K vectors
  • You want to avoid infrastructure complexity
  • You need an embedded database inside your app

Choose Weaviate if:

  • You need hybrid search
  • Your dataset is large (millions+ of vectors)
  • You want control over your infrastructure
  • You need multi-tenancy

Choose Pinecone if:

  • You want to minimize ops overhead
  • Your team has limited infrastructure experience
  • You have budget for managed services
  • You need enterprise support/SLAs

Migrating Between Databases

Migrating data between vector databases is relatively straightforward: export your vectors and metadata, then import them into the new database:

migration.py
def export_from_chroma(collection) -> list[dict]:
    """Export all data from a ChromaDB collection."""
    results = collection.get(include=["embeddings", "documents", "metadatas"])
    return [
        {
            "id": doc_id,
            "embedding": emb,
            "document": doc,
            "metadata": meta
        }
        for doc_id, emb, doc, meta in zip(
            results["ids"],
            results["embeddings"],
            results["documents"],
            results["metadatas"]
        )
    ]

def import_to_weaviate(client, data: list[dict], collection_name: str):
    """Import data into Weaviate."""
    collection = client.collections.get(collection_name)
    with collection.batch.dynamic() as batch:
        for item in data:
            batch.add_object(
                properties={"content": item["document"], **item["metadata"]},
                vector=item["embedding"]
            )
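The same export records can be reshaped for a migration to Pinecone. A sketch, assuming the record shape produced by export_from_chroma (id, embedding, document, metadata); the sample record below is illustrative:

```python
def to_pinecone_vectors(data: list[dict]) -> list[dict]:
    """Reshape exported ChromaDB records into Pinecone's upsert format."""
    return [
        {
            "id": item["id"],
            "values": list(item["embedding"]),
            # Keep the document text inside metadata so queries can return it
            "metadata": {"text": item["document"], **(item["metadata"] or {})},
        }
        for item in data
    ]

data = [{
    "id": "doc1",
    "embedding": [0.1, 0.2],
    "document": "Python is a programming language",
    "metadata": {"category": "backend"},
}]
print(to_pinecone_vectors(data)[0]["metadata"]["category"])  # backend
# Then: index.upsert(vectors=to_pinecone_vectors(data))
```

One caveat: the target index's dimension must match the embedding model used for the export, otherwise the upsert is rejected.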

Next Steps

Now that you know the differences between vector databases, you are ready to build a RAG pipeline. Check out our RAG guide to see how to combine vector search with Claude.