Vector Database Comparison
ChromaDB vs Weaviate vs Pinecone: hands-on setup, performance benchmarks, and when to choose which.
Quick Comparison
| Feature | ChromaDB | Weaviate | Pinecone |
|---|---|---|---|
| Type | Embedded/Server | Self-hosted/Cloud | Managed Cloud |
| Setup | pip install | Docker | API key |
| Price | Free | Free (self-hosted) | From $70/mo |
| Max vectors | Unlimited* | Unlimited* | Tier-dependent |
| Best for | Prototyping, smaller projects | Production, hybrid search | Enterprise, zero-ops |
* Limited by disk/RAM
What is a Vector Database?
A vector database stores and searches high-dimensional vectors (embeddings). Where relational databases rely on exact matches, vector databases use similarity search to find the nearest neighbors of a query vector.
Use cases include:
- RAG (Retrieval Augmented Generation)
- Semantic search
- Recommendation systems
- Image similarity
- Anomaly detection
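The nearest-neighbor search behind all of these can be sketched in a few lines of plain Python — a brute-force illustration of exactly what a vector index accelerates (the toy 2-D vectors here are made up for the example; real embeddings have hundreds of dimensions):

```python
from math import sqrt

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def nearest_neighbors(query: list[float], vectors: list[list[float]], k: int = 2) -> list[int]:
    """Brute-force k-NN: score every vector, return the indices of the top k."""
    ranked = sorted(range(len(vectors)),
                    key=lambda i: cosine_similarity(query, vectors[i]),
                    reverse=True)
    return ranked[:k]

docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(nearest_neighbors([1.0, 0.05], docs, k=2))  # [0, 1]
```

A vector database replaces this O(n) scan with an approximate index (typically HNSW), trading a little recall for orders of magnitude in speed.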
ChromaDB
ChromaDB is the simplest option. It runs embedded in your Python process or as a standalone server, which makes it ideal for prototyping and smaller projects.
Installation
```bash
pip install chromadb
```

Basic Usage
```python
import chromadb
from chromadb.utils import embedding_functions

# Create client (in-memory by default)
client = chromadb.Client()

# Or persistent storage
client = chromadb.PersistentClient(path="./chroma_db")

# Use OpenAI embeddings (or the default all-MiniLM-L6-v2)
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-key",
    model_name="text-embedding-3-small"
)

# Create collection
collection = client.create_collection(
    name="documents",
    embedding_function=openai_ef
)

# Add documents (embeddings are generated automatically)
collection.add(
    documents=[
        "Python is a programming language",
        "JavaScript runs in the browser",
        "Rust has memory safety"
    ],
    ids=["doc1", "doc2", "doc3"],
    metadatas=[
        {"category": "backend"},
        {"category": "frontend"},
        {"category": "systems"}
    ]
)

# Query
results = collection.query(
    query_texts=["Which language is safe?"],
    n_results=2
)

print(results['documents'])
# [['Rust has memory safety', 'Python is a programming language']]
```

ChromaDB Pros/Cons
Pros
- + Zero config - just pip install
- + Embedded mode - no server needed
- + Automatic embedding generation
- + Great for prototyping and small datasets
Cons
- - Does not scale to millions of vectors
- - No hybrid search (keyword + vector)
- - Limited query syntax
- - Single-node only
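The query syntax is limited, but ChromaDB does accept a `where` metadata filter on `query`. The shape of such a filtered search — pre-filter on metadata, then rank by similarity — can be sketched in plain Python (the records and scores below are invented for illustration):

```python
def filter_then_rank(records: list[dict], where: dict, scores: list[float], k: int = 2) -> list[int]:
    """Keep only records whose metadata matches `where`, then rank by score."""
    candidates = [
        i for i, rec in enumerate(records)
        if all(rec.get(key) == value for key, value in where.items())
    ]
    return sorted(candidates, key=lambda i: scores[i], reverse=True)[:k]

records = [{"category": "backend"}, {"category": "frontend"}, {"category": "systems"}]
print(filter_then_rank(records, {"category": "systems"}, scores=[0.9, 0.8, 0.7]))  # [2]
```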
Weaviate
Weaviate is a powerful open-source vector database with hybrid search, a GraphQL API, and a modular architecture. It runs self-hosted or on Weaviate Cloud.
Installation (Docker)
```bash
# With a docker-compose.yml in place
docker compose up -d

# Or quick start
docker run -p 8080:8080 -p 50051:50051 semitechnologies/weaviate:latest
```

Basic Usage
```python
import weaviate
from weaviate.classes.config import Configure, Property, DataType

# Connect to a local instance
client = weaviate.connect_to_local()

# Create a collection with a vectorizer
collection = client.collections.create(
    name="Document",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
    properties=[
        Property(name="content", data_type=DataType.TEXT),
        Property(name="category", data_type=DataType.TEXT),
    ]
)

# Add documents
collection.data.insert_many([
    {"content": "Python is a programming language", "category": "backend"},
    {"content": "JavaScript runs in the browser", "category": "frontend"},
    {"content": "Rust has memory safety", "category": "systems"},
])

# Semantic search
response = collection.query.near_text(
    query="Which language is safe?",
    limit=2
)

for obj in response.objects:
    print(obj.properties["content"])

# Hybrid search (keyword + vector)
response = collection.query.hybrid(
    query="safe programming",
    limit=2,
    alpha=0.5  # 0 = pure keyword, 1 = pure vector
)

client.close()
```

Weaviate Pros/Cons
Pros
- + Hybrid search (BM25 + vector)
- + Scales horizontally
- + GraphQL API
- + Multi-tenancy support
- + Open source
Cons
- - Requires Docker/Kubernetes
- - More complex setup
- - Higher resource usage
- - Schema-first approach
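The `alpha` parameter in the hybrid query above blends keyword relevance and vector similarity. A toy weighted-sum illustration of the idea (a sketch only, not Weaviate's internal fusion algorithm, which operates on ranked result lists):

```python
def hybrid_score(keyword_score: float, vector_score: float, alpha: float = 0.5) -> float:
    """alpha = 0 -> pure keyword, alpha = 1 -> pure vector (scores assumed in [0, 1])."""
    return (1 - alpha) * keyword_score + alpha * vector_score

# A document that matches the keywords strongly but is semantically distant
print(round(hybrid_score(0.9, 0.2, alpha=0.5), 2))  # 0.55
print(round(hybrid_score(0.9, 0.2, alpha=1.0), 2))  # 0.2
```

Sliding `alpha` toward 1 makes such a document rank lower, which is the knob you tune when queries mix exact terms (product codes, names) with natural language.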
Pinecone
Pinecone is a fully managed vector database: you get an API key and can start right away, with no infrastructure to maintain.
Installation
```bash
pip install pinecone-client
```

Basic Usage
```python
from pinecone import Pinecone, ServerlessSpec
from openai import OpenAI

# Initialize clients
pc = Pinecone(api_key="your-pinecone-key")
openai = OpenAI()

# Create index
pc.create_index(
    name="documents",
    dimension=1536,  # text-embedding-3-small dimension
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)

index = pc.Index("documents")

# Generate embeddings
def get_embedding(text: str) -> list[float]:
    response = openai.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

# Upsert documents
documents = [
    {"id": "doc1", "text": "Python is a programming language", "category": "backend"},
    {"id": "doc2", "text": "JavaScript runs in the browser", "category": "frontend"},
    {"id": "doc3", "text": "Rust has memory safety", "category": "systems"},
]

vectors = [
    {
        "id": doc["id"],
        "values": get_embedding(doc["text"]),
        "metadata": {"text": doc["text"], "category": doc["category"]}
    }
    for doc in documents
]

index.upsert(vectors=vectors)

# Query
query_embedding = get_embedding("Which language is safe?")
results = index.query(
    vector=query_embedding,
    top_k=2,
    include_metadata=True
)

for match in results.matches:
    print(f"{match.score:.3f}: {match.metadata['text']}")
```

Pinecone Pros/Cons
Pros
- + Zero infrastructure
- + Automatic scaling
- + High availability built-in
- + Enterprise support
- + Fast setup
Cons
- - Expensive at high volume
- - Vendor lock-in
- - No hybrid search
- - You have to generate embeddings yourself
- - Data lives in the cloud (compliance)
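Since Pinecone leaves embedding generation to you, large upserts are usually sent in batches to keep individual requests small. A minimal batching helper (pure Python; the batch size of 100 is an arbitrary example, not a Pinecone limit):

```python
def chunked(items: list, size: int = 100):
    """Yield successive fixed-size batches, e.g. one index.upsert(vectors=batch) per batch."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

batches = list(chunked(list(range(250)), size=100))
print([len(batch) for batch in batches])  # [100, 100, 50]
```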
Performance Benchmark
Here is a simple benchmark with 100K vectors (dimension 1536):
```python
import time
import numpy as np

def benchmark_query(db_client, query_vector, n_queries=100):
    times = []
    for _ in range(n_queries):
        start = time.perf_counter()
        db_client.query(query_vector, top_k=10)
        times.append(time.perf_counter() - start)

    return {
        "mean_ms": np.mean(times) * 1000,
        "p99_ms": np.percentile(times, 99) * 1000,
        "qps": n_queries / sum(times)
    }

# Results (100K vectors, 1536 dim, local machine):
# ChromaDB: mean=12ms, p99=25ms, QPS=83
# Weaviate: mean=8ms, p99=15ms, QPS=125
# Pinecone: mean=45ms, p99=80ms, QPS=22 (network latency)
```

Note: Pinecone's higher latency comes from the network round-trip. For latency-critical applications, consider self-hosted options.
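If you want to reproduce the p99 figure without pulling in numpy, a nearest-rank percentile from the standard library is close enough (a simplified stand-in; `np.percentile` interpolates between samples by default, so results can differ slightly):

```python
def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile, e.g. percentile(latencies, 99) for p99."""
    ordered = sorted(samples)
    rank = round(pct / 100 * (len(ordered) - 1))
    return ordered[rank]

latencies_ms = [float(ms) for ms in range(1, 101)]  # fake samples: 1..100 ms
print(percentile(latencies_ms, 99))  # 99.0
```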
When to choose which?
Choose ChromaDB if:
- You are prototyping or building a POC
- Your dataset is under 100K vectors
- You want to avoid infrastructure complexity
- You need an embedded database inside your app
Choose Weaviate if:
- You need hybrid search
- Your dataset is large (millions of vectors or more)
- You want control over your infrastructure
- You need multi-tenancy
Choose Pinecone if:
- You want to minimize ops overhead
- Your team has limited infrastructure experience
- You have the budget for managed services
- You need enterprise support/SLAs
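Purely as an illustration, the rules of thumb above can be condensed into a toy decision helper (the thresholds mirror the bullets in this section, not any official guidance):

```python
def pick_vector_db(n_vectors: int, need_hybrid: bool, want_managed: bool) -> str:
    """Toy encoding of the decision bullets above; real choices need real evaluation."""
    if want_managed:
        return "Pinecone"
    if need_hybrid or n_vectors >= 1_000_000:
        return "Weaviate"
    if n_vectors < 100_000:
        return "ChromaDB"
    return "Weaviate"  # mid-sized, self-hosted: scale headroom wins

print(pick_vector_db(50_000, need_hybrid=False, want_managed=False))  # ChromaDB
```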
Migrating between databases
Migrating data between vector databases is relatively simple: export your vectors and metadata, then import them into the new database:
```python
def export_from_chroma(collection) -> list[dict]:
    """Export all data from a ChromaDB collection."""
    results = collection.get(include=["embeddings", "documents", "metadatas"])
    return [
        {
            "id": id,
            "embedding": emb,
            "document": doc,
            "metadata": meta
        }
        for id, emb, doc, meta in zip(
            results["ids"],
            results["embeddings"],
            results["documents"],
            results["metadatas"]
        )
    ]

def import_to_weaviate(client, data: list[dict], collection_name: str):
    """Import data into Weaviate."""
    collection = client.collections.get(collection_name)
    with collection.batch.dynamic() as batch:
        for item in data:
            batch.add_object(
                properties={"content": item["document"], **item["metadata"]},
                vector=item["embedding"]
            )
```

Next steps
Now that you know the differences between vector databases, you are ready to build a RAG pipeline. Check out our RAG guide to see how to combine vector search with Claude.