RAG Implementation from Scratch
A hands-on guide to building a complete RAG pipeline. We start with the basics and work up to a production-ready setup.
TL;DR
RAG = Retrieval-Augmented Generation. You retrieve relevant documents from a vector database and pass them to the LLM as context.
```python
# 1. Embed and store documents
embeddings = model.encode(documents)
collection.add(embeddings=embeddings, documents=documents)

# 2. Retrieve relevant docs for the query
results = collection.query(query, n_results=3)

# 3. Generate an answer with the retrieved context
response = llm.generate(f"Context: {results}\n\nQuestion: {query}")
```

Prerequisites
```bash
pip install anthropic chromadb sentence-transformers
```

- Python 3.11+
- Anthropic API key (or OpenAI)
- About 1 GB of disk space for the embedding model
What is RAG?
Retrieval-Augmented Generation (RAG) is a technique where you combine an LLM with an external knowledge base. Instead of relying on the model's training data, you retrieve relevant documents and pass them to the model as context.
This addresses three fundamental problems:
- Hallucinations - the model answers from retrieved, factual data instead of guessing
- Outdated knowledge - answers reflect your current data rather than a training cutoff
- Domain-specific knowledge - you can add internal documents the model has never seen
The RAG Architecture
A typical RAG pipeline consists of three components:
- Embedding model - converts text to vectors
- Vector database - stores and searches the vectors
- LLM - generates an answer based on the retrieved context
```
Query → Embed → Vector Search → Top-K Documents → Prompt + Context → LLM → Response
                      ↑
              Vector Database
        (pre-embedded documents)
```

Step 1: Document Preprocessing
Before we can embed documents, they have to be chunked. The choice of chunking strategy has a large impact on retrieval quality.
Fixed-size chunks
The simplest approach: split at a fixed size with overlap (counted in words here, as a rough proxy for tokens):
```python
def fixed_size_chunks(
    text: str,
    chunk_size: int = 500,
    overlap: int = 50
) -> list[str]:
    """Split text into fixed-size chunks with overlap."""
    words = text.split()
    chunks = []

    for i in range(0, len(words), chunk_size - overlap):
        chunk = " ".join(words[i:i + chunk_size])
        if chunk:  # Skip empty chunks
            chunks.append(chunk)

    return chunks

# Example
text = open("document.txt").read()
chunks = fixed_size_chunks(text, chunk_size=300, overlap=30)
print(f"Created {len(chunks)} chunks")
```

Semantic chunking
More advanced: split at semantic boundaries such as paragraphs or sections. This gives better retrieval because each chunk is more coherent:
```python
import re
from dataclasses import dataclass

@dataclass
class Chunk:
    content: str
    metadata: dict

def semantic_chunks(
    text: str,
    max_chunk_size: int = 1000,
    source: str = ""
) -> list[Chunk]:
    """Split at semantic boundaries (paragraphs, headers)."""
    # Split by double newlines (paragraphs)
    paragraphs = re.split(r'\n\n+', text)

    chunks = []
    current_chunk = ""
    current_start = 0

    for i, para in enumerate(paragraphs):
        if len(current_chunk) + len(para) < max_chunk_size:
            current_chunk += para + "\n\n"
        else:
            if current_chunk:
                chunks.append(Chunk(
                    content=current_chunk.strip(),
                    metadata={
                        "source": source,
                        "chunk_index": len(chunks),
                        "paragraph_start": current_start
                    }
                ))
            current_chunk = para + "\n\n"
            current_start = i

    if current_chunk:
        chunks.append(Chunk(
            content=current_chunk.strip(),
            metadata={
                "source": source,
                "chunk_index": len(chunks),
                "paragraph_start": current_start
            }
        ))

    return chunks
```

Step 2: Embedding Generation
Embeddings are vector representations of text. Similar texts get similar vectors, which is what makes semantic search possible.
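To see what "similar vectors" means in practice, here is a minimal sketch (the example sentences are made up) that encodes a few sentences with a local model and compares them with cosine similarity:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "How do I reset my password?",    # query-like sentence
    "I forgot my login credentials",  # same topic, different words
    "The weather is nice today",      # unrelated
]
embeddings = model.encode(sentences)

# Related sentences get a noticeably higher cosine similarity
print(util.cos_sim(embeddings[0], embeddings[1]))  # high
print(util.cos_sim(embeddings[0], embeddings[2]))  # low
```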
Choosing an embedding model
| Model | Dimensions | Price | Best for |
|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | Free (local) | Prototyping |
| text-embedding-3-small | 1536 | $0.02 / 1M tokens | Production |
| text-embedding-3-large | 3072 | $0.13 / 1M tokens | High accuracy |
```python
from sentence_transformers import SentenceTransformer
from openai import OpenAI

# Option 1: Local model (free, but less accurate)
local_model = SentenceTransformer('all-MiniLM-L6-v2')

def embed_local(texts: list[str]) -> list[list[float]]:
    return local_model.encode(texts).tolist()

# Option 2: OpenAI (paid, more accurate)
openai_client = OpenAI()

def embed_openai(texts: list[str]) -> list[list[float]]:
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=texts
    )
    return [e.embedding for e in response.data]

# Usage
texts = ["Python is a programming language", "JavaScript runs in the browser"]
embeddings = embed_local(texts)
print(f"Embedding dimension: {len(embeddings[0])}")
```

Step 3: Vector Database Setup
Now we need to store the embeddings in a vector database. We use ChromaDB because it is easy to get started with:
```python
import chromadb
from chromadb.utils import embedding_functions

# Initialize ChromaDB with persistence
client = chromadb.PersistentClient(path="./chroma_db")

# Use sentence-transformers for embedding
embedding_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

# Create or get collection
collection = client.get_or_create_collection(
    name="documents",
    embedding_function=embedding_fn,
    metadata={"hnsw:space": "cosine"}  # Use cosine similarity
)

def add_documents(chunks: list[Chunk]):
    """Add document chunks to the vector store."""
    collection.add(
        documents=[c.content for c in chunks],
        metadatas=[c.metadata for c in chunks],
        ids=[f"chunk_{i}" for i in range(len(chunks))]
    )
    print(f"Added {len(chunks)} chunks to collection")

def search(query: str, n_results: int = 5) -> list[dict]:
    """Search for relevant documents."""
    results = collection.query(
        query_texts=[query],
        n_results=n_results,
        include=["documents", "metadatas", "distances"]
    )

    return [
        {
            "content": doc,
            "metadata": meta,
            "similarity": 1 - dist  # Convert distance to similarity
        }
        for doc, meta, dist in zip(
            results["documents"][0],
            results["metadatas"][0],
            results["distances"][0]
        )
    ]
```

Step 4: RAG Pipeline
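Before wiring up generation, the collection needs content. A minimal ingestion sketch (the docs/ folder and *.txt glob are placeholders) that combines semantic_chunks from Step 1 with add_documents from Step 3:

```python
from pathlib import Path

# Chunk every text file in a (hypothetical) docs/ folder and index it
all_chunks = []
for path in Path("docs").glob("*.txt"):
    text = path.read_text(encoding="utf-8")
    all_chunks.extend(semantic_chunks(text, source=path.name))

add_documents(all_chunks)  # embeds and stores all chunks in ChromaDB
```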
Now we tie everything together into a complete RAG pipeline:
```python
from anthropic import Anthropic

client = Anthropic()

class RAGPipeline:
    def __init__(self, collection, system_prompt: str | None = None):
        self.collection = collection
        self.system_prompt = system_prompt or """You are a helpful assistant.
Use ONLY the provided context to answer questions.
If the context does not contain the answer, say so honestly.
Always cite which parts of the context your answer is based on."""

    def retrieve(self, query: str, n_results: int = 5) -> list[str]:
        """Retrieve relevant documents."""
        results = self.collection.query(
            query_texts=[query],
            n_results=n_results
        )
        return results["documents"][0]

    def format_context(self, documents: list[str]) -> str:
        """Format retrieved documents as context."""
        context_parts = []
        for i, doc in enumerate(documents, 1):
            context_parts.append(f"[Source {i}]\n{doc}")
        return "\n\n---\n\n".join(context_parts)

    def generate(self, query: str, context: str) -> str:
        """Generate a response with context."""
        message = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=2048,
            system=self.system_prompt,
            messages=[
                {
                    "role": "user",
                    "content": f"""Context:
{context}

---

Question: {query}

Answer based on the context above:"""
                }
            ]
        )
        return message.content[0].text

    def query(self, question: str, n_results: int = 5) -> dict:
        """Full RAG query: retrieve + generate."""
        # Retrieve relevant documents
        documents = self.retrieve(question, n_results)

        # Format context
        context = self.format_context(documents)

        # Generate response
        response = self.generate(question, context)

        return {
            "question": question,
            "answer": response,
            "sources": documents,
            "n_sources": len(documents)
        }

# Usage
rag = RAGPipeline(collection)
result = rag.query("What are the advantages of Python?")
print(result["answer"])
```

Step 5: Production Improvements
Hybrid Search
Combine vector search with keyword search (BM25) for better results:
```python
from rank_bm25 import BM25Okapi
import numpy as np

class HybridSearch:
    def __init__(self, collection, documents: list[str]):
        self.collection = collection
        self.documents = documents
        # Tokenize documents for BM25
        tokenized = [doc.lower().split() for doc in documents]
        self.bm25 = BM25Okapi(tokenized)

    def search(
        self,
        query: str,
        n_results: int = 5,
        alpha: float = 0.5  # Balance between vector and keyword scores
    ) -> list[dict]:
        # Vector search
        vector_results = self.collection.query(
            query_texts=[query],
            n_results=n_results * 2  # Get extra candidates for merging
        )
        vector_scores = {
            doc: 1 - dist
            for doc, dist in zip(
                vector_results["documents"][0],
                vector_results["distances"][0]
            )
        }

        # BM25 keyword search
        tokenized_query = query.lower().split()
        bm25_scores = self.bm25.get_scores(tokenized_query)
        bm25_scores = bm25_scores / (bm25_scores.max() + 1e-6)  # Normalize

        # Combine scores
        combined = {}
        for doc in vector_scores:
            combined[doc] = alpha * vector_scores.get(doc, 0)

        for i, doc in enumerate(self.documents):
            if doc in combined:
                combined[doc] += (1 - alpha) * bm25_scores[i]
            else:
                combined[doc] = (1 - alpha) * bm25_scores[i]

        # Sort and return top results
        sorted_docs = sorted(combined.items(), key=lambda x: x[1], reverse=True)
        return [{"content": doc, "score": score} for doc, score in sorted_docs[:n_results]]
```

Reranking
Use a reranker model to improve the relevance of the retrieved documents:
```python
from sentence_transformers import CrossEncoder

class Reranker:
    def __init__(self, model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2"):
        self.model = CrossEncoder(model_name)

    def rerank(
        self,
        query: str,
        documents: list[str],
        top_k: int = 5
    ) -> list[dict]:
        # Create query-document pairs
        pairs = [[query, doc] for doc in documents]

        # Score all pairs
        scores = self.model.predict(pairs)

        # Sort by score
        doc_scores = list(zip(documents, scores))
        doc_scores.sort(key=lambda x: x[1], reverse=True)

        return [
            {"content": doc, "score": float(score)}
            for doc, score in doc_scores[:top_k]
        ]

# Usage in the RAG pipeline
def retrieve_and_rerank(query: str, n_initial: int = 20, n_final: int = 5):
    # Get initial candidates
    initial_docs = collection.query(query_texts=[query], n_results=n_initial)

    # Rerank
    reranker = Reranker()
    reranked = reranker.rerank(query, initial_docs["documents"][0], top_k=n_final)

    return [doc["content"] for doc in reranked]
```

Query Expansion
Generate several variations of the query for better recall:
```python
def expand_query(query: str) -> list[str]:
    """Generate alternative queries using Claude."""
    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"""Generate 3 alternative phrasings of this question.
Keep the same meaning, but use different words.

Original: {query}

Alternatives (one per line):"""
        }]
    )

    alternatives = message.content[0].text.strip().split("\n")
    return [query] + [alt.strip("- ") for alt in alternatives if alt.strip()]

# Usage
queries = expand_query("What are the advantages of microservices?")
# ["What are the advantages of microservices?",
#  "Which benefits does a microservice architecture provide?",
#  "Why choose microservices over a monolith?",
#  "What are the strengths of using microservices?"]

# Search with all queries and merge results
all_results = []
for q in queries:
    results = collection.query(query_texts=[q], n_results=3)
    all_results.extend(results["documents"][0])

# Deduplicate while preserving order
unique_results = list(dict.fromkeys(all_results))
```

Evaluation
It is critical to evaluate your RAG pipeline. Here are some simple metrics:
```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    query: str
    expected_answer: str
    actual_answer: str
    retrieved_docs: list[str]
    relevant_docs_found: int
    precision: float
    answer_relevance: float

def evaluate_retrieval(
    query: str,
    retrieved: list[str],
    relevant: list[str]
) -> float:
    """Calculate precision@k for retrieval."""
    relevant_set = set(relevant)
    found = sum(1 for doc in retrieved if doc in relevant_set)
    return found / len(retrieved) if retrieved else 0

def evaluate_answer(
    question: str,
    expected: str,
    actual: str
) -> float:
    """Use an LLM to evaluate answer quality (0-1)."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=100,
        messages=[{
            "role": "user",
            "content": f"""Compare these two answers to the question.
Give a score from 0.0 to 1.0 for how well the actual answer matches the expected one.

Question: {question}

Expected answer: {expected}

Actual answer: {actual}

Score (number only):"""
        }]
    )
    try:
        return float(response.content[0].text.strip())
    except ValueError:
        return 0.0

# Run evaluation
def run_eval(test_cases: list[dict], rag_pipeline) -> list[EvalResult]:
    results = []
    for case in test_cases:
        result = rag_pipeline.query(case["query"])

        results.append(EvalResult(
            query=case["query"],
            expected_answer=case["expected"],
            actual_answer=result["answer"],
            retrieved_docs=result["sources"],
            relevant_docs_found=len(set(result["sources"]) & set(case.get("relevant_docs", []))),
            precision=evaluate_retrieval(case["query"], result["sources"], case.get("relevant_docs", [])),
            answer_relevance=evaluate_answer(case["query"], case["expected"], result["answer"])
        ))

    return results
```
Common Pitfalls

- Chunks that are too small - you lose context. Start around 300-500 tokens.
- Chunks that are too large - irrelevant content dilutes the answer.
- No overlap - information can get split across chunk boundaries.
- Wrong embedding model - use a model trained for your language.
- Too few results - retrieve more documents and rerank.
- No metadata - store source, date, etc. for filtering (see the sketch below).
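As an example of the last point, ChromaDB can filter on metadata at query time. A small sketch, assuming chunks were stored with the "source" field from Step 1 (the file name is hypothetical):

```python
# Restrict retrieval to chunks from a specific source document
results = collection.query(
    query_texts=["How does billing work?"],
    n_results=5,
    where={"source": "billing_docs.txt"},  # hypothetical source file name
)
```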
Next steps
This guide gives you the foundation for a RAG pipeline. To take it further, you can:
- Implement hybrid search for better retrieval
- Add reranking for higher precision
- Use LangChain or LlamaIndex for faster iteration
- Experiment with different chunking strategies