Context Window Management and Long Context Strategies
The context window is the maximum number of tokens an LLM can process at once. Effective context management directly affects the performance of AI applications.
Context Window Limits
Model Comparison
| Model | Context Length | ~Words |
|---|---|---|
| GPT-3.5 Turbo | 16K | 12,000 |
| GPT-4 Turbo | 128K | 96,000 |
| Claude 3 Opus | 200K | 150,000 |
| Gemini 1.5 Pro | 1M | 750,000 |
| Llama 3 | 8K-128K | 6,000-96,000 |
Token Counting
```python
import tiktoken

def count_tokens(text: str, model: str = "gpt-4") -> int:
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

def estimate_tokens(text: str) -> int:
    # Quick estimate: ~4 characters = 1 token (English)
    # For Turkish: ~3 characters = 1 token
    return len(text) // 3
```
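As a usage sketch, the exact counter pairs naturally with the limits from the table above to check whether a prompt will fit. The lookup values below are approximate, the keys are illustrative, and tiktoken counts are only exact for OpenAI models:

```python
# Approximate context limits in tokens; values change between model releases
CONTEXT_LIMITS = {
    "gpt-3.5-turbo": 16_000,
    "gpt-4-turbo": 128_000,
    "claude-3-opus": 200_000,
    "gemini-1.5-pro": 1_000_000,
}

def fits_in_context(text: str, model: str = "gpt-4-turbo",
                    reserved_for_output: int = 1_000) -> bool:
    """Return True if `text` fits the model's window with room left for output.

    count_tokens uses tiktoken, so the count is exact for OpenAI models
    and only a rough proxy for the others.
    """
    limit = CONTEXT_LIMITS.get(model, 8_000)  # conservative fallback
    return count_tokens(text) + reserved_for_output <= limit
```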
Chunking Strategies
Fixed-Size Chunking
```python
def fixed_size_chunk(text: str, chunk_size: int = 1000, overlap: int = 200) -> list:
    chunks = []
    start = 0

    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break  # avoid a trailing chunk that is pure overlap
        start = end - overlap

    return chunks
```
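The overlap ensures that a sentence cut at a chunk boundary appears whole in at least one chunk; 10-20% of `chunk_size` is a common starting point, traded off against the extra tokens re-processed for every chunk.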
Semantic Chunking
```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

def semantic_chunk(text: str, chunk_size: int = 1000) -> list:
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=200,
        separators=["\n\n", "\n", ". ", " ", ""],
        length_function=len
    )

    return splitter.split_text(text)
```
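RecursiveCharacterTextSplitter tries the separators in order, so chunks preferentially break at paragraph boundaries, then sentences, then words. This approximates semantic boundaries without needing an embedding model, which is why it is a common default for this strategy.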
Document-Structure-Aware Chunking
```python
def structure_aware_chunk(document: str) -> list:
    chunks = []
    current_section = ""
    current_header = ""

    for line in document.split("\n"):
        # Header detection: a new Markdown heading closes the current section
        if line.startswith("#"):
            if current_section:
                chunks.append({
                    "header": current_header,
                    "content": current_section.strip()
                })
            current_header = line
            current_section = ""
        else:
            current_section += line + "\n"

    if current_section:
        chunks.append({
            "header": current_header,
            "content": current_section.strip()
        })

    return chunks
```
Context Compression
Summarization
```python
from openai import OpenAI

client = OpenAI()

def compress_context(text: str, max_tokens: int = 2000) -> str:
    current_tokens = count_tokens(text)

    if current_tokens <= max_tokens:
        return text

    # Summarize with an LLM
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {
                "role": "system",
                "content": f"Summarize the following text in under {max_tokens} tokens. "
                           "Preserve the key information."
            },
            {"role": "user", "content": text}
        ]
    )

    return response.choices[0].message.content
```
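Note that a token budget stated in the prompt is only a soft constraint; the model can exceed it. For a hard cap, also pass `max_tokens` to the completion call and verify the result with `count_tokens`.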
Extractive Compression
```python
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

def extractive_compress(text: str, ratio: float = 0.3) -> str:
    sentences = text.split(". ")

    # Find the important sentences with TF-IDF
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform(sentences)

    # Importance score for each sentence
    scores = np.array(tfidf_matrix.sum(axis=1)).flatten()

    # Select the most important sentences
    num_sentences = max(1, int(len(sentences) * ratio))
    top_indices = np.argsort(scores)[-num_sentences:]
    top_indices = sorted(top_indices)  # preserve original order

    return ". ".join([sentences[i] for i in top_indices])
```
Sliding Window
Conversation History Management
```python
class SlidingWindowMemory:
    def __init__(self, max_tokens: int = 4000):
        self.max_tokens = max_tokens
        self.messages = []

    def add_message(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})
        self._trim()

    def _trim(self):
        while self._total_tokens() > self.max_tokens and len(self.messages) > 2:
            # Keep the system message, drop the oldest user/assistant message
            if self.messages[0]["role"] == "system":
                self.messages.pop(1)
            else:
                self.messages.pop(0)

    def _total_tokens(self) -> int:
        return sum(count_tokens(m["content"]) for m in self.messages)

    def get_messages(self) -> list:
        return self.messages.copy()
```
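A short usage sketch (the message contents are illustrative):

```python
memory = SlidingWindowMemory(max_tokens=4000)
memory.add_message("system", "You are a helpful assistant.")
memory.add_message("user", "Summarize our discussion so far.")

# The trimmed history can be passed directly to the chat API
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=memory.get_messages()
)
```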
Document Processing Window
```python
def process_long_document(document: str, query: str, window_size: int = 8000):
    chunks = semantic_chunk(document, chunk_size=window_size)
    results = []

    for i, chunk in enumerate(chunks):
        response = client.chat.completions.create(
            model="gpt-4-turbo",
            messages=[
                {
                    "role": "system",
                    "content": "Analyze the given text excerpt."
                },
                {
                    "role": "user",
                    "content": f"Text:\n{chunk}\n\nQuestion: {query}"
                }
            ]
        )

        results.append({
            "chunk_index": i,
            "response": response.choices[0].message.content
        })

    # Combine the results
    return synthesize_results(results, query)
```
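The `synthesize_results` helper is referenced above but not defined in this article; a minimal sketch, assuming it merges the per-chunk analyses with one more LLM call (the prompt wording is illustrative):

```python
def synthesize_results(results: list, query: str) -> str:
    # Concatenate per-chunk analyses, labeled by their position in the document
    combined = "\n\n".join(
        f"Excerpt {r['chunk_index'] + 1}: {r['response']}" for r in results
    )

    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{
            "role": "user",
            "content": f"Partial analyses of a long document:\n{combined}\n\n"
                       f"Question: {query}\n\n"
                       "Combine these into a single coherent answer."
        }]
    )
    return response.choices[0].message.content
```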
Map-Reduce Pattern
Long Document QA
```python
def map_reduce_qa(document: str, question: str):
    chunks = semantic_chunk(document, chunk_size=4000)

    # Map: analyze each chunk separately
    partial_answers = []
    for chunk in chunks:
        response = client.chat.completions.create(
            model="gpt-4-turbo",
            messages=[
                {
                    "role": "user",
                    "content": f"Text:\n{chunk}\n\nQuestion: {question}\n\n"
                               "Answer based on this text excerpt. "
                               "If the information is not present, say 'No information in this excerpt.'"
                }
            ]
        )
        partial_answers.append(response.choices[0].message.content)

    # Reduce: merge the answers
    combined = "\n\n".join([
        f"Source {i+1}: {ans}"
        for i, ans in enumerate(partial_answers)
    ])

    final_response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {
                "role": "user",
                "content": f"Information gathered from different sources:\n{combined}\n\n"
                           f"Question: {question}\n\n"
                           "Synthesize all of the information into a comprehensive answer."
            }
        ]
    )

    return final_response.choices[0].message.content
```
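The map step is independent per chunk, so in practice the per-chunk calls are usually issued concurrently (e.g., with asyncio or a thread pool), keeping total latency close to a single call plus the reduce step.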
Retrieval Augmented Context
Smart Context Selection
```python
def select_relevant_context(query: str, documents: list, max_tokens: int = 4000):
    # Embedding-based relevance
    query_embedding = get_embedding(query)

    scored_docs = []
    for doc in documents:
        doc_embedding = get_embedding(doc["content"])
        score = cosine_similarity(query_embedding, doc_embedding)
        scored_docs.append({"doc": doc, "score": score})

    # Sort by relevance
    scored_docs.sort(key=lambda x: x["score"], reverse=True)

    # Add documents until the token budget is reached
    selected = []
    current_tokens = 0

    for item in scored_docs:
        doc_tokens = count_tokens(item["doc"]["content"])
        if current_tokens + doc_tokens <= max_tokens:
            selected.append(item["doc"])
            current_tokens += doc_tokens
        else:
            break

    return selected
```
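The `get_embedding` and `cosine_similarity` helpers are assumed above; a minimal sketch using OpenAI's embeddings endpoint (the model name is one current option, not the only choice):

```python
import numpy as np

def get_embedding(text: str) -> list:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

def cosine_similarity(a, b) -> float:
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```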
Long Context Best Practices
1. Prompt Positioning
```python
def optimize_prompt_position(context: str, query: str) -> str:
    """Place key information at the start and end (the 'Lost in the Middle' effect)."""

    chunks = semantic_chunk(context)

    # Keep the first and last chunks verbatim, compress the middle
    if len(chunks) > 2:
        middle = chunks[1:-1]
        compressed_middle = compress_context(" ".join(middle))
        context = f"{chunks[0]}\n\n{compressed_middle}\n\n{chunks[-1]}"

    return f"Context:\n{context}\n\n---\n\nQuestion: {query}"
```
2. Hierarchical Processing
```python
def hierarchical_summarize(document: str, levels: int = 2):
    """Hierarchical summarization: summarize chunks, then summarize the summaries."""

    current_text = document

    for level in range(levels):
        chunks = semantic_chunk(current_text, chunk_size=4000)

        summaries = []
        for chunk in chunks:
            summary = compress_context(chunk, max_tokens=500)
            summaries.append(summary)

        current_text = "\n\n".join(summaries)

    return current_text
```
3. Attention Sinks
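The name borrows from the "attention sink" observation that transformer models allocate disproportionate attention to the first tokens of the input. The snippet below is a prompt-level heuristic inspired by that effect: it explicitly marks the opening and closing spans so the most important content sits where models attend most reliably.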
```python
def add_attention_anchors(prompt: str) -> str:
    """Mark the opening and closing spans of the prompt as important."""
    if len(prompt) <= 1000:
        return prompt  # too short to split; the slices below would overlap

    return f"""
[IMPORTANT START]
{prompt[:500]}
[/IMPORTANT]

{prompt[500:-500]}

[IMPORTANT END]
{prompt[-500:]}
[/IMPORTANT]
"""
```
Monitoring and Debugging
```python
from datetime import datetime
import numpy as np

class ContextMonitor:
    def __init__(self):
        self.logs = []

    def log_request(self, messages: list, model: str):
        total_tokens = sum(count_tokens(m["content"]) for m in messages)

        self.logs.append({
            "timestamp": datetime.now(),
            "model": model,
            "input_tokens": total_tokens,
            "message_count": len(messages)
        })

        # Warnings
        if total_tokens > 100000:
            print(f"⚠️ High token count: {total_tokens}")

    def get_stats(self):
        return {
            "avg_tokens": np.mean([l["input_tokens"] for l in self.logs]),
            "max_tokens": max(l["input_tokens"] for l in self.logs),
            "total_requests": len(self.logs)
        }
```
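A brief usage sketch:

```python
monitor = ContextMonitor()

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain context windows."}
]
monitor.log_request(messages, model="gpt-4-turbo")

print(monitor.get_stats())
```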
Conclusion
Context window management is critical to the scalability and cost of LLM applications. With chunking, compression, and smart retrieval strategies, you can work effectively with long documents.
At Veni AI, we build long-context AI solutions.
