
Context Window Management and Long Context Strategies

A guide to LLM context window limits, long context handling, chunking strategies, summarization, and context compression techniques.

Veni AI Technical Team · December 30, 2024 · 5 min read

Reference Overview

Field | Value | Source
Canonical Path | /blog/context-window-yonetimi-long-context-stratejileri | Veni AI Blog
Primary Category | LLM Optimization | Post Metadata
Author | Veni AI Technical Team | Post Metadata

Context Window Management and Long Context Strategies

The context window is the maximum number of tokens an LLM can process at once. Effective context management directly affects the performance of AI applications.

Context Window Limits

Model Comparison

Model | Context Length | ~Words
GPT-3.5 Turbo | 16K | 12,000
GPT-4 Turbo | 128K | 96,000
Claude 3 Opus | 200K | 150,000
Gemini 1.5 Pro | 1M | 750,000
Llama 3 | 8K-128K | 6-96K

Token Calculation

import tiktoken

def count_tokens(text: str, model: str = "gpt-4") -> int:
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

def estimate_tokens(text: str) -> int:
    # Quick estimate: ~4 characters = 1 token (English)
    # For Turkish: ~3 characters = 1 token
    return len(text) // 3
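A quick sanity check of the two functions (assuming tiktoken is installed):

text = "Context window management directly affects cost."
print(count_tokens(text, model="gpt-4"))  # exact count from tiktoken
print(estimate_tokens(text))              # rough character-based estimate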

Chunking Stratejileri

Fixed-Size Chunking

def fixed_size_chunk(text: str, chunk_size: int = 1000, overlap: int = 200) -> list:
    chunks = []
    start = 0

    while start < len(text):
        end = start + chunk_size
        chunk = text[start:end]
        chunks.append(chunk)
        start = end - overlap

    return chunks
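With the defaults, each window advances by chunk_size - overlap = 800 characters, so neighboring chunks share 200 characters of context. A quick illustration:

text = "x" * 2500
chunks = fixed_size_chunk(text)
print(len(chunks))                      # 4 windows: 0-1000, 800-1800, 1600-2600, 2400-2500
print(len(chunks[0]), len(chunks[-1]))  # 1000 100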

Semantic Chunking

from langchain.text_splitter import RecursiveCharacterTextSplitter

def semantic_chunk(text: str, chunk_size: int = 1000) -> list:
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=200,
        separators=["\n\n", "\n", ". ", " ", ""],
        length_function=len
    )

    return splitter.split_text(text)

Document Structure Aware

def structure_aware_chunk(document: str) -> list:
    chunks = []
    current_section = ""
    current_header = ""

    for line in document.split("\n"):
        # Header detection
        if line.startswith("#"):
            if current_section:
                chunks.append({
                    "header": current_header,
                    "content": current_section.strip()
                })
            current_header = line
            current_section = ""
        else:
            current_section += line + "\n"

    if current_section:
        chunks.append({
            "header": current_header,
            "content": current_section.strip()
        })

    return chunks
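A short example of how lines are grouped under their nearest header:

doc = "# Intro\nContext windows are finite.\n# Strategies\nChunk, compress, retrieve."
for chunk in structure_aware_chunk(doc):
    print(chunk["header"], "->", chunk["content"])
# # Intro -> Context windows are finite.
# # Strategies -> Chunk, compress, retrieve.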

Context Compression

Summarization

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def compress_context(text: str, max_tokens: int = 2000) -> str:
    current_tokens = count_tokens(text)

    if current_tokens <= max_tokens:
        return text

    # Summarize with an LLM
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {
                "role": "system",
                "content": f"Summarize the following text in under {max_tokens} tokens. "
                           "Preserve the important information."
            },
            {"role": "user", "content": text}
        ]
    )

    return response.choices[0].message.content

Extractive Compression

from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

def extractive_compress(text: str, ratio: float = 0.3) -> str:
    sentences = text.split(". ")

    # Find the important sentences with TF-IDF
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform(sentences)

    # Importance score of each sentence
    scores = np.array(tfidf_matrix.sum(axis=1)).flatten()

    # Select the most important sentences
    num_sentences = max(1, int(len(sentences) * ratio))
    top_indices = np.argsort(scores)[-num_sentences:]
    top_indices = sorted(top_indices)  # Preserve original order

    return ". ".join([sentences[i] for i in top_indices])
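Here a sentence's score is the sum of its TF-IDF term weights, so sentences containing rare, informative terms rank highest, and re-sorting the selected indices keeps the excerpt in reading order. Note that splitting on ". " is a naive sentence boundary; a proper sentence tokenizer (e.g. nltk's sent_tokenize) would be more robust.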

Sliding Window

Conversation History Management

class SlidingWindowMemory:
    def __init__(self, max_tokens: int = 4000):
        self.max_tokens = max_tokens
        self.messages = []

    def add_message(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})
        self._trim()

    def _trim(self):
        while self._total_tokens() > self.max_tokens and len(self.messages) > 2:
            # Keep the system message; drop the oldest user/assistant message
            if self.messages[0]["role"] == "system":
                self.messages.pop(1)
            else:
                self.messages.pop(0)

    def _total_tokens(self) -> int:
        return sum(count_tokens(m["content"]) for m in self.messages)

    def get_messages(self) -> list:
        return self.messages.copy()
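A usage sketch, assuming the OpenAI client defined earlier: the system message survives trimming, while the oldest turns are evicted first.

memory = SlidingWindowMemory(max_tokens=4000)
memory.add_message("system", "You are a helpful assistant.")
memory.add_message("user", "Summarize chapter one.")
memory.add_message("assistant", "Chapter one introduces...")

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=memory.get_messages()
)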

Document Processing Window

def process_long_document(document: str, query: str, window_size: int = 8000):
    chunks = semantic_chunk(document, chunk_size=window_size)
    results = []

    for i, chunk in enumerate(chunks):
        response = client.chat.completions.create(
            model="gpt-4-turbo",
            messages=[
                {
                    "role": "system",
                    "content": "Analyze the given text fragment."
                },
                {
                    "role": "user",
                    "content": f"Text:\n{chunk}\n\nQuestion: {query}"
                }
            ]
        )

        results.append({
            "chunk_index": i,
            "response": response.choices[0].message.content
        })

    # Combine the results
    return synthesize_results(results, query)
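synthesize_results is not defined in the post. A minimal sketch, assuming it simply feeds the per-chunk answers back to the model for a final synthesis:

def synthesize_results(results: list, query: str) -> str:
    # Hypothetical helper: merge per-chunk answers into one final response
    combined = "\n\n".join(
        f"Chunk {r['chunk_index']}: {r['response']}" for r in results
    )

    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{
            "role": "user",
            "content": f"Partial answers:\n{combined}\n\nQuestion: {query}\n\n"
                       "Synthesize them into a single comprehensive answer."
        }]
    )

    return response.choices[0].message.content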

Map-Reduce Pattern

Long Document QA

def map_reduce_qa(document: str, question: str):
    chunks = semantic_chunk(document, chunk_size=4000)

    # Map: analyze each chunk separately
    partial_answers = []
    for chunk in chunks:
        response = client.chat.completions.create(
            model="gpt-4-turbo",
            messages=[
                {
                    "role": "user",
                    "content": f"Text:\n{chunk}\n\nQuestion: {question}\n\n"
                               "Answer based on this text fragment. "
                               "If the information is not present, say 'No information in this fragment.'"
                }
            ]
        )
        partial_answers.append(response.choices[0].message.content)

    # Reduce: combine the answers
    combined = "\n\n".join([
        f"Source {i+1}: {ans}"
        for i, ans in enumerate(partial_answers)
    ])

    final_response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {
                "role": "user",
                "content": f"Information from different sources:\n{combined}\n\n"
                           f"Question: {question}\n\n"
                           "Synthesize all the information into a comprehensive answer."
            }
        ]
    )

    return final_response.choices[0].message.content
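The map-step calls are independent of each other, so they parallelize well. A sketch using a thread pool, assuming chunks and question are in scope (ask_chunk is a hypothetical helper wrapping the same per-chunk prompt as above):

from concurrent.futures import ThreadPoolExecutor

def ask_chunk(chunk: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user",
                   "content": f"Text:\n{chunk}\n\nQuestion: {question}"}]
    )
    return response.choices[0].message.content

with ThreadPoolExecutor(max_workers=4) as pool:
    partial_answers = list(pool.map(ask_chunk, chunks))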

Retrieval Augmented Context

Smart Context Selection

def select_relevant_context(query: str, documents: list, max_tokens: int = 4000):
    # Embedding-based relevance
    query_embedding = get_embedding(query)

    scored_docs = []
    for doc in documents:
        doc_embedding = get_embedding(doc["content"])
        score = cosine_similarity(query_embedding, doc_embedding)
        scored_docs.append({"doc": doc, "score": score})

    # Sort by relevance
    scored_docs.sort(key=lambda x: x["score"], reverse=True)

    # Add documents until the token limit is reached
    selected = []
    current_tokens = 0

    for item in scored_docs:
        doc_tokens = count_tokens(item["doc"]["content"])
        if current_tokens + doc_tokens <= max_tokens:
            selected.append(item["doc"])
            current_tokens += doc_tokens
        else:
            break

    return selected
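get_embedding and cosine_similarity are not defined in the post. A minimal sketch using the OpenAI embeddings endpoint (the text-embedding-3-small model is an assumption; any embedding model works) and NumPy:

import numpy as np

def get_embedding(text: str) -> list:
    # Assumed embedding model; swap in whatever endpoint you use
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return response.data[0].embedding

def cosine_similarity(a, b) -> float:
    a, b = np.array(a), np.array(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))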

Long Context Best Practices

1. Prompt Positioning

def optimize_prompt_position(context: str, query: str) -> str:
    """Put the important information at the beginning and end ("Lost in the Middle")."""

    chunks = semantic_chunk(context)

    # Keep the first and last chunks intact; compress the middle
    if len(chunks) > 2:
        middle = chunks[1:-1]
        compressed_middle = compress_context(" ".join(middle))
        context = f"{chunks[0]}\n\n{compressed_middle}\n\n{chunks[-1]}"

    return f"Context:\n{context}\n\n---\n\nQuestion: {query}"
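The heuristic reflects the "Lost in the Middle" finding (Liu et al., 2023): models retrieve information most reliably from the beginning and end of a long context and tend to miss details buried in the middle.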

2. Hierarchical Processing

def hierarchical_summarize(document: str, levels: int = 2):
    """Hierarchical summarization: summarize chunks, then summarize the summaries."""

    current_text = document

    for level in range(levels):
        chunks = semantic_chunk(current_text, chunk_size=4000)

        summaries = []
        for chunk in chunks:
            summary = compress_context(chunk, max_tokens=500)
            summaries.append(summary)

        current_text = "\n\n".join(summaries)

    return current_text

3. Attention Sinks

def add_attention_anchors(prompt: str) -> str:
    """Wrap the head and tail of the prompt in attention anchor tags."""

    if len(prompt) <= 1000:
        # Too short to split; the head and tail slices would overlap
        return prompt

    return f"""
[IMPORTANT START]
{prompt[:500]}
[/IMPORTANT]

{prompt[500:-500]}

[IMPORTANT END]
{prompt[-500:]}
[/IMPORTANT]
"""

Monitoring and Debugging

from datetime import datetime
import numpy as np

class ContextMonitor:
    def __init__(self):
        self.logs = []

    def log_request(self, messages: list, model: str):
        total_tokens = sum(count_tokens(m["content"]) for m in messages)

        self.logs.append({
            "timestamp": datetime.now(),
            "model": model,
            "input_tokens": total_tokens,
            "message_count": len(messages)
        })

        # Warnings
        if total_tokens > 100000:
            print(f"⚠️ High token count: {total_tokens}")

    def get_stats(self):
        return {
            "avg_tokens": np.mean([l["input_tokens"] for l in self.logs]),
            "max_tokens": max(l["input_tokens"] for l in self.logs),
            "total_requests": len(self.logs)
        }
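A quick usage sketch: log each request just before sending it, then inspect the aggregates.

monitor = ContextMonitor()

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize this report..."}
]
monitor.log_request(messages, model="gpt-4-turbo")

print(monitor.get_stats())  # {'avg_tokens': ..., 'max_tokens': ..., 'total_requests': 1}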

Conclusion

Context window management is critical to the scalability and cost of LLM applications. With chunking, compression, and smart retrieval strategies, you can work effectively with long documents.

At Veni AI, we build long context AI solutions.
