Vector Databases and Embedding Search Systems
Vector databases are specialized databases optimized for storing high-dimensional vectors and performing similarity searches over them. They are a fundamental component of modern AI applications, particularly RAG (Retrieval-Augmented Generation) systems.
What is a Vector Database?
While traditional databases are optimized for exact-match queries, vector databases focus on Approximate Nearest Neighbor (ANN) search: finding vectors that are close to a query vector quickly, even at the cost of occasionally missing the true nearest neighbor.
Core Concepts
Embedding: A numerical vector representation of data (text, image, audio).
"Artificial intelligence" → [0.12, -0.45, 0.89, ..., 0.34] (e.g., 1536 dimensions)
Similarity Search: Finding the vectors closest to a query vector.
query_vector → Top-K most similar vectors
Distance Metrics:
- Cosine Similarity: Directional similarity.
- Euclidean Distance (L2): Geometric distance.
- Dot Product: Inner product of vectors.
Similarity Metrics: Detailed Analysis
Cosine Similarity
cos(A, B) = (A · B) / (||A|| × ||B||)
Value Range: [-1, 1]
- 1: Same direction (identical).
- 0: Orthogonal (unrelated).
- -1: Opposite direction.
Use Case: Text similarity, semantic search.
Euclidean Distance (L2)
d(A, B) = √(Σ(Aᵢ - Bᵢ)²)
Value Range: [0, ∞)
Use Case: Image similarity, clustering.
Dot Product
A · B = Σ(Aᵢ × Bᵢ)
Use Case: Equivalent to cosine for normalized embeddings.
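The three metrics above can be computed directly with NumPy; this small sketch also demonstrates the equivalence just mentioned: for unit-normalized vectors, the dot product and cosine similarity coincide.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean(a, b):
    return float(np.linalg.norm(a - b))

def dot(a, b):
    return float(np.dot(a, b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 1.0, 0.5])

# Normalize to unit length: now dot(a_n, b_n) equals cosine(a, b)
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)
```

This is why many systems normalize embeddings at ingestion time and index with the (cheaper) dot product.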
Indexing Algorithms
1. Brute Force (Flat Index)
Comparing the query against every single vector in the database.
Complexity: O(n × d)
- n: Number of vectors.
- d: Dimension.
Advantage: 100% accuracy (exact search).
Disadvantage: Very slow for large datasets.
2. IVF (Inverted File Index)
Narrowing the search space by dividing vectors into clusters.
Algorithm:
- Create centroids using K-means.
- Assign each vector to its nearest centroid.
- During search, only look within the nearest nprobe clusters.
Parameters:
- nlist: Number of clusters (typically √n).
- nprobe: Number of clusters to search.

Trade-off: Higher nprobe → higher accuracy, lower speed.
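The three steps above can be sketched in pure NumPy (a toy sketch with a crude k-means, standing in for what libraries like Faiss implement far more efficiently):

```python
import numpy as np

def kmeans(x, nlist, iters=10, seed=0):
    """Crude k-means: returns centroids and each vector's cluster id."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), nlist, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((x[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for c in range(nlist):
            if (assign == c).any():
                centroids[c] = x[assign == c].mean(axis=0)
    # Final assignment against the updated centroids
    assign = np.argmin(((x[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    return centroids, assign

def ivf_search(query, x, centroids, assign, nprobe=4, top_k=3):
    """Probe only the nprobe clusters whose centroids are nearest the query."""
    probes = np.argsort(((centroids - query) ** 2).sum(-1))[:nprobe]
    cand = np.flatnonzero(np.isin(assign, probes))   # vectors in probed clusters
    dists = ((x[cand] - query) ** 2).sum(-1)
    return cand[np.argsort(dists)[:top_k]]

rng = np.random.default_rng(1)
x = rng.normal(size=(2000, 32))
centroids, assign = kmeans(x, nlist=16)
hits = ivf_search(x[7], x, centroids, assign, nprobe=4)
```

Raising `nprobe` toward `nlist` makes this converge to brute-force search, which is exactly the accuracy/speed trade-off noted above.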
3. HNSW (Hierarchical Navigable Small World)
A graph-based approach and the most popular method today.
Structure:
```
Layer 2: o-------o-------o             (sparse)
         |       |       |
Layer 1: o-o-o---o-o-o---o-o-o         (medium)
         | | |   | | |   | | |
Layer 0: o-o-o-o-o-o-o-o-o-o-o-o       (dense)
```
Parameters:
- M: Maximum number of connections for each node.
- ef_construction: Number of candidates during index building.
- ef_search: Number of candidates during query execution.
Advantages:
- Extremely fast search: O(log n).
- High recall rates.
- Supports dynamic insert/delete.
4. Product Quantization (PQ)
Reducing memory usage by compressing vectors.
Method:
- Split the vector into M sub-vectors.
- Map each sub-vector to one of K centroids.
- Store centroid IDs instead of the original vector components.
```
Original:          1536 dim × 4 bytes = 6 KB
PQ (M=96, K=256):  96 × 1 byte = 96 bytes
Compression:       ~64x
```
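The encoding step can be sketched in NumPy (a toy illustration with smaller dimensions than the example above: D=64, M=8, K=256, so 256 bytes of float32 compress to 8 one-byte codes, i.e. 32x):

```python
import numpy as np

def pq_train(x, M=8, K=256, iters=5, seed=0):
    """Train one small codebook (K centroids) per sub-vector slice."""
    d = x.shape[1] // M
    rng = np.random.default_rng(seed)
    books = []
    for m in range(M):
        sub = x[:, m * d:(m + 1) * d]
        c = sub[rng.choice(len(sub), K, replace=False)]
        for _ in range(iters):                      # crude k-means per subspace
            a = np.argmin(((sub[:, None] - c[None]) ** 2).sum(-1), axis=1)
            for k in range(K):
                if (a == k).any():
                    c[k] = sub[a == k].mean(axis=0)
        books.append(c)
    return books

def pq_encode(x, books):
    """Replace each sub-vector by the id of its nearest centroid (1 byte)."""
    M, d = len(books), x.shape[1] // len(books)
    codes = np.empty((len(x), M), dtype=np.uint8)
    for m in range(M):
        sub = x[:, m * d:(m + 1) * d]
        codes[:, m] = np.argmin(((sub[:, None] - books[m][None]) ** 2).sum(-1), axis=1)
    return codes

rng = np.random.default_rng(0)
x = rng.normal(size=(2000, 64))
books = pq_train(x)
codes = pq_encode(x, books)   # shape (2000, 8), one byte per sub-vector
```

Distances are then computed against the centroids (via lookup tables in real implementations), trading some accuracy for the memory savings.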
5. Scalar Quantization (SQ)
Converting Float32 representations to Int8.
```
Original: 1536 × 4 bytes = 6 KB
SQ8:      1536 × 1 byte = 1.5 KB
Compression: 4x
```
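A minimal sketch of symmetric int8 quantization (one global scale factor; real systems often use per-dimension scales):

```python
import numpy as np

def sq8_encode(x):
    """Map [-max_abs, max_abs] onto int8 range [-127, 127]."""
    scale = np.abs(x).max() / 127.0
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def sq8_decode(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
v = rng.normal(size=1536).astype(np.float32)
q, scale = sq8_encode(v)
v_hat = sq8_decode(q, scale)   # 4x smaller: 6144 bytes -> 1536 bytes
```

The reconstruction error per component is bounded by half the scale step, which is usually negligible for ranking purposes.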
Popular Vector Databases Comparison
Pinecone
Features:
- Fully managed cloud service.
- Automatic scaling.
- Metadata filtering.
- Namespace isolation.
Usage:
```python
from pinecone import Pinecone

pc = Pinecone(api_key="xxx")
index = pc.Index("my-index")

# Upsert
index.upsert(vectors=[
    {"id": "vec1", "values": [0.1, 0.2, ...], "metadata": {"category": "tech"}}
])

# Query
results = index.query(vector=[0.1, 0.2, ...], top_k=10, filter={"category": "tech"})
```
Weaviate
Features:
- Open source.
- Built-in vectorization.
- GraphQL API support.
- Hybrid search (vector + keyword) capability.
Qdrant
Features:
- Written in Rust for high performance.
- Rich filtering options.
- Payload indexing.
- Distributed deployment support.
Milvus
Features:
- GPU acceleration.
- Multi-vector search.
- Time travel (versioning).
- Kubernetes native architecture.
ChromaDB
Features:
- Developer-friendly and easy to set up.
- In-memory + persistent modes.
- Python-first approach.
- Ideal for prototyping.
Comparison Table
| Feature | Pinecone | Weaviate | Qdrant | Milvus |
|---|---|---|---|---|
| Hosting | Cloud | Both | Both | Both |
| Scalability | Auto | Manual | Manual | Auto |
| Hybrid Search | ✓ | ✓ | ✓ | ✓ |
| GPU Support | - | - | ✓ | ✓ |
| Pricing | Per vector | Free/Paid | Free/Paid | Free/Paid |
Filtering and Metadata
Pre-filtering vs Post-filtering
Pre-filtering:
- Apply metadata filter first.
- Perform vector search within the filtered set.
- Advantage: Faster.
- Disadvantage: Potential recall loss.
Post-filtering:
- Find Top-K × multiplier results via vector search.
- Apply metadata filter to these results.
- Return the final top K.
- Advantage: Better recall.
- Disadvantage: Slower performance.
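The post-filtering steps above can be sketched as follows (a toy example with an in-memory NumPy "index" and a hypothetical `predicate` callable standing in for a real metadata filter):

```python
import numpy as np

def post_filtered_search(query, vectors, metadata, predicate, top_k=5, multiplier=4):
    """Over-fetch top_k * multiplier candidates, then apply the metadata filter."""
    scores = vectors @ query                          # vectors assumed normalized
    cand = np.argsort(-scores)[: top_k * multiplier]  # over-fetched candidates
    kept = [i for i in cand if predicate(metadata[i])]
    return kept[:top_k]

rng = np.random.default_rng(0)
db = rng.normal(size=(1000, 64))
db /= np.linalg.norm(db, axis=1, keepdims=True)
meta = [{"category": "tech" if i % 2 == 0 else "news"} for i in range(1000)]
hits = post_filtered_search(db[10], db, meta, lambda m: m["category"] == "tech")
```

If the filter is very selective, even `top_k × multiplier` candidates may all be rejected, which is why highly selective filters usually favor pre-filtering instead.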
Hybrid Search
Combining Keyword (BM25) + Vector search:
final_score = α × vector_score + (1-α) × keyword_score
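The blending formula can be applied per document as in this sketch (scores are assumed already normalized to [0, 1]; documents missing from one ranking get a score of 0 for that component):

```python
def hybrid_rank(vec_scores, kw_scores, alpha=0.7):
    """Blend per-document scores: final = alpha * vector + (1 - alpha) * keyword."""
    docs = set(vec_scores) | set(kw_scores)
    fused = {d: alpha * vec_scores.get(d, 0.0) + (1 - alpha) * kw_scores.get(d, 0.0)
             for d in docs}
    return sorted(fused, key=fused.get, reverse=True)

vec = {"doc1": 0.92, "doc2": 0.75, "doc3": 0.40}   # semantic similarity
kw  = {"doc2": 0.95, "doc3": 0.80}                 # normalized BM25
ranking = hybrid_rank(vec, kw, alpha=0.7)
```

With α = 1.0 this degenerates to pure vector search, with α = 0.0 to pure keyword ranking; many systems use reciprocal rank fusion instead when the two score scales are hard to normalize.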
Performance Optimization
Index Parameters
Optimal HNSW Settings:
```
High Recall: M=32, ef=200
High Speed:  M=16, ef=50
Balanced:    M=24, ef=100
```
Batch Processing
```python
# Poor: one-at-a-time inserts
for vec in vectors:
    index.upsert([vec])

# Good: batch insert
index.upsert(vectors, batch_size=100)
```
Connection Pooling
```python
from pinecone import Pinecone

pc = Pinecone(
    api_key="xxx",
    pool_threads=30  # Parallel connections
)
```
Enterprise Architecture Example
```
┌─────────────────────────────────────────────────────┐
│                    Application                      │
└──────────────────────┬──────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────┐
│                Vector Search Service                │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │
│  │   Query     │  │  Reranker   │  │   Cache     │  │
│  │   Engine    │  │  Service    │  │   (Redis)   │  │
│  └─────────────┘  └─────────────┘  └─────────────┘  │
└──────────────────────┬──────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────┐
│              Vector Database Cluster                │
│    ┌─────────┐    ┌─────────┐    ┌─────────┐        │
│    │ Shard 1 │    │ Shard 2 │    │ Shard 3 │        │
│    └─────────┘    └─────────┘    └─────────┘        │
└─────────────────────────────────────────────────────┘
```
Monitoring and Observability
Key Metrics
- Query Latency (p50, p95, p99)
- Recall Rate
- QPS (Queries Per Second)
- Index Size
- Memory Usage
Alerting Thresholds
```yaml
alerts:
  - name: high_latency
    condition: p99_latency > 200ms
    severity: warning

  - name: low_recall
    condition: recall < 0.9
    severity: critical
```
Conclusion
Vector databases are indispensable components of modern AI applications. With the right choice of database, indexing strategy, and optimizations, you can build high-performance semantic search systems.
At Veni AI, we offer enterprise vector search solutions. Contact us for your requirements.
