Vector Databases and Embedding Search Systems

A comprehensive guide to the technical architecture of vector databases, embedding search algorithms, HNSW, IVF indexing, and enterprise semantic search applications.

Veni AI Technical Team · January 12, 2025 · 6 min read

Vector databases are specialized databases optimized for storing high-dimensional vectors and performing similarity searches over them. They are a foundational component of modern AI applications, particularly RAG (Retrieval-Augmented Generation) systems.

What is a Vector Database?

While traditional databases are optimized for exact match queries, vector databases focus on Approximate Nearest Neighbor (ANN) searches.

Core Concepts

Embedding: A numerical vector representation of data (text, image, audio).

"Artificial intelligence" → [0.12, -0.45, 0.89, ..., 0.34] (e.g., 1536 dimensions)

Similarity Search: Finding the vectors closest to a query vector.

query_vector → Top-K most similar vectors

Distance Metrics:

  • Cosine Similarity: Directional similarity.
  • Euclidean Distance (L2): Geometric distance.
  • Dot Product: Inner product of vectors.

Similarity Metrics: Detailed Analysis

Cosine Similarity

cos(A, B) = (A · B) / (||A|| × ||B||)

Value Range: [-1, 1]

  • 1: Same direction (identical).
  • 0: Orthogonal (unrelated).
  • -1: Opposite direction.

Use Case: Text similarity, semantic search.

Euclidean Distance (L2)

d(A, B) = √(Σ(Aᵢ - Bᵢ)²)

Value Range: [0, ∞)

Use Case: Image similarity, clustering.

Dot Product

A · B = Σ(Aᵢ × Bᵢ)

Use Case: Normalized embeddings, where it is equivalent to cosine similarity but cheaper to compute.
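The three metrics above, and the equivalence between dot product and cosine for normalized vectors, can be checked with a few lines of NumPy (a generic sketch, not tied to any particular database):

```python
import numpy as np

def cosine(a, b):
    # cos(A, B) = (A · B) / (||A|| × ||B||)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def euclidean(a, b):
    # d(A, B) = sqrt(sum((A_i - B_i)^2))
    return np.linalg.norm(a - b)

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # same direction as a, twice the length

print(cosine(a, b))     # 1.0 — identical direction
print(euclidean(a, b))  # ~3.74 — yet geometrically far apart

# For unit-normalized vectors, the dot product equals cosine similarity:
an, bn = a / np.linalg.norm(a), b / np.linalg.norm(b)
print(np.isclose(np.dot(an, bn), cosine(a, b)))  # True
```

This is also why the choice of metric matters: `b` is "identical" to `a` under cosine but distant under L2.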

Indexing Algorithms

1. Brute Force (Flat Index)

Comparing the query against every single vector in the database.

Complexity: O(n × d)

  • n: Number of vectors.
  • d: Dimension.

Advantage: 100% accuracy. Disadvantage: Very slow for large datasets.
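A flat index is simple enough to sketch directly in NumPy (assuming cosine similarity as the metric; the names here are illustrative, not a real database API):

```python
import numpy as np

def flat_search(query, vectors, k=3):
    """Brute-force Top-K by cosine similarity: O(n × d)."""
    # Normalize once so a plain dot product equals cosine similarity
    vn = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    qn = query / np.linalg.norm(query)
    scores = vn @ qn                       # one score per stored vector: O(n × d)
    top_k = np.argsort(scores)[::-1][:k]   # indices of the k highest scores
    return top_k, scores[top_k]

rng = np.random.default_rng(0)
db = rng.normal(size=(10_000, 128))  # n = 10k vectors, d = 128
q = rng.normal(size=128)
ids, scores = flat_search(q, db, k=5)
print(ids, scores)  # exact Top-5, but the cost grows linearly with n
```

Every ANN index below is an attempt to avoid the `vn @ qn` line touching all n vectors.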

2. IVF (Inverted File Index)

Narrowing the search space by dividing vectors into clusters.

Algorithm:

  1. Create centroids using K-means.
  2. Assign each vector to its nearest centroid.
  3. During search, only look within the nearest nprobe clusters.
Parameters:

  • nlist: Number of clusters (typically √n).
  • nprobe: Number of clusters to search.

Trade-off: Higher nprobe → higher accuracy, lower speed.
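The three IVF steps can be sketched as a toy NumPy implementation (plain Lloyd's k-means, squared-L2 distance; production systems such as FAISS implement the same idea far more efficiently):

```python
import numpy as np

def build_ivf(vectors, nlist=16, iters=10, seed=0):
    """Toy IVF: k-means centroids plus an inverted list of vector ids per cluster."""
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), nlist, replace=False)].copy()
    for _ in range(iters):  # plain Lloyd's k-means
        assign = np.argmin(((vectors[:, None] - centroids) ** 2).sum(-1), axis=1)
        for c in range(nlist):
            if (assign == c).any():
                centroids[c] = vectors[assign == c].mean(axis=0)
    # Final assignment against the trained centroids
    assign = np.argmin(((vectors[:, None] - centroids) ** 2).sum(-1), axis=1)
    lists = {c: np.where(assign == c)[0] for c in range(nlist)}
    return centroids, lists

def ivf_search(query, vectors, centroids, lists, nprobe=4, k=5):
    # 1. Rank centroids by distance to the query, keep the nprobe nearest
    order = np.argsort(((centroids - query) ** 2).sum(-1))[:nprobe]
    # 2. Compute exact distances only inside those clusters
    cand = np.concatenate([lists[c] for c in order])
    d = ((vectors[cand] - query) ** 2).sum(-1)
    return cand[np.argsort(d)[:k]]

rng = np.random.default_rng(1)
db = rng.normal(size=(2000, 32))
q = rng.normal(size=32)
centroids, lists = build_ivf(db, nlist=16)
print(ivf_search(q, db, centroids, lists, nprobe=4, k=5))
```

With nprobe equal to nlist the search degenerates back to exact brute force, which makes the speed/accuracy trade-off explicit.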

3. HNSW (Hierarchical Navigable Small World)

A graph-based approach and the most popular method today.

Structure:

Layer 2: o-------o-------o            (sparse)
         |       |       |
Layer 1: o-o-o---o-o-o---o-o-o        (medium)
         | | |   | | |   | | |
Layer 0: o-o-o-o-o-o-o-o-o-o-o-o     (dense)

Parameters:

  • M: Maximum number of connections for each node.
  • ef_construction: Number of candidates during index building.
  • ef_search: Number of candidates during query execution.

Advantages:

  • Extremely fast search: O(log n).
  • High recall rates.
  • Supports dynamic insert/delete.
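The core routine HNSW runs on each layer is a greedy graph walk. A single-layer toy version (a plain kNN graph instead of HNSW's multi-layer structure and heuristic edge selection, so it only approximates the real algorithm) looks like this:

```python
import numpy as np

def build_knn_graph(vectors, m=8):
    """Single-layer navigable graph: each node links to its m nearest neighbors.
    (Real HNSW uses multiple layers and an edge-selection heuristic.)"""
    d = ((vectors[:, None] - vectors[None]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)  # no self-edges
    return np.argsort(d, axis=1)[:, :m]

def greedy_search(query, vectors, graph, entry=0):
    """Greedy descent: repeatedly hop to whichever neighbor is closest to the query."""
    cur = entry
    cur_d = ((vectors[cur] - query) ** 2).sum()
    while True:
        nbrs = graph[cur]
        nd = ((vectors[nbrs] - query) ** 2).sum(-1)
        best = nd.argmin()
        if nd[best] >= cur_d:
            return cur  # local minimum: no neighbor is closer
        cur, cur_d = nbrs[best], nd[best]

rng = np.random.default_rng(2)
db = rng.normal(size=(500, 16))
q = rng.normal(size=16)
found = greedy_search(q, db, build_knn_graph(db, m=8))
exact = ((db - q) ** 2).sum(-1).argmin()
print(found, exact)  # greedy often lands on the exact neighbor, but not always
```

HNSW's `ef_search` parameter generalizes this walk by keeping a beam of candidates instead of a single current node, which is what recovers high recall.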

4. Product Quantization (PQ)

Reducing memory usage by compressing vectors.

Method:

  1. Split the vector into M sub-vectors.
  2. Map each sub-vector to one of K centroids.
  3. Store centroid IDs instead of the original vector components.
Original: 1536 dim × 4 bytes = 6 KB
PQ (M=96, K=256): 96 × 1 byte = 96 bytes
Compression: ~64x
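The three PQ steps can be sketched in NumPy. This toy version uses small parameters (M=4, K=16) rather than the M=96, K=256 from the example above, but the compression arithmetic is the same:

```python
import numpy as np

def pq_train(vectors, M=4, K=16, iters=10, seed=0):
    """Step 1-2 (training): one small codebook of K centroids per sub-vector block."""
    rng = np.random.default_rng(seed)
    subdim = vectors.shape[1] // M
    codebooks = []
    for m in range(M):
        sub = vectors[:, m * subdim:(m + 1) * subdim]
        cent = sub[rng.choice(len(sub), K, replace=False)].copy()
        for _ in range(iters):  # Lloyd's k-means within this subspace
            assign = np.argmin(((sub[:, None] - cent) ** 2).sum(-1), axis=1)
            for k in range(K):
                if (assign == k).any():
                    cent[k] = sub[assign == k].mean(axis=0)
        codebooks.append(cent)
    return codebooks

def pq_encode(vectors, codebooks):
    """Step 3: store one centroid id per sub-vector instead of the raw floats."""
    M, subdim = len(codebooks), codebooks[0].shape[1]
    codes = np.empty((len(vectors), M), dtype=np.uint8)
    for m in range(M):
        sub = vectors[:, m * subdim:(m + 1) * subdim]
        codes[:, m] = np.argmin(((sub[:, None] - codebooks[m]) ** 2).sum(-1), axis=1)
    return codes

rng = np.random.default_rng(3)
db = rng.normal(size=(1000, 64)).astype(np.float32)
codes = pq_encode(db, pq_train(db, M=4, K=16))
print(db.nbytes, codes.nbytes)  # 256000 vs 4000 bytes: 64x smaller
```

Search then works on the codes (via precomputed centroid-distance tables), trading some accuracy for the memory savings.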

5. Scalar Quantization (SQ)

Converting Float32 representations to Int8.

Original: 1536 × 4 bytes = 6 KB
SQ8: 1536 × 1 byte = 1.5 KB
Compression: 4x
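A minimal SQ8 sketch (one global min/max scale for the whole dataset; real implementations typically calibrate per dimension):

```python
import numpy as np

def sq8_encode(vectors):
    """Map each float32 component to one of 256 uint8 buckets."""
    lo, hi = vectors.min(), vectors.max()
    scale = (hi - lo) / 255.0
    codes = np.round((vectors - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def sq8_decode(codes, lo, scale):
    return codes.astype(np.float32) * scale + lo

rng = np.random.default_rng(4)
db = rng.normal(size=(1000, 1536)).astype(np.float32)
codes, lo, scale = sq8_encode(db)
print(db.nbytes // codes.nbytes)  # 4 — the 4x compression above
# Quantization error is bounded by roughly half a bucket width:
print(np.abs(sq8_decode(codes, lo, scale) - db).max() <= scale)  # True
```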

Popular Vector Databases Comparison

Pinecone

Features:

  • Fully managed cloud service.
  • Automatic scaling.
  • Metadata filtering.
  • Namespace isolation.

Usage:

from pinecone import Pinecone

pc = Pinecone(api_key="xxx")
index = pc.Index("my-index")

# Upsert
index.upsert(vectors=[
    {"id": "vec1", "values": [0.1, 0.2, ...], "metadata": {"category": "tech"}}
])

# Query
results = index.query(vector=[0.1, 0.2, ...], top_k=10, filter={"category": "tech"})

Weaviate

Features:

  • Open source.
  • Built-in vectorization.
  • GraphQL API support.
  • Hybrid search (vector + keyword) capability.

Qdrant

Features:

  • Written in Rust for high performance.
  • Rich filtering options.
  • Payload indexing.
  • Distributed deployment support.

Milvus

Features:

  • GPU acceleration.
  • Multi-vector search.
  • Time travel (versioning).
  • Kubernetes native architecture.

ChromaDB

Features:

  • Developer-friendly and easy to set up.
  • In-memory + persistent modes.
  • Python-first approach.
  • Ideal for prototyping.

Comparison Table

Feature         Pinecone     Weaviate    Qdrant      Milvus
Hosting         Cloud        Both        Both        Both
Scalability     Auto         Manual     Manual      Auto
Hybrid Search   ✓            ✓           ✓           ✓
GPU Support     -            -           -           ✓
Pricing         Per vector   Free/Paid   Free/Paid   Free/Paid

Filtering and Metadata

Pre-filtering vs Post-filtering

Pre-filtering:

  1. Apply the metadata filter first.
  2. Perform the vector search within the filtered set.
  • Advantage: Every result satisfies the filter, so the full Top-K can be returned.
  • Disadvantage: Highly selective filters shrink the searchable set and can hurt ANN index performance.

Post-filtering:

  1. Find Top-K × multiplier results via vector search.
  2. Apply the metadata filter to these results.
  3. Return the final Top-K.
  • Advantage: The ANN search runs at full speed over the whole index.
  • Disadvantage: A selective filter may leave fewer than K results (recall loss).
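The post-filtering flow above can be sketched in NumPy (the function name and metadata layout are illustrative, not a real database API):

```python
import numpy as np

def post_filter_search(query, vectors, metadata, allowed, k=5, multiplier=4):
    """Post-filtering: over-fetch Top-(k × multiplier), then apply the filter."""
    scores = vectors @ query                      # assumes unit-normalized vectors
    cand = np.argsort(scores)[::-1][:k * multiplier]
    hits = [i for i in cand if metadata[i] in allowed]
    return hits[:k]  # may hold fewer than k items if the filter is very selective

rng = np.random.default_rng(5)
db = rng.normal(size=(1000, 32))
db /= np.linalg.norm(db, axis=1, keepdims=True)
meta = rng.choice(["tech", "sports", "news"], size=1000)
q = db[0]
print(post_filter_search(q, db, meta, allowed={"tech"}, k=5))
```

The `multiplier` is the knob: larger values recover more recall at the cost of a bigger ANN fetch.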

Hybrid Search

Combining Keyword (BM25) + Vector search:

final_score = α × vector_score + (1-α) × keyword_score
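Because BM25 scores are unbounded while cosine scores live in [-1, 1], the two are usually normalized before the α-blend above. A minimal sketch with min-max normalization (other fusion schemes, e.g. reciprocal rank fusion, are also common):

```python
import numpy as np

def hybrid_score(vector_scores, keyword_scores, alpha=0.7):
    """final = α × norm(vector) + (1-α) × norm(keyword), with min-max normalization."""
    def norm(s):
        s = np.asarray(s, dtype=float)
        span = s.max() - s.min()
        return (s - s.min()) / span if span > 0 else np.zeros_like(s)
    return alpha * norm(vector_scores) + (1 - alpha) * norm(keyword_scores)

vec = [0.92, 0.85, 0.40]   # cosine similarities per document
bm25 = [1.2, 7.8, 9.5]     # unbounded BM25 scores per document
final = hybrid_score(vec, bm25, alpha=0.7)
print(final.argsort()[::-1])  # [1 0 2] — doc 1 wins by being strong on both signals
```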

Performance Optimization

Index Parameters

Optimal HNSW Settings:

High Recall: M=32, ef=200
High Speed:  M=16, ef=50
Balanced:    M=24, ef=100

Batch Processing

# Bad: one-by-one inserts
for vec in vectors:
    index.upsert([vec])

# Good: batch insert
index.upsert(vectors, batch_size=100)

Connection Pooling

from pinecone import Pinecone

pc = Pinecone(
    api_key="xxx",
    pool_threads=30  # Parallel connections
)

Enterprise Architecture Example

┌─────────────────────────────────────────────────────┐
│                    Application                      │
└──────────────────────┬──────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────┐
│               Vector Search Service                 │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │
│  │   Query     │  │  Reranker   │  │   Cache     │  │
│  │   Engine    │  │  Service    │  │  (Redis)    │  │
│  └─────────────┘  └─────────────┘  └─────────────┘  │
└──────────────────────┬──────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────┐
│             Vector Database Cluster                 │
│   ┌─────────┐   ┌─────────┐   ┌─────────┐           │
│   │ Shard 1 │   │ Shard 2 │   │ Shard 3 │           │
│   └─────────┘   └─────────┘   └─────────┘           │
└─────────────────────────────────────────────────────┘

Monitoring and Observability

Key Metrics

  • Query Latency (p50, p95, p99)
  • Recall Rate
  • QPS (Queries Per Second)
  • Index Size
  • Memory Usage

Alerting Thresholds

alerts:
  - name: high_latency
    condition: p99_latency > 200ms
    severity: warning

  - name: low_recall
    condition: recall < 0.9
    severity: critical

Conclusion

Vector databases are indispensable components of modern AI applications. With the right choice of database, indexing strategy, and optimizations, you can build high-performance semantic search systems.

At Veni AI, we offer enterprise vector search solutions. Contact us for your requirements.
