Scaling Similarity: The Power of Approximate Nearest Neighbors (ANN)

Machine Learning · Vector Search · Engineering

Author: Bayer, Schoenenberger
Published: November 22, 2025

At Palaimon, our dive into Approximate Nearest Neighbors (ANN) was born out of a real-world engineering challenge while improving autoVI, our automated visual inspection solution.

Inside autoVI, we often need to categorize and find visually similar defect types across massive datasets. When dealing with 100,000 to over a million individual image crops, a standard linear search becomes a massive bottleneck. To provide a seamless user experience and maintain high-speed processing, we needed a way to perform image similarity queries in milliseconds, not minutes. ANN provided the bridge between high-dimensional image embeddings and production-grade performance.

The Nearest Neighbor Problem

[Figure: Gabriel graph]

At its core, Nearest Neighbor (NN) search is the task of finding the point in a dataset that is most similar to a given query point. In a traditional k-Nearest Neighbor (k-NN) approach, we calculate the distance between the query and every single point in the database. This is mathematically exact but computationally expensive—O(N) complexity.
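
To make the baseline concrete, here is a minimal brute-force sketch in NumPy; the random 100,000 × 512 matrix stands in for our image embeddings, and every query has to touch every row.

```python
import numpy as np

# 100,000 embeddings of dimension 512 -- random stand-ins for image crops.
database = np.random.rand(100_000, 512).astype(np.float32)
query = np.random.rand(512).astype(np.float32)

# Exact nearest neighbor: one distance per stored vector, i.e. O(N) work per query.
distances = np.linalg.norm(database - query, axis=1)
nearest_id = int(distances.argmin())
```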

What makes ANN “Approximate”? ANN algorithms trade a small amount of accuracy (precision/recall) for a massive increase in speed and memory efficiency. Instead of checking every point, ANN uses clever data structures—like graphs, trees, or hashes—to narrow down the search space to a small “neighborhood” of likely candidates. In practice, being 99% accurate while being 1000x faster is a trade-off most production systems gladly make.

This trade-off has become essential due to the rise of Embeddings: high-dimensional vector representations of unstructured data.

Key Applications

  • Image Similarity & Search: Finding visually similar products in e-commerce or reverse image searches.
  • Embedding Comparison: Powering Retrieval-Augmented Generation (RAG) by finding relevant context for LLMs.
  • Recommender Systems: Matching user profiles with content items (movies, music, articles) in milliseconds.

Distance Metrics

The core problem of finding neighbors remains the same regardless of how you define “close.” ANN frameworks support various metrics (a small NumPy comparison follows the list):

  • Euclidean Distance (L2): Straight-line distance. Common for physical spatial data.
  • Cosine Similarity: Measures the angle between vectors. Industry standard for text/NLP—focuses on “direction” (meaning) rather than magnitude.
  • Inner Product (IP): Used in recommendation systems where feature strength (magnitude) matters.
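
A quick NumPy comparison of the three metrics; b is just a scaled copy of a, so the cosine similarity is exactly 1 while the other two metrics register a difference.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = 2 * a  # same direction, twice the magnitude

euclidean = np.linalg.norm(a - b)                          # L2 distance: ~3.74
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))   # angle only: exactly 1.0
inner_product = a @ b                                      # magnitude matters: 28.0
```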

When High-Level Tools Hit Their Limits

For many use cases, high-level vector search tools provide an excellent developer experience:

pgvector is a PostgreSQL extension that allows ANN searches directly within SQL queries—perfect for teams wanting to avoid a dedicated vector database overhead.
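
As a rough sketch, a single pgvector query from Python might look like the following, assuming the psycopg and pgvector packages are installed and a hypothetical items(id, embedding vector(512)) table already exists.

```python
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("dbname=autovi")  # hypothetical connection settings
register_vector(conn)                    # lets psycopg send NumPy arrays as vectors

query_vec = np.random.rand(512).astype(np.float32)  # stand-in for an image embedding

with conn.cursor() as cur:
    # `<->` is pgvector's L2 distance operator; with an HNSW or IVFFlat index on
    # `embedding`, this becomes an approximate search instead of a full scan.
    cur.execute(
        "SELECT id FROM items ORDER BY embedding <-> %s LIMIT 10",
        (query_vec,),
    )
    print(cur.fetchall())
```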

Milvus is a cloud-native, open-source vector database built specifically for managing embedding vectors. Unlike a plugin, Milvus is a standalone system that decouples storage and compute, scaling horizontally to handle billions of vectors with built-in support for HNSW, IVF, and other indexing strategies.
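
A minimal sketch with the pymilvus client, assuming a Milvus instance is reachable on localhost; the collection name and sizes are made up for illustration.

```python
import numpy as np
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")  # assumes a running Milvus instance

# Hypothetical collection of 512-dim embeddings; Milvus builds a default ANN index.
client.create_collection(
    collection_name="defect_crops", dimension=512, metric_type="COSINE"
)

vectors = np.random.rand(10_000, 512).astype(np.float32)
client.insert(
    collection_name="defect_crops",
    data=[{"id": i, "vector": vec.tolist()} for i, vec in enumerate(vectors)],
)

hits = client.search(
    collection_name="defect_crops",
    data=[vectors[0].tolist()],  # one query vector
    limit=10,
)
```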

DuckDB with its vss (Vector Similarity Search) extension brings analytical power to the vector space. By supporting HNSW indexes, it enables fast vector similarity searches directly on Parquet or CSV files—excellent for local analysis or embedded applications.
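
A sketch in Python, assuming a hypothetical table of 512-dimensional embeddings stored as fixed-size FLOAT arrays; the vss extension provides the HNSW index type, while array_distance is a built-in DuckDB function.

```python
import duckdb
import numpy as np

con = duckdb.connect()                     # in-memory database
con.execute("INSTALL vss; LOAD vss;")      # vector similarity search extension

# Hypothetical table: one 512-dim embedding per image crop.
con.execute("CREATE TABLE crops (id INTEGER, emb FLOAT[512])")
rng = np.random.default_rng(0)
rows = [(i, rng.random(512, dtype=np.float32).tolist()) for i in range(1_000)]
con.executemany("INSERT INTO crops VALUES (?, ?::FLOAT[512])", rows)

# The HNSW index turns the ORDER BY ... LIMIT query below into an approximate search.
con.execute("CREATE INDEX crops_hnsw ON crops USING HNSW (emb)")

query = rng.random(512, dtype=np.float32).tolist()
top10 = con.execute(
    "SELECT id FROM crops ORDER BY array_distance(emb, ?::FLOAT[512]) LIMIT 10",
    [query],
).fetchall()
```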

Faiss (Meta’s Facebook AI Similarity Search) is the gold standard for dense vector search. Written in C++ with Python wrappers, it’s highly optimized for CPU and GPU, excelling at indexing billions of vectors using Product Quantization (PQ).
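
A short Faiss sketch combining an inverted-file index with Product Quantization; the numbers of cells, sub-quantizers, and bits are illustrative rather than tuned.

```python
import faiss
import numpy as np

d = 512
xb = np.random.rand(100_000, d).astype("float32")   # database embeddings
xq = np.random.rand(1_000, d).astype("float32")     # query embeddings

# IVF-PQ: 1,024 coarse cells, 64 sub-quantizers of 8 bits each (d must be divisible by 64).
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, 1024, 64, 8)
index.train(xb)        # learn the coarse centroids and PQ codebooks
index.add(xb)
index.nprobe = 16      # cells visited per query: the speed/recall knob

distances, ids = index.search(xq, 10)   # top-10 approximate neighbors per query
```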

Our Story: The pgvector Bottleneck

When scaling autoVI, we needed to run 100k queries against every single point in our dataset. This seemingly simple requirement challenged the assumptions on which pgvector was built: the per-query overhead dominated our computation costs, and no batch query implementation was available.

We discovered we could outperform pgvector using plain matrix multiplication—and massively outperform it using cuML’s GPU acceleration. This experience highlighted a critical insight: while convenient wrappers like pgvector or Milvus provide excellent DX, they often impose an abstraction “tax.”
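
For reference, the plain-matrix-multiplication idea looks roughly like this: using the identity ||q − x||² = ||q||² − 2·q·x + ||x||², the distances for a whole batch of queries reduce to a single GEMM plus a top-k selection (a sketch, not our production code).

```python
import numpy as np

def batch_knn(queries: np.ndarray, database: np.ndarray, k: int = 10) -> np.ndarray:
    """Exact top-k neighbors for a whole batch of queries via one matrix multiplication."""
    # ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2, evaluated for all pairs at once.
    q_sq = (queries ** 2).sum(axis=1, keepdims=True)        # shape (Q, 1)
    x_sq = (database ** 2).sum(axis=1)                       # shape (N,)
    dists = q_sq - 2.0 * (queries @ database.T) + x_sq       # shape (Q, N)
    top_k = np.argpartition(dists, k, axis=1)[:, :k]         # k smallest, unsorted
    order = np.take_along_axis(dists, top_k, axis=1).argsort(axis=1)
    return np.take_along_axis(top_k, order, axis=1)

database = np.random.rand(100_000, 512).astype(np.float32)
queries = np.random.rand(1_000, 512).astype(np.float32)   # chunk larger batches to bound memory
neighbors = batch_knn(queries, database)                   # shape (1000, 10)
```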

Low-Level Alternatives for Maximum Performance

When raw speed matters, these frameworks offer the lowest overhead:

PyNNDescent is a Python-based implementation of Nearest Neighbor Descent, incredibly fast for building k-nearest neighbor graphs—often used as preprocessing for UMAP.
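
A minimal PyNNDescent sketch; the data and parameters are illustrative.

```python
import numpy as np
from pynndescent import NNDescent

data = np.random.rand(100_000, 512).astype(np.float32)   # stand-in embeddings

# Builds an approximate k-NN graph over the data; this is the expensive step.
index = NNDescent(data, metric="cosine", n_neighbors=30)
index.prepare()   # finalize the search graph for querying

queries = np.random.rand(5, 512).astype(np.float32)
neighbor_ids, distances = index.query(queries, k=10)
```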

cuML (RAPIDS AI) offers massive acceleration for NVIDIA hardware. Part of the RAPIDS ecosystem, it provides a scikit-learn-compatible API for GPU-accelerated ML. By leveraging GPU parallelism, it processes k-NN queries at throughput impossible for CPU-only libraries.
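
Since cuML mirrors the scikit-learn estimator API, a GPU-backed brute-force k-NN is only a few lines; the sizes below are illustrative and must fit into GPU memory.

```python
import cupy as cp
from cuml.neighbors import NearestNeighbors

database = cp.random.rand(1_000_000, 512, dtype=cp.float32)   # stays on the GPU
queries = cp.random.rand(100_000, 512, dtype=cp.float32)

nn = NearestNeighbors(n_neighbors=10, metric="euclidean")
nn.fit(database)
distances, indices = nn.kneighbors(queries)   # batched k-NN, computed on the GPU
```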

Conclusion

Understanding underlying compute primitives—in this case, Nearest Neighbor search—is critical for building high-performance AI systems. To achieve the full algorithmic potential and extreme speedups required for projects like autoVI, it’s sometimes necessary to break out of high-level interfaces. By reaching for lower-level primitives like cuML or Faiss, you bypass interface logic and overhead designed for general-purpose use cases. In high-stakes visual inspection or large-scale embedding search, this deep dive is often what makes the difference between a prototype and a production-grade solution that scales to millions of data points in real-time.