Databases 23 May 2025

Vector search performance improvements - Binary Quantisation & Overquerying - How E6data does it

#databases Problems with existing vector searches:

Vectors with large dimensions take up large amount of space
Floating point calculations are very slow.

Quntisation is a process of re-representing a vector at a lower resolution, and one way to do that in binary quantisation. Binary quantisation is a way where it converts the floating point array into an array of binary numbers.

So if we have a dimension of 4,

4 floating point numbers = 32 bits * 4

4 binary numbers = 4 bits

32x compression, but this compression is also lossy. The key thing is that:

Hamming distance of binary quatised vectors is directly proportional to cosine similarity of full resolution vectors.
If the cosine similarity is low for full resolution vector, then the hamming distance of that same binary quantised vectors will also be low.
This way, querying is much faster

But accuracy still takes a hit, example: If you want to find top 3, a full resolution consine search will fetch those within the top 3 filter. But with binary quantised vectors, the top 3 might only be found when you query the top 20. The top 20 should be queried and then within those 20, top 3 would be found by calculating cosine similarity on their full resolution vectors. This is called overquerying (at least what E6data calls it).

primary search on quantised vectors -> Increase the search space, query for more vectors -> rerank the result vectors by calculating cosine similarity using their full resolution vectors -> limit by top n