Databases 23 May 2025

Vector search performance improvements - Binary Quantisation & Overquerying - How E6data does it

#databases Problems with existing vector searches:

  • Vectors with large dimensions take up large amount of space
  • Floating point calculations are very slow.

Quntisation is a process of re-representing a vector at a lower resolution, and one way to do that in binary quantisation. Binary quantisation is a way where it converts the floating point array into an array of binary numbers.

So if we have a dimension of 4,

4 floating point numbers = 32 bits * 4

4 binary numbers = 4 bits

32x compression, but this compression is also lossy. The key thing is that:

  • Hamming distance of binary quatised vectors is directly proportional to cosine similarity of full resolution vectors.
  • If the cosine similarity is low for full resolution vector, then the hamming distance of that same binary quantised vectors will also be low.
  • This way, querying is much faster

But accuracy still takes a hit, example: If you want to find top 3, a full resolution consine search will fetch those within the top 3 filter. But with binary quantised vectors, the top 3 might only be found when you query the top 20. The top 20 should be queried and then within those 20, top 3 would be found by calculating cosine similarity on their full resolution vectors. This is called overquerying (at least what E6data calls it).

primary search on quantised vectors -> Increase the search space, query for more vectors -> rerank the result vectors by calculating cosine similarity using their full resolution vectors -> limit by top n