Problems with existing vector searches:
- Vectors with many dimensions take up a large amount of space.
- Distance calculations on floating-point vectors are slow at scale.
Quantisation is the process of re-representing a vector at a lower resolution, and one way to do that is binary quantisation. Binary quantisation converts the floating-point array into an array of bits, typically mapping each component to 1 if it is positive and 0 otherwise.
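A minimal sketch of the idea, assuming the common sign-based rule (1 for positive components, 0 otherwise):

```python
import numpy as np

def binary_quantise(vec: np.ndarray) -> np.ndarray:
    # 1 where the component is positive, 0 otherwise.
    return (vec > 0).astype(np.uint8)

v = np.array([0.12, -0.98, 0.45, -0.07], dtype=np.float32)
print(binary_quantise(v))  # [1 0 1 0]
```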
So if we have a dimension of 4:
- 4 floating-point numbers = 4 * 32 bits = 128 bits
- 4 binary numbers = 4 bits
That is 32x compression, but the compression is also lossy.
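To make the 32x figure concrete, a small sketch using NumPy's packbits to store 8 sign bits per byte (the 1024-dimension size is arbitrary):

```python
import numpy as np

v = np.random.randn(1024).astype(np.float32)  # 1024 dims of float32
bits = (v > 0).astype(np.uint8)
packed = np.packbits(bits)                    # 8 sign bits per byte

print(v.nbytes)       # 4096 bytes at full resolution
print(packed.nbytes)  # 128 bytes when binary-quantised -> 32x smaller
```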
The key thing is that:
- The Hamming distance between binary-quantised vectors is inversely related to the cosine similarity of their full-resolution counterparts.
- If the cosine similarity of two full-resolution vectors is high, the Hamming distance between their binary-quantised versions will tend to be low, and vice versa (see the sketch after this list).
- This way, querying is much faster: Hamming distance is just an XOR and a popcount, with no floating-point maths.
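A quick illustration of that relationship with synthetic vectors (the dimension and noise level here are arbitrary):

```python
import numpy as np

def cosine_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def hamming(a, b):
    # Hamming distance between the sign bits of two float vectors.
    return int(np.count_nonzero((a > 0) != (b > 0)))

rng = np.random.default_rng(0)
a = rng.standard_normal(256)
b_near = a + 0.1 * rng.standard_normal(256)  # almost parallel to a
b_far = rng.standard_normal(256)             # unrelated to a

print(cosine_sim(a, b_near), hamming(a, b_near))  # high similarity, small distance
print(cosine_sim(a, b_far), hamming(a, b_far))    # low similarity, large distance
```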
But accuracy still takes a hit. For example, if you want the top 3 results, a full-resolution cosine search will return the true top 3. With binary-quantised vectors, the true top 3 might only appear somewhere in the top 20 by Hamming distance. So you query the top 20, then within those 20 find the top 3 by computing cosine similarity on their full-resolution vectors. This is called overquerying (at least, that is what E6data calls it).
primary search on quantised vectors → widen the search space by querying for more vectors than needed → rerank the candidate vectors by cosine similarity on their full-resolution versions → limit to the top n
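Putting the whole pipeline together, a minimal end-to-end sketch; the function name, the `overquery=20` default, and the brute-force scans are illustrative (a real system would use an index for both stages):

```python
import numpy as np

def overquery_search(query, full_vectors, bit_vectors, top_n=3, overquery=20):
    """Overquery-then-rerank; names and parameters here are illustrative."""
    q_bits = query > 0
    # 1. Primary search: Hamming distance against the binary-quantised vectors.
    hamming = np.count_nonzero(bit_vectors != q_bits, axis=1)
    # 2. Widen the search space: keep the `overquery` nearest candidates.
    candidates = np.argsort(hamming)[:overquery]
    # 3. Rerank candidates by cosine similarity on their full-resolution vectors.
    cand = full_vectors[candidates]
    sims = cand @ query / (np.linalg.norm(cand, axis=1) * np.linalg.norm(query))
    # 4. Limit to the top n.
    return candidates[np.argsort(sims)[::-1][:top_n]]

rng = np.random.default_rng(1)
db = rng.standard_normal((10_000, 64)).astype(np.float32)
q = rng.standard_normal(64).astype(np.float32)
print(overquery_search(q, db, db > 0))  # indices of the top 3 matches
```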