Class ES818BinaryQuantizedVectorsFormat

java.lang.Object
org.apache.lucene.codecs.KnnVectorsFormat
org.apache.lucene.codecs.hnsw.FlatVectorsFormat
org.elasticsearch.index.codec.vectors.es818.ES818BinaryQuantizedVectorsFormat
All Implemented Interfaces:
org.apache.lucene.util.NamedSPILoader.NamedSPI

public class ES818BinaryQuantizedVectorsFormat extends org.apache.lucene.codecs.hnsw.FlatVectorsFormat
Copied from Lucene; to be replaced with Lucene's implementation sometime after Lucene 10.
Codec for encoding/decoding binary quantized vectors. The binary quantization format used here is a per-vector optimized scalar quantization. See also OptimizedScalarQuantizer. Some of the key features are:
  • Estimating the distance between two vectors using their centroid-normalized distance. This requires some additional corrective factors, but allows for centroid normalization to occur.
  • Optimized scalar quantization of centroid-normalized vectors down to the bit level.
  • Asymmetric quantization of vectors, where query vectors are quantized to half-byte precision (normalized to the centroid) and then compared directly against the single-bit quantized vectors in the index.
  • Transforming the half-byte quantized query vectors so that the comparison with single-bit vectors can be done with bit arithmetic.
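The last two points can be illustrated with a minimal sketch. It assumes the half-byte query is split into four bit planes so that a dot product against a single-bit vector reduces to AND plus popcount. The class name `BitPlaneDotProduct`, its methods, and the bit ordering are hypothetical and chosen for illustration; they are not taken from this class or from Lucene.

```java
final class BitPlaneDotProduct {

    // Packs one bit per dimension into bytes, most significant bit first.
    static byte[] packBits(int[] bits) {
        byte[] packed = new byte[(bits.length + 7) / 8];
        for (int i = 0; i < bits.length; i++) {
            if (bits[i] != 0) {
                packed[i >> 3] |= (byte) (1 << (7 - (i & 7)));
            }
        }
        return packed;
    }

    // Splits 4-bit query components into four bit planes:
    // plane p holds bit p of every component, packed like the index vectors.
    static byte[][] toBitPlanes(int[] query4Bit) {
        byte[][] planes = new byte[4][];
        for (int p = 0; p < 4; p++) {
            int[] planeBits = new int[query4Bit.length];
            for (int i = 0; i < query4Bit.length; i++) {
                planeBits[i] = (query4Bit[i] >> p) & 1;
            }
            planes[p] = packBits(planeBits);
        }
        return planes;
    }

    // Dot product of the 4-bit query and the 1-bit vector using only AND and popcount.
    static int dotProduct(byte[][] queryPlanes, byte[] indexedBits) {
        int sum = 0;
        for (int p = 0; p < queryPlanes.length; p++) {
            int matches = 0;
            for (int i = 0; i < indexedBits.length; i++) {
                matches += Integer.bitCount((queryPlanes[p][i] & indexedBits[i]) & 0xFF);
            }
            sum += matches << p; // plane p contributes with weight 2^p
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] query = {15, 0, 7, 3, 1, 8, 2, 4, 5};   // half-byte values, 0..15
        int[] indexed = {1, 0, 1, 1, 0, 1, 0, 0, 1};  // single-bit values
        int viaBits = dotProduct(toBitPlanes(query), packBits(indexed));
        System.out.println(viaBits); // 15 + 7 + 3 + 8 + 5 = 38
    }
}
```

The raw integer result would still need the per-vector corrective factors applied before it approximates the true similarity; that step is omitted here.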
The format is stored in two files:

.veb (vector data) file

Stores the binary quantized vectors in a flat format, together with each vector's corrective factors. The end of the file holds additional information: the vector-ordinal-to-centroid-ordinal mapping and the sparse vector information.

  • For each vector:
    • [byte] the binary quantized values, each byte holds 8 bits.
    • [float] the optimized quantiles and an additional similarity dependent corrective factor.
    • short the sum of the quantized components
  • After the vectors, sparse vector information keeping track of monotonic blocks.
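As a rough illustration of the per-vector layout above, the following sketch computes the size of one record and serializes it. It assumes three corrective floats (two quantile bounds plus one similarity-dependent factor), no padding of the dimension count, and little-endian byte order. `VebRecordSketch` and its methods are hypothetical names, and the actual writer may pad or order bytes differently.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class VebRecordSketch {

    // Assumption: lower quantile, upper quantile, and one additional correction.
    static final int CORRECTIVE_FLOATS = 3;

    // Bytes needed to hold one bit per dimension, rounded up to whole bytes.
    static int binaryBytes(int dimensions) {
        return (dimensions + 7) / 8;
    }

    // Total bytes for one vector: packed bits, corrective floats, component sum.
    static int recordBytes(int dimensions) {
        return binaryBytes(dimensions) + CORRECTIVE_FLOATS * Float.BYTES + Short.BYTES;
    }

    // Serializes one record in the order listed above.
    static byte[] encode(byte[] bits, float lower, float upper, float extra, short componentSum) {
        ByteBuffer buf = ByteBuffer
            .allocate(bits.length + CORRECTIVE_FLOATS * Float.BYTES + Short.BYTES)
            .order(ByteOrder.LITTLE_ENDIAN); // byte order is an assumption
        buf.put(bits);                       // [byte] binary quantized values
        buf.putFloat(lower).putFloat(upper); // [float] optimized quantiles
        buf.putFloat(extra);                 // similarity-dependent corrective factor
        buf.putShort(componentSum);          // short: sum of the quantized components
        return buf.array();
    }
}
```

Under these assumptions a 1024-dimensional vector occupies 128 + 12 + 2 = 142 bytes, compared with 4096 bytes for the raw float vector.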

.vemb (vector metadata) file

Stores the metadata for the vectors. This includes the number of vectors, the number of dimensions, and file offset information.

  • int the field number
  • int the vector encoding ordinal
  • int the vector similarity ordinal
  • vint the vector dimensions
  • vlong the offset to the vector data in the .veb file
  • vlong the length of the vector data in the .veb file
  • vint the number of vectors
  • [float] the centroid
  • float the centroid square magnitude
  • The sparse vector information, if required, mapping vector ordinal to doc ID
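To make the field order and the variable-length types concrete, here is a minimal sketch that serializes one metadata entry into a byte array. The `vint`/`vlong` encoding shown is the usual 7-bits-per-byte scheme with a continuation bit. The little-endian fixed-width encoding, the class name `VembSketch`, and its helpers are assumptions for illustration, and the optional sparse vector information is left out.

```java
import java.io.ByteArrayOutputStream;

final class VembSketch {

    // Variable-length int: 7 payload bits per byte, high bit set on all but the last byte.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Same scheme for 64-bit values.
    static void writeVLong(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0L) {
            out.write((int) ((value & 0x7FL) | 0x80L));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Fixed 4-byte int, little-endian (byte order is an assumption).
    static void writeInt(ByteArrayOutputStream out, int value) {
        for (int shift = 0; shift < 32; shift += 8) {
            out.write(value >>> shift);
        }
    }

    static void writeFloat(ByteArrayOutputStream out, float value) {
        writeInt(out, Float.floatToIntBits(value));
    }

    // Writes the fields in the order listed above, omitting the sparse vector information.
    static byte[] encodeField(int fieldNumber, int encodingOrdinal, int similarityOrdinal,
                              long dataOffset, long dataLength, int vectorCount, float[] centroid) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeInt(out, fieldNumber);
        writeInt(out, encodingOrdinal);
        writeInt(out, similarityOrdinal);
        writeVInt(out, centroid.length);   // vector dimensions
        writeVLong(out, dataOffset);       // offset into the .veb file
        writeVLong(out, dataLength);       // length of the data in the .veb file
        writeVInt(out, vectorCount);
        float squareMagnitude = 0f;
        for (float c : centroid) {
            writeFloat(out, c);
            squareMagnitude += c * c;
        }
        writeFloat(out, squareMagnitude);  // centroid square magnitude
        return out.toByteArray();
    }
}
```

Small values stay compact under this encoding: a dimension count of 1024 takes two bytes as a `vint` rather than the four bytes of a fixed-width `int`.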
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final String
     
    static final String
     

    Fields inherited from class org.apache.lucene.codecs.KnnVectorsFormat

    DEFAULT_MAX_DIMENSIONS, EMPTY
  • Constructor Summary

    Constructors
    Constructor
    Description
    ES818BinaryQuantizedVectorsFormat()
    Creates a new instance with the default number of vectors per cluster.
  • Method Summary

    Modifier and Type
    Method
    Description
    org.apache.lucene.codecs.hnsw.FlatVectorsReader
    fieldsReader(org.apache.lucene.index.SegmentReadState state)
     
    org.apache.lucene.codecs.hnsw.FlatVectorsWriter
    fieldsWriter(org.apache.lucene.index.SegmentWriteState state)
     
    int
    getMaxDimensions(String fieldName)
     
    String
    toString()
     

    Methods inherited from class org.apache.lucene.codecs.KnnVectorsFormat

    availableKnnVectorsFormats, forName, getName, reloadKnnVectorsFormat

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Field Details

  • Constructor Details

    • ES818BinaryQuantizedVectorsFormat

      public ES818BinaryQuantizedVectorsFormat()
      Creates a new instance with the default number of vectors per cluster.
  • Method Details

    • fieldsWriter

      public org.apache.lucene.codecs.hnsw.FlatVectorsWriter fieldsWriter(org.apache.lucene.index.SegmentWriteState state) throws IOException
      Specified by:
      fieldsWriter in class org.apache.lucene.codecs.hnsw.FlatVectorsFormat
      Throws:
      IOException
    • fieldsReader

      public org.apache.lucene.codecs.hnsw.FlatVectorsReader fieldsReader(org.apache.lucene.index.SegmentReadState state) throws IOException
      Specified by:
      fieldsReader in class org.apache.lucene.codecs.hnsw.FlatVectorsFormat
      Throws:
      IOException
    • getMaxDimensions

      public int getMaxDimensions(String fieldName)
      Overrides:
      getMaxDimensions in class org.apache.lucene.codecs.hnsw.FlatVectorsFormat
    • toString

      public String toString()
      Overrides:
      toString in class Object