Class ESVectorUtil

java.lang.Object
org.elasticsearch.simdvec.ESVectorUtil

public class ESVectorUtil extends Object
  • Constructor Summary

    Constructors
    Constructor
    Description
    ESVectorUtil()
     
  • Method Summary

    Modifier and Type
    Method
    Description
    static int
    andBitCount(byte[] a, byte[] b)
    AND bit count computed over signed bytes.
    static void
    calculateOSQGridPoints(float[] target, int[] quantize, int points, float[] pts)
    Calculate the grid points for optimized-scalar quantization
    static float
    calculateOSQLoss(float[] target, float lowerInterval, float upperInterval, int points, float norm2, float lambda, int[] quantize)
    Calculate the loss for optimized-scalar quantization for the given parameters
    static void
    centerAndCalculateOSQStatsDp(float[] target, float[] centroid, float[] centered, float[] stats)
    Center the target vector and calculate the optimized-scalar quantization statistics
    static void
    centerAndCalculateOSQStatsEuclidean(float[] target, float[] centroid, float[] centered, float[] stats)
    Center the target vector and calculate the optimized-scalar quantization statistics
    static ES91Int4VectorsScorer
    getES91Int4VectorsScorer(org.apache.lucene.store.IndexInput input, int dimension)
     
    static ES91OSQVectorsScorer
    getES91OSQVectorsScorer(org.apache.lucene.store.IndexInput input, int dimension)
     
    static ES92Int7VectorsScorer
    getES92Int7VectorsScorer(org.apache.lucene.store.IndexInput input, int dimension)
     
    static long
    ipByteBinByte(byte[] q, byte[] d)
     
    static int
    ipByteBit(byte[] q, byte[] d)
    Compute the inner product of two vectors, where the query vector is a byte vector and the document vector is a bit vector.
    static float
    ipFloatBit(float[] q, byte[] d)
    Compute the inner product of two vectors, where the query vector is a float vector and the document vector is a bit vector.
    static float
    ipFloatByte(float[] q, byte[] d)
    Compute the inner product of two vectors, where the query vector is a float vector and the document vector is a byte vector.
    static void
    packAsBinary(int[] vector, byte[] packed)
    Packs the provided int array populated with "0" and "1" values into a byte array.
    static int
    quantizeVectorWithIntervals(float[] vector, int[] destination, float lowInterval, float upperInterval, byte bit)
    Optimized-scalar quantization of the provided vector to the provided destination array.
    static float
    soarDistance(float[] v1, float[] centroid, float[] originalResidual, float soarLambda, float rnorm)
    Calculates the soar distance for a vector and a centroid
    static void
    soarDistanceBulk(float[] v1, float[] c0, float[] c1, float[] c2, float[] c3, float[] originalResidual, float soarLambda, float rnorm, float[] distances)
    Bulk computation of the soar distance for a vector to four centroids
    static void
    squareDistanceBulk(float[] q, float[] v0, float[] v1, float[] v2, float[] v3, float[] distances)
    Bulk computation of square distances between a query vector and four vectors. The result is stored in the provided distances array.
    static void
    subtract(float[] v1, float[] v2, float[] result)
    Calculates the difference between two vectors and stores the result in a third vector.
    static void
    transposeHalfByte(int[] q, byte[] quantQueryByte)
    The idea here is to organize the query vector bits such that the first bit of every dimension is in the first set of dimensions bits, or (dimensions/8) bytes.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • ESVectorUtil

      public ESVectorUtil()
  • Method Details

    • getES91OSQVectorsScorer

      public static ES91OSQVectorsScorer getES91OSQVectorsScorer(org.apache.lucene.store.IndexInput input, int dimension) throws IOException
      Throws:
      IOException
    • getES91Int4VectorsScorer

      public static ES91Int4VectorsScorer getES91Int4VectorsScorer(org.apache.lucene.store.IndexInput input, int dimension) throws IOException
      Throws:
      IOException
    • getES92Int7VectorsScorer

      public static ES92Int7VectorsScorer getES92Int7VectorsScorer(org.apache.lucene.store.IndexInput input, int dimension) throws IOException
      Throws:
      IOException
    • ipByteBinByte

      public static long ipByteBinByte(byte[] q, byte[] d)
    • ipByteBit

      public static int ipByteBit(byte[] q, byte[] d)
      Compute the inner product of two vectors, where the query vector is a byte vector and the document vector is a bit vector. This will return the sum of the query vector values using the document vector as a mask. Bits are matched to bytes in "big endian" order: the most significant bit of each document byte corresponds to the lowest-indexed query value. For example, if the byte vector is [1, 2, 3, 4, 5, 6, 7, 8] and the bit vector is [0b10000000], the inner product will be 1.
      Parameters:
      q - the query vector
      d - the document vector
      Returns:
      the inner product of the two vectors
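      The masked sum described above can be illustrated with a plain scalar loop. This is a minimal sketch of the documented semantics only, not the library's implementation (which is presumably vectorized); the class name IpByteBitSketch and the length check are assumptions made for illustration.

```java
final class IpByteBitSketch {
    /**
     * Sums q[i] for every dimension i whose bit is set in d.
     * Each byte of d is read from its most significant bit ("big endian" bit order).
     */
    static int ipByteBit(byte[] q, byte[] d) {
        // Assumption for this sketch: exactly 8 query values per document byte.
        if (q.length != d.length * Byte.SIZE) {
            throw new IllegalArgumentException("q must hold 8 values per byte of d");
        }
        int sum = 0;
        for (int i = 0; i < q.length; i++) {
            // Dimension i maps to bit (7 - i % 8) of byte i / 8.
            int bit = (d[i >> 3] >> (7 - (i & 7))) & 1;
            sum += q[i] * bit;
        }
        return sum;
    }
}
```

      With q = [1, 2, 3, 4, 5, 6, 7, 8], a document byte of 0b10000000 selects only the first value, while 0b00000001 selects only the last.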
    • ipFloatBit

      public static float ipFloatBit(float[] q, byte[] d)
      Compute the inner product of two vectors, where the query vector is a float vector and the document vector is a bit vector. This will return the sum of the query vector values using the document vector as a mask. Bits are matched to floats in "big endian" order: the most significant bit of each document byte corresponds to the lowest-indexed query value. For example, if the float vector is [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0] and the bit vector is [0b10000000], the inner product will be 1.0.
      Parameters:
      q - the query vector
      d - the document vector
      Returns:
      the inner product of the two vectors
    • ipFloatByte

      public static float ipFloatByte(float[] q, byte[] d)
      Compute the inner product of two vectors, where the query vector is a float vector and the document vector is a byte vector.
      Parameters:
      q - the query vector
      d - the document vector
      Returns:
      the inner product of the two vectors
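      For reference, a float-by-byte inner product can be written as the scalar loop below. This is a sketch under the assumption that document bytes are interpreted as signed values, as Java bytes are by default; the class name IpFloatByteSketch is hypothetical and the code is not taken from the library.

```java
final class IpFloatByteSketch {
    /** Scalar dot product of a float query and a (signed) byte document vector. */
    static float ipFloatByte(float[] q, byte[] d) {
        // Assumption for this sketch: both vectors have the same number of dimensions.
        if (q.length != d.length) {
            throw new IllegalArgumentException("vector dimensions differ");
        }
        float sum = 0f;
        for (int i = 0; i < q.length; i++) {
            // d[i] is widened to float; negative bytes stay negative.
            sum += q[i] * d[i];
        }
        return sum;
    }
}
```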
    • andBitCount

      public static int andBitCount(byte[] a, byte[] b)
      AND bit count computed over signed bytes. Copied from Lucene's XOR implementation
      Parameters:
      a - bytes containing a vector
      b - bytes containing another vector, of the same dimension
      Returns:
      the value of the AND bit count of the two vectors
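      As an illustration of what an AND bit count over signed bytes means, the sketch below ANDs the two arrays byte by byte and counts the set bits. It is a scalar reference only, not the library's code; the class name AndBitCountSketch and the length check are assumptions.

```java
final class AndBitCountSketch {
    /** Counts the bits that are set in both a and b. */
    static int andBitCount(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector dimensions differ");
        }
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            // Mask to 8 bits so that sign extension of negative bytes adds no extra ones.
            count += Integer.bitCount((a[i] & b[i]) & 0xFF);
        }
        return count;
    }
}
```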
    • calculateOSQLoss

      public static float calculateOSQLoss(float[] target, float lowerInterval, float upperInterval, int points, float norm2, float lambda, int[] quantize)
      Calculate the loss for optimized-scalar quantization for the given parameters
      Parameters:
      target - The vector being quantized, assumed to be centered
      lowerInterval - The lower interval value for which to calculate the loss
      upperInterval - The upper interval value for which to calculate the loss
      points - the quantization points
      norm2 - The norm squared of the target vector
      lambda - The lambda parameter for controlling anisotropic loss calculation
      quantize - array to store the computed quantize vector.
      Returns:
      The loss for the given parameters
    • calculateOSQGridPoints

      public static void calculateOSQGridPoints(float[] target, int[] quantize, int points, float[] pts)
      Calculate the grid points for optimized-scalar quantization
      Parameters:
      target - The vector being quantized, assumed to be centered
      quantize - The quantize vector which should have at least the target vector length
      points - the quantization points
      pts - The array to store the grid points, must be of length 5
    • centerAndCalculateOSQStatsEuclidean

      public static void centerAndCalculateOSQStatsEuclidean(float[] target, float[] centroid, float[] centered, float[] stats)
      Center the target vector and calculate the optimized-scalar quantization statistics
      Parameters:
      target - The vector being quantized
      centroid - The centroid of the target vector
      centered - The destination of the centered vector, will be overwritten
      stats - The array to store the statistics, must be of length 5
    • centerAndCalculateOSQStatsDp

      public static void centerAndCalculateOSQStatsDp(float[] target, float[] centroid, float[] centered, float[] stats)
      Center the target vector and calculate the optimized-scalar quantization statistics
      Parameters:
      target - The vector being quantized
      centroid - The centroid of the target vector
      centered - The destination of the centered vector, will be overwritten
      stats - The array to store the statistics, must be of length 6
    • subtract

      public static void subtract(float[] v1, float[] v2, float[] result)
      Calculates the difference between two vectors and stores the result in a third vector.
      Parameters:
      v1 - the first vector
      v2 - the second vector
      result - the result vector, must be the same length as the input vectors
    • soarDistance

      public static float soarDistance(float[] v1, float[] centroid, float[] originalResidual, float soarLambda, float rnorm)
      Calculates the soar distance for a vector and a centroid
      Parameters:
      v1 - the vector
      centroid - the centroid
      originalResidual - the residual with the actually nearest centroid
      soarLambda - the lambda parameter
      rnorm - distance to the nearest centroid
      Returns:
      the soar distance
    • quantizeVectorWithIntervals

      public static int quantizeVectorWithIntervals(float[] vector, int[] destination, float lowInterval, float upperInterval, byte bit)
      Optimized-scalar quantization of the provided vector to the provided destination array.
      Parameters:
      vector - the vector to quantize
      destination - the array to store the result
      lowInterval - the minimum value, lower values in the original array will be replaced by this value
      upperInterval - the maximum value, bigger values in the original array will be replaced by this value
      bit - the number of bits to use for quantization, must be between 1 and 8
      Returns:
      the sum of all the elements of the resulting quantized vector
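      The parameter descriptions above suggest a clamp-then-bucket scheme. The sketch below shows one way this could look: values are clamped to the interval, mapped onto 2^bit evenly spaced levels, and the level indices are summed. The even spacing and round-to-nearest rule are assumptions of this sketch, as is the class name IntervalQuantizeSketch; the library's exact arithmetic may differ.

```java
final class IntervalQuantizeSketch {
    /** Quantizes vector into destination and returns the sum of the quantized values. */
    static int quantize(float[] vector, int[] destination, float lowInterval, float upperInterval, byte bit) {
        if (bit < 1 || bit > 8) {
            throw new IllegalArgumentException("bit must be between 1 and 8");
        }
        int steps = (1 << bit) - 1;                        // highest level index
        float step = (upperInterval - lowInterval) / steps; // assumed: evenly spaced levels
        int sum = 0;
        for (int i = 0; i < vector.length; i++) {
            // Values outside the interval are replaced by the nearest bound.
            float clamped = Math.min(Math.max(vector[i], lowInterval), upperInterval);
            // Assumed: round to the nearest level.
            int level = Math.round((clamped - lowInterval) / step);
            destination[i] = level;
            sum += level;
        }
        return sum;
    }
}
```

      For example, with the interval [0, 3] and 2 bits, the levels are 0, 1, 2 and 3, so an input of 5.0 is clamped to level 3 and an input of -1.0 to level 0.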
    • squareDistanceBulk

      public static void squareDistanceBulk(float[] q, float[] v0, float[] v1, float[] v2, float[] v3, float[] distances)
      Bulk computation of square distances between a query vector and four vectors. The result is stored in the provided distances array.
      Parameters:
      q - the query vector
      v0 - the first vector
      v1 - the second vector
      v2 - the third vector
      v3 - the fourth vector
      distances - an array to store the computed square distances, must have length 4
      Throws:
      IllegalArgumentException - if the dimensions of the vectors do not match or if the distances array does not have length 4
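      The contract above (four vectors in, four squared distances out, with argument validation) can be sketched in scalar form as follows. This is an illustrative reference, not the library's implementation; the class name SquareDistanceBulkSketch is hypothetical.

```java
final class SquareDistanceBulkSketch {
    /** Writes the squared Euclidean distance from q to v0..v3 into distances[0..3]. */
    static void squareDistanceBulk(float[] q, float[] v0, float[] v1, float[] v2, float[] v3, float[] distances) {
        if (distances.length != 4) {
            throw new IllegalArgumentException("distances must have length 4");
        }
        float[][] vectors = {v0, v1, v2, v3};
        // Validate everything first so a failure leaves distances untouched.
        for (float[] v : vectors) {
            if (v.length != q.length) {
                throw new IllegalArgumentException("vector dimensions differ");
            }
        }
        for (int k = 0; k < 4; k++) {
            float sum = 0f;
            for (int i = 0; i < q.length; i++) {
                float diff = q[i] - vectors[k][i];
                sum += diff * diff;
            }
            distances[k] = sum;
        }
    }
}
```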
    • soarDistanceBulk

      public static void soarDistanceBulk(float[] v1, float[] c0, float[] c1, float[] c2, float[] c3, float[] originalResidual, float soarLambda, float rnorm, float[] distances)
      Bulk computation of the soar distance for a vector to four centroids
      Parameters:
      v1 - the vector
      c0 - the first centroid
      c1 - the second centroid
      c2 - the third centroid
      c3 - the fourth centroid
      originalResidual - the residual with the actually nearest centroid
      soarLambda - the lambda parameter
      rnorm - distance to the nearest centroid
      distances - an array to store the computed soar distances, must have length 4
    • packAsBinary

      public static void packAsBinary(int[] vector, byte[] packed)
      Packs the provided int array populated with "0" and "1" values into a byte array.
      Parameters:
      vector - the int array to pack, must contain only "0" and "1" values.
      packed - the byte array to store the packed result, must be large enough to hold the packed data.
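      A minimal sketch of such packing is shown below. It assumes the same "big endian" bit order described for ipByteBit, where the first element of each group of eight lands in the most significant bit; the description above does not state the order, so treat that as an assumption. The class name PackAsBinarySketch is hypothetical.

```java
final class PackAsBinarySketch {
    /** Packs 0/1 ints into bytes, eight per byte, most significant bit first (assumed order). */
    static void packAsBinary(int[] vector, byte[] packed) {
        int needed = (vector.length + 7) / 8;
        if (packed.length < needed) {
            throw new IllegalArgumentException("packed is too small");
        }
        // Clear the bytes that will be written, then OR in each set bit.
        java.util.Arrays.fill(packed, 0, needed, (byte) 0);
        for (int i = 0; i < vector.length; i++) {
            if (vector[i] != 0 && vector[i] != 1) {
                throw new IllegalArgumentException("vector must contain only 0 and 1");
            }
            packed[i >> 3] |= vector[i] << (7 - (i & 7));
        }
    }
}
```

      A trailing partial group is left-aligned in its byte under this assumption, so the ten values 1,0,0,0,0,0,0,1,1,1 pack to the two bytes 0x81 and 0xC0.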
    • transposeHalfByte

      public static void transposeHalfByte(int[] q, byte[] quantQueryByte)
      The idea here is to organize the query vector bits such that the first bit of every dimension is in the first set of dimensions bits, or (dimensions/8) bytes. The second, third, and fourth bits are in the second, third, and fourth sets of dimensions bits, respectively. This allows for direct bitwise comparisons with the stored index vectors by summing the bitwise results with the required relative bit shifts.
      Parameters:
      q - the query vector, assumed to be half-byte quantized with values between 0 and 15
      quantQueryByte - the byte array to store the transposed query vector.
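      The bit-plane layout described above can be sketched as follows. The sketch assumes that the "first bit" is the least significant bit of each 4-bit value, that each plane occupies ceil(dimensions/8) consecutive bytes, and that bits within a byte are filled most significant first; none of these details are confirmed by the description, and the class name TransposeHalfByteSketch is hypothetical.

```java
final class TransposeHalfByteSketch {
    /** Splits 4-bit values into four bit planes stored back to back in out. */
    static void transposeHalfByte(int[] q, byte[] out) {
        int stride = (q.length + 7) / 8; // bytes per bit plane
        if (out.length < 4 * stride) {
            throw new IllegalArgumentException("output is too small");
        }
        java.util.Arrays.fill(out, 0, 4 * stride, (byte) 0);
        for (int i = 0; i < q.length; i++) {
            if (q[i] < 0 || q[i] > 15) {
                throw new IllegalArgumentException("values must be between 0 and 15");
            }
            for (int plane = 0; plane < 4; plane++) {
                // Bit `plane` of dimension i goes to position i within that plane.
                int bit = (q[i] >> plane) & 1;
                out[plane * stride + (i >> 3)] |= bit << (7 - (i & 7));
            }
        }
    }
}
```

      With this layout, an inner product against a bit vector d can be assembled as the sum over planes of (popcount of plane AND d) shifted left by the plane index, which is the kind of bitwise comparison the description refers to.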