java.lang.Object
org.elasticsearch.simdvec.ESVectorUtil
-
Constructor Summary
Constructors
ESVectorUtil()
Method Summary
Modifier and Type, Method, Description:
static int andBitCount(byte[] a, byte[] b)
    AND bit count computed over signed bytes.
static void calculateOSQGridPoints(float[] target, int[] quantize, int points, float[] pts)
    Calculate the grid points for optimized-scalar quantization.
static float calculateOSQLoss(float[] target, float lowerInterval, float upperInterval, int points, float norm2, float lambda, int[] quantize)
    Calculate the loss for optimized-scalar quantization for the given parameters.
static void centerAndCalculateOSQStatsDp(float[] target, float[] centroid, float[] centered, float[] stats)
    Center the target vector and calculate the optimized-scalar quantization statistics.
static void centerAndCalculateOSQStatsEuclidean(float[] target, float[] centroid, float[] centered, float[] stats)
    Center the target vector and calculate the optimized-scalar quantization statistics.
static ES91Int4VectorsScorer getES91Int4VectorsScorer(org.apache.lucene.store.IndexInput input, int dimension)
static ES91OSQVectorsScorer getES91OSQVectorsScorer(org.apache.lucene.store.IndexInput input, int dimension)
static ES92Int7VectorsScorer getES92Int7VectorsScorer(org.apache.lucene.store.IndexInput input, int dimension)
static long ipByteBinByte(byte[] q, byte[] d)
static int ipByteBit(byte[] q, byte[] d)
    Compute the inner product of two vectors, where the query vector is a byte vector and the document vector is a bit vector.
static float ipFloatBit(float[] q, byte[] d)
    Compute the inner product of two vectors, where the query vector is a float vector and the document vector is a bit vector.
static float ipFloatByte(float[] q, byte[] d)
    Compute the inner product of two vectors, where the query vector is a float vector and the document vector is a byte vector.
static void packAsBinary(int[] vector, byte[] packed)
    Packs the provided int array populated with "0" and "1" values into a byte array.
static int quantizeVectorWithIntervals(float[] vector, int[] destination, float lowInterval, float upperInterval, byte bit)
    Optimized-scalar quantization of the provided vector to the provided destination array.
static float soarDistance(float[] v1, float[] centroid, float[] originalResidual, float soarLambda, float rnorm)
    Calculates the soar distance for a vector and a centroid.
static void soarDistanceBulk(float[] v1, float[] c0, float[] c1, float[] c2, float[] c3, float[] originalResidual, float soarLambda, float rnorm, float[] distances)
    Bulk computation of the soar distance for a vector to four centroids.
static void squareDistanceBulk(float[] q, float[] v0, float[] v1, float[] v2, float[] v3, float[] distances)
    Bulk computation of square distances between a query vector and four vectors. Result is stored in the provided distances array.
static void subtract(float[] v1, float[] v2, float[] result)
    Calculates the difference between two vectors and stores the result in a third vector.
static void transposeHalfByte(int[] q, byte[] quantQueryByte)
    Organizes the query vector bits such that the first bit of every dimension is in the first set of dimensions bits, or (dimensions/8) bytes.
-
Constructor Details
-
ESVectorUtil
public ESVectorUtil()
-
-
Method Details
-
getES91OSQVectorsScorer
public static ES91OSQVectorsScorer getES91OSQVectorsScorer(org.apache.lucene.store.IndexInput input, int dimension) throws IOException
Throws:
IOException
-
getES91Int4VectorsScorer
public static ES91Int4VectorsScorer getES91Int4VectorsScorer(org.apache.lucene.store.IndexInput input, int dimension) throws IOException
Throws:
IOException
-
getES92Int7VectorsScorer
public static ES92Int7VectorsScorer getES92Int7VectorsScorer(org.apache.lucene.store.IndexInput input, int dimension) throws IOException
Throws:
IOException
-
ipByteBinByte
public static long ipByteBinByte(byte[] q, byte[] d)
-
ipByteBit
public static int ipByteBit(byte[] q, byte[] d)
Compute the inner product of two vectors, where the query vector is a byte vector and the document vector is a bit vector. This returns the sum of the query vector values, using the document vector as a mask. Bits are matched to bytes in "big endian" order. For example, if the byte vector is [1, 2, 3, 4, 5, 6, 7, 8] and the bit vector is [0b10000000], the inner product will be 1.
Parameters:
q - the query vector
d - the document vector
Returns:
the inner product of the two vectors
-
ipFloatBit
public static float ipFloatBit(float[] q, byte[] d)
Compute the inner product of two vectors, where the query vector is a float vector and the document vector is a bit vector. This returns the sum of the query vector values, using the document vector as a mask. Bits are matched to floats in "big endian" order. For example, if the float vector is [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0] and the bit vector is [0b10000000], the inner product will be 1.0.
Parameters:
q - the query vector
d - the document vector
Returns:
the inner product of the two vectors
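The masked-sum behaviour described for ipByteBit and ipFloatBit can be illustrated with a plain scalar sketch. This is not the library's implementation: the class name BitMaskedSumSketch, its method names, and the length check are hypothetical, and the bit layout (dimension i stored in byte i / 8, counted from the most significant bit) is an assumption drawn from the "big endian" wording and the worked example above.

```java
// Reference sketch only; class and method names are hypothetical.
final class BitMaskedSumSketch {

    // Assumption: dimension i is bit (7 - i % 8) of byte i / 8, i.e. most significant bit first.
    static int bitAt(byte[] d, int i) {
        return (d[i >> 3] >> (7 - (i & 7))) & 1;
    }

    // Assumption: the query must have exactly eight dimensions per document byte.
    private static void checkLengths(int queryLength, byte[] d) {
        if (queryLength != d.length * Byte.SIZE) {
            throw new IllegalArgumentException("query length must be 8 * bit vector length");
        }
    }

    // Byte query: sum of q[i] over all dimensions whose bit is set in d.
    static int maskedSum(byte[] q, byte[] d) {
        checkLengths(q.length, d);
        int sum = 0;
        for (int i = 0; i < q.length; i++) {
            sum += q[i] * bitAt(d, i);
        }
        return sum;
    }

    // Float query: same masked sum, accumulated in a float.
    static float maskedSum(float[] q, byte[] d) {
        checkLengths(q.length, d);
        float sum = 0f;
        for (int i = 0; i < q.length; i++) {
            if (bitAt(d, i) == 1) {
                sum += q[i];
            }
        }
        return sum;
    }
}
```

With the query [1, 2, 3, 4, 5, 6, 7, 8] and the bit vector [0b10000000], only the first dimension is selected, which matches the example in the method descriptions.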
-
ipFloatByte
public static float ipFloatByte(float[] q, byte[] d)
Compute the inner product of two vectors, where the query vector is a float vector and the document vector is a byte vector.
Parameters:
q - the query vector
d - the document vector
Returns:
the inner product of the two vectors
-
andBitCount
public static int andBitCount(byte[] a, byte[] b)
AND bit count computed over signed bytes. Copied from Lucene's XOR implementation.
Parameters:
a - bytes containing a vector
b - bytes containing another vector, of the same dimension
Returns:
the value of the AND bit count of the two vectors
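As an illustration of what an AND bit count over signed bytes computes, here is a minimal scalar sketch. The class name AndBitCountSketch and the length check are hypothetical and are not taken from the library.

```java
// Reference sketch only; not the library's implementation.
final class AndBitCountSketch {

    // Counts the bits that are set in both a and b, byte by byte.
    static int andBitCount(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            // Java bytes are signed: mask to 8 bits so sign extension of
            // negative values does not contribute extra set bits.
            count += Integer.bitCount((a[i] & b[i]) & 0xFF);
        }
        return count;
    }
}
```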
-
calculateOSQLoss
public static float calculateOSQLoss(float[] target, float lowerInterval, float upperInterval, int points, float norm2, float lambda, int[] quantize)
Calculate the loss for optimized-scalar quantization for the given parameters.
Parameters:
target - the vector being quantized, assumed to be centered
lowerInterval - the lower interval value for which to calculate the loss
upperInterval - the upper interval value for which to calculate the loss
points - the quantization points
norm2 - the norm squared of the target vector
lambda - the lambda parameter for controlling anisotropic loss calculation
quantize - array to store the computed quantize vector
Returns:
the loss for the given parameters
-
calculateOSQGridPoints
public static void calculateOSQGridPoints(float[] target, int[] quantize, int points, float[] pts)
Calculate the grid points for optimized-scalar quantization.
Parameters:
target - the vector being quantized, assumed to be centered
quantize - the quantize vector, which should have at least the target vector length
points - the quantization points
pts - the array to store the grid points, must be of length 5
-
centerAndCalculateOSQStatsEuclidean
public static void centerAndCalculateOSQStatsEuclidean(float[] target, float[] centroid, float[] centered, float[] stats)
Center the target vector and calculate the optimized-scalar quantization statistics.
Parameters:
target - the vector being quantized
centroid - the centroid of the target vector
centered - the destination of the centered vector, will be overwritten
stats - the array to store the statistics, must be of length 5
-
centerAndCalculateOSQStatsDp
public static void centerAndCalculateOSQStatsDp(float[] target, float[] centroid, float[] centered, float[] stats)
Center the target vector and calculate the optimized-scalar quantization statistics.
Parameters:
target - the vector being quantized
centroid - the centroid of the target vector
centered - the destination of the centered vector, will be overwritten
stats - the array to store the statistics, must be of length 6
-
subtract
public static void subtract(float[] v1, float[] v2, float[] result)
Calculates the difference between two vectors and stores the result in a third vector.
Parameters:
v1 - the first vector
v2 - the second vector
result - the result vector, must be the same length as the input vectors
-
soarDistance
public static float soarDistance(float[] v1, float[] centroid, float[] originalResidual, float soarLambda, float rnorm)
Calculates the soar distance for a vector and a centroid.
Parameters:
v1 - the vector
centroid - the centroid
originalResidual - the residual with the actually nearest centroid
soarLambda - the lambda parameter
rnorm - distance to the nearest centroid
Returns:
the soar distance
-
quantizeVectorWithIntervals
public static int quantizeVectorWithIntervals(float[] vector, int[] destination, float lowInterval, float upperInterval, byte bit)
Optimized-scalar quantization of the provided vector to the provided destination array.
Parameters:
vector - the vector to quantize
destination - the array to store the result
lowInterval - the minimum value; lower values in the original array are replaced by this value
upperInterval - the maximum value; larger values in the original array are replaced by this value
bit - the number of bits to use for quantization, must be between 1 and 8
Returns:
the sum of all the elements of the resulting quantized vector
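The parameter descriptions above suggest three steps: clamp each value to the interval, map it to one of the integer levels available with the given number of bits, and return the sum of the levels. A minimal sketch of that reading follows. The class name IntervalQuantizationSketch, the evenly spaced levels, the round-to-nearest rule, and the argument checks are all assumptions, not confirmed details of the library.

```java
// Reference sketch only; spacing and rounding rules are assumptions.
final class IntervalQuantizationSketch {

    static int quantize(float[] vector, int[] destination, float low, float upper, byte bits) {
        if (bits < 1 || bits > 8) {
            throw new IllegalArgumentException("bits must be between 1 and 8");
        }
        if (destination.length < vector.length) {
            throw new IllegalArgumentException("destination is too small");
        }
        // With b bits there are 2^b levels, hence 2^b - 1 steps between low and upper.
        int steps = (1 << bits) - 1;
        float stepSize = (upper - low) / steps;
        int sum = 0;
        for (int i = 0; i < vector.length; i++) {
            // Values outside [low, upper] are replaced by the nearest bound.
            float clamped = Math.min(Math.max(vector[i], low), upper);
            // Assumption: round to the nearest level.
            int level = Math.round((clamped - low) / stepSize);
            destination[i] = level;
            sum += level;
        }
        return sum;
    }
}
```

For example, with low = 0, upper = 3 and bit = 2, the step size is 1, so the input [-1, 0.2, 1.4, 2.6, 5] maps to the levels [0, 0, 1, 3, 3] under these assumptions.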
-
squareDistanceBulk
public static void squareDistanceBulk(float[] q, float[] v0, float[] v1, float[] v2, float[] v3, float[] distances)
Bulk computation of square distances between a query vector and four vectors. Result is stored in the provided distances array.
Parameters:
q - the query vector
v0 - the first vector
v1 - the second vector
v2 - the third vector
v3 - the fourth vector
distances - an array to store the computed square distances, must have length 4
Throws:
IllegalArgumentException - if the dimensions of the vectors do not match or if the distances array does not have length 4
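The contract above (four squared Euclidean distances written into a length-4 array, with argument validation) can be sketched in scalar form as follows. The class name SquareDistanceBulkSketch is hypothetical, and the single pass over the query is an illustrative choice rather than a description of the library's internals.

```java
// Reference sketch only; not the library's implementation.
final class SquareDistanceBulkSketch {

    static void squareDistanceBulk(float[] q, float[] v0, float[] v1, float[] v2, float[] v3, float[] distances) {
        if (distances.length != 4) {
            throw new IllegalArgumentException("distances must have length 4");
        }
        if (v0.length != q.length || v1.length != q.length || v2.length != q.length || v3.length != q.length) {
            throw new IllegalArgumentException("vector dimensions differ");
        }
        float d0 = 0f, d1 = 0f, d2 = 0f, d3 = 0f;
        // One pass over the query: each q[i] is read once and reused for all four vectors.
        for (int i = 0; i < q.length; i++) {
            float x = q[i];
            float e0 = x - v0[i];
            float e1 = x - v1[i];
            float e2 = x - v2[i];
            float e3 = x - v3[i];
            d0 += e0 * e0;
            d1 += e1 * e1;
            d2 += e2 * e2;
            d3 += e3 * e3;
        }
        distances[0] = d0;
        distances[1] = d1;
        distances[2] = d2;
        distances[3] = d3;
    }
}
```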
-
soarDistanceBulk
public static void soarDistanceBulk(float[] v1, float[] c0, float[] c1, float[] c2, float[] c3, float[] originalResidual, float soarLambda, float rnorm, float[] distances)
Bulk computation of the soar distance for a vector to four centroids.
Parameters:
v1 - the vector
c0 - the first centroid
c1 - the second centroid
c2 - the third centroid
c3 - the fourth centroid
originalResidual - the residual with the actually nearest centroid
soarLambda - the lambda parameter
rnorm - distance to the nearest centroid
distances - an array to store the computed soar distances, must have length 4
-
packAsBinary
public static void packAsBinary(int[] vector, byte[] packed)
Packs the provided int array populated with "0" and "1" values into a byte array.
Parameters:
vector - the int array to pack, must contain only "0" and "1" values
packed - the byte array to store the packed result, must be large enough to hold the packed data
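A minimal sketch of packing a 0/1 int array into bytes is shown below. The class name PackAsBinarySketch and the validation are hypothetical. The bit order (first element in the most significant bit of the first byte) is an assumption, chosen to agree with the "big endian" order described for ipByteBit and ipFloatBit.

```java
// Reference sketch only; bit order is an assumption.
final class PackAsBinarySketch {

    static void pack(int[] vector, byte[] packed) {
        int needed = (vector.length + 7) / 8; // eight dimensions per byte, rounded up
        if (packed.length < needed) {
            throw new IllegalArgumentException("packed array is too small");
        }
        for (int i = 0; i < needed; i++) {
            packed[i] = 0; // clear any previous content before OR-ing bits in
        }
        for (int i = 0; i < vector.length; i++) {
            if (vector[i] != 0 && vector[i] != 1) {
                throw new IllegalArgumentException("vector must contain only 0 and 1");
            }
            // Assumption: element i goes to bit (7 - i % 8) of byte i / 8.
            packed[i >> 3] |= (byte) (vector[i] << (7 - (i & 7)));
        }
    }
}
```

Under this layout, the ten-element input [1, 0, 0, 0, 0, 0, 0, 1, 1, 1] packs into the two bytes 0b10000001 and 0b11000000.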
-
transposeHalfByte
public static void transposeHalfByte(int[] q, byte[] quantQueryByte)
Organizes the query vector bits such that the first bit of every dimension is in the first set of dimensions bits, or (dimensions/8) bytes. The second, third, and fourth bits are in the second, third, and fourth sets of dimensions bits, respectively. This allows direct bitwise comparison with the stored index vectors: the bitwise results for each set are summed after applying the bit shift that set requires.
Parameters:
q - the query vector, assumed to be half-byte quantized with values between 0 and 15
quantQueryByte - the byte array to store the transposed query vector
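The bit-plane layout described above, and the way it enables scoring against a bit vector with AND and popcount, can be sketched as follows. The class TransposeHalfByteSketch and its method names are hypothetical. Two details are assumptions rather than documented facts: that the "first bit" is the least significant bit, and that bits within a byte are ordered most significant first.

```java
// Reference sketch only; bit significance and bit order are assumptions.
final class TransposeHalfByteSketch {

    // Writes four bit planes of (dimensions / 8, rounded up) bytes each into out.
    static void transpose(int[] q, byte[] out) {
        int planeBytes = (q.length + 7) / 8;
        if (out.length < 4 * planeBytes) {
            throw new IllegalArgumentException("output array is too small");
        }
        for (int i = 0; i < 4 * planeBytes; i++) {
            out[i] = 0;
        }
        for (int i = 0; i < q.length; i++) {
            for (int bit = 0; bit < 4; bit++) {
                int b = (q[i] >> bit) & 1;
                // Plane 'bit' holds bit 'bit' of every dimension.
                out[bit * planeBytes + (i >> 3)] |= (byte) (b << (7 - (i & 7)));
            }
        }
    }

    // Inner product of the transposed query with a bit vector d: popcount of
    // each plane ANDed with d, shifted left by the plane's bit position.
    static int innerProductWithBits(byte[] transposed, byte[] d) {
        int planeBytes = d.length;
        int sum = 0;
        for (int bit = 0; bit < 4; bit++) {
            int count = 0;
            for (int j = 0; j < planeBytes; j++) {
                count += Integer.bitCount((transposed[bit * planeBytes + j] & d[j]) & 0xFF);
            }
            sum += count << bit;
        }
        return sum;
    }
}
```

For the query [1, 2, 4, 8, 15, 0, 3, 5] and the bit vector 0b10101010, the selected dimensions hold 1, 4, 15 and 3, and the plane-wise computation yields the same total of 23.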
-