Module org.elasticsearch.server
Class HierarchicalKMeans
java.lang.Object
org.elasticsearch.index.codec.vectors.cluster.HierarchicalKMeans
An implementation of the hierarchical k-means algorithm, which partitions data better than naive k-means.
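The general idea of hierarchical k-means can be illustrated with a minimal, self-contained sketch. This is not the code of this class: the class name `HierarchicalKMeansSketch`, the helpers `lloyd` and `nearest`, and the values of `MAX_K` and `MAX_ITERATIONS` are assumptions made for illustration only. The sketch partitions the data into a small number of rough clusters, then recursively re-clusters any partition that is still larger than the target size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class HierarchicalKMeansSketch {
    // Illustrative values only; they are not the constants of HierarchicalKMeans.
    static final int MAX_K = 4;
    static final int MAX_ITERATIONS = 10;

    /** Splits ids into leaf clusters, recursing while a cluster exceeds targetSize (assumed >= 1). */
    static List<List<Integer>> cluster(float[][] vectors, List<Integer> ids, int targetSize, Random rnd) {
        List<List<Integer>> leaves = new ArrayList<>();
        if (ids.size() <= targetSize) {
            leaves.add(ids); // small enough: this is a leaf
            return leaves;
        }
        // Rough partition count for this level, capped so each level stays cheap.
        int k = Math.min(MAX_K, (ids.size() + targetSize - 1) / targetSize);
        for (List<Integer> part : lloyd(vectors, ids, k, rnd)) {
            if (part.isEmpty()) continue;
            if (part.size() == ids.size()) {
                leaves.add(part); // no split happened (e.g. duplicate points); stop here
            } else {
                leaves.addAll(cluster(vectors, part, targetSize, rnd));
            }
        }
        return leaves;
    }

    /** Plain Lloyd iterations over the given subset; returns the k partitions (some may be empty). */
    static List<List<Integer>> lloyd(float[][] vectors, List<Integer> ids, int k, Random rnd) {
        int dim = vectors[0].length;
        float[][] centroids = new float[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = vectors[ids.get(rnd.nextInt(ids.size()))].clone(); // random initial centroids
        }
        int[] assign = new int[ids.size()];
        for (int iter = 0; iter < MAX_ITERATIONS; iter++) {
            // Assignment step: attach each vector to its nearest centroid.
            for (int i = 0; i < ids.size(); i++) {
                assign[i] = nearest(vectors[ids.get(i)], centroids);
            }
            // Update step: move each non-empty centroid to the mean of its vectors.
            float[][] sums = new float[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < ids.size(); i++) {
                float[] v = vectors[ids.get(i)];
                counts[assign[i]]++;
                for (int d = 0; d < dim; d++) sums[assign[i]][d] += v[d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // keep the previous centroid
                for (int d = 0; d < dim; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }
        List<List<Integer>> parts = new ArrayList<>();
        for (int c = 0; c < k; c++) parts.add(new ArrayList<>());
        for (int i = 0; i < ids.size(); i++) parts.get(assign[i]).add(ids.get(i));
        return parts;
    }

    /** Index of the centroid with the smallest squared Euclidean distance to v. */
    static int nearest(float[] v, float[][] centroids) {
        int best = 0;
        float bestDist = Float.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            float dist = 0;
            for (int d = 0; d < v.length; d++) {
                float diff = v[d] - centroids[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }
}
```

Capping the partition count per level keeps each k-means run small; the recursion, rather than a single large k, is what brings clusters down to roughly the target size.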
-
Field Summary
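Fields
Modifier and Type     Field                          Description
static final float    DEFAULT_SOAR_LAMBDA
static final int      MAXK
static final int      MAX_ITERATIONS_DEFAULT
static final int      SAMPLES_PER_CLUSTER_DEFAULT
static final int      NO_SOAR_ASSIGNMENT
-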
Method Summary
Modifier and Type    Method    Description
KMeansResult    cluster(org.apache.lucene.index.FloatVectorValues vectors, int targetSize)
    Clusters the set of vectors by starting with a rough number of partitions and then recursively refining them. Lastly, a pass is made to adjust nearby neighborhoods and add an extra assignment per vector to nearby neighborhoods.
static HierarchicalKMeans    ofConcurrent(int dimension, org.apache.lucene.search.TaskExecutor executor, int numWorkers)
static HierarchicalKMeans    ofConcurrent(int dimension, org.apache.lucene.search.TaskExecutor executor, int numWorkers, int maxIterations, int samplesPerCluster, int clustersPerNeighborhood, float soarLambda)
static HierarchicalKMeans    ofSerial(int dimension)
static HierarchicalKMeans    ofSerial(int dimension, int maxIterations, int samplesPerCluster, int clustersPerNeighborhood, float soarLambda)
-
Field Details
-
MAXK
public static final int MAXK
-
MAX_ITERATIONS_DEFAULT
public static final int MAX_ITERATIONS_DEFAULT
-
SAMPLES_PER_CLUSTER_DEFAULT
public static final int SAMPLES_PER_CLUSTER_DEFAULT
-
DEFAULT_SOAR_LAMBDA
public static final float DEFAULT_SOAR_LAMBDA
-
NO_SOAR_ASSIGNMENT
public static final int NO_SOAR_ASSIGNMENT
-
-
Method Details
-
ofSerial
public static HierarchicalKMeans ofSerial(int dimension)
-
ofSerial
public static HierarchicalKMeans ofSerial(int dimension, int maxIterations, int samplesPerCluster, int clustersPerNeighborhood, float soarLambda)
-
ofConcurrent
public static HierarchicalKMeans ofConcurrent(int dimension, org.apache.lucene.search.TaskExecutor executor, int numWorkers)
-
ofConcurrent
public static HierarchicalKMeans ofConcurrent(int dimension, org.apache.lucene.search.TaskExecutor executor, int numWorkers, int maxIterations, int samplesPerCluster, int clustersPerNeighborhood, float soarLambda)
-
cluster
public KMeansResult cluster(org.apache.lucene.index.FloatVectorValues vectors, int targetSize) throws IOException
Clusters the set of vectors by starting with a rough number of partitions and then recursively refining them. Lastly, a pass is made to adjust nearby neighborhoods and add an extra assignment per vector to nearby neighborhoods.
Parameters:
vectors - the vectors to cluster
targetSize - the rough number of vectors that should be attached to a cluster
Returns:
the centroids, the vector assignments, and the SOAR (spilled from nearby neighborhoods) assignments
Throws:
IOException - thrown if vectors is inaccessible
-
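The extra (SOAR) assignment per vector can be illustrated with a short sketch. It assumes the commonly described SOAR scoring, where a candidate centroid is scored by its squared distance to the vector plus `lambda` times the squared projection of the candidate residual onto the primary residual; this page does not confirm that the class uses exactly this formula. The class `SoarSketch`, the method `secondaryAssignment`, and the sentinel `NO_ASSIGNMENT` are hypothetical names, and the actual value of `NO_SOAR_ASSIGNMENT` is not stated here.

```java
final class SoarSketch {
    // Hypothetical sentinel for "no secondary assignment"; not the documented constant's value.
    static final int NO_ASSIGNMENT = -1;

    /**
     * Picks a secondary centroid for v, given its primary centroid index.
     * Score (assumed formulation): ||r'||^2 + lambda * (r . r')^2 / ||r||^2,
     * where r = v - primaryCentroid and r' = v - candidateCentroid.
     */
    static int secondaryAssignment(float[] v, float[][] centroids, int primary, float lambda) {
        int dim = v.length;
        float[] r = new float[dim];
        float rNormSq = 0f;
        for (int d = 0; d < dim; d++) {
            r[d] = v[d] - centroids[primary][d];
            rNormSq += r[d] * r[d];
        }
        int best = NO_ASSIGNMENT;
        float bestScore = Float.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            if (c == primary) continue; // the secondary must differ from the primary
            float distSq = 0f;
            float dot = 0f;
            for (int d = 0; d < dim; d++) {
                float diff = v[d] - centroids[c][d];
                distSq += diff * diff;
                dot += diff * r[d];
            }
            // Penalize candidates whose residual is parallel to the primary residual.
            float penalty = rNormSq > 0f ? lambda * dot * dot / rNormSq : 0f;
            float score = distSq + penalty;
            if (score < bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }
}
```

With `lambda` set to 0 the sketch simply picks the second-nearest centroid; a larger `lambda` favors centroids whose residual is closer to orthogonal to the primary residual, even when they are slightly farther away.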