java.lang.Object
org.elasticsearch.index.codec.vectors.cluster.HierarchicalKMeans

public class HierarchicalKMeans extends Object
An implementation of the hierarchical k-means algorithm, which partitions data better than naive k-means.
  • Field Details

  • Method Details

    • ofSerial

      public static HierarchicalKMeans ofSerial(int dimension)
    • ofSerial

      public static HierarchicalKMeans ofSerial(int dimension, int maxIterations, int samplesPerCluster, int clustersPerNeighborhood, float soarLambda)
    • ofConcurrent

      public static HierarchicalKMeans ofConcurrent(int dimension, org.apache.lucene.search.TaskExecutor executor, int numWorkers)
    • ofConcurrent

      public static HierarchicalKMeans ofConcurrent(int dimension, org.apache.lucene.search.TaskExecutor executor, int numWorkers, int maxIterations, int samplesPerCluster, int clustersPerNeighborhood, float soarLambda)
    • cluster

      public KMeansResult cluster(org.apache.lucene.index.FloatVectorValues vectors, int targetSize) throws IOException
      Clusters the set of vectors by starting with a rough number of partitions and then recursively refining them. Lastly, a pass is made to adjust nearby neighborhoods and to add an extra assignment per vector to nearby neighborhoods.
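      The recursive refinement described above can be illustrated with a minimal, self-contained sketch. It is not this class's implementation: the class name `HierarchicalKMeansSketch`, the method `partition`, the split cap `MAX_SPLIT`, and the fixed iteration count are all hypothetical, and the neighborhood adjustment and extra-assignment pass are left out.

```java
import java.util.ArrayList;
import java.util.List;

public class HierarchicalKMeansSketch {
    static final int MAX_SPLIT = 4;   // hypothetical cap on partitions per level
    static final int ITERATIONS = 10; // hypothetical fixed number of Lloyd iterations

    /** Squared Euclidean distance between two vectors of equal length. */
    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Component-wise mean of a non-empty group of vectors. */
    static float[] mean(List<float[]> group) {
        float[] m = new float[group.get(0).length];
        for (float[] v : group) {
            for (int i = 0; i < m.length; i++) m[i] += v[i];
        }
        for (int i = 0; i < m.length; i++) m[i] /= group.size();
        return m;
    }

    /** Index of the centroid closest to v. */
    static int nearest(float[] v, List<float[]> centroids) {
        int best = 0;
        float bestDist = Float.MAX_VALUE;
        for (int c = 0; c < centroids.size(); c++) {
            float d = squaredDistance(v, centroids.get(c));
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    /** Plain Lloyd's k-means with evenly spaced seeds; returns one group per centroid. */
    static List<List<float[]>> lloyd(List<float[]> vectors, int k) {
        List<float[]> centroids = new ArrayList<>();
        for (int c = 0; c < k; c++) {
            centroids.add(vectors.get((int) ((long) c * vectors.size() / k)).clone());
        }
        List<List<float[]>> groups = new ArrayList<>();
        for (int iter = 0; iter < ITERATIONS; iter++) {
            groups = new ArrayList<>();
            for (int c = 0; c < k; c++) groups.add(new ArrayList<>());
            for (float[] v : vectors) groups.get(nearest(v, centroids)).add(v);
            for (int c = 0; c < k; c++) {
                if (!groups.get(c).isEmpty()) centroids.set(c, mean(groups.get(c)));
            }
        }
        return groups;
    }

    /** Splits vectors coarsely, then re-splits any group still larger than targetSize. */
    static void partition(List<float[]> vectors, int targetSize, List<float[]> centroidsOut) {
        if (vectors.isEmpty()) return;
        if (vectors.size() <= targetSize) {
            centroidsOut.add(mean(vectors));
            return;
        }
        // Rough partition count: ceil(n / targetSize), capped per level.
        int k = Math.min(MAX_SPLIT, (vectors.size() + targetSize - 1) / targetSize);
        for (List<float[]> group : lloyd(vectors, k)) {
            if (group.isEmpty()) continue;
            if (group.size() == vectors.size()) {
                // The split made no progress (e.g. duplicate vectors); stop recursing.
                centroidsOut.add(mean(group));
            } else {
                partition(group, targetSize, centroidsOut);
            }
        }
    }
}
```

      Calling `HierarchicalKMeansSketch.partition(vectors, targetSize, centroids)` with `targetSize >= 1` appends one centroid per leaf group to `centroids`. Capping the split per level keeps each k-means run small, which is the usual motivation for refining recursively rather than clustering into the final number of groups at once.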
      Parameters:
      vectors - the vectors to cluster
      targetSize - the approximate number of vectors that should be assigned to each cluster
      Returns:
      the centroids, the vector assignments, and the SOAR (spilled from nearby neighborhoods) assignments
      Throws:
      IOException - if the vectors cannot be accessed
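
      To make the SOAR assignment and the `soarLambda` parameter more concrete, the sketch below picks a secondary centroid using the cost from the published SOAR formulation (squared distance plus `lambda` times the squared projection of the new residual onto the primary residual). This is an assumption about what the extra assignment computes, not a statement of this class's code; the class name `SoarAssignmentSketch` and its methods are hypothetical.

```java
public class SoarAssignmentSketch {

    /** Dot product of two vectors of equal length. */
    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
        return sum;
    }

    /** Residual v - c. */
    static float[] residual(float[] v, float[] c) {
        float[] r = new float[v.length];
        for (int i = 0; i < v.length; i++) r[i] = v[i] - c[i];
        return r;
    }

    /**
     * Returns the index of the secondary centroid for v, or -1 if there is no
     * centroid other than the primary one. The cost of a candidate c is
     * ||v - c||^2 + lambda * (r . (v - c))^2 / ||r||^2, where r is the residual
     * to the primary centroid.
     */
    static int secondary(float[] v, float[][] centroids, int primary, float lambda) {
        float[] r = residual(v, centroids[primary]);
        float rNormSq = dot(r, r);
        int best = -1;
        float bestCost = Float.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            if (c == primary) continue;
            float[] r2 = residual(v, centroids[c]);
            float cost = dot(r2, r2);
            if (rNormSq > 0f) {
                // Penalize candidates whose residual is parallel to the primary residual.
                float proj = dot(r, r2);
                cost += lambda * proj * proj / rNormSq;
            }
            if (cost < bestCost) {
                bestCost = cost;
                best = c;
            }
        }
        return best;
    }
}
```

      With `lambda` set to 0 this reduces to choosing the second-nearest centroid. Larger values favor centroids whose residual is closer to orthogonal to the primary one, so the two assignments are less likely to miss the same queries.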