java.lang.Object
org.elasticsearch.index.codec.vectors.cluster.HierarchicalKMeans

public class HierarchicalKMeans extends Object
An implementation of the hierarchical k-means algorithm that partitions data better than naive k-means.
  • Field Details

  • Constructor Details

    • HierarchicalKMeans

      public HierarchicalKMeans(int dimension)
    • HierarchicalKMeans

      public HierarchicalKMeans(int dimension, int maxIterations, int samplesPerCluster, int clustersPerNeighborhood, float soarLambda)
  • Method Details

    • cluster

      public KMeansResult cluster(org.apache.lucene.index.FloatVectorValues vectors, int targetSize) throws IOException
      Clusters, or more precisely partitions, the set of vectors. It starts with a rough number of partitions and then recursively refines them. Lastly, a pass is made to adjust nearby neighborhoods and add an extra assignment per vector to nearby neighborhoods.
      Parameters:
      vectors - the vectors to cluster
      targetSize - the rough number of vectors that should be attached to a cluster
      Returns:
      the centroids, the vector assignments, and the SOAR (spilled from nearby neighborhoods) assignments
      Throws:
      IOException - if the vectors cannot be accessed
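
The recursive refinement described for cluster can be illustrated with a small, self-contained sketch. This is not the Elasticsearch implementation: the class HierarchicalSketch and all of its methods are hypothetical names introduced here, the seeding rule and the fixed iteration count are assumptions, and the final neighborhood adjustment and SOAR assignment pass are omitted entirely. It only shows the idea of starting from a coarse split and recursing until every partition holds at most targetSize vectors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the HierarchicalKMeans source.
class HierarchicalSketch {

    /** Squared Euclidean distance between two vectors of equal length. */
    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Component-wise mean of the vectors selected by ids (ids must be non-empty). */
    static float[] mean(float[][] vectors, List<Integer> ids) {
        float[] m = new float[vectors[ids.get(0)].length];
        for (int id : ids) {
            for (int i = 0; i < m.length; i++) {
                m[i] += vectors[id][i];
            }
        }
        for (int i = 0; i < m.length; i++) {
            m[i] /= ids.size();
        }
        return m;
    }

    /** Splits ids in two with a few Lloyd iterations, then recurses on halves larger than targetSize. */
    static void split(float[][] vectors, List<Integer> ids, int targetSize, int maxIterations, List<List<Integer>> out) {
        if (ids.size() <= targetSize) {
            out.add(ids);
            return;
        }
        // Assumed seeding: the first vector and the vector farthest from it.
        float[] c0 = vectors[ids.get(0)].clone();
        int farthest = ids.get(0);
        float best = -1f;
        for (int id : ids) {
            float d = squaredDistance(vectors[id], c0);
            if (d > best) {
                best = d;
                farthest = id;
            }
        }
        float[] c1 = vectors[farthest].clone();

        List<Integer> left = new ArrayList<>();
        List<Integer> right = new ArrayList<>();
        for (int iter = 0; iter < maxIterations; iter++) {
            left = new ArrayList<>();
            right = new ArrayList<>();
            for (int id : ids) {
                if (squaredDistance(vectors[id], c0) <= squaredDistance(vectors[id], c1)) {
                    left.add(id);
                } else {
                    right.add(id);
                }
            }
            if (left.isEmpty() || right.isEmpty()) {
                break;
            }
            c0 = mean(vectors, left);
            c1 = mean(vectors, right);
        }
        if (left.isEmpty() || right.isEmpty()) {
            // Degenerate input (for example, all duplicates): keep the group whole instead of recursing forever.
            out.add(ids);
            return;
        }
        split(vectors, left, targetSize, maxIterations, out);
        split(vectors, right, targetSize, maxIterations, out);
    }

    /** Returns one partition index per vector; partitions hold at most targetSize vectors where a split is possible. */
    static int[] partition(float[][] vectors, int targetSize) {
        List<Integer> all = new ArrayList<>();
        for (int i = 0; i < vectors.length; i++) {
            all.add(i);
        }
        List<List<Integer>> clusters = new ArrayList<>();
        split(vectors, all, targetSize, 10, clusters);
        int[] assignments = new int[vectors.length];
        for (int c = 0; c < clusters.size(); c++) {
            for (int id : clusters.get(c)) {
                assignments[id] = c;
            }
        }
        return assignments;
    }

    public static void main(String[] args) {
        float[][] data = {
            {0, 0}, {0, 1}, {1, 0}, {1, 1},
            {10, 10}, {10, 11}, {11, 10}, {11, 11}
        };
        // prints [0, 0, 0, 0, 1, 1, 1, 1]
        System.out.println(Arrays.toString(partition(data, 4)));
    }
}
```

In this sketch targetSize acts as a hard upper bound on partition size, whereas the documentation above describes it only as a rough number, so the real method may produce partitions that are somewhat larger or smaller.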