Class HybridDigest

All Implemented Interfaces:
Closeable, AutoCloseable, org.apache.lucene.util.Accountable, Releasable

public class HybridDigest extends AbstractTDigest
Uses a SortingDigest under the covers for small sample populations, then switches to a MergingDigest. The SortingDigest is perfectly accurate and the fastest implementation for up to millions of samples, at the cost of a larger memory footprint because it tracks every sample. Conversely, the MergingDigest pre-allocates its memory (tens of KBs) and performs better for hundreds of millions of samples or more, while its error stays bounded to 0.1-1% in most cases. This hybrid approach provides the best of both worlds: fast and accurate percentile calculations for small populations, and bounded memory allocation with acceptable speed and accuracy for larger ones.
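The switch-over idea described above can be illustrated with a self-contained toy. This is a sketch under stated assumptions, not the Elasticsearch implementation: the class name `ToyHybridDigest`, the `maxExactSamples` threshold, and the fixed-width histogram that stands in for MergingDigest are all hypothetical and chosen only to keep the example short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy illustration only: exact storage for small inputs, bounded-memory summary afterwards. */
final class ToyHybridDigest {
    private final int maxExactSamples;              // hypothetical switch-over threshold
    private final int bins;                         // size of the stand-in histogram
    private List<Double> exact = new ArrayList<>(); // set to null once we have switched
    private long[] counts;                          // histogram counts, allocated at switch time
    private double lo, hi;                          // histogram range, fixed at switch time
    private long size;

    ToyHybridDigest(int maxExactSamples, int bins) {
        this.maxExactSamples = maxExactSamples;
        this.bins = bins;
    }

    void add(double x) {
        size++;
        if (exact != null) {
            exact.add(x);
            if (exact.size() > maxExactSamples) {
                switchToHistogram();
            }
        } else {
            counts[binOf(x)]++;
        }
    }

    boolean isExact() { return exact != null; }

    long size() { return size; }

    /** Exact order statistic before the switch, a bin-midpoint estimate after it. */
    double quantile(double q) {
        if (size == 0) return Double.NaN;
        long rank = Math.max(1, (long) Math.ceil(q * size)); // 1-based target rank
        if (exact != null) {
            List<Double> sorted = new ArrayList<>(exact);
            Collections.sort(sorted);
            return sorted.get((int) rank - 1);
        }
        long seen = 0;
        for (int i = 0; i < bins; i++) {
            seen += counts[i];
            if (seen >= rank) {
                return lo + (i + 0.5) * (hi - lo) / bins; // midpoint of bin i
            }
        }
        return hi;
    }

    private void switchToHistogram() {
        lo = Collections.min(exact);
        hi = Collections.max(exact);
        counts = new long[bins];
        List<Double> old = exact;
        exact = null; // release the per-sample storage
        for (double v : old) counts[binOf(v)]++;
    }

    private int binOf(double x) {
        if (hi == lo) return 0;
        int i = (int) ((x - lo) / (hi - lo) * bins);
        return Math.min(bins - 1, Math.max(0, i)); // clamp samples outside [lo, hi]
    }
}
```

Before the threshold is crossed the toy answers from the sorted samples and is exact; afterwards memory is fixed at `bins` counters and answers become estimates. A real t-digest keeps weighted centroids rather than fixed-width bins, so its accuracy characteristics differ from this stand-in.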
  • Field Summary

    Fields inherited from class org.elasticsearch.tdigest.TDigest

    scale

    Fields inherited from interface org.apache.lucene.util.Accountable

    NULL_ACCOUNTABLE
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    add(double x, long w)
    Adds a sample to a histogram.
    void
    add(TDigest other)
    Add all of the centroids of another TDigest to this one.
    int
    byteSize()
    Returns the number of bytes required to encode this TDigest using #asBytes().
    double
    cdf(double x)
    Returns the fraction of all points added which are ≤ x.
    int
    centroidCount()
     
    Collection<Centroid>
    centroids()
    A Collection that lets you go through the centroids in ascending order by mean.
    void
    close()
     
    void
    compress()
    Re-examines a t-digest to determine whether some centroids are redundant.
    double
    compression()
    Returns the current compression factor.
    double
    getMax()
     
    double
    getMin()
     
    double
    quantile(double q)
    Returns an estimate of a cutoff such that a specified fraction of the data added to this TDigest would be less than or equal to the cutoff.
    long
    ramBytesUsed()
     
    void
    reserve(long size)
    Prepare internal structure for loading the requested number of samples.
    long
    size()
    Returns the number of points that have been added to this TDigest.

    Methods inherited from class org.elasticsearch.tdigest.TDigest

    add, createAvlTreeDigest, createHybridDigest, createMergingDigest, createSortingDigest, setScaleFunction

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface org.apache.lucene.util.Accountable

    getChildResources
  • Method Details

    • ramBytesUsed

      public long ramBytesUsed()
    • add

      public void add(double x, long w)
      Description copied from class: TDigest
      Adds a sample to a histogram.
      Specified by:
      add in class TDigest
      Parameters:
      x - The value to add.
      w - The weight of this point.
    • add

      public void add(TDigest other)
      Description copied from class: TDigest
      Add all of the centroids of another TDigest to this one.
      Overrides:
      add in class AbstractTDigest
      Parameters:
      other - The other TDigest
    • reserve

      public void reserve(long size)
      Description copied from class: TDigest
      Prepare internal structure for loading the requested number of samples.
      Overrides:
      reserve in class TDigest
      Parameters:
      size - number of samples to be loaded
    • compress

      public void compress()
      Description copied from class: TDigest
      Re-examines a t-digest to determine whether some centroids are redundant. If your data are perversely ordered, this may be a good idea. Even if not, this may save 20% or so in space. The cost is roughly the same as adding as many data points as there are centroids. This is typically < 10 * compression, but could be as high as 100 * compression. This is a destructive operation that is not thread-safe.
      Specified by:
      compress in class TDigest
    • size

      public long size()
      Description copied from class: TDigest
      Returns the number of points that have been added to this TDigest.
      Specified by:
      size in class TDigest
      Returns:
      The sum of the weights on all centroids.
    • cdf

      public double cdf(double x)
      Description copied from class: TDigest
      Returns the fraction of all points added which are ≤ x. Points that are exactly equal to x get half credit (i.e. the mid-point rule is used).
      Specified by:
      cdf in class TDigest
      Parameters:
      x - The cutoff for the cdf.
      Returns:
      The fraction of all data which is less than or equal to x.
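The mid-point rule for ties can be made concrete with a small worked example. The helper below is a hypothetical, exact computation over raw samples (`MidpointCdf` is not part of the library); the digest implementations work on centroids and may treat values outside the observed range differently.

```java
import java.util.Arrays;

/** Illustrative only: exact CDF over raw samples using the mid-point rule for ties. */
final class MidpointCdf {
    /** Fraction of samples strictly below x, plus half of the samples exactly equal to x. */
    static double cdf(double[] samples, double x) {
        if (samples.length == 0) return Double.NaN;
        long below = Arrays.stream(samples).filter(v -> v < x).count();
        long equal = Arrays.stream(samples).filter(v -> v == x).count();
        return (below + 0.5 * equal) / samples.length;
    }

    public static void main(String[] args) {
        double[] samples = {1, 2, 2, 3};
        System.out.println(cdf(samples, 2));   // (1 + 0.5 * 2) / 4 = 0.5
        System.out.println(cdf(samples, 2.5)); // 3 / 4 = 0.75
    }
}
```

With samples {1, 2, 2, 3}, one sample lies below 2 and two are equal to it, so the two ties contribute one full point between them and the result is 0.5 rather than 0.75.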
    • quantile

      public double quantile(double q)
      Description copied from class: TDigest
      Returns an estimate of a cutoff such that a specified fraction of the data added to this TDigest would be less than or equal to the cutoff.
      Specified by:
      quantile in class TDigest
      Parameters:
      q - The desired fraction
      Returns:
      The smallest value x such that cdf(x) ≥ q
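To show how sample weights feed into a quantile cutoff, here is a minimal sketch that computes exact weighted quantiles. It is not a t-digest: `WeightedQuantile` is a hypothetical class, it keeps every distinct value, and it counts ties with plain "less than or equal" rather than the mid-point rule, so results can differ from the real implementations, which also interpolate between centroids.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: exact weighted quantiles over all distinct values, not a t-digest. */
final class WeightedQuantile {
    private final TreeMap<Double, Long> weights = new TreeMap<>(); // value -> total weight
    private long size; // sum of all weights added so far

    void add(double x, long w) {
        weights.merge(x, w, Long::sum);
        size += w;
    }

    long size() { return size; }

    /** Smallest stored value x whose cumulative weight (values <= x) reaches q * size. */
    double quantile(double q) {
        if (size == 0) return Double.NaN;
        double target = q * size;
        long cumulative = 0;
        for (Map.Entry<Double, Long> e : weights.entrySet()) {
            cumulative += e.getValue();
            if (cumulative >= target) return e.getKey();
        }
        return weights.lastKey();
    }

    public static void main(String[] args) {
        WeightedQuantile wq = new WeightedQuantile();
        wq.add(10.0, 1);
        wq.add(20.0, 3);
        wq.add(30.0, 1);
        System.out.println(wq.size());        // 5
        System.out.println(wq.quantile(0.5)); // 20.0
        System.out.println(wq.quantile(0.9)); // 30.0
    }
}
```

Adding 20.0 with weight 3 behaves like adding the value three times, which is why the total size is 5 and the median falls on 20.0.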
    • centroids

      public Collection<Centroid> centroids()
      Description copied from class: TDigest
      A Collection that lets you go through the centroids in ascending order by mean. Centroids returned will not be re-used, but may or may not share storage with this TDigest.
      Specified by:
      centroids in class TDigest
      Returns:
      The centroids in the form of a Collection.
    • compression

      public double compression()
      Description copied from class: TDigest
      Returns the current compression factor.
      Specified by:
      compression in class TDigest
      Returns:
      The compression factor originally used to set up the TDigest.
    • centroidCount

      public int centroidCount()
      Specified by:
      centroidCount in class TDigest
    • getMin

      public double getMin()
      Overrides:
      getMin in class TDigest
    • getMax

      public double getMax()
      Overrides:
      getMax in class TDigest
    • byteSize

      public int byteSize()
      Description copied from class: TDigest
      Returns the number of bytes required to encode this TDigest using #asBytes().
      Specified by:
      byteSize in class TDigest
      Returns:
      The number of bytes required.
    • close

      public void close()