- All Implemented Interfaces:
Closeable, AutoCloseable, org.apache.lucene.util.Accountable, Releasable
This can be very fast because the cost of sorting and merging is amortized over several insertions. If we keep N centroids total and the input buffer is k elements long, then the amortized cost is something like
N/k + log k
These costs even out when N/k = log k. Balancing costs is often a good place to start when optimizing an algorithm. For different values of the compression factor, the following table shows estimated asymptotic values of N and suggested values of k:
| Compression | N | k |
|---|---|---|
| 50 | 78 | 25 |
| 100 | 157 | 42 |
| 200 | 314 | 73 |
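Reading log as the natural logarithm, the middle row gives a convenient sanity check: with compression 100, N/k ≈ 157/42 ≈ 3.7 and ln 42 ≈ 3.7, so the two terms of the amortized cost are roughly balanced.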
The virtues of this kind of t-digest implementation include:
- No allocation is required after initialization
- The data structure automatically compresses existing centroids when possible
- No Java object overhead is incurred for centroids since data is kept in primitive arrays
The current implementation takes the liberty of using ping-pong buffers to implement the merge, which incurs a substantial memory penalty, but the complexity of an in-place merge was not considered worthwhile: even with this overhead, the memory cost is less than 40 bytes per centroid, much less than half of what the AVLTreeDigest uses, and no dynamic allocation is required at all.
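The following is a minimal sketch of that layout idea, not the actual MergingDigest code (the class and field names here are hypothetical): centroid means and weights live in parallel primitive arrays, and a merge writes into a spare pair of buffers that is then swapped with the active pair.

```java
// Illustrative sketch only; names and structure are hypothetical, not the
// actual org.elasticsearch.tdigest.MergingDigest fields.
class PrimitiveCentroidBuffers {
    // Active centroids: parallel primitive arrays, no per-centroid objects.
    double[] mean;
    double[] weight;
    int count;

    // Spare ("ping-pong") buffers that the merge writes into.
    double[] spareMean;
    double[] spareWeight;

    PrimitiveCentroidBuffers(int capacity) {
        mean = new double[capacity];
        weight = new double[capacity];
        spareMean = new double[capacity];
        spareWeight = new double[capacity];
    }

    // After merging sorted input into the spare buffers, swap the roles of the
    // two buffer pairs instead of copying or merging in place.
    void swapBuffers(int mergedCount) {
        double[] tm = mean;   mean = spareMean;     spareMean = tm;
        double[] tw = weight; weight = spareWeight; spareWeight = tw;
        count = mergedCount;
    }
}
```

Two double arrays in each of the two buffer pairs come to 32 bytes per centroid slot (4 × 8 bytes), which is consistent with the sub-40-byte figure quoted above once input buffering and bookkeeping are ignored.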
-
Field Summary
| Modifier and Type | Field | Description |
|---|---|---|
| boolean | useAlternatingSort | |
| boolean | useTwoLevelCompression | |
| static final boolean | useWeightLimit | |

Fields inherited from interface org.apache.lucene.util.Accountable:
NULL_ACCOUNTABLE
-
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| void | add(double x, long w) | Adds a sample to a histogram. |
| int | byteSize() | Returns the number of bytes required to encode this TDigest using asBytes(). |
| double | cdf(double x) | Returns the fraction of all points added which are ≤ x. |
| int | centroidCount() | |
| Collection<Centroid> | centroids() | A Collection that lets you go through the centroids in ascending order by mean. |
| void | close() | |
| void | compress() | Merges any pending inputs and compresses the data down to the public setting. |
| double | compression() | Returns the current compression factor. |
| ScaleFunction | getScaleFunction() | |
| double | quantile(double q) | Returns an estimate of a cutoff such that a specified fraction of the data added to this TDigest would be less than or equal to the cutoff. |
| long | ramBytesUsed() | |
| void | setScaleFunction(ScaleFunction scaleFunction) | |
| long | size() | Returns the number of points that have been added to this TDigest. |
| String | toString() | |

Methods inherited from class org.elasticsearch.tdigest.AbstractTDigest:
add
Methods inherited from class org.elasticsearch.tdigest.TDigest:
add, createAvlTreeDigest, createHybridDigest, createMergingDigest, createSortingDigest, getMax, getMin, reserve
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.apache.lucene.util.Accountable:
getChildResources
-
Field Details
-
useAlternatingSort
public boolean useAlternatingSort -
useTwoLevelCompression
public boolean useTwoLevelCompression -
useWeightLimit
public static final boolean useWeightLimit
- See Also:
-
-
Method Details
-
ramBytesUsed
public long ramBytesUsed() -
add
public void add(double x, long w)
Description copied from class: TDigest
Adds a sample to a histogram.
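For orientation, here is a hypothetical usage sketch built only from methods documented on this page; how the digest itself is constructed (for example via the createMergingDigest factories listed above) is not shown here, so the instance is taken as a parameter.

```java
import org.elasticsearch.tdigest.TDigest;

// Hypothetical usage sketch; the digest is assumed to be already constructed.
class DigestExample {
    static void summarize(TDigest digest, double[] samples) {
        for (double x : samples) {
            digest.add(x, 1);                     // add each sample with weight 1
        }
        digest.compress();                        // fold buffered points into centroids
        double median = digest.quantile(0.5);     // estimated median
        double below10 = digest.cdf(10.0);        // estimated fraction of samples <= 10
        System.out.printf("n=%d median=%.3f P(x<=10)=%.3f%n",
                digest.size(), median, below10);
    }
}
```
-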
compress
public void compress()
Merges any pending inputs and compresses the data down to the public setting. Note that this typically loses a bit of precision and thus isn't a thing to be doing all the time. It is best done only when we want to show results to the outside world.
-
size
public long size()
Description copied from class: TDigest
Returns the number of points that have been added to this TDigest.
-
cdf
public double cdf(double x)
Description copied from class: TDigest
Returns the fraction of all points added which are ≤ x. Points that are exactly equal get half credit (i.e. we use the mid-point rule).
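As a rough illustration of the mid-point rule (the exact value also depends on the digest's interpolation): if the values 1, 2 and 3 have each been added once with weight 1, cdf(2) counts the point at 1 fully and the point at exactly 2 for half, giving approximately 1.5/3 = 0.5.
-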
quantile
public double quantile(double q)
Description copied from class: TDigest
Returns an estimate of a cutoff such that a specified fraction of the data added to this TDigest would be less than or equal to the cutoff.
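Informally, quantile is the approximate inverse of cdf: quantile(0.5) estimates the median of the added data, and for reasonably smooth data quantile(cdf(x)) ≈ x up to the approximation error of the digest.
-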
centroidCount
public int centroidCount()
- Specified by:
centroidCount in class TDigest
-
centroids
Description copied from class: TDigest
A Collection that lets you go through the centroids in ascending order by mean. Centroids returned will not be re-used, but may or may not share storage with this TDigest.
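A hedged illustration of walking that collection follows; the Centroid accessors mean() and count() are assumed to match the upstream t-digest API and are not documented on this page.

```java
import org.elasticsearch.tdigest.Centroid;
import org.elasticsearch.tdigest.TDigest;

// Sketch: walk centroids in ascending order by mean and total their weights.
// Assumes Centroid exposes mean() and count() as in the upstream t-digest library.
class CentroidWalk {
    static long totalWeight(TDigest digest) {
        long total = 0;
        for (Centroid c : digest.centroids()) {
            System.out.printf("mean=%.3f count=%d%n", c.mean(), c.count());
            total += c.count();
        }
        return total;              // should match digest.size()
    }
}
```
-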
compression
public double compression()
Description copied from class: TDigest
Returns the current compression factor.
- Specified by:
compression in class TDigest
- Returns:
The compression factor originally used to set up the TDigest.
-
getScaleFunction
-
setScaleFunction
- Overrides:
setScaleFunction in class TDigest
-
byteSize
public int byteSize()
Description copied from class: TDigest
Returns the number of bytes required to encode this TDigest using asBytes().
-
toString
-
close
public void close()
-