Module org.elasticsearch.compute
Class MultivalueDedupe
java.lang.Object
org.elasticsearch.compute.operator.mvdedupe.MultivalueDedupe
Utilities to remove duplicates from multivalued fields.
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static final record | | Result of calling "hash" on a multivalue dedupe.
Method Summary
Modifier and Type | Method | Description
static BatchEncoder | batchEncoder(Block block, int batchSize, boolean allowDirectEncoder) | Build a BatchEncoder which deduplicates values at each position and then encodes the results into a byte[], which can be used for things like hashing many fields together.
static Block | dedupeToBlockAdaptive(Block block, BlockFactory blockFactory) | Remove duplicate values from each position and write the results to a Block using an adaptive algorithm based on the size of the input list.
static Block | dedupeToBlockUsingCopyAndSort(Block block, BlockFactory blockFactory) | Remove duplicate values from each position and write the results to a Block using an algorithm that sorts all values.
static Block | dedupeToBlockUsingCopyMissing(Block block, BlockFactory blockFactory) | Remove duplicate values from each position and write the results to a Block using an algorithm with very low overhead but n^2 complexity for positions with many values.
static EvalOperator.ExpressionEvaluator.Factory | evaluator(ElementType elementType, EvalOperator.ExpressionEvaluator.Factory field) | Build an EvalOperator.ExpressionEvaluator that deduplicates values using an adaptive algorithm based on the size of the input list.
Method Details
dedupeToBlockAdaptive
Remove duplicate values from each position and write the results to a Block using an adaptive algorithm based on the size of the input list.
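The adaptive idea can be sketched in plain Java, assuming one position's values arrive as a long[] instead of a Block. This is not the Elasticsearch implementation: the class name AdaptiveDedupe and the cut-over constant SMALL_POSITION_THRESHOLD are hypothetical, and the real selection rule is not documented on this page. Small positions take the low-overhead linear-scan path; larger ones take the sort path.

```java
import java.util.Arrays;

// Hypothetical sketch: not the Elasticsearch implementation.
final class AdaptiveDedupe {
    private AdaptiveDedupe() {}

    // Hypothetical cut-over point between the two strategies.
    static final int SMALL_POSITION_THRESHOLD = 16;

    /** Picks a dedupe strategy based on how many values the position holds. */
    static long[] dedupe(long[] values) {
        return values.length <= SMALL_POSITION_THRESHOLD ? copyMissing(values) : copyAndSort(values);
    }

    /** O(n^2) linear-scan dedupe; keeps values in first-seen order. */
    static long[] copyMissing(long[] values) {
        long[] out = new long[values.length];
        int count = 0;
        outer:
        for (long v : values) {
            for (int i = 0; i < count; i++) {
                if (out[i] == v) {
                    continue outer; // already copied, skip it
                }
            }
            out[count++] = v;
        }
        return Arrays.copyOf(out, count);
    }

    /** O(n log n) dedupe: sort a copy, then drop adjacent duplicates. */
    static long[] copyAndSort(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int count = 0;
        for (long v : sorted) {
            if (count == 0 || sorted[count - 1] != v) {
                sorted[count++] = v; // compact in place; count never passes the read index
            }
        }
        return Arrays.copyOf(sorted, count);
    }
}
```

In this sketch the output order depends on which path runs: first-seen order for small positions, ascending order for large ones. That is one reason a caller who needs sorted results would call the sort-based method directly, as the description of dedupeToBlockUsingCopyAndSort advises.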
dedupeToBlockUsingCopyMissing
Remove duplicate values from each position and write the results to a Block using an algorithm with very low overhead but n^2 complexity for positions with many values. Prefer dedupeToBlockAdaptive(org.elasticsearch.compute.data.Block, org.elasticsearch.compute.data.BlockFactory), which picks an algorithm based on the number of elements at each position.
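A minimal sketch of a "copy missing" dedupe, assuming a single position's values are given as a long[]. The class and method names are hypothetical and this is not the library's code. Each value is copied to the output only if a linear scan of what has already been copied does not find it, so there is no sort and no extra structure, but the cost grows as n^2 with the number of values.

```java
import java.util.Arrays;

// Hypothetical sketch of the copy-missing technique for one position.
final class CopyMissingDedupe {
    private CopyMissingDedupe() {}

    /** Returns the distinct values of {@code values}, in first-seen order. */
    static long[] dedupe(long[] values) {
        long[] out = new long[values.length];
        int count = 0;
        for (long v : values) {
            boolean seen = false;
            // Linear search over everything copied so far: O(count) per value.
            for (int i = 0; i < count; i++) {
                if (out[i] == v) {
                    seen = true;
                    break;
                }
            }
            if (!seen) {
                out[count++] = v; // copy only the values that are still missing
            }
        }
        return Arrays.copyOf(out, count);
    }
}
```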
dedupeToBlockUsingCopyAndSort
Remove duplicate values from each position and write the results to a Block using an algorithm that sorts all values. It has higher overhead than dedupeToBlockUsingCopyMissing(org.elasticsearch.compute.data.Block, org.elasticsearch.compute.data.BlockFactory) for small numbers of values at each position; for large numbers of values its performance is dominated by the n*log n sort. Prefer dedupeToBlockAdaptive(org.elasticsearch.compute.data.Block, org.elasticsearch.compute.data.BlockFactory) unless you need the results sorted.
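The sort-based approach can be sketched as follows, again assuming one position's values as a long[] and using hypothetical names rather than the library's. After sorting, duplicates sit next to each other, so a single pass that keeps a value only when it differs from the last kept value removes them, and the output comes out sorted as a side effect.

```java
import java.util.Arrays;

// Hypothetical sketch of the copy-and-sort technique for one position.
final class CopyAndSortDedupe {
    private CopyAndSortDedupe() {}

    /** Returns the distinct values of {@code values}, sorted ascending. */
    static long[] dedupe(long[] values) {
        // Copy so the caller's array is untouched, then sort: O(n log n).
        long[] sorted = Arrays.copyOf(values, values.length);
        Arrays.sort(sorted);
        int count = 0;
        for (int i = 0; i < sorted.length; i++) {
            // Duplicates are adjacent now; keep a value only if it differs
            // from the last one kept. Writes never overtake the read index.
            if (count == 0 || sorted[i] != sorted[count - 1]) {
                sorted[count++] = sorted[i];
            }
        }
        return Arrays.copyOf(sorted, count);
    }
}
```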
evaluator
public static EvalOperator.ExpressionEvaluator.Factory evaluator(ElementType elementType, EvalOperator.ExpressionEvaluator.Factory field)
Build an EvalOperator.ExpressionEvaluator that deduplicates values using an adaptive algorithm based on the size of the input list.
batchEncoder
Build a BatchEncoder which deduplicates values at each position and then encodes the results into a byte[], which can be used for things like hashing many fields together.
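The "dedupe, then encode to bytes" idea can be sketched with java.nio.ByteBuffer. Everything here is an assumption for illustration: the class name, the layout (an int count followed by each distinct long, big-endian), and the choice to sort so that equal value sets always encode identically. The real BatchEncoder's format, its batchSize handling, and allowDirectEncoder are not modeled. The count prefix keeps the concatenated fields unambiguous, so two rows hash equally only when every field holds the same distinct values.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch; not the layout used by the real BatchEncoder.
final class DedupeEncoderSketch {
    private DedupeEncoderSketch() {}

    /** Deduplicates one position's values and encodes them as [int count][long value]... */
    static byte[] encodePosition(long[] values) {
        long[] distinct = values.clone();
        Arrays.sort(distinct); // canonical order, so {2,1,2} and {1,2} encode the same
        int count = 0;
        for (long v : distinct) {
            if (count == 0 || distinct[count - 1] != v) {
                distinct[count++] = v;
            }
        }
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + count * Long.BYTES);
        buf.putInt(count); // length prefix keeps concatenated fields unambiguous
        for (int i = 0; i < count; i++) {
            buf.putLong(distinct[i]);
        }
        return buf.array();
    }

    /** Concatenates the encodings of several fields at one position, ready for hashing. */
    static byte[] encodeRow(long[]... fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (long[] field : fields) {
            byte[] encoded = encodePosition(field);
            out.write(encoded, 0, encoded.length);
        }
        return out.toByteArray();
    }
}
```

For example, Arrays.hashCode(DedupeEncoderSketch.encodeRow(new long[]{1, 1}, new long[]{5})) equals the hash of encodeRow(new long[]{1}, new long[]{5}), because duplicates are removed before encoding.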