Class BlockHash
- All Implemented Interfaces:
Closeable,AutoCloseable,SeenGroupIds,org.elasticsearch.core.Releasable
- Direct Known Subclasses:
CategorizeBlockHash,CategorizePackedValuesBlockHash,TimeSeriesBlockHash
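Specialized hash table implementations that map rows to the set of bucket ids they belong to, used to implement GROUP BY expressions.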
A row is always in at least one bucket, so the results are never null.
Key columns with null values will map to some integer bucket id.
If none of the key columns are multivalued then the output is always an
IntVector. If any of the keys are multivalued then a row is
in a bucket for each value. If more than one key is multivalued then
the row is in the combinatorial explosion of all value combinations.
Luckily for the number of values, a row can only be in each bucket once.
Unluckily, it's the responsibility of BlockHash to remove those
duplicates.
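The bucket semantics above can be illustrated with a minimal, self-contained sketch. It is not the BlockHash implementation: MultivaluedBuckets and bucketsForRow are hypothetical names, a plain map stands in for the real hash, keys are strings, and null or empty key columns are not modeled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of org.elasticsearch.compute.
public class MultivaluedBuckets {
    // Stand-in for a hash table: assigns dense ids to key tuples in first-seen order.
    private final Map<List<String>, Integer> ids = new LinkedHashMap<>();

    /** Returns the distinct bucket ids for one row. Each key column may hold several values. */
    public Set<Integer> bucketsForRow(List<List<String>> keyColumns) {
        // Build every combination of one value per key column.
        List<List<String>> combos = new ArrayList<>();
        combos.add(new ArrayList<>());
        for (List<String> column : keyColumns) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : combos) {
                for (String value : column) {
                    List<String> combo = new ArrayList<>(prefix);
                    combo.add(value);
                    next.add(combo);
                }
            }
            combos = next;
        }
        // Map each combination to an id; the set removes duplicate buckets for this row.
        Set<Integer> buckets = new LinkedHashSet<>();
        for (List<String> combo : combos) {
            Integer id = ids.get(combo);
            if (id == null) {
                id = ids.size();
                ids.put(combo, id);
            }
            buckets.add(id);
        }
        return buckets;
    }

    public static void main(String[] args) {
        MultivaluedBuckets hash = new MultivaluedBuckets();
        // First key repeats "a"; the row still lands in each bucket only once.
        System.out.println(hash.bucketsForRow(List.of(List.of("a", "b", "a"), List.of("x", "y"))));
    }
}
```

Here the row has two distinct values in each key, so it belongs to four buckets even though the first key lists "a" twice.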
These classes typically delegate to some combination of BytesRefHash,
LongHash, LongLongHash, Int3Hash. They don't
technically have to be hash tables, so long as they
implement the deduplication semantics above and vend integer ids.
The integer ids are assigned to offsets into arrays of aggregation states,
so it's permissible to have gaps in the ints. But large gaps are a bad
idea because they'll waste space in the aggregations that use these
positions. For example, BooleanBlockHash assigns 0 to
null, 1 to false, and 2 to true
and that's fine and simple and good because it'll never
leave a big gap, even if we never see null.
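As a rough sketch of why small fixed ids are harmless, the following assumes null, false, and true take the three distinct ids 0, 1, and 2, and uses a plain long[] as a stand-in for an array of aggregation states. BooleanIds and groupId are hypothetical names, not the BooleanBlockHash API.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the BooleanBlockHash implementation.
public class BooleanIds {
    static final int NULL_ID = 0;
    static final int FALSE_ID = 1;
    static final int TRUE_ID = 2;

    /** Maps a nullable boolean key to its fixed group id. */
    static int groupId(Boolean key) {
        if (key == null) {
            return NULL_ID;
        }
        return key ? TRUE_ID : FALSE_ID;
    }

    /** Counts rows per group; the group id is the offset into the state array. */
    static long[] countPerGroup(Boolean[] keys) {
        long[] counts = new long[3]; // one aggregation state per possible id
        for (Boolean key : keys) {
            counts[groupId(key)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // No null key is ever seen, so slot 0 stays unused: a gap of one slot.
        System.out.println(Arrays.toString(countPerGroup(new Boolean[] { true, false, true })));
    }
}
```

The unused slot for null costs a single state, which is the kind of small gap the paragraph above calls acceptable.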
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.elasticsearch.compute.aggregation.SeenGroupIds
SeenGroupIds.Empty, SeenGroupIds.Range -
Field Summary
Fields -
Method Summary
Modifier and Type, Method, Description
- abstract void add(Page page, GroupingAggregatorFunction.AddInput addInput): Add all values for the "group by" columns in the page to the hash and pass the ordinals to the provided GroupingAggregatorFunction.AddInput.
- static BlockHash build(List<BlockHash.GroupSpec> groups, BlockFactory blockFactory, int emitBatchSize, boolean allowBrokenOptimizations): Creates a specialized hash table that maps one or more Blocks to ids.
- static BlockHash buildCategorizeBlockHash(List<BlockHash.GroupSpec> groups, AggregatorMode aggregatorMode, BlockFactory blockFactory, AnalysisRegistry analysisRegistry, int emitBatchSize): Builds a BlockHash for the Categorize grouping function.
- static BlockHash buildPackedValuesBlockHash(List<BlockHash.GroupSpec> groups, BlockFactory blockFactory, int emitBatchSize): Temporary method to build a PackedValuesBlockHash.
- abstract Block[] getKeys(): Returns a Block that contains all the keys that are inserted by add(org.elasticsearch.compute.data.Page, org.elasticsearch.compute.aggregation.GroupingAggregatorFunction.AddInput).
- static long hashOrdToGroup(long ord): Convert the result of calling LongHash or LongLongHash or BytesRefHash or similar to a group ordinal.
- static long hashOrdToGroupNullReserved(long ord): Convert the result of calling LongHash or LongLongHash or BytesRefHash or similar to a group ordinal, reserving 0 for null.
- abstract org.elasticsearch.core.ReleasableIterator<IntBlock> lookup(Page page, ByteSizeValue targetBlockSize): Lookup all values for the "group by" columns in the page to the hash and return an Iterator of the values.
- abstract IntVector nonEmpty(): The grouping ids that are not empty.
- abstract BitArray seenGroupIds(BigArrays bigArrays): The grouping ids that have been seen already.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.elasticsearch.core.Releasable
close
-
Field Details
-
blockFactory
-
-
Method Details
-
add
Add all values for the "group by" columns in the page to the hash and pass the ordinals to the provided GroupingAggregatorFunction.AddInput.
This call does not close addInput (see Releasable.close()). -
lookup
public abstract org.elasticsearch.core.ReleasableIterator<IntBlock> lookup(Page page, ByteSizeValue targetBlockSize)
Lookup all values for the "group by" columns in the page to the hash and return an Iterator of the values. The sum of Block.getPositionCount() for all blocks returned by the iterator will equal Page.getPositionCount() but will "target" a size of targetBlockSize.
The returned ReleasableIterator may retain a reference to Blocks inside the Page. Close it to release those references. -
getKeys
Returns a Block that contains all the keys that are inserted by add(org.elasticsearch.compute.data.Page, org.elasticsearch.compute.aggregation.GroupingAggregatorFunction.AddInput).
Keys must be in the same order as the IDs returned by nonEmpty(). -
nonEmpty
The grouping ids that are not empty. We use this because some block hashes reserve space for grouping ids and then don't end up using them. For example, BooleanBlockHash does this by always assigning false to 0 and true to 1. It's only after collection when we know if there actually were any true or false values received.
IDs must be in the same order as the keys returned by getKeys(). -
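The contract between nonEmpty() and getKeys() can be sketched as follows. This is an illustration under stated assumptions, not the real class: ReservedIds and its methods are hypothetical, the ids follow the example in this method's description (false is 0, true is 1), null keys are not modeled, and plain lists stand in for IntVector and Block.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the BooleanBlockHash implementation.
public class ReservedIds {
    // Both ids are reserved up front; seen[id] records whether the id was ever used.
    private final boolean[] seen = new boolean[2];

    /** Adds a key and returns its fixed id (false is 0, true is 1). */
    int add(boolean key) {
        int id = key ? 1 : 0;
        seen[id] = true;
        return id;
    }

    /** Ids that actually received a value, in ascending order. */
    List<Integer> nonEmpty() {
        List<Integer> ids = new ArrayList<>();
        for (int id = 0; id < seen.length; id++) {
            if (seen[id]) {
                ids.add(id);
            }
        }
        return ids;
    }

    /** Keys for the used ids, in the same order as nonEmpty(). */
    List<Boolean> keys() {
        List<Boolean> keys = new ArrayList<>();
        for (int id : nonEmpty()) {
            keys.add(id == 1);
        }
        return keys;
    }

    public static void main(String[] args) {
        ReservedIds hash = new ReservedIds();
        hash.add(true);
        hash.add(true);
        System.out.println(hash.nonEmpty() + " " + hash.keys());
    }
}
```

Deriving keys() from nonEmpty() is one simple way to guarantee that the two results line up position by position.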
seenGroupIds
Description copied from interface: SeenGroupIds
The grouping ids that have been seen already. This BitArray is kept and mutated by the caller, so make a copy if you need your own.
- Specified by:
seenGroupIds in interface SeenGroupIds
-
build
public static BlockHash build(List<BlockHash.GroupSpec> groups, BlockFactory blockFactory, int emitBatchSize, boolean allowBrokenOptimizations)
Creates a specialized hash table that maps one or more Blocks to ids.
- Parameters:
emitBatchSize - maximum batch size to be emitted when handling combinatorial explosion of groups caused by multivalued fields
allowBrokenOptimizations - true to allow optimizations with bad null handling. We will fix their null handling and remove this flag, but we need to disable these in production until we can. And this lets us continue to compile and test them.
-
buildPackedValuesBlockHash
public static BlockHash buildPackedValuesBlockHash(List<BlockHash.GroupSpec> groups, BlockFactory blockFactory, int emitBatchSize)
Temporary method to build a PackedValuesBlockHash. -
buildCategorizeBlockHash
public static BlockHash buildCategorizeBlockHash(List<BlockHash.GroupSpec> groups, AggregatorMode aggregatorMode, BlockFactory blockFactory, AnalysisRegistry analysisRegistry, int emitBatchSize)
Builds a BlockHash for the Categorize grouping function. -
hashOrdToGroup
public static long hashOrdToGroup(long ord)
Convert the result of calling LongHash or LongLongHash or BytesRefHash or similar to a group ordinal. These hashes return negative numbers if the value that was added has already been seen. We don't use that and convert it back to the positive ord. -
hashOrdToGroupNullReserved
public static long hashOrdToGroupNullReserved(long ord)
Convert the result of calling LongHash or LongLongHash or BytesRefHash or similar to a group ordinal, reserving 0 for null.
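The two conversions can be sketched as below. This is a guess at the shape of the logic, not the library source: it assumes the hash encodes an already-seen value with id n as -1 - n, and that reserving 0 for null amounts to shifting every ordinal up by one. TinyLongHash is a hypothetical stand-in for LongHash, and OrdToGroup is a hypothetical holder class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the BlockHash source.
public class OrdToGroup {
    /** Stand-in for LongHash. Assumption: add returns the new id, or -1 - id if already present. */
    static final class TinyLongHash {
        private final Map<Long, Long> ids = new HashMap<>();

        long add(long key) {
            Long existing = ids.get(key);
            if (existing != null) {
                return -1 - existing; // negative result signals "already seen"
            }
            long id = ids.size();
            ids.put(key, id);
            return id;
        }
    }

    /** Undoes the assumed negative encoding so callers always get the non-negative ordinal. */
    static long hashOrdToGroup(long ord) {
        return ord < 0 ? -1 - ord : ord;
    }

    /** Same conversion, shifted by one so that group 0 stays free for null. */
    static long hashOrdToGroupNullReserved(long ord) {
        return hashOrdToGroup(ord) + 1;
    }

    public static void main(String[] args) {
        TinyLongHash hash = new TinyLongHash();
        long first = hash.add(7);  // new value
        hash.add(9);               // another new value
        long again = hash.add(7);  // already seen, so the raw result is negative
        System.out.println(hashOrdToGroup(first) + " " + hashOrdToGroup(again));
        System.out.println(hashOrdToGroupNullReserved(again));
    }
}
```

With these assumptions, adding the same value twice yields the same group ordinal both times, which is what a grouping aggregation needs.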
-