- All Known Implementing Classes:
AbstractBooleansBlockLoader, AbstractBytesRefsFromOrdsBlockLoader, AbstractDoublesFromDocValuesBlockLoader, AbstractIntsFromDocValuesBlockLoader, AbstractLongsFromDocValuesBlockLoader, AbstractShapeGeometryFieldMapper.AbstractShapeGeometryFieldType.BoundsBlockLoader, BlockDocValuesReader.DocValuesBlockLoader, BlockLoader.Delegating, BlockSourceReader.BooleansBlockLoader, BlockSourceReader.BytesRefsBlockLoader, BlockSourceReader.DenseVectorBlockLoader, BlockSourceReader.DoublesBlockLoader, BlockSourceReader.GeometriesBlockLoader, BlockSourceReader.IntsBlockLoader, BlockSourceReader.IpsBlockLoader, BlockSourceReader.LongsBlockLoader, BlockStoredFieldsReader.BytesFromBytesRefsBlockLoader, BlockStoredFieldsReader.BytesFromStringsBlockLoader, BlockStoredFieldsReader.IdBlockLoader, BlockStoredFieldsReader.StoredFieldsBlockLoader, BooleansBlockLoader, BytesRefsFromBinaryBlockLoader, BytesRefsFromCustomBinaryBlockLoader, BytesRefsFromOrdsBlockLoader, DenseVectorBlockLoader, DenseVectorFromBinaryBlockLoader, DoublesBlockLoader, FallbackSyntheticSourceBlockLoader, IntsBlockLoader, LongsBlockLoader, MvMaxBooleansBlockLoader, MvMaxBytesRefsFromOrdsBlockLoader, MvMaxDoublesFromDocValuesBlockLoader, MvMaxIntsFromDocValuesBlockLoader, MvMaxLongsFromDocValuesBlockLoader, MvMinBooleansBlockLoader, MvMinBytesRefsFromOrdsBlockLoader, MvMinDoublesFromDocValuesBlockLoader, MvMinIntsFromDocValuesBlockLoader, MvMinLongsFromDocValuesBlockLoader, SourceFieldBlockLoader, TimeSeriesMetadataFieldBlockLoader, Utf8CodePointsFromOrdsBlockLoader
Think of a Block as an array of values for a sequence of lucene documents. That's
almost true! For the purposes of implementing BlockLoader, it's close enough.
The compute engine operates on arrays because the good folks that build CPUs have
spent the past 40 years making them really really good at running tight loops over
arrays of data. So we play along with the CPU and make arrays.
How to implement
There are a lot of interesting choices hiding in here to make getting those arrays out of lucene work well:
- doc_values are already on disk in array-like structures, so we prefer to just copy them into an array in one loop inside BlockLoader.ColumnAtATimeReader. Well, not entirely array-like. doc_values are designed to be read in non-descending order (think 0, 1, 1, 4, 9) and will fail if they are read truly randomly. This lets the doc values implementations have some chunking/compression/magic on top of the array-like on-disk structure. The caller manages this, always putting BlockLoader.Docs in non-descending order. Extend BlockDocValuesReader to implement all this.
- All stored fields for each document are stored on disk together, compressed with a general purpose compression algorithm like Zstd. Blocks of documents are compressed together to get a better compression ratio. Just like doc values, we read them in non-descending order. Unlike doc values, we read all fields for a document at once, because reading one requires decompressing them all. We do this by returning null from columnAtATimeReader(org.apache.lucene.index.LeafReaderContext) to signal that we can't load the whole column at once. Instead, we implement a BlockLoader.RowStrideReader which the caller will call once for each doc. Extend BlockStoredFieldsReader to implement all this.
- Fields loaded from _source are an extra special case of stored fields. _source itself is just another stored field, compressed in chunks with all the other stored fields. It's the original bytes sent when indexing the document. Think json or yaml. When we need fields from _source we get it from the stored fields reader infrastructure and then explode it into a Map representing the original json, and the BlockLoader.RowStrideReader implementation grabs the parts of the json it needs. Extend BlockSourceReader to implement all this.
- Synthetic _source complicates this further by storing fields in somewhat unexpected places, but is otherwise like a stored field reader. Use FallbackSyntheticSourceBlockLoader to implement all this.
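The doc_values pattern above, a bulk copy that only works in non-descending doc order, can be sketched in miniature. This is a hedged illustration: `ToyDocValues` and `loadColumn` are toy stand-ins invented for this example, not the real Lucene or `BlockDocValuesReader` APIs, but they show why the caller must keep `BlockLoader.Docs` sorted.

```java
// Hypothetical, simplified sketch of the doc_values reading pattern.
// Real doc values readers fail on backwards reads, so the caller must
// pass documents in non-descending order.
class DocValuesSketch {
    /** Toy stand-in for a doc-values source that only supports forward reads. */
    static class ToyDocValues {
        private final long[] values;
        private int lastDoc = -1;

        ToyDocValues(long[] values) {
            this.values = values;
        }

        long get(int docId) {
            if (docId < lastDoc) {
                // Mirrors the real constraint: truly random reads fail.
                throw new IllegalStateException("doc values must be read in non-descending order");
            }
            lastDoc = docId;
            return values[docId];
        }
    }

    /** Column-at-a-time load: one tight loop copying values into an array. */
    static long[] loadColumn(ToyDocValues dv, int[] docs) {
        long[] block = new long[docs.length];
        for (int i = 0; i < docs.length; i++) {
            block[i] = dv.get(docs[i]);
        }
        return block;
    }
}
```

Note that reading the same doc twice in a row (the `0, 1, 1, 4` shape from the text) is fine; only going backwards is not.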
How many to implement
Generally reads are faster from doc_values, slower from stored fields,
and even slower from _source. If we get to choose, we pick doc_values.
But we work with what's on disk, and that's a product of the field type and what the user
has configured. Picking the optimal choice given what's on disk is the responsibility of each
field's MappedFieldType.blockLoader(org.elasticsearch.index.mapper.MappedFieldType.BlockLoaderContext) method. The more configurable the field's
storage strategies, the more BlockLoaders you have to implement to integrate it
with ESQL. It can get to be a lot. Sorry.
For a field to be fully supported by ESQL it has to be loadable however it was configured to be
stored. It's possible to turn off storage entirely by disabling
doc_values, _source, and stored fields. In that case, it's
acceptable to return BlockLoader.ConstantNullsReader. The user turned the field off; the best we can do
is null.
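The preference order described above can be sketched as a small decision function. This is a hedged illustration: the enum, method, and boolean flags are invented for this sketch and are not the real `MappedFieldType.blockLoader` signature, which works from a `BlockLoaderContext` rather than three booleans.

```java
// Hedged sketch of the decision blockLoader() implementations make:
// prefer doc_values, then stored fields, then _source, and fall back
// to constant nulls when the user disabled all storage.
class LoaderChoiceSketch {
    enum Strategy { DOC_VALUES, STORED, SOURCE, CONSTANT_NULLS }

    static Strategy choose(boolean hasDocValues, boolean isStored, boolean hasSource) {
        if (hasDocValues) return Strategy.DOC_VALUES;   // fastest: array-like on disk
        if (isStored) return Strategy.STORED;           // slower: decompress per document
        if (hasSource) return Strategy.SOURCE;          // slowest: re-parse original json
        return Strategy.CONSTANT_NULLS;                 // nothing on disk: best we can do is null
    }
}
```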
We also sometimes want to "push" executing some ESQL functions into the block loader itself.
Usually we do this when it's a ton faster. See the docs for BlockLoaderExpression
for why and how we do this.
For example, long fields implement these block loaders:
- LongsBlockLoader to read from doc_values.
- BlockSourceReader.LongsBlockLoader to read from _source.
- A specially configured FallbackSyntheticSourceBlockLoader to read synthetic _source.
- MvMinLongsFromDocValuesBlockLoader to read MV_MIN(long_field) from doc_values.
- MvMaxLongsFromDocValuesBlockLoader to read MV_MAX(long_field) from doc_values.
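The pushed-down MV_MIN loaders in the list above can be sketched in miniature. This is a hedged illustration with invented types, not the real `MvMinLongsFromDocValuesBlockLoader`: it leans on the fact that multi-valued numeric doc values come back sorted, which is what makes the pushdown so much faster than loading the whole multi-value block and reducing it afterwards.

```java
// Hedged sketch of pushing MV_MIN into the loader itself: keep only the
// minimum per document instead of materializing every value.
class MvMinPushdownSketch {
    /**
     * values[doc] holds that document's values. Doc values store each
     * document's values in sorted order, so the minimum is the first one.
     */
    static long[] loadMvMin(long[][] values, int[] docs) {
        long[] block = new long[docs.length];
        for (int i = 0; i < docs.length; i++) {
            // Sorted values make MV_MIN free: take the first entry.
            block[i] = values[docs[i]][0];
        }
        return block;
    }
}
```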
NOTE: We can't read longs from stored fields, which is a
bug, but maybe not
a terrible one because it's very uncommon to configure long to be stored
but to disable _source and doc_values. Nothing's perfect. Especially
code.
Why is there a BlockLoader.AllReader?
When we described how to read from doc_values we said we prefer
to use BlockLoader.ColumnAtATimeReader. But some callers don't support reading column-at-a-time
and need to read row-by-row. So we also need an implementation of BlockLoader.RowStrideReader
that reads from doc_values. Usually it's most convenient to implement both of those
in the same class. BlockLoader.AllReader is an interface for those sorts of classes, and
you'll see it in the doc_values code frequently.
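The "both patterns in one class" idea can be sketched with toy interfaces. This is a hedged stand-in for `BlockLoader.AllReader`, not the real API: the point is that one object's doc-values state can back both the bulk column read and the per-document row read.

```java
// Hedged sketch of an AllReader-style class: the same state serves both
// column-at-a-time and row-stride access. Toy interfaces, not the real ones.
interface ColumnReaderSketch {
    long[] readColumn(int[] docs);
}

interface RowReaderSketch {
    long readRow(int docId);
}

class AllReaderSketch implements ColumnReaderSketch, RowReaderSketch {
    private final long[] docValues;

    AllReaderSketch(long[] docValues) {
        this.docValues = docValues;
    }

    @Override
    public long readRow(int docId) {
        return docValues[docId];
    }

    @Override
    public long[] readColumn(int[] docs) {
        long[] block = new long[docs.length];
        for (int i = 0; i < docs.length; i++) {
            block[i] = readRow(docs[i]);  // one implementation, two access patterns
        }
        return block;
    }
}
```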
Why is there a rowStrideStoredFieldSpec()?
When decompressing stored fields lucene can skip stored fields that aren't used. They
still have to be decompressed, but they aren't turned into java objects, which saves a fair bit
of work. If you don't need any stored fields, return StoredFieldsSpec.NO_REQUIREMENTS.
Otherwise, return what you need.
Thread safety
Instances of this class must be immutable and thread safe. Instances of
BlockLoader.ColumnAtATimeReader and BlockLoader.RowStrideReader are all mutable and can only
be accessed by one thread at a time, but may be passed between threads.
See implementations of BlockLoader.Reader.canReuse(int) for how that's handled. "Normal" java objects
don't need to do anything special to be kicked from thread to thread - the transfer itself
establishes a happens-before relationship that makes everything you need visible.
But Lucene's readers aren't "normal" java objects and sometimes need to be rebuilt if we
shift threads.
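The canReuse idea can be sketched as follows. This is a hedged, simplified stand-in for `BlockLoader.Reader.canReuse(int)`, not the real contract: the toy reader tracks the last doc it read, and a caller asks whether it can keep using the reader for a batch starting at a given doc id before deciding to rebuild it.

```java
// Hedged sketch of a canReuse-style check: a forward-only reader can be
// reused only if the next batch doesn't require reading backwards.
class ReuseSketch {
    static class ToyRowReader {
        private int lastDoc = -1;

        /** True if this reader can serve a batch starting at startingDocId. */
        boolean canReuse(int startingDocId) {
            return startingDocId >= lastDoc;
        }

        long read(long[] values, int docId) {
            lastDoc = docId;
            return values[docId];
        }
    }
}
```

A caller that shifts work between threads (or jumps backwards in doc id) checks `canReuse` and builds a fresh reader when it returns false.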
-
Nested Class Summary
- A columnar representation of homogenous data.
- Builds block "builders" for loading data into blocks for the compute engine.
- A builder for typed values.
- Implementation of BlockLoader.ColumnAtATimeReader and BlockLoader.RowStrideReader that always loads null.
- A list of documents to load.
- An interface for readers that attempt to load all document values in a column-at-a-time fashion.
- Specialized builder for collecting dense arrays of BytesRef values.
- Specialized builder for collecting dense arrays of double values.
- Specialized builder for collecting dense arrays of double values.
- Specialized builder for collecting dense arrays of long values.
-
Field Summary
Fields -
Method Summary
- builder(BlockLoader.BlockFactory factory, int expectedCount): The BlockLoader.Builder for data of this type.
- columnAtATimeReader(org.apache.lucene.index.LeafReaderContext context): Build a column-at-a-time reader.
- static BlockLoader constantBytes(org.apache.lucene.util.BytesRef value): Load blocks with only value.
- default BlockLoader.Block convert(BlockLoader.Block block): In support of 'Union Types', we sometimes desire that Blocks loaded from source are immediately converted in some way.
- org.apache.lucene.index.SortedSetDocValues ordinals(org.apache.lucene.index.LeafReaderContext context): Load ordinals for the provided context.
- rowStrideReader(org.apache.lucene.index.LeafReaderContext context): Build a row-by-row reader.
- StoredFieldsSpec rowStrideStoredFieldSpec(): What stored fields are needed by this reader.
- boolean supportsOrdinals(): Does this loader support loading bytes via calling ordinals(org.apache.lucene.index.LeafReaderContext).
-
Field Details
-
CONSTANT_NULLS
Load blocks with only null.
-
-
Method Details
-
builder
The BlockLoader.Builder for data of this type. Called when loading from a multi-segment or unsorted block. -
columnAtATimeReader
@Nullable BlockLoader.ColumnAtATimeReader columnAtATimeReader(org.apache.lucene.index.LeafReaderContext context) throws IOException
Build a column-at-a-time reader. May return null if the underlying storage needs to be loaded row-by-row. Callers should try this first, only falling back to rowStrideReader(org.apache.lucene.index.LeafReaderContext) if this returns null or if they can't load column-at-a-time themselves.
- Throws:
IOException
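The calling pattern this contract implies can be sketched with toy types. This is a hedged illustration, not the real caller code: try the column-at-a-time reader first, and fall back to one row-stride read per document when it is null.

```java
// Hedged sketch of the caller-side fallback: prefer the bulk column read,
// fall back to row-stride reads when the loader returned null.
class FallbackSketch {
    interface ColumnReader {
        long[] read(int[] docs);
    }

    interface RowReader {
        long read(int docId);
    }

    static long[] load(ColumnReader column, RowReader row, int[] docs) {
        if (column != null) {
            return column.read(docs);        // preferred: one tight loop
        }
        long[] block = new long[docs.length];
        for (int i = 0; i < docs.length; i++) {
            block[i] = row.read(docs[i]);    // fallback: one call per document
        }
        return block;
    }
}
```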
-
rowStrideReader
BlockLoader.RowStrideReader rowStrideReader(org.apache.lucene.index.LeafReaderContext context) throws IOException
Build a row-by-row reader. Must never return null, even if the underlying storage prefers to be loaded column-at-a-time. Some callers simply can't load column-at-a-time, so all implementations must support this method.
- Throws:
IOException
-
rowStrideStoredFieldSpec
StoredFieldsSpec rowStrideStoredFieldSpec()
What stored fields are needed by this reader. -
supportsOrdinals
boolean supportsOrdinals()
Does this loader support loading bytes via calling ordinals(org.apache.lucene.index.LeafReaderContext). -
ordinals
org.apache.lucene.index.SortedSetDocValues ordinals(org.apache.lucene.index.LeafReaderContext context) throws IOException
Load ordinals for the provided context.
- Throws:
IOException
-
convert
In support of 'Union Types', we sometimes desire that Blocks loaded from source are immediately converted in some way. Typically, this would be a type conversion, or an encoding conversion.
- Parameters:
block - original block loaded from source
- Returns:
converted block (or original if no conversion required)
-
constantBytes
Load blocks with only value.
-