Class SpatialDocValuesExtraction


This rule is responsible for marking spatial fields to be extracted from doc-values instead of source values. This is a very specific optimization that is only used in the context of spatial aggregations. Normally spatial fields are extracted from source values because this maintains the original precision, but it is very slow. Simply loading from doc-values loses precision for points, and loses the geometry topological information for shapes. For this reason we only consider loading from doc-values under very specific conditions:
  • The spatial data is consumed by a spatial aggregation (e.g. ST_CENTROID_AGG), negating the need for precision.
  • This aggregation is planned to run on the data node, so the doc-values Blocks are never transmitted to the coordinator node.
  • The data node index in question has doc-values stored for the field in question.
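The three conditions above amount to a simple conjunction. The following is a minimal sketch of that check, not the rule's actual code: the class DocValuesEligibility, the record FieldUsage and the method canUseDocValues are hypothetical names introduced only for illustration.

```java
import java.util.Set;

// Hypothetical sketch: none of these names come from the ES|QL code base.
final class DocValuesEligibility {

    // How one spatial field is used in the plan, as seen on a single data node.
    record FieldUsage(String name, boolean consumedBySpatialAggregation, boolean aggregationRunsOnDataNode) {}

    // Doc-values are only considered when all three conditions hold.
    static boolean canUseDocValues(FieldUsage usage, Set<String> fieldsWithDocValues) {
        return usage.consumedBySpatialAggregation()               // precision is not needed
            && usage.aggregationRunsOnDataNode()                  // blocks never leave the data node
            && fieldsWithDocValues.contains(usage.name());        // the local index stores doc-values
    }

    public static void main(String[] args) {
        FieldUsage location = new FieldUsage("location", true, true);
        System.out.println(canUseDocValues(location, Set.of("location")));
        System.out.println(canUseDocValues(location, Set.of()));
    }
}
```

The set of fields with doc-values is passed in separately because, in this sketch, it stands for the local index configuration, which can differ from one data node to another.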
While we do not support transmitting spatial doc-values to the coordinator node, it is still important on the data node to ensure that all spatial functions that will receive these doc-values are aware of this fact. For this reason, if the above conditions are met, we need to make four edits to the local physical plan to consistently support spatial doc-values:
  • The spatial aggregation function itself is marked using withDocValues() to enable its toEvaluator() method to produce the correct doc-values aware Evaluator functions.
  • Any spatial functions called within EVAL commands before the doc-values are consumed by the aggregation also need to be marked using withDocValues() so their evaluators are correct.
  • Any spatial functions used within filters (WHERE commands) are similarly marked for the same reason.
  • The FieldExtractExec that will extract the field is marked with withDocValuesAttributes(...) so that it calls the FieldType.blockReader() method with the correct FieldExtractPreference.
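The four edits can be illustrated on a toy plan. This is a simplified sketch under stated assumptions: the records SpatialFn, FieldExtract and LocalPlan are hypothetical stand-ins for the real plan nodes, and only the method names withDocValues() and withDocValuesAttributes(...) are taken from the description above. Their real signatures may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of a local physical plan; not the real ES|QL classes.
final class DocValuesMarkingSketch {

    // Stand-in for a spatial function (or aggregation) reading one field.
    record SpatialFn(String name, String field, boolean docValues) {
        SpatialFn withDocValues() { return new SpatialFn(name, field, true); }
    }

    // Stand-in for the field extraction node.
    record FieldExtract(Set<String> fields, Set<String> docValuesFields) {
        FieldExtract withDocValuesAttributes(Set<String> marked) { return new FieldExtract(fields, marked); }
    }

    record LocalPlan(FieldExtract extract, List<SpatialFn> filters, List<SpatialFn> evals, SpatialFn aggregation) {}

    // Applies all four edits together, so every consumer of the field agrees on its encoding.
    static LocalPlan markDocValues(LocalPlan plan, String field) {
        return new LocalPlan(
            plan.extract().withDocValuesAttributes(Set.of(field)), // edit 4: extraction
            markAll(plan.filters(), field),                        // edit 3: WHERE
            markAll(plan.evals(), field),                          // edit 2: EVAL
            plan.aggregation().withDocValues()                     // edit 1: aggregation
        );
    }

    // Marks only the functions that read the given field.
    private static List<SpatialFn> markAll(List<SpatialFn> fns, String field) {
        List<SpatialFn> marked = new ArrayList<>();
        for (SpatialFn fn : fns) {
            marked.add(fn.field().equals(field) ? fn.withDocValues() : fn);
        }
        return marked;
    }

    public static void main(String[] args) {
        LocalPlan plan = new LocalPlan(
            new FieldExtract(Set.of("location"), Set.of()),
            List.of(new SpatialFn("ST_DISTANCE", "location", false)),
            List.of(new SpatialFn("ST_X", "location", false)),
            new SpatialFn("ST_CENTROID_AGG", "location", false)
        );
        System.out.println(markDocValues(plan, "location"));
    }
}
```

The point of the sketch is that the marking is all-or-nothing for a given field: if the extraction were switched to doc-values while one of the consuming functions was left unmarked, that function would misinterpret the values it receives.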
The question has been raised why the spatial functions need to know if they are using doc-values or not. At first glance one might perceive ES|QL functions as being logical-planning-only constructs, reflecting only the intent of the user. This, however, is not true. The ES|QL functions all contain the runtime implementation of the function's behaviour, in the form of one or more static methods, as well as a toEvaluator() instance method that is used to generate Block traversal code to call these runtime implementations, based on some internal state of the instance of the function. In most cases this internal state contains information determined during the logical planning phase, such as the field name and type, and whether it is a literal and can be folded. In the case of spatial functions, the internal state also contains information about whether the function is using doc-values or not. This knowledge is determined in the class being described here, and is only determined during local physical planning on each data node. This is because the decision to use doc-values is based on the local data node's index configuration, and the local physical plan is the only place where this information is available. This also means that the knowledge of the usage of doc-values does not need to be serialized between nodes, and is only used locally.
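To illustrate why the flag has to live inside the function instance, here is a minimal sketch of a function whose toEvaluator() picks a different static runtime implementation depending on that state. Everything here is an assumption made for illustration: the class ExtractX is hypothetical, and the packed-float encoding used for the "doc-values" form is invented for this example and is not the encoding Elasticsearch uses.

```java
import java.util.function.ToDoubleFunction;

// Hypothetical sketch; not an actual ES|QL function.
final class SpatialFunctionSketch {

    // A function returning the x coordinate of a point.
    static final class ExtractX {
        private final boolean docValues; // internal state, set during local physical planning

        ExtractX(boolean docValues) { this.docValues = docValues; }

        // Returns a copy that expects doc-values input.
        ExtractX withDocValues() { return new ExtractX(true); }

        // Runtime implementation for source values: full-precision {x, y}.
        static double fromSource(double[] point) { return point[0]; }

        // Runtime implementation for the made-up doc-values form: x in the high 32 bits.
        static double fromDocValues(long encoded) { return Float.intBitsToFloat((int) (encoded >>> 32)); }

        // The evaluator is chosen from the instance state, not from the input.
        ToDoubleFunction<Object> toEvaluator() {
            return docValues
                ? value -> fromDocValues((Long) value)
                : value -> fromSource((double[]) value);
        }
    }

    // Made-up lossy encoding: both coordinates truncated to float and packed into one long.
    static long encode(double x, double y) {
        return ((long) Float.floatToIntBits((float) x) << 32) | (Float.floatToIntBits((float) y) & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        ExtractX fn = new ExtractX(false);
        System.out.println(fn.toEvaluator().applyAsDouble(new double[] { 12.5, -3.25 }));
        System.out.println(fn.withDocValues().toEvaluator().applyAsDouble(encode(12.5, -3.25)));
    }
}
```

In this sketch an unmarked instance handed the encoded long would fail with a ClassCastException, which is the toy equivalent of an evaluator reading a Block in the wrong format. Because the flag is set on each data node after it has inspected its own index, nothing about it needs to travel with the serialized plan.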