Package org.elasticsearch.xpack.esql.expression.function.aggregate
Guide to adding new aggregate function
-
Aggregation functions are more complex than scalar functions, so it’s a good idea to discuss
the new function with the ESQL team before starting to implement it.
You may also discuss its implementation, as aggregations may require special performance considerations.
-
To learn the basics about making functions, check
org.elasticsearch.xpack.esql.expression.function.scalar.It has the guide to making a simple function, which should be a good base to start doing aggregations.
-
Pick one of the csv-spec files in
x-pack/plugin/esql/qa/testFixtures/src/main/resources/and add a test for the function you want to write. These files are roughly themed but there isn’t a strong guiding principle in the organization. -
Rerun the
CsvTestsand watch your new test fail. -
Find an aggregate function in this package similar to the one you are working on and copy it to build
yours.
Your function might extend from the available abstract classes. Check the javadoc of each before using them:
-
AggregateFunction: The base class for aggregates -
NumericAggregate: Aggregation for numeric values -
SpatialAggregateFunction: Aggregation for spatial values
-
-
Fill the required methods in your new function. Check their JavaDoc for more information.
Here are some of the important ones:
-
Constructor: Review the constructor annotations, and make sure to add the correct types and descriptions.
FunctionInfo, for the constructor itselfParam, for the function parameters
-
resolveType: Check the metadata of your function parameters. This may include types, whether they are foldable or not, or their possible values. -
dataType: This will return the datatype of your function. May be based on its current parameters. -
Implement
SurrogateExpression, and its requiredSurrogateExpression.surrogate()method.It’s used to be able to fold the aggregation when it receives only literals, or when the aggregation can be simplified.
ToAggregator(More information about aggregators below). The only case when this interface is not required is when it always returns another function in its surrogate. -
Constructor: Review the constructor annotations, and make sure to add the correct types and descriptions.
-
To introduce your aggregation to the engine:
-
Add it to
org.elasticsearch.xpack.esql.planner.AggregateMapper. Check all usages of other aggregations there, and replicate the logic. -
Implement serialization for your aggregation by implementing
NamedWriteable.getWriteableName(),Writeable.writeTo(org.elasticsearch.common.io.stream.StreamOutput), and a deserializing constructor. Then add anNamedWriteableRegistry.Entryconstant and add that constant to the list inAggregateWritables.getNamedWriteables(). -
Do the same with
EsqlFunctionRegistry.
-
Add it to
Creating aggregators for your function
Aggregators contain the core logic of your aggregation. That is, how to combine values, what to store, how to process data, etc.
-
Copy an existing aggregator to use as a base. You'll usually make one per type. Check other classes to see the naming pattern.
You can find them in
org.elasticsearch.compute.aggregation.Note that some aggregators are autogenerated, so they live in different directories. The base is
x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/ -
The methods in the aggregator will define how it will work:
- Adding the `type init()` method will autogenerate the code to manage the state, using your returned value as the initial value for each group.
-
Adding the `type initSingle()` or `type initGrouping()` methods will use the state object you return there instead.
You will also have to provide `evaluateIntermediate()` and `evaluateFinal()` methods this way.
-
If it's also a
GroupingAggregator, you should provide the same methods as commented before:- Add an `initGrouping()`, unless you're using the `init()` method
- Add all the other methods, with the state parameter of the type of your `initGrouping()`.
-
Make a test for your aggregator.
You can copy an existing one from
x-pack/plugin/esql/compute/src/test/java/org/elasticsearch/compute/aggregation/.Tests extending from
org.elasticsearch.compute.aggregation.AggregatorFunctionTestCasewill already include most required cases. You should only need to fill the required abstract methods. -
Check the Javadoc of the
AggregatorandGroupingAggregatorannotations. Add/Modify them on your aggregator. -
The
AggregatorJavaDoc explains the static methods you should add. -
After implementing the required methods (Even if they have a dummy implementation),
run the CsvTests to generate some extra required classes.
One of them will be the
AggregatorFunctionSupplierfor your aggregator. Find it by its name (<Aggregation-name><Type>AggregatorFunctionSupplier), and return it in thetoSuppliermethod in your function, under the correct type condition. - Now, complete the implementation of the aggregator, until the tests pass!
StringTemplates
Making an aggregator per type may be repetitive. To avoid code duplication, we use StringTemplates:
-
Create a new StringTemplate file.
Use another as a reference, like
x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/X-TopAggregator.java.st. -
Add the template scripts to
x-pack/plugin/esql/compute/build.gradle.You can also see there which variables you can use, and which types are currently supported.
-
After completing your template, run the generation with
./gradlew :x-pack:plugin:esql:compute:compileJava.You may need to tweak some import orders per type so they don’t raise warnings.
-
ClassesClassDescriptionA type of
Functionthat takes multiple values and extracts a single value out of them.Basic wrapper for expressions declared with a nested filter (typically in stats).Aggregate function that receives a numeric, signed field, and returns a single double value.All spatial aggregate functions extend this class to enable the planning of reading from doc values for higher performance.Calculate spatial centroid of all geo_point or cartesian point values of a field in matching documents.Calculate spatial extent of all values of a field in matching documents.Sum all values of a field in matching documents.An internal aggregate function that always emits intermediate (or partial) output regardless of the aggregate mode.