Package org.elasticsearch.xpack.esql.expression.function.aggregate


Functions that aggregate values, with or without grouping within buckets. Used in `STATS` and similar commands.

Guide to adding a new aggregate function

  1. Aggregation functions are more complex than scalar functions, so it’s a good idea to discuss the new function with the ESQL team before starting to implement it.

    You may also discuss its implementation, as aggregations may require special performance considerations.

  2. To learn the basics about making functions, check org.elasticsearch.xpack.esql.expression.function.scalar.

    It contains the guide to making a simple function, which is a good starting point for writing aggregations.

  3. Pick one of the csv-spec files in x-pack/plugin/esql/qa/testFixtures/src/main/resources/ and add a test for the function you want to write. These files are roughly themed but there isn’t a strong guiding principle in the organization.
  4. Rerun the CsvTests and watch your new test fail.
  5. Find an aggregate function in this package similar to the one you are working on and copy it to build yours. Your function might extend one of the available abstract classes. Check the Javadoc of each before using them:
  6. Fill in the required methods in your new function. Check their Javadoc for more information. Here are some of the important ones:
    • Constructor: Review the constructor annotations, and make sure to add the correct types and descriptions.
    • resolveType: Check the metadata of your function parameters. This may include types, whether they are foldable or not, or their possible values.
    • dataType: This returns the data type of your function. It may depend on the function's current parameters.
    • Implement SurrogateExpression, and its required SurrogateExpression.surrogate() method.

      The surrogate is used to fold the aggregation when it receives only literals, or to replace it when the aggregation can be simplified.

    Finally, implement ToAggregator (more information about aggregators below). The only case where this interface is not required is when the function always returns another function from its surrogate.
  7. To introduce your aggregation to the engine:

Creating aggregators for your function

Aggregators contain the core logic of your aggregation. That is, how to combine values, what to store, how to process data, etc.

  1. Copy an existing aggregator to use as a base. You'll usually make one per type. Check other classes to see the naming pattern. You can find them in org.elasticsearch.compute.aggregation.

    Note that some aggregators are autogenerated, so they live in different directories. The base directory is x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/

  2. The methods you add to the aggregator define how it works:
    • Adding the `type init()` method will autogenerate the code to manage the state, using your returned value as the initial value for each group.
    • Adding the `type initSingle()` or `type initGrouping()` methods will use the state object you return there instead.

      In this case you will also have to provide `evaluateIntermediate()` and `evaluateFinal()` methods.

    Depending on which approach you use, adapt your `combine*()` methods to receive either the primitive value or the state object as their first parameter.
  3. If it's also a GroupingAggregator, you should provide the same methods as described above:
    • Add an `initGrouping()`, unless you're using the `init()` method.
    • Add all the other methods, with the state parameter of the type of your `initGrouping()`.
  4. Make a test for your aggregator. You can copy an existing one from x-pack/plugin/esql/compute/src/test/java/org/elasticsearch/compute/aggregation/.

    Tests extending from org.elasticsearch.compute.aggregation.AggregatorFunctionTestCase will already include most required cases. You should only need to fill the required abstract methods.

  5. Check the Javadoc of the Aggregator and GroupingAggregator annotations. Add or modify them on your aggregator as needed.
  6. The Aggregator Javadoc explains the static methods you should add.
  7. After implementing the required methods (even if they only have a dummy implementation), run the CsvTests to generate some extra required classes.

    One of them will be the AggregatorFunctionSupplier for your aggregator. Find it by its name (<Aggregation-name><Type>AggregatorFunctionSupplier), and return it from the toSupplier method in your function, under the condition for the matching type.

  8. Now, complete the implementation of the aggregator, until the tests pass!
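
The two state-management styles from step 2 can be illustrated with a minimal, self-contained sketch. It is an assumption-laden illustration, not the real compute-engine API: the class name AggregatorShapeSketch, the AvgState class, the maxByGroup helper, and the simplified evaluateFinal signature are all hypothetical, and the real Aggregator and GroupingAggregator annotations (and the code they generate) are deliberately left out. It only shows how the first parameter of `combine` changes between a primitive state returned by `init()` and a state object returned by `initSingle()`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: it mimics the method shapes described above and does
// not use the real compute-engine annotations or generated classes.
final class AggregatorShapeSketch {
    private AggregatorShapeSketch() {}

    // Style 1: primitive state, as with `type init()`.
    // The returned value is the initial state for each group.
    static long init() {
        return Long.MIN_VALUE;
    }

    // With a primitive state, the first parameter is the current state value.
    static long combine(long current, long value) {
        return Math.max(current, value);
    }

    // Style 2: custom state object, as with `initSingle()`.
    static final class AvgState {
        long sum;
        long count;
    }

    static AvgState initSingle() {
        return new AvgState();
    }

    // With a state object, the first parameter is that state.
    static void combine(AvgState state, long value) {
        state.sum += value;
        state.count++;
    }

    // Simplified stand-in for evaluateFinal(): returns a plain double
    // (assumption: the real method has a different signature).
    static double evaluateFinal(AvgState state) {
        return state.count == 0 ? Double.NaN : (double) state.sum / state.count;
    }

    // Stand-in for the generated grouping code: keeps one primitive state per group id.
    static Map<Integer, Long> maxByGroup(int[] groups, long[] values) {
        if (groups.length != values.length) {
            throw new IllegalArgumentException("groups and values must have the same length");
        }
        Map<Integer, Long> states = new HashMap<>();
        for (int i = 0; i < values.length; i++) {
            long current = states.getOrDefault(groups[i], init());
            states.put(groups[i], combine(current, values[i]));
        }
        return states;
    }

    public static void main(String[] args) {
        Map<Integer, Long> max = maxByGroup(new int[] {0, 1, 0, 1}, new long[] {3, 10, 7, -2});
        System.out.println("max per group: " + max);

        AvgState state = initSingle();
        for (long v : new long[] {2, 4, 9}) {
            combine(state, v);
        }
        System.out.println("average: " + evaluateFinal(state));
    }
}
```

In the real aggregators the per-group bookkeeping shown in maxByGroup is not written by hand; the sketch inlines it only to make the role of `init()` and `combine` visible.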

StringTemplates

Making an aggregator per type may be repetitive. To avoid code duplication, we use StringTemplates:

  1. Create a new StringTemplate file. Use another as a reference, like x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/X-TopAggregator.java.st.
  2. Add the template scripts to x-pack/plugin/esql/compute/build.gradle.

    You can also see there which variables you can use, and which types are currently supported.

  3. After completing your template, run the generation with ./gradlew :x-pack:plugin:esql:compute:compileJava.

    You may need to tweak some import orders per type so they don’t raise warnings.
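
To give a feel for what per-type generation buys you, here is a small, self-contained sketch of the idea. It is not the actual build logic: the real expansion is performed by the Gradle configuration mentioned above, and the class name TemplateExpansionSketch, the miniature template text, and the assumption that placeholders are written as `$Type$` and `$type$` are all illustrative. Check build.gradle and an existing .st file for the variables that are really available.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of per-type template expansion; the real generation
// is done by the Gradle build, not by code like this.
final class TemplateExpansionSketch {
    private TemplateExpansionSketch() {}

    // Miniature, made-up template: one source text shared by every type.
    static final String TEMPLATE =
        "class Top$Type$Aggregator { static $type$ init() { return 0; } }";

    // Replaces the assumed placeholders with a capitalized type name and a primitive name.
    static String expand(String template, String typeName, String primitive) {
        return template.replace("$Type$", typeName).replace("$type$", primitive);
    }

    // Expands the same template once per (type name, primitive) pair.
    static Map<String, String> expandAll(String template, Map<String, String> types) {
        Map<String, String> generated = new LinkedHashMap<>();
        for (Map.Entry<String, String> type : types.entrySet()) {
            generated.put(type.getKey(), expand(template, type.getKey(), type.getValue()));
        }
        return generated;
    }

    public static void main(String[] args) {
        Map<String, String> types = new LinkedHashMap<>();
        types.put("Int", "int");
        types.put("Long", "long");
        types.put("Double", "double");
        // Prints one generated class body per type.
        expandAll(TEMPLATE, types).values().forEach(System.out::println);
    }
}
```

A single template edited once therefore updates every per-type aggregator the next time the generation runs, which is why small per-type adjustments such as import order may still be needed afterwards.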