All Implemented Interfaces:
NamedWriteable, Writeable, Resolvable, EvaluatorMapper
Direct Known Subclasses:
Atan2, Case, Chunk, CIDRMatch, Clamp, ClampMax, ClampMin, Coalesce, Concat, Contains, CopySign, DateParse, Decay, EndsWith, EsqlConfigurationFunction, ExtractHistogramComponent, FromAggregateMetricDouble, Greatest, Hash, HistogramPercentile, Hypot, In, IpPrefix, Least, Left, Locate, Log, MvAppend, MvPercentile, MvPSeriesWeightedSum, MvSlice, MvSort, MvZip, NetworkDirection, Pow, Repeat, Replace, Right, Round, RoundTo, Scalb, StartsWith, Substring, ToIp, UnaryScalarFunction

public abstract class EsqlScalarFunction extends ScalarFunction implements EvaluatorMapper
A ScalarFunction is a Function that makes one output value per input row. It operates on a whole Page of inputs at a time, building a Block of results.

You see them in the language everywhere:

  • | EVAL foo_msg = CONCAT("foo ", message)
  • | EVAL foo_msg = a + b
  • | WHERE STARTS_WITH(a, "rabbit")
  • | WHERE a == b
  • | STATS AGG BY ----> a + b <---- this is a scalar
  • | STATS AGG(----> a + b <---- this is a scalar)

Let's work the example of CONCAT("foo ", message). It's called with a Page of inputs and evaluates both of its arguments, yielding a constant Block containing "foo " and a Block of strings containing the message values. It can expect to receive thousands of message values in that Block. Then it builds and returns the Block "foo <message>".


   foo | message | result
   --- | ------- | ----------
   foo | bar     | foo bar
   foo | longer  | foo longer
   ... a thousand rows ...
   foo | baz     | foo baz
 

It does this once per input Page.
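In plain Java, that evaluation has roughly the following shape. This is a minimal sketch using arrays in place of the real Page and Block types; ConcatSketch and evalPage are invented for illustration.

```java
// Hypothetical sketch of page-at-a-time CONCAT; the real implementation
// operates on Page and Block, not raw arrays.
public class ConcatSketch {
    /** Builds one result per row from a constant prefix and a whole page of messages. */
    public static String[] evalPage(String prefix, String[] messages) {
        String[] result = new String[messages.length];
        for (int p = 0; p < messages.length; p++) {
            result[p] = prefix + messages[p]; // one output value per input row
        }
        return result;
    }
}
```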

We have a guide for writing these in the javadoc for org.elasticsearch.xpack.esql.expression.function.scalar.

Optimizations

Scalars are a huge part of the language, and we have a ton of different classes of optimizations for them that exist on a performance spectrum:


  Better         Load Less and
 than O(rows)     Run Faster               Run Faster                 Page-at-a-time     Tuple-at-a-time
     |----------------|-------------------------|------------------------------|-------------------|
     ^  ^  ^     ^    ^      ^                  ^           ^    ^   ^     ^   ^      ^            ^
    CF LT ET    FP   BL     MBL                SE          NO  SIMD RR    VD EVAL    EVE         CASE
 

CF: Constant Folding


   | EVAL a = CONCAT("some ", "words")
 

The fastest way to run a scalar, now and forever, is to run it at compile time. Turn it into a constant and propagate it throughout the query. This is called "constant folding" and all scalars, when their arguments are constants, are "folded" to a constant.
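The mechanics can be sketched with a toy expression tree. Expr, Literal, Concat, and foldRule are all invented names; the real planner classes differ, but the rule is the same: if every child is foldable, replace the whole expression with its value.

```java
// Hypothetical sketch of constant folding: when every child of an expression
// is a literal, the planner replaces the whole expression with its value.
public class FoldSketch {
    interface Expr { boolean foldable(); Object fold(); }

    record Literal(Object value) implements Expr {
        public boolean foldable() { return true; }
        public Object fold() { return value; }
    }

    record Concat(Expr left, Expr right) implements Expr {
        public boolean foldable() { return left.foldable() && right.foldable(); }
        public Object fold() { return String.valueOf(left.fold()) + right.fold(); }
    }

    /** Planner rule: collapse a foldable expression down to a Literal. */
    public static Expr foldRule(Expr e) {
        return e.foldable() ? new Literal(e.fold()) : e;
    }
}
```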

LT: Lucene's TopN


     FROM index METADATA _score
   | WHERE title:"cat"
   | SORT _score DESC
   | LIMIT 10
 

     FROM index
   | EVAL distance = ST_DISTANCE(point, "POINT(12.5683 55.6761)")
   | SORT distance ASC
   | LIMIT 10
 

Fundamentally, Lucene is a tuple-at-a-time engine that feeds the minimum competitive sort key back into the index iteration process, allowing it to skip huge swaths of documents. It has quite a few optimizations that soften the blow of being tuple-at-a-time, so these days "push to a Lucene TopN" is the fastest way you are going to run a scalar function. For that to work the function has to be a SORT key, all the filters have to be pushable to Lucene, and Lucene has to know how to run the function natively. See PushTopNToSource.

ET: Engine TopN (HYPOTHETICAL)


     FROM index METADATA _score
   | WHERE title:"cat"
   | WHERE a < j + LENGTH(candy) // <--- anything un-pushable
   | SORT _score DESC
   | LIMIT 10
 

If ESQL's TopNOperator exposed the min-competitive information (see above) and we fed it back into the Lucene query operators, then we too could do better than O(matching_rows) for queries sorting on the results of a scalar. This is like LT but without as many limitations. Lucene has a 20-year head start on us optimizing TopN, so we should continue to prefer it when we can. See issue.

BL: Push to BlockLoader


     FROM index
   | EVAL s = V_COSINE(dense_vector, [0, 1, 2])
  | SORT s DESC
   | LIMIT 10
 

     FROM index
   | STATS SUM(LENGTH(message)) // Length is pushed to the BlockLoader
 

Some functions can take advantage of the on-disk structures to run very fast and should be "fused" into field loading using BlockLoaderExpression. Functions like V_COSINE can use the vector search index to compute the result. Functions like MV_MIN can use the doc_values encoding mechanism to save a ton of work. Functions like the upcoming ST_SIMPLIFY benefit from this by saving huge numbers of allocations even if they can't link into the doc_values format. We do this by building a BlockLoader for each FUNCTION x FIELD_TYPE x storage mechanism combination so we can get as much speed as possible.

MBL: Push to a "mother ship" BlockLoader (HYPOTHETICAL)


     FROM index
   | STATS SUM(LENGTH(message)), // All of these are pushed to a single BlockLoader
           SUM(SUBSTRING(message, 0, 4)),
        BY trail = SUBSTRING(message, 10, 3)
 

Pushing functions to a BlockLoader can involve building a ton of distinct BlockLoaders, which involves a ton of code and testing and, well, work. But it's worth it if you are applying a single function to a field and every single cycle counts. Both of these cry out for a more OO-style solution where you build a "mother ship" BlockLoader that operates on, say, FIELD_TYPE x storage mechanism and then runs a list of FUNCTION operations. In some cases this is a bad idea, which is why we haven't built it yet. But in plenty of cases it's fine. And, sometimes, we should be fine skipping the special-purpose block loader in favor of the mother ship. We'd spend a few more cycles on each load, but the maintenance advantage is likely worth it for some functions.

EVAL: Page-at-a-time evaluation

ESQL evaluates whole pages at once, generally walking a couple of arrays in parallel building a result array. This makes which bits are the "hot path" very obvious - they are the loops that walk these arrays. We put the "slower" stuff outside those loops:

  • scratch allocations
  • profiling
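For example, a scratch buffer is allocated once per page, outside the hot loop, and reused for every row. ScratchSketch and its method are invented for illustration; real evaluators are code-generated and work on Blocks.

```java
// Hypothetical sketch: the "slow" scratch allocation happens once per page,
// while the hot loop only walks arrays and reuses the scratch.
public class ScratchSketch {
    public static int[] lengthsOfUpper(String[] page) {
        int[] result = new int[page.length];
        StringBuilder scratch = new StringBuilder(); // slow path: once per page
        for (int p = 0; p < page.length; p++) {      // hot path: once per row
            scratch.setLength(0);
            scratch.append(page[p].toUpperCase());
            result[p] = scratch.length();
        }
        return result;
    }
}
```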

VD: Vector Dispatch

In Elasticsearch it's normal for fields to sometimes be null or multivalued. There are no constraints on the schema preventing this and, as a search engine, it's pretty normal to model things as multivalued fields. We rarely know that a field can only be single-valued when we're planning a query.

It's much faster to run a scalar when we know that all of its inputs are single valued and non-null. So every scalar function that uses the code generation keyed by the Evaluator, ConvertEvaluator, and MvEvaluator annotations builds two paths:

  • The slower "Block" path that supports nulls and multivalued fields
  • The faster "Vector" path that supports only single-valued, non-null fields

NO: Native Ordinal Evaluation


     FROM index
   | STATS MAX(foo) BY TO_UPPER(verb)
 

keyword and ip fields load their byte[] shaped values as a lookup table, called "ordinals" because Lucene uses that word for it. Some of our functions, like TO_UPPER, process the lookup table itself instead of processing each position. This is especially important when grouping on the field because the hashing done by the aggregation code also operates on the lookup table.
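The idea, sketched with a plain String array standing in for the ordinals dictionary (OrdinalSketch is invented; the real code works on Lucene's ordinal structures): the function runs once per distinct dictionary entry, not once per row.

```java
// Hypothetical sketch of ordinal evaluation: TO_UPPER processes the lookup
// table itself; the per-row ordinals are untouched.
public class OrdinalSketch {
    public static String[] toUpperDictionary(String[] dictionary) {
        String[] upper = new String[dictionary.length];
        for (int i = 0; i < dictionary.length; i++) {
            upper[i] = dictionary[i].toUpperCase();
        }
        return upper; // rows still reference entries by ordinal
    }

    public static String rowValue(String[] dictionary, int[] ordinals, int row) {
        return dictionary[ordinals[row]];
    }
}
```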

SE: Sorted Execution


     FROM index
   | STATS SUM(MV_DEDUPE(file_size))
 

Some functions can operate on multivalued fields much faster if their inputs are sorted. And inputs loaded from doc_values are sorted by default. Sometimes even sorted AND deduplicated. We store this information on each block in Block.MvOrdering.

NOTE: Functions that can take advantage of this sorting also tend to be NOOPs for single-valued inputs. So they benefit hugely from "Vector Dispatch".
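A sketch of why sorted input helps, using a toy dedupe (DedupeSketch is invented; the real MV_DEDUPE operates on Block values): on sorted input, deduplication is a single linear pass comparing each value with its predecessor, with no hashing or extra sorting.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: when a multivalued field is already sorted,
// deduplication only has to compare neighbors.
public class DedupeSketch {
    public static List<Long> dedupeSorted(List<Long> sortedValues) {
        List<Long> result = new ArrayList<>();
        Long previous = null;
        for (Long v : sortedValues) {
            if (previous == null || v.equals(previous) == false) {
                result.add(v);
            }
            previous = v;
        }
        return result;
    }
}
```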

SIMD: Single Instruction Multiple Data instructions


     FROM index
   | STATS MAX(lhs + rhs)
 

Through a combination of "Page-at-a-time evaluation", and "Vector Dispatch" we often end up with at least one path that can be turned into a sequence of SIMD instructions. These are about as fast as you can go and still be `O(matching_rows)`. A lot of scalars don't lend themselves perfectly to SIMD, but we make sure those that do can take that route.
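The vector path for lhs + rhs boils down to a branch-free array walk that the JIT can auto-vectorize (SimdFriendlySketch is invented; the real evaluator is code-generated):

```java
// Hypothetical sketch: a simple array walk with no branches in the loop body,
// which the JIT compiler can turn into SIMD instructions.
public class SimdFriendlySketch {
    public static long[] add(long[] lhs, long[] rhs) {
        long[] result = new long[lhs.length];
        for (int p = 0; p < lhs.length; p++) {
            result[p] = lhs[p] + rhs[p];
        }
        return result;
    }
}
```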

RR: Range Rewrite


     FROM index
   | STATS COUNT(*) BY DATE_TRUNC(1 DAY, @timestamp)
 

Functions like DATE_TRUNC can be quite slow, especially when they are using a time zone. DATE_TRUNC can be much faster if it knows the range of dates that it's operating on. And we do know that on the data node! We use that information to rewrite the possibly-slow DATE_TRUNC to the always-fast ROUND_TO, which rounds down to fixed rounding points.

At the moment this is only done for DATE_TRUNC which is a very common function, but is technically possible for anything that could benefit from knowing the range up front.
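The rounding itself can be sketched as a binary search over precomputed rounding points, e.g. the day boundaries inside the known range (RoundToSketch is invented; the real ROUND_TO implementation differs):

```java
import java.util.Arrays;

// Hypothetical sketch of ROUND_TO: round each value down to the nearest of a
// precomputed, sorted list of rounding points.
public class RoundToSketch {
    public static long roundTo(long value, long[] sortedPoints) {
        int i = Arrays.binarySearch(sortedPoints, value);
        if (i >= 0) {
            return sortedPoints[i]; // exact hit on a rounding point
        }
        int insertion = -i - 1;     // index of the first point greater than value
        // Clamp to the first point for values below the whole range.
        return sortedPoints[Math.max(insertion - 1, 0)];
    }
}
```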

FP: Filter Pushdown


     FROM index
   | STATS COUNT(*) BY DATE_TRUNC(1 DAY, @timestamp)
 

If the "Range Rewrite" optimization works, we can sometimes further push the resulting ROUND_TO into a sequence of filters. If you are just counting documents then this can use the LuceneCountOperator which can count the number of matching documents directly from the cache, technically being faster than O(num_hits), but only in ideal circumstances. If we can't push the count then it's still very very fast. See PR.

EVE: Expensive Variable Evaluator


     FROM index
   | EVAL ts = DATE_PARSE(SUBSTRING(message, 1, 10), date_format_from_the_index)
 

Functions like DATE_PARSE need to build something "expensive" per input row, like a DateFormatter. But, often, the expensive thing is constant. In the example above the date format comes from the index, but that's quite contrived. These functions generally run in the form:


     FROM index
   | EVAL ts = DATE_PARSE(SUBSTRING(message, 1, 10), "ISO8601")
 

These generally have special case evaluators that don't construct the format for each row. The others are "expensive variable evaluators" and we avoid them when we can.
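The special-case evaluator amounts to hoisting the expensive construction out of the per-row loop. ConstantFormatSketch is invented and uses java.time directly rather than Elasticsearch's DateFormatter:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch: when the format is a constant, build the expensive
// DateTimeFormatter once per evaluator, not once per row.
public class ConstantFormatSketch {
    private final DateTimeFormatter formatter; // built once, reused for every row

    public ConstantFormatSketch(String constantFormat) {
        this.formatter = DateTimeFormatter.ofPattern(constantFormat);
    }

    public long[] parsePage(String[] dates) {
        long[] result = new long[dates.length];
        for (int p = 0; p < dates.length; p++) {
            result[p] = LocalDate.parse(dates[p], formatter).toEpochDay();
        }
        return result;
    }
}
```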

CASE: CASE is evaluated row-by-row


     FROM index
   | EVAL f = CASE(d > 0, n / d, 0)
 

     FROM index
   | EVAL f = COALESCE(d, 1 / j)
 

CASE and COALESCE short circuit. In the top example above, that means we don't run n / d unless d > 0. That prevents us from emitting warnings for dividing by 0. In the second example, we don't run 1 / j unless d is null. In the worst case, we manage this by running row-by-row, which is super slow, especially because the engine was designed for page-at-a-time execution.

In the best case COALESCE can see that an input is either all-null or all-non-null. Then it never falls back to row-by-row evaluation and is quite fast.

CASE has a similar optimization: For each incoming Page, if the condition evaluates to a constant, then it executes the corresponding "arm" Page-at-a-time. Also! If the "arms" are "fast" and can't throw warnings, then CASE can execute "eagerly" - evaluating all three arguments and just plucking values back and forth. The "eager" CASE evaluator is effectively the same as any other page-at-a-time evaluator.
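The eager path can be sketched as a pluck over pre-evaluated arms. CaseSketch is invented; it assumes both arms were already evaluated page-at-a-time, which is only safe when neither arm can throw warnings.

```java
// Hypothetical sketch of eager CASE: both arms are fully evaluated up front,
// and the result is assembled by picking per-row from one arm or the other.
public class CaseSketch {
    public static long[] eager(boolean[] condition, long[] thenArm, long[] elseArm) {
        long[] result = new long[condition.length];
        for (int p = 0; p < condition.length; p++) {
            result[p] = condition[p] ? thenArm[p] : elseArm[p];
        }
        return result;
    }
}
```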