Package org.elasticsearch.xpack.esql
The ES|QL query language.
Overview
ES|QL is a typed query language that consists of many small languages separated by the |
character, like this:
FROM foo
| WHERE a > 1
| STATS m=MAX(j)
| SORT m ASC
| LIMIT 10
Here the FROM, WHERE, STATS, SORT, and LIMIT keywords
enable the mini-languages for selecting indices, filtering documents, calculating aggregates,
sorting results, and limiting the number of results, respectively.
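One way to read the query above is as a chain in which each command works on the rows produced by the command before it. The following is a rough in-memory analogue in Java, offered only as a sketch: `PipeSketch` and `Doc` are hypothetical names, the data is made up, and nothing here reflects how ES|QL actually executes a query. For simplicity, an empty input produces no rows in this sketch.

```java
import java.util.List;
import java.util.OptionalLong;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical in-memory analogue of the query above; not how ES|QL executes it.
final class PipeSketch {
    // One document with the two fields the query touches.
    static final class Doc {
        final long a;
        final long j;

        Doc(long a, long j) {
            this.a = a;
            this.j = j;
        }
    }

    static List<Long> run(List<Doc> foo) {
        // FROM foo | WHERE a > 1 | STATS m=MAX(j)
        OptionalLong max = foo.stream()
            .filter(d -> d.a > 1)
            .mapToLong(d -> d.j)
            .max();
        // The aggregate leaves at most a single row holding m.
        Stream<Long> rows = max.isPresent() ? Stream.of(max.getAsLong()) : Stream.empty();
        // | SORT m ASC | LIMIT 10
        return rows.sorted().limit(10).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Doc> foo = List.of(new Doc(1, 50), new Doc(2, 7), new Doc(3, 9));
        System.out.println(run(foo));
    }
}
```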
Language Design Goals
In designing ES|QL we have some principles and rules of thumb:
- Don't waste people's time
- Progress over perfection
- Design for Elasticsearch
- Be inspired by the best
Don't waste people's time
- Queries should not fail at runtime. Instead we should return a warning and null.
- It is ok to fail a query up front at analysis time. Just not after it's started.
- It is better if things can be made to work.
- But genuinely confusing requests require the query writer to make a choice.
As you can see this is a real tightrope, but we try to follow the rules above in order. Examples:
- If TO_DATETIME receives an invalid date at runtime, it emits a WARNING.
- If DATE_EXTRACT receives an invalid extract configuration at query parsing time it fails to start the query.
- 1 + 3.2 promotes both sides to a double.
- 1 + "32" fails at query compile time and the query writer must decide to either write CONCAT(TO_STRING(1), "32") or 1 + TO_INT("32").
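The "warning and null" rule can be pictured with a small Java sketch. This is only an illustration under stated assumptions: `WarnAndNullSketch`, its `toDatetime` method, and the warning text are hypothetical and are not the ES|QL implementation of TO_DATETIME. The point is that a bad value at runtime produces null and a recorded warning rather than an exception that stops the query.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "warn and return null" at runtime; not ES|QL's implementation.
final class WarnAndNullSketch {
    // Warnings collected while the (pretend) query runs.
    final List<String> warnings = new ArrayList<>();

    // Converts one value. A bad value yields null plus a warning instead of an exception.
    Instant toDatetime(String raw) {
        try {
            return Instant.parse(raw);
        } catch (DateTimeParseException e) {
            warnings.add("could not parse [" + raw + "] as a date");
            return null;
        }
    }

    public static void main(String[] args) {
        WarnAndNullSketch query = new WarnAndNullSketch();
        for (String raw : List.of("2023-01-01T00:00:00Z", "not a date")) {
            System.out.println(raw + " -> " + query.toDatetime(raw));
        }
        System.out.println("warnings: " + query.warnings);
    }
}
```

The remaining rows keep flowing after the bad value, which is the behavior the first rule asks for.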
Progress over perfection
- Stability is super important for released features.
- But we need to experiment and get feedback. So mark features experimental when there's any question about how they should work.
- Experimental features shouldn't live forever because folks will get tired of waiting and use them in production anyway. We don't officially support them in production but we will feel bad if they break.
Design for Elasticsearch
We must design the language for Elasticsearch, celebrating its advantages and smoothing out its quirks.
- doc_values sometimes sorts field values and sometimes sorts and removes duplicates. We couldn't hide this even if we wanted to and most folks are ok with it. ES|QL has to be useful in those cases.
- Multivalued fields are very easy to index in Elasticsearch so they should be easy to read in ES|QL. They should be easy to work with in ES|QL too, but we haven't gotten that far yet.
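To make the doc_values quirk concrete, here is a small Java sketch of the two read-back behaviors described above. It is an illustration only: `DocValuesOrderSketch` and its methods are hypothetical, and the sketch does not say which field types get which behavior.

```java
import java.util.Arrays;

// Hypothetical illustration of the two read-back behaviors; not Elasticsearch code.
final class DocValuesOrderSketch {
    // Values come back sorted, duplicates kept.
    static long[] sorted(long[] indexed) {
        long[] copy = indexed.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Values come back sorted with duplicates removed.
    static long[] sortedAndDeduplicated(long[] indexed) {
        return Arrays.stream(indexed).sorted().distinct().toArray();
    }

    public static void main(String[] args) {
        long[] indexed = {3, 1, 3};
        System.out.println(Arrays.toString(sorted(indexed)));
        System.out.println(Arrays.toString(sortedAndDeduplicated(indexed)));
    }
}
```

Either way, the order in which a multivalued field was indexed is not something a query can rely on.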
Be inspired by the best
We'll frequently have lots of different choices on how to implement a feature. We should talk and figure out the best way for us, especially considering Elasticsearch's advantages and quirks. But we should also look to our data-access-forebears:
- PostgreSQL is the GOAT SQL implementation. It's a joy to use for everything but dates. Use DB Fiddle to link to syntax examples.
- Oracle is pretty good about dates. It's fine about a lot of things but PostgreSQL is better.
- MS SQL Server has a silly name but its documentation is wonderful.
- SPL is super familiar to our users, and is a piped query language.
Major Components
Compute Engine
org.elasticsearch.compute - The compute engine drives query execution
- Block - fundamental unit of data. Operations vectorize over blocks.
- Page - Data is broken up into pages (which are collections of blocks) to manage size in memory
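The relationship between blocks and pages can be sketched in a few lines of Java. The classes below are hypothetical, heavily simplified stand-ins and are not the real types in org.elasticsearch.compute. The sketch assumes that a block holds one column of values and that every block in a page covers the same number of positions.

```java
// Hypothetical, simplified stand-ins; the real classes in org.elasticsearch.compute differ.
final class PageSketch {
    // A block: one column of values, one value per position (row).
    static final class LongBlockSketch {
        final long[] values;

        LongBlockSketch(long... values) {
            this.values = values;
        }

        int positionCount() {
            return values.length;
        }
    }

    // A page: a collection of blocks. Assumption: all blocks cover the same positions.
    final LongBlockSketch[] blocks;

    PageSketch(LongBlockSketch... blocks) {
        for (LongBlockSketch block : blocks) {
            if (block.positionCount() != blocks[0].positionCount()) {
                throw new IllegalArgumentException("all blocks in a page need the same position count");
            }
        }
        this.blocks = blocks;
    }

    // A vectorized operation: one tight loop over whole blocks instead of a call per row.
    static LongBlockSketch add(LongBlockSketch left, LongBlockSketch right) {
        long[] out = new long[left.positionCount()];
        for (int p = 0; p < out.length; p++) {
            out[p] = left.values[p] + right.values[p];
        }
        return new LongBlockSketch(out);
    }

    public static void main(String[] args) {
        PageSketch page = new PageSketch(new LongBlockSketch(1, 2, 3), new LongBlockSketch(10, 20, 30));
        LongBlockSketch sum = add(page.blocks[0], page.blocks[1]);
        System.out.println(java.util.Arrays.toString(sum.values));
    }
}
```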
Core Classes
org.elasticsearch.xpack.esql.core - Core Classes
- EsqlSession - Connects all major components and contains the high-level code for query execution
- DataType - ES|QL is a typed language, and all the supported data types are listed in this collection.
- Expression - Expression is the basis for all functions in ES|QL, but see also EvaluatorMapper
- EsqlFunctionRegistry - Resolves function names to function implementations.
- Sync and async HTTP API entry points
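As a rough picture of what "resolves function names to function implementations" can mean, here is a minimal name-to-implementation lookup in Java. It is a sketch only: `FunctionRegistrySketch` is hypothetical and does not mirror the API of EsqlFunctionRegistry, and the case-insensitive lookup is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.LongBinaryOperator;

// Hypothetical sketch of a name-to-implementation lookup; not EsqlFunctionRegistry's API.
final class FunctionRegistrySketch {
    private final Map<String, LongBinaryOperator> byName = new HashMap<>();

    // Assumption for this sketch: names are matched case-insensitively.
    void register(String name, LongBinaryOperator implementation) {
        byName.put(name.toUpperCase(Locale.ROOT), implementation);
    }

    // Unknown names fail at lookup time, before any data is touched.
    LongBinaryOperator resolve(String name) {
        LongBinaryOperator implementation = byName.get(name.toUpperCase(Locale.ROOT));
        if (implementation == null) {
            throw new IllegalArgumentException("Unknown function [" + name + "]");
        }
        return implementation;
    }

    public static void main(String[] args) {
        FunctionRegistrySketch registry = new FunctionRegistrySketch();
        registry.register("max", Math::max);
        System.out.println(registry.resolve("MAX").applyAsLong(3, 7));
    }
}
```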
Query Planner
The query planner encompasses the logic of how to serve a query. Essentially, this covers everything from the output of the Antlr parser through to the actual computations and Lucene operations.
Two key concepts in the planner layer:
- Logical vs Physical optimization - Logical optimizations refer to things that can be done strictly based on the structure of the query, while Physical optimizations take into account information about the index or indices the query will execute against
- Local vs non-local operations - "local" refers to operations happening on the data nodes, while non-local operations generally happen on the coordinating node and can apply to all participating nodes in the query
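A logical optimization, in the sense above, needs nothing but the shape of the query. Constant folding is a textbook example of such a rewrite and is used here purely for illustration. The expression classes below are hypothetical and are not ES|QL's Expression hierarchy, and this sketch makes no claim about which rules ES|QL applies. A physical optimization would additionally need information about the index, which this sketch does not model.

```java
// Hypothetical expression tree; the names are illustrative, not ES|QL classes.
final class ConstantFoldingSketch {
    interface Expr {}

    static final class Literal implements Expr {
        final long value;

        Literal(long value) {
            this.value = value;
        }
    }

    static final class Field implements Expr {
        final String name;

        Field(String name) {
            this.name = name;
        }
    }

    static final class Add implements Expr {
        final Expr left;
        final Expr right;

        Add(Expr left, Expr right) {
            this.left = left;
            this.right = right;
        }
    }

    // Rewrites the tree bottom-up, replacing literal + literal with a single literal.
    static Expr fold(Expr e) {
        if (e instanceof Add) {
            Add add = (Add) e;
            Expr left = fold(add.left);
            Expr right = fold(add.right);
            if (left instanceof Literal && right instanceof Literal) {
                return new Literal(((Literal) left).value + ((Literal) right).value);
            }
            return new Add(left, right);
        }
        return e;
    }

    static String show(Expr e) {
        if (e instanceof Literal) {
            return Long.toString(((Literal) e).value);
        }
        if (e instanceof Field) {
            return ((Field) e).name;
        }
        Add add = (Add) e;
        return "(" + show(add.left) + " + " + show(add.right) + ")";
    }

    public static void main(String[] args) {
        Expr before = new Add(new Field("a"), new Add(new Literal(1), new Literal(2)));
        System.out.println(show(before) + " => " + show(fold(before)));
    }
}
```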
Query Planner Steps
- LogicalPlanBuilder translates from Antlr data structures to our data structures
- PreAnalyzer finds involved indices
- Analyzer resolves references
- Verifier does type checking
- LogicalPlanOptimizer applies many optimizations
- Mapper translates logical plans to physical plans
- PhysicalPlanOptimizer decides what plan fragments to send to which data nodes
- LocalLogicalPlanOptimizer applies index-specific optimizations, and reapplies top level logical optimizations
- LocalPhysicalPlanOptimizer does Lucene push down and similar
- LocalExecutionPlanner creates the compute engine objects to carry out the query
Guides
Code generation
ES|QL uses two kinds of code generation, which it uses mostly to monomorphize tight loops. That process would require a lot of copy-and-paste with small tweaks and some of us have copy-and-paste blindness so instead we use code generation.
- When possible we use StringTemplate to build Java files. These files typically look like X-Blah.java.st and are typically used for things like the different Block types and their subclasses and aggregation state. The templates themselves are easy to read and edit. This process is appropriate for cases where you just have to copy and paste something and change a few lines here and there. See build.gradle for the code generators.
- When that doesn't work, we use Annotation processing and JavaPoet to build the Java files. These files are typically the inner loops for EvalOperator.ExpressionEvaluator or AggregatorFunction. The code generation is much more difficult to write and debug but much, much, much, much more flexible. The degree of control we have during this code generation is amazing but it is much harder to debug failures. See files in org.elasticsearch.compute.gen for the code generators.
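For readers unfamiliar with the term, "monomorphizing" a loop means having one copy of it per concrete type instead of one generic copy. The Java sketch below is hand-written to show the idea and is not generated ES|QL code. `MonomorphizeSketch` is a hypothetical name, and the two `sum` methods stand in for the kind of near-identical copies a template could stamp out.

```java
import java.util.List;

// Hypothetical illustration of monomorphized loops; not generated ES|QL code.
final class MonomorphizeSketch {
    // Generic version: a single method, but every element is a boxed Number and
    // longValue() is dispatched at runtime on each iteration.
    static long sumGeneric(List<? extends Number> values) {
        long sum = 0;
        for (Number n : values) {
            sum += n.longValue();
        }
        return sum;
    }

    // Monomorphic copy for long: same loop shape, primitive values, no boxing.
    static long sum(long[] values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    // Monomorphic copy for double: identical apart from the type names.
    static double sum(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumGeneric(List.of(1, 2, 3)));
        System.out.println(sum(new long[] {1, 2, 3}));
        System.out.println(sum(new double[] {0.5, 1.5}));
    }
}
```

The two `sum` methods differ only in their type names, which is exactly the kind of copy-and-paste with small tweaks that the templates are meant to take over.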
Class Description
A "column" from a table provided in the request.