Enum Class DataType

java.lang.Object
java.lang.Enum<DataType>
org.elasticsearch.xpack.esql.core.type.DataType
All Implemented Interfaces:
Serializable, Comparable<DataType>, Constable

public enum DataType extends Enum<DataType>
This enum represents the data types the ES|QL query processing layer is able to interact with in some way. This includes fully representable types (e.g. LONG), numeric types which we promote (e.g. SHORT) or fold into other types (e.g. DATE_PERIOD) early in the processing pipeline, types for internal use cases (e.g. PARTIAL_AGG), and types which the language doesn't support but which require special handling anyway (e.g. OBJECT).

Process for adding a new data type

Note: it is not expected that all the following steps be done in a single PR. Use capabilities to gate tests as you go, and use as many PRs as you think appropriate. New data types are complex, and smaller PRs will make reviews easier.
  • Create a new feature flag for the type in EsqlCorePlugin. We recommend developing the data type over a series of smaller PRs behind a feature flag, even for relatively simple data types.
  • Add a capability to EsqlCapabilities related to the new type, and gated by the feature flag you just created. Again, using the feature flag is preferred over snapshot-only. As development progresses, you may need to add more capabilities related to the new type, e.g. for supporting specific functions. This is fine, and expected.
  • Create a new CSV test file for the new type. You'll either need to create a new data file as well, or add values of the new type to an existing data file. See CsvTestDataLoader for creating a new data set.
  • In the new CSV test file, start adding basic functionality tests. These should include reading and returning values, both from indexed data and from the ROW command. They should also include functions that support "every" type, such as Case or MvFirst.
  • Add the new type to the CsvTestUtils#Type enum, if it isn't already there. You also need to modify CsvAssert to support reading values of the new type.
  • At this point, the CSV tests should fail with a sensible ES|QL error message. Make sure they're failing in ES|QL, not in the test framework.
  • Add the new data type to this enum. This will cause a bunch of compile errors for switch statements throughout the code. Resolve those as appropriate. That is the main way in which the new type will be tied into the framework.
  • Add the new type to the UNDER_CONSTRUCTION collection. This is used by the test framework to disable some checks around how functions report their supported types, which would otherwise generate a lot of noise while the type is still in development.
  • Add typed data generators to TestCaseSupplier, and make sure all functions that support the new type have tests for it.
  • Work to support things all types should do. Equality and the "typeless" MV functions (MvFirst, MvLast, and MvCount) should work for most types. Case and Coalesce should also support all types. If the type has a natural ordering, make sure to test sorting and the other binary comparisons. Make sure these functions all have CSV tests that run against indexed data.
  • Add conversion functions as appropriate. Almost all types should support ToString, and should have a "ToType" function that accepts a string. There may be other logical conversions depending on the nature of the type. Make sure to add the conversion function to the TYPE_TO_CONVERSION_FUNCTION map in EsqlDataTypeConverter. Make sure the conversion functions have CSV tests that run against indexed data.
  • Support the new type in aggregations that are type independent. This includes Values, Count, and Count Distinct. Make sure there are CSV tests against indexed data for these.
  • Support other functions and aggregations as appropriate, making sure to include CSV tests.
  • Consider how the type will interact with other types. For example, if the new type is numeric, it may be good for it to be comparable with other numbers. Supporting this may require new logic in EsqlDataTypeConverter#commonType, individual function type checking, the verifier rules, or other places. We suggest starting with CSV tests and seeing where they fail.
There are some additional steps that should be taken when removing the feature flag and getting ready for a release:
  • Ensure the capabilities for this type are always enabled.
  • Remove the type from the UNDER_CONSTRUCTION collection.
  • Fix new test failures related to declared function types.
  • Make sure to run the full test suite locally via gradle to generate the function type tables and helper files with the new type. Ensure all the functions that support the type have appropriate docs for it.
  • If appropriate, remove the type from the ESQL limitations list of unsupported types.
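The step above that adds the new constant to this enum relies on switches that cover every constant, so that a new constant surfaces as compile errors. The following is a minimal sketch of that mechanism, assuming Java 17 or later. SketchType and blockKind are hypothetical stand-ins invented for illustration; they are not the real DataType or any ES|QL method.

```java
/** Hypothetical stand-in for a small slice of DataType; not the real enum. */
enum SketchType { LONG, INTEGER, DOUBLE, KEYWORD }

final class ExhaustiveSwitchSketch {
    /**
     * A switch expression over an enum with no default branch must cover every
     * constant. Adding a constant to SketchType makes this method fail to
     * compile until a matching case is added.
     */
    static String blockKind(SketchType type) {
        return switch (type) {
            case LONG -> "long";
            case INTEGER -> "int";
            case DOUBLE -> "double";
            case KEYWORD -> "bytes_ref";
        };
    }

    public static void main(String[] args) {
        for (SketchType type : SketchType.values()) {
            System.out.println(type + " -> " + blockKind(type));
        }
    }
}
```

A default branch would silence the compile error, which is why a switch written this way is a useful tripwire while a type is being wired in.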
  • Enum Constant Details

    • UNSUPPORTED

      public static final DataType UNSUPPORTED
      Fields of this type are unsupported by any functions and are always rendered as null in the response.
    • NULL

      public static final DataType NULL
      Fields that are always null, usually created with constant null values.
    • BOOLEAN

      public static final DataType BOOLEAN
      Fields that can either be true or false.
    • COUNTER_LONG

      public static final DataType COUNTER_LONG
      64-bit signed numbers labeled as metric counters in time-series indices. Although stored internally as numeric fields, they represent cumulative metrics and must not be treated as regular numeric fields. Therefore, we define them differently and separately from their parent numeric field. These fields are strictly for use in retrieval from indices, rate aggregation, and casting to their parent numeric type.
    • COUNTER_INTEGER

      public static final DataType COUNTER_INTEGER
      32-bit signed numbers labeled as metric counters in time-series indices. Although stored internally as numeric fields, they represent cumulative metrics and must not be treated as regular numeric fields. Therefore, we define them differently and separately from their parent numeric field. These fields are strictly for use in retrieval from indices, rate aggregation, and casting to their parent numeric type.
    • COUNTER_DOUBLE

      public static final DataType COUNTER_DOUBLE
      64-bit floating point numbers labeled as metric counters in time-series indices. Although stored internally as numeric fields, they represent cumulative metrics and must not be treated as regular numeric fields. Therefore, we define them differently and separately from their parent numeric field. These fields are strictly for use in retrieval from indices, rate aggregation, and casting to their parent numeric type.
    • LONG

      public static final DataType LONG
      64-bit signed numbers loaded as a java long.
    • INTEGER

      public static final DataType INTEGER
      32-bit signed numbers loaded as a java int.
    • UNSIGNED_LONG

      public static final DataType UNSIGNED_LONG
      64-bit unsigned numbers packed into a java long.
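As an illustration of how a 64-bit unsigned value can live in a signed Java long, here is a minimal sketch that uses only the JDK's unsigned helpers on java.lang.Long. This page does not say which encoding ES|QL uses, so treat the plain two's-complement view below as an assumption; the real packing may differ. UnsignedLongSketch and its methods are hypothetical names.

```java
final class UnsignedLongSketch {
    /** Parses a decimal unsigned 64-bit value into the bits of a signed long. */
    static long pack(String decimal) {
        return Long.parseUnsignedLong(decimal);
    }

    /** Renders the bits of a signed long as an unsigned decimal string. */
    static String unpack(long bits) {
        return Long.toUnsignedString(bits);
    }

    /** Orders two packed values by their unsigned magnitude. */
    static int compare(long a, long b) {
        return Long.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        long max = pack("18446744073709551615"); // 2^64 - 1
        System.out.println(max);                 // signed view of the same bits
        System.out.println(unpack(max));         // unsigned view
    }
}
```

Note that a plain signed comparison would order these values incorrectly, so any code handling such a long has to know it is unsigned.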
    • DOUBLE

      public static final DataType DOUBLE
      64-bit floating point number loaded as a java double.
    • SHORT

      public static final DataType SHORT
      16-bit signed numbers widened on load to INTEGER. Values of this type never escape type resolution and functions, operators, and results should never encounter one.
    • BYTE

      public static final DataType BYTE
      8-bit signed numbers widened on load to INTEGER. Values of this type never escape type resolution and functions, operators, and results should never encounter one.
    • FLOAT

      public static final DataType FLOAT
      32-bit floating point numbers widened on load to DOUBLE. Values of this type never escape type resolution and functions, operators, and results should never encounter one.
    • HALF_FLOAT

      public static final DataType HALF_FLOAT
      16-bit floating point numbers widened on load to DOUBLE. Values of this type never escape type resolution and functions, operators, and results should never encounter one.
    • SCALED_FLOAT

      public static final DataType SCALED_FLOAT
      Signed 64-bit fixed point numbers converted on load to a DOUBLE. Values of this type never escape type resolution and functions, operators, and results should never encounter one.
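The widening rules described for SHORT, BYTE, FLOAT, HALF_FLOAT, and SCALED_FLOAT can be summarized in a small sketch. NumericSketchType and its widen method are hypothetical stand-ins written for illustration; they mirror the descriptions above and are not the real DataType code.

```java
/** Hypothetical stand-in for the numeric slice of DataType; not the real enum. */
enum NumericSketchType {
    BYTE, SHORT, INTEGER, LONG, HALF_FLOAT, FLOAT, SCALED_FLOAT, DOUBLE;

    /** Returns the type a "small" numeric is widened to, or this type itself. */
    NumericSketchType widen() {
        return switch (this) {
            case BYTE, SHORT -> INTEGER;
            case HALF_FLOAT, FLOAT, SCALED_FLOAT -> DOUBLE;
            case INTEGER, LONG, DOUBLE -> this;
        };
    }
}

final class WideningSketch {
    public static void main(String[] args) {
        for (NumericSketchType type : NumericSketchType.values()) {
            System.out.println(type + " -> " + type.widen());
        }
    }
}
```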
    • KEYWORD

      public static final DataType KEYWORD
      String fields that are analyzed when the document is received but never cut into more than one token. ESQL always loads the post-analysis value. ESQL generally treats keyword fields as raw strings, so functions like TO_STRING produce a keyword field.
    • TEXT

      public static final DataType TEXT
      String fields that are analyzed when the document is received and may be cut into more than one token. Generally ESQL only sees text fields when loaded from the index and ESQL will load these fields without analysis. The MATCH operator can be used to query these fields with analysis.
    • DATETIME

      public static final DataType DATETIME
      Millisecond precision date, stored as a 64-bit signed number.
    • DATE_NANOS

      public static final DataType DATE_NANOS
      Nanosecond precision date, stored as a 64-bit signed number.
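To make the two date representations concrete, here is a minimal sketch that turns a java.time.Instant into a signed 64-bit count of milliseconds or nanoseconds. It assumes both values are counted from the Unix epoch, which this page does not state explicitly. EpochSketch and its methods are hypothetical names.

```java
import java.time.Instant;

final class EpochSketch {
    /** Milliseconds since 1970-01-01T00:00:00Z as a signed 64-bit value. */
    static long toEpochMillis(Instant instant) {
        return instant.toEpochMilli();
    }

    /** Nanoseconds since the epoch; throws ArithmeticException if a long cannot hold it. */
    static long toEpochNanos(Instant instant) {
        long secondsAsNanos = Math.multiplyExact(instant.getEpochSecond(), 1_000_000_000L);
        return Math.addExact(secondsAsNanos, instant.getNano());
    }

    public static void main(String[] args) {
        Instant instant = Instant.ofEpochSecond(1, 500);
        System.out.println(toEpochMillis(instant));
        System.out.println(toEpochNanos(instant));
    }
}
```

Under that assumption, a nanosecond count covers a much narrower range of dates than a millisecond count in the same 64 bits, which is why the sketch uses exact arithmetic rather than letting the multiplication wrap silently.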
    • IP

      public static final DataType IP
      IP addresses. IPv4 addresses are always embedded in IPv6. These flow through the compute engine as fixed-length, 16-byte BytesRefs.
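One common way to embed an IPv4 address in 16 bytes is the IPv4-mapped IPv6 form (::ffff:a.b.c.d). The sketch below shows that layout with plain byte arrays. Whether ES|QL uses exactly this layout is an assumption, since this page only says the address is "embedded". IpEmbeddingSketch and embedIpv4 are hypothetical names.

```java
final class IpEmbeddingSketch {
    /** Embeds 4 IPv4 bytes in a 16-byte IPv4-mapped IPv6 address (::ffff:a.b.c.d). */
    static byte[] embedIpv4(byte[] ipv4) {
        if (ipv4.length != 4) {
            throw new IllegalArgumentException("expected 4 bytes but got " + ipv4.length);
        }
        byte[] mapped = new byte[16];            // bytes 0..9 stay zero
        mapped[10] = (byte) 0xff;                // bytes 10 and 11 mark the mapped form
        mapped[11] = (byte) 0xff;
        System.arraycopy(ipv4, 0, mapped, 12, 4); // the IPv4 bytes fill the tail
        return mapped;
    }

    public static void main(String[] args) {
        byte[] mapped = embedIpv4(new byte[] { (byte) 192, (byte) 168, 0, 1 });
        StringBuilder hex = new StringBuilder();
        for (byte b : mapped) {
            hex.append(String.format("%02x", b & 0xff));
        }
        System.out.println(hex);
    }
}
```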
    • VERSION

      public static final DataType VERSION
      A version encoded in a way that sorts using semver.
    • OBJECT

      public static final DataType OBJECT
    • SOURCE

      public static final DataType SOURCE
    • DATE_PERIOD

      public static final DataType DATE_PERIOD
    • TIME_DURATION

      public static final DataType TIME_DURATION
    • GEO_POINT

      public static final DataType GEO_POINT
    • CARTESIAN_POINT

      public static final DataType CARTESIAN_POINT
    • CARTESIAN_SHAPE

      public static final DataType CARTESIAN_SHAPE
    • GEO_SHAPE

      public static final DataType GEO_SHAPE
    • DOC_DATA_TYPE

      public static final DataType DOC_DATA_TYPE
      Fields with this type represent a Lucene doc id. This field is a bit magic in that:
      • One copy of it is always added at the start of every query
      • It is implicitly dropped before being returned to the user
      • It is not "target-able" by any functions
      • Users shouldn't know it's there at all
      • It is used as an input for things that interact with Lucene like loading field values
    • TSID_DATA_TYPE

      public static final DataType TSID_DATA_TYPE
      Fields with this type represent values from the TimeSeriesIdFieldMapper. Every document in IndexMode.TIME_SERIES index will have a single value for this field and the segments themselves are sorted on this value.
    • PARTIAL_AGG

      public static final DataType PARTIAL_AGG
      Fields with this type are the partial result of running a non-time-series aggregation alongside time-series aggregations. These fields are not parsable from the mapping and should be hidden from users.
    • AGGREGATE_METRIC_DOUBLE

      public static final DataType AGGREGATE_METRIC_DOUBLE
    • DENSE_VECTOR

      public static final DataType DENSE_VECTOR
      Fields with this type are dense vectors, represented as an array of double values.
  • Field Details

    • UNDER_CONSTRUCTION

      public static final Map<DataType,FeatureFlag> UNDER_CONSTRUCTION
      Types that are actively being built. These types are not returned from Elasticsearch if their associated FeatureFlag is disabled, and they aren't included in generated documentation. The tests also don't check that sending them to a function produces a sane error message.
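The gating described here can be pictured as a filter over the enum constants. This is a minimal sketch, assuming Java 17 or later. GatedSketchType, SketchFlag, and visibleTypes are hypothetical stand-ins; the real FeatureFlag class and the code that consults UNDER_CONSTRUCTION may look quite different.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for DataType; NEW_TYPE plays a type still in development. */
enum GatedSketchType { LONG, KEYWORD, NEW_TYPE }

/** Hypothetical stand-in for a feature flag. */
record SketchFlag(String name, boolean enabled) {}

final class UnderConstructionSketch {
    /** Returns every type that is either finished or has its flag enabled. */
    static Set<GatedSketchType> visibleTypes(Map<GatedSketchType, SketchFlag> underConstruction) {
        Set<GatedSketchType> visible = EnumSet.noneOf(GatedSketchType.class);
        for (GatedSketchType type : GatedSketchType.values()) {
            SketchFlag flag = underConstruction.get(type);
            if (flag == null || flag.enabled()) {
                visible.add(type);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        Map<GatedSketchType, SketchFlag> gated =
            Map.of(GatedSketchType.NEW_TYPE, new SketchFlag("new_type", false));
        System.out.println(visibleTypes(gated));
    }
}
```

In this picture, removing a type from the map at release time makes it unconditionally visible, which matches the release checklist earlier on this page.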
  • Method Details

    • values

      public static DataType[] values()
      Returns an array containing the constants of this enum class, in the order they are declared.
      Returns:
      an array containing the constants of this enum class, in the order they are declared
    • valueOf

      public static DataType valueOf(String name)
      Returns the enum constant of this class with the specified name. The string must match exactly an identifier used to declare an enum constant in this class. (Extraneous whitespace characters are not permitted.)
      Parameters:
      name - the name of the enum constant to be returned.
      Returns:
      the enum constant with the specified name
      Throws:
      IllegalArgumentException - if this enum class has no constant with the specified name
      NullPointerException - if the argument is null
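Because valueOf matches names exactly and throws on a miss, callers that handle untrusted names often wrap it. The sketch below shows one such wrapper using standard enum behavior only. LookupSketchType and tryValueOf are hypothetical names and are not part of DataType.

```java
import java.util.Optional;

/** Hypothetical stand-in for DataType; not the real enum. */
enum LookupSketchType { LONG, KEYWORD }

final class ValueOfSketch {
    /** Wraps valueOf so an unknown name yields an empty Optional instead of an exception. */
    static Optional<LookupSketchType> tryValueOf(String name) {
        try {
            return Optional.of(LookupSketchType.valueOf(name));
        } catch (IllegalArgumentException e) {
            // Thrown for any name that is not an exact, case-sensitive match.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryValueOf("LONG"));
        System.out.println(tryValueOf("long"));
    }
}
```

A null name still raises NullPointerException here, as documented for valueOf.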
    • types

      public static Collection<DataType> types()
    • stringTypes

      public static Collection<DataType> stringTypes()
    • fromTypeName

      public static DataType fromTypeName(String name)
      Resolve a type from a name. This name is sometimes user supplied, like in the case of ::<typename> and is sometimes the name used over the wire, like in readFrom(String).
    • fromEs

      public static DataType fromEs(String name)
    • fromJava

      public static DataType fromJava(Object value)
    • isUnsupported

      public static boolean isUnsupported(DataType from)
    • isString

      public static boolean isString(DataType t)
    • isPrimitiveAndSupported

      public static boolean isPrimitiveAndSupported(DataType t)
    • isPrimitive

      public static boolean isPrimitive(DataType t)
    • isNull

      public static boolean isNull(DataType t)
    • isNullOrNumeric

      public static boolean isNullOrNumeric(DataType t)
    • isDateTime

      public static boolean isDateTime(DataType type)
    • isNullOrTimeDuration

      public static boolean isNullOrTimeDuration(DataType t)
    • isNullOrDatePeriod

      public static boolean isNullOrDatePeriod(DataType t)
    • isTemporalAmount

      public static boolean isTemporalAmount(DataType t)
    • isNullOrTemporalAmount

      public static boolean isNullOrTemporalAmount(DataType t)
    • isDateTimeOrTemporal

      public static boolean isDateTimeOrTemporal(DataType t)
    • isDateTimeOrNanosOrTemporal

      public static boolean isDateTimeOrNanosOrTemporal(DataType t)
    • isMillisOrNanos

      public static boolean isMillisOrNanos(DataType t)
    • areCompatible

      public static boolean areCompatible(DataType left, DataType right)
    • isRepresentable

      public static boolean isRepresentable(DataType t)
      Supported types that can be contained in a block.
    • isCounter

      public static boolean isCounter(DataType t)
    • isSpatialPoint

      public static boolean isSpatialPoint(DataType t)
    • isSpatialShape

      public static boolean isSpatialShape(DataType t)
    • isSpatialGeo

      public static boolean isSpatialGeo(DataType t)
    • isSpatial

      public static boolean isSpatial(DataType t)
    • isSortable

      public static boolean isSortable(DataType t)
    • nameUpper

      public String nameUpper()
    • typeName

      public String typeName()
    • esType

      public String esType()
    • esNameIfPossible

      public String esNameIfPossible()
      Return the Elasticsearch field name of this type if there is one, otherwise return the ESQL specific name.
    • outputType

      public String outputType()
      The name we give to types on the response.
    • isWholeNumber

      public boolean isWholeNumber()
      True if the type represents a "whole number", as in, does not have a decimal part.
    • isRationalNumber

      public boolean isRationalNumber()
      True if the type represents a "rational number", as in, does have a decimal part.
    • isNumeric

      public boolean isNumeric()
      Does this data type represent any number?
    • estimatedSize

      public Optional<Integer> estimatedSize()
      Returns:
      the estimated size, in bytes, of this data type. If there's no reasonable way to estimate the size, the optional will be empty.
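A caller of an Optional-returning size estimate has to choose what to do when no estimate exists. The following minimal sketch sums per-column estimates and substitutes a caller-chosen fallback. SizedSketchType, its sizes, and estimateRowBytes are hypothetical and chosen for illustration; they are not the values or helpers DataType uses.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-in for DataType with illustrative size estimates. */
enum SizedSketchType {
    LONG(Optional.of(Long.BYTES)),
    INTEGER(Optional.of(Integer.BYTES)),
    KEYWORD(Optional.empty()); // variable length, so no fixed estimate in this sketch

    private final Optional<Integer> estimatedSize;

    SizedSketchType(Optional<Integer> estimatedSize) {
        this.estimatedSize = estimatedSize;
    }

    Optional<Integer> estimatedSize() {
        return estimatedSize;
    }
}

final class RowSizeSketch {
    /** Sums the estimates, using fallbackBytes for columns that have none. */
    static int estimateRowBytes(List<SizedSketchType> columns, int fallbackBytes) {
        int total = 0;
        for (SizedSketchType column : columns) {
            total += column.estimatedSize().orElse(fallbackBytes);
        }
        return total;
    }

    public static void main(String[] args) {
        List<SizedSketchType> row = List.of(SizedSketchType.LONG, SizedSketchType.KEYWORD);
        System.out.println(estimateRowBytes(row, 50));
    }
}
```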
    • hasDocValues

      public boolean hasDocValues()
    • isCounter

      public boolean isCounter()
      true if this is a TSDB counter, false otherwise.
    • widenSmallNumeric

      public DataType widenSmallNumeric()
      If this is a "small" numeric type, this returns the type ESQL will widen it into; otherwise it returns this type unchanged.
    • counter

      public DataType counter()
      If this is a representable numeric this will be the counter "version" of this numeric, otherwise this is null.
    • writeTo

      public void writeTo(StreamOutput out) throws IOException
      Throws:
      IOException
    • readFrom

      public static DataType readFrom(StreamInput in) throws IOException
      Throws:
      IOException
    • readFrom

      public static DataType readFrom(String name) throws IOException
      Resolve a DataType from a name read from a StreamInput.
      Throws:
      IOException - on an unknown dataType
    • namesAndAliases

      public static Set<String> namesAndAliases()
    • fromNameOrAlias

      public static DataType fromNameOrAlias(String typeName)
    • noText

      public DataType noText()
    • isDate

      public boolean isDate()
    • suggestedCast

      public static DataType suggestedCast(Set<DataType> originalTypes)