Class TSDBDocValuesEncoder

java.lang.Object
org.elasticsearch.index.codec.tsdb.TSDBDocValuesEncoder

public final class TSDBDocValuesEncoder extends Object
This class provides encoding and decoding of doc values using the following schemes:
  • delta encoding: stores the initial value followed by the difference between each value and its predecessor. Delta values normally require far fewer bits than the original 32 or 64 bits.
  • offset encoding: stores values in the range [0, max - min] instead of [min, max]. Narrowing the range makes delta encoding much more effective, since numbers in the range [0, max - min] require fewer bits than values in the range [min, max].
  • gcd encoding: stores values divided by their Greatest Common Divisor. Dividing values by their GCD reduces their magnitude and, as a result, the number of bits required to represent them, which makes delta encoding much more effective.
  • (f)or encoding: stores the initial value and then the XOR between each value and the previous one, making delta encoding much more effective. Values whose higher bits are identical require fewer bits when delta encoded. This is expected to be especially effective with floating point values that share a common exponent and sign bit.
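The effect of each scheme on the number of bits needed can be illustrated with a small standalone sketch. It is not the class's actual implementation: the class name EncodingSchemesSketch and all of its method names are hypothetical, delta encoding is assumed to mean differences between consecutive values, and overflow handling is omitted.

```java
import java.util.Arrays;

public class EncodingSchemesSketch {

    // Delta: keep the first value, replace each later value with its difference from the previous one.
    static long[] delta(long[] in) {
        long[] out = in.clone();
        for (int i = out.length - 1; i > 0; i--) {
            out[i] -= out[i - 1];
        }
        return out;
    }

    // Offset: subtract the minimum so that values fall in [0, max - min].
    static long[] removeOffset(long[] in) {
        long min = Arrays.stream(in).min().orElse(0L);
        long[] out = in.clone();
        for (int i = 0; i < out.length; i++) {
            out[i] -= min;
        }
        return out;
    }

    // Euclid's algorithm; gcd(0, x) is |x|.
    static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return Math.abs(a);
    }

    // GCD: divide every value by the greatest common divisor of the block.
    static long[] divideByGcd(long[] in) {
        long g = 0;
        for (long v : in) {
            g = gcd(g, v);
        }
        long[] out = in.clone();
        if (g > 1) {
            for (int i = 0; i < out.length; i++) {
                out[i] /= g;
            }
        }
        return out;
    }

    // XOR: keep the first value, replace each later value with its XOR against the previous one.
    static long[] xorWithPrevious(long[] in) {
        long[] out = in.clone();
        for (int i = out.length - 1; i > 0; i--) {
            out[i] ^= out[i - 1];
        }
        return out;
    }

    // Bits needed to bit-pack a block of non-negative values.
    static int bitsRequired(long[] values) {
        long or = 0;
        for (long v : values) {
            or |= v;
        }
        return 64 - Long.numberOfLeadingZeros(or);
    }

    public static void main(String[] args) {
        long[] timestamps = {1000, 1010, 1020, 1030};
        long[] shifted = removeOffset(timestamps);
        long[] divided = divideByGcd(shifted);
        System.out.println(bitsRequired(timestamps) + " -> " + bitsRequired(shifted) + " -> " + bitsRequired(divided));
        System.out.println(Arrays.toString(delta(timestamps)));

        long[] doubles = {Double.doubleToLongBits(1.5), Double.doubleToLongBits(1.75)};
        System.out.println(Long.toHexString(xorWithPrevious(doubles)[1]));
    }
}
```

In this sketch, the two doubles share their sign and exponent bits, so the XOR clears the top bits of the second value and leaves only the differing mantissa bit set.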
Notice that encoding and decoding are written in a nested way: for instance, deltaEncode(int, int, long[], org.apache.lucene.store.DataOutput) calls removeOffset(int, int, long[], org.apache.lucene.store.DataOutput), and so on. This allows us to easily introduce new encoding schemes or remove existing (non-effective) encoding schemes in a backward-compatible way. A token is used as a bitmask to record which encodings were applied, so that the applied encoding schemes can be detected at decoding time. This encoding and decoding scheme is meant to work on blocks of 128 values. Larger block sizes incur a decoding penalty when random access to doc values is required, since a full block must be decoded. Decoding applies the schemes in the reverse order of encoding.
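The nesting and the token bitmask can be sketched as follows. This is a simplified illustration under stated assumptions, not the real on-disk format: the class NestedTokenSketch, the Encoded holder and the rule for when each stage applies are all hypothetical, and the result is kept in memory rather than written to a DataOutput. Each stage shifts the token left by one bit and sets the new low bit if the stage was applied, so the decoder can test the bits and undo the stages in reverse order.

```java
import java.util.Arrays;

public class NestedTokenSketch {

    // Hypothetical in-memory holder for an encoded block.
    static final class Encoded {
        final int token;      // one bit per stage; the last stage occupies the lowest bit
        final long min;       // offset removed by the offset stage
        final long gcd;       // divisor used by the gcd stage
        final long[] values;  // transformed values, ready for bit packing

        Encoded(int token, long min, long gcd, long[] values) {
            this.token = token;
            this.min = min;
            this.gcd = gcd;
            this.values = values;
        }
    }

    static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return Math.abs(a);
    }

    static Encoded encode(long[] in) {
        long[] v = in.clone();
        int token = 0;

        // Stage 1: delta, applied here only when the block is non-decreasing (an assumption of this sketch).
        boolean sorted = true;
        for (int i = 1; i < v.length; i++) {
            sorted &= v[i] >= v[i - 1];
        }
        token <<= 1;
        if (sorted) {
            for (int i = v.length - 1; i > 0; i--) {
                v[i] -= v[i - 1];
            }
            token |= 1;
        }

        // Stage 2: offset removal.
        long min = Arrays.stream(v).min().orElse(0L);
        token <<= 1;
        if (min != 0) {
            for (int i = 0; i < v.length; i++) {
                v[i] -= min;
            }
            token |= 1;
        }

        // Stage 3: gcd division.
        long g = 0;
        for (long x : v) {
            g = gcd(g, x);
        }
        token <<= 1;
        if (g > 1) {
            for (int i = 0; i < v.length; i++) {
                v[i] /= g;
            }
            token |= 1;
        }
        return new Encoded(token, min, g, v);
    }

    static long[] decode(Encoded e) {
        long[] v = e.values.clone();
        // Undo the stages in reverse order, reading the token from its lowest bit upward.
        if ((e.token & 0b001) != 0) {
            for (int i = 0; i < v.length; i++) {
                v[i] *= e.gcd;
            }
        }
        if ((e.token & 0b010) != 0) {
            for (int i = 0; i < v.length; i++) {
                v[i] += e.min;
            }
        }
        if ((e.token & 0b100) != 0) {
            for (int i = 1; i < v.length; i++) {
                v[i] += v[i - 1];
            }
        }
        return v;
    }

    public static void main(String[] args) {
        long[] block = {1000, 1010, 1020, 1030};
        Encoded e = encode(block);
        System.out.println(Integer.toBinaryString(e.token) + " " + Arrays.toString(e.values));
        System.out.println(Arrays.toString(decode(e)));
    }
}
```

Because every stage contributes exactly one bit, a stage that turns out not to help can simply leave its bit unset, and the decoder skips it without any change to the other stages.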
  • Constructor Summary

    Constructors
    Constructor
    Description
    TSDBDocValuesEncoder(int numericBlockSize)
     
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    decode(org.apache.lucene.store.DataInput in, long[] out)
    Decode longs that have been encoded with encode(long[], org.apache.lucene.store.DataOutput).
    void
    decodeOrdinals(org.apache.lucene.store.DataInput in, long[] out, int bitsPerOrd)
     
    void
    encode(long[] in, org.apache.lucene.store.DataOutput out)
    Encode the given longs using a combination of delta-coding, GCD factorization and bit packing.
    void
    encodeOrdinals(long[] in, org.apache.lucene.store.DataOutput out, int bitsPerOrd)
    Optimizes for encoding sorted fields where we expect a block to mostly either be the same value or to make a transition from one value to a second one.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • TSDBDocValuesEncoder

      public TSDBDocValuesEncoder(int numericBlockSize)
  • Method Details

    • encode

      public void encode(long[] in, org.apache.lucene.store.DataOutput out) throws IOException
      Encode the given longs using a combination of delta-coding, GCD factorization and bit packing.
      Throws:
      IOException
    • encodeOrdinals

      public void encodeOrdinals(long[] in, org.apache.lucene.store.DataOutput out, int bitsPerOrd) throws IOException
      Optimizes for encoding sorted fields where we expect a block to mostly either be the same value or to make a transition from one value to a second one.

      The header is a vlong where the number of trailing ones defines the encoding strategy:

      • 0: single run
      • 1: two runs
      • 2: bit-packed
      • 3: cycle
      Throws:
      IOException
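As an illustration of the header convention described above, the following sketch maps the number of trailing one bits of a header value to a strategy. The class OrdinalHeaderSketch, its enum and its method names are hypothetical, and the layout of the payload bits above the marker is not described here, so the sketch leaves it unspecified.

```java
public class OrdinalHeaderSketch {

    // Hypothetical names for the four strategies listed above.
    enum Strategy { SINGLE_RUN, TWO_RUNS, BIT_PACKED, CYCLE }

    // The trailing ones of x are the trailing zeros of its bitwise complement.
    static int trailingOnes(long header) {
        return Long.numberOfTrailingZeros(~header);
    }

    static Strategy strategyOf(long header) {
        switch (trailingOnes(header)) {
            case 0:
                return Strategy.SINGLE_RUN;
            case 1:
                return Strategy.TWO_RUNS;
            case 2:
                return Strategy.BIT_PACKED;
            case 3:
                return Strategy.CYCLE;
            default:
                throw new IllegalArgumentException("Unknown ordinal encoding header: " + header);
        }
    }

    public static void main(String[] args) {
        long[] headers = {0b0110L, 0b0101L, 0b0011L, 0b0111L};
        for (long header : headers) {
            System.out.println(Long.toBinaryString(header) + " -> " + strategyOf(header));
        }
    }
}
```

Counting trailing ones lets the most common case, a single run, spend only one marker bit (a trailing zero), while rarer strategies use slightly longer markers.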
    • decodeOrdinals

      public void decodeOrdinals(org.apache.lucene.store.DataInput in, long[] out, int bitsPerOrd) throws IOException
      Throws:
      IOException
    • decode

      public void decode(org.apache.lucene.store.DataInput in, long[] out) throws IOException
      Decode longs that have been encoded with encode(long[], org.apache.lucene.store.DataOutput).
      Throws:
      IOException