java.lang.Object
org.elasticsearch.xpack.core.ml.inference.preprocessing.CustomWordEmbedding
All Implemented Interfaces:
org.apache.lucene.util.Accountable, NamedWriteable, Writeable, org.elasticsearch.xcontent.ToXContent, org.elasticsearch.xcontent.ToXContentObject, LenientlyParsedPreProcessor, PreProcessor, StrictlyParsedPreProcessor, NamedXContentObject

public class CustomWordEmbedding extends Object implements LenientlyParsedPreProcessor, StrictlyParsedPreProcessor
This is a pre-processor that embeds text into a numerical vector. It calculates a set of features based on script type, ngram hashes, and the most common script values. The features are then embedded using the quantized weights and their quantization scales, and the resulting embeddings are concatenated into a vector of length 80. This is a fork and a port of: https://github.com/google/cld3/blob/06f695f1c8ee530104416aab5dcf2d6a1414a56a/src/embedding_network.cc
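The class description compresses several steps into one sentence. The Java sketch below shows only the final "embed and concatenate" stage, under stated assumptions. It assumes that each feature group has its own embedding matrix, that each extracted feature is a (row, weight) pair, and that a group's embedding is the weighted sum of its rows. The group embeddings are then written side by side into one output vector. FeatureValue, embed, and the group dimensions are hypothetical. The dimensions are chosen only so that they sum to 80. The sketch uses plain float matrices and leaves out the quantized storage.

```java
import java.util.Arrays;

/** Hypothetical sketch: embed several feature groups and concatenate the results. */
class ConcatenatedEmbeddingSketch {

    /** One sparse feature: a row index into its group's embedding matrix plus a weight. */
    static final class FeatureValue {
        final int row;
        final float weight;

        FeatureValue(int row, float weight) {
            this.row = row;
            this.weight = weight;
        }
    }

    /**
     * For each group g, sums weight * embeddings[g][row] over the group's features and
     * writes that sum into the group's slice of the output vector.
     */
    static float[] embed(float[][][] embeddings, FeatureValue[][] featuresPerGroup) {
        int totalDim = 0;
        for (float[][] matrix : embeddings) {
            totalDim += matrix[0].length;
        }
        float[] output = new float[totalDim];
        int offset = 0;
        for (int g = 0; g < embeddings.length; g++) {
            int dim = embeddings[g][0].length;
            for (FeatureValue f : featuresPerGroup[g]) {
                float[] row = embeddings[g][f.row];
                for (int k = 0; k < dim; k++) {
                    output[offset + k] += f.weight * row[k];
                }
            }
            offset += dim; // the next group starts right after this one
        }
        return output;
    }

    public static void main(String[] args) {
        int[] dims = {16, 16, 16, 16, 8, 8}; // hypothetical group sizes, summing to 80
        float[][][] embeddings = new float[dims.length][][];
        FeatureValue[][] features = new FeatureValue[dims.length][];
        for (int g = 0; g < dims.length; g++) {
            embeddings[g] = new float[4][dims[g]];
            for (float[] r : embeddings[g]) {
                Arrays.fill(r, 1.0f);
            }
            features[g] = new FeatureValue[] {new FeatureValue(0, 0.5f), new FeatureValue(2, 0.5f)};
        }
        float[] vector = embed(embeddings, features);
        System.out.println(vector.length); // 80
        System.out.println(vector[0]);     // 1.0
    }
}
```

When feature weights within a group sum to 1, as in the demo, the weighted sum behaves like a weighted average. The output length is fixed by the group dimensions, not by the input text.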
  • Field Details

    • MAX_STRING_SIZE_IN_BYTES

      public static final int MAX_STRING_SIZE_IN_BYTES
    • NAME

      public static final org.elasticsearch.xcontent.ParseField NAME
    • FIELD

      public static final org.elasticsearch.xcontent.ParseField FIELD
    • DEST_FIELD

      public static final org.elasticsearch.xcontent.ParseField DEST_FIELD
    • EMBEDDING_WEIGHTS

      public static final org.elasticsearch.xcontent.ParseField EMBEDDING_WEIGHTS
    • EMBEDDING_QUANT_SCALES

      public static final org.elasticsearch.xcontent.ParseField EMBEDDING_QUANT_SCALES
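MAX_STRING_SIZE_IN_BYTES has no description on this page. One plausible reading is that input text longer than this many bytes is truncated before features are extracted. This page does not confirm that, so treat it as an assumption. The Java sketch below shows one safe way to apply such a byte cap. It keeps the longest prefix whose UTF-8 encoding fits within the limit and never cuts a code point in half. The method name truncateToUtf8Bytes and the limit values in the demo are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: cap a string at a maximum number of UTF-8 bytes. */
class Utf8TruncationSketch {

    /**
     * Returns the longest prefix of text whose UTF-8 encoding fits in maxBytes.
     * Surrogate pairs are consumed together, so a code point is never split.
     */
    static String truncateToUtf8Bytes(String text, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            // UTF-8 length of this code point.
            int cpBytes = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + cpBytes > maxBytes) {
                break;
            }
            bytes += cpBytes;
            i += Character.charCount(cp);
        }
        return text.substring(0, i);
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo"; // "héllo"; é takes 2 bytes in UTF-8
        String t = truncateToUtf8Bytes(s, 2);
        System.out.println(t);                                        // h
        System.out.println(t.getBytes(StandardCharsets.UTF_8).length); // 1
    }
}
```

Stopping at a code-point boundary means the result can be shorter than the limit. This trades a few bytes for a string that always decodes cleanly.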
  • Constructor Details

    • CustomWordEmbedding

      public CustomWordEmbedding(StreamInput in) throws IOException
      Throws:
      IOException
    • CustomWordEmbedding

      public CustomWordEmbedding(short[][] embeddingsQuantScales, byte[][] embeddingsWeights, String fieldName, String destField)
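The second constructor takes the embedding weights as byte arrays and the quantization scales as short arrays. The Java sketch below shows how one quantized row could be turned back into floats. It assumes a CLD3-style scheme, which this page does not state. Under that scheme, each weight byte is read as an unsigned value (0 to 255) and re-centred by subtracting 128. Each short holds the bits of an IEEE 754 half-precision scale. How CustomWordEmbedding actually lays out and decodes its arrays is not described here. halfToFloat and dequantizeRow are hypothetical helpers.

```java
import java.util.Arrays;

/** Hypothetical sketch of 8-bit weights with a per-row half-precision scale. */
class QuantizedEmbeddingSketch {

    /** Converts IEEE 754 binary16 bits, stored in a short, to a float. */
    static float halfToFloat(short half) {
        int bits = half & 0xFFFF; // undo sign extension of the short
        int sign = bits >>> 15;
        int exponent = (bits >>> 10) & 0x1F;
        int mantissa = bits & 0x3FF;
        float magnitude;
        if (exponent == 0) {
            magnitude = mantissa * 0x1p-24f; // zero or subnormal: mantissa * 2^-24
        } else if (exponent == 0x1F) {
            magnitude = mantissa == 0 ? Float.POSITIVE_INFINITY : Float.NaN;
        } else {
            // Normal number: (1 + mantissa / 1024) * 2^(exponent - 15)
            magnitude = (1.0f + mantissa / 1024.0f) * (float) Math.pow(2, exponent - 15);
        }
        return sign == 1 ? -magnitude : magnitude;
    }

    /** Dequantizes one row: (unsigned byte - 128) * scale for every component. */
    static float[] dequantizeRow(byte[] weights, short scaleBits) {
        float scale = halfToFloat(scaleBits);
        float[] row = new float[weights.length];
        for (int k = 0; k < weights.length; k++) {
            row[k] = (Byte.toUnsignedInt(weights[k]) - 128) * scale;
        }
        return row;
    }

    public static void main(String[] args) {
        short half = (short) 0x3800; // 0.5 in half precision
        byte[] weights = {(byte) 128, (byte) 130, (byte) 126};
        System.out.println(Arrays.toString(dequantizeRow(weights, half))); // [0.0, 1.0, -1.0]
    }
}
```

Storing one byte per weight plus one short per row keeps the model small. The cost is that each lookup reconstructs floats on the fly.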
  • Method Details

    • fromXContentStrict

      public static CustomWordEmbedding fromXContentStrict(org.elasticsearch.xcontent.XContentParser parser)
    • fromXContentLenient

      public static CustomWordEmbedding fromXContentLenient(org.elasticsearch.xcontent.XContentParser parser)
    • inputFields

      public List<String> inputFields()
      Description copied from interface: PreProcessor
      The expected input fields
      Specified by:
      inputFields in interface PreProcessor
    • outputFields

      public List<String> outputFields()
      Specified by:
      outputFields in interface PreProcessor
      Returns:
      The resulting output fields. It is imperative that the order is consistent between calls.
    • process

      public void process(Map<String,Object> fields)
      Description copied from interface: PreProcessor
      Process the given fields and their values, writing the results back into the map. NOTE: The passed map object is mutated directly; the method returns nothing.
      Specified by:
      process in interface PreProcessor
      Parameters:
      fields - The fields and their values to process
    • reverseLookup

      public Map<String,String> reverseLookup()
      Specified by:
      reverseLookup in interface PreProcessor
      Returns:
      Reverse lookup map to match resulting features to their original feature name
    • isCustom

      public boolean isCustom()
      Specified by:
      isCustom in interface PreProcessor
      Returns:
      true if the pre-processor is a custom one provided by the user, false if it was created automatically. This affects how feature importance is calculated: fields generated by custom pre-processors get individual feature importance calculations.
    • getOutputFieldType

      public String getOutputFieldType(String outputField)
      Specified by:
      getOutputFieldType in interface PreProcessor
    • ramBytesUsed

      public long ramBytesUsed()
      Specified by:
      ramBytesUsed in interface org.apache.lucene.util.Accountable
    • getWriteableName

      public String getWriteableName()
      Specified by:
      getWriteableName in interface NamedWriteable
    • writeTo

      public void writeTo(StreamOutput out) throws IOException
      Specified by:
      writeTo in interface Writeable
      Throws:
      IOException
    • getName

      public String getName()
      Specified by:
      getName in interface NamedXContentObject
      Returns:
      The name of the XContentObject that is to be serialized
    • toXContent

      public org.elasticsearch.xcontent.XContentBuilder toXContent(org.elasticsearch.xcontent.XContentBuilder builder, org.elasticsearch.xcontent.ToXContent.Params params) throws IOException
      Specified by:
      toXContent in interface org.elasticsearch.xcontent.ToXContent
      Throws:
      IOException
    • equals

      public boolean equals(Object o)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
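The inputFields, outputFields, process, and reverseLookup entries above together describe a contract. process mutates the given map in place. outputFields must return the same order on every call. reverseLookup maps each generated feature back to its source field. The Java sketch below illustrates that contract with a hypothetical single-field pre-processor. It is not the Elasticsearch PreProcessor interface. Its two-element "embedding" is a toy stand-in for the real 80-dimensional vector. The field names in the demo and the choice to skip documents without a string value are assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical pre-processor mirroring the documented contract; NOT the Elasticsearch API. */
class PreProcessorContractSketch {
    private final String fieldName;
    private final String destField;

    PreProcessorContractSketch(String fieldName, String destField) {
        this.fieldName = fieldName;
        this.destField = destField;
    }

    List<String> inputFields() {
        return List.of(fieldName);
    }

    // Built only from final fields, so every call returns the same order.
    List<String> outputFields() {
        return List.of(destField);
    }

    // Maps the generated feature back to the field it was derived from.
    Map<String, String> reverseLookup() {
        return Map.of(destField, fieldName);
    }

    // Mutates the map in place and returns nothing; skips maps without a string value.
    void process(Map<String, Object> fields) {
        Object value = fields.get(fieldName);
        if (!(value instanceof String)) {
            return;
        }
        String text = (String) value;
        // Toy stand-in for the real embedding: [character count, whitespace count].
        double[] vector = {text.length(), text.chars().filter(Character::isWhitespace).count()};
        fields.put(destField, vector);
    }

    public static void main(String[] args) {
        PreProcessorContractSketch p = new PreProcessorContractSketch("text", "text_embedding");
        Map<String, Object> doc = new HashMap<>();
        doc.put("text", "hello world");
        p.process(doc);
        double[] v = (double[]) doc.get("text_embedding");
        System.out.println(v[0] + " " + v[1]); // 11.0 1.0
    }
}
```

Returning immutable lists built from final fields is one simple way to meet the ordering requirement on outputFields.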