Module org.elasticsearch.xcore
Class CustomWordEmbedding
java.lang.Object
org.elasticsearch.xpack.core.ml.inference.preprocessing.CustomWordEmbedding
- All Implemented Interfaces:
org.apache.lucene.util.Accountable, NamedWriteable, Writeable, org.elasticsearch.xcontent.ToXContent, org.elasticsearch.xcontent.ToXContentObject, LenientlyParsedPreProcessor, PreProcessor, StrictlyParsedPreProcessor, NamedXContentObject
public class CustomWordEmbedding
extends Object
implements LenientlyParsedPreProcessor, StrictlyParsedPreProcessor
This is a pre-processor that embeds text into a numerical vector.
It calculates a set of features based on script type, ngram hashes, and most common script values.
The features are then concatenated with specific quantization scales and weights into a vector of length 80.
This is a fork and a port of: https://github.com/google/cld3/blob/06f695f1c8ee530104416aab5dcf2d6a1414a56a/src/embedding_network.cc
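The concatenation step described above can be illustrated with a minimal sketch. This page does not spell out the arithmetic, so everything here is an assumption for illustration: the class and method names are hypothetical, the quantization scales are taken to be already converted to float, and the bias of 127 applied to the unsigned byte weights is assumed rather than taken from this class. Each feature type owns a table of quantized rows; every active feature adds its dequantized row, scaled by the feature weight, into that type's slice of the output vector. In the real pre-processor the slices presumably add up to the length of 80 mentioned above.

```java
import java.util.Arrays;

public class QuantizedEmbeddingSketch {
    // Assumed bias for unsigned 8-bit quantized weights (not stated on this page).
    static final int ASSUMED_BIAS = 127;

    /**
     * Hypothetical helper: concatenates one embedding slice per feature type.
     *
     * @param tables        tables[t][row] is a quantized embedding row for feature type t
     * @param scales        scales[t][row] is the quantization scale of that row
     * @param activeRows    activeRows[t] lists the rows selected by the active features of type t
     * @param activeWeights activeWeights[t][k] is the weight of the k-th active feature of type t
     */
    static float[] embed(byte[][][] tables, float[][] scales, int[][] activeRows, float[][] activeWeights) {
        int total = 0;
        for (byte[][] table : tables) {
            total += table[0].length; // each type contributes one slice of its embedding width
        }
        float[] out = new float[total];
        int offset = 0;
        for (int t = 0; t < tables.length; t++) {
            for (int k = 0; k < activeRows[t].length; k++) {
                int row = activeRows[t][k];
                float multiplier = scales[t][row] * activeWeights[t][k];
                byte[] quantRow = tables[t][row];
                for (int i = 0; i < quantRow.length; i++) {
                    int unsigned = quantRow[i] & 0xFF; // read the byte as 0..255
                    out[offset + i] += (unsigned - ASSUMED_BIAS) * multiplier;
                }
            }
            offset += tables[t][0].length; // move to the next type's slice
        }
        return out;
    }

    public static void main(String[] args) {
        byte[][][] tables = { { { (byte) 128, (byte) 127 } }, { { (byte) 129 } } };
        float[][] scales = { { 0.5f }, { 2.0f } };
        int[][] activeRows = { { 0 }, { 0 } };
        float[][] activeWeights = { { 1.0f }, { 0.5f } };
        // prints [0.5, 0.0, 2.0]
        System.out.println(Arrays.toString(embed(tables, scales, activeRows, activeWeights)));
    }
}
```

Accumulating with += rather than assigning lets several active features of the same type (for example, several ngram hashes) contribute to the same slice.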
-
Nested Class Summary
Nested Classes
Nested classes/interfaces inherited from interface org.elasticsearch.xpack.core.ml.inference.preprocessing.PreProcessor
PreProcessor.PreProcessorParseContext
Nested classes/interfaces inherited from interface org.elasticsearch.xcontent.ToXContent
org.elasticsearch.xcontent.ToXContent.DelegatingMapParams, org.elasticsearch.xcontent.ToXContent.MapParams, org.elasticsearch.xcontent.ToXContent.Params
Nested classes/interfaces inherited from interface org.elasticsearch.common.io.stream.Writeable
Writeable.Reader<V>, Writeable.Writer<V>
-
Field Summary
Fields
Modifier and Type, Field
static final org.elasticsearch.xcontent.ParseField DEST_FIELD
static final org.elasticsearch.xcontent.ParseField EMBEDDING_QUANT_SCALES
static final org.elasticsearch.xcontent.ParseField EMBEDDING_WEIGHTS
static final org.elasticsearch.xcontent.ParseField FIELD
static final int MAX_STRING_SIZE_IN_BYTES
static final org.elasticsearch.xcontent.ParseField NAME
Fields inherited from interface org.apache.lucene.util.Accountable
NULL_ACCOUNTABLE
Fields inherited from interface org.elasticsearch.xcontent.ToXContent
EMPTY, EMPTY_PARAMS
-
Constructor Summary
Constructors
CustomWordEmbedding(short[][] embeddingsQuantScales, byte[][] embeddingsWeights, String fieldName, String destField)
-
Method Summary
Modifier and Type, Method: Description
boolean equals
static CustomWordEmbedding fromXContentLenient(org.elasticsearch.xcontent.XContentParser parser)
static CustomWordEmbedding fromXContentStrict(org.elasticsearch.xcontent.XContentParser parser)
getName()
getOutputFieldType(String outputField)
int hashCode()
inputFields: The expected input fields
boolean isCustom()
void process: Process the given fields and their values and return the modified map.
long ramBytesUsed()
org.elasticsearch.xcontent.XContentBuilder toXContent(org.elasticsearch.xcontent.XContentBuilder builder, org.elasticsearch.xcontent.ToXContent.Params params)
void writeTo(StreamOutput out)
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.lucene.util.Accountable
getChildResources
Methods inherited from interface org.elasticsearch.xcontent.ToXContentObject
isFragment
-
Field Details
-
MAX_STRING_SIZE_IN_BYTES
public static final int MAX_STRING_SIZE_IN_BYTES
-
NAME
public static final org.elasticsearch.xcontent.ParseField NAME -
FIELD
public static final org.elasticsearch.xcontent.ParseField FIELD -
DEST_FIELD
public static final org.elasticsearch.xcontent.ParseField DEST_FIELD -
EMBEDDING_WEIGHTS
public static final org.elasticsearch.xcontent.ParseField EMBEDDING_WEIGHTS -
EMBEDDING_QUANT_SCALES
public static final org.elasticsearch.xcontent.ParseField EMBEDDING_QUANT_SCALES
-
-
Constructor Details
-
CustomWordEmbedding
- Throws:
IOException
-
CustomWordEmbedding
-
-
Method Details
-
fromXContentStrict
public static CustomWordEmbedding fromXContentStrict(org.elasticsearch.xcontent.XContentParser parser) -
fromXContentLenient
public static CustomWordEmbedding fromXContentLenient(org.elasticsearch.xcontent.XContentParser parser) -
inputFields
Description copied from interface: PreProcessor
The expected input fields
- Specified by:
inputFields in interface PreProcessor
-
outputFields
- Specified by:
outputFields in interface PreProcessor
- Returns:
- The resulting output fields. It is imperative that the order is consistent between calls.
-
process
Description copied from interface: PreProcessor
Process the given fields and their values and return the modified map. NOTE: The passed map object is mutated directly.
- Specified by:
process in interface PreProcessor
- Parameters:
fields - The fields and their values to process
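The in-place mutation noted above can be illustrated with a small stand-in. This is a sketch under assumptions and not the class's actual implementation: the Map<String, Object> parameter type, the class name, the field names, and the placeholder "feature" (the text length) are all hypothetical. The point is only that the caller observes the new entry in the same map it passed in.

```java
import java.util.HashMap;
import java.util.Map;

public class InPlaceProcessSketch {
    /**
     * Hypothetical stand-in for a pre-processor's process method.
     * Reads fieldName from the map and writes a vector under destField
     * into the same map instance.
     */
    static void process(Map<String, Object> fields, String fieldName, String destField) {
        Object value = fields.get(fieldName);
        if (!(value instanceof String)) {
            return; // nothing to embed; leave the map untouched
        }
        String text = (String) value;
        // Placeholder feature: a one-element vector holding the text length.
        fields.put(destField, new double[] { text.length() });
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("text", "hello");
        process(doc, "text", "text_embedding");
        // The caller's map now contains the destination field.
        System.out.println(doc.containsKey("text_embedding")); // prints true
    }
}
```

Callers that need the original values afterwards should copy the map before passing it in.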
-
reverseLookup
- Specified by:
reverseLookup in interface PreProcessor
- Returns:
- Reverse lookup map to match resulting features to their original feature name
-
isCustom
public boolean isCustom()
- Specified by:
isCustom in interface PreProcessor
- Returns:
- Whether the pre-processor is a custom one provided by the user rather than one created automatically. This affects how feature importance is calculated: fields generated by custom processors get individual feature importance calculations.
-
getOutputFieldType
- Specified by:
getOutputFieldType in interface PreProcessor
-
ramBytesUsed
public long ramBytesUsed()
- Specified by:
ramBytesUsed in interface org.apache.lucene.util.Accountable
-
getWriteableName
- Specified by:
getWriteableName in interface NamedWriteable
-
writeTo
- Specified by:
writeTo in interface Writeable
- Throws:
IOException
-
getName
- Specified by:
getName in interface NamedXContentObject
- Returns:
- The name of the XContentObject that is to be serialized
-
toXContent
public org.elasticsearch.xcontent.XContentBuilder toXContent(org.elasticsearch.xcontent.XContentBuilder builder, org.elasticsearch.xcontent.ToXContent.Params params) throws IOException
- Specified by:
toXContent in interface org.elasticsearch.xcontent.ToXContent
- Throws:
IOException
-
equals
-
hashCode
public int hashCode()
-