Module org.elasticsearch.xcore
Class Tokenization
java.lang.Object
org.elasticsearch.xpack.core.ml.inference.trainedmodel.Tokenization
All Implemented Interfaces:
NamedWriteable, Writeable, org.elasticsearch.xcontent.ToXContent, org.elasticsearch.xcontent.ToXContentObject, NamedXContentObject
Direct Known Subclasses:
BertJapaneseTokenization, BertTokenization, DebertaV2Tokenization, MPNetTokenization, RobertaTokenization, XLMRobertaTokenization
Nested Class Summary
Nested Classes (Modifier and Type):
static final record
static enum
Nested classes/interfaces inherited from interface org.elasticsearch.xcontent.ToXContent:
org.elasticsearch.xcontent.ToXContent.DelegatingMapParams, org.elasticsearch.xcontent.ToXContent.MapParams, org.elasticsearch.xcontent.ToXContent.Params
Nested classes/interfaces inherited from interface org.elasticsearch.common.io.stream.Writeable:
Writeable.Reader<V>, Writeable.Writer<V>
Field Summary
Fields (Modifier and Type, Field):
static final int DEFAULT_MAX_SEQUENCE_LENGTH
static final org.elasticsearch.xcontent.ParseField DO_LOWER_CASE
protected final boolean doLowerCase
static final org.elasticsearch.xcontent.ParseField MAX_SEQUENCE_LENGTH
protected final int maxSequenceLength
protected final int span
static final org.elasticsearch.xcontent.ParseField SPAN
protected final Tokenization.Truncate truncate
static final org.elasticsearch.xcontent.ParseField TRUNCATE
static final int UNSET_SPAN_VALUE
static final org.elasticsearch.xcontent.ParseField WITH_SPECIAL_TOKENS
protected final boolean withSpecialTokens
Fields inherited from interface org.elasticsearch.xcontent.ToXContent:
EMPTY, EMPTY_PARAMS
Constructor Summary
Constructors
Method Summary
Methods (Modifier and Type, Method, Description):
static BertTokenization createDefault
boolean doLowerCase()
boolean equals
abstract String getMaskToken
int getMaxSequenceLength()
int getSpan()
Tokenization.Truncate getTruncate
int hashCode()
int maxSequenceLength()
org.elasticsearch.xcontent.XContentBuilder toXContent(org.elasticsearch.xcontent.XContentBuilder builder, org.elasticsearch.xcontent.ToXContent.Params params)
Tokenization updateWindowSettings: Returns a copy of this Tokenization with the tokenizer span settings updated.
static void validateSpanAndMaxSequenceLength(int maxSequenceLength, int span)
static void validateSpanAndTruncate(Tokenization.Truncate truncate, Integer span)
void validateVocabulary
boolean withSpecialTokens()
void writeTo(StreamOutput out)
Methods inherited from class java.lang.Object:
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.elasticsearch.common.io.stream.NamedWriteable:
getWriteableName
Methods inherited from interface org.elasticsearch.xpack.core.ml.utils.NamedXContentObject:
getName
Methods inherited from interface org.elasticsearch.xcontent.ToXContentObject:
isFragment
Field Details
DO_LOWER_CASE
public static final org.elasticsearch.xcontent.ParseField DO_LOWER_CASE
WITH_SPECIAL_TOKENS
public static final org.elasticsearch.xcontent.ParseField WITH_SPECIAL_TOKENS
MAX_SEQUENCE_LENGTH
public static final org.elasticsearch.xcontent.ParseField MAX_SEQUENCE_LENGTH
TRUNCATE
public static final org.elasticsearch.xcontent.ParseField TRUNCATE
SPAN
public static final org.elasticsearch.xcontent.ParseField SPAN
DEFAULT_MAX_SEQUENCE_LENGTH
public static final int DEFAULT_MAX_SEQUENCE_LENGTH
UNSET_SPAN_VALUE
public static final int UNSET_SPAN_VALUE
doLowerCase
protected final boolean doLowerCase
withSpecialTokens
protected final boolean withSpecialTokens
maxSequenceLength
protected final int maxSequenceLength
truncate
protected final Tokenization.Truncate truncate
span
protected final int span
Constructor Details
Tokenization
Throws:
IOException
Method Details
createDefault
updateWindowSettings
Returns a copy of this Tokenization with the tokenizer span settings updated.
Parameters:
update - the settings to update
Returns:
an updated Tokenization
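The copy-on-update contract described for updateWindowSettings can be illustrated with a plain-Java sketch. WindowSettings and WindowedTokenization are hypothetical stand-ins, not Elasticsearch types, and the rule that a null maxSequenceLength keeps the current value is an assumption made for the example. What the sketch shows is that the original object is never mutated; a new instance carries the updated span settings.

```java
// Hypothetical update payload; a null maxSequenceLength means "keep the current value" (assumption).
record WindowSettings(Integer maxSequenceLength, int span) {}

// Hypothetical immutable configuration; not the Elasticsearch Tokenization class.
final class WindowedTokenization {
    private final boolean doLowerCase;
    private final int maxSequenceLength;
    private final int span;

    WindowedTokenization(boolean doLowerCase, int maxSequenceLength, int span) {
        this.doLowerCase = doLowerCase;
        this.maxSequenceLength = maxSequenceLength;
        this.span = span;
    }

    boolean doLowerCase() { return doLowerCase; }
    int maxSequenceLength() { return maxSequenceLength; }
    int span() { return span; }

    // Returns a new instance with the window settings replaced; "this" is left unchanged.
    WindowedTokenization updateWindowSettings(WindowSettings update) {
        int newMax = update.maxSequenceLength() != null ? update.maxSequenceLength() : maxSequenceLength;
        return new WindowedTokenization(doLowerCase, newMax, update.span());
    }

    public static void main(String[] args) {
        WindowedTokenization original = new WindowedTokenization(true, 512, -1);
        WindowedTokenization updated = original.updateWindowSettings(new WindowSettings(256, 64));
        System.out.println(original.maxSequenceLength() + " " + original.span()); // 512 -1
        System.out.println(updated.maxSequenceLength() + " " + updated.span());   // 256 64
    }
}
```

Returning a fresh object keeps configurations safe to share between threads and lets callers hold on to the pre-update settings.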
writeTo
public void writeTo(StreamOutput out) throws IOException
Specified by:
writeTo in interface Writeable
Throws:
IOException
getMaskToken
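Because getMaskToken is abstract (returning String), each concrete subclass decides which token marks a masked position. The sketch below uses hypothetical classes. "[MASK]" and "<mask>" are the conventional mask tokens of the original BERT and RoBERTa vocabularies, but whether BertTokenization and RobertaTokenization return exactly these strings is an assumption here, as is the helper maskWord.

```java
// Hypothetical base class: subclasses supply the model-specific mask token.
abstract class MaskAwareTokenization {
    abstract String getMaskToken();

    // Replaces every occurrence of word with this model's mask token (fill-mask style input).
    String maskWord(String sentence, String word) {
        return sentence.replace(word, getMaskToken());
    }
}

// Conventional BERT vocabulary mask token.
final class BertStyleTokenization extends MaskAwareTokenization {
    @Override
    String getMaskToken() { return "[MASK]"; }
}

// Conventional RoBERTa vocabulary mask token.
final class RobertaStyleTokenization extends MaskAwareTokenization {
    @Override
    String getMaskToken() { return "<mask>"; }
}

final class MaskTokenDemo {
    public static void main(String[] args) {
        String sentence = "Paris is the capital of France.";
        System.out.println(new BertStyleTokenization().maskWord(sentence, "Paris"));    // [MASK] is the capital of France.
        System.out.println(new RobertaStyleTokenization().maskWord(sentence, "Paris")); // <mask> is the capital of France.
    }
}
```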
toXContent
public org.elasticsearch.xcontent.XContentBuilder toXContent(org.elasticsearch.xcontent.XContentBuilder builder, org.elasticsearch.xcontent.ToXContent.Params params) throws IOException
Specified by:
toXContent in interface org.elasticsearch.xcontent.ToXContent
Throws:
IOException
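Since the class implements ToXContentObject, toXContent renders an instance as a structured object, presumably holding the settings listed under Field Details. As a rough illustration without the XContent dependency, the sketch below builds an ordered java.util.Map instead. The snake_case keys are assumed to match the preferred names of the ParseField constants (DO_LOWER_CASE, WITH_SPECIAL_TOKENS, MAX_SEQUENCE_LENGTH, TRUNCATE, SPAN); omitting span when it equals an unset sentinel of -1 is likewise an assumption, and TokenizationJson is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical plain-Java stand-in for toXContent; key names are assumed, not taken from this page.
final class TokenizationJson {
    static final int UNSET_SPAN = -1; // assumed sentinel for "span not configured"

    private TokenizationJson() {}

    // Builds an insertion-ordered map so the printed form is stable.
    static Map<String, Object> toMap(boolean doLowerCase, boolean withSpecialTokens,
                                     int maxSequenceLength, String truncate, int span) {
        Map<String, Object> map = new LinkedHashMap<>();
        map.put("do_lower_case", doLowerCase);
        map.put("with_special_tokens", withSpecialTokens);
        map.put("max_sequence_length", maxSequenceLength);
        map.put("truncate", truncate.toLowerCase(Locale.ROOT));
        if (span != UNSET_SPAN) {
            map.put("span", span); // assumed: written only when a span is configured
        }
        return map;
    }

    public static void main(String[] args) {
        // prints {do_lower_case=false, with_special_tokens=true, max_sequence_length=512, truncate=first}
        System.out.println(toMap(false, true, 512, "FIRST", UNSET_SPAN));
    }
}
```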
validateSpanAndMaxSequenceLength
public static void validateSpanAndMaxSequenceLength(int maxSequenceLength, int span)
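The name of validateSpanAndMaxSequenceLength suggests a consistency check between the two window parameters. A minimal sketch, assuming that an unset span is encoded as -1 (standing in for UNSET_SPAN_VALUE, whose actual value this page does not show) and that a span larger than maxSequenceLength is rejected with an IllegalArgumentException. SpanChecks and its error message are hypothetical.

```java
// Hypothetical stand-in for the span/max-sequence-length check; not Elasticsearch code.
final class SpanChecks {
    static final int UNSET_SPAN = -1; // assumed sentinel for "span not configured"

    private SpanChecks() {}

    // Rejects a span that would not fit inside a single model input window.
    static void validateSpanAndMaxSequenceLength(int maxSequenceLength, int span) {
        if (span != UNSET_SPAN && span > maxSequenceLength) {
            throw new IllegalArgumentException(
                "span [" + span + "] must not be greater than max_sequence_length [" + maxSequenceLength + "]");
        }
    }

    public static void main(String[] args) {
        validateSpanAndMaxSequenceLength(512, 128);        // accepted
        validateSpanAndMaxSequenceLength(512, UNSET_SPAN); // accepted: span not set
        try {
            validateSpanAndMaxSequenceLength(128, 256);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```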
validateSpanAndTruncate
public static void validateSpanAndTruncate(@Nullable Tokenization.Truncate truncate, @Nullable Integer span)
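Both parameters of validateSpanAndTruncate are @Nullable, so a check has to tolerate either being absent. The sketch below assumes that a configured span (splitting long input into windows) and an active truncation strategy are mutually exclusive, and that the truncation options are FIRST, SECOND and NONE. TruncateMode, TruncateChecks and the -1 sentinel are hypothetical and may not match the real Tokenization.Truncate or UNSET_SPAN_VALUE.

```java
// Hypothetical truncation options; the real Tokenization.Truncate constants may differ.
enum TruncateMode { FIRST, SECOND, NONE }

// Hypothetical stand-in for validateSpanAndTruncate; not Elasticsearch code.
final class TruncateChecks {
    static final int UNSET_SPAN = -1; // assumed sentinel for "span not configured"

    private TruncateChecks() {}

    // Either argument may be null, mirroring the @Nullable parameters in the signature.
    static void validateSpanAndTruncate(TruncateMode truncate, Integer span) {
        boolean spanSet = span != null && span != UNSET_SPAN;
        boolean truncating = truncate != null && truncate != TruncateMode.NONE;
        if (spanSet && truncating) {
            throw new IllegalArgumentException(
                "truncate must be NONE when span is set, but was " + truncate);
        }
    }

    public static void main(String[] args) {
        validateSpanAndTruncate(null, null);               // nothing configured: accepted
        validateSpanAndTruncate(TruncateMode.FIRST, null); // truncation without span: accepted
        validateSpanAndTruncate(TruncateMode.NONE, 128);   // span without truncation: accepted
        try {
            validateSpanAndTruncate(TruncateMode.FIRST, 128);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```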
equals
hashCode
public int hashCode()
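equals and hashCode are both declared here, which is what lets two tokenization configurations compare by value. A minimal sketch of the usual pattern over the five protected fields follows; TokenizationValue and its nested Truncate constants are hypothetical. Comparing getClass() rather than using instanceof keeps, for example, a BERT and a RoBERTa configuration with identical settings unequal; whether Tokenization does exactly this is an assumption.

```java
import java.util.Objects;

// Hypothetical value class mirroring the five protected fields; not Elasticsearch code.
class TokenizationValue {
    enum Truncate { FIRST, SECOND, NONE } // hypothetical constants

    private final boolean doLowerCase;
    private final boolean withSpecialTokens;
    private final int maxSequenceLength;
    private final Truncate truncate;
    private final int span;

    TokenizationValue(boolean doLowerCase, boolean withSpecialTokens, int maxSequenceLength,
                      Truncate truncate, int span) {
        this.doLowerCase = doLowerCase;
        this.withSpecialTokens = withSpecialTokens;
        this.maxSequenceLength = maxSequenceLength;
        this.truncate = truncate;
        this.span = span;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        // Exact class match: instances of different subclasses never compare equal.
        if (o == null || getClass() != o.getClass()) return false;
        TokenizationValue that = (TokenizationValue) o;
        return doLowerCase == that.doLowerCase
            && withSpecialTokens == that.withSpecialTokens
            && maxSequenceLength == that.maxSequenceLength
            && span == that.span
            && truncate == that.truncate;
    }

    // Uses the same fields as equals, so equal objects always share a hash code.
    @Override
    public int hashCode() {
        return Objects.hash(doLowerCase, withSpecialTokens, maxSequenceLength, truncate, span);
    }

    public static void main(String[] args) {
        TokenizationValue a = new TokenizationValue(true, true, 512, Truncate.FIRST, -1);
        TokenizationValue b = new TokenizationValue(true, true, 512, Truncate.FIRST, -1);
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode()); // true
    }
}
```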
doLowerCase
public boolean doLowerCase()
withSpecialTokens
public boolean withSpecialTokens()
maxSequenceLength
public int maxSequenceLength()
getTruncate
getSpan
public int getSpan()
getMaxSequenceLength
public int getMaxSequenceLength()
validateVocabulary