Module org.elasticsearch.xcore
Class CategorizationAnalyzerConfig
java.lang.Object
org.elasticsearch.xpack.core.ml.job.config.CategorizationAnalyzerConfig
- All Implemented Interfaces:
Writeable, org.elasticsearch.xcontent.ToXContent, org.elasticsearch.xcontent.ToXContentFragment
public class CategorizationAnalyzerConfig
extends Object
implements org.elasticsearch.xcontent.ToXContentFragment, Writeable
Configuration for the categorization analyzer.
The syntax is a subset of what can be supplied to the
_analyze endpoint.
To summarize, the first option is to specify the name of an out-of-the-box analyzer:
"categorization_analyzer" : "standard"
The second option is to specify a custom analyzer by combining the char_filters, tokenizer
and token_filters fields. In turn, each of these can be specified as the name of an out-of-the-box
one or as an object defining a custom one. For example:
"char_filters" : [
"html_strip",
{ "type" : "pattern_replace", "pattern": "SQL: .*" }
],
"tokenizer" : "thai",
"token_filters" : [
"lowercase",
{ "type" : "pattern_replace", "pattern": "^[0-9].*" }
]
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.elasticsearch.xcontent.ToXContent
org.elasticsearch.xcontent.ToXContent.DelegatingMapParams, org.elasticsearch.xcontent.ToXContent.MapParams, org.elasticsearch.xcontent.ToXContent.Params
Nested classes/interfaces inherited from interface org.elasticsearch.common.io.stream.Writeable
Writeable.Reader<V>, Writeable.Writer<V>
-
Field Summary
Fields (Modifier and Type, Field)
static final org.elasticsearch.xcontent.ParseField CATEGORIZATION_ANALYZER
static final org.elasticsearch.xcontent.ParseField CHAR_FILTERS
static final int MAX_TOKEN_COUNT
static final org.elasticsearch.xcontent.ParseField TOKEN_FILTERS
static final org.elasticsearch.xcontent.ParseField TOKENIZER
Fields inherited from interface org.elasticsearch.xcontent.ToXContent
EMPTY, EMPTY_PARAMS
-
Constructor Summary
Constructors -
Method Summary
Methods (Modifier and Type, Method, Description)
Map<String,Object> asMap(org.elasticsearch.xcontent.NamedXContentRegistry xContentRegistry)
Get the categorization analyzer structured as a generic map.
static CategorizationAnalyzerConfig buildDefaultCategorizationAnalyzer(List<String> categorizationFilters)
Create a categorization_analyzer that mimics what the tokenizer and filters built into the original ML C++ code do.
static CategorizationAnalyzerConfig buildFromXContentFragment(org.elasticsearch.xcontent.XContentParser parser, boolean ignoreUnknownFields)
Parse a categorization_analyzer from configuration or cluster state.
static CategorizationAnalyzerConfig buildFromXContentObject(org.elasticsearch.xcontent.XContentParser parser, boolean ignoreUnknownFields)
This method is only used in the unit tests - in production code this config is always parsed as a fragment.
static CategorizationAnalyzerConfig buildStandardCategorizationAnalyzer(List<String> categorizationFilters)
Create a categorization_analyzer that will be used for newly created jobs where no categorization analyzer is explicitly provided.
boolean equals
int hashCode()
org.elasticsearch.xcontent.XContentBuilder toXContent(org.elasticsearch.xcontent.XContentBuilder builder, org.elasticsearch.xcontent.ToXContent.Params params)
void writeTo(StreamOutput out)
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.elasticsearch.xcontent.ToXContentFragment
isFragment
-
Field Details
-
CATEGORIZATION_ANALYZER
public static final org.elasticsearch.xcontent.ParseField CATEGORIZATION_ANALYZER -
TOKENIZER
public static final org.elasticsearch.xcontent.ParseField TOKENIZER -
TOKEN_FILTERS
public static final org.elasticsearch.xcontent.ParseField TOKEN_FILTERS -
CHAR_FILTERS
public static final org.elasticsearch.xcontent.ParseField CHAR_FILTERS -
MAX_TOKEN_COUNT
public static final int MAX_TOKEN_COUNT
- See Also:
-
-
Constructor Details
-
CategorizationAnalyzerConfig
- Throws:
IOException
-
-
Method Details
-
buildFromXContentObject
public static CategorizationAnalyzerConfig buildFromXContentObject(org.elasticsearch.xcontent.XContentParser parser, boolean ignoreUnknownFields) throws IOException
This method is only used in the unit tests - in production code this config is always parsed as a fragment.
- Throws:
IOException
-
buildFromXContentFragment
public static CategorizationAnalyzerConfig buildFromXContentFragment(org.elasticsearch.xcontent.XContentParser parser, boolean ignoreUnknownFields) throws IOException
Parse a categorization_analyzer from configuration or cluster state. A custom parser is needed due to the complexity of the format, with many elements able to be specified as either the name of a built-in element or an object containing a custom definition. The parser is strict when parsing config and lenient when parsing cluster state.
- Throws:
IOException
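The "name or object" duality described above can be illustrated with a minimal sketch. This is not the Elasticsearch parser: the class NameOrDefinitionSketch, its Element type, the KNOWN_KEYS set and the rule that lenient mode silently drops unknown keys are all assumptions made for illustration, and the input is a plain Java value rather than an XContentParser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; none of these types exist in Elasticsearch. */
final class NameOrDefinitionSketch {

    /** Holds either the name of a built-in element or a custom definition, never both. */
    static final class Element {
        final String name;                    // non-null for a built-in element
        final Map<String, Object> definition; // non-null for a custom element

        private Element(String name, Map<String, Object> definition) {
            this.name = name;
            this.definition = definition;
        }

        boolean isCustom() {
            return definition != null;
        }
    }

    // Assumed set of recognised keys, chosen only to make the example concrete.
    private static final Set<String> KNOWN_KEYS = Set.of("type", "pattern", "replacement");

    /**
     * Accepts a String (built-in name) or a Map (custom definition).
     * With ignoreUnknownFields == false an unknown key is an error (strict);
     * with ignoreUnknownFields == true it is skipped (lenient).
     */
    static Element parseElement(Object raw, boolean ignoreUnknownFields) {
        if (raw instanceof String) {
            return new Element((String) raw, null);
        }
        if (raw instanceof Map<?, ?>) {
            Map<String, Object> definition = new LinkedHashMap<>();
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) raw).entrySet()) {
                String key = String.valueOf(entry.getKey());
                if (!KNOWN_KEYS.contains(key)) {
                    if (ignoreUnknownFields) {
                        continue; // lenient: drop what we do not understand
                    }
                    throw new IllegalArgumentException("Unknown field [" + key + "]");
                }
                definition.put(key, entry.getValue());
            }
            return new Element(null, definition);
        }
        throw new IllegalArgumentException("Expected a name or an object but got [" + raw + "]");
    }
}
```

For example, parseElement("html_strip", false) yields a built-in element, while parseElement(Map.of("type", "pattern_replace", "pattern", "SQL: .*"), false) yields a custom one, mirroring the two kinds of entry in the char_filters example in the class description.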
-
buildDefaultCategorizationAnalyzer
public static CategorizationAnalyzerConfig buildDefaultCategorizationAnalyzer(List<String> categorizationFilters)
Create a categorization_analyzer that mimics what the tokenizer and filters built into the original ML C++ code do. This is the default analyzer for categorization to ensure that people upgrading from old versions get the same behaviour from their categorization jobs before and after upgrade.
- Parameters:
categorizationFilters - Categorization filters (if any) from the analysis_config.
- Returns:
The default categorization analyzer.
-
buildStandardCategorizationAnalyzer
public static CategorizationAnalyzerConfig buildStandardCategorizationAnalyzer(List<String> categorizationFilters)
Create a categorization_analyzer that will be used for newly created jobs where no categorization analyzer is explicitly provided. This analyzer differs from the default one in that it uses the ml_standard tokenizer instead of the ml_classic tokenizer, and it only considers the first non-blank line of each message. This analyzer is not used for jobs that specify no categorization analyzer, as that would break jobs that were originally run in older versions. Instead, this analyzer is explicitly added to newly created jobs once the entire cluster is upgraded to version 7.14 or above.
- Parameters:
categorizationFilters - Categorization filters (if any) from the analysis_config.
- Returns:
The standard categorization analyzer.
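To make "only considers the first non-blank line of each message" concrete, here is a minimal sketch of that behaviour in plain Java. It is an illustration under assumptions, not the analyzer's actual implementation: the class and method names are hypothetical, and "blank" is taken here to mean empty or whitespace-only.

```java
/** Hypothetical illustration only; not part of Elasticsearch. */
final class FirstNonBlankLineSketch {

    /**
     * Returns the first line of the message that contains something other
     * than whitespace, or an empty string if there is no such line.
     */
    static String firstNonBlankLine(String message) {
        // \R matches any line break sequence (\n, \r\n, \r, ...).
        for (String line : message.split("\\R")) {
            if (!line.trim().isEmpty()) {
                return line;
            }
        }
        return "";
    }
}
```

With a multi-line message such as a log line followed by a stack trace, only the leading log line would then be passed on for tokenization.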
-
writeTo
- Specified by:
writeTo in interface Writeable
- Throws:
IOException
-
getAnalyzer
-
getCharFilters
-
getTokenizer
-
getTokenFilters
-
toXContent
public org.elasticsearch.xcontent.XContentBuilder toXContent(org.elasticsearch.xcontent.XContentBuilder builder, org.elasticsearch.xcontent.ToXContent.Params params) throws IOException
- Specified by:
toXContent in interface org.elasticsearch.xcontent.ToXContent
- Throws:
IOException
-
asMap
public Map<String,Object> asMap(org.elasticsearch.xcontent.NamedXContentRegistry xContentRegistry) throws IOException
Get the categorization analyzer structured as a generic map. This can be used to obtain the same structure that the XContent serialization produces, but as a Java map rather than text. Since it is created by round-tripping through text it is not particularly efficient and is expected to be used only rarely.
- Throws:
IOException
-
equals
-
hashCode
public int hashCode()
-