Class CategorizationAnalyzerConfig

java.lang.Object
org.elasticsearch.xpack.core.ml.job.config.CategorizationAnalyzerConfig
All Implemented Interfaces:
Writeable, org.elasticsearch.xcontent.ToXContent, org.elasticsearch.xcontent.ToXContentFragment

public class CategorizationAnalyzerConfig extends Object implements org.elasticsearch.xcontent.ToXContentFragment, Writeable
Configuration for the categorization analyzer. The syntax is a subset of what can be supplied to the _analyze endpoint. To summarize, there are two options.

The first option is to specify the name of an out-of-the-box analyzer:

    "categorization_analyzer" : "standard"

The second option is to specify a custom analyzer by combining the char_filters, tokenizer and token_filters fields. In turn, each of these can be specified as the name of an out-of-the-box component or as an object defining a custom one. For example:

    "char_filters" : [
        "html_strip",
        { "type" : "pattern_replace", "pattern": "SQL: .*" }
    ],
    "tokenizer" : "thai",
    "token_filters" : [
        "lowercase",
        { "type" : "pattern_replace", "pattern": "^[0-9].*" }
    ]
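As a rough illustration of the two options, the plain-Java sketch below renders each form as JSON text. The class AnalyzerConfigSketch, its nested NameOrDef record and the helper def(...) are hypothetical stand-ins (NameOrDef loosely plays the role of NameOrDefinition); they are not part of the Elasticsearch API, and the rendering does no JSON escaping.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class AnalyzerConfigSketch {

    /** Hypothetical name-or-definition element: exactly one of the two fields is non-null. */
    record NameOrDef(String name, Map<String, String> definition) {
        static NameOrDef named(String name) { return new NameOrDef(name, null); }
        static NameOrDef custom(Map<String, String> definition) { return new NameOrDef(null, definition); }

        /** Renders either a quoted name or an inline object (no escaping, illustration only). */
        String toJson() {
            if (name != null) {
                return "\"" + name + "\"";
            }
            return definition.entrySet().stream()
                    .map(e -> "\"" + e.getKey() + "\": \"" + e.getValue() + "\"")
                    .collect(Collectors.joining(", ", "{ ", " }"));
        }
    }

    /** Builds an insertion-ordered definition from alternating key/value arguments. */
    static Map<String, String> def(String... keyValues) {
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            map.put(keyValues[i], keyValues[i + 1]);
        }
        return map;
    }

    /** Option 1: refer to an out-of-the-box analyzer by name. */
    static String namedAnalyzerJson(String analyzerName) {
        return "\"categorization_analyzer\": \"" + analyzerName + "\"";
    }

    /** Option 2: combine char filters, a tokenizer and token filters. */
    static String customAnalyzerJson(List<NameOrDef> charFilters, NameOrDef tokenizer, List<NameOrDef> tokenFilters) {
        return "\"categorization_analyzer\": { \"char_filters\": " + toJsonArray(charFilters)
                + ", \"tokenizer\": " + tokenizer.toJson()
                + ", \"token_filters\": " + toJsonArray(tokenFilters) + " }";
    }

    private static String toJsonArray(List<NameOrDef> items) {
        return items.stream().map(NameOrDef::toJson).collect(Collectors.joining(", ", "[ ", " ]"));
    }

    public static void main(String[] args) {
        System.out.println(namedAnalyzerJson("standard"));
        System.out.println(customAnalyzerJson(
                List.of(NameOrDef.named("html_strip"),
                        NameOrDef.custom(def("type", "pattern_replace", "pattern", "SQL: .*"))),
                NameOrDef.named("thai"),
                List.of(NameOrDef.named("lowercase"),
                        NameOrDef.custom(def("type", "pattern_replace", "pattern", "^[0-9].*")))));
    }
}
```

Keeping each element as "either a name or a definition" mirrors the point made above: every char filter, the tokenizer and every token filter can independently be a built-in name or a custom object.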
  • Field Details

    • CATEGORIZATION_ANALYZER

      public static final org.elasticsearch.xcontent.ParseField CATEGORIZATION_ANALYZER
    • TOKENIZER

      public static final org.elasticsearch.xcontent.ParseField TOKENIZER
    • TOKEN_FILTERS

      public static final org.elasticsearch.xcontent.ParseField TOKEN_FILTERS
    • CHAR_FILTERS

      public static final org.elasticsearch.xcontent.ParseField CHAR_FILTERS
    • MAX_TOKEN_COUNT

      public static final int MAX_TOKEN_COUNT
  • Constructor Details

  • Method Details

    • buildFromXContentObject

      public static CategorizationAnalyzerConfig buildFromXContentObject(org.elasticsearch.xcontent.XContentParser parser, boolean ignoreUnknownFields) throws IOException
      This method is only used in the unit tests; in production code this config is always parsed as a fragment.
      Throws:
      IOException
    • buildFromXContentFragment

      public static CategorizationAnalyzerConfig buildFromXContentFragment(org.elasticsearch.xcontent.XContentParser parser, boolean ignoreUnknownFields) throws IOException
      Parse a categorization_analyzer from configuration or cluster state. A custom parser is needed due to the complexity of the format, with many elements able to be specified as either the name of a built-in element or an object containing a custom definition. The parser is strict when parsing config and lenient when parsing cluster state.
      Throws:
      IOException
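The "name or custom object" rule and the strict/lenient distinction can be sketched without an XContentParser by working on an already-decoded value (a String or a Map). Everything here (FragmentParseSketch, Parsed, nameOrDefinition, nameOrDefinitions) is hypothetical, and the assumption that ignoreUnknownFields == false corresponds to the strict config case is made for illustration only; the real method reads tokens from an XContentParser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FragmentParseSketch {

    /** Hypothetical parsed result: either analyzer is set, or the custom components are. */
    record Parsed(String analyzer, List<Object> charFilters, Object tokenizer, List<Object> tokenFilters) {}

    /**
     * Parses a decoded categorization_analyzer value. When ignoreUnknownFields is false,
     * unrecognised keys are rejected (strict); when true, they are silently skipped (lenient).
     */
    static Parsed parse(Object value, boolean ignoreUnknownFields) {
        if (value instanceof String name) {
            return new Parsed(name, List.of(), null, List.of());
        }
        if (!(value instanceof Map<?, ?> map)) {
            throw new IllegalArgumentException("categorization_analyzer must be a string or an object");
        }
        List<Object> charFilters = new ArrayList<>();
        Object tokenizer = null;
        List<Object> tokenFilters = new ArrayList<>();
        for (Map.Entry<?, ?> entry : map.entrySet()) {
            String key = String.valueOf(entry.getKey());
            switch (key) {
                case "char_filters" -> charFilters.addAll(nameOrDefinitions(entry.getValue()));
                case "tokenizer" -> tokenizer = nameOrDefinition(entry.getValue());
                case "token_filters" -> tokenFilters.addAll(nameOrDefinitions(entry.getValue()));
                default -> {
                    if (!ignoreUnknownFields) {
                        throw new IllegalArgumentException("unknown field [" + key + "] in categorization_analyzer");
                    }
                }
            }
        }
        return new Parsed(null, charFilters, tokenizer, tokenFilters);
    }

    /** Accepts the name of a built-in component or an object holding a custom definition. */
    static Object nameOrDefinition(Object value) {
        if (value instanceof String || value instanceof Map) {
            return value;
        }
        throw new IllegalArgumentException("expected a name or an object but got [" + value + "]");
    }

    /** Accepts an array whose elements are each a name or a custom definition. */
    static List<Object> nameOrDefinitions(Object value) {
        if (!(value instanceof List<?> list)) {
            throw new IllegalArgumentException("expected an array but got [" + value + "]");
        }
        List<Object> result = new ArrayList<>();
        for (Object item : list) {
            result.add(nameOrDefinition(item));
        }
        return result;
    }
}
```

For example, parse(Map.of("tokenizer", "thai", "unexpected", 1), false) throws, while the same call with true returns a result whose tokenizer() is "thai".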
    • buildDefaultCategorizationAnalyzer

      public static CategorizationAnalyzerConfig buildDefaultCategorizationAnalyzer(List<String> categorizationFilters)
      Create a categorization_analyzer that mimics what the tokenizer and filters built into the original ML C++ code do. This is the default analyzer for categorization, ensuring that people upgrading from old versions get the same behaviour from their categorization jobs before and after upgrading.
      Parameters:
      categorizationFilters - Categorization filters (if any) from the analysis_config.
      Returns:
      The default categorization analyzer.
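The categorizationFilters parameter takes regular expressions. A minimal sketch, assuming each filter is turned into a pattern_replace char filter with no replacement (so matching text is deleted before tokenization), is shown here; CategorizationFilterSketch and its methods are hypothetical, and the ml_classic tokenizer and any token filters of the default analyzer are not modelled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class CategorizationFilterSketch {

    /**
     * Assumption: one pattern_replace char filter definition per categorization filter,
     * in the same order as the filters were supplied.
     */
    static List<Map<String, String>> toCharFilterDefinitions(List<String> categorizationFilters) {
        List<Map<String, String>> charFilters = new ArrayList<>();
        for (String filter : categorizationFilters) {
            Map<String, String> definition = new LinkedHashMap<>();
            definition.put("type", "pattern_replace");
            definition.put("pattern", filter);
            charFilters.add(definition);
        }
        return charFilters;
    }

    /** Applies the filters to a message in order, deleting every match (java.util.regex semantics). */
    static String applyFilters(String message, List<String> categorizationFilters) {
        String result = message;
        for (String filter : categorizationFilters) {
            result = Pattern.compile(filter).matcher(result).replaceAll("");
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> filters = List.of("SQL: .*");
        System.out.println(toCharFilterDefinitions(filters));
        System.out.println("[" + applyFilters("Query failed SQL: SELECT * FROM t", filters) + "]");
    }
}
```

With the filter "SQL: .*", the message "Query failed SQL: SELECT * FROM t" is reduced to "Query failed " so that messages differing only in the SQL text can fall into the same category.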
    • buildStandardCategorizationAnalyzer

      public static CategorizationAnalyzerConfig buildStandardCategorizationAnalyzer(List<String> categorizationFilters)
      Create a categorization_analyzer that will be used for newly created jobs where no categorization analyzer is explicitly provided. This analyzer differs from the default one in that it uses the ml_standard tokenizer instead of the ml_classic tokenizer, and it only considers the first non-blank line of each message. This analyzer is not applied implicitly to jobs that specify no categorization analyzer, as that would break jobs that were originally run in older versions. Instead, it is explicitly added to newly created jobs once the entire cluster has been upgraded to version 7.14 or above.
      Parameters:
      categorizationFilters - Categorization filters (if any) from the analysis_config.
      Returns:
      The standard categorization analyzer.
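To make "only considers the first non-blank line of each message" concrete, here is a plain-Java sketch of that selection rule. FirstLineSketch is hypothetical and is not how the analyzer implements it; returning an empty string when every line is blank is a choice made for this sketch, and the ml_standard tokenizer is not modelled.

```java
public class FirstLineSketch {

    /**
     * Returns the first line containing a non-whitespace character,
     * or an empty string if every line is blank.
     */
    static String firstNonBlankLine(String message) {
        // "\\R" matches any line break sequence, including "\r\n" as a single break.
        for (String line : message.split("\\R")) {
            if (!line.isBlank()) {
                return line;
            }
        }
        return "";
    }

    public static void main(String[] args) {
        String message = "\n   \nERROR Connection refused\n\tat com.example.Client.connect(Client.java:42)";
        System.out.println(firstNonBlankLine(message));
    }
}
```

For a multi-line message such as a log entry followed by a stack trace, only the leading "ERROR Connection refused" line would be passed on for tokenization, so the stack trace does not influence the category.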
    • writeTo

      public void writeTo(StreamOutput out) throws IOException
      Specified by:
      writeTo in interface Writeable
      Throws:
      IOException
    • getAnalyzer

      public String getAnalyzer()
    • getCharFilters

      public List<NameOrDefinition> getCharFilters()
    • getTokenizer

      public NameOrDefinition getTokenizer()
    • getTokenFilters

      public List<NameOrDefinition> getTokenFilters()
    • toXContent

      public org.elasticsearch.xcontent.XContentBuilder toXContent(org.elasticsearch.xcontent.XContentBuilder builder, org.elasticsearch.xcontent.ToXContent.Params params) throws IOException
      Specified by:
      toXContent in interface org.elasticsearch.xcontent.ToXContent
      Throws:
      IOException
    • asMap

      public Map<String,Object> asMap(org.elasticsearch.xcontent.NamedXContentRegistry xContentRegistry) throws IOException
      Get the categorization analyzer structured as a generic map. This can be used to obtain the same structure that the XContent serialization produces, but as a Java map rather than text. Since the map is created by round-tripping through text, this is not particularly efficient and is expected to be used only rarely.
      Throws:
      IOException
    • equals

      public boolean equals(Object o)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object