Class SentenceBoundaryChunker

java.lang.Object
org.elasticsearch.xpack.core.inference.chunking.SentenceBoundaryChunker
All Implemented Interfaces:
Chunker

public class SentenceBoundaryChunker extends Object implements Chunker
Splits text into chunks aligned on sentence boundaries. The maximum chunk size is measured in words and controlled by maxNumberWordsPerChunk. Sentences are combined greedily until adding the next sentence would exceed maxNumberWordsPerChunk, at which point a new chunk is started. If an individual sentence is longer than maxNumberWordsPerChunk, it is split on word boundaries with overlap.
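The greedy combination described above can be illustrated with a minimal, self-contained sketch. This is not the class's actual implementation: the class name GreedySentenceChunkSketch, the Offset record, and the countWords helper are hypothetical, sentence detection is delegated to java.text.BreakIterator as an assumption, and words are approximated as whitespace-separated tokens. The sketch also leaves out the splitting of over-long sentences with overlap, so a sentence longer than the limit simply becomes its own chunk here.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class GreedySentenceChunkSketch {

    /** Hypothetical start/end pair (end exclusive); not the real Chunker.ChunkOffset. */
    record Offset(int start, int end) {}

    /** Counts whitespace-separated tokens as an approximation of "words". */
    static int countWords(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Greedily packs whole sentences into chunks of at most maxWords words. */
    static List<Offset> chunk(String input, int maxWords) {
        List<Offset> chunks = new ArrayList<>();
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
        sentences.setText(input);

        int chunkStart = 0;   // offset where the current chunk begins
        int chunkEnd = 0;     // offset just past the last sentence added
        int wordsInChunk = 0; // words accumulated in the current chunk

        int sentenceStart = sentences.first();
        int sentenceEnd = sentences.next();
        while (sentenceEnd != BreakIterator.DONE) {
            int words = countWords(input.substring(sentenceStart, sentenceEnd));
            // Close the current chunk if this sentence would push it over the limit.
            if (wordsInChunk > 0 && wordsInChunk + words > maxWords) {
                chunks.add(new Offset(chunkStart, chunkEnd));
                chunkStart = sentenceStart;
                wordsInChunk = 0;
            }
            wordsInChunk += words;
            chunkEnd = sentenceEnd;
            sentenceStart = sentenceEnd;
            sentenceEnd = sentences.next();
        }
        // Emit whatever remains as the final chunk.
        if (chunkEnd > chunkStart) {
            chunks.add(new Offset(chunkStart, chunkEnd));
        }
        return chunks;
    }
}
```

Returning offsets instead of substrings mirrors the List<Chunker.ChunkOffset> return type documented below; a caller can recover the text of a chunk with input.substring(offset.start(), offset.end()).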
  • Constructor Details

    • SentenceBoundaryChunker

      public SentenceBoundaryChunker()
  • Method Details

    • chunk

      public List<Chunker.ChunkOffset> chunk(String input, ChunkingSettings chunkingSettings)
      Break the input text into small chunks on sentence boundaries.
      Specified by:
      chunk in interface Chunker
      Parameters:
      input - Text to chunk
      chunkingSettings - Chunking settings that define maxNumberWordsPerChunk
      Returns:
      The offsets of the chunks within the input text
    • chunk

      public List<Chunker.ChunkOffset> chunk(String input, int maxNumberWordsPerChunk, boolean includePrecedingSentence)
      Break the input text into small chunks on sentence boundaries.
      Parameters:
      input - Text to chunk
      maxNumberWordsPerChunk - Maximum size of a chunk, measured in words
      includePrecedingSentence - Whether to include the preceding sentence
      Returns:
      The offsets of the chunks within the input text