Module org.elasticsearch.xcore
Class SentenceBoundaryChunker
java.lang.Object
org.elasticsearch.xpack.core.inference.chunking.SentenceBoundaryChunker
- All Implemented Interfaces:
Chunker
Splits text into chunks aligned on sentence boundaries.
The maximum chunk size is measured in words and controlled
by maxNumberWordsPerChunk. Sentences are combined
greedily until adding the next sentence would exceed
maxNumberWordsPerChunk, at which point a new chunk
is created. If an individual sentence is longer than
maxNumberWordsPerChunk, it is split on word boundaries with
overlap.
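The greedy combination described above can be illustrated with a minimal, self-contained sketch. It is not the Elasticsearch implementation: the class GreedySentenceChunkSketch, the Offset record (a hypothetical stand-in for Chunker.ChunkOffset), and the whitespace-based word count are all assumptions made for illustration, and sentence detection here relies on the JDK's java.text.BreakIterator. For brevity, a sentence longer than the limit simply becomes its own chunk; the word-boundary split with overlap is omitted.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class GreedySentenceChunkSketch {

    /** Hypothetical stand-in for Chunker.ChunkOffset: start offset and length within the input. */
    record Offset(int start, int length) {}

    /** Counts whitespace-separated words; an assumption, the real word counting may differ. */
    static int countWords(String s) {
        String trimmed = s.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Greedily packs whole sentences into chunks of at most maxWordsPerChunk words. */
    static List<Offset> chunk(String input, int maxWordsPerChunk) {
        List<Offset> chunks = new ArrayList<>();
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
        sentences.setText(input);

        int chunkStart = 0;   // offset where the current chunk begins
        int chunkEnd = 0;     // offset just past the last sentence added
        int chunkWords = 0;   // words accumulated in the current chunk

        int sentenceStart = sentences.first();
        int sentenceEnd = sentences.next();
        while (sentenceEnd != BreakIterator.DONE) {
            int words = countWords(input.substring(sentenceStart, sentenceEnd));
            // Close the current chunk if adding this sentence would exceed the limit.
            if (chunkWords > 0 && chunkWords + words > maxWordsPerChunk) {
                chunks.add(new Offset(chunkStart, chunkEnd - chunkStart));
                chunkStart = sentenceStart;
                chunkWords = 0;
            }
            chunkWords += words;
            chunkEnd = sentenceEnd;
            sentenceStart = sentenceEnd;
            sentenceEnd = sentences.next();
        }
        // Emit whatever remains as the final chunk.
        if (chunkWords > 0) {
            chunks.add(new Offset(chunkStart, chunkEnd - chunkStart));
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "One two three. Four five six. Seven eight nine ten.";
        for (Offset o : chunk(text, 6)) {
            System.out.println(o + " -> " + text.substring(o.start(), o.start() + o.length()).trim());
        }
    }
}
```

The sketch returns offsets into the input rather than copies of the text, so a caller recovers each chunk with substring on the original string.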
Nested Class Summary
Nested classes/interfaces inherited from interface org.elasticsearch.xpack.core.inference.chunking.Chunker
Chunker.ChunkOffset
Constructor Summary
Constructors
SentenceBoundaryChunker()
Method Summary
Methods
chunk(String input, int maxNumberWordsPerChunk, boolean includePrecedingSentence)
Break the input text into small chunks on sentence boundaries.
chunk(String input, ChunkingSettings chunkingSettings)
Break the input text into small chunks on sentence boundaries.
Constructor Details
SentenceBoundaryChunker
public SentenceBoundaryChunker()
Method Details
chunk
chunk(String input, ChunkingSettings chunkingSettings)
Break the input text into small chunks on sentence boundaries.
chunk
public List<Chunker.ChunkOffset> chunk(String input, int maxNumberWordsPerChunk, boolean includePrecedingSentence)
Break the input text into small chunks on sentence boundaries.
- Parameters:
input - Text to chunk
maxNumberWordsPerChunk - Maximum size of a chunk, measured in words
includePrecedingSentence - Whether to include the preceding sentence
- Returns:
The input text offsets