Module org.elasticsearch.xcore
Class WordBoundaryChunker
java.lang.Object
org.elasticsearch.xpack.core.inference.chunking.WordBoundaryChunker
All Implemented Interfaces:
Chunker
Breaks text into smaller strings, or chunks, on word boundaries.
Whitespace is preserved and included at the start of the
following chunk, not at the end of the preceding chunk. If a chunk
ends on a punctuation mark, the punctuation is included in the
next chunk.
The overlap value must not exceed (chunkSize / 2), which avoids the
complexity of tracking the start positions of multiple
chunks within a chunk.
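The behaviour described above can be illustrated with a small standalone sketch. This is not the Elasticsearch implementation: the class name `WordBoundarySketch` is hypothetical, a "word" is assumed to be any run of non-whitespace characters, and the punctuation handling mentioned above is not reproduced. It also assumes the overlap is capped at half the chunk size, so that a chunk never contains more than one earlier chunk start.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified illustration only; not the Elasticsearch WordBoundaryChunker. */
final class WordBoundarySketch {

    /**
     * Splits input into chunks of at most chunkSize words, where consecutive
     * chunks share `overlap` words. A word is a run of non-whitespace
     * characters (an assumption made to keep the sketch short).
     */
    static List<String> chunk(String input, int chunkSize, int overlap) {
        if (chunkSize < 1 || overlap < 0 || overlap > chunkSize / 2) {
            throw new IllegalArgumentException(
                "overlap [" + overlap + "] must be between 0 and chunkSize / 2 [" + chunkSize / 2 + "]");
        }

        // Record the [start, end) character offsets of every word.
        List<int[]> words = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            while (i < input.length() && Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < input.length() && !Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            if (i > start) {
                words.add(new int[] { start, i });
            }
        }

        List<String> chunks = new ArrayList<>();
        if (words.isEmpty()) {
            return chunks; // empty or whitespace-only input yields no chunks
        }

        int step = chunkSize - overlap; // at least 1 because of the check above
        for (int first = 0; first < words.size(); first += step) {
            int last = Math.min(first + chunkSize, words.size()); // exclusive word index
            // Whitespace before a chunk's first word belongs to that chunk.
            int from = first == 0 ? 0 : words.get(first - 1)[1];
            // The final chunk also keeps any trailing whitespace.
            int to = last == words.size() ? input.length() : words.get(last - 1)[1];
            chunks.add(input.substring(from, to));
            if (last == words.size()) {
                break;
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        for (String c : chunk("The quick brown fox jumps over the lazy dog", 4, 1)) {
            System.out.println("[" + c + "]");
        }
    }
}
```

With a chunk size of 4 and an overlap of 1, the sample sentence yields the chunks `The quick brown fox`, ` fox jumps over the` and ` the lazy dog`. The second and third chunks begin with the space that preceded their first word. With an overlap of 0, concatenating the chunks reproduces the input.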
Nested Class Summary
Nested classes/interfaces inherited from interface org.elasticsearch.xpack.core.inference.chunking.Chunker
Chunker.ChunkOffset
Constructor Summary
Constructors
Method Summary
Method: chunk(String input, ChunkingSettings chunkingSettings)
Description: Break the input text into small chunks as dictated by the chunking parameters
Constructor Details
WordBoundaryChunker
public WordBoundaryChunker()
Method Details
chunk
Break the input text into small chunks as dictated by the chunking parameters
chunk
Break the input text into small chunks as dictated by the chunking parameters
Parameters:
input - Text to chunk
chunkSize - The number of words in each chunk
overlap - The number of words to overlap each chunk. Can be 0 but must be non-negative.
Returns:
List of chunked text