Class BulkInferenceRunner
java.lang.Object
org.elasticsearch.xpack.esql.inference.bulk.BulkInferenceRunner
Implementation of bulk inference execution with throttling and concurrency control.
This runner limits the number of concurrent inference requests using a semaphore-based permit system. When all permits are exhausted, additional requests are queued and executed as permits become available.
Response processing is always executed in the ESQL worker thread pool to ensure consistent thread context and avoid thread safety issues with circuit breakers and other non-thread-safe components.
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionstatic interfaceFactory interface for creatingBulkInferenceRunnerinstances. -
Constructor Summary
ConstructorsConstructorDescriptionBulkInferenceRunner(Client client, int maxRunningTasks) Constructs a new throttled inference runner with the specified configuration. -
Method Summary
Modifier and TypeMethodDescriptionvoidexecuteBulk(BulkInferenceRequestIterator requests, Consumer<InferenceAction.Response> responseConsumer, ActionListener<Void> completionListener) Executes multiple inference requests in bulk with streaming response handling.voidexecuteBulk(BulkInferenceRequestIterator requests, ActionListener<List<InferenceAction.Response>> listener) Executes multiple inference requests in bulk and collects all responses.static BulkInferenceRunner.FactoryReturns the thread pool used for executing inference requests.
-
Constructor Details
-
BulkInferenceRunner
Constructs a new throttled inference runner with the specified configuration.- Parameters:
client- Client for executing inference requestsmaxRunningTasks- The maximum number of concurrent inference requests allowed
-
-
Method Details
-
executeBulk
public void executeBulk(BulkInferenceRequestIterator requests, ActionListener<List<InferenceAction.Response>> listener) Executes multiple inference requests in bulk and collects all responses.- Parameters:
requests- An iterator over the inference requests to executelistener- Called with the list of all responses in request order
-
executeBulk
public void executeBulk(BulkInferenceRequestIterator requests, Consumer<InferenceAction.Response> responseConsumer, ActionListener<Void> completionListener) Executes multiple inference requests in bulk with streaming response handling.This method orchestrates the entire bulk inference process: 1. Creates execution state to track progress and responses 2. Sets up response handling pipeline 3. Initiates asynchronous request processing
- Parameters:
requests- An iterator over the inference requests to executeresponseConsumer- Called for each successful inference response as they completecompletionListener- Called when all requests are complete or if any error occurs
-
threadPool
Returns the thread pool used for executing inference requests. -
factory
-