Class BulkInferenceRunner

java.lang.Object
org.elasticsearch.xpack.esql.inference.bulk.BulkInferenceRunner

public class BulkInferenceRunner extends Object
Implementation of bulk inference execution with throttling and concurrency control.

This runner limits the number of concurrent inference requests using a semaphore-based permit system. When all permits are exhausted, additional requests are queued and executed as permits become available.

Response processing is always executed in the ESQL worker thread pool to ensure consistent thread context and avoid thread safety issues with circuit breakers and other non-thread-safe components.

  • Constructor Details

    • BulkInferenceRunner

      public BulkInferenceRunner(Client client, int maxRunningTasks)
      Constructs a new throttled inference runner with the specified configuration.
      Parameters:
      client - Client for executing inference requests
      maxRunningTasks - The maximum number of concurrent inference requests allowed
  • Method Details

    • executeBulk

      public void executeBulk(BulkInferenceRequestIterator requests, ActionListener<List<InferenceAction.Response>> listener)
      Executes multiple inference requests in bulk and collects all responses.
      Parameters:
      requests - An iterator over the inference requests to execute
      listener - Called with the list of all responses in request order
    • executeBulk

      public void executeBulk(BulkInferenceRequestIterator requests, Consumer<InferenceAction.Response> responseConsumer, ActionListener<Void> completionListener)
      Executes multiple inference requests in bulk with streaming response handling.

      This method orchestrates the entire bulk inference process: 1. Creates execution state to track progress and responses 2. Sets up response handling pipeline 3. Initiates asynchronous request processing

      Parameters:
      requests - An iterator over the inference requests to execute
      responseConsumer - Called for each successful inference response as they complete
      completionListener - Called when all requests are complete or if any error occurs
    • threadPool

      public ThreadPool threadPool()
      Returns the thread pool used for executing inference requests.
    • factory

      public static BulkInferenceRunner.Factory factory(Client client)