Package org.elasticsearch.reservedstate


This package is responsible for managing reserved cluster state and handlers.

The purpose of reserved state is to apply and persist changes to cluster state that are generated from an information source (e.g. a file), and to ensure that those changes cannot then be overridden by anything other than the owner of those changes.

The cluster state changes themselves can be any modification to cluster state, and classes performing those changes are pluggable.

There are several main classes in this package and sub-packages:

  • FileSettingsService reads information from a settings file, deserializes it, and passes it to ReservedClusterStateService to process.
  • ReservedClusterStateService takes deserialized information from FileSettingsService and calls various registered handlers to update cluster state with the information.
  • Implementations of ReservedClusterStateHandler take specific parts of the deserialized information and update cluster state accordingly.
  • ReservedStateMetadata contains information on reserved state applicability, and is used to filter and prevent changes to cluster state that would override reserved state.
  • ActionWithReservedState helps REST handlers to detect operations that would override reserved state updates, and deny the request.

Operation overview

There are several steps to managing reserved state. The basic sequence of operations is:
  1. One or more changes to settings files are made. This is detected by FileSettingsService, the changes are deserialized, and the deserialized XContent is passed to ReservedClusterStateService.
  2. ReservedClusterStateService checks the overall metadata of the update to determine if it needs to be applied at all. If it does, it determines which ReservedClusterStateHandler implementations need to be called, based on which keys exist in the update state, and passes them the relevant information to generate a new cluster state (first doing a trial run to see if the update is actually valid).
  3. Metadata on the update is stored in cluster state for each handler, alongside the arbitrary changes done to cluster state by the applicable handlers.
  4. If there is a REST counterpart to a reserved state handler, the REST implementation calls ActionWithReservedState.validateForReservedState(Collection<ReservedStateMetadata>, String, Set<String>, String) to determine if the REST call will modify any information generated by the corresponding reserved state handler. If it does, the REST handler denies the request.
Importantly, each update to cluster state by a call to ReservedClusterStateService.process is done atomically - either all updates from all registered and applicable handlers are applied, or none are.
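
The dispatch-by-key and all-or-nothing behaviour described above can be illustrated with a minimal sketch. It is not the Elasticsearch implementation: cluster state is modelled as a plain map, and the class DispatchSketch, its process method, and the treatment of an unknown key as an error are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

// Hypothetical model: "cluster state" is a Map<String, String>, and each handler
// is a function from (state, payload) to a new state. None of these names exist
// in Elasticsearch.
final class DispatchSketch {

    static Map<String, String> process(
        Map<String, String> current,
        Map<String, BiFunction<Map<String, String>, String, Map<String, String>>> handlers,
        Map<String, String> update
    ) {
        Map<String, String> working = new HashMap<>(current);
        try {
            // Only handlers whose key appears in the update are called.
            for (Map.Entry<String, String> section : new TreeMap<>(update).entrySet()) {
                BiFunction<Map<String, String>, String, Map<String, String>> handler = handlers.get(section.getKey());
                if (handler == null) {
                    // Assumption: an unknown key fails the whole update.
                    throw new IllegalStateException("No handler for [" + section.getKey() + "]");
                }
                working = new HashMap<>(handler.apply(working, section.getValue()));
            }
        } catch (RuntimeException e) {
            // Atomicity: one failure means none of the changes are applied.
            return current;
        }
        return Map.copyOf(working);
    }
}
```

Because every handler works on a copy, a failure in any handler leaves the original state untouched.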

Reserved state metadata keys

An important concept to understand is that reserved state is only reserved through the cooperation of REST handlers (or any other part of the system that could modify cluster state). A reserved state handler implementation can modify any aspect of cluster state - it is not up to the reserved state service to monitor that. It is therefore the responsibility of all other aspects of the system that could potentially modify that same state to cooperate with the handler implementation to block conflicting updates before they happen.

To help with this, a handler returns a set of arbitrary string keys alongside the updated cluster state, and these keys are stored in the reserved state metadata for that handler. No meaning is ascribed to those keys by the reserved state infrastructure, but it is expected that they represent or tag the cluster state changes in some way that is meaningful to that handler. Any REST handler that could modify the same state needs to check whether it is going to modify state corresponding to reserved metadata keys. If the key corresponding to the change it is going to make is present in the reserved state metadata, the request should be denied.

For example, if there is a reserved state handler to set index templates, a file setting could create index templates IT_1 and IT_2. As well as adding those templates to the set of templates already present in the cluster, the reserved state handler will set [IT_1, IT_2] as its reserved state metadata keys.

Later, if there is a REST request to modify IT_1 or IT_2, the REST handler should check those strings against the reserved metadata keys for the index template handler. As those keys are reserved, all requests to modify them via REST should be denied.
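
The check a cooperating REST handler performs in the index template example can be sketched roughly as follows. This is a simplified stand-in, not ActionWithReservedState itself: the class ReservedKeyCheckSketch, its methods, and the handler name "index_templates" are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the reserved state metadata of one namespace:
// a map from handler name to the keys that handler has reserved.
final class ReservedKeyCheckSketch {

    // Returns the requested keys that are already reserved by the given handler.
    static Set<String> conflicts(Map<String, Set<String>> reservedKeysByHandler, String handlerName, Set<String> requestedKeys) {
        Set<String> overlap = new TreeSet<>(requestedKeys);
        overlap.retainAll(reservedKeysByHandler.getOrDefault(handlerName, Set.of()));
        return overlap;
    }

    // Denies the request by throwing if any requested key is reserved.
    static void validate(Map<String, Set<String>> reservedKeysByHandler, String handlerName, Set<String> requestedKeys) {
        Set<String> overlap = conflicts(reservedKeysByHandler, handlerName, requestedKeys);
        if (!overlap.isEmpty()) {
            throw new IllegalArgumentException("Cannot modify reserved keys " + overlap + " owned by [" + handlerName + "]");
        }
    }
}
```

With [IT_1, IT_2] reserved, a request touching IT_1 is rejected, while a request that only touches a hypothetical IT_3 passes the check.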

Project metadata handlers

There are two types of reserved state handlers - those that modify ClusterState as a whole, and those that modify ProjectMetadata, denoted by the S type parameter. Data for project-specific handlers can be specified in project-specific settings files; data for cluster state handlers can only be specified in the cluster settings file.

If a project-specific handler is specified in the cluster settings file, then that handler is used to modify the default project (which, in most Elasticsearch clusters, is the only project in the system). Reserved state metadata is stored in the context of the source of the information (so, cluster-wide if it's from the cluster settings file, or in the ProjectMetadata if it's from a project settings file, regardless of the handler type used to process it).

Reserved state update details

Reserved state namespace

Every reserved state handler has a namespace that it operates under. This is used to scope all handlers and metadata stored in cluster state (although every namespace is checked for conflicts by ActionWithReservedState).

There is currently only one namespace defined by Elasticsearch itself: file_settings, defined by FileSettingsService.NAMESPACE. Other namespaces may be defined by plugins and modules.

Reserved state version and compatibility

Every reserved state namespace also has a version associated with it. This is a simple integer that should be incremented whenever a new change should be applied (e.g. a new version of the settings file is written). The version is used to de-duplicate multiple calls to process and to handle races that could occur between updates: it determines whether an update should actually result in modifications to cluster state, or whether the cluster state already has those changes because the stored metadata version is greater than the version of the update.
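
As a rough illustration of the version comparison, the sketch below decides whether an update should be applied. The class and method names are hypothetical, and treating an update with the same version as a duplicate is an assumption based on the de-duplication goal stated above, not something this documentation specifies.

```java
import java.util.OptionalLong;

// Hypothetical helper, not part of Elasticsearch.
final class VersionCheckSketch {

    // storedVersion is empty when the namespace has no reserved state metadata yet.
    static boolean shouldApply(OptionalLong storedVersion, long updateVersion) {
        if (!storedVersion.isPresent()) {
            return true; // nothing stored yet, so the update is new
        }
        // Assumption: an equal version is treated as a duplicate and skipped.
        return updateVersion > storedVersion.getAsLong();
    }
}
```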

Handler ordering

There may be a dependency between the execution of multiple handlers, for example if one handler requires structures to exist that are only created by another handler. This relationship can be represented by overriding the ReservedStateHandler.dependencies() and ReservedStateHandler.optionalDependencies() methods. The first specifies other handlers that must be registered and must run before this one; the second specifies handlers that should run before this one only if they are registered with the reserved state service.
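
One way to derive an execution order from such declarations is a depth-first topological sort. The sketch below is illustrative only and is not the ordering code used by the reserved state service; the Handler class and the error handling for cycles and missing dependencies are assumptions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of handlers and their declared dependencies.
final class HandlerOrderingSketch {

    static final class Handler {
        final List<String> dependencies;
        final List<String> optionalDependencies;

        Handler(List<String> dependencies, List<String> optionalDependencies) {
            this.dependencies = dependencies;
            this.optionalDependencies = optionalDependencies;
        }
    }

    // Returns handler names so that every handler appears after its dependencies.
    static List<String> order(Map<String, Handler> registered) {
        Set<String> ordered = new LinkedHashSet<>();
        Set<String> inProgress = new HashSet<>();
        // Sorted iteration keeps the result deterministic.
        for (String name : new TreeSet<>(registered.keySet())) {
            visit(name, registered, ordered, inProgress);
        }
        return new ArrayList<>(ordered);
    }

    private static void visit(String name, Map<String, Handler> registered, Set<String> ordered, Set<String> inProgress) {
        if (ordered.contains(name)) {
            return;
        }
        if (!inProgress.add(name)) {
            throw new IllegalStateException("Cycle in handler dependencies at [" + name + "]");
        }
        Handler handler = registered.get(name);
        for (String dep : handler.dependencies) {
            // Required dependencies must be registered.
            if (!registered.containsKey(dep)) {
                throw new IllegalStateException("Missing dependency [" + dep + "] for [" + name + "]");
            }
            visit(dep, registered, ordered, inProgress);
        }
        for (String dep : handler.optionalDependencies) {
            // Optional dependencies only constrain ordering when registered.
            if (registered.containsKey(dep)) {
                visit(dep, registered, ordered, inProgress);
            }
        }
        inProgress.remove(name);
        ordered.add(name);
    }
}
```

An unregistered optional dependency is silently ignored, whereas an unregistered required dependency is reported as an error.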

Trial runs and errors

If invalid data is given to a REST endpoint, the HTTP response can indicate the problem and that the request was denied. No such response mechanism exists for information written to files. Furthermore, there is no opportunity to test changes; if a settings file causes invalid updates to cluster state, or causes a handler to throw an exception, then there is also no way to roll back. To solve this, there is a space in the metadata for each reserved state namespace to store error information, which can be seen in a dump of cluster state.

Before reserved state handlers update the 'real' cluster state, a trial run is performed on whatever the current cluster state is at the time. If an exception is thrown at any point, or while deserializing update information, then the reserved state update is not applied. Instead, the error metadata for that namespace is set in cluster state, and the rest of the cluster state is left as-is. If a subsequent update succeeds (i.e. the file data is corrected), then the error metadata is cleared.
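
The interaction between the trial run and the error metadata can be sketched as below. This is a simplified model under stated assumptions: NamespaceState and process are hypothetical names, state is a plain map, and the trial result is used directly rather than being re-applied to the real cluster state in a separate step.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical model of one namespace: its data plus a slot for error information.
final class TrialRunSketch {

    static final class NamespaceState {
        final Map<String, String> data;
        final String error; // null when no error is recorded

        NamespaceState(Map<String, String> data, String error) {
            this.data = Map.copyOf(data);
            this.error = error;
        }
    }

    static NamespaceState process(NamespaceState current, UnaryOperator<Map<String, String>> update) {
        Map<String, String> trial;
        try {
            // Trial run on a copy, so a failure cannot leave partial changes behind.
            trial = update.apply(new HashMap<>(current.data));
        } catch (RuntimeException e) {
            // Data is left as-is; only the error metadata is set.
            return new NamespaceState(current.data, String.valueOf(e.getMessage()));
        }
        // Success applies the changes and clears any earlier error.
        return new NamespaceState(trial, null);
    }
}
```

A failed update therefore changes only the error slot, and the next successful update clears it again.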

There is always a small risk that the trial run will succeed, but applying the updates to the real cluster state fails, due to cluster state changing in the meantime, or a transient error in a handler. In that case, the error will be logged and reported just like any other asynchronous cluster update - but reserved state error metadata won't be written.