Module org.elasticsearch.server
Class IndexReshardingMetadata
java.lang.Object
org.elasticsearch.cluster.metadata.IndexReshardingMetadata
- All Implemented Interfaces:
Writeable, ToXContent, ToXContentFragment
IndexReshardingMetadata holds persistent state managing an in-flight index resharding operation.
Resharding is changing the number of shards that make up an index, in place.
We currently only support splitting an index into an integer multiple of its current shard count,
e.g., going from 1 to 3 shards, or 2 to 4. This is because we route documents to shards by hash of
the document id modulo the shard count. Multiplying the shard count under this scheme lets us move
only the fraction of the documents that route to new shards while the rest stay where they were.
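The arithmetic behind this can be sketched as follows. This is a minimal illustration, not the actual routing code: it assumes 0-based shard numbers and uses `String.hashCode()` with `Math.floorMod` as a stand-in for the real routing hash, and the class and method names (`SplitRoutingSketch`, `shardFor`, `shardsAfterSplit`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a stand-in routing function, not the real one. */
public class SplitRoutingSketch {

    /** Route a document id to a 0-based shard: hash of the id modulo the shard count. */
    public static int shardFor(String docId, int shardCount) {
        // Assumption: String.hashCode() stands in for the real routing hash.
        return Math.floorMod(docId.hashCode(), shardCount);
    }

    /** Shards that documents of oldShard can route to once the shard count is multiplied. */
    public static List<Integer> shardsAfterSplit(int oldShard, int oldCount, int multiple) {
        List<Integer> result = new ArrayList<>();
        for (int k = 0; k < multiple; k++) {
            result.add(oldShard + k * oldCount);
        }
        return result;
    }

    public static void main(String[] args) {
        int oldCount = 2;
        int multiple = 3;
        int newCount = oldCount * multiple;
        int stayed = 0;
        int moved = 0;
        for (int i = 0; i < 1000; i++) {
            String id = "doc-" + i;
            int before = shardFor(id, oldCount);
            int after = shardFor(id, newCount);
            // Because newCount is a multiple of oldCount, the new shard is always
            // congruent to the old shard modulo oldCount.
            if (!shardsAfterSplit(before, oldCount, multiple).contains(after)) {
                throw new IllegalStateException("unexpected route for " + id);
            }
            if (after == before) {
                stayed++;
            } else {
                moved++;
            }
        }
        System.out.println("stayed=" + stayed + " moved=" + moved);
    }
}
```

Under these assumptions, a document on old shard `s` can only end up on shard `s`, `s + oldCount`, `s + 2 * oldCount`, and so on, which is why no document ever has to move between two pre-existing shards.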
During a split, we create new shards and then migrate the documents that belong to the new shards
according to the routing function to those new shards. While we're moving documents, search requests
may be ongoing, or new documents may be indexed. There must not be ambiguity about whether the source
shard or the target shards are responsible for documents being indexed or searched while this handoff
is occurring, to ensure that we don't lose or double-count documents during the process. We prevent such ambiguity
by maintaining the state of the split on the source and target shards, and making an atomic (from the point
of view of indexing and search requests) transition from having the source shard handle requests for documents
that belong to the target shard, to having the target shard handle them itself.
Before the handoff, the source shard has the entire document collection for both the source and target, and handles
indexing and search requests. After the handoff, documents that route to the target are handled by the target,
and the source does not necessarily have a complete view - it will be missing any documents that are indexed
to the target shard after handoff. Indeed, when the target becomes active, the source filters target documents
from its search results, so that they are not counted twice when the target shard is also searched. The handoff
is performed at the source by queueing incoming requests prior to entering handoff, waiting for the target to
be RUNNING, and then forwarding requests for the target shard to the target. Similarly, when the target first
becomes active it must filter out search results containing documents owned by the source shard, which may be
present if the target was created by copying the source shard's Lucene files.
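The effect of the two search filters can be illustrated with a small sketch. It assumes a single source shard (0) split into shards 0 and 1, represents each document by an integer routing hash, and assumes the target starts from a full copy of the source's files. The class and method names (`SplitSearchFilterSketch`, `visibleDocs`) are hypothetical and only serve the illustration.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Illustrative only: shows why filtering on both sides avoids double counting. */
public class SplitSearchFilterSketch {

    /** Stand-in routing function: routing hash modulo the shard count (0-based). */
    public static int shardFor(int routingHash, int shardCount) {
        return Math.floorMod(routingHash, shardCount);
    }

    /** Documents a shard may return: only those that route to it under the new shard count. */
    public static List<Integer> visibleDocs(List<Integer> storedDocs, int shardId, int newShardCount) {
        return storedDocs.stream()
            .filter(hash -> shardFor(hash, newShardCount) == shardId)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Documents are represented by their routing hash.
        List<Integer> sourceStore = IntStream.range(0, 10).boxed().collect(Collectors.toList());
        // Assumption: the target was bootstrapped from a full copy of the source.
        List<Integer> targetStore = List.copyOf(sourceStore);

        List<Integer> fromSource = visibleDocs(sourceStore, 0, 2);
        List<Integer> fromTarget = visibleDocs(targetStore, 1, 2);

        // With both filters in place, every document is returned by exactly one shard.
        boolean countedOnce = fromSource.size() + fromTarget.size() == sourceStore.size();
        System.out.println("each document counted once: " + countedOnce);
    }
}
```

Without the filters, both stores would return all ten documents and a search across both shards would count each document twice.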
To ensure that we always route requests to the correct shard, even in the case of failure of either source or
target shards during split, we preserve the transition point in persistent state until the split is complete, so
that when the source or target recovers, it can resync and route correctly based on that state. This class holds
the persistent state required to recover correctly, always maintaining the invariant that only the source shard
accepts indexing and search requests for the target prior to handoff, and only the target shard accepts them afterward.
The state we preserve is:
* The old and new shard counts for a resize operation, so that we can always identify which shards are sources
and which are targets during resharding. For example, old:2 new:6 implies that shard 1 is the source shard for
shards 3 and 5, and shard 2 is the source for shards 4 and 6.
* For each source shard, its current source state, which is either `SOURCE` or `DONE`.
- If a source shard may still contain data for any target shard then it is in state `SOURCE`.
- When all targets for a source have moved to `SPLIT` (see below), then the source deletes all documents from
its store that are now the responsibility of the target shards and transitions to `DONE`.
This isn't strictly required to be persistent for correctness, but it can save time on recovery
by allowing a DONE shard to skip interrogating targets and repeating cleanup.
* For each target shard, its current target state, which is one of `CLONE`, `HANDOFF`, `SPLIT`, or `DONE`.
- If the target has not yet copied all data from the source shard, then it is in `CLONE`.
- It moves to `HANDOFF` when it has copied all of its data from the source to indicate that it is now ready to
receive indexing actions, and starts RUNNING. After this point, the source may no longer contain the entire contents
of the target and must not index documents belonging to the target. But since search shards can't start up until
their corresponding index shards are active, search requests would fail if they routed to the target shard immediately
after handoff. So at HANDOFF, the source shards continue to service searches, but block refresh since they cannot
be guaranteed to have seen documents indexed after HANDOFF.
- When the target shard's corresponding search replica has started running, the target requests that the source filter
search results belonging to the target, and moves the target shard's state to `SPLIT`. The target's search replica
likewise filters documents not belonging to the target, which may be present due to the target bootstrapping by copying
the source's Lucene files.
- Upon entering `SPLIT`, the target starts deleting all documents from its Lucene store that do not belong to it. When that
is complete, it moves to `DONE` and removes filters for other shards, which are no longer necessary.
Note that each target shard's split operates independently and all may happen concurrently.
When all source shards have transitioned to `DONE`, the resize is complete and this metadata may be removed from cluster state.
We only allow at most a single resharding operation to be in flight for an index, so removing this metadata is a prerequisite
to beginning another resharding operation.
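The state progression described above can be summarized in a small sketch. The enums below are hypothetical mirrors of the states named in this description, not the actual `IndexReshardingState` types, and the rule that a target only advances one step at a time is an assumption made for the illustration.

```java
/** Illustrative only: a simplified model of the split state machine described above. */
public class SplitStateSketch {

    /** Target states, in the order a target moves through them. */
    public enum TargetState { CLONE, HANDOFF, SPLIT, DONE }

    /** Source states. */
    public enum SourceState { SOURCE, DONE }

    /** Assumption: a target only advances to the next state in sequence. */
    public static boolean isValidTargetTransition(TargetState from, TargetState to) {
        return to.ordinal() == from.ordinal() + 1;
    }

    /** A source may clean up and move to DONE once all of its targets have reached at least SPLIT. */
    public static boolean sourceMayFinish(TargetState... targets) {
        for (TargetState t : targets) {
            if (t.ordinal() < TargetState.SPLIT.ordinal()) {
                return false;
            }
        }
        return true;
    }

    /** Which shard handles indexing for documents that route to the target. */
    public static String indexingOwner(TargetState target) {
        // Indexing moves to the target at HANDOFF.
        return target == TargetState.CLONE ? "source" : "target";
    }

    /** Which shard serves searches for documents that route to the target. */
    public static String searchOwner(TargetState target) {
        // At HANDOFF the source still serves searches; the target takes over at SPLIT.
        return target.ordinal() < TargetState.SPLIT.ordinal() ? "source" : "target";
    }

    public static void main(String[] args) {
        for (TargetState state : TargetState.values()) {
            System.out.println(state + ": indexing=" + indexingOwner(state)
                + ", search=" + searchOwner(state));
        }
    }
}
```

The sketch makes the asymmetry at `HANDOFF` explicit: indexing has already moved to the target while searches are still served by the source.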
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.elasticsearch.xcontent.ToXContent
ToXContent.DelegatingMapParams, ToXContent.MapParams, ToXContent.Params
Nested classes/interfaces inherited from interface org.elasticsearch.common.io.stream.Writeable
Writeable.Reader<V>, Writeable.Writer<V>
-
Field Summary
Fields inherited from interface org.elasticsearch.xcontent.ToXContent
EMPTY, EMPTY_PARAMS
-
Constructor Summary
Constructors
-
Method Summary
Modifier and Type, Method, Description
boolean equals(Object)
getSplit()
int hashCode()
boolean isSplit()
static boolean isSplitSource(ShardId shardId, IndexReshardingMetadata reshardingMetadata)
static boolean isSplitTarget(ShardId shardId, IndexReshardingMetadata reshardingMetadata)
static IndexReshardingMetadata newSplitByMultiple(int shardCount, int multiple): Create resharding metadata representing a new split operation. Split only supports updating an index to a multiple of its current shard count.
int shardCountAfter()
int shardCountBefore()
String toString()
XContentBuilder toXContent(XContentBuilder builder, ToXContent.Params params)
IndexReshardingMetadata transitionSplitSourceToNewState(ShardId shardId, IndexReshardingState.Split.SourceShardState newSourceState)
IndexReshardingMetadata transitionSplitTargetToNewState(ShardId shardId, IndexReshardingState.Split.TargetShardState newTargetState)
void writeTo(StreamOutput out): Write this into the StreamOutput.
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.elasticsearch.xcontent.ToXContentFragment
isFragment
-
Constructor Details
-
IndexReshardingMetadata
- Throws:
IOException
-
-
Method Details
-
toXContent
public XContentBuilder toXContent(XContentBuilder builder, ToXContent.Params params) throws IOException
- Specified by:
toXContent in interface ToXContent
- Throws:
IOException
-
writeTo
Description copied from interface: Writeable
Write this into the StreamOutput.
- Specified by:
writeTo in interface Writeable
- Throws:
IOException
-
equals
-
hashCode
public int hashCode() -
toString
-
newSplitByMultiple
Create resharding metadata representing a new split operation. Split only supports updating an index to a multiple of its current shard count.
- Parameters:
shardCount - the number of shards in the index at the start of the operation
multiple - the new shard count is shardCount * multiple
- Returns:
- resharding metadata representing the start of the requested split
-
isSplitSource
public static boolean isSplitSource(ShardId shardId, @Nullable IndexReshardingMetadata reshardingMetadata) -
isSplitTarget
public static boolean isSplitTarget(ShardId shardId, @Nullable IndexReshardingMetadata reshardingMetadata) -
transitionSplitTargetToNewState
public IndexReshardingMetadata transitionSplitTargetToNewState(ShardId shardId, IndexReshardingState.Split.TargetShardState newTargetState) -
transitionSplitSourceToNewState
public IndexReshardingMetadata transitionSplitSourceToNewState(ShardId shardId, IndexReshardingState.Split.SourceShardState newSourceState) -
getSplit
- Returns:
- the split state of this metadata block; throws IllegalArgumentException if this metadata doesn't represent a split
-
isSplit
public boolean isSplit() -
shardCountBefore
public int shardCountBefore()
- Returns:
- the number of shards the index has at the start of this operation
-
shardCountAfter
public int shardCountAfter()
- Returns:
- the number of shards that the index will have when resharding completes
-