[RFC] Introduce Pre-copy Merged Segment into Segment Replication #17528

@guojialiang92

Is your feature request related to a problem? Please describe

Segment replication copies the segments held by the primary shard to the replica shards, so replicas do not pay the cost of building segments themselves. The trade-off, compared with document replication, is an added visibility delay between the primary and the replica: the replica must wait for segment replication to complete before the docs in those segments become searchable.
Currently, the segment replication process carries two kinds of segments: those produced by merges and those produced by refresh for incremental indexing. Assuming a merged segment of 1 GB and a transfer bandwidth of 50 MB/s, a segment replication process that includes this segment takes at least 20 s. Larger merged segments, or more shards competing for the same bandwidth, make it last longer still.
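
The 20 s figure is a plain size-over-bandwidth bound. A minimal sketch of that arithmetic, assuming decimal units (1 GB = 1000 MB) and, as a simplification not stated in this RFC, that concurrent transfers share the bandwidth evenly; `min_transfer_seconds` is a hypothetical helper, not an OpenSearch API:

```python
MB = 1000 ** 2
GB = 1000 ** 3


def min_transfer_seconds(segment_bytes, bandwidth_bytes_per_s, concurrent_transfers=1):
    """Lower bound on the copy time of one segment.

    Assumes (for illustration only) that `concurrent_transfers` equally
    sized transfers split the available bandwidth evenly.
    """
    if bandwidth_bytes_per_s <= 0 or concurrent_transfers < 1:
        raise ValueError("bandwidth must be positive and transfers >= 1")
    return segment_bytes * concurrent_transfers / bandwidth_bytes_per_s


# The example from the text: 1 GB merged segment at 50 MB/s.
print(min_transfer_seconds(1 * GB, 50 * MB))  # 20.0
# A 5 GB merged segment while four shards share the same link.
print(min_transfer_seconds(5 * GB, 50 * MB, concurrent_transfers=4))  # 400.0
```

During that whole interval, docs in small refresh segments that travel with the merged segment stay invisible on the replica.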

Describe the solution you'd like

Introduction

This RFC proposes an optimization of segment replication that uses Lucene's IndexWriter.IndexReaderWarmer to pre-copy merged segments to the replica. It can effectively reduce the delay before documents become searchable on the replica.

Background

Lucene allows a custom IndexWriter.IndexReaderWarmer to be supplied. The warmer is called after a merge has written its segment files, and the merged segment cannot be searched until the warmer has completed. With segment replication enabled, the merged segment therefore joins the segment replication process only after the warmer has completed.
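
The ordering described here (files written, warmer called, segment searchable only afterwards) can be illustrated with a toy model. `ToyWriter` and `pre_copy_warmer` are hypothetical names for this sketch; it is not Lucene's API and only mimics the ordering stated above:

```python
class ToyWriter:
    """Toy stand-in for an index writer with a merged-segment warmer hook."""

    def __init__(self, warmer):
        self.warmer = warmer
        self.searchable = []  # names of segments visible to searches

    def merge(self, sources, merged):
        # The merged segment's files exist at this point, but it is not
        # searchable yet: the warmer runs first.
        self.warmer(merged)
        # Only after the warmer returns does the merged segment replace
        # the segments it was built from.
        self.searchable = [s for s in self.searchable if s not in sources]
        self.searchable.append(merged)


events = []


def pre_copy_warmer(segment):
    # A pre-copy would happen here; record what is searchable meanwhile.
    events.append((segment, list(writer.searchable)))


writer = ToyWriter(pre_copy_warmer)
writer.searchable = ["_1", "_2"]
writer.merge(["_1", "_2"], "_3")
print(events)             # [('_3', ['_1', '_2'])]
print(writer.searchable)  # ['_3']
```

While the warmer runs, searches still see `_1` and `_2`, which is why this hook is a natural place to ship the merged files ahead of time.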

Proposed Solution

To set the scene, consider the current segment replication process. Suppose segment(_3.si) is produced by merging segment(_1.si) and segment(_2.si), and segment(_4.si) is produced by refresh. During segment replication, both are copied to the replica together. If segment(_3.si) is large, replication takes a long time, and the docs in segment(_4.si) stay invisible on the replica for just as long.

With Pre-copy Merged Segment, the primary pre-copies segment(_3.si) to the replica.
There are two cases to consider.
In the first case, the pre-copy finishes before segment replication. After refresh, segment(_3.si) and segment(_4.si) are both replicated through segment replication, but because the replica already holds the files of segment(_3.si), those files are reused and are not sent over the network again.

In the second case, the pre-copy finishes after segment replication. After refresh, segment(_3.si) is still not visible on the primary, so only segment(_4.si) is copied to the replica through segment replication.

In this second case, the segment replication process does not include segment(_3.si) at all, which reduces its time overhead.
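
Both cases come down to which files still have to cross the network during segment replication. A minimal sketch, assuming files are compared by name and checksum; `files_to_transfer`, the file names and the checksum strings are all illustrative, not taken from the OpenSearch code base:

```python
from typing import Dict, List


def files_to_transfer(checkpoint: Dict[str, str], replica: Dict[str, str]) -> List[str]:
    """Names of checkpoint files the replica lacks or holds with a different checksum."""
    return sorted(
        name for name, checksum in checkpoint.items()
        if replica.get(name) != checksum
    )


merged = {"_3.si": "c3a", "_3.cfs": "c3b"}   # hypothetical files of segment(_3.si)
refresh = {"_4.si": "c4a", "_4.cfs": "c4b"}  # hypothetical files of segment(_4.si)

# Today: the replica holds neither segment, so everything is sent.
print(files_to_transfer({**merged, **refresh}, {}))
# ['_3.cfs', '_3.si', '_4.cfs', '_4.si']

# Case 1: pre-copy finished first, so the merged files are reused.
print(files_to_transfer({**merged, **refresh}, dict(merged)))
# ['_4.cfs', '_4.si']

# Case 2: the merged segment is not yet visible on the primary.
print(files_to_transfer(dict(refresh), {}))
# ['_4.cfs', '_4.si']
```

In both pre-copy cases only the small refresh segment is transferred during segment replication.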

Implementation Approaches

Extend IndexReaderWarmer#warm. After the merge has written its files, the process is as follows:

  1. The primary dispatches the merged segment information to all replica nodes.
  2. Each replica pulls the required files from the primary.
  3. Once a replica has received all files, or an exception occurs, it returns a response to the primary.
  4. Once the primary has received a response from every replica, or has timed out, it completes the pre-copy merged segment process.
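
The four steps can be sketched as a small simulation. This is a toy under stated assumptions, not the proposed implementation: threads stand in for replica nodes, `time.sleep` stands in for the file transfer, and `replica_pull`, `pre_copy_merged_segment` and the replica names are hypothetical. Replicas that fail or miss the timeout are reported separately, since they would receive the merged segment through regular segment replication instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def replica_pull(files, delay_s, fail):
    """Steps 2-3 on one replica: pull the files, then respond (or raise)."""
    time.sleep(delay_s)  # stands in for the network transfer
    if fail:
        raise IOError("transfer failed")
    return list(files)


def pre_copy_merged_segment(files, replicas, timeout_s):
    """Run the four steps; return (acked, fallback) replica names, sorted."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(replicas)))
    # Step 1: dispatch the merged segment information to every replica.
    futures = {
        pool.submit(replica_pull, files, delay_s, fail): name
        for name, (delay_s, fail) in replicas.items()
    }
    # Step 4: stop waiting once all replicas responded or the timeout hits.
    done, _ = wait(list(futures), timeout=timeout_s)
    pool.shutdown(wait=False)  # do not block the merge on slow replicas
    acked = sorted(futures[f] for f in done if f.exception() is None)
    fallback = sorted(set(replicas) - set(acked))
    return acked, fallback


# name -> (simulated transfer time in seconds, whether the pull fails)
replicas = {"r1": (0.01, False), "r2": (0.5, False), "r3": (0.01, True)}
acked, fallback = pre_copy_merged_segment(["_3.si", "_3.cfs"], replicas, timeout_s=0.2)
print(acked)     # ['r1']
print(fallback)  # ['r2', 'r3']
```

Bounding step 4 with a timeout keeps a slow or failed replica from holding up the merge on the primary indefinitely.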

Failover

In the worst case, the pre-copy merged segment process hits an exception or times out. Behavior then falls back to what happens today: the merged segment is copied to the replica through segment replication.

Related component

Other
