Merge: Avoid docstore stacking for small segments #1053

@PSeitz

In the merge code, doc store stacking avoids re-compressing blocks by stacking the blocks of the existing segments, which is great, since it is much faster.
But there may be scenarios where we have many small committed segments. In that case, we should use the slower block recreation until the segments are large enough to be worth stacking.

Example:
Segments 1..8 are to be merged; their doc store block sizes:

|1kb block|2kb block|1kb block|3kb block|1kb block|2kb block|1kb block|3kb block|

Currently we would carry them over unchanged:

|1kb block|2kb block|1kb block|3kb block|1kb block|2kb block|1kb block|3kb block|

In this case we would want them to be merged into one block:

|14kb block|
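
To make the example concrete, here is a rough sketch comparing the two strategies on block sizes only. It is a simplification, not the actual merge code: `stack_blocks`, `reblock` and `TARGET_BLOCK_SIZE` are hypothetical names, the 16 KB target is an assumed value, and the sketch ignores document boundaries and any change in compression ratio.

```rust
/// Hypothetical target size of a freshly written block, in bytes.
/// Assumed for illustration; not a value stated in this issue.
const TARGET_BLOCK_SIZE: usize = 16 * 1024;

/// Stacking: every existing block is carried over unchanged,
/// so the merged store has as many blocks as all inputs combined.
fn stack_blocks(segments: &[Vec<usize>]) -> Vec<usize> {
    segments.iter().flatten().copied().collect()
}

/// Block recreation (simplified): the combined payload is rewritten
/// into blocks of at most `target` bytes each.
fn reblock(segments: &[Vec<usize>], target: usize) -> Vec<usize> {
    assert!(target > 0, "target block size must be positive");
    let mut remaining: usize = segments.iter().flatten().sum();
    let mut blocks = Vec::new();
    while remaining > 0 {
        let size = remaining.min(target);
        blocks.push(size);
        remaining -= size;
    }
    blocks
}
```

With the eight single-block segments above (1, 2, 1, 3, 1, 2, 1 and 3 KB), `stack_blocks` yields eight small blocks, while `reblock` with a 16 KB target yields a single 14 KB block.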

Open question: what should the threshold be for starting to stack?
E.g. once we have on average 5 full blocks per segment, we could start stacking.
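
One possible shape for such a check, as a minimal sketch only: `SegmentStoreStats` and `should_stack` are hypothetical names, and how the number of full blocks per segment would be obtained is left open. The threshold is passed in, since its value is the open question.

```rust
/// Hypothetical per-segment doc store statistics.
struct SegmentStoreStats {
    /// Number of blocks in the segment that are filled to capacity.
    num_full_blocks: usize,
}

/// Returns true if the segments average at least `min_avg_full_blocks`
/// full blocks each, i.e. stacking is considered worthwhile.
/// Uses integer arithmetic: total >= threshold * segment count.
fn should_stack(segments: &[SegmentStoreStats], min_avg_full_blocks: usize) -> bool {
    if segments.is_empty() {
        return false;
    }
    let total_full_blocks: usize = segments.iter().map(|s| s.num_full_blocks).sum();
    total_full_blocks >= min_avg_full_blocks * segments.len()
}
```

With a threshold of 5, the eight small segments from the example (no full blocks at all) would go through block recreation, and later merges of larger segments would switch to stacking.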

Complete alternative: have a global cross-segment doc store.
