shadowing/write_at_offset_stm: ensure_truncatable API#29358
Conversation
|
/dt |
Retry command for Build#79443please wait until all jobs are finished before running the slash command |
There was a problem hiding this comment.
Pull request overview
This PR addresses a bug where truncation does not propagate to a sink if the source start offset has moved ahead of the sink's high watermark. The solution introduces an ensure_truncatable() API that guarantees the log can be safely truncated by filling gaps with placeholder batches as needed.
Changes:
- Introduces
ensure_truncatable()API towrite_at_offset_stmthat ensures truncation safety by filling gaps with compaction placeholder batches - Refactors ghost batch creation functions into generic template functions that support both ghost batches and compaction placeholder batches
- Integrates the new API into the cluster link service to handle truncation propagation
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
src/v/model/batch_utils.h |
Refactored batch creation functions into templates supporting both ghost and compaction placeholder batches |
src/v/model/batch_utils.cc |
Moved function implementations to header as templates and added explicit instantiations |
src/v/storage/log_reader.cc |
Updated function call to use namespace-qualified model::make_ghost_batch |
src/v/kafka/server/write_at_offset_stm.h |
Added ensure_truncatable() method declaration and new error codes |
src/v/kafka/server/write_at_offset_stm.cc |
Implemented ensure_truncatable() method and custom formatter for error codes |
src/v/kafka/server/tests/write_at_offset_stm_test.cc |
Added comprehensive tests for the new ensure_truncatable() functionality |
src/v/cluster_link/service.cc |
Integrated ensure_truncatable() call before prefix truncation in cluster link service |
No logical changes, templatized the ghost batch utility on batch type
2c900ef to
9c2f86f
Compare
Retry command for Build#79523please wait until all jobs are finished before running the slash command |
c236c57 to
677f039
Compare
Retry command for Build#79567please wait until all jobs are finished before running the slash command |
677f039 to
bec53bf
Compare
| co_return; | ||
| } | ||
|
|
||
| auto truncate_offset = std::max(_start_offset, source_start_offset); |
There was a problem hiding this comment.
Retry command for Build#79672please wait until all jobs are finished before running the slash command |
Without this the sink may be having a wrong last_replicated_offset from the last inflight replication which has eventually failed. We will end up reset-ing source to wrong offset.
Prefix truncation is a rare op, it is ok to do it inline with replication. Now that we have the ability to do offset only fetches in the client, we don't need a periodic timer loop to do it.
The client was returning an empty batch when the start offset changed. These were getting filtered out in the data_queue, so the notification never reached the replicator and truncation didn’t happen. With this change, an empty batch is treated as valid data. It still gets filtered out by the replicator, but now it triggers prefix truncation This issue was masked until now because there was a periodic truncation async loop that invoked prefix_truncation on a timer. Now that the previous commit removed it, this issue surfaced.
bec53bf to
75892af
Compare
| if (!batches.empty()) [[likely]] { | ||
| _next = kafka::next_offset( | ||
| model::offset_cast(batches.back().last_offset())); | ||
| auto total_bytes = std::accumulate( |
There was a problem hiding this comment.
The _next offset is updated even when batches is empty (after the check). However, when batches is empty, the previous behavior didn't update _next. Consider whether this change in behavior is intentional. If batches is empty, _next should remain unchanged to maintain consistency with the previous implementation.
| co_return errc::success; | ||
| } | ||
| vlog( | ||
| _log.debug, |
|
/backport v25.3.x |
If the source start offset has moved past sink HWM, propagating truncation (today) would result in an error as we currently have no way to truncate above HWM.
This PR exposes ensure_truncatable() API that can be used to ensure that start_offset is past the offset that we are trying to truncate.
Fixes: CORE-15308
Backports Required
Release Notes
Bug Fixes