Fix case where messages can be written multiple times in Iceberg #29251

Merged
ballard26 merged 3 commits into redpanda-data:dev from ballard26:datalake-dup-msg-writes on Jan 26, 2026

@ballard26 ballard26 commented Jan 14, 2026

Contributor

Callers of `serde_parquet_writer::add_data_struct` assume that the data struct was not written to the file unless `writer_error::ok` is returned. However, `_writer.write_row(...)` can succeed and a disk or memory reservation error can still occur afterward. Prior to this PR, such an error caused the message to be written twice, because callers assumed that the first write had failed.
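
To make the failure mode concrete, here is a minimal, synchronous sketch of the pre-fix behavior described above. It is not the Redpanda code: `buggy_writer`, `fail_next_reservation`, `write_with_retry`, and the `reservation_failed` enumerator are hypothetical names, and the real `add_data_struct` is a coroutine rather than a plain function. The only point is to show how a retrying caller ends up with the same row twice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the writer's error type.
enum class writer_error { ok, reservation_failed };

// Hypothetical model of the pre-fix writer.
struct buggy_writer {
    std::vector<int> file;               // rows that reached the "file"
    bool fail_next_reservation = false;  // simulates a disk/memory reservation error

    // The row is written first; a reservation failure afterwards is
    // reported as if the write itself had failed.
    writer_error add_data_struct(int row) {
        file.push_back(row);             // stands in for _writer.write_row(...)
        if (fail_next_reservation) {
            fail_next_reservation = false;
            return writer_error::reservation_failed;
        }
        return writer_error::ok;
    }
};

// A caller that treats any non-ok result as "row not written" and retries.
inline std::size_t write_with_retry(buggy_writer& w, int row) {
    while (w.add_data_struct(row) != writer_error::ok) {
        // retry the same row
    }
    return w.file.size();
}
```

Calling `write_with_retry` on a writer whose next reservation fails leaves two copies of the row in `file`, which is the duplicate write this PR targets.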

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.3.x
  • v25.2.x
  • v25.1.x

Release Notes

Bug Fixes

  • Fixes duplicate message writes in Iceberg when disk or memory reservation errors occur.

@ballard26 ballard26 requested review from andrwng and bharathv January 14, 2026 06:05
@ballard26 ballard26 changed the title from "Prevent messages from being written multiple times in Iceberg when reservation errors occur" to "Fix case where messages can be written multiple times in Iceberg" Jan 14, 2026
@ballard26 ballard26 force-pushed the datalake-dup-msg-writes branch from f3491fb to 2e688f2 Compare January 14, 2026 20:49
@ballard26 ballard26 marked this pull request as ready for review January 14, 2026 20:50
Copilot AI review requested due to automatic review settings January 14, 2026 20:50

Copilot AI left a comment

Contributor

Pull request overview

This PR fixes a bug where Iceberg messages could be written multiple times when disk or memory reservation errors occur after a successful row write. The issue arose because serde_parquet_writer::add_data_struct would return an error after successfully writing a row, causing callers to retry the write.

Changes:

  • Modified error handling to defer reporting of post-write reservation errors to the next call
  • Added an _error field to track deferred errors
  • Added test infrastructure to inject OOM conditions for validation
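
The deferral described in the list above might look roughly like the following sketch. It is a simplified, synchronous model under stated assumptions: `deferring_writer`, `fail_next_reservation`, and `reservation_failed` are hypothetical, and only the `_error` field name and the `add_data_struct` / `writer_error::ok` names come from the PR. This version keeps `_error` set once it is recorded; whether to clear it is discussed in the review thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the writer's error type.
enum class writer_error { ok, reservation_failed };

// Hypothetical model of a writer that defers post-write errors.
class deferring_writer {
public:
    bool fail_next_reservation = false;  // test hook, not part of the real API

    writer_error add_data_struct(int row) {
        // Report an error deferred from an earlier call before writing anything.
        if (_error != writer_error::ok) {
            return _error;
        }
        _rows.push_back(row);            // stands in for _writer.write_row(...)
        if (fail_next_reservation) {
            fail_next_reservation = false;
            _error = writer_error::reservation_failed;  // defer, do not return
        }
        // The row was written, so the caller must see success.
        return writer_error::ok;
    }

    std::size_t row_count() const { return _rows.size(); }

private:
    std::vector<int> _rows;
    writer_error _error = writer_error::ok;
};
```

With this shape, the call that wrote the row returns `ok`, and the reservation failure surfaces on the following call, before any further row is written.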

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/v/datalake/serde_parquet_writer.h | Added _error field to store deferred errors |
| src/v/datalake/serde_parquet_writer.cc | Updated error handling to defer post-write errors and return success when row write succeeds |
| src/v/datalake/tests/test_data_writer.h | Added OOM injection capability to noop_mem_tracker for testing |
| src/v/datalake/tests/serde_parquet_writer_test.cc | Added test case validating correct behavior when OOM occurs after row write |
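
As an illustration of what OOM injection in a test memory tracker could look like, here is a small sketch. The actual interface of `noop_mem_tracker` is not shown in this thread, so everything below (`injectable_mem_tracker`, `set_oom`, `reserve_bytes`, `reservation_error`) is an assumed, hypothetical shape rather than the real API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type for a reservation attempt.
enum class reservation_error { ok, out_of_memory };

// Hypothetical test double: accepts every reservation unless OOM is injected.
class injectable_mem_tracker {
public:
    // Toggle simulated out-of-memory behavior for subsequent reservations.
    void set_oom(bool enabled) { _oom = enabled; }

    reservation_error reserve_bytes(std::size_t bytes) {
        if (_oom) {
            return reservation_error::out_of_memory;
        }
        _reserved += bytes;
        return reservation_error::ok;
    }

    std::size_t reserved() const { return _reserved; }

private:
    bool _oom = false;
    std::size_t _reserved = 0;
};
```

A test can let the first reservations succeed, call `set_oom(true)`, and then check that the writer reports success for the row that was already written.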

Comment on lines +20 to +22
if (_error != writer_error::ok) {
co_return std::exchange(_error, writer_error::ok);
}

Copilot AI Jan 14, 2026

There's a potential race condition if add_data_struct can be called concurrently. The check-and-exchange pattern on _error is not atomic, which could lead to lost errors or multiple coroutines seeing the same error. Consider adding synchronization or documenting that this method must not be called concurrently.

Contributor Author

It can never be called concurrently from different threads and therefore there is no need for std::exchange to be atomic.

Contributor

Do we need to std::exchange()? Could we just retain the _error state once it is set?

Contributor Author

The local_parquet_file_writer has a similar mechanism that prevents further writes after the serde_parquet_writer returns an error. So I think the behavior will be the same whether or not we clear the error in serde_parquet_writer, at least for any user of local_parquet_file_writer. Will remove the exchange.
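
The two options discussed in this thread differ only in whether the stored error is cleared when it is reported. A minimal sketch of the difference, using hypothetical helper names (`take_error`, `peek_error`) and a hypothetical `reservation_failed` enumerator:

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for the writer's error type.
enum class writer_error { ok, reservation_failed };

// Variant A: report the deferred error once and reset it to ok,
// as in the originally reviewed lines using std::exchange.
inline writer_error take_error(writer_error& stored) {
    return std::exchange(stored, writer_error::ok);
}

// Variant B: report the deferred error and keep it set, so every
// later call observes the same failure.
inline writer_error peek_error(const writer_error& stored) {
    return stored;
}
```

With variant A a second check returns `ok`; with variant B the error stays visible. For a caller that stops writing after the first error, as described for local_parquet_file_writer above, both behave the same.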

Comment on lines +18 to +19
// Hence, the current solution is to return those errors on the subsequent
// call to `add_data_struct`.

Copilot AI Jan 14, 2026

The comment should clarify what happens if the error is never consumed (e.g., if add_data_struct is not called again before finish()). Document whether finish() checks _error or if the caller needs to ensure one final call to add_data_struct to retrieve deferred errors.

Suggested change

Original:
// Hence, the current solution is to return those errors on the subsequent
// call to `add_data_struct`.

Suggested:
// Hence, the current solution is to defer those errors by storing them in
// `_error` and returning them on the subsequent call to `add_data_struct`.
// Callers must ensure that any deferred error is observed (either via a
// follow-up call to `add_data_struct` or by the finalization path, e.g.
// `finish()` inspecting `_error`) before treating the writer as
// successfully completed.
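
One way to read this suggestion is that the finalization path should surface any deferred error. The thread does not say whether the real `finish()` does this, so the following is only a sketch under that assumption; `sketch_writer`, the `reservation_fails` parameter, and `reservation_failed` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the writer's error type.
enum class writer_error { ok, reservation_failed };

class sketch_writer {
public:
    // `reservation_fails` simulates a reservation error after the row is written.
    writer_error add_data_struct(int row, bool reservation_fails = false) {
        if (_error != writer_error::ok) {
            return _error;
        }
        _rows.push_back(row);
        if (reservation_fails) {
            _error = writer_error::reservation_failed;  // deferred
        }
        return writer_error::ok;
    }

    // Assumption: finish() reports a deferred error that no later
    // add_data_struct call had the chance to return.
    writer_error finish() const { return _error; }

    std::size_t row_count() const { return _rows.size(); }

private:
    std::vector<int> _rows;
    writer_error _error = writer_error::ok;
};
```

In this model a caller that writes a single row and then finishes still observes the reservation failure, without the row being reported as unwritten.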

Comment thread src/v/datalake/tests/test_data_writer.h
@ballard26 ballard26 force-pushed the datalake-dup-msg-writes branch from 2e688f2 to 59fd83b Compare January 14, 2026 21:01
@tyson-redpanda tyson-redpanda added this to the v25.3.5 milestone Jan 14, 2026
@ballard26 ballard26 force-pushed the datalake-dup-msg-writes branch from 59fd83b to 2e51457 Compare January 14, 2026 22:41
@vbotbuildovich

Collaborator

Retry command for Build#79055

please wait until all jobs are finished before running the slash command

/ci-repeat 1
skip-redpanda-build
skip-units
skip-rebase
tests/rptest/tests/cluster_linking_e2e_test.py::ShadowLinkingMetricsTests.test_link_metrics

@vbotbuildovich

vbotbuildovich commented Jan 15, 2026

Collaborator

CI test results

test results on build#79055

| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ShadowLinkingMetricsTests | test_link_metrics | null | integration | https://buildkite.com/redpanda/redpanda/builds/79055#019bbebc-b592-475a-b597-715a62dbcf88 | FLAKY | 15/21 | Test FAILS after retries. Significant increase in flaky rate (baseline=0.0307, p0=0.0003, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingMetricsTests&test_method=test_link_metrics |

test results on build#79607

| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TestReadReplicaService | test_identical_lwms_after_delete_records | {"cloud_storage_type": 1, "partition_count": 5} | integration | https://buildkite.com/redpanda/redpanda/builds/79607#019bf864-3212-42ed-8fa7-c7aceede37bd | FLAKY | 10/11 | Test PASSES after retries. No significant increase in flaky rate (baseline=0.0057, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=TestReadReplicaService&test_method=test_identical_lwms_after_delete_records |

@ballard26

Contributor Author

/ci-repeat 1
skip-redpanda-build
skip-units
skip-rebase
tests/rptest/tests/cluster_linking_e2e_test.py::ShadowLinkingMetricsTests.test_link_metrics

@bharathv bharathv left a comment

Contributor

lgtm just one minor question.

@tyson-redpanda tyson-redpanda modified the milestones: v25.3.5, v25.3.x-next Jan 16, 2026
Callers of `serde_parquet_writer::add_data_struct` assume that the data
struct was not written to the file unless `writer_error::ok` is
returned. However, `_writer.write_row(...)` can succeed and a disk or
memory reservation error can still occur afterward. Prior to this
commit, such an error caused the message to be written twice, because
callers assumed that the first write had failed.

In a previous commit, `serde_parquet_writer` was changed to always return
an `ok` result when the record is written. As a result, the flush at
the `translation_task` level no longer returns a no-data error. The unit
test is therefore modified in this commit to reflect the new behavior.
@ballard26 ballard26 force-pushed the datalake-dup-msg-writes branch from 2e51457 to 7f506e3 Compare January 26, 2026 03:21
@ballard26 ballard26 requested a review from bharathv January 26, 2026 03:21
@ballard26 ballard26 merged commit 8c654cc into redpanda-data:dev Jan 26, 2026
19 checks passed
@vbotbuildovich

Collaborator

/backport v25.3.x

@vbotbuildovich

Collaborator

/backport v25.2.x

@vbotbuildovich

Collaborator

/backport v25.1.x

@vbotbuildovich

Collaborator

Failed to create a backport PR to v25.1.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-29251-v25.1.x-724 remotes/upstream/v25.1.x
git cherry-pick -x c7c8c5c2e2 efc3a22e26 7f506e3c94

Workflow run logs.

@vbotbuildovich

Collaborator

Failed to create a backport PR to v25.2.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-29251-v25.2.x-899 remotes/upstream/v25.2.x
git cherry-pick -x c7c8c5c2e2 efc3a22e26 7f506e3c94

Workflow run logs.
