
Fix truncated stdin backups by closing pipe before signaling done #1169

Closed
arska wants to merge 1 commit into master from fix/stdin-backup-truncation-1109


@arska (Member) commented Mar 22, 2026

Summary

Fix a race condition causing application-aware backups (stdin backups from pod exec) to be truncated. Multiple users reported database dumps losing the final few MB of data.

Root cause

In restic/kubernetes/pod_exec.go, the pipe writer was closed after signaling done:

// BEFORE (buggy ordering)
defer stdoutWriter.Close()  // deferred - runs after done signal
done <- true                // signals backup to call cmd.Wait()

This created a race: triggerBackup() receives the done signal and can call cmd.Wait() on restic before the pipe writer is closed, so restic's stdin may not yet have received all data and EOF.
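A minimal Go sketch of the ordering difference, using only the standard library. `streamBuggy` and `streamFixed` are hypothetical stand-ins for the exec stream goroutine, and appending to a string log replaces the real `stdoutWriter.Close()` call. It only shows that a deferred call runs after the `done` send has completed; it does not by itself reproduce a truncation.

```go
package main

import "fmt"

// streamBuggy mirrors the BEFORE ordering. The deferred "close" only runs
// when the function returns, i.e. after the done send has completed.
func streamBuggy(log *[]string, done chan<- bool) {
	defer func() { *log = append(*log, "writer closed") }() // like defer stdoutWriter.Close()
	*log = append(*log, "done signaled")
	done <- true
}

// streamFixed mirrors the AFTER ordering: close first, then signal.
func streamFixed(log *[]string, done chan<- bool) {
	*log = append(*log, "writer closed") // like stdoutWriter.Close()
	*log = append(*log, "done signaled")
	done <- true
}

func main() {
	var before, after []string
	done := make(chan bool, 2) // buffered so the sends do not need a receiver
	streamBuggy(&before, done)
	streamFixed(&after, done)
	fmt.Println(before) // [done signaled writer closed]
	fmt.Println(after)  // [writer closed done signaled]
}
```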

Fix

Close the pipe writer before signaling done:

// AFTER (fixed ordering)
stdoutWriter.Close()  // ensures all data + EOF reaches restic
done <- true          // now safe to wait for restic to finish

On stream errors, CloseWithError() propagates the error through the pipe, then os.Exit(1) ensures the backup pod fails hard (preserving existing behavior for error cases).

Who is affected

Anyone using application-aware backups (the k8up.io/backupcommand annotation) that produce large outputs (100MB+). Multiple users have confirmed this with PostgreSQL and MariaDB dumps.

Fixes #1109

Checklist

  • PR contains the label area:operator
  • Commits are signed off
  • I have not made any changes in the charts/ directory

Test plan

  • Build passes
  • Existing unit tests pass
  • e2e test #11 (annotated failure) passes, which validates that error handling still works
  • Test with a large (100MB+) database dump via backupcommand annotation

🤖 Generated with Claude Code

@arska requested a review from a team as a code owner March 22, 2026 15:59
@arska added the bug and area:operator labels Mar 22, 2026
@arska requested review from TheBigLee and tobru and removed request for a team March 22, 2026 15:59
@arska self-assigned this Mar 22, 2026
@arska force-pushed the fix/stdin-backup-truncation-1109 branch from 53a5d14 to a963f9a March 22, 2026 21:35
@tobru requested a review from Kidswiss March 23, 2026 10:14
@tobru (Contributor) commented Mar 23, 2026

@Kidswiss I'd better leave the review of this one to you.

@arska force-pushed the fix/stdin-backup-truncation-1109 branch 2 times, most recently from b68672a to f3dd528 March 23, 2026 13:41
Application-aware backups (stdin backups from pod exec) were being
truncated because the pipe writer was closed after signaling done
to the backup trigger. This created a race condition:

1. StreamWithContext finishes writing data to the pipe
2. done <- true signals the backup trigger to call cmd.Wait()
3. defer stdoutWriter.Close() hasn't executed yet
4. restic finishes before all data flows through the pipe

Fix: close the pipe writer before signaling done, ensuring restic's
stdin receives all data and EOF before the backup command exits.

Also improved error handling: use CloseWithError on stream failure
instead of os.Exit(1), propagating the error through the pipe to
the reader.

Fixes #1109

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Aarno Aukia <aarno.aukia@vshn.ch>
@arska force-pushed the fix/stdin-backup-truncation-1109 branch from f3dd528 to 4d25673 March 23, 2026 13:57
@arska removed request for TheBigLee and tobru March 23, 2026 14:01
@bastjan (Member) commented Mar 24, 2026

This does not fix the issue, because waiting on done was unnecessary in the first place: the code waits for the restic command to complete, which should only happen once the pipe has been closed and read to completion.

Superseded by #1183, which hopefully fixes the issue.

@bastjan closed this Mar 24, 2026
@bastjan deleted the fix/stdin-backup-truncation-1109 branch March 25, 2026 10:06

Labels

area:operator, bug


Development

Successfully merging this pull request may close these issues.

Application Aware Backups truncate database dumps 100M+

3 participants