@thampiotr thampiotr commented Dec 17, 2025

We have discovered some possible issues with how loki.source.file handles file rotation.
This PR adds a stress test that reproduces them.
The required success rate is currently very permissive: as low as 95% of log lines for some rotation types. As we introduce more fixes, we will want to raise it, ideally to 100%.

go test -v -run "TestFileRotationStress" ./internal/component/loki/source/file/ 2>&1 | grep "Success rate" | column -t
rotation_stress_test.go:645:  [TestFileRotationStress_QuickSmoke/rename]        Success  rate:  99.66%   (582/584        lines,  2     dropped)  |  Required:  95.0%
rotation_stress_test.go:645:  [TestFileRotationStress_QuickSmoke/delete]        Success  rate:  99.31%   (573/577        lines,  4     dropped)  |  Required:  95.0%
rotation_stress_test.go:645:  [TestFileRotationStress_QuickSmoke/copytruncate]  Success  rate:  100.00%  (580/580        lines,  0     dropped)  |  Required:  97.0%
rotation_stress_test.go:689:  [TestFileRotationStress_HighVolume/delete]        Success  rate:  96.07%   (117448/122247  lines,  4799  dropped)  |  Required:  95.0%
rotation_stress_test.go:689:  [TestFileRotationStress_HighVolume/copytruncate]  Success  rate:  99.38%   (122117/122885  lines,  768   dropped)  |  Required:  97.0%
rotation_stress_test.go:689:  [TestFileRotationStress_HighVolume/rename]        Success  rate:  97.88%   (122673/125327  lines,  2654  dropped)  |  Required:  95.0%
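
For readers checking the figures above, the following is a minimal sketch of how a success rate and its threshold check could be computed. The helper names successRate and meetsThreshold are assumptions made for illustration and are not taken from rotation_stress_test.go.

```go
package main

import "fmt"

// successRate returns the percentage of written lines that were received.
// It returns 100 when nothing was written, so an idle run does not fail.
// (Hypothetical helper, not the test's actual implementation.)
func successRate(received, written int) float64 {
	if written == 0 {
		return 100
	}
	return float64(received) / float64(written) * 100
}

// meetsThreshold reports whether the observed rate reaches the required one.
func meetsThreshold(received, written int, requiredPct float64) bool {
	return successRate(received, written) >= requiredPct
}

func main() {
	// Figures taken from the QuickSmoke/rename line above.
	received, written := 582, 584
	fmt.Printf("Success rate: %.2f%% (%d/%d lines, %d dropped)\n",
		successRate(received, written), received, written, written-received)
	fmt.Println("meets 95.0%:", meetsThreshold(received, written, 95.0))
}
```

With the HighVolume/delete figures (117448/122247), the same check passes at 95.0% but would fail at 97.0%, which is why the thresholds matter when comparing runs.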

@thampiotr thampiotr force-pushed the thampiotr/add-stress-test-for-loki-file-rotation branch from 4b49538 to d681829 Compare December 17, 2025 15:01

Copilot AI left a comment

Pull request overview

This PR introduces comprehensive stress tests for file rotation in the loki.source.file component to identify and reproduce issues during log file rotation. The tests simulate three common rotation strategies (rename, copytruncate, and delete) under varying load conditions and verify that logs are captured reliably.

Key changes:

  • Adds a new test file with infrastructure to simulate realistic file rotation scenarios
  • Implements stress tests at two intensity levels: QuickSmoke (fast, runs in all test modes) and HighVolume (60s duration, skipped in short mode)
  • Currently sets permissive success rate thresholds (95-97%) to allow for known issues, with plans to increase to 100% as fixes are implemented

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 8 comments.

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.

}
return fmt.Errorf("stat failed: %w", err)
}
_ = info

Copilot AI Dec 18, 2025

This line discards the info variable with a blank assignment, so the value is never used. If the file info is not needed, consider dropping the info variable entirely and using the blank identifier directly in the os.Stat call.

Copilot uses AI. Check for mistakes.
Comment on lines 251 to 252
// Small delay to simulate real-world scenario
time.Sleep(10 * time.Millisecond)

Copilot AI Dec 18, 2025

The delay of 10 milliseconds is hardcoded but the comment describes it as "simulating real-world scenario." Consider making this configurable through the testConfig structure so different test scenarios can vary the delay to test different timing conditions. This would improve test flexibility and make the purpose of the delay more explicit.

Comment on lines +628 to +634
gracePeriod := cfg.rotationInterval * 2
if gracePeriod < 2*time.Second {
	gracePeriod = 2 * time.Second
}
if gracePeriod > 10*time.Second {
	gracePeriod = 10 * time.Second
}

Copilot AI Dec 18, 2025

The gracePeriod calculation uses specific bounds (minimum 2 seconds, maximum 10 seconds) that appear to be arbitrary. These magic numbers should be extracted as named constants with explanatory comments about why these specific values were chosen. This would improve code maintainability and make it easier to adjust these values in the future if needed.

componentCancel()

// Wait a bit more for final log delivery
time.Sleep(500 * time.Millisecond)

Copilot AI Dec 18, 2025

The hardcoded sleep duration of 500 milliseconds is a magic number. Consider extracting this as a named constant with a descriptive name like finalLogDeliveryWait to make its purpose clearer and easier to adjust if needed.

}

// Give writers a moment to create initial files
time.Sleep(100 * time.Millisecond)

Copilot AI Dec 18, 2025

The hardcoded sleep duration of 100 milliseconds is a magic number. Consider extracting this as a named constant with a descriptive name like initialFileCreationWait to make its purpose clearer and easier to adjust if needed.

Suggested change
- time.Sleep(100 * time.Millisecond)
+ const initialFileCreationWait = 100 * time.Millisecond
+ time.Sleep(initialFileCreationWait)
