-
Notifications
You must be signed in to change notification settings - Fork 497
chore(loki.source.file): add file rotation stress test #5137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This reverts commit 859423c.
4b49538 to
d681829
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces comprehensive stress tests for file rotation in the loki.source.file component to identify and reproduce issues during log file rotation. The tests simulate three common rotation strategies (rename, copytruncate, and delete) under varying load conditions and verify that logs are captured reliably.
Key changes:
- Adds a new test file with infrastructure to simulate realistic file rotation scenarios
- Implements stress tests at two intensity levels: QuickSmoke (fast, runs in all test modes) and HighVolume (60s duration, skipped in short mode)
- Currently sets permissive success rate thresholds (95-97%) to allow for known issues, with plans to increase to 100% as fixes are implemented
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 8 comments.
Co-authored-by: Copilot <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.
| } | ||
| return fmt.Errorf("stat failed: %w", err) | ||
| } | ||
| _ = info |
Copilot
AI
Dec 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line assigns the info variable but never uses it. The underscore assignment suggests the value is intentionally discarded. Consider removing the info variable assignment entirely and using the blank identifier directly in the os.Stat call if the file info is not needed.
| // Small delay to simulate real-world scenario | ||
| time.Sleep(10 * time.Millisecond) |
Copilot
AI
Dec 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The delay of 10 milliseconds is hardcoded but the comment describes it as "simulating real-world scenario." Consider making this configurable through the testConfig structure so different test scenarios can vary the delay to test different timing conditions. This would improve test flexibility and make the purpose of the delay more explicit.
| gracePeriod := cfg.rotationInterval * 2 | ||
| if gracePeriod < 2*time.Second { | ||
| gracePeriod = 2 * time.Second | ||
| } | ||
| if gracePeriod > 10*time.Second { | ||
| gracePeriod = 10 * time.Second | ||
| } |
Copilot
AI
Dec 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The gracePeriod calculation uses specific bounds (minimum 2 seconds, maximum 10 seconds) that appear to be arbitrary. These magic numbers should be extracted as named constants with explanatory comments about why these specific values were chosen. This would improve code maintainability and make it easier to adjust these values in the future if needed.
| componentCancel() | ||
|
|
||
| // Wait a bit more for final log delivery | ||
| time.Sleep(500 * time.Millisecond) |
Copilot
AI
Dec 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hardcoded sleep duration of 500 milliseconds is a magic number. Consider extracting this as a named constant with a descriptive name like finalLogDeliveryWait to make its purpose clearer and easier to adjust if needed.
| } | ||
|
|
||
| // Give writers a moment to create initial files | ||
| time.Sleep(100 * time.Millisecond) |
Copilot
AI
Dec 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hardcoded sleep duration of 100 milliseconds is a magic number. Consider extracting this as a named constant with a descriptive name like initialFileCreationWait to make its purpose clearer and easier to adjust if needed.
| time.Sleep(100 * time.Millisecond) | |
| const initialFileCreationWait = 100 * time.Millisecond | |
| time.Sleep(initialFileCreationWait) |
We have discovered some possible issues during the file rotation in loki.source.file.
This adds a stress test that reproduces the issue.
We currently set a very permissive required success rate, even as low as 97% of log lines to succeed. But as we introduce more fixes, we will want this to go up to 100% ideally.