Help to recover from corrupted levelqueue #24912
I don't think I get why this fixes the issue. Is it only the queue re-creation in `RemoveAll`?
internal *levelqueue.UniqueQueue
conn string
cfg *BaseConfig
internal atomic.Pointer[levelqueue.UniqueQueue]
Why must this be atomic now?
Because `RemoveAll` needs to re-create the levelqueue. The levelqueue is used by different goroutines.
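To illustrate the point about the atomic pointer, here is a minimal, self-contained sketch. It does not use the real levelqueue package: `toyQueue`, `wrapper`, and `newWrapper` are hypothetical stand-ins, and only the shape of the `internal atomic.Pointer[...]` field is taken from the diff above. It assumes Go 1.19 or later, where `atomic.Pointer` is available.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// toyQueue is a hypothetical stand-in for *levelqueue.UniqueQueue.
type toyQueue struct {
	mu    sync.Mutex
	items []string
}

func (q *toyQueue) push(s string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.items = append(q.items, s)
}

func (q *toyQueue) len() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return len(q.items)
}

// wrapper mirrors the field from the diff: the inner queue sits behind an
// atomic.Pointer so it can be replaced while other goroutines are using it.
type wrapper struct {
	internal atomic.Pointer[toyQueue]
}

func newWrapper() *wrapper {
	w := &wrapper{}
	w.internal.Store(&toyQueue{})
	return w
}

func (w *wrapper) Push(s string) { w.internal.Load().push(s) }
func (w *wrapper) Len() int      { return w.internal.Load().len() }

// RemoveAll swaps in a fresh queue instead of draining the old one.
// With a plain pointer field this write would race with the Load calls above.
func (w *wrapper) RemoveAll() { w.internal.Store(&toyQueue{}) }

func main() {
	w := newWrapper()
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for j := 0; j < 100; j++ {
				w.Push(fmt.Sprintf("worker%d-%d", id, j))
			}
		}(i)
	}
	w.RemoveAll() // may run concurrently with the pushes
	wg.Wait()
	w.RemoveAll() // after all writers are done, the queue is empty again
	fmt.Println("items after RemoveAll:", w.Len())
}
```

Running a variant with a plain `*toyQueue` field under `go run -race` would be expected to flag the unsynchronized pointer write, which is the situation the atomic pointer avoids.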
assert.Len(t, keys, 5)

// delete the queue item key, to corrupt the queue
assert.NoError(t, db.Delete(itemKey, nil))
Why does deleting an item corrupt the queue?
See this PR's description:
if the keys in LevelDB go out of sync, the LevelQueue itself has no ability to recover, e.g.:
- `LevelQueue.Len()` reports 100
- `LevelQueue.LPop()` reports `ErrNotFound = errors.New("no key found")`
It's levelqueue's bug, not Gitea's: `levelqueue.LPop` was written this way.
Yes, delete all corrupted data in LevelDB, and re-create the levelqueue.
gitea.com experienced the corrupted LevelQueue bug again.
I think the problem is clear now: if the keys in LevelDB go out of sync, the LevelQueue itself has no ability to recover, e.g.:
- `LevelQueue.Len()` reports 100
- `LevelQueue.LPop()` reports `ErrNotFound = errors.New("no key found")`
So the fix needs to dive into LevelDB and remove all keys to recover the corrupted LevelQueue.
More comments are in TestCorruptedLevelQueue.
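The failure mode and the recovery described above can be sketched with a toy model. This is not the levelqueue implementation or the code from this PR: `store`, `toyQueue`, `openQueue`, and `recoverQueue` are hypothetical names, an in-memory map stands in for LevelDB, and the queue bounds are assumed to be tracked separately from the item keys, which is what lets them drift apart.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var errNotFound = errors.New("no key found")

// store is an in-memory stand-in for LevelDB (hypothetical).
type store map[string]string

// toyQueue tracks its bounds separately from the stored items,
// so a missing item key leaves Len and LPop disagreeing.
type toyQueue struct {
	db        store
	prefix    string
	low, high int // IDs of the first and last item; empty when low > high
}

func openQueue(db store, prefix string) *toyQueue {
	return &toyQueue{db: db, prefix: prefix, low: 1, high: 0}
}

func (q *toyQueue) itemKey(id int) string {
	return fmt.Sprintf("%sitem-%d", q.prefix, id)
}

func (q *toyQueue) RPush(v string) {
	q.high++
	q.db[q.itemKey(q.high)] = v
}

// Len is computed from the bounds only; it never looks at the item keys.
func (q *toyQueue) Len() int { return q.high - q.low + 1 }

// LPop fails without advancing low when the head item key is missing,
// so every later call fails the same way.
func (q *toyQueue) LPop() (string, error) {
	key := q.itemKey(q.low)
	v, ok := q.db[key]
	if !ok {
		return "", errNotFound
	}
	delete(q.db, key)
	q.low++
	return v, nil
}

// recoverQueue removes every key under the prefix and opens a fresh queue.
func recoverQueue(db store, prefix string) *toyQueue {
	for k := range db {
		if strings.HasPrefix(k, prefix) {
			delete(db, k)
		}
	}
	return openQueue(db, prefix)
}

func main() {
	db := store{}
	q := openQueue(db, "queue-")
	for i := 0; i < 5; i++ {
		q.RPush(fmt.Sprintf("task-%d", i))
	}

	delete(db, q.itemKey(q.low)) // corrupt the queue by deleting the head item key
	_, err := q.LPop()
	fmt.Println("len:", q.Len(), "pop error:", err)

	q = recoverQueue(db, "queue-") // drop all keys, start over
	q.RPush("task-new")
	v, err := q.LPop()
	fmt.Println("after recovery:", v, err)
}
```

In this model the corrupted queue keeps reporting a non-zero length while every pop fails, and nothing inside the queue can repair that. Dropping all keys under the prefix loses the queued items, but it returns the queue to a usable state.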