
Help to recover from corrupted levelqueue #24912


Merged (5 commits) on May 29, 2023


wxiaoguang (Contributor) commented May 24, 2023

gitea.com experienced the corrupted LevelQueue bug again.

I think the problem is clear now: if the keys in LevelDB go out of sync, LevelQueue itself has no way to recover, e.g.:

  • LevelQueue.Len() reports 100
  • LevelQueue.LPop() reports ErrNotFound = errors.New("no key found")

So recovering a corrupted LevelQueue requires going directly into LevelDB and removing all of the queue's keys.
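A minimal sketch of the "remove all of the queue's keys" step, assuming a hypothetical in-memory `store` (a `map[string][]byte` standing in for LevelDB) and hypothetical key names that share one prefix per queue. This is not Gitea's or levelqueue's code; it only shows the shape of the recovery: find every key under the queue's prefix and delete it, leaving other data untouched.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// store is a hypothetical stand-in for a LevelDB handle: a flat key/value map.
type store map[string][]byte

// removeAllWithPrefix deletes every key that starts with prefix and returns
// how many keys were removed. With a real LevelDB one would typically iterate
// over the prefix range instead of scanning every key.
func removeAllWithPrefix(db store, prefix string) int {
	// Collect the matching keys first, then delete them.
	var keys []string
	for k := range db {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	for _, k := range keys {
		delete(db, k)
	}
	return len(keys)
}

func main() {
	// Hypothetical key layout: two queues sharing one database.
	db := store{
		"queue-a-head":   []byte("0"),
		"queue-a-tail":   []byte("2"),
		"queue-a-item-1": []byte("job"),
		"queue-b-head":   []byte("0"),
	}
	n := removeAllWithPrefix(db, "queue-a-")

	var left []string
	for k := range db {
		left = append(left, k)
	}
	sort.Strings(left)
	fmt.Println(n, left) // 3 [queue-b-head]
}
```

Scoping the wipe to a prefix is what lets one corrupted queue be reset without touching other queues stored in the same database.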

More comments are in TestCorruptedLevelQueue.

GiteaBot added the lgtm/need 2 label (this PR needs two approvals by maintainers to be considered for merging) May 24, 2023
wxiaoguang added this to the 1.20.0 milestone May 24, 2023
wxiaoguang force-pushed the help-corrupted-levelquque branch 3 times, most recently from 9b054c7 to eb0570f, May 24, 2023 13:31
wxiaoguang force-pushed the help-corrupted-levelquque branch from eb0570f to 9cd00ad, May 24, 2023 13:57
GiteaBot added the lgtm/need 1 label (this PR needs approval from one additional maintainer to be merged) and removed the lgtm/need 2 label May 25, 2023
KN4CK3R (Member) left a comment:

I don't think I get why this fixes the issue. Is it only the queue-recreation in RemoveAll?

```
internal *levelqueue.UniqueQueue
conn string
cfg *BaseConfig
internal atomic.Pointer[levelqueue.UniqueQueue]
```
Member (reviewer) commented:

Why must this be atomic now?

wxiaoguang (Contributor, Author) replied:

Because RemoveAll needs to re-create the levelqueue while other goroutines are still using it, so the pointer to it has to be swapped atomically.
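A rough sketch of why the field became an `atomic.Pointer`, using Go's `sync/atomic.Pointer[T]` (Go 1.19+). The names `innerQueue`, `wrapper`, `current` and `recreate` are hypothetical stand-ins, not Gitea's actual types: readers on other goroutines `Load` the pointer, while a `RemoveAll`-style path installs a fresh queue. With a plain pointer field, that concurrent read and write would be a data race.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// innerQueue is a hypothetical stand-in for *levelqueue.UniqueQueue.
// Instances are never mutated after being published.
type innerQueue struct {
	generation int
}

// wrapper holds the live inner queue behind an atomic pointer so one
// goroutine can replace it while others keep using it.
type wrapper struct {
	internal atomic.Pointer[innerQueue]
}

// current returns the queue instance that is live right now.
func (w *wrapper) current() *innerQueue {
	return w.internal.Load()
}

// recreate installs a brand-new inner queue (e.g. after wiping corrupted
// data) and returns the old one so the caller could close it. The CAS loop
// keeps this correct even if two goroutines call recreate at once.
func (w *wrapper) recreate() *innerQueue {
	for {
		old := w.internal.Load()
		next := &innerQueue{generation: old.generation + 1}
		if w.internal.CompareAndSwap(old, next) {
			return old
		}
	}
}

func main() {
	var w wrapper
	w.internal.Store(&innerQueue{generation: 1})

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Readers see either the old or the new queue, never a torn value.
			_ = w.current().generation
		}()
	}
	w.recreate()
	wg.Wait()
	fmt.Println(w.current().generation) // 2
}
```

Swapping the whole pointer, rather than resetting the old queue in place, means goroutines that already hold the old instance finish with it undisturbed while new callers pick up the fresh one.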

```go
assert.Len(t, keys, 5)

// delete the queue item key, to corrupt the queue
assert.NoError(t, db.Delete(itemKey, nil))
```
Member (reviewer) commented:

Why does deleting an item corrupt the queue?

wxiaoguang (Contributor, Author) replied:

See this PR's description:

if the keys in LevelDB went out-of-sync, the LevelQueue itself doesn't have the ability to recover, eg:

  • LevelQueue.Len() reports 100
  • LevelQueue.LPop() reports ErrNotFound = errors.New("no key found")

It's a bug in levelqueue, not in Gitea: levelqueue.LPop is simply written that way.

wxiaoguang (Contributor, Author) commented:

I don't think I get why this fixes the issue. Is it only the queue-recreation in RemoveAll?

Yes: delete all the corrupted data in LevelDB, then re-create the levelqueue.

GiteaBot added the lgtm/done label (this PR has enough approvals to get merged; there are no important open reservations anymore) and removed the lgtm/need 1 label May 28, 2023
delvh added the type/bug label May 28, 2023
lunny added the reviewed/wait-merge label (this pull request is part of the merge queue and will be merged soon) May 29, 2023
lunny merged commit 84c8ab9 into go-gitea:main May 29, 2023
GiteaBot removed the reviewed/wait-merge label May 29, 2023
wxiaoguang deleted the help-corrupted-levelquque branch May 29, 2023 02:58
zjjhot added a commit to zjjhot/gitea that referenced this pull request May 29, 2023
* upstream/main:
  Test query must have "order by" explicitly to avoid unstable results (go-gitea#24963)
  Help to recover from corrupted levelqueue (go-gitea#24912)
  [skip ci] Updated translations via Crowdin
  Remove meta tags `theme-color` and `default-theme` (go-gitea#24960)
  Add dark mode to API Docs (go-gitea#24971)
  Update JS dependencies (go-gitea#24969)
  Replace Fomantic reset module with our own (go-gitea#24948)
  simple docs fixes: 'pull request' page (en-us & zh-tw) link path to 'issue-pull-request-templates' (go-gitea#24961)
  Remove reference to caddy v1 in docs (go-gitea#24962)
  Improve and fix bugs surrounding reactions (go-gitea#24760)
  Use `[git.config]` for reflog cleaning up (go-gitea#24958)
  Improve logger Pause handling (go-gitea#24946)
  Do not output "Trace" level logs from process manager by default (go-gitea#24952)
  Make the 500 page load themes (go-gitea#24953)
  [skip ci] Updated translations via Crowdin
  docs: remove an extraneous whitespace (go-gitea#24949)
  Show `bot` label next to username when rendering autor link if the user is a bot (go-gitea#24943)
  Improve some Forms (go-gitea#24878)
  Improve queue and logger context (go-gitea#24924)
  Fix ref type error (go-gitea#24941)
go-gitea locked as resolved and limited conversation to collaborators Aug 27, 2023
Labels: lgtm/done, type/bug

6 participants