-
Notifications
You must be signed in to change notification settings - Fork 6
fix: Potential fix for #37 Mount locks up on listing bucket #38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
`(inode *Inode) sealDir()` may try to lock the parent via
- `(parent *Inode) removeExpired(from string)`
- `(parent *Inode) removeChildUnlocked(inode *Inode)`
at
```
for _, dh := range parent.dir.handles {
dh.mu.Lock()
dh.lastInternalOffset = -1
dh.mu.Unlock()
}
```
So, unlock the parent up front in order to avoid a dead-lock.
Caveat: I am not deep enough into the code to say this won't create other
problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Summary
This PR addresses a critical deadlock issue in the directory listing functionality where the mount could lock up when listing buckets. The deadlock occurred because listObjectsFlat() held a DirHandle mutex while calling sealDir(), which eventually tried to acquire the same mutex through removeChildUnlocked().
Key changes:
- Unlocks
dh.mubefore callingdh.inode.sealDir() - Re-locks
dh.muimmediately aftersealDir()completes - Prevents the circular locking dependency that caused mount lockups
The fix is minimal, targeted, and directly addresses the root cause identified in issue #37. While the author notes uncertainty about potential side effects, the change follows good locking practices by avoiding holding locks across function calls that may need the same locks.
Confidence Score: 4/5
- This PR is safe to merge with minimal risk - it fixes a real deadlock issue
- Score reflects a solid fix for a critical deadlock issue with minimal code changes. The solution is targeted and follows good concurrency practices, though the brief unlock/relock window could theoretically introduce race conditions that warrant careful testing.
- No files require special attention - the change in core/dir.go is well-targeted and minimal
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| core/dir.go | 4/5 | Fixes a potential deadlock in directory listing by unlocking DirHandle mutex before calling sealDir() |
Sequence Diagram
sequenceDiagram
participant ListThread as listObjectsFlat()
participant DH as DirHandle
participant Inode as Inode
participant Parent as Parent Inode
participant ChildDH as Child DirHandle
Note over ListThread: BEFORE FIX - Deadlock scenario
ListThread->>DH: dh.mu.Lock() (held)
ListThread->>Inode: dh.inode.sealDir()
Inode->>Inode: removeExpired("")
Inode->>Parent: removeChildUnlocked(child)
Parent->>ChildDH: dh.mu.Lock() (DEADLOCK!)
Note over ChildDH: Cannot acquire - already held by ListThread
Note over ListThread: AFTER FIX - Deadlock prevented
ListThread->>DH: dh.mu.Lock() (held)
ListThread->>DH: dh.mu.Unlock() (unlock before sealDir)
ListThread->>Inode: dh.inode.sealDir()
Inode->>Inode: removeExpired("")
Inode->>Parent: removeChildUnlocked(child)
Parent->>ChildDH: dh.mu.Lock() (SUCCESS - no conflict)
Parent->>ChildDH: dh.lastInternalOffset = -1
Parent->>ChildDH: dh.mu.Unlock()
ListThread->>DH: dh.mu.Lock() (re-acquire after sealDir)
1 file reviewed, no comments
| dh.inode.dir.listMarker = lastName | ||
| } | ||
| } else { | ||
| dh.mu.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your diagnosis is correct that it is a classic case of lock order violation. But there is a race window here in your fix. Between unlocking and relocking dh.mu, the directory handle state could potentially be modified by another thread. It would be a good idea to check where the dh state can change while unlocked.
|
Can you try the fix in #39 ? |
I rebased your fix onto As a side note, I tried that problematic unlock/relock workaround after having seen a similar pattern in the existing code which you might want to review: Line 691 in 4e48555
Line 709 in 4e48555
|
Thanks. I will take a look at that too. Closing this PR. |
Fixed it in #43 |
(inode *Inode) sealDir()may try to lock the parent via(parent *Inode) removeExpired(from string)(parent *Inode) removeChildUnlocked(inode *Inode)atSo, unlock the parent up front in order to avoid a dead-lock.
Caveat: I am not deep enough into the code to say this won't create other problems.