
fix: JetStream consumer lock leak on start sequence error #8230

Open
sanbricio wants to merge 1 commit into nats-io:main from sanbricio:fix/jetstream-consumer-lock-leak

Conversation

@sanbricio

Summary

This PR fixes a lock leak in JetStream consumer creation.

In addConsumerWithAssignment, mset.mu is held while creating a consumer. In the direct/standalone path, if o.selectStartingSeqNo() returns an error, the function currently returns without releasing mset.mu.

This adds the missing mset.mu.Unlock() before returning the error.

Resolves #8229

Impact

Without this unlock, the stream mutex can remain locked after a starting sequence error, which may cause later operations on the same stream to block.
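
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the nats-server code: the `stream` type, `addConsumerLeaky`, `addConsumerFixed`, and `lockIsFree` are hypothetical names introduced for illustration. It uses `sync.Mutex.TryLock` (available since Go 1.18) only to observe whether the mutex was left held after an error return.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// stream is a hypothetical stand-in for the real stream type; only the mutex matters here.
type stream struct {
	mu sync.Mutex
}

// errStartSeq is a hypothetical error standing in for a starting sequence failure.
var errStartSeq = errors.New("start sequence error")

// addConsumerLeaky mirrors the bug: the error path returns while mu is still held.
func (s *stream) addConsumerLeaky(fail bool) error {
	s.mu.Lock()
	if fail {
		return errStartSeq // BUG: missing s.mu.Unlock()
	}
	s.mu.Unlock()
	return nil
}

// addConsumerFixed releases mu before returning the error.
func (s *stream) addConsumerFixed(fail bool) error {
	s.mu.Lock()
	if fail {
		s.mu.Unlock()
		return errStartSeq
	}
	s.mu.Unlock()
	return nil
}

// lockIsFree reports whether mu can be acquired right now, releasing it again if so.
func lockIsFree(s *stream) bool {
	if s.mu.TryLock() {
		s.mu.Unlock()
		return true
	}
	return false
}

func main() {
	leaky, fixed := &stream{}, &stream{}
	_ = leaky.addConsumerLeaky(true)
	_ = fixed.addConsumerFixed(true)
	fmt.Println("leaky path leaves lock free:", lockIsFree(leaky))
	fmt.Println("fixed path leaves lock free:", lockIsFree(fixed))
}
```

In the leaky variant, any later caller that uses a plain `Lock()` on the same mutex would block; `TryLock` is used here only so the sketch terminates instead of hanging.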

Changes

  • Add the missing mset.mu.Unlock() before returning from the selectStartingSeqNo() error path.

Detection

This issue was reported by goconcurrencylint.

Testing

  • Not added yet. This is a minimal error-path fix identified by static analysis.

Signed-off-by: Santiago Bricio <sanbriciorojas11@gmail.com>

@sanbricio sanbricio requested a review from a team as a code owner May 24, 2026 13:53
Member

@MauriceVanVeen MauriceVanVeen left a comment

Thanks!
Could you add sign-off to your commit? See also: CONTRIBUTING.md#sign-off

…hAssignment

Signed-off-by: Santiago Bricio Rojas <sanbriciorojas11@gmail.com>
@sanbricio sanbricio force-pushed the fix/jetstream-consumer-lock-leak branch from 69245ae to a1d54d3 Compare May 24, 2026 20:59
@sanbricio
Author

@MauriceVanVeen Hi, thanks for the fast review. Sign-off added!

@sanbricio sanbricio requested a review from MauriceVanVeen May 24, 2026 21:03
@MauriceVanVeen
Member

@claude, review.

@synadia-claude-reviewer

synadia-claude-reviewer Bot commented May 25, 2026

Claude finished @MauriceVanVeen's task in 2m 37s


Code Review

  • Gather context and read changed code
  • Analyze the fix for correctness
  • Check for similar patterns or related issues
  • Post review findings

Fix is correct — the mutex leak is real and the placement of the new mset.mu.Unlock() is right.

mset.mu is locked at consumer.go:1073 and held throughout consumer initialization. Every other early-exit path after that lock (lines 1075, 1092, 1100, 1104, 1113, 1131, 1141, 1145, 1169, 1213, 1251, 1259, 1331, 1346) releases the mutex before returning. The selectStartingSeqNo() error path at line 1338 was the lone missing Unlock().


One additional concern — potential resource leak

All the other error returns that occur after the consumer object o is constructed (after ~line 1178) also call o.deleteWithoutAdvisory() to clean up:

// e.g. line 1213-1216
mset.mu.Unlock()
o.deleteWithoutAdvisory()
return nil, NewJSConsumerNameTooLongError(JSMaxNameLen)

// e.g. line 1251-1254
mset.mu.Unlock()
o.deleteWithoutAdvisory()
return nil, NewJSConsumerBadDurableNameError()

// e.g. line 1259-1263
mset.mu.Unlock()
o.deleteWithoutAdvisory()
return nil, NewJSConsumerStoreFailedError(err)

The selectStartingSeqNo() error path is past all of those, so at the point of the error:

  • o.client (line 1183) and o.sysc (line 1184) — internal JetStream clients — have been created and registered with accounts
  • o.ackMsgs IPQueue (line 1236) has been created
  • o.nextMsgReqs IPQueue (line 1242) may have been created (pull mode)
  • o.store (line 1265) may have been created for non-direct standalone consumers

Without o.deleteWithoutAdvisory(), those resources are leaked. The deleteWithoutAdvisory() path safely handles the case where o hasn't yet been registered in mset.consumers: removeConsumer just does a no-op delete.

Suggested fix:

if err := o.selectStartingSeqNo(); err != nil {
    mset.mu.Unlock()
    o.deleteWithoutAdvisory()  // clean up client, sysc, store, queues
    return nil, err
}

Minor: no error wrapping

Other consumer-creation errors are wrapped with NewJS* helpers (e.g. NewJSConsumerStoreFailedError). The selectStartingSeqNo() failure bubbles up the raw store error. That's consistent with the pre-existing code there, so not a blocker, but worth noting if error-type consistency matters to callers.


Summary: The mutex unlock fix is correct and addresses the reported deadlock. Consider also adding o.deleteWithoutAdvisory() to avoid leaking the consumer's internal resources on this error path.

Development

Successfully merging this pull request may close these issues.

Consumer creation can leak stream lock on starting sequence errors

2 participants