Skip to content

streamingccl: data race from Subscription.Subscribe #83694

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
stevendanna opened this issue Jul 1, 2022 · 1 comment · Fixed by #84394
Closed

streamingccl: data race from Subscription.Subscribe #83694

stevendanna opened this issue Jul 1, 2022 · 1 comment · Fixed by #84394
Assignees
Labels
A-tenant-streaming Including cluster streaming C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Milestone

Comments

@stevendanna
Copy link
Collaborator

stevendanna commented Jul 1, 2022

Describe the problem

When attempting to remove skip.UnderRace from TestPartitionedStreamReplicationClient, I hit:

==================
WARNING: DATA RACE
Write at 0x00c00146e328 by goroutine 714:
  runtime.slicecopy()
      GOROOT/src/runtime/slice.go:284 +0x0
  bufio.(*Reader).Read()
      GOROOT/src/bufio/bufio.go:238 +0x6c4
  io.ReadAtLeast()
      GOROOT/src/io/io.go:328 +0xdd
  io.ReadFull()
      GOROOT/src/io/io.go:347 +0x324
  github.com/lib/pq.(*conn).recvMessage()
      github.com/lib/pq/external/com_github_lib_pq/conn.go:1018 +0x2bd
  github.com/lib/pq.(*conn).recv1Buf()
      github.com/lib/pq/external/com_github_lib_pq/conn.go:1059 +0x3a
  github.com/lib/pq.(*rows).Next()
      github.com/lib/pq/external/com_github_lib_pq/conn.go:1573 +0x20e
  github.com/lib/pq.(*rows).Close()
      github.com/lib/pq/external/com_github_lib_pq/conn.go:1530 +0xc4
  database/sql.(*Rows).close.func1()
      GOROOT/src/database/sql/sql.go:3280 +0x57
  database/sql.withLock()
      GOROOT/src/database/sql/sql.go:3396 +0xb5
  database/sql.(*Rows).close()
      GOROOT/src/database/sql/sql.go:3279 +0x24d
  database/sql.(*Rows).awaitDone()
      GOROOT/src/database/sql/sql.go:2933 +0x10b
  database/sql.(*Rows).initContextClose·dwrap·27()
      GOROOT/src/database/sql/sql.go:2917 +0x71

Previous read at 0x00c00146e328 by goroutine 478:
  runtime.slicecopy()
      GOROOT/src/runtime/slice.go:284 +0x0
  database/sql.cloneBytes()
      GOROOT/src/database/sql/convert.go:494 +0x38c
  database/sql.convertAssignRows()
      GOROOT/src/database/sql/convert.go:261 +0x357
  database/sql.(*Rows).Scan()
      GOROOT/src/database/sql/sql.go:3246 +0x526
  github.com/cockroachdb/cockroach/pkg/ccl/streamingccl/streamclient.(*partitionedStreamSubscription).Subscribe.func1()
      github.com/cockroachdb/cockroach/pkg/ccl/streamingccl/streamclient/partitioned_stream_client.go:304 +0x104
  github.com/cockroachdb/cockroach/pkg/ccl/streamingccl/streamclient.(*partitionedStreamSubscription).Subscribe()
      github.com/cockroachdb/cockroach/pkg/ccl/streamingccl/streamclient/partitioned_stream_client.go:316 +0x4a1
  github.com/cockroachdb/cockroach/pkg/ccl/streamingccl/streamclient.Subscription.Subscribe-fm()
      github.com/cockroachdb/cockroach/pkg/ccl/streamingccl/streamclient/client.go:113 +0x6b
  github.com/cockroachdb/cockroach/pkg/util/ctxgroup.Group.GoCtx.func1()
      github.com/cockroachdb/cockroach/pkg/util/ctxgroup/ctxgroup.go:169 +0x4c
  golang.org/x/sync/errgroup.(*Group).Go.func1()
      golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:74 +0x91

This looks to me like the race is actually in the lib/pq driver in which it is accessing the rows buffer in its asynchronous context cancellation handling at the same time as accessing it from Scan.

See also:

Jira issue: CRDB-17212

@stevendanna stevendanna added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-tenant-streaming Including cluster streaming labels Jul 1, 2022
@blathers-crl
Copy link

blathers-crl bot commented Jul 1, 2022

cc @cockroachdb/tenant-streaming

@shermanCRL shermanCRL added this to the 22.2 milestone Jul 7, 2022
@gh-casper gh-casper self-assigned this Jul 13, 2022
gh-casper added a commit to gh-casper/cockroach that referenced this issue Jul 13, 2022
We have been switching to pgx from lib/pq across teams in CRL.
This PR makes PartitionedStreamClient to use pgx and also in
a thread-safe way.

To avoid overhead of creating connection, APIs except Subscribe
share the same connection guarded by a mutex, and Subcribe owns
a private connection which makes sense as two Subscribe calls
don't want ot share a connection.

Fixes data race we have been seen in the past: cockroachdb#83694

Release note: None
gh-casper added a commit to gh-casper/cockroach that referenced this issue Jul 25, 2022
We have been switching to pgx from lib/pq across teams in CRL.
This PR makes PartitionedStreamClient to use pgx and also in
a thread-safe way.

To avoid overhead of creating connection, APIs except Subscribe
share the same connection guarded by a mutex, and Subcribe owns
a private connection which makes sense as two Subscribe calls
don't want ot share a connection.

Fixes data race we have been seen in the past: cockroachdb#83694

Release note: None
@craig craig bot closed this as completed in 543d5a1 Jul 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tenant-streaming Including cluster streaming C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants