Skip to content

runtime: TestRuntimeLockMetricsAndProfile/runtime.lock failures #70602

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
gopherbot opened this issue Nov 27, 2024 · 10 comments
Open

runtime: TestRuntimeLockMetricsAndProfile/runtime.lock failures #70602

gopherbot opened this issue Nov 27, 2024 · 10 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@gopherbot
Copy link
Contributor

gopherbot commented Nov 27, 2024

#!watchflakes
default <- test ~ `TestRuntimeLockMetricsAndProfile/runtime.lock/`

Issue created automatically to collect these failures.

Example (log):

=== RUN   TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1
    metrics_test.go:1065: lock contention growth in runtime/pprof's view  (1.241778s)
    metrics_test.go:1066: lock contention growth in runtime/metrics' view (1.241730s)
    metrics_test.go:1104: stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run] has samples totaling n=199 value=1207128831
    metrics_test.go:1192: mutex profile reported contention count different from the known true count (199 != 200)
--- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (1.25s)

watchflakes

@gopherbot gopherbot added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 27, 2024
@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime:cpu4" && test == "TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1"
2024-11-27 15:22 gotip-linux-loong64 go@7e095086 runtime:cpu4.TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (log)
=== RUN   TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1
    metrics_test.go:1065: lock contention growth in runtime/pprof's view  (1.241778s)
    metrics_test.go:1066: lock contention growth in runtime/metrics' view (1.241730s)
    metrics_test.go:1104: stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run] has samples totaling n=199 value=1207128831
    metrics_test.go:1192: mutex profile reported contention count different from the known true count (199 != 200)
--- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (1.25s)

watchflakes

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Nov 27, 2024
@mknyszek mknyszek added the arch-loong64 Issues solely affecting the loongson architecture. label Dec 4, 2024
@mknyszek mknyszek added this to the Backlog milestone Dec 4, 2024
@mknyszek
Copy link
Contributor

mknyszek commented Dec 4, 2024

CC @golang/loong64

@abner-chenc
Copy link
Contributor

CC @golang/loong64

I'm trying to reproduce this error.

@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime:cpu4" && test == "TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1"
2025-03-06 21:41 gotip-linux-ppc64le_power10 go@232dfd22 runtime:cpu4.TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (log)
=== RUN   TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1
    metrics_test.go:1065: lock contention growth in runtime/pprof's view  (0.082601s)
    metrics_test.go:1066: lock contention growth in runtime/metrics' view (0.082592s)
    metrics_test.go:1104: stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run] has samples totaling n=199 value=80474842
    metrics_test.go:1192: mutex profile reported contention count different from the known true count (199 != 200)
--- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (0.10s)

watchflakes

@mknyszek
Copy link
Contributor

Looks like (perhaps unsurprisingly) this is a more general problem.

@mknyszek mknyszek removed the arch-loong64 Issues solely affecting the loongson architecture. label Mar 10, 2025
@mknyszek mknyszek changed the title runtime:cpu4: TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 failures runtime:cpu4: TestRuntimeLockMetricsAndProfile/runtime.lock failures Mar 12, 2025
@mknyszek mknyszek changed the title runtime:cpu4: TestRuntimeLockMetricsAndProfile/runtime.lock failures runtime: TestRuntimeLockMetricsAndProfile/runtime.lock failures Mar 12, 2025
@mknyszek mknyszek self-assigned this Mar 12, 2025
@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
default <- test ~ `TestRuntimeLockMetricsAndProfile/runtime.lock/`
2025-02-03 16:20 gotip-linux-arm64-asan-clang15 go@caedd0a8 runtime.TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (log)
=== RUN   TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1
    metrics_test.go:1065: lock contention growth in runtime/pprof's view  (0.052108s)
    metrics_test.go:1066: lock contention growth in runtime/metrics' view (0.052104s)
    metrics_test.go:1104: stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run] has samples totaling n=199 value=44066160
    metrics_test.go:1192: mutex profile reported contention count different from the known true count (199 != 200)
--- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (0.06s)
2025-03-06 21:35 gotip-linux-arm64-race go@584e6310 runtime.TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (log)
=== RUN   TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1
    metrics_test.go:1065: lock contention growth in runtime/pprof's view  (0.037789s)
    metrics_test.go:1066: lock contention growth in runtime/metrics' view (0.037789s)
    metrics_test.go:1104: stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run] has samples totaling n=199 value=35586077
    metrics_test.go:1192: mutex profile reported contention count different from the known true count (199 != 200)
--- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (0.05s)

watchflakes

@gopherbot
Copy link
Contributor Author

Change https://go.dev/cl/658095 mentions this issue: runtime: log profile when mutex profile test fails

gopherbot pushed a commit that referenced this issue Mar 15, 2025
For #70602

Change-Id: I3f723ebc17ef690d5be7f4f948c9dd1f890196fd
Reviewed-on: https://go-review.googlesource.com/c/go/+/658095
Auto-Submit: Rhys Hiltner <[email protected]>
Reviewed-by: David Chase <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
default <- test ~ `TestRuntimeLockMetricsAndProfile/runtime.lock/`
2025-04-03 16:21 gotip-linux-loong64 go@f2e90767 runtime.TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (log)
=== RUN   TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1
    metrics_test.go:1065: lock contention growth in runtime/pprof's view  (0.045230s)
    metrics_test.go:1066: lock contention growth in runtime/metrics' view (0.045227s)
    metrics_test.go:1104: stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run] has samples totaling n=199 value=41452316
    metrics_test.go:1199: mutex profile reported contention count different from the known true count (199 != 200)
    metrics_test.go:1117: profile:
        PeriodType: contentions count
        Period: 1
        Time: 2025-04-04 00:43:44.550064546 +0800 CST
        Samples:
...
            25: 0x120095a27 M=1 runtime.morestack /home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/asm_loong64.s:281 s=247
            26: 0x1200562c8 M=1 runtime.unlock /home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/lock_spinbit.go:253 s=252
                     runtime.findRunnable /home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/proc.go:3500 s=3297
        Mappings
        1: 0x120004000/0x120438000/0x4000 /home/swarming/.swarming/w/ir/x/t/go-build1758899072/b2029/runtime.test 589a72cf3c20f6d661701351ebae491b3d01fdd0 [FN]
        2: 0x7ffff2550000/0x7ffff2658000/0x28000 /usr/lib64/libc.so.6 fc4f10bb76753448c6ad4aa4556dbc8009a1e7bd 
        3: 0x7ffff2718000/0x7ffff2720000/0x4000 /usr/lib64/libresolv.so.2 04ac657828d7be4423962e670b60ac88e647618d 
        4: 0x7ffff2740000/0x7ffff2760000/0x4000 /usr/lib64/ld-linux-loongarch-lp64d.so.1 0a012b0a55332338795c9d68dea6b74e71c5b88d 
        5: 0x7fffff99c000/0x7fffff9a0000/0x0 [vdso]  
--- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (0.05s)

watchflakes

@rhysh
Copy link
Contributor

rhysh commented Apr 21, 2025

The 2025-04-03 failure has 199 samples where the test expects 200. It has 1 sample in runtime._LostContendedRuntimeLock, and 4 samples with other call stacks that correspond to sched.lock.

I suspect that the test is generating the full 200 contention events, and that one of them is being reported as _LostContendedRuntimeLock.

The alternative I had in mind was that runtime.mutexContended was indicating to the test code that reportable contention was present and that the test should proceed, when the lock was not really in that state. I see a bug like that in lock_spinbit.go's mutexContended (which I'll fix), but it wouldn't explain the 2024-11-27 failure on linux/loong64, which would have been using lock_futex_tristate.go (prior to CL 658455).

@gopherbot
Copy link
Contributor Author

Change https://go.dev/cl/667095 mentions this issue: runtime: fix test of when a mutex is contended

gopherbot pushed a commit that referenced this issue Apr 22, 2025
This is used only in tests that verify reports of runtime-internal mutex
contention.

For #66999
For #70602

Change-Id: I72cb1302d8ea0524f1182ec892f5c9a1923cddba
Reviewed-on: https://go-review.googlesource.com/c/go/+/667095
Reviewed-by: Michael Knyszek <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Auto-Submit: Rhys Hiltner <[email protected]>
Reviewed-by: Junyang Shao <[email protected]>
@mknyszek mknyszek marked this as a duplicate of #73467 Apr 23, 2025
@gopherbot gopherbot marked this as not a duplicate of #73467 Apr 25, 2025
@prattmic prattmic marked this as a duplicate of #73487 May 7, 2025
@gopherbot gopherbot marked this as not a duplicate of #73487 May 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
Status: No status
Development

No branches or pull requests

4 participants