sync: Map: internal/sync.HashTrieMap: ran out of hash bits while inserting #73427
Comments
This may help: this code reproduces the bug about once an hour with nearly 100% probability:
There is no other usage of updatePoolLineUsage_keys anywhere in the code. |
Thanks for the report. Can you please provide a more complete reproducer? How is |
just
And nothing more; there are no other calls. It just collects keys and dumps the used keys every minute. This also reproduces on Ubuntu; I build the software with:
So the only workaround for this bug is to avoid using sync.Map. How can I revert to the previous sync.Map implementation? |
I can't reproduce this any faster. It started happening after Go was updated from 1.24 (.1?) to 1.24.2. |
I just caught this again in another part of this project:
It happens constantly here
Keys are random and only 16 bytes long. |
we need a complete runnable example |
That requires recreating the whole "key" database in the right order, with the failing "key" last... I'll try. |
You can try building your Go program with

Again, please be aware that sync.Map is fairly widely used, and Go 1.24 has been deployed fairly widely at this point. This is the first bug report we've received of this kind.

I understand the issue is frustrating, and it may well be a real bug in the map implementation, especially if you're using the map in a way that the existing test suite doesn't check for. Unfortunately without a complete runnable example, as @seankhliao mentioned, we have little way of knowing whether that's true. Based on your last message it sounds like you're working on that; thank you! That'll be really helpful to get to the bottom of this.

(Just looking at your examples briefly, I don't see usage that's out of the ordinary, or that we don't test. All the tests also run on windows/amd64, the same platform you're running on. That's why more context would be very helpful. An example that runs (doesn't need to reproduce instantly) on https://play.go.dev for example, where we can see all the code and the way it's being invoked. Or, detailed steps explaining how to set up the reproducing program and execute it.)
Hm. I don't see any relevant changes on the release branch. The difference between 1.24.0 and 1.24.2 is fairly small. (https://go.googlesource.com/go/+log/refs/heads/release-branch.go1.24) |
I have added this:
So I got:
Yes, there are no keys at all in the JSON panic log file! This version ran for 2 hours without an error:
This version reproduces it about every 5 minutes, but not consistently; it is random:
I'll investigate further. If you can, please give me a sync.Map package with detailed logging enabled, so I can use it instead of the standard sync.Map package and we can get more debug data when the panic is intercepted inside sync.Map. |
I think your marshalling code is incorrect, causing nothing to actually get marshalled. IIRC JSON can only marshal fields that are exported.
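To illustrate the point about exported fields, here is a minimal sketch. The `panicRecord` type and its fields are hypothetical and are not taken from the reporter's code; the only claim is that `encoding/json` skips unexported struct fields without reporting an error.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// panicRecord is a hypothetical log record used only for illustration.
type panicRecord struct {
	Key string // exported: included in the JSON output
	ip  string // unexported: silently skipped by encoding/json
}

// marshalRecord returns the JSON encoding of r as a string.
func marshalRecord(r panicRecord) string {
	b, err := json.Marshal(r)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// Only the exported field appears in the output.
	fmt.Println(marshalRecord(panicRecord{Key: "abc", ip: "10.0.0.1"})) // {"Key":"abc"}
}
```

If every field of the logged struct is unexported, the output is just `{}`, which would match a panic log that contains no keys.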
If you edit the source code of the standard library where your Go tool is located ( |
During tests I got a new, interesting panic
It's here:
To reproduce:
But val can't be nil by definition:
Here val is never nil! All the lib code is provided above. So it seems like the GC removed a sync.Map data element as unused? Or LoadOrStore somehow returned a cleaned-up object? There is no use of the unsafe package anywhere in the code. |
It's not the first time. It happened before, but then disappeared. I'm trying to catch the bug in a small test case, not in prod. |
Here's my attempt at a reproducer based on the information you provided.

package main

import (
	"math/rand/v2"
	"sync"
	"time"
)

func main() {
	var s sync.Map

	// Deleter: every 10ms, collect all current keys and delete them.
	go func() {
		for {
			var keys []string
			s.Range(func(k, _ any) bool {
				keys = append(keys, k.(string))
				return true
			})
			for _, key := range keys {
				s.Delete(key)
			}
			time.Sleep(10 * time.Millisecond)
		}
	}()

	// Four writers, each inserting a fresh random 16-byte key every 100µs.
	for range 4 {
		go func() {
			for {
				s.LoadOrStore(randomString(), new(int32))
				time.Sleep(100 * time.Microsecond)
			}
		}()
	}

	// Block forever; the goroutines above do all the work.
	select {}
}

func randomString() string {
	var b [16]byte
	for i := range b {
		b[i] = rand.N[byte](0x7e-0x30) + 0x30 // random valid ASCII
	}
	return string(b[:])
}

Does this seem accurate to you? If not, why? FTR, I cannot produce any crash with this code. (Including if I increase the rate at which entries are inserted or deleted, but I wanted to match the ~100 active keys (this is 400) in the map.)

Small note:
is unnecessarily expensive. You can use the |
I think you're misunderstanding what happened in #69534. The reporter discovered the same bug that was fixed by https://go.googlesource.com/go/+/21ac23a96f204dfb558a8d3071380c1d105a93ba, but made a mistake when they thought they tested with the fix. The issue was indeed fixed -- it did not just go away. Also, the bug in that issue required using |
Not every key, just all the current ones; new keys arrive that have not yet been dumped to the log, so I can't use "Clear". But thanks. Try to reproduce with this:
and the whole code of the lib package I provided in
This will be "clean" reproducing code. It gets about 10-20 calls per second, and the bug appears 5-10 minutes after the app starts, so it takes roughly 15 * 60 * 10 = 9,000 calls to reproduce. I'm now trying to capture the "key" data to make a small 100% reproducer. Thanks for the help. |
I've seen a huge number of these panics since I went back to using sync.Map.
The call hierarchy in this panic output is written by the github.com/MasterDimmy/zipologger module, so I can see who started the goroutine and the whole call tree instead of just the function name from the standard panic output. (There is no bug connected with this lib; it has worked fine for 5+ years.) There is no data for the "key" value inside sync.Map after the panic has been caught. I can't reproduce this separately from the prod app for now. |
Hi @MasterDimmy, you should definitely take one of the scenarios where this seems reproducible and build/run with the race detector enabled ( |
-race does not build when cross-compiling (on Windows 10 x64 for Linux x64); it builds fine without -race. |
Hi @MasterDimmy, three questions for you:

Question 1: To keep things simple, can you build on Linux with

Question 2: If not, can you build on Windows with

Question 3: In #73427 (comment), @mknyszek attempted a standalone reproducer. Can you more directly answer @mknyszek's question about that standalone reproducer:
For example, you could say "That seems accurate based on the data I have so far", or "I suspect that won't reproduce the problem because the standalone reproducer is not doing X", or "I'm certain that won't reproduce the problem because it is doing Y". As you answer, it is helpful to communicate what you know with certainty, vs. what has some perhaps inconclusive data, vs. what is based on a suspicion or hunch. |
q1 q2 q3
But I can't reproduce the load and the panic on Windows. It seems to happen only with cross-compilation. In the code
I take a value by key and then work with it. Maybe there are swaps in LoadOrStore that moved the original value &Parent{} away. The case here is a sync.Map used as a child of another sync.Map, which could be moved by a swap in the inner LoadOrStore, so it moves part of itself. |
One general comment about sync.Map is that I believe it does not provide any locking of the values stored within the sync.Map. In other words, if you get a certain object out of a sync.Map in one goroutine and also get the same object out of the sync.Map in another goroutine and then manipulate that object without any additional synchronization protection, sync.Map is not providing any additional protection of the data. I don't know if that applies here.

Separately, thank you for your answers. Part of the reason I asked about build on Windows & run on Windows (q1), or build on Linux & run on Linux (q2), is that would also eliminate the cross compilation step. (It sounds like you are using gox for cross compilation. I'm only lightly familiar with gox, but as I understand it, it does some clever things to help cgo work during cross compilation, and there is some, maybe small, chance that it is involved here. In any event, eliminating cross compilation would help with triage).

In addition to identifying races (which at least in theory might be related to the issue here), For building & running on Linux with |
I see you added this in an edit to your prior comment (which is fine; I just missed it at first). I might have misunderstood, but I will note that the sync.Map documentation includes:
|
I'll try to build with -race on the prod server; it will take time.
Sure! But I didn't make any copies (check the code I provided above; there is nothing other than this). I think the moving is done by sync.Map itself, probably by the inner swap function in LoadOrStore. To reproduce: create sync.Map values inside a parent sync.Map, then insert values into the parent sync.Map. Does it move the children somewhere inside? Swap them? Then, in a goroutine, loop over a child sync.Map; at some point it loses its control bits and panics. |
This is the swap I'm talking about. It moves the sync.Map value to another place.
Trying to reproduce. |
Go version
go1.24.2 windows/amd64
Output of go env in your module/workspace:
What did you do?
var updatePoolLineUsage_keys sync.Map // key => ip
var key, ip string
...
updatePoolLineUsage_keys.Store(key, ip)
What did you see happen?
Panic in
internal/sync.HashTrieMap: ran out of hash bits while inserting
STACK:
goroutine 870014 [running]:
panic({0xfb2660?, 0x1434870?})
D:/go/src/runtime/panic.go:792 +0x132
internal/sync.(*HashTrieMap[...]).expand(0x14602c0?, 0xc003e8aba0, 0xc008401bc0, 0xa879ce25c8134d64, 0x0, 0xc004f668c0)
D:/go/src/internal/sync/hashtriemap.go:181 +0x1e5
internal/sync.(*HashTrieMap[...]).Swap(0x14602c0, {0xfb2660, 0xc0048a8950}, {0xfb2660, 0xc0048a8960})
D:/go/src/internal/sync/hashtriemap.go:272 +0x397
internal/sync.(*HashTrieMap[...]).Store(...)
D:/go/src/internal/sync/hashtriemap.go:200
sync.(*Map).Store(...)
D:/go/src/sync/hashtriemap.go:55
What did you expect to see?
ok