Description
Describe the bug
We upgraded vmauth from 1.97.1 to 1.108.1. After increasing the load on the cluster to 1M datapoints per second and 100+ queries per second, we started seeing errors in the Request error rate chart of the VictoriaMetrics dashboard (see the attached screenshot).
After some investigation, it turned out that vmauth was crashing with the following stack trace:
[signal SIGSEGV: segmentation violation code=0x1 addr=0x2c pc=0x6b42e4]
goroutine 25756 [running]:
net/http.(*body).Read(0xc0002769d0?, {0xc000130000?, 0x0?, 0x1?})
net/http/transfer.go:831 +0x44
main.(*readTrackingBody).Read(0xc0002b1630, {0xc000130000, 0x46c405?, 0x2000})
github.com/VictoriaMetrics/VictoriaMetrics/app/vmauth/main.go:622 +0x122
io.discard.ReadFrom({}, {0x7f2d05c49d18, 0xc0002b1630})
io/io.go:666 +0x6d
io.copyBuffer({0x97a080, 0xc210e0}, {0x7f2d05c49d18, 0xc0002b1630}, {0xc000184000, 0x8000, 0x8000})
io/io.go:415 +0x151
io.CopyBuffer({0x97a080?, 0xc210e0?}, {0x7f2d05c49d18?, 0xc0002b1630?}, {0xc000184000?, 0xc40?, 0xc0000b6e08?})
io/io.go:402 +0x36
net/http.(*transferWriter).doBodyCopy(0xc000166320, {0x97a080, 0xc210e0}, {0x7f2d05c49d18, 0xc0002b1630})
net/http/transfer.go:416 +0xe5
net/http.(*transferWriter).writeBody(0xc000166320, {0x9796a0, 0xc0002441c0})
net/http/transfer.go:376 +0x3a5
net/http.(*Request).write(0xc0001fe3c0, {0x9796a0, 0xc0002441c0}, 0x0, 0x0, 0x0)
net/http/request.go:771 +0xaed
net/http.(*persistConn).writeLoop(0xc00026e000)
net/http/transport.go:2522 +0x174
created by net/http.(*Transport).dialConn in goroutine 25752
net/http/transport.go:1875 +0x15a5
We reverted to 1.97.1 and the error disappeared.
We then tried upgrading to 1.101.0 and to 1.102.1.
It looks like the bug was introduced in 1.102.1, or somewhere between 1.101.0 and 1.102.1.
The trace from 1.102.1 looks as follows:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x2c pc=0x65cd44]
goroutine 2031993 [running]:
net/http.(*body).Read(0xc0001fca38?, {0xc000364000?, 0x47db1d?, 0xc000258f60?})
net/http/transfer.go:827 +0x44
main.(*readTrackingBody).Read(0xc000185b30, {0xc000364000, 0x63bfab?, 0x2000})
github.com/VictoriaMetrics/VictoriaMetrics/app/vmauth/main.go:599 +0x122
io.discard.ReadFrom({}, {0x7f378c9d54e8, 0xc000185b30})
io/io.go:666 +0x6d
io.copyBuffer({0x8d6040, 0xb6cec0}, {0x7f378c9d54e8, 0xc000185b30}, {0xc00029e000, 0x8000, 0x8000})
io/io.go:415 +0x151
io.CopyBuffer({0x8d6040?, 0xb6cec0?}, {0x7f378c9d54e8?, 0xc000185b30?}, {0xc00029e000?, 0x18?, 0xc000100008?})
io/io.go:402 +0x36
net/http.(*transferWriter).doBodyCopy(0xc000170fa0, {0x8d6040, 0xb6cec0}, {0x7f378c9d54e8, 0xc000185b30})
net/http/transfer.go:416 +0xe8
net/http.(*transferWriter).writeBody(0xc000170fa0, {0x8d5900, 0xc0001b1880})
net/http/transfer.go:376 +0x3ac
net/http.(*Request).write(0xc000298d80, {0x8d5900, 0xc0001b1880}, 0x0, 0x0, 0x0)
net/http/request.go:755 +0xb0d
net/http.(*persistConn).writeLoop(0xc000298ea0)
net/http/transport.go:2461 +0x18e
created by net/http.(*Transport).dialConn in goroutine 2031989
net/http/transport.go:1800 +0x1585
To Reproduce
Deploy vmauth v1.102.1 in front of vminsert and vmselect and run a substantial load against the cluster. In our case this was around 1M datapoints per second and 100+ queries per second, generated with https://github.com/VictoriaMetrics/prometheus-benchmark. vmauth was co-located with vminsert and vmselect as a sidecar, and only the vmauth instances running alongside vminsert crashed:
NAME                        READY   STATUS    RESTARTS         AGE
vminsert-5c58898557-467fj   4/4     Running   0                12h
vminsert-5c58898557-5t2sv   4/4     Running   1 (10h ago)      12h
vminsert-5c58898557-6cw2d   4/4     Running   1 (11h ago)      12h
vminsert-5c58898557-6znr6   4/4     Running   1 (11h ago)      12h
vminsert-5c58898557-72kx8   4/4     Running   0                12h
vminsert-5c58898557-7p9s7   4/4     Running   1 (3h49m ago)    12h
vminsert-5c58898557-d6wwc   4/4     Running   1 (133m ago)     12h
vminsert-5c58898557-g6prb   4/4     Running   1 (140m ago)     12h
vminsert-5c58898557-j8qcw   4/4     Running   0                12h
vminsert-5c58898557-jxdmz   4/4     Running   3 (3h30m ago)    12h
vminsert-5c58898557-ks7gb   4/4     Running   0                12h
vminsert-5c58898557-nz8tq   4/4     Running   3 (5h35m ago)    12h
vminsert-5c58898557-q768d   4/4     Running   1 (3h35m ago)    12h
vminsert-5c58898557-vgpt5   4/4     Running   1 (3h49m ago)    12h
vminsert-5c58898557-xhg78   4/4     Running   1 (12h ago)      12h
Version
~ $ /vmauth-prod --version
vmauth-20240801-125905-tags-v1.102.1-0-g996b623585
Logs
No response
Screenshots
Used command-line flags
No response
Additional information
No response