vmauth crashes - panic: runtime error: invalid memory address or nil pointer dereference #8051

@bweglarz

Describe the bug

We upgraded vmauth from 1.97.1 to 1.108.1. After increasing the load on the cluster to 1M datapoints per second and 100+ queries per second, we started seeing errors in the Request error rate chart of the VictoriaMetrics dashboard (see the attached screenshot).

After some investigation it turned out that vmauth was crashing with the following stack trace:

[signal SIGSEGV: segmentation violation code=0x1 addr=0x2c pc=0x6b42e4]
goroutine 25756 [running]:
net/http.(*body).Read(0xc0002769d0?, {0xc000130000?, 0x0?, 0x1?})
	net/http/transfer.go:831 +0x44
main.(*readTrackingBody).Read(0xc0002b1630, {0xc000130000, 0x46c405?, 0x2000})
	github.com/VictoriaMetrics/VictoriaMetrics/app/vmauth/main.go:622 +0x122
io.discard.ReadFrom({}, {0x7f2d05c49d18, 0xc0002b1630})
	io/io.go:666 +0x6d
io.copyBuffer({0x97a080, 0xc210e0}, {0x7f2d05c49d18, 0xc0002b1630}, {0xc000184000, 0x8000, 0x8000})
	io/io.go:415 +0x151
io.CopyBuffer({0x97a080?, 0xc210e0?}, {0x7f2d05c49d18?, 0xc0002b1630?}, {0xc000184000?, 0xc40?, 0xc0000b6e08?})
	io/io.go:402 +0x36
net/http.(*transferWriter).doBodyCopy(0xc000166320, {0x97a080, 0xc210e0}, {0x7f2d05c49d18, 0xc0002b1630})
	net/http/transfer.go:416 +0xe5
net/http.(*transferWriter).writeBody(0xc000166320, {0x9796a0, 0xc0002441c0})
	net/http/transfer.go:376 +0x3a5
net/http.(*Request).write(0xc0001fe3c0, {0x9796a0, 0xc0002441c0}, 0x0, 0x0, 0x0)
	net/http/request.go:771 +0xaed
net/http.(*persistConn).writeLoop(0xc00026e000)
	net/http/transport.go:2522 +0x174
created by net/http.(*Transport).dialConn in goroutine 25752
	net/http/transport.go:1875 +0x15a5

We reverted to 1.97.1 and the error disappeared.
We then tried upgrading to 1.101.0 and to 1.102.1.
It looks like the bug was introduced in 1.102.1, or somewhere between 1.101.0 and 1.102.1.

The trace from 1.102.1 looks as follows:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x2c pc=0x65cd44]

goroutine 2031993 [running]:
net/http.(*body).Read(0xc0001fca38?, {0xc000364000?, 0x47db1d?, 0xc000258f60?})
	net/http/transfer.go:827 +0x44
main.(*readTrackingBody).Read(0xc000185b30, {0xc000364000, 0x63bfab?, 0x2000})
	github.com/VictoriaMetrics/VictoriaMetrics/app/vmauth/main.go:599 +0x122
io.discard.ReadFrom({}, {0x7f378c9d54e8, 0xc000185b30})
	io/io.go:666 +0x6d
io.copyBuffer({0x8d6040, 0xb6cec0}, {0x7f378c9d54e8, 0xc000185b30}, {0xc00029e000, 0x8000, 0x8000})
	io/io.go:415 +0x151
io.CopyBuffer({0x8d6040?, 0xb6cec0?}, {0x7f378c9d54e8?, 0xc000185b30?}, {0xc00029e000?, 0x18?, 0xc000100008?})
	io/io.go:402 +0x36
net/http.(*transferWriter).doBodyCopy(0xc000170fa0, {0x8d6040, 0xb6cec0}, {0x7f378c9d54e8, 0xc000185b30})
	net/http/transfer.go:416 +0xe8
net/http.(*transferWriter).writeBody(0xc000170fa0, {0x8d5900, 0xc0001b1880})
	net/http/transfer.go:376 +0x3ac
net/http.(*Request).write(0xc000298d80, {0x8d5900, 0xc0001b1880}, 0x0, 0x0, 0x0)
	net/http/request.go:755 +0xb0d
net/http.(*persistConn).writeLoop(0xc000298ea0)
	net/http/transport.go:2461 +0x18e
created by net/http.(*Transport).dialConn in goroutine 2031989
	net/http/transport.go:1800 +0x1585
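
For readers less familiar with Go traces: both panics fault at addr=0x2c inside net/http.(*body).Read, reached through main.(*readTrackingBody).Read. A fault address this close to zero is typical of a field access through a nil pointer. The sketch below is a minimal, self-contained illustration of that general failure mode, assuming a wrapper that forwards Read to an inner reader holding a typed nil pointer. The types fakeBody and trackingReader and the helper safeRead are hypothetical; they are not the vmauth or net/http code, and the sketch does not explain why the pointer would be nil in vmauth.

```go
package main

import (
	"fmt"
	"io"
	"sync"
)

// fakeBody is a hypothetical stand-in for a request body type.
// It is NOT net/http's body; it only mimics "lock a mutex field first".
type fakeBody struct {
	src io.Reader
	mu  sync.Mutex
}

// Read locks the mutex before touching src, so calling it on a nil
// *fakeBody dereferences nil while accessing the mutex field.
func (b *fakeBody) Read(p []byte) (int, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.src.Read(p)
}

// trackingReader is a hypothetical wrapper that forwards Read calls
// to an inner reader without checking what the interface holds.
type trackingReader struct {
	r io.Reader
}

func (t *trackingReader) Read(p []byte) (int, error) {
	return t.r.Read(p)
}

// safeRead performs one Read and converts a panic into an error,
// so the demo can print the message instead of crashing.
func safeRead(r io.Reader) (err error) {
	defer func() {
		if rec := recover(); rec != nil {
			err = fmt.Errorf("recovered: %v", rec)
		}
	}()
	_, err = r.Read(make([]byte, 8))
	return err
}

func main() {
	var nilBody *fakeBody                  // typed nil pointer
	wrapped := &trackingReader{r: nilBody} // the interface value itself is non-nil

	fmt.Println(wrapped.r != nil) // true: a nil check on the interface does not catch this
	fmt.Println(safeRead(wrapped))
	// recovered: runtime error: invalid memory address or nil pointer dereference
}
```

In the traces above the panic happens in the persistConn.writeLoop goroutine created by net/http itself, so unlike in this sketch there is no recover in the calling code and the whole process exits.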

To Reproduce

Deploy vmauth v1.102.1 in front of vminsert and vmselect and run a substantial load against the cluster. In our case it was around 1M datapoints per second and 100+ queries per second, generated with https://github.com/VictoriaMetrics/prometheus-benchmark. vmauth was co-located with vminsert and vmselect as a sidecar, and only the vmauth instances running next to vminsert crashed; see the restart counts of the vminsert pods:

NAME                        READY   STATUS    RESTARTS        AGE
vminsert-5c58898557-467fj   4/4     Running   0               12h
vminsert-5c58898557-5t2sv   4/4     Running   1 (10h ago)     12h
vminsert-5c58898557-6cw2d   4/4     Running   1 (11h ago)     12h
vminsert-5c58898557-6znr6   4/4     Running   1 (11h ago)     12h
vminsert-5c58898557-72kx8   4/4     Running   0               12h
vminsert-5c58898557-7p9s7   4/4     Running   1 (3h49m ago)   12h
vminsert-5c58898557-d6wwc   4/4     Running   1 (133m ago)    12h
vminsert-5c58898557-g6prb   4/4     Running   1 (140m ago)    12h
vminsert-5c58898557-j8qcw   4/4     Running   0               12h
vminsert-5c58898557-jxdmz   4/4     Running   3 (3h30m ago)   12h
vminsert-5c58898557-ks7gb   4/4     Running   0               12h
vminsert-5c58898557-nz8tq   4/4     Running   3 (5h35m ago)   12h
vminsert-5c58898557-q768d   4/4     Running   1 (3h35m ago)   12h
vminsert-5c58898557-vgpt5   4/4     Running   1 (3h49m ago)   12h
vminsert-5c58898557-xhg78   4/4     Running   1 (12h ago)     12h

Version

~ $ /vmauth-prod --version
vmauth-20240801-125905-tags-v1.102.1-0-g996b623585

Logs

No response

Screenshots

Image
Image

Used command-line flags

No response

Additional information

No response

Labels

bug (Something isn't working), vmauth
