Skip to content

cmd/agent/TestStartStopRace deadlocks #2601

@yurishkuro

Description

@yurishkuro

Describe the bug

The test deadlocks under travis-ci.com https://travis-ci.com/github/jaegertracing/jaeger/jobs/417602049#L486.

To Reproduce
Only seems to happen in Travis. The call to agent.Stop() deadlocks:

agent.Stop()

More specifically, processor.Stop() deadlocks:

func (s *ThriftProcessor) Stop() {
stopwatch := metrics.StartStopwatch(s.metrics.ProcessorCloseTimer)
s.server.Stop()
s.processing.Wait()
stopwatch.Stop()
}

It appears that the processor does not get a change to start in the goroutine before processor.Stop() is called from the test.

Panic dump:

=== RUN   TestStartStopRace
    logger.go:130: 2020-10-29T04:13:48.871Z	INFO	shutting down agent's HTTP server	{"addr": "[::]:5778"}
    logger.go:130: 2020-10-29T04:13:48.872Z	INFO	Starting jaeger-agent HTTP server	{"http-port": 5778}
    logger.go:130: 2020-10-29T04:13:48.873Z	INFO	agent's http server exiting
coverage: 59.1% of statements
panic: test timed out after 5m0s
goroutine 767 [running]:
testing.(*M).startAlarm.func1()
	/home/travis/.gimme/versions/go1.15.3.linux.amd64/src/testing/testing.go:1618 +0x11f
created by time.goFunc
	/home/travis/.gimme/versions/go1.15.3.linux.amd64/src/time/sleep.go:167 +0x52
goroutine 1 [chan receive, 4 minutes]:
testing.(*T).Run(0xc000162300, 0x12376ad, 0x11, 0x126dee0, 0x1)
	/home/travis/.gimme/versions/go1.15.3.linux.amd64/src/testing/testing.go:1169 +0x5f4
testing.runTests.func1(0xc000162300)
	/home/travis/.gimme/versions/go1.15.3.linux.amd64/src/testing/testing.go:1439 +0xa7
testing.tRunner(0xc000162300, 0xc00010fc78)
	/home/travis/.gimme/versions/go1.15.3.linux.amd64/src/testing/testing.go:1123 +0x203
testing.runTests(0xc000164ee0, 0x18a7700, 0xc, 0xc, 0xbfdeae88f2495920, 0x45da9bd743, 0x18b4c80, 0xc00017bba8)
	/home/travis/.gimme/versions/go1.15.3.linux.amd64/src/testing/testing.go:1437 +0x613
testing.(*M).Run(0xc0000fcc80, 0x0)
	/home/travis/.gimme/versions/go1.15.3.linux.amd64/src/testing/testing.go:1345 +0x3b4
main.main()
	_testmain.go:119 +0x357
goroutine 99 [semacquire, 4 minutes]:
sync.runtime_Semacquire(0xc000207ca8)
	/home/travis/.gimme/versions/go1.15.3.linux.amd64/src/runtime/sema.go:56 +0x45
sync.(*WaitGroup).Wait(0xc000207ca0)
	/home/travis/.gimme/versions/go1.15.3.linux.amd64/src/sync/waitgroup.go:130 +0xe5
github.com/jaegertracing/jaeger/cmd/agent/app/processors.(*ThriftProcessor).Stop(0xc000207c70)
	/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/processors/thrift_processor.go:109 +0x125
github.com/jaegertracing/jaeger/cmd/agent/app.(*Agent).Stop(0xc000201400)
	/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/agent.go:98 +0x37f
github.com/jaegertracing/jaeger/cmd/agent/app.TestStartStopRace(0xc0001b1b00)
	/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/agent_test.go:184 +0x6b7
testing.tRunner(0xc0001b1b00, 0x126dee0)
	/home/travis/.gimme/versions/go1.15.3.linux.amd64/src/testing/testing.go:1123 +0x203
created by testing.(*T).Run
	/home/travis/.gimme/versions/go1.15.3.linux.amd64/src/testing/testing.go:1168 +0x5bc
goroutine 90 [chan receive, 4 minutes]:
github.com/jaegertracing/jaeger/cmd/agent/app/processors.(*ThriftProcessor).processBuffer(0xc000207c70)
	/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/processors/thrift_processor.go:116 +0x126
github.com/jaegertracing/jaeger/cmd/agent/app/processors.NewThriftProcessor.func2(0xc000207c70)
	/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/processors/thrift_processor.go:87 +0x39
created by github.com/jaegertracing/jaeger/cmd/agent/app/processors.NewThriftProcessor
	/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/processors/thrift_processor.go:86 +0x32f
goroutine 94 [runnable]:
github.com/jaegertracing/jaeger/cmd/agent/app/servers.NewTBufferedServer.func1(0xc0001d7b00, 0x0)
	/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/servers/tbuffered_server.go:73 +0x76
sync.(*Pool).Get(0xc0001d7b00, 0x1, 0xfde8)
	/home/travis/.gimme/versions/go1.15.3.linux.amd64/src/sync/pool.go:148 +0xe2
github.com/jaegertracing/jaeger/cmd/agent/app/servers.(*TBufferedServer).Serve(0xc0001a3050)
	/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/servers/tbuffered_server.go:91 +0x8d
github.com/jaegertracing/jaeger/cmd/agent/app/processors.(*ThriftProcessor).Serve(0xc000207c70)
	/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/processors/thrift_processor.go:96 +0x51
created by github.com/jaegertracing/jaeger/cmd/agent/app.(*Agent).Run
	/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/agent.go:76 +0x271
FAIL	github.com/jaegertracing/jaeger/cmd/agent/app	300.044s

Metadata

Metadata

Assignees

No one assigned

    Labels

    bughelp wantedFeatures that maintainers are willing to accept but do not have cycles to implement

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions