Skip to content

Commit ea18a1c

Browse files
committed
runtime/pprof: avoid crash due to truncated stack traces
The profile proto message builder maintains a location entry cache that maps a location (possibly involving multiple user frames that represent inlined function calls) to the location id. We have been using the first pc of the inlined call sequence as the key of the cached location entry assuming that, for a given pc, the sequence of frames representing the inlined call stack is deterministic and stable. Then, when analyzing the new stack trace, we expected the exact number of pcs to be present in the captured stack trace upon the cache hit. This assumption does not hold, however, in the presence of the stack trace truncation in the runtime during profiling, and also with the potential bugs in runtime. A better fix is to use all the pcs of the inlined call sequece as the key instead of the first pc. But that is a bigger code change. This CL avoids the crash assuming the trace was truncated. Fixes #35538 Change-Id: I8c6bae98bc8b178ee51523c7316f56b1cce6df16 Reviewed-on: https://go-review.googlesource.com/c/go/+/207609 Run-TryBot: Hyang-Ah Hana Kim <[email protected]> Reviewed-by: Keith Randall <[email protected]>
1 parent 95be9b7 commit ea18a1c

File tree

2 files changed

+43
-1
lines changed

2 files changed

+43
-1
lines changed

src/runtime/pprof/pprof_test.go

+26
Original file line numberDiff line numberDiff line change
@@ -1183,6 +1183,32 @@ func TestTryAdd(t *testing.T) {
11831183
{Value: []int64{30, 30 * period}, Location: []*profile.Location{{ID: 1}, {ID: 1}}},
11841184
{Value: []int64{40, 40 * period}, Location: []*profile.Location{{ID: 1}}},
11851185
},
1186+
}, {
1187+
name: "truncated_stack_trace_later",
1188+
input: []uint64{
1189+
3, 0, 500, // hz = 500. Must match the period.
1190+
5, 0, 50, inlinedCalleePtr, inlinedCallerPtr,
1191+
4, 0, 60, inlinedCalleePtr,
1192+
},
1193+
wantLocs: [][]string{{"runtime/pprof.inlinedCallee", "runtime/pprof.inlinedCaller"}},
1194+
wantSamples: []*profile.Sample{
1195+
{Value: []int64{50, 50 * period}, Location: []*profile.Location{{ID: 1}}},
1196+
{Value: []int64{60, 60 * period}, Location: []*profile.Location{{ID: 1}}},
1197+
},
1198+
}, {
1199+
name: "truncated_stack_trace_first",
1200+
input: []uint64{
1201+
3, 0, 500, // hz = 500. Must match the period.
1202+
4, 0, 70, inlinedCalleePtr,
1203+
5, 0, 80, inlinedCalleePtr, inlinedCallerPtr,
1204+
},
1205+
wantLocs: [][]string{ // the inline info is screwed up, but better than a crash.
1206+
{"runtime/pprof.inlinedCallee"},
1207+
{"runtime/pprof.inlinedCaller"}},
1208+
wantSamples: []*profile.Sample{
1209+
{Value: []int64{70, 70 * period}, Location: []*profile.Location{{ID: 1}}},
1210+
{Value: []int64{80, 80 * period}, Location: []*profile.Location{{ID: 1}, {ID: 2}}},
1211+
},
11861212
}}
11871213

11881214
for _, tc := range testCases {

src/runtime/pprof/proto.go

+17-1
Original file line numberDiff line numberDiff line change
@@ -394,7 +394,23 @@ func (b *profileBuilder) appendLocsForStack(locs []uint64, stk []uintptr) (newLo
394394

395395
// then, record the cached location.
396396
locs = append(locs, l.id)
397-
stk = stk[len(l.pcs):] // skip the matching pcs.
397+
398+
// The stk may be truncated due to the stack depth limit
399+
// (e.g. See maxStack and maxCPUProfStack in runtime) or
400+
// bugs in runtime. Avoid the crash in either case.
401+
// TODO(hyangah): The correct fix may require using the exact
402+
// pcs as the key for b.locs cache management instead of just
403+
// relying on the very first pc. We are late in the go1.14 dev
404+
// cycle, so this is a workaround with little code change.
405+
if len(l.pcs) > len(stk) {
406+
stk = nil
407+
// TODO(hyangah): would be nice if we can enable
408+
// debug print out on demand and report the problematic
409+
// cached location entry and stack traces. Do we already
410+
// have such facility to utilize (e.g. GODEBUG)?
411+
} else {
412+
stk = stk[len(l.pcs):] // skip the matching pcs.
413+
}
398414
continue
399415
}
400416

0 commit comments

Comments
 (0)