Skip to content

fix: clone label values to prevent buffer reuse-after-free#5116

Merged
simonswine merged 1 commit into
mainfrom
20260506_fix-label-values-use-after-free
May 6, 2026
Merged

fix: clone label values to prevent buffer reuse-after-free#5116
simonswine merged 1 commit into
mainfrom
20260506_fix-label-values-use-after-free

Conversation

@simonswine

@simonswine simonswine commented May 6, 2026

Copy link
Copy Markdown
Contributor

Reader.LabelValues (no-matcher path) returns yoloString aliases into the
pooled TSDB buffer. Once the dataset closes, that buffer is returned to
bufferpool and can be immediately reused by a concurrent request — overwriting
the strings while they are still held by the aggregator.

Fix by cloning each value with strings.Clone before the dataset is closed.

Adds Test_LabelValues_NoMatcher_BufferReuse: 20 goroutines × 20 iterations
of a no-matcher LabelValues query. Without the fix, concurrent buffer
recycling corrupts the returned strings (e.g. "test-app" becomes "codeInt64").
Run with -race to also trigger the detector on the underlying read/write race.


Note

Medium Risk
Touches a hot query path and changes memory/lifetime behavior to eliminate a potential data race from pooled buffer reuse; correctness improves but adds per-value allocations that could impact performance under high cardinality.

Overview
Fixes the no-matcher LabelValues fast path to clone returned strings (strings.Clone) so label values no longer alias a pooled TSDB/index buffer that may be recycled before the report is marshaled.

Adds a concurrent regression test (Test_LabelValues_NoMatcher_BufferReuse) that issues many parallel label-values queries and asserts stable results (and triggers -race without the fix) to guard against buffer reuse corruption/data races.

Reviewed by Cursor Bugbot for commit 4fc79da. Bugbot is set up for automated code reviews on this repo. Configure here.

Reader.LabelValues (no-matcher path) returns yoloString aliases into the
pooled TSDB buffer.  Once the dataset closes, that buffer is returned to
bufferpool and can be immediately reused by a concurrent request — overwriting
the strings while they are still held by the aggregator.

Fix by cloning each value with strings.Clone before the dataset is closed.

Adds Test_LabelValues_NoMatcher_BufferReuse: 20 goroutines × 20 iterations
of a no-matcher LabelValues query.  Without the fix, concurrent buffer
recycling corrupts the returned strings (e.g. "test-app" becomes "codeInt64").
Run with -race to also trigger the detector on the underlying read/write race.
@simonswine simonswine marked this pull request as ready for review May 6, 2026 14:30
@simonswine simonswine requested a review from marcsanmi as a code owner May 6, 2026 14:30
@simonswine simonswine merged commit 96b9dd7 into main May 6, 2026
33 checks passed
@simonswine simonswine deleted the 20260506_fix-label-values-use-after-free branch May 6, 2026 14:30
aleks-p added a commit that referenced this pull request May 7, 2026
…5122)

Reader.LabelValues (no-matcher path) returns yoloString aliases into the
pooled TSDB buffer.  Once the dataset closes, that buffer is returned to
bufferpool and can be immediately reused by a concurrent request — overwriting
the strings while they are still held by the aggregator.

Fix by cloning each value with strings.Clone before the dataset is closed.

Adds Test_LabelValues_NoMatcher_BufferReuse: 20 goroutines × 20 iterations
of a no-matcher LabelValues query.  Without the fix, concurrent buffer
recycling corrupts the returned strings (e.g. "test-app" becomes "codeInt64").
Run with -race to also trigger the detector on the underlying read/write race.

Co-authored-by: Christian Simon <simon@swine.de>
aleks-p added a commit that referenced this pull request May 7, 2026
…5121)

Reader.LabelValues (no-matcher path) returns yoloString aliases into the
pooled TSDB buffer.  Once the dataset closes, that buffer is returned to
bufferpool and can be immediately reused by a concurrent request — overwriting
the strings while they are still held by the aggregator.

Fix by cloning each value with strings.Clone before the dataset is closed.

Adds Test_LabelValues_NoMatcher_BufferReuse: 20 goroutines × 20 iterations
of a no-matcher LabelValues query.  Without the fix, concurrent buffer
recycling corrupts the returned strings (e.g. "test-app" becomes "codeInt64").
Run with -race to also trigger the detector on the underlying read/write race.

Co-authored-by: Christian Simon <simon@swine.de>
simonswine added a commit to simonswine/go that referenced this pull request May 8, 2026
Two correctness fixes:

1. GC safety (poolGuardCanMigrate):
   Only migrate slice backing arrays whose element type has no Go pointer
   fields (PtrBytes == 0).  Previously any slice field in a pooled struct
   was migrated to mmap pages.  If the element type contained Go pointers
   (e.g. []SomeStruct with pointer fields), those pointers became invisible
   to the GC after migration, causing premature collection and the runtime
   assertion "found pointer to free object".
   Pure-data slices ([]byte, []uint8, []uint64, []int64 …) have PtrBytes=0
   and are unaffected; they are still the common case for pooled byte buffers.

2. directOnly suppression frame index (poolGuardSuppressed):
   The direct Pool.Put caller sits at different indices depending on the
   code path through poolguard:
     • *Struct pools (via poolGuardPutSliceFields): pcs[0]=sync.(*Pool).Put,
       pcs[1]=actual caller.
     • []byte pools (direct from sync_poolGuardPut): pcs[0]=actual caller.
   The previous code always checked pcs[1], so "=" suppressions never fired
   for slice-type pools.  Fixed to check both pcs[0] and pcs[1], skipping
   any frame that starts with "sync.(*Pool)." (the Pool.Put implementation).

With both fixes the Pyroscope V2 integration test (TestMicroServicesIntegrationV2)
runs to completion with a precise 17-entry suppression list:

  POOLGUARD_SUPPRESS=\
  =connectrpc.com/connect.(*bufferPool).Put,\
  =fmt.(*pp).free,\
  =github.com/go-kit/log.logfmtLogger.Log,\
  =github.com/parquet-go/parquet-go.(*ColumnWriter).writeDataPage.deferwrap1,\
  =github.com/parquet-go/parquet-go/internal/memory.(*ChunkBuffer[go.shape.uint8]).Reset,\
  =github.com/parquet-go/parquet-go/internal/memory.putSliceToPool[go.shape.int64],\
  =github.com/parquet-go/parquet-go/internal/memory.putSliceToPool[go.shape.uint64],\
  =github.com/parquet-go/parquet-go/internal/memory.putSliceToPool[go.shape.uint8],\
  =github.com/parquet-go/parquet-go.(*memoryBufferPool).PutBuffer,\
  =github.com/parquet-go/parquet-go.putBufioReader,\
  =google.golang.org/protobuf/internal/filedesc.(*File).unmarshalFull.deferwrap1,\
  =google.golang.org/protobuf/internal/filedesc.(*File).unmarshalSeed.deferwrap1,\
  =net/http.putBufioReader,\
  =net/http.putBufioWriter,\
  =regexp.freeBitState,\
  =regexp.freeOnePassMachine

The Pyroscope bufferpool violation (grafana/pyroscope#5116) is detected
first and must be added to suppress the false positive before the test
passes; any regression in bufferpool would surface immediately.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
simonswine added a commit to simonswine/go that referenced this pull request May 8, 2026
Some packages intentionally access pooled data after Pool.Put within
their own carefully managed lifetimes (net/http's bufio.Writer pool,
parquet-go's memory pool, protobuf descriptor bytes).  Without a way to
suppress these, GODEBUG=poolguard=1 fires too many false positives on
startup before reaching user code.

POOLGUARD_SUPPRESS is a comma-separated list of fully-qualified package
path prefixes.  A Pool.Put whose call stack contains any frame whose
function name starts with a suppressed prefix is silently skipped —
the backing array is never moved to guarded pages.

  POOLGUARD_SUPPRESS=net/http,google.golang.org/protobuf

Matching is a simple string prefix against the full function name
(e.g. "net/http.putBufioWriter"), so:

  • "net/http"                   silences the bufio.Writer pool
  • "google.golang.org/protobuf" silences descriptor pools
  • "cmd/compile"                silences SSA sparse-set pools
  • "github.com/parquet-go"      silences parquet memory pools

With appropriate suppressions, GODEBUG=poolguard=1 catches the exact
Pyroscope bufferpool bug (grafana/pyroscope#5116) even in a sequential
single-goroutine test:

  WARNING: accessed pool buffer after Pool.Put
  Buffer was Put at:
    github.com/grafana/pyroscope/v2/pkg/util/bufferpool.Put
          .../bufferpool/pool.go:48
    github.com/grafana/pyroscope/v2/pkg/block.(*Object).closeErr
          .../block/object.go:254
    ...

Implementation: poolGuardSuppressPatterns is parsed from
POOLGUARD_SUPPRESS in poolguard.go's init().  poolGuardPutBacking
captures the call stack with callers() before any mmap work, then calls
poolGuardSuppressed() which walks the PCs and checks each frame's
package path against the pattern list.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
simonswine added a commit to simonswine/go that referenced this pull request May 8, 2026
Two correctness fixes:

1. GC safety (poolGuardCanMigrate):
   Only migrate slice backing arrays whose element type has no Go pointer
   fields (PtrBytes == 0).  Previously any slice field in a pooled struct
   was migrated to mmap pages.  If the element type contained Go pointers
   (e.g. []SomeStruct with pointer fields), those pointers became invisible
   to the GC after migration, causing premature collection and the runtime
   assertion "found pointer to free object".
   Pure-data slices ([]byte, []uint8, []uint64, []int64 …) have PtrBytes=0
   and are unaffected; they are still the common case for pooled byte buffers.

2. directOnly suppression frame index (poolGuardSuppressed):
   The direct Pool.Put caller sits at different indices depending on the
   code path through poolguard:
     • *Struct pools (via poolGuardPutSliceFields): pcs[0]=sync.(*Pool).Put,
       pcs[1]=actual caller.
     • []byte pools (direct from sync_poolGuardPut): pcs[0]=actual caller.
   The previous code always checked pcs[1], so "=" suppressions never fired
   for slice-type pools.  Fixed to check both pcs[0] and pcs[1], skipping
   any frame that starts with "sync.(*Pool)." (the Pool.Put implementation).

With both fixes the Pyroscope V2 integration test (TestMicroServicesIntegrationV2)
runs to completion with a precise 17-entry suppression list:

  POOLGUARD_SUPPRESS=\
  =connectrpc.com/connect.(*bufferPool).Put,\
  =fmt.(*pp).free,\
  =github.com/go-kit/log.logfmtLogger.Log,\
  =github.com/parquet-go/parquet-go.(*ColumnWriter).writeDataPage.deferwrap1,\
  =github.com/parquet-go/parquet-go/internal/memory.(*ChunkBuffer[go.shape.uint8]).Reset,\
  =github.com/parquet-go/parquet-go/internal/memory.putSliceToPool[go.shape.int64],\
  =github.com/parquet-go/parquet-go/internal/memory.putSliceToPool[go.shape.uint64],\
  =github.com/parquet-go/parquet-go/internal/memory.putSliceToPool[go.shape.uint8],\
  =github.com/parquet-go/parquet-go.(*memoryBufferPool).PutBuffer,\
  =github.com/parquet-go/parquet-go.putBufioReader,\
  =google.golang.org/protobuf/internal/filedesc.(*File).unmarshalFull.deferwrap1,\
  =google.golang.org/protobuf/internal/filedesc.(*File).unmarshalSeed.deferwrap1,\
  =net/http.putBufioReader,\
  =net/http.putBufioWriter,\
  =regexp.freeBitState,\
  =regexp.freeOnePassMachine

The Pyroscope bufferpool violation (grafana/pyroscope#5116) is detected
first and must be added to suppress the false positive before the test
passes; any regression in bufferpool would surface immediately.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants