
Conversation

@siavashs
Contributor

The alert store workload is read-heavy, so use RWMutex to allow reads to proceed in parallel; a minimal sketch of the change follows the usage analysis below.

Store usage analysis:

  • provider/mem/mem.go

    • Reads
      • List() x 4 (Subscribe, SlurpAndSubscribe, GetPending, countByState)
      • Get() x 2
    • Writes
      • Set() x 1
      • GC() via gcLoop
  • dispatch/dispatch.go (aggrGroup)

    • Reads

      • List() x 2
      • Get() x 1
      • Empty() x 1
    • Writes

      • Set() x 1
      • DeleteIfNotModified() x 1
  • inhibit/inhibit.go

    • Reads
      • Get() x 2
    • Writes
      • Set() x 1
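
For illustration, a minimal sketch of the shape of this change (the real Alerts store carries more state, such as subscriber callbacks and GC machinery, so the struct here is a simplified assumption, not the actual implementation):

package store

import (
	"errors"
	"sync"

	"github.com/prometheus/common/model"

	"github.com/prometheus/alertmanager/types"
)

// ErrNotFound is returned for lookups of unknown fingerprints.
var ErrNotFound = errors.New("alert not found")

// Alerts is a simplified stand-in for the real store type.
type Alerts struct {
	mtx sync.RWMutex // previously sync.Mutex
	c   map[model.Fingerprint]*types.Alert
}

// Get takes only the read lock, so concurrent Get and List calls
// no longer serialize against each other.
func (a *Alerts) Get(fp model.Fingerprint) (*types.Alert, error) {
	a.mtx.RLock()
	defer a.mtx.RUnlock()
	if alert, ok := a.c[fp]; ok {
		return alert, nil
	}
	return nil, ErrNotFound
}

// Set still takes the exclusive write lock.
func (a *Alerts) Set(alert *types.Alert) error {
	a.mtx.Lock()
	defer a.mtx.Unlock()
	a.c[alert.Fingerprint()] = alert
	return nil
}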

Signed-off-by: Siavash Safi <[email protected]>
@Spaceman1701
Contributor

My intuition is that this will improve performance, but I'm not sure. RWMutex can be slower when there's a lot of write contention, which could happen in the alert ingestion flow.

Do you have any benchmarks or profiles of this change?

@siavashs
Contributor Author

siavashs commented Dec 12, 2025

> My intuition is that this will improve performance, but I'm not sure. RWMutex can be slower when there's a lot of write contention, which could happen in the alert ingestion flow.
>
> Do you have any benchmarks or profiles of this change?

Generated some benchmarks:

goos: darwin
goarch: arm64
pkg: github.com/prometheus/alertmanager/store
cpu: Apple M3 Pro
                              │  mutex.txt  │             rwmutex.txt              │
                              │   sec/op    │    sec/op     vs base                │
Get-12                          5.865n ± 1%    5.565n ±  1%   -5.12% (p=0.001 n=7)
Set-12                          102.2n ± 2%    103.6n ±  1%   +1.37% (p=0.026 n=7)
List/100_alerts-12              172.0n ± 3%    169.6n ±  4%        ~ (p=0.090 n=7)
List/1000_alerts-12             174.2n ± 5%    170.5n ±  5%        ~ (p=0.710 n=7)
List/10000_alerts-12            173.1n ± 7%    172.6n ±  5%        ~ (p=0.557 n=7)
ConcurrentReads-12              76.64n ± 1%   130.30n ± 40%  +70.02% (p=0.001 n=7)
ConcurrentWrites-12             258.9n ± 4%    251.6n ± 10%   -2.82% (p=0.026 n=7)
ConcurrentList/100_alerts-12    286.3n ± 1%    146.0n ± 27%  -49.00% (p=0.001 n=7)
ConcurrentList/1000_alerts-12   286.0n ± 1%    199.4n ± 28%  -30.28% (p=0.001 n=7)
MixedReadWrite/50pct_reads-12   169.6n ± 2%    147.9n ±  3%  -12.79% (p=0.001 n=7)
MixedReadWrite/90pct_reads-12   92.64n ± 1%    57.49n ±  2%  -37.94% (p=0.001 n=7)
MixedReadWrite/99pct_reads-12   78.78n ± 1%    37.83n ± 11%  -51.98% (p=0.001 n=7)
geomean                         116.5n         98.97n        -15.05%

Interestingly, concurrent reads are 70% slower on a read-only workload.
Concurrent writes are slightly faster.
The rest of the concurrent and mixed workloads are considerably faster.

Benchmark code:

// Benchmarks for comparing Mutex vs RWMutex performance.
package store

import (
	"fmt"
	"testing"
	"time"

	"github.com/prometheus/common/model"

	"github.com/prometheus/alertmanager/types"
)

func newBenchmarkAlert(i int) *types.Alert {
	return &types.Alert{
		Alert: model.Alert{
			Labels: model.LabelSet{
				"alertname": model.LabelValue("test"),
				"instance":  model.LabelValue(string(rune('a' + i%26))),
			},
		},
		UpdatedAt: time.Now(),
	}
}

// BenchmarkGet measures single-threaded Get performance.
func BenchmarkGet(b *testing.B) {
	s := NewAlerts()
	// Pre-populate with alerts.
	for i := range 1000 {
		_ = s.Set(newBenchmarkAlert(i))
	}
	alert := newBenchmarkAlert(500)
	fp := alert.Fingerprint()

	b.ResetTimer()
	for range b.N {
		_, _ = s.Get(fp)
	}
}

// BenchmarkSet measures single-threaded Set performance.
func BenchmarkSet(b *testing.B) {
	s := NewAlerts()
	alert := newBenchmarkAlert(0)

	b.ResetTimer()
	for range b.N {
		_ = s.Set(alert)
	}
}

// BenchmarkList measures single-threaded List performance.
func BenchmarkList(b *testing.B) {
	for _, size := range []int{100, 1000, 10000} {
		b.Run(fmt.Sprintf("%d_alerts", size), func(b *testing.B) {
			s := NewAlerts()
			for i := range size {
				_ = s.Set(newBenchmarkAlert(i))
			}

			b.ResetTimer()
			for range b.N {
				_ = s.List()
			}
		})
	}
}

// BenchmarkConcurrentReads measures concurrent read performance.
// This is where RWMutex should outperform Mutex.
func BenchmarkConcurrentReads(b *testing.B) {
	s := NewAlerts()
	for i := range 1000 {
		_ = s.Set(newBenchmarkAlert(i))
	}
	alert := newBenchmarkAlert(500)
	fp := alert.Fingerprint()

	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			_, _ = s.Get(fp)
		}
	})
}

// BenchmarkConcurrentWrites measures concurrent write performance.
func BenchmarkConcurrentWrites(b *testing.B) {
	s := NewAlerts()
	alerts := make([]*types.Alert, 100)
	for i := range alerts {
		alerts[i] = newBenchmarkAlert(i)
	}

	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		i := 0
		for pb.Next() {
			_ = s.Set(alerts[i%len(alerts)])
			i++
		}
	})
}

// BenchmarkConcurrentList measures concurrent List performance.
func BenchmarkConcurrentList(b *testing.B) {
	for _, size := range []int{100, 1000} {
		b.Run(fmt.Sprintf("%d_alerts", size), func(b *testing.B) {
			s := NewAlerts()
			for i := range size {
				_ = s.Set(newBenchmarkAlert(i))
			}

			b.ResetTimer()
			b.RunParallel(func(pb *testing.PB) {
				for pb.Next() {
					_ = s.List()
				}
			})
		})
	}
}

// BenchmarkMixedReadWrite simulates realistic workloads with different read/write ratios.
func BenchmarkMixedReadWrite(b *testing.B) {
	for _, readPct := range []int{50, 90, 99} {
		b.Run(fmt.Sprintf("%dpct_reads", readPct), func(b *testing.B) {
			s := NewAlerts()
			for i := range 1000 {
				_ = s.Set(newBenchmarkAlert(i))
			}
			alert := newBenchmarkAlert(500)
			fp := alert.Fingerprint()

			b.ResetTimer()
			b.RunParallel(func(pb *testing.PB) {
				i := 0
				for pb.Next() {
					if i%100 < readPct {
						_, _ = s.Get(fp)
					} else {
						_ = s.Set(alert)
					}
					i++
				}
			})
		})
	}
}
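
The comparison table is benchstat output over 7 runs per variant (n=7); it was presumably produced with something along these lines, with the file names taken from the table header:

# On main (sync.Mutex):
go test -run='^$' -bench=. -count=7 ./store > mutex.txt
# On this branch (sync.RWMutex):
go test -run='^$' -bench=. -count=7 ./store > rwmutex.txt
benchstat mutex.txt rwmutex.txt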

@siavashs
Contributor Author

siavashs commented Dec 12, 2025

After further analysis, it seems ConcurrentReads is slower because the per-call overhead of RWMutex RLock/RUnlock is higher than the cost of the map lookup it protects. The high variance of 40% points to the scheduler affecting performance when many goroutines are fighting for the lock.

We could switch Get() to use Lock/Unlock instead of RLock/RUnlock (sketched below), but whether that pays off in a real-world scenario really depends on how much other components hit Get() in parallel.
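
Concretely, that alternative would just take the writer lock in Get(), against the simplified store shape sketched in the description:

// Get with the exclusive lock: lower per-call lock overhead for a
// bare map lookup, at the cost of Get calls serializing against each
// other (and against List) again.
func (a *Alerts) Get(fp model.Fingerprint) (*types.Alert, error) {
	a.mtx.Lock()
	defer a.mtx.Unlock()
	if alert, ok := a.c[fp]; ok {
		return alert, nil
	}
	return nil, ErrNotFound
}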

@Spaceman1701
Contributor

That's an interesting result!

I guess using RLock vs Lock in Get also depends on how much contention is caused by calls to Get - e.g. is it worth making each call to Get 50% slower if it allows List to happen at the same time...

As far as how to move forward here, I don't know for sure. My concern is that we could accidentally introduce a performance regression without realizing it. I think we need some "real world" data before we can be confident that this helps more than it hurts.

@siavashs
Contributor Author

siavashs commented Dec 12, 2025

I can test this in a local lab environment next week, and in production next year with the canary instance and compare performance.

@Spaceman1701
Contributor

> I can test this in a local lab environment next week, and in production next year with the canary instance and compare performance.

That'd make me feel a lot more comfortable with the change. Thanks!
