-
Notifications
You must be signed in to change notification settings - Fork 717
feat(v2): tenant-wide datasets #3885
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
# Conflicts: # pkg/experiment/ingester/segment.go
aleks-p
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I left a few questions / minor remarks that are not blockers to move this forward.
pkg/experiment/block/writer.go
Outdated
| } | ||
| } | ||
| return toc, err | ||
| func (w *Writer) ReadFrom(r io.Reader) (n int64, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, is this technically WriteFromReader or similar? (I know it was already named this way, just noticing it now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you're referring to. The io.ReaderFrom interface is mainly implemented by types that are also io.Writer. And io.WriterTo interface is mainly implemented by types that are also io.Reader.
I've removed this bit either way – now that we do not use staging files, it is not super helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am saying it is strange for a method that does writing to be called ReadFrom, although I might be misunderstanding what it does. If this is a Go convention, then it is fine - though I would still find it strange :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Completely unrelated, but since we've already started this conversation:
ReaderFrom and WriterTo are primarily meant for io.Copy and very useful: they allow to delegate copying to the implementation in:
io.Copy(dst io.Writer, src io.Reader)https://cs.opensource.google/go/go/+/refs/tags/go1.24.0:src/io/io.go;l=407-416
This might be confusing at times.
For example, bytes.Buffer.ReaderFrom will actually grow your buffer by at least bytes.MinRead extra space, if you call ReadFrom, or copy to the buffer, and the source does not implement WriterTo (which would be called before dst.ReadFrom in io.Copy).
// MinRead is the minimum slice size passed to a [Buffer.Read] call by
// [Buffer.ReadFrom]. As long as the [Buffer] has at least MinRead bytes beyond
// what is required to hold the contents of r, [Buffer.ReadFrom] will not grow the
// underlying buffer.
const MinRead = 512https://cs.opensource.google/go/go/+/refs/tags/go1.24.0:src/bytes/buffer.go;l=196-206
This is quite a pain – when you allocate a buffer that you know has sufficient capacity (e.g., 8MB), but it gets reallocated to 8MB + 512 bytes. This is especially frustrating when you're optimizing for performance and using a buffer pool (since most of them rely on size classes).
This is one of the reasons why we have our own "yet-another" buffer pool:
https://github.com/grafana/pyroscope/blob/main/pkg/util/bufferpool/pool.go#L34
This trick helped to achieve quite good inuse/alloc ratio:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks for explaining!
This trick helped to achieve quite good inuse/alloc ratio:
This seems like a useful "metric" that could be extracted from profiling data, e.g., for automated analysis / recommendations.
1ffc672 to
73cc609
Compare
|
Thank you for the review Aleks! |
ab53ba1 to
ba8c885
Compare


The PR adds dataset indices: a TSDB index structurally identical to the one we use for storing series indices, but with postings pointing to datasets within the block. This index helps serve queries that do not specify which dataset is being queried (we currently create datasets based on the
service_namelabel).The index is also used for TSDB queries that do not rely on the SeriesIndex: specifically,
LabelNames,LabelValue, andSeriesByAPIs. This is particularly useful for our old UX, where we need to generate suggestions for query input _ for example, listing all values for thepodlabel.The dataset index is written at the tenant level. To enable this, I introduce the concept of tenant-wide datasets, or anonymous datasets: data regions attributed to the tenant itself. This allows storing data that is intended to be accessed globally for the tenant (i.e., when no specific query expression is provided). I plan to leverage this approach to implement #3256.
The dataset index is only generated during compaction. While we could create it in the segment writer, I have decided to postpone this for now. If any issues arise, they should be easy to address – I've modified the way datasets are flushed into segments, so we now get a stream of "flushes" and can produce the index in the same way it is done in the compactors.
In the PR, I also optimized the way blocks are compacted, eliminating most of the staging files: as a result the overall job duration (and resource usage) has decreased:


The reason is quite obvious:
I experimented with direct writes to object storage, but the client we use does not handle them well. As a result, I had to revert the optimization – something we should explore further in the future.