-
Notifications
You must be signed in to change notification settings - Fork 3.7k
feat(dataob): Implement SelectSamples #16251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(dataob): Implement SelectSamples #16251
Conversation
…tenant and timerange
This commit adds new methods to the dataobj querier store to support: - Retrieving data objects for a specific time range - Parsing query shards - Simplifying error handling in metastore listing - Implement series, labelNames and labelsValues - Removing unimplemented placeholder methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments, but nothing major.
// sectionIndex tracks the global section number across all objects to ensure consistent sharding | ||
var sectionIndex uint64 | ||
shardedReaders := make([]*shardedObject, 0, len(objects)) | ||
|
||
for i, metadata := range metadatas { | ||
var reader *shardedObject | ||
|
||
for j := 0; j < metadata.LogsSections; j++ { | ||
if shard.PowerOfTwo != nil && shard.PowerOfTwo.Of > 1 { | ||
if sectionIndex%uint64(shard.PowerOfTwo.Of) != uint64(shard.PowerOfTwo.Shard) { | ||
sectionIndex++ | ||
continue | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's worth mentioning this approach minimizes data locality by ensuring shards are spread across data objects as much as possible. This is probably fine for now, but I expect we'll want a way to do the opposite. E.g. if there are two objects with 2 sections each, a shard factor of 2 should assign all sections from the first object to the first shard and all sections from the second object to the second shard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I agree this is why we discussed that during this week meeting.
Consensus was, we get better spread by doing it on section and stream section are small.
noShard = logql.Shard{ | ||
PowerOfTwo: &index.ShardAnnotation{ | ||
Shard: uint32(1), | ||
Of: uint32(1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By convention the shard field is [0..n)
streams map[int64]dataobj.Stream | ||
} | ||
|
||
func shardObjects( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func shardObjects( | |
// shardObjects only (currently) works with power-of-two shard factors, not the `bounded` strategy. | |
func shardObjects( |
var reader *shardedObject | ||
|
||
for j := 0; j < metadata.LogsSections; j++ { | ||
if shard.PowerOfTwo != nil && shard.PowerOfTwo.Of > 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can (eventually) extend the data objects in a way to work with shard.Match
, which uses bit prefixes in uint64
fingerprints. Not something we should be worrying too much about right now though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing to approved so you can merge after addressing comments
Co-authored-by: Owen Diehl <[email protected]> Signed-off-by: Cyril Tovena <[email protected]>
This implements SelectSamples for dataobj.
I've also refactored a bit the tests and code. It now pool as much as possible readers.