Skip to content

feat(dataob): Implement SelectSamples #16251

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Feb 13, 2025

Conversation

cyriltovena
Copy link
Contributor

This implements SelectSamples for dataobj.

I've also refactored a bit the tests and code. It now pool as much as possible readers.

@cyriltovena cyriltovena requested a review from a team as a code owner February 13, 2025 15:19
owen-d

This comment was marked as outdated.

Copy link
Member

@owen-d owen-d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments, but nothing major.

Comment on lines +215 to +228
// sectionIndex tracks the global section number across all objects to ensure consistent sharding
var sectionIndex uint64
shardedReaders := make([]*shardedObject, 0, len(objects))

for i, metadata := range metadatas {
var reader *shardedObject

for j := 0; j < metadata.LogsSections; j++ {
if shard.PowerOfTwo != nil && shard.PowerOfTwo.Of > 1 {
if sectionIndex%uint64(shard.PowerOfTwo.Of) != uint64(shard.PowerOfTwo.Shard) {
sectionIndex++
continue
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth mentioning this approach minimizes data locality by ensuring shards are spread across data objects as much as possible. This is probably fine for now, but I expect we'll want a way to do the opposite. E.g. if there are two objects with 2 sections each, a shard factor of 2 should assign all sections from the first object to the first shard and all sections from the second object to the second shard.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree this is why we discussed that during this week meeting.

Consensus was, we get better spread by doing it on section and stream section are small.

noShard = logql.Shard{
PowerOfTwo: &index.ShardAnnotation{
Shard: uint32(1),
Of: uint32(1),
Copy link
Member

@owen-d owen-d Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By convention the shard field is [0..n)

streams map[int64]dataobj.Stream
}

func shardObjects(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func shardObjects(
// shardObjects only (currently) works with power-of-two shard factors, not the `bounded` strategy.
func shardObjects(

var reader *shardedObject

for j := 0; j < metadata.LogsSections; j++ {
if shard.PowerOfTwo != nil && shard.PowerOfTwo.Of > 1 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can (eventually) extend the data objects in a way to work with shard.Match, which uses bit prefixes in uint64 fingerprints. Not something we should be worrying too much about right now though.

Copy link
Member

@owen-d owen-d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing to approved so you can merge after addressing comments

@cyriltovena cyriltovena merged commit 13a6c33 into grafana:main Feb 13, 2025
58 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants