Using Select() for both sample+metadata queries is unsuitable for remote storage implementations #4057

@juliusv

The storage.Querier interface has a Select() method that is used to retrieve either only the metadata or the bulk sample data of series. In the local storage case this works nicely: sample data is loaded lazily when accessed, so metadata-only queries never over-fetch sample data. This breaks down for remote storage implementations of the same interface, where we cannot afford to fetch sample data lazily as it is accessed, since that would require more remote round trips. We need some way of knowing up front, at the time we call Select(), whether a query is metadata-only or actually needs sample data. Either a separate method or a query parameter would work.
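To make the query-parameter option concrete, here is a minimal sketch. It is not the actual Prometheus API: `SelectHints`, `MetadataOnly`, the simplified `Querier` interface, and `remoteQuerier` are all hypothetical names invented for illustration, and the "remote" backend is simulated with an in-memory map plus a counter for sample fetches.

```go
package main

import "fmt"

// Series is a simplified stand-in for a Prometheus series:
// its labels plus any eagerly fetched samples.
type Series struct {
	Labels  map[string]string
	Samples []float64
}

// SelectHints is a hypothetical parameter that tells the querier up front
// whether the caller needs sample data at all.
type SelectHints struct {
	MetadataOnly bool
}

// Querier is a simplified, hypothetical variant of storage.Querier.
type Querier interface {
	Select(hints SelectHints, name string) []Series
}

// remoteQuerier simulates a remote backend that cannot load samples lazily,
// so it must decide at Select() time whether to fetch them.
type remoteQuerier struct {
	data          map[string][]float64 // metric name -> samples
	sampleFetches int                  // number of simulated bulk sample fetches
}

func (q *remoteQuerier) Select(hints SelectHints, name string) []Series {
	samples, ok := q.data[name]
	if !ok {
		return nil
	}
	s := Series{Labels: map[string]string{"__name__": name}}
	if !hints.MetadataOnly {
		// Only pay for the bulk sample transfer when it was requested.
		q.sampleFetches++
		s.Samples = append([]float64(nil), samples...)
	}
	return []Series{s}
}

func main() {
	var q Querier = &remoteQuerier{data: map[string][]float64{"up": {1, 1, 0}}}
	rq := q.(*remoteQuerier)

	meta := q.Select(SelectHints{MetadataOnly: true}, "up")
	fmt.Println(len(meta[0].Samples), rq.sampleFetches) // 0 0

	full := q.Select(SelectHints{}, "up")
	fmt.Println(len(full[0].Samples), rq.sampleFetches) // 3 1
}
```

A separate method (for example a dedicated metadata call on the interface) would achieve the same thing. The parameter variant shown here just keeps the interface to a single method.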

Context: Cortex reuses Prometheus's web API and PromQL packages to offer comparable functionality based on a different storage engine. In the past, we were able to work around this issue because the Prometheus web API packages allowed injecting two different queriers: one that would be used as part of the PromQL engine (the full sample querier) and one that would be used for metadata queries. This broke in 7ccd4b3#diff-d81f5cda89ea7b129ba708b586c2bc83L132 due to a PromQL engine restructuring: the querier is no longer part of the engine and is instead passed in for every query, so the web API now knows about only one querier, because we can no longer encapsulate the other one in the engine.

The corresponding issue on the Cortex side is cortexproject/cortex#787.
