
Log sorting anomalies or missing logs in query_range #19133

@A-BenMao

Describe the bug
When using Grafana to view Loki logs, I found that in certain situations Grafana displays incomplete logs.

For example, when I query logs from 0:00 to 0:30 sorted by "newest first", it shows logs from 0:15 to 0:30 and then logs from 0:00 to 0:10, which makes me mistakenly think there are no logs between 0:10 and 0:20 (but there actually are).

To Reproduce
Steps to reproduce the behavior:

  1. Started Loki (3.5.0)
  2. Application logs are collected by the vector process and sent to Kafka, then Alloy collects the Kafka data and finally sends it to Loki.
  3. The application consists of multiple similar apps (all called gas), but they carry different labels, such as gas1, gas2, etc.
  4. Query: {cluster="g1009", app="gas"} |= `LuaServerPlayerCharacter:onHeroSpawnAvailable`

Expected behavior
All logs that meet the criteria are output, sorted by time.

Environment:

  • Infrastructure: app-log --> vector --> kafka --> alloy --> loki
  • Deployment tool: helm

Screenshots, Promtail config, or terminal output

(screenshot) Note: The Logs panel jumps directly from 16:36 to 16:45, but the logs volume view shows that there are logs around 16:40.

Then I selected "oldest first" and found that logs are still missing, or more accurately, that the sorting is abnormal.

(screenshot)

Loki config

limits_config:
    ingestion_rate_strategy: "local"
    ingestion_rate_mb: 50
    ingestion_burst_size_mb: 200
    max_label_name_length: 1024
    max_label_value_length: 2048
    max_label_names_per_series: 30
    reject_old_samples: true
    reject_old_samples_max_age: 168h
    creation_grace_period: 10m
    discover_service_name: [app component]
    discover_log_levels: true
    log_level_fields: [loglevel level LEVEL Level]
    use_owned_stream_count: false
    max_streams_per_user: 0
    max_global_streams_per_user: 50000
    unordered_writes: true
    per_stream_rate_limit: 128MB
    per_stream_rate_limit_burst: 521MB
    max_chunks_per_query: 2000000
    max_query_series: 500
    max_query_length: 30d1h
    max_query_range: 0
    max_query_parallelism: 32
    cardinality_limit: 50000
    max_streams_matchers_per_query: 1000
    max_concurrent_tail_requests: 10
    max_entries_limit_per_query: 5000
    max_cache_freshness_per_query: 10m
    query_timeout: 300s
    split_queries_by_interval: 15m
    split_metadata_queries_by_interval: 12h
    split_instant_metric_queries_by_interval: 1h
    min_sharding_lookback: 0s
    deletion_mode: filter-and-delete
    retention_period: 360h  
    volume_enabled: true

Related analysis
Because my split_queries_by_interval is set to 15m and Grafana's line limit is set to 3000, a 30-minute log query is split into two 15-minute segments. Since the first 15-minute segment does not reach the 3000-line limit, the query continues into the second 15-minute segment. However, for some unknown reason, the logs from the second segment are not sorted as expected.
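As a rough sanity check on this reading, here is a minimal sketch (not Loki's actual splitting code, just an assumption that the range is cut at plain split_queries_by_interval boundaries; the timestamps are placeholders) of how the 30-minute range ends up as two sub-queries whose partial results still have to be merged into one global time order:

```python
# Sketch only: cut [start, end) at split_queries_by_interval boundaries.
# This is an assumption for illustration, NOT Loki's actual splitter.
from datetime import datetime, timedelta, timezone

start = datetime(2025, 9, 5, 16, 30, tzinfo=timezone.utc)  # placeholder times
end = start + timedelta(minutes=30)
split = timedelta(minutes=15)  # split_queries_by_interval

cursor, sub_queries = start, []
while cursor < end:
    sub_queries.append((cursor, min(cursor + split, end)))
    cursor += split

for lo, hi in sub_queries:
    print(lo.strftime("%H:%M"), "->", hi.strftime("%H:%M"))
# Prints two 15-minute windows. With limit=3000 and "newest first", the newer
# window is filled first and the older one only supplies the remainder, so the
# two partial results must still be re-sorted globally before display.
```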

Then I conducted another experiment: if there is only one log stream (in my example, only gas1), meaning all labels are the same and there are no stream-distinguishing labels, this situation does not occur.

Then I captured the query requests sent from Grafana to Loki and found the following request: http://lokiXXXXXXXX/loki/api/v1/query_range?direction=forward&end=1757062799000000000&limit=3000&query=%7Bcluster%3D%22g1009%22%2C+app%3D%22gas%22%7D+%7C%3D+%60LuaServerPlayerCharacter%3AonHeroSpawnAvailable%60&start=1757061000000000000&step=1000ms, with the corresponding results as follows:

(screenshot)
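To check whether the apparent gap exists in the raw query_range response or only in Grafana's rendering, something like the following could be run with the same query and time range. This is only a sketch: the host is a placeholder, and it assumes the standard streams-type JSON shape (data.result[*].values = [["<unix-ns>", "<line>"], ...]).

```python
# Sketch: re-issue the captured query_range request and bucket the returned
# entries per minute, to see whether the 16:36-16:45 gap is really empty.
from collections import Counter
from datetime import datetime, timezone

import requests

resp = requests.get(
    "http://<loki-host>/loki/api/v1/query_range",  # placeholder host
    params={
        "query": '{cluster="g1009", app="gas"} |= `LuaServerPlayerCharacter:onHeroSpawnAvailable`',
        "start": "1757061000000000000",
        "end": "1757062799000000000",
        "limit": "3000",
        "direction": "forward",
    },
    timeout=60,
)
resp.raise_for_status()

# Flatten the entries across all returned streams and sort them globally,
# which is the order I would expect Grafana to display for the whole range.
timestamps = sorted(
    int(ts)
    for stream in resp.json()["data"]["result"]
    for ts, _line in stream["values"]
)

per_minute = Counter(
    datetime.fromtimestamp(ts / 1e9, tz=timezone.utc).strftime("%H:%M")
    for ts in timestamps
)
for minute, count in sorted(per_minute.items()):
    print(minute, count)
```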
