Evals not executing in real time in high throughput environment #10773
Replies: 2 comments 1 reply
Based on your queue metrics showing 106,893 waiting jobs with only 5 active, the bottleneck is in evaluation processing capacity rather than your proxy API latency alone(1).

Scaling Recommendations
- Scale worker containers
- Monitor queue metrics
- Separate ingestion and UI
- Increase S3 write concurrency

Evaluation-Specific Considerations
Your 15-20s proxy API latency per evaluation call compounds with the queue backlog(2). With 106,893 waiting jobs and only 5 active workers, even fast evaluations would take considerable time to process(1). The combination of high ingestion throughput and evaluation workload requires scaling worker containers so that both ingestion events and evaluation jobs are processed concurrently(1).

Have another question? Just tag @inkeep.
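The "considerable time" claim can be made concrete with a back-of-the-envelope estimate. The numbers come from this thread; the model itself is a simple sketch (each active slot processes one evaluation at a time, and every call takes the proxy API latency), not Langfuse internals:

```python
# Back-of-the-envelope: how long would the current backlog take to drain?
# Assumes jobs are spread evenly across the active slots and each job
# takes the full proxy API round-trip time.

def drain_time_seconds(waiting_jobs: int, concurrent_slots: int, seconds_per_job: float) -> float:
    """Idealized drain time for a queue backlog."""
    return waiting_jobs * seconds_per_job / concurrent_slots

backlog = 106_893   # waiting jobs reported by the queue metrics
active_slots = 5    # only 5 jobs active at a time
latency = 17.5      # midpoint of the 15-20 s proxy API latency

hours = drain_time_seconds(backlog, active_slots, latency) / 3600
print(f"~{hours:.0f} hours to drain at current concurrency")

# Even with a hypothetical 1 s evaluation, 5 slots still need hours:
fast_hours = drain_time_seconds(backlog, active_slots, 1.0) / 3600
print(f"~{fast_hours:.1f} hours even at 1 s per job")
```

At ~17.5 s per call and 5 slots this works out to roughly 100 hours, which is why the latency alone cannot explain the delay: concurrency is the dominant factor.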
Low CPU/memory usage with a large waiting queue indicates a concurrency bottleneck rather than resource constraints(1).

Tune Worker Concurrency
Your worker has resources available but isn't processing jobs in parallel. Adjust the worker's concurrency-related environment variables to increase concurrent evaluation processing(1). Lowering batch size or increasing concurrency can help clear the queue faster, though it may reduce throughput efficiency(1).

Check Tokenizer Worker Pool
The worker uses a thread pool for tokenization.

Monitor for Job Stalls
Low CPU with large queues can indicate stalled jobs(1). Use the BullMQ Admin API to check for stalled jobs in the evaluation-execution-queue(2). Stalled jobs may indicate insufficient concurrency even when CPU appears low(1).

Scale Out vs Scale Up
Since your single worker isn't saturating resources, running multiple worker containers with lower resources each can distribute the evaluation load more effectively than one large worker(3)(1). This allows parallel processing of evaluation jobs across instances.

The 106,893 waiting jobs with only 5 active suggests the worker's concurrency settings are limiting parallel execution, not your hardware capacity(1).
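If you deploy via Docker Compose, one way to run multiple worker containers is a replica count on the worker service. This is a hedged sketch: the service name `langfuse-worker` and image tag follow the common self-hosted Compose layout and may differ in your setup.

```yaml
services:
  langfuse-worker:
    image: langfuse/langfuse-worker:3
    deploy:
      replicas: 4   # run 4 worker containers instead of 1
    # ...keep your existing environment / depends_on config unchanged...
```

Alternatively, `docker compose up -d --scale langfuse-worker=4` achieves the same without editing the file. Because the eval workload is I/O-bound (waiting on the proxy API), throughput should scale roughly linearly with the total number of concurrent slots across workers.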
Have another question? Just tag @inkeep.
Describe your question
I have a self-hosted Langfuse deployment with the relevant components. This is a very high-throughput environment, and some measures have already been taken to scale the ingestion rate/ClickHouse writes. We have already set LANGFUSE_SKIP_INGESTION_CLICKHOUSE_READ_MIN_PROJECT_CREATE_DATE to a date in the past, before the project was created.
I have set up some evals using an internal proxy API that lets us connect to different LLM providers. The evals execute successfully, but they don't run in real time. The proxy API has some latency (one API call takes around 15-20 s), so an evaluation that starts now only completes after about 2 hours.
I also checked the evaluation-execution-queue using the Admin API.
Can this performance be improved at the Langfuse level by any method, or is it entirely due to the latency of the internal proxy API?
Langfuse Cloud or Self-Hosted?
Self-Hosted
If Self-Hosted
3.116.1
If Langfuse Cloud
No response
SDK and integration versions
No response
Pre-Submission Checklist