Evals not executing in real time in high throughput environment #10773
Replies: 2 comments 1 reply
Based on your queue metrics showing 106,893 waiting jobs with only 5 active, the bottleneck is in evaluation processing capacity rather than your proxy API latency alone(1).

Scaling Recommendations
- Scale worker containers
- Monitor queue metrics
- Separate ingestion and UI
- Increase S3 write concurrency

Evaluation-Specific Considerations
Your 15-20s proxy API latency per evaluation call compounds with the queue backlog(2). With 106,893 waiting jobs and only 5 active workers, even fast evaluations would take considerable time to process(1). The combination of high ingestion throughput and evaluation workload requires scaling worker containers so that both ingestion events and evaluation jobs are processed concurrently(1).

Have another question? Just tag @inkeep.
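The "considerable time" claim can be made concrete with a back-of-the-envelope estimate. The numbers come from this thread; the model itself is a simple sketch (each active slot processes one evaluation at a time, and every call takes the proxy API latency), not Langfuse internals:

```python
# Back-of-the-envelope: how long would the current backlog take to drain?
# Assumes jobs are spread evenly across the active slots and each job
# takes the full proxy API round-trip time.

def drain_time_seconds(waiting_jobs: int, concurrent_slots: int, seconds_per_job: float) -> float:
    """Idealized drain time for a queue backlog."""
    return waiting_jobs * seconds_per_job / concurrent_slots

backlog = 106_893   # waiting jobs reported by the queue metrics
active_slots = 5    # only 5 jobs active at a time
latency = 17.5      # midpoint of the 15-20 s proxy API latency

hours = drain_time_seconds(backlog, active_slots, latency) / 3600
print(f"~{hours:.0f} hours to drain at current concurrency")

# Even with a hypothetical 1 s evaluation, 5 slots still need hours:
fast_hours = drain_time_seconds(backlog, active_slots, 1.0) / 3600
print(f"~{fast_hours:.1f} hours even at 1 s per job")
```

At ~17.5 s per call and 5 slots this works out to roughly 100 hours, which is why the latency alone cannot explain the delay: concurrency is the dominant factor.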
Low CPU/memory usage with a large waiting queue indicates a concurrency bottleneck rather than resource constraints(1).

Tune Worker Concurrency
Your worker has resources available but isn't processing jobs in parallel. Adjust the worker's concurrency-related environment variables to increase concurrent evaluation processing(1). Lowering batch size or increasing concurrency can help clear the queue faster, though it may reduce throughput efficiency(1).

Check Tokenizer Worker Pool
The worker uses a thread pool for tokenization.

Monitor for Job Stalls
Low CPU with large queues can indicate stalled jobs(1). Use the BullMQ Admin API to check for stalled jobs in the evaluation-execution-queue(2). Stalled jobs may indicate insufficient concurrency even when CPU appears low(1).

Scale Out vs Scale Up
Since your single worker isn't saturating resources, running multiple worker containers with lower resources each can distribute the evaluation load more effectively than one large worker(3)(1). This allows parallel processing of evaluation jobs across instances.

The 106,893 waiting jobs with only 5 active suggests the worker's concurrency settings are limiting parallel execution, not your hardware capacity(1).
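If you deploy via Docker Compose, one way to run multiple worker containers is a replica count on the worker service. This is a hedged sketch: the service name `langfuse-worker` and image tag follow the common self-hosted Compose layout and may differ in your setup.

```yaml
services:
  langfuse-worker:
    image: langfuse/langfuse-worker:3
    deploy:
      replicas: 4   # run 4 worker containers instead of 1
    # ...keep your existing environment / depends_on config unchanged...
```

Alternatively, `docker compose up -d --scale langfuse-worker=4` achieves the same without editing the file. Because the eval workload is I/O-bound (waiting on the proxy API), throughput should scale roughly linearly with the total number of concurrent slots across workers.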
Have another question? Just tag @inkeep.
Describe your question
I have a self-hosted Langfuse deployment with the relevant components. This is a very high-throughput environment, and some measures have already been taken to scale the ingestion rate/ClickHouse writes. We have already set LANGFUSE_SKIP_INGESTION_CLICKHOUSE_READ_MIN_PROJECT_CREATE_DATE to a date in the past, before the project was created.
I have set up some evals using an internal proxy API that lets us connect to different LLM providers. The evals execute successfully, but they don't run in real time. The proxy API has some latency (one API call takes around 15-20 s), so an evaluation that starts now only completes after about 2 hours.
I also checked the evaluation-execution-queue using the Admin API.
Can this performance be improved at the Langfuse level by any method, or is it entirely due to the latency of the internal proxy API?
Langfuse Cloud or Self-Hosted?
Self-Hosted
If Self-Hosted
3.116.1
If Langfuse Cloud
No response
SDK and integration versions
No response
Pre-Submission Checklist