Warn when Redis pending timeout is shorter than worker request timeout (+buffer) #2181

Open
vinay0826 wants to merge 7 commits into svix:main from
vinay0826:redis-timeout-guardrail
@vinay0826
Contributor

Context

In OSS/self-hosted setups, operators sometimes increase
worker_request_timeout to accommodate slow endpoints.

The Redis visibility timeout (redis_pending_duration_secs) is configured
separately and remains global.

If worker_request_timeout is increased for slow deliveries but
redis_pending_duration_secs is not adjusted accordingly, Redis may
re-queue a task while the original worker is still processing it.

That can result in overlapping deliveries of the same message.

Change

This PR adds a startup guardrail:

If using a Redis-based queue and:

redis_pending_duration_secs < worker_request_timeout + 5s

a tracing::warn! is emitted during startup.
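
As a rough illustration of the check described above, here is a minimal sketch. The `QueueKind` enum, the function name, and the use of `Duration` for both settings are assumptions made for this example and are not taken from the diff. The sketch returns the message instead of logging it so that it has no external dependencies; in the server the message would be passed to `tracing::warn!` at startup.

```rust
use std::time::Duration;

/// Safety margin added on top of the worker request timeout (5s, as stated above).
const PENDING_TIMEOUT_BUFFER: Duration = Duration::from_secs(5);

/// Hypothetical stand-in for the configured queue backend.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QueueKind {
    Memory,
    Redis,
    RedisCluster,
    RabbitMq,
}

/// Returns a warning message when a Redis-based queue is configured with a
/// pending timeout shorter than the worker request timeout plus the buffer.
/// Returns `None` for non-Redis queues and for safe configurations.
pub fn pending_timeout_warning(
    queue: QueueKind,
    redis_pending_duration: Duration,
    worker_request_timeout: Duration,
) -> Option<String> {
    // The guardrail applies only to Redis-based queue types.
    if !matches!(queue, QueueKind::Redis | QueueKind::RedisCluster) {
        return None;
    }

    let min_safe = worker_request_timeout + PENDING_TIMEOUT_BUFFER;
    if redis_pending_duration < min_safe {
        Some(format!(
            "redis_pending_duration_secs ({}s) is shorter than worker_request_timeout + buffer ({}s); \
             Redis may re-queue a task that is still being processed",
            redis_pending_duration.as_secs(),
            min_safe.as_secs(),
        ))
    } else {
        None
    }
}
```

With a 60s request timeout, a 45s pending timeout would produce a warning, while 65s (exactly the timeout plus the buffer) would not, because the comparison is strict.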

Why a Warning (Not a Hard Error)

The configuration is technically valid and may be intentional.
This change does not modify runtime behavior.

It simply highlights a potentially unsafe configuration
that could lead to duplicate in-flight deliveries when
handling slow endpoints.

Scope

  • Applies only to Redis-based queue types
  • No behavior change
  • OSS/self-hosted safeguard only

@vinay0826 vinay0826 requested a review from a team as a code owner February 13, 2026 13:03
Contributor

@svix-jbrown svix-jbrown left a comment

Good catch. One style nit: I believe our style is to prefer matches!() over match blocks for booleans.

I wonder if we could detect the equivalent problem for SQS and RabbitMQ?
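
To illustrate the style nit, the following sketch contrasts the two forms. The `QueueType` enum and the function names here are hypothetical and are not copied from the diff; both functions are intended to return the same result for every variant.

```rust
/// Hypothetical queue type enum, used only for this illustration.
#[derive(Debug, Clone, Copy)]
pub enum QueueType {
    Memory,
    Redis,
    RedisCluster,
    RabbitMq,
}

/// Verbose form: a `match` block whose arms only yield `true` or `false`.
pub fn is_redis_verbose(queue: QueueType) -> bool {
    match queue {
        QueueType::Redis | QueueType::RedisCluster => true,
        _ => false,
    }
}

/// Preferred form: `matches!` expresses the same boolean test in one line.
pub fn is_redis(queue: QueueType) -> bool {
    matches!(queue, QueueType::Redis | QueueType::RedisCluster)
}
```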

Contributor Author

@vinay0826 vinay0826 left a comment

Rustfmt is keeping me humble today 😅.

Re: SQS/RabbitMQ — I took a look at the OSS backends. In this repo we only have Redis and RabbitMQ (no SQS here).

RabbitMQ doesn’t rely on visibility/pending timeouts — messages are only requeued on NACK or if the consumer/channel dies — so the “pending timeout < request timeout” mismatch doesn’t really apply there. That’s why I kept the guardrail Redis-only.

If SQS exists in Cloud/EE, then doing a similar check against the SQS visibility timeout vs worker_request_timeout would probably make sense on that side.
