Skip to content

fix: sentinel value collision in FailoverProvider cooldown #125

@ilblackdragon

Description

@ilblackdragon

Summary

ProviderCooldown uses cooldown_activated_nanos == 0 as the sentinel for "not in cooldown", but now_nanos() can theoretically return 0 if called immediately after FailoverProvider construction (since epoch is set to Instant::now() in the constructor).

If activate_cooldown(0) were ever called, the cooldown would silently not take effect because is_in_cooldown() checks activated != 0 first.

Impact

Low in practice — there's always at least an HTTP round-trip between epoch creation and the first failure recording, so now_nanos() will never actually return 0 in production. But using a valid timestamp as a sentinel is fragile and could bite in tests or future refactors.

Suggested fix

One-line change in ProviderCooldown::activate_cooldown:

fn activate_cooldown(&self, now_nanos: u64) {
    // Use max(1, ..) so 0 remains a safe "not in cooldown" sentinel.
    self.cooldown_activated_nanos
        .store(now_nanos.max(1), Ordering::Relaxed);
}

Location

src/llm/failover.rsProviderCooldown::activate_cooldown method

Context

Found during review of #114.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions