
feat: add exit node health checks and auto-failover #45

Open
mkarim1378 wants to merge 1 commit into masterking32:python_testing from mkarim1378:feature/exit-node-health-check


mkarim1378 commented May 5, 2026

Summary

  • Adds a background health checker that periodically pings all configured
    exit node URLs and automatically fails over to the next healthy URL when
    one goes down, then restores the primary when it recovers.
  • Introduces a urls list in exit_node config for declaring fallback URLs.
  • Fully backward-compatible — existing single-URL configs work unchanged.

What changed

src/relay/domain_fronter.py

  • _exit_node_health_loop — background task, pings every health_check_interval seconds
  • _ping_exit_node — lightweight TCP+TLS GET, returns True if any HTTP response arrives
  • _record_exit_node_failure — tracks consecutive failures per URL; marks dead after threshold and triggers failover; skips increment if URL already in cooldown
  • _record_exit_node_success — clears failure state; restores _exit_node_url from all-down state; switches back to primary when it recovers
  • _try_exit_node_failover — switches active URL to next alive fallback; clears _exit_node_url to "" when all are down so _exit_node_matches returns False and traffic silently falls back to Apps Script
  • _build_exit_node_url_list — merges url + urls[] into deduped ordered list; promotes first urls[] entry if url is empty
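The merge rule described for _build_exit_node_url_list can be sketched as a standalone function. This is a minimal illustration, not the PR's code: the name build_exit_node_url_list, whitespace stripping, and exact-string deduplication (no URL normalization such as trailing-slash handling) are assumptions.

```python
from __future__ import annotations


def build_exit_node_url_list(url: str, urls: list[str] | None) -> list[str]:
    """Hypothetical sketch: merge `url` and `urls` into an ordered, deduplicated list.

    The primary `url` comes first when it is set. When it is empty, the first
    non-empty entry of `urls` ends up first and so is promoted to primary.
    """
    merged: list[str] = []
    seen: set[str] = set()
    for candidate in [url, *(urls or [])]:
        candidate = (candidate or "").strip()
        # Assumption: duplicates are detected by exact string match only.
        if candidate and candidate not in seen:
            seen.add(candidate)
            merged.append(candidate)
    return merged
```

For example, a config with url set to "https://a.example/relay" and urls listing "https://b.example/relay" and "https://a.example/relay" yields the primary followed by the single distinct fallback.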

config.example.json

  • Added urls, health_check_interval, health_check_failures_before_failover fields

README.md

  • Added Failover section under exit node docs with example config and field descriptions

src/core/constants.py

  • Version bumped 1.1.0 → 1.2.0

Behavior

| Scenario | Result |
|---|---|
| One URL goes down | Failover to next URL after N failures |
| Primary recovers | Automatic switch back to primary |
| All URLs down | _exit_node_url cleared → traffic falls back to Apps Script silently |
| All-down + one recovers | _exit_node_url restored, exit node re-enabled |
| Single URL (no fallback) | Dead → falls back to Apps Script; recovers automatically via health loop |
| Existing config (no urls) | No behavior change, fully backward-compatible |
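The scenarios above amount to a small state machine. The sketch below models it with a hypothetical ExitNodeFailover class rather than the PR's actual methods. Assumptions: a URL "in cooldown" is treated as one already marked dead until its next successful ping (the 2x-interval timer is left out), failover picks the first alive URL in configured order, and an empty active URL stands in for "fall back to Apps Script".

```python
from __future__ import annotations


class ExitNodeFailover:
    """Hypothetical model of the failover bookkeeping, not the PR's code.

    Assumption: "in cooldown" == already marked dead; a URL stays dead until a
    successful ping. The PR's 2x-interval cooldown timer is not modelled here.
    """

    def __init__(self, urls: list[str], failures_before_failover: int = 3) -> None:
        if not urls:
            raise ValueError("at least one exit node URL is required")
        self.urls = list(urls)
        self.primary = self.urls[0]
        self.active = self.primary  # "" means every URL is down
        self.threshold = failures_before_failover
        self.failures = {u: 0 for u in self.urls}
        self.dead: set[str] = set()

    def uses_exit_node(self) -> bool:
        # An empty active URL means callers should fall back to Apps Script.
        return bool(self.active)

    def record_failure(self, url: str) -> None:
        if url in self.dead:
            return  # already dead: skip the increment (no double counting)
        self.failures[url] += 1
        if self.failures[url] >= self.threshold:
            self.dead.add(url)
            if url == self.active:
                self._failover()

    def record_success(self, url: str) -> None:
        self.failures[url] = 0
        self.dead.discard(url)
        if not self.active:
            self.active = url  # leave the all-down state
        elif url == self.primary and self.active != self.primary:
            self.active = self.primary  # primary recovered: switch back

    def _failover(self) -> None:
        # First alive URL in configured order, or "" when none are alive.
        self.active = next((u for u in self.urls if u not in self.dead), "")
```

Walking the table with failures_before_failover=2: two primary failures switch to the fallback, a further primary failure leaves its counter untouched, losing the fallback too empties the active URL, and later successes restore the fallback and then the primary.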

Test plan

  • With exit_node.enabled: false, no health task is started and no requests are matched to the exit node
  • Single URL: dead after N failures, recovers after success ping
  • Multi URL: failover on primary failure, switch back on primary recovery
  • All-down: traffic bypasses exit node, restores after recovery
  • urls-only config (empty url field): first entry promoted as primary
  • health_check_interval and health_check_failures_before_failover values respected
  • No double-increment of the failure counter once a URL is already in cooldown

Monitors all configured exit node URLs in the background and
automatically switches to the next healthy URL when one goes down,
then switches back to primary when it recovers.

- Add _exit_node_health_loop: background task that pings all exit
  node URLs every health_check_interval seconds (default 30s)
- Add _ping_exit_node: lightweight GET to detect reachability
- Add _record_exit_node_failure / _record_exit_node_success:
  track consecutive failures per URL with cooldown (2x interval)
- Add _try_exit_node_failover: switches _exit_node_url to next
  alive URL; clears it to "" when all are down so traffic silently
  falls back to Apps Script
- Add _record_exit_node_success recovery: restores _exit_node_url
  from all-down state and switches back to primary when it recovers
- Support urls[] list in exit_node config for fallback URLs
- Promote first entry of urls[] if url field is empty
- Bump version to 1.2.0
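The health loop and ping described above could look roughly like the asyncio sketch below. The function names, the 5-second timeout, pinging all URLs concurrently with asyncio.gather, accepting plain http as well as https, and the on_result callback (which would feed something like _record_exit_node_success / _record_exit_node_failure) are all assumptions; the rounds parameter exists only so the loop can be run a fixed number of times. Any HTTP status line counts as reachable, in line with "returns True if any HTTP response arrives".

```python
from __future__ import annotations

import asyncio
import ssl
from typing import Callable
from urllib.parse import urlsplit


async def ping_exit_node(url: str, timeout: float = 5.0) -> bool:
    """Hypothetical probe: True if any HTTP status line comes back from `url`."""
    parts = urlsplit(url)
    host = parts.hostname
    if not host:
        return False
    writer = None
    try:
        secure = parts.scheme == "https"  # assumption: plain http also allowed
        port = parts.port or (443 if secure else 80)
        ctx = ssl.create_default_context() if secure else None
        reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port, ssl=ctx), timeout
        )
        request = f"GET {parts.path or '/'} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        writer.write(request.encode("ascii"))
        await writer.drain()
        status_line = await asyncio.wait_for(reader.readline(), timeout)
        # Any status (even 5xx) proves the node answered; silence or errors do not.
        return status_line.startswith(b"HTTP/")
    except (OSError, asyncio.TimeoutError, ValueError):
        return False
    finally:
        if writer is not None:
            writer.close()


async def health_loop(
    urls: list[str],
    interval: float,
    on_result: Callable[[str, bool], None],
    rounds: int | None = None,  # hypothetical: None runs forever
) -> None:
    """Ping every URL every `interval` seconds and report each result."""
    done = 0
    while rounds is None or done < rounds:
        results = await asyncio.gather(*(ping_exit_node(u) for u in urls))
        for url, ok in zip(urls, results):
            on_result(url, ok)  # e.g. record a success or a failure
        done += 1
        if rounds is None or done < rounds:
            await asyncio.sleep(interval)
```

Treating any status line as "alive" keeps the probe cheap and avoids marking a node dead merely because the probed path returns an error status.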

Config additions (all optional, backward-compatible):
  exit_node.urls                          — fallback URL list
  exit_node.health_check_interval         — default 30s (min 10s)
  exit_node.health_check_failures_before_failover — default 3
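One way to read these optional fields with their documented defaults is sketched below. The HealthCheckSettings dataclass and load_health_check_settings helper are hypothetical; clamping a too-small interval up to 10s (rather than rejecting it) and forcing the failure threshold to at least 1 are assumptions, and the cooldown is derived as 2x the interval per the commit message.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

DEFAULT_INTERVAL = 30  # seconds, documented default
MIN_INTERVAL = 10  # seconds, documented minimum
DEFAULT_FAILURES = 3  # documented default


@dataclass
class HealthCheckSettings:
    """Hypothetical container for the optional exit_node health-check fields."""

    urls: list[str] = field(default_factory=list)
    interval: float = DEFAULT_INTERVAL
    failures_before_failover: int = DEFAULT_FAILURES

    @property
    def cooldown(self) -> float:
        # Cooldown is 2x the check interval, per the commit message.
        return 2 * self.interval


def load_health_check_settings(exit_node: dict[str, Any]) -> HealthCheckSettings:
    """Read the optional fields; missing keys fall back to the defaults."""
    interval = float(exit_node.get("health_check_interval", DEFAULT_INTERVAL))
    failures = int(exit_node.get("health_check_failures_before_failover", DEFAULT_FAILURES))
    return HealthCheckSettings(
        urls=list(exit_node.get("urls") or []),
        # Assumption: out-of-range values are clamped, not rejected.
        interval=max(interval, MIN_INTERVAL),
        failures_before_failover=max(failures, 1),
    )
```

An existing config with only url set produces the defaults (30s interval, 3 failures, 60s cooldown), which is what keeps older configs working unchanged.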