
fix(memory): use plain OS threads for postgres operations to avoid nested runtime panic#4129

Merged
theonlyhennygod merged 1 commit into master from fix/postgres-runtime-panic
Mar 21, 2026
Conversation

@theonlyhennygod
Collaborator

Summary

  • Replace tokio::task::spawn_blocking() with plain std::thread::Builder OS threads in all PostgresMemory trait method implementations (store, recall, get, list, forget, count, health_check)
  • Add a run_on_os_thread helper that spawns an OS thread and bridges the result back via tokio::sync::oneshot channel
  • The sync postgres crate internally calls Runtime::block_on(), which panics on Tokio blocking pool threads in recent Tokio versions; plain OS threads have no runtime context, avoiding the conflict

Motivation

Fixes #4101 -- the daemon panics on the first incoming request when using the memory-postgres backend with: "Cannot start a runtime from within a runtime."

The initialization in PostgresMemory::new() already correctly used std::thread::Builder (plain OS threads), which is why it worked. But all subsequent operations (store, recall, etc.) used tokio::task::spawn_blocking, whose threads are still associated with the Tokio runtime's blocking pool, triggering the panic.

Risk

Medium -- changes runtime threading for postgres memory operations only. No behavioral changes to query logic or data handling. The pattern matches the existing initialize_client() approach that was already proven to work.

Test plan

  • Existing unit tests pass (valid_identifiers_pass_validation, invalid_identifiers_are_rejected, parse_category_maps_known_and_custom_values, new_does_not_panic_inside_tokio_runtime)
  • Manual verification with a running PostgreSQL instance using memory-postgres backend
  • CI checks (format, clippy, test)

fix(memory): use plain OS threads for postgres operations to avoid nested runtime panic

Replace `tokio::task::spawn_blocking()` with plain `std::thread::Builder`
OS threads in all PostgresMemory trait methods. The sync `postgres` crate
(v0.19.x) internally calls `Runtime::block_on()`, which panics when called
from Tokio's blocking pool threads in recent Tokio versions. Plain OS threads
have no runtime context, so the nested `block_on` succeeds.

This matches the pattern already used in `PostgresMemory::initialize_client()`,
which correctly used `std::thread::Builder` and never exhibited this bug.

A new `run_on_os_thread` helper centralizes the pattern: spawn an OS thread,
run the closure, and bridge the result back via a `tokio::sync::oneshot` channel.

Fixes #4101
@theonlyhennygod theonlyhennygod merged commit 8d7e7e9 into master Mar 21, 2026
17 checks passed
@theonlyhennygod theonlyhennygod deleted the fix/postgres-runtime-panic branch March 21, 2026 10:04
glamberson pushed a commit to lamco-admin/zeroclaw that referenced this pull request Mar 24, 2026
…deadpool

Eliminates the run_on_os_thread workaround that spawned a fresh OS thread
per database operation. The sync `postgres` crate internally calls
Runtime::block_on(), which panics inside tokio. This was the root cause
of issues zeroclaw-labs#961, zeroclaw-labs#1055, zeroclaw-labs#4101 and the motivation for the thread-per-query
pattern added in zeroclaw-labs#4129.

Changes:
- Replace `postgres 0.19` with `tokio-postgres 0.7` (native async)
- Add `deadpool-postgres 0.14` for connection pooling (default: 4)
- Add `tokio-postgres-rustls 0.13` for TLS support
- Lazy initialization via OnceCell (matches Qdrant backend pattern)
- Add namespace, importance, superseded_by columns (parity with SQLite)
- Idempotent schema migration via ADD COLUMN IF NOT EXISTS
- Native store_with_metadata() and recall_namespaced() implementations
- Filter superseded_by IS NULL in recall/list (matching SQLite)
- Add pool_size and tls_mode to StorageProviderConfig

Implements Step 2 from upgrade-in-place plan (zeroclaw-labs#4028).
webhive pushed a commit to webhive/zeroclaw that referenced this pull request Mar 24, 2026
fix(memory): use plain OS threads for postgres operations to avoid nested runtime panic (zeroclaw-labs#4129)

Replace `tokio::task::spawn_blocking()` with plain `std::thread::Builder`
OS threads in all PostgresMemory trait methods. The sync `postgres` crate
(v0.19.x) internally calls `Runtime::block_on()`, which panics when called
from Tokio's blocking pool threads in recent Tokio versions. Plain OS threads
have no runtime context, so the nested `block_on` succeeds.

This matches the pattern already used in `PostgresMemory::initialize_client()`,
which correctly used `std::thread::Builder` and never exhibited this bug.

A new `run_on_os_thread` helper centralizes the pattern: spawn an OS thread,
run the closure, and bridge the result back via a `tokio::sync::oneshot` channel.

Fixes zeroclaw-labs#4101