fix: response=0 chars regression in gateway after lazy session creation commit #18765

@drzeast-png

Bug Description

After commit c5b4c4816 (fix: lazy session creation — defer DB row until first message (#18370)), the gateway agent occasionally returns response=0 chars — the agent completes a full run (many API calls, long elapsed time) but produces no output and sends nothing back to the user.

Symptoms

From gateway agent.log, entries like:

INFO gateway.run: response ready: platform=weixin chat=... time=943.0s api_calls=2 response=0 chars
INFO gateway.run: response ready: platform=weixin chat=... time=788.9s api_calls=10 response=0 chars
INFO gateway.run: response ready: platform=weixin chat=... time=227.6s api_calls=15 response=0 chars

The agent is clearly doing work (nonzero api_calls, elapsed times of several minutes) but returns nothing. This is a silent failure: no error is logged.
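To check how often this happens on another install, the log can be scanned for runs that made API calls but returned zero characters. The following is a minimal sketch: the regular expression is inferred only from the log lines quoted in this issue, and `find_empty_responses` and `min_api_calls` are hypothetical names, not part of Hermes.

```python
import re

# Pattern inferred from the "response ready" lines quoted in this issue.
RESPONSE_READY = re.compile(
    r"response ready: platform=(?P<platform>\S+) chat=(?P<chat>\S+) "
    r"time=(?P<time>[\d.]+)s api_calls=(?P<api_calls>\d+) "
    r"response=(?P<chars>\d+) chars"
)


def find_empty_responses(lines, min_api_calls=1):
    """Return parsed entries where the run made API calls but sent back 0 chars."""
    hits = []
    for line in lines:
        m = RESPONSE_READY.search(line)
        if m is None:
            continue  # not a "response ready" line
        api_calls = int(m["api_calls"])
        if int(m["chars"]) == 0 and api_calls >= min_api_calls:
            hits.append({
                "platform": m["platform"],
                "chat": m["chat"],
                "seconds": float(m["time"]),
                "api_calls": api_calls,
            })
    return hits


# The four entries reported in this issue.
sample = [
    "2026-05-01 21:01:05,565 INFO gateway.run: response ready: platform=weixin chat=o9cq807w... time=943.0s api_calls=2 response=0 chars",
    "2026-05-02 09:13:05,677 INFO gateway.run: response ready: platform=weixin chat=o9cq807w... time=5.2s api_calls=0 response=0 chars",
    "2026-05-02 11:18:43,925 INFO gateway.run: response ready: platform=weixin chat=o9cq807w... time=788.9s api_calls=10 response=0 chars",
    "2026-05-02 16:14:12,097 INFO gateway.run: response ready: platform=weixin chat=o9cq807w... time=227.6s api_calls=15 response=0 chars",
]

for hit in find_empty_responses(sample):
    print(hit)
```

With the default `min_api_calls=1`, the `api_calls=0` entry is skipped, since a run that made no API calls may be a different failure from the one described here. To scan a real file, pass an open file handle instead of `sample`.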

Frequency

Observed 4 times in ~24 hours (see Logs), all before any upstream update was applied on May 2. The issue therefore predates the May 2 systemd unit update.

Suspected Root Cause

In run_agent.py, the _ensure_db_session() method added by c5b4c4816 can raise during session row creation. The exception is caught and logged, but the agent keeps running with _session_db_created = False. The next message to the same session will retry, but the current run may proceed with a partially initialized session state, causing the final response to be discarded.

The old code called ensure_session() (idempotent, INSERT OR IGNORE) in _flush_messages_or_raise, which never failed silently. The new code relies on _ensure_db_session() being called at the top of run_conversation(). When that call fails, nothing stops the conversation loop, so the run can complete and still return no response.
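The suspected pattern can be sketched as follows. This is a heavily simplified illustration, not the actual run_agent.py code: the `FailingSessionDB` and `Agent` classes, the `create_session` method, and the returned dictionary are all hypothetical. Only `_ensure_db_session`, `_session_db_created`, and `run_conversation` are names taken from this report.

```python
import logging

logger = logging.getLogger("sketch")


class FailingSessionDB:
    """Hypothetical stand-in for the session store; creation always fails."""

    def create_session(self, session_id):
        raise RuntimeError("simulated DB failure")


class Agent:
    """Hypothetical, simplified agent illustrating the swallow-and-continue pattern."""

    def __init__(self, db, session_id):
        self.db = db
        self.session_id = session_id
        self._session_db_created = False

    def _ensure_db_session(self):
        if self._session_db_created:
            return
        try:
            self.db.create_session(self.session_id)
            self._session_db_created = True
        except Exception:
            # The failure is logged but swallowed, so the flag stays False
            # and the caller has no way to know creation failed.
            logger.exception("session creation failed; will retry on next message")

    def run_conversation(self, message):
        self._ensure_db_session()
        # The loop runs whether or not the session row exists.
        return {
            "reply": f"echo: {message}",  # placeholder for the real agent loop
            "session_persisted": self._session_db_created,
        }


agent = Agent(FailingSessionDB(), "s1")
result = agent.run_conversation("hello")
```

In this sketch `run_conversation()` returns normally even though no session row was created. How the real code then loses the final response is not shown here and remains unconfirmed.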

Environment

  • Platform: macOS (WeChat gateway)
  • Hermes: latest from main (commit f98b5d0)
  • Python: 3.11
  • Config: gateway mode with WeChat adapter

Logs

2026-05-01 21:01:05,565 INFO gateway.run: response ready: platform=weixin chat=o9cq807w... time=943.0s api_calls=2 response=0 chars
2026-05-02 09:13:05,677 INFO gateway.run: response ready: platform=weixin chat=o9cq807w... time=5.2s api_calls=0 response=0 chars
2026-05-02 11:18:43,925 INFO gateway.run: response ready: platform=weixin chat=o9cq807w... time=788.9s api_calls=10 response=0 chars
2026-05-02 16:14:12,097 INFO gateway.run: response ready: platform=weixin chat=o9cq807w... time=227.6s api_calls=15 response=0 chars

Proposed Fix

_ensure_db_session() should not let the conversation continue silently when session creation fails. Either:

  1. Make _ensure_db_session() raise instead of silently catching (and let the caller handle it), OR
  2. Fall back to the old ensure_session() call in _flush_messages_or_raise as a safety net, OR
  3. Add a success flag check at the end of run_conversation() and return an error response if the session was never created
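A rough sketch of how options 1 and 2 could fit together, using the standard-library sqlite3 module. The `sessions` table schema, the `SessionCreationError` type, the function signatures, and the echo placeholder are assumptions made for illustration; the real Hermes schema and call sites may differ.

```python
import sqlite3


class SessionCreationError(RuntimeError):
    """Hypothetical error type raised when the session row cannot be created."""


def ensure_session(conn, session_id):
    """Option 2: idempotent insert, safe to call again on every flush."""
    conn.execute("INSERT OR IGNORE INTO sessions (id) VALUES (?)", (session_id,))
    conn.commit()


def ensure_db_session_or_raise(conn, session_id):
    """Option 1: surface the failure instead of logging and continuing."""
    try:
        ensure_session(conn, session_id)
    except sqlite3.Error as exc:
        raise SessionCreationError(
            f"could not create session {session_id!r}"
        ) from exc


def run_conversation(conn, session_id, message):
    """The caller turns a session failure into a visible error reply."""
    try:
        ensure_db_session_or_raise(conn, session_id)
    except SessionCreationError as exc:
        return f"Error: {exc}"
    return f"echo: {message}"  # placeholder for the real agent loop


# Assumed minimal schema, for the sketch only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY)")

ok_reply = run_conversation(conn, "s1", "hello")

# A connection without the table simulates a failing session store.
broken = sqlite3.connect(":memory:")
error_reply = run_conversation(broken, "s1", "hello")
```

Either way the user receives a non-empty reply: the normal answer when the row exists, or an explicit error when it cannot be created. Option 3 would apply the same idea at the end of `run_conversation()` by checking the success flag before returning.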

Labels

  • P1: High (major feature broken, no workaround)
  • comp/agent: Core agent loop, run_agent.py, prompt builder
  • comp/gateway: Gateway runner, session dispatch, delivery
  • type/bug: Something isn't working
