Bug Description
NOTE: This is Matrix gateway related but your bug report form does not have a Matrix gateway option.
When using a low-latency LLM (I was using gemini-3.1-flash-lite-preview), the Matrix gateway only responds with "response truncated due to output length limit," even when the actual content is well within the 4,000-character limit. I was using the Element X client on Android. This was really frustrating, as I could not self-diagnose from my phone while using Matrix and had to wait until I was back at my laptop to get access to hermes chat.
Steps to Reproduce
- Use a high-speed model (gemini-3.1-flash-lite-preview) and send the bot any message over Matrix; a simple "test" is enough.
- Observe that the bot sends a message, followed immediately by a redaction of system reactions.
- The gateway logs confirm successful delivery (sent event) followed by a redaction, but the user interface and logs report a delivery error.
- Comment out the redaction, then try again. The message comes through fine.
Expected Behavior
I expected to receive a non-error response to all my messages regardless of length. I even got this error when sending a simple "test" message.
Actual Behavior
"response truncated due to output length limit"
Affected Component
Gateway (Telegram/Discord/Slack/WhatsApp), Other
Messaging Platform (if gateway-related)
No response
Debug Report
I am not comfortable posting the full debug output, as a brief scan showed that some of those logs include PII.
I am a human writing this; this report is not clanker slop.
Operating System
Ubuntu 24.04.4 LTS (Noble Numbat)
Python Version
3.12.3
Hermes Version
v2026.4.30-161-gf98b5d00a
Additional Logs / Traceback (optional)
Root Cause Analysis (optional)
This seems like it may be an architectural race condition in gateway/platforms/matrix.py. The MatrixAdapter performs automated cleanup (redacting 👀 processing reactions and bot-seeded approval ✅ ❌ buttons) in tight succession with the actual message delivery.
With high-speed models, the redaction request reaches the Matrix homeserver before, or immediately alongside, the message delivery confirmation. This appears to trigger a false positive in the gateway's monitoring/tracking logic, which interprets the sudden "event missing" state (caused by the redaction) as a failure or truncation of the primary message delivery.
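To illustrate the ordering problem I suspect, here is a small standalone simulation. It is not the gateway's code: `fake_request` and `completion_order` are hypothetical names, and the latencies are made up. It only shows that when a send and a redaction are in flight at the same time, which one completes first depends purely on timing.

```python
import asyncio


async def fake_request(name: str, latency: float, log: list[str]) -> None:
    # Hypothetical stand-in for a homeserver round trip.
    await asyncio.sleep(latency)
    log.append(name)


async def completion_order(send_latency: float, redact_latency: float) -> list[str]:
    """Run a simulated send and redaction concurrently and record which finishes first."""
    log: list[str] = []
    await asyncio.gather(
        fake_request("send_confirmed", send_latency, log),
        fake_request("redaction_done", redact_latency, log),
    )
    return log


if __name__ == "__main__":
    # Redaction answered faster than the send confirmation.
    print(asyncio.run(completion_order(send_latency=0.05, redact_latency=0.01)))
```

If the tracking logic assumes the send confirmation always arrives first, the second ordering would be the one that breaks it.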
Proposed Fix (optional)
Decouple redaction logic from the immediate message delivery/processing loop. Implementing a delay of, say, 5-10 seconds before performing auto-redaction cleanup might allow the message delivery status to stabilize, preventing the race condition and the subsequent false-positive error reporting.
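A minimal sketch of the idea, assuming the adapter can expose the send and the cleanup as two separate coroutines. `send_then_cleanup`, `send_message`, `redact_events`, and `cleanup_delay` are hypothetical names for illustration, not the actual MatrixAdapter API:

```python
import asyncio
from typing import Awaitable, Callable


async def send_then_cleanup(
    send_message: Callable[[], Awaitable[str]],
    redact_events: Callable[[], Awaitable[None]],
    cleanup_delay: float = 5.0,
) -> tuple[str, asyncio.Task]:
    """Deliver the reply first, then schedule redaction cleanup in the background."""
    # Wait for the send to be confirmed before anything else happens.
    event_id = await send_message()

    async def delayed_cleanup() -> None:
        # Hold off so the delivery status can settle before redacting.
        await asyncio.sleep(cleanup_delay)
        await redact_events()

    # Cleanup runs as its own task, outside the delivery path.
    task = asyncio.create_task(delayed_cleanup())
    return event_id, task
```

The caller gets the event ID back as soon as delivery is confirmed, and the returned task can be awaited or cancelled on shutdown. A fixed delay is the simplest option; gating cleanup on an explicit delivery confirmation would avoid picking a magic number, but I have not checked whether the adapter has a hook for that.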
Are you willing to submit a PR for this?