
fix(api-server): typed Responses input items + duplicated history on chained turns #21963

Status: Open
WKHarmon wants to merge 2 commits into NousResearch:main from WKHarmon:fix/responses-input-parsing

Conversation


@WKHarmon commented May 8, 2026

What does this PR do?

Fixes two related bugs in /v1/responses request parsing that cause Open WebUI's stateful Responses-mode multi-turn chats to corrupt their stored conversation_history.

Bug 1: Typed Responses input items get coerced into {role: "user", content: ""} messages

The Responses API spec allows input[] to contain typed items: {type: "function_call", ...}, {type: "function_call_output", ...}, {type: "reasoning", ...}, {type: "message", role: ..., content: ...}. Open WebUI forwards prior assistant turns as a sequence of these typed items when chaining.
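To make the item shapes concrete, here is a minimal sketch of an `input[]` array as a Responses-mode connector might send it on a chained turn. The field values and tool name (`run_shell`, `call_1`) are illustrative, not taken from an actual Open WebUI payload:

```python
# Hypothetical chained-turn input[]: typed items per the Responses API spec.
chained_input = [
    {"type": "message", "role": "user", "content": "What is in my home folder?"},
    {"type": "function_call", "name": "run_shell", "call_id": "call_1",
     "arguments": '{"cmd": "ls ~"}'},
    {"type": "function_call_output", "call_id": "call_1",
     "output": "Documents  Downloads  notes.txt"},
    {"type": "message", "role": "assistant",
     "content": "Your home folder contains Documents, Downloads, and notes.txt."},
    {"type": "message", "role": "user", "content": "Which file is the largest?"},
]

# Only the "message" items carry a role/content pair; the function_call and
# function_call_output items have neither.
non_message = [i for i in chained_input if i.get("type") != "message"]
print(len(non_message))  # 2
```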

The current parser at gateway/platforms/api_server.py:2025 treats every dict the same way:

elif isinstance(item, dict):
    role = item.get("role", "user")
    ...
    input_messages.append({"role": role, "content": content})

function_call / function_call_output items have no role, so they default to user with empty content. They become spurious user-shaped history entries, bloating context and making the agent re-address old user questions.
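The coercion can be reproduced in isolation. The helper below mirrors the parser's per-item behavior described above; it is a standalone sketch, not the actual `api_server.py` code:

```python
# Sketch of the buggy per-item coercion: any dict without role/content
# falls back to a user message with empty content.
def parse_item_buggy(item: dict) -> dict:
    role = item.get("role", "user")       # function_call items have no role
    content = item.get("content", "")     # ...and no content
    return {"role": role, "content": content}

fc = {"type": "function_call", "name": "run_shell", "call_id": "call_1",
      "arguments": "{}"}
print(parse_item_buggy(fc))  # {'role': 'user', 'content': ''}
```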

Bug 2: previous_response_id + inlined history → duplicated transcript

When previous_response_id is set, the server loads stored prior history from the response store (api_server.py:2081). It then unconditionally appends input_messages[:-1] to that loaded history. Open WebUI's Responses-mode connector sends both previous_response_id and re-inlines the entire prior transcript in input[]. Result: every chained turn duplicates every prior turn in stored history; long chains grow exponentially.
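The guard the fix adds can be sketched as follows. Function and variable names here are illustrative, not the PR's actual identifiers:

```python
# Sketch of the Bug 2 guard: when history was already loaded from a prior
# source (previous_response_id or body.conversation_history), skip appending
# the client's inlined replay of earlier turns.
def merge_history(loaded_history, input_messages, history_was_loaded):
    history = list(loaded_history)
    if not history_was_loaded:
        # Fresh conversation: everything before the final item is history.
        history.extend(input_messages[:-1])
    # Otherwise the leading input items are a redundant client-side replay;
    # only the final (current-turn) message is new.
    return history

prior = [{"role": "user", "content": "q1"},
         {"role": "assistant", "content": "a1"}]
inlined = prior + [{"role": "user", "content": "q2"}]

print(len(merge_history(prior, inlined, history_was_loaded=True)))   # 2
print(len(merge_history([], inlined, history_was_loaded=False)))     # 2
```

Without the guard, the first call would return 4 entries (the stored pair plus its replay), and each subsequent chained turn would replay the now-doubled history again.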

This is distinct from #18995 / #21185 ("avoid duplicated Responses history"). That fix runs at storage time and inspects result["messages"] returned by the agent. It does not catch the input-side duplication that happens before the agent runs.

Related Issue

No existing issue — first reproduction is described below. Happy to file a bug-report issue separately if maintainers prefer.

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

  • gateway/platforms/api_server.py:
    • Bug 1 fix (lines 2018–2030): when iterating input[] dicts, skip items whose type is set to anything other than "message". Untyped role/content dicts (chat-style callers) still pass through unchanged.
    • Bug 2 fix (lines 2087–2102): track whether conversation_history was loaded from a prior source (body.conversation_history or previous_response_id). When it was, skip the for msg in input_messages[:-1]: conversation_history.append(msg) loop — those inlined items are a redundant client-side replay.
  • tests/gateway/test_api_server.py: three regression tests
    • test_responses_input_skips_function_call_items — Bug 1 coverage
    • test_previous_response_id_does_not_duplicate_inlined_history — Bug 2 coverage (previous_response_id path)
    • test_explicit_conversation_history_is_not_duplicated_by_input — Bug 2 coverage (body.conversation_history path)
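For readers who want the shape of the Bug 1 regression test without opening the test file, here is a self-contained sketch. The `parse_input_items` helper approximates the fixed parser's behavior; the real test in tests/gateway/test_api_server.py goes through the server's actual request plumbing:

```python
# Approximation of the fixed parser: drop typed non-"message" items,
# keep untyped role/content dicts (chat-style callers) unchanged.
def parse_input_items(items):
    messages = []
    for item in items:
        if isinstance(item, dict):
            if item.get("type") not in (None, "message"):
                continue  # function_call / function_call_output / reasoning
            messages.append({"role": item.get("role", "user"),
                             "content": item.get("content", "")})
    return messages

def test_responses_input_skips_function_call_items():
    items = [
        {"type": "message", "role": "user", "content": "hi"},
        {"type": "function_call", "call_id": "c1", "arguments": "{}"},
        {"type": "function_call_output", "call_id": "c1", "output": "ok"},
    ]
    assert parse_input_items(items) == [{"role": "user", "content": "hi"}]
```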

How to Test

Reproduction (before fix)

Connect Open WebUI in Responses mode (api_configs[N].api_type = "responses") to a Hermes api_server endpoint. Send two turns:

  1. "What is in my home folder?" (triggers a tool call)
  2. "Which file is the largest?" (context-dependent follow-up)

Observe the second turn's stored conversation_history in ~/.hermes/response_store.db:

[0] user: "What is in my home folder?"
[1] user: ""                                       ← function_call item, coerced
[2] user: ""                                       ← function_call_output item, coerced
[3] assistant: "Your home folder is /home/kyle..."
[4] user: "Which file is the largest?"
[5] user: "What is in my home folder?"             ← DUPLICATE of [0]
[6] assistant: "Your home folder is..."            ← DUPLICATE of [3]
[7] user: "Which file is the largest?"             ← DUPLICATE of [4]
[8-10] current turn's tool flow

hist_len = 11 for what should be a 5-message chain.

After fix

Same two-turn sequence; second turn stores:

[0] user: "What is in my home folder..."
[1] assistant: "Your home folder is /home/kyle..."
[2] user: "Which file is the largest?"

hist_len = 3. Empty user messages gone. No duplication.

Test commands

pytest tests/gateway/test_api_server.py -q

141/141 pass on my checkout (138 existing + 3 new). The three new tests fail on main and pass after this PR.

Checklist

Documentation & Housekeeping

  • No documentation changes needed — internal request-parsing fix
  • No config keys added/changed
  • No architecture changes
  • No cross-platform impact — pure Python request handler logic

WKHarmon added 2 commits May 8, 2026 09:21
When conversation_history is loaded from previous_response_id or
body.conversation_history, the input[] array's leading items are a
client-side replay of the same turns — appending them duplicates every
prior turn in stored history.

Open WebUI's Responses mode triggers this: it sends both
previous_response_id (which loads stored prior history) AND re-inlines
the entire prior transcript as typed message items in input[].  Without
this guard, every chained turn doubles conversation_history; long chains
grow exponentially.

This is distinct from NousResearch#18995 / NousResearch#21185 (which deduplicates result["messages"]
on the storage path).  That fix runs at storage time and inspects the
agent's returned transcript; this fix runs at request time and rejects
redundant inlined history before it ever reaches conversation_history.

Test: test_previous_response_id_does_not_duplicate_inlined_history,
      test_explicit_conversation_history_is_not_duplicated_by_input.
@alt-glitch added labels May 11, 2026: type/bug (Something isn't working), comp/gateway (Gateway runner, session dispatch, delivery), P2 (Medium: degraded but workaround exists)