[BUG] output_pydantic / response_model leaks into agent tool-calling loop, causing tools to be skipped on non-OpenAI LLMs #5472

@dnivra26

Description


Since v1.9.0, Task.output_pydantic is mapped to response_model and passed into the agent executor's tool-calling loop. This causes the LLM to receive both tools and response_format simultaneously on every iteration. For LLMs served via vLLM (and likely other OpenAI-compatible servers), this results in the model skipping tool calls entirely and returning structured JSON directly — tools never execute.

Prior to v1.9.0, output_pydantic was applied as a post-processing step via Task._export_output() / converter.py, which allowed tools to run freely before formatting the final answer. The v1.9.0 change ("Structured outputs and response_format support across providers") regressed this behavior.

Root Cause

In agent/core.py, create_agent_executor (line ~869) and _update_executor_parameters (line ~898) set:

response_model = task.response_model if task else None

This response_model is then passed to get_llm_response() on every iteration of the tool-calling loop in:

  • CrewAgentExecutor._invoke_loop_react() (crew_agent_executor.py:337-347)
  • CrewAgentExecutor._invoke_loop_native_tools() (crew_agent_executor.py:498-510)
  • AgentExecutor.call_llm_and_parse() (agent_executor.py:361-371)
  • AgentExecutor.call_llm_native_tools() (agent_executor.py:439-451)

When the LLM returns a valid BaseModel, the loop immediately exits via AgentFinish without ever parsing for tool calls.

Additionally, for LiteLLM-backed LLMs (is_litellm=True), the _handle_non_streaming_response() method (llm.py:1111-1160) routes directly to InternalInstructor when response_model is set — this makes a completely separate LLM call with no tools at all, bypassing the tool-calling mechanism entirely.

Steps to Reproduce

  1. Create a CrewAI Agent with tools and a non-OpenAI LLM (e.g., vLLM-served model)
  2. Create a Task with output_pydantic=MyOutput (any Pydantic model)
  3. Run the crew
  4. Observe that tool _run() methods are never called
  5. The agent returns structured JSON from the LLM's own knowledge instead of using tools

Minimal CrewAI reproduction

from crewai import Agent, Crew, LLM, Process, Task
from crewai.tools import BaseTool
from pydantic import BaseModel, Field

class AddInput(BaseModel):
    a: float = Field(..., description="First number")
    b: float = Field(..., description="Second number")

class AddTool(BaseTool):
    name: str = "Add Numbers"
    description: str = "Adds two numbers."
    args_schema: type[BaseModel] = AddInput  # annotation required for Pydantic v2 fields

    def _run(self, a: float, b: float) -> str:
        print(f"[AddTool] {a} + {b} = {a + b}")  # Never prints
        return str(a + b)

class MyOutput(BaseModel):
    answer: str = Field(..., description="The answer")

# Example config only: any OpenAI-compatible vLLM endpoint reproduces this;
# adjust the model name and base_url to your deployment.
llm = LLM(model="hosted_vllm/Qwen3.5-27B", base_url="http://localhost:8000/v1")

agent = Agent(
    role="Helper",
    goal="Use tools to answer questions.",
    backstory="You must use tools.",
    tools=[AddTool()],
    llm=llm,
    verbose=True,
)

task = Task(
    description="What is 42 + 58? Use the Add Numbers tool.",
    expected_output="The sum.",
    agent=agent,
    output_pydantic=MyOutput,  # This causes tools to be skipped
)

crew = Crew(agents=[agent], tasks=[task], process=Process.sequential, verbose=True)
result = crew.kickoff()
# AddTool._run() is NEVER called
# agent.tools_results is empty

Remove output_pydantic=MyOutput and tools work correctly.

Expected behavior

Task.output_pydantic should be applied as a post-processing step after the tool loop completes, not during it. The agent should:

  1. Run the tool-calling loop with tools only (no response_model / response_format)
  2. Get the raw text result from the agent
  3. Convert the raw result to the Pydantic model via Task._export_output() / converter.py (which already exists and works)

This was the behavior prior to v1.9.0, and tools were invoked correctly.
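
For illustration, a minimal sketch of that ordering, using plain Pydantic validation as a stand-in for the converter.py machinery (agent_executor.invoke is a hypothetical handle, not CrewAI's actual executor API):

from pydantic import BaseModel, Field

class MyOutput(BaseModel):
    answer: str = Field(..., description="The answer")

def finish_task(agent_executor, messages: list[dict]) -> MyOutput:
    # Steps 1-2: the tool-calling loop runs with tools only (no response_model),
    # so the model is free to emit tool calls, and returns raw text.
    raw_text: str = agent_executor.invoke(messages)
    # Step 3: structured conversion happens after the loop, the way
    # Task._export_output() / converter.py already handle it.
    return MyOutput.model_validate_json(raw_text)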

Screenshots/Code snippets

Verified code paths in CrewAI source (v0.2.149+)

1. response_model assignment in agent/core.py:

Line 869 (create_agent_executor):

self.agent_executor = self.executor_class(
    llm=cast(BaseLLM, self.llm),
    task=task,
    ...
    response_model=task.response_model if task else None,
)

Line 898 (_update_executor_parameters):

self.agent_executor.response_model = task.response_model if task else None

2. response_model passed on every loop iteration in crew_agent_executor.py:

_invoke_loop_react() (line ~344):

answer = get_llm_response(
    llm=self.llm,
    messages=self.messages,
    ...
    response_model=self.response_model,  # Passed every iteration
)

_invoke_loop_native_tools() (line ~507):

answer = get_llm_response(
    llm=self.llm,
    messages=self.messages,
    ...
    response_model=self.response_model,  # Passed every iteration
)

3. LiteLLM bypass in llm.py _handle_non_streaming_response() (line ~1134):

if response_model and self.is_litellm:
    from crewai.utilities.internal_instructor import InternalInstructor
    # ... makes separate LLM call with NO tools at all
    instructor_instance = InternalInstructor(
        content=combined_content,
        model=response_model,
        llm=self,
    )
    result = instructor_instance.to_pydantic()
    return structured_response  # RETURNS EARLY - BYPASSES TOOLS

4. Gemini already has a workaround for this exact conflict (gemini/completion.py:37):

STRUCTURED_OUTPUT_TOOL_NAME = "structured_output"
# When both tools and response_model are present, injects a pseudo-tool

curl proof (no CrewAI needed)

# WITH response_format — tools SKIPPED
curl -s "$VLLM_ENDPOINT/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "Qwen3.5-27B",
    "messages": [
      {"role": "system", "content": "You MUST use tools to answer."},
      {"role": "user", "content": "What is 42 + 58?"}
    ],
    "tools": [{"type": "function", "function": {"name": "add", "parameters": {"type": "object", "properties": {"a": {"type": "number"}, "b": {"type": "number"}}}}}],
    "response_format": {"type": "json_object"},
    "temperature": 0.1
  }'
# Result: {"result": 100}, tool_calls: []  <-- tools skipped

# WITHOUT response_format — tools CALLED correctly
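curl -s "$VLLM_ENDPOINT/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "Qwen3.5-27B",
    "messages": [
      {"role": "system", "content": "You MUST use tools to answer."},
      {"role": "user", "content": "What is 42 + 58?"}
    ],
    "tools": [{"type": "function", "function": {"name": "add", "parameters": {"type": "object", "properties": {"a": {"type": "number"}, "b": {"type": "number"}}}}}],
    "temperature": 0.1
  }'
# Same request with only the "response_format" field removed.
# Result: tool_calls contains a call to "add"  <-- tool invoked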

Operating System

macOS Sonoma

Python Version

3.12

crewAI Version

1.14.1 (confirmed), affects v1.9.0+

crewAI Tools Version

Latest

Virtual Environment

Venv

Evidence

Affected Providers

Any LLM provider where tools and response_format conflict:

  • vLLM (confirmed) — with tool_choice: "auto", response_format takes priority over tools
  • Likely any OpenAI-compatible server that doesn't natively support both simultaneously

Note: CrewAI already has a provider-specific workaround for Gemini via STRUCTURED_OUTPUT_TOOL_NAME in gemini/completion.py, acknowledging this exact conflict exists for other providers too.

Evidence from code inspection

  1. crew_agent_executor.py passes response_model=self.response_model on every loop iteration in both _invoke_loop_react() and _invoke_loop_native_tools()
  2. llm.py:_handle_non_streaming_response() early-returns via InternalInstructor when response_model is set for LiteLLM-backed models — tools are never invoked
  3. Post-processing conversion already exists in Task._export_output() (task.py:961-986) and converter.py — the infrastructure to apply output_pydantic after the tool loop is already in place
  4. When output_pydantic is removed from the Task, tools execute correctly

Possible Solution

Option A (recommended): Don't pass output_pydantic as response_model into the executor loop. Keep it only in Task._export_output() as a post-processing conversion step (pre-v1.9.0 behavior). This is how LangGraph handles it — tools run in the ReAct loop, structured formatting happens in a separate generate_structured_response node after the loop.

Option B: Generalize the Gemini STRUCTURED_OUTPUT_TOOL_NAME pattern for all providers. When both tools and response_model are needed, automatically inject a set_model_response tool and instruct the LLM to use it for the final answer. Remove response_format from the LLM call params. (This is what Google ADK does with set_model_response and what LangChain does with ToolStrategy.)
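
A rough sketch of what that generalization could look like at the call-preparation layer (illustrative only; the tool name set_model_response mirrors Google ADK, and none of this is current CrewAI API):

from pydantic import BaseModel

def inject_structured_output_tool(tools: list[dict], response_model: type[BaseModel]) -> list[dict]:
    """Append a pseudo-tool carrying the output schema; response_format stays unset."""
    structured_tool = {
        "type": "function",
        "function": {
            "name": "set_model_response",
            "description": "Call this exactly once with your final answer.",
            "parameters": response_model.model_json_schema(),
        },
    }
    # The schema travels as a tool; response_format is never sent to the provider.
    return tools + [structured_tool]

The loop would then validate a set_model_response tool call via response_model.model_validate(args) and treat it as AgentFinish, instead of ever setting response_format.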

Option C: Add a flag like structured_output_mode="post_process" | "native" on Task, defaulting to "post_process" for compatibility. "native" would use the current v1.9.0+ behavior for providers that support it (OpenAI, Anthropic).
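
Hypothetical usage if Option C were adopted (the structured_output_mode flag does not exist today):

task = Task(
    description="What is 42 + 58? Use the Add Numbers tool.",
    expected_output="The sum.",
    agent=agent,
    output_pydantic=MyOutput,
    structured_output_mode="post_process",  # proposed; "native" would keep v1.9.0+ behavior
)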

Additional context

  • Workaround 1: Don't use output_pydantic / response_model on Task or Agent. Parse the raw output manually.
  • Workaround 2 (ToolStrategy pattern): Expose the output schema as a tool (e.g., FinalAnswerTool). The model calls regular tools for work, then calls the schema-tool to submit structured results. No response_format needed (see the sketch after this list).
  • LLM tested: Qwen3.5-27B served via vLLM
  • LiteLLM: latest
  • The bug is a regression introduced in v1.9.0 with the "Structured outputs and response_format support across providers" change
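
Sketch of Workaround 2 from user space (names illustrative; the key is omitting output_pydantic so the loop never receives response_model):

from crewai.tools import BaseTool
from pydantic import BaseModel, Field

class MyOutput(BaseModel):
    answer: str = Field(..., description="The answer")

class FinalAnswerTool(BaseTool):
    name: str = "submit_final_answer"
    description: str = "Submit the final structured answer after using the other tools."
    args_schema: type[BaseModel] = MyOutput

    def _run(self, answer: str) -> str:
        # Returns a validated, serialized payload; parse it back out of the raw task output.
        return MyOutput(answer=answer).model_dump_json()

# Attach it to the agent alongside the work tools and omit output_pydantic on the Task:
# agent = Agent(..., tools=[AddTool(), FinalAnswerTool()])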
