Description
Since v1.9.0, Task.output_pydantic is mapped to response_model and passed into the agent executor's tool-calling loop. This causes the LLM to receive both tools and response_format simultaneously on every iteration. For LLMs served via vLLM (and likely other OpenAI-compatible servers), this results in the model skipping tool calls entirely and returning structured JSON directly — tools never execute.
Prior to v1.9.0, output_pydantic was applied as a post-processing step via Task._export_output() / converter.py, which allowed tools to run freely before formatting the final answer. The v1.9.0 change ("Structured outputs and response_format support across providers") regressed this behavior.
Root Cause
In agent/core.py, create_agent_executor (line ~869) and _update_executor_parameters (line ~898) set:
response_model = task.response_model if task else None
This response_model is then passed to get_llm_response() on every iteration of the tool-calling loop in:
CrewAgentExecutor._invoke_loop_react() (crew_agent_executor.py:337-347)
CrewAgentExecutor._invoke_loop_native_tools() (crew_agent_executor.py:498-510)
AgentExecutor.call_llm_and_parse() (agent_executor.py:361-371)
AgentExecutor.call_llm_native_tools() (agent_executor.py:439-451)
When the LLM returns a valid BaseModel, the loop immediately exits via AgentFinish without ever parsing for tool calls.
Additionally, for LiteLLM-backed LLMs (is_litellm=True), the _handle_non_streaming_response() method (llm.py:1111-1160) routes directly to InternalInstructor when response_model is set — this makes a completely separate LLM call with no tools at all, bypassing the tool-calling mechanism entirely.
Steps to Reproduce
- Create a CrewAI Agent with tools and a non-OpenAI LLM (e.g., a vLLM-served model)
- Create a Task with output_pydantic=MyOutput (any Pydantic model)
- Run the crew
- Observe that the tools' _run() methods are never called
- The agent returns structured JSON from the LLM's own knowledge instead of using tools
Minimal CrewAI reproduction
from crewai import Agent, Crew, LLM, Process, Task
from crewai.tools import BaseTool
from pydantic import BaseModel, Field


class AddInput(BaseModel):
    a: float = Field(..., description="First number")
    b: float = Field(..., description="Second number")


class AddTool(BaseTool):
    name: str = "Add Numbers"
    description: str = "Adds two numbers."
    args_schema: type[BaseModel] = AddInput

    def _run(self, a: float, b: float) -> str:
        print(f"[AddTool] {a} + {b} = {a + b}")  # Never prints
        return str(a + b)


class MyOutput(BaseModel):
    answer: str = Field(..., description="The answer")


# Any vLLM-served (OpenAI-compatible) model; endpoint, model name, and key are placeholders
llm = LLM(model="openai/Qwen3.5-27B", base_url="http://localhost:8000/v1", api_key="placeholder")

agent = Agent(
    role="Helper",
    goal="Use tools to answer questions.",
    backstory="You must use tools.",
    tools=[AddTool()],
    llm=llm,
    verbose=True,
)

task = Task(
    description="What is 42 + 58? Use the Add Numbers tool.",
    expected_output="The sum.",
    agent=agent,
    output_pydantic=MyOutput,  # This causes tools to be skipped
)

crew = Crew(agents=[agent], tasks=[task], process=Process.sequential, verbose=True)
result = crew.kickoff()

# AddTool._run() is NEVER called
# agent.tools_results is empty
Remove output_pydantic=MyOutput and tools work correctly.
Expected behavior
Task.output_pydantic should be applied as a post-processing step after the tool loop completes, not during it. The agent should:
- Run the tool-calling loop with tools only (no response_model / response_format)
- Get the raw text result from the agent
- Convert the raw result to the Pydantic model via Task._export_output() / converter.py (which already exists and works)
This was the behavior prior to v1.9.0, and tools were invoked correctly.
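For illustration, the same flow can be approximated at the user level today (a minimal sketch reusing MyOutput and crew from the reproduction above; it assumes the agent is prompted to answer in JSON, whereas Task._export_output() / converter.py performs this conversion more robustly inside CrewAI):

# Sketch: the tool loop runs with tools only; structured output is produced afterwards.
# (The Task is defined WITHOUT output_pydantic, so the executor never sees a response_model.)
result = crew.kickoff()

# Post-processing step, analogous in spirit to Task._export_output() before v1.9.0
structured = MyOutput.model_validate_json(result.raw)
print(structured.answer)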
Screenshots/Code snippets
Verified code paths in CrewAI source (v0.2.149+)
1. response_model assignment in agent/core.py:
Line 869 (create_agent_executor):
self.agent_executor = self.executor_class(
    llm=cast(BaseLLM, self.llm),
    task=task,
    ...
    response_model=task.response_model if task else None,
)
Line 898 (_update_executor_parameters):
self.agent_executor.response_model = task.response_model if task else None
2. response_model passed on every loop iteration in crew_agent_executor.py:
_invoke_loop_react() (line ~344):
answer = get_llm_response(
    llm=self.llm,
    messages=self.messages,
    ...
    response_model=self.response_model,  # Passed every iteration
)
_invoke_loop_native_tools() (line ~507):
answer = get_llm_response(
    llm=self.llm,
    messages=self.messages,
    ...
    response_model=self.response_model,  # Passed every iteration
)
3. LiteLLM bypass in llm.py _handle_non_streaming_response() (line ~1134):
if response_model and self.is_litellm:
    from crewai.utilities.internal_instructor import InternalInstructor
    # ... makes a separate LLM call with NO tools at all
    instructor_instance = InternalInstructor(
        content=combined_content,
        model=response_model,
        llm=self,
    )
    result = instructor_instance.to_pydantic()
    return structured_response  # RETURNS EARLY - BYPASSES TOOLS
4. Gemini already has a workaround for this exact conflict (gemini/completion.py:37):
STRUCTURED_OUTPUT_TOOL_NAME = "structured_output"
# When both tools and response_model are present, injects a pseudo-tool
curl proof (no CrewAI needed)
# WITH response_format — tools SKIPPED
curl -s "$VLLM_ENDPOINT/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "Qwen3.5-27B",
    "messages": [
      {"role": "system", "content": "You MUST use tools to answer."},
      {"role": "user", "content": "What is 42 + 58?"}
    ],
    "tools": [{"type": "function", "function": {"name": "add", "parameters": {"type": "object", "properties": {"a": {"type": "number"}, "b": {"type": "number"}}}}}],
    "response_format": {"type": "json_object"},
    "temperature": 0.1
  }'
# Result: {"result": 100}, tool_calls: [] <-- tools skipped
# WITHOUT response_format — tools CALLED correctly
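# Same request with "response_format" removed (payload otherwise identical);
# per the behavior described above, the model is expected to return tool_calls for "add" instead of raw JSON
curl -s "$VLLM_ENDPOINT/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "Qwen3.5-27B",
    "messages": [
      {"role": "system", "content": "You MUST use tools to answer."},
      {"role": "user", "content": "What is 42 + 58?"}
    ],
    "tools": [{"type": "function", "function": {"name": "add", "parameters": {"type": "object", "properties": {"a": {"type": "number"}, "b": {"type": "number"}}}}}],
    "temperature": 0.1
  }'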
Operating System
macOS Sonoma
Python Version
3.12
crewAI Version
1.14.1 (confirmed), affects v1.9.0+
crewAI Tools Version
Latest
Virtual Environment
Venv
Evidence
Affected Providers
Any LLM provider where tools and response_format conflict:
- vLLM (confirmed) — with tool_choice: "auto", response_format takes priority over tools
- Likely any OpenAI-compatible server that doesn't natively support both simultaneously
Note: CrewAI already has a provider-specific workaround for Gemini via STRUCTURED_OUTPUT_TOOL_NAME in gemini/completion.py, acknowledging this exact conflict exists for other providers too.
Evidence from code inspection
- crew_agent_executor.py passes response_model=self.response_model on every loop iteration in both _invoke_loop_react() and _invoke_loop_native_tools()
- llm.py:_handle_non_streaming_response() early-returns via InternalInstructor when response_model is set for LiteLLM-backed models — tools are never invoked
- Post-processing conversion already exists in Task._export_output() (task.py:961-986) and converter.py — the infrastructure to apply output_pydantic after the tool loop is already in place
- When output_pydantic is removed from the Task, tools execute correctly
Possible Solution
Option A (recommended): Don't pass output_pydantic as response_model into the executor loop. Keep it only in Task._export_output() as a post-processing conversion step (pre-v1.9.0 behavior). This is how LangGraph handles it — tools run in the ReAct loop, structured formatting happens in a separate generate_structured_response node after the loop.
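A minimal sketch of Option A against the call site quoted above (illustrative only, not a tested patch):

# agent/core.py — stop forwarding output_pydantic into the executor's tool loop
self.agent_executor = self.executor_class(
    llm=cast(BaseLLM, self.llm),
    task=task,
    ...
    response_model=None,  # tool loop runs with tools only
)
# Task._export_output() / converter.py then converts the raw answer to the
# Pydantic model after the loop finishes, as it did before v1.9.0.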
Option B: Generalize the Gemini STRUCTURED_OUTPUT_TOOL_NAME pattern for all providers. When both tools and response_model are needed, automatically inject a set_model_response tool and instruct the LLM to use it for the final answer. Remove response_format from the LLM call params. (This is what Google ADK does with set_model_response and what LangChain does with ToolStrategy.)
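An illustrative sketch of Option B at the LLM-call level (the helper and tool names here are hypothetical, not existing CrewAI APIs):

from pydantic import BaseModel

def inject_structured_output_tool(tools: list[dict], response_model: type[BaseModel]) -> list[dict]:
    """Expose the response model as one extra tool instead of sending response_format."""
    return tools + [{
        "type": "function",
        "function": {
            "name": "set_model_response",
            "description": "Submit the final structured answer. Call exactly once, after any other tools.",
            "parameters": response_model.model_json_schema(),
        },
    }]

# The chat completion request then carries only `tools` (no response_format).
# When the model calls set_model_response, its arguments are validated against
# response_model and the loop terminates with the structured result.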
Option C: Add a flag like structured_output_mode="post_process" | "native" on Task, defaulting to "post_process" for compatibility. "native" would use the current v1.9.0+ behavior for providers that support it (OpenAI, Anthropic).
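What Option C could look like on the Task API (the structured_output_mode parameter is hypothetical):

task = Task(
    description="What is 42 + 58? Use the Add Numbers tool.",
    expected_output="The sum.",
    agent=agent,
    output_pydantic=MyOutput,
    structured_output_mode="post_process",  # hypothetical flag; "native" would opt into the v1.9.0+ behavior
)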
Additional context
- Workaround 1: Don't use output_pydantic / response_model on Task or Agent. Parse the raw output manually.
- Workaround 2 (ToolStrategy pattern): Expose the output schema as a tool (e.g., a FinalAnswerTool; see the sketch after this list). The model calls regular tools for work, then calls the schema-tool to submit structured results. No response_format needed.
- LLM tested: Qwen3.5-27B served via vLLM
- LiteLLM: latest
- The bug is a regression introduced in v1.9.0 with the "Structured outputs and response_format support across providers" change
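A minimal sketch of the Workaround 2 pattern (the FinalAnswerTool name and wiring are illustrative; it reuses MyOutput, AddTool, and llm from the reproduction above):

from crewai import Agent
from crewai.tools import BaseTool
from pydantic import BaseModel

class FinalAnswerTool(BaseTool):
    name: str = "Final Answer"
    description: str = "Submit the final structured answer once all other tools have been used."
    args_schema: type[BaseModel] = MyOutput

    def _run(self, answer: str) -> str:
        # The structured result arrives here as validated tool arguments,
        # so no response_format ever reaches the LLM call.
        return answer

agent = Agent(
    role="Helper",
    goal="Use tools to answer questions, then submit the result via Final Answer.",
    backstory="You must use tools.",
    tools=[AddTool(), FinalAnswerTool()],
    llm=llm,
    verbose=True,
)
# The Task is defined WITHOUT output_pydantic; the structured payload is whatever
# the agent passed to Final Answer.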