[BUG] output_pydantic / response_model leaks into agent tool-calling loop, causing tools to be skipped on non-OpenAI LLMs #5472

@dnivra26

Description


Since v1.9.0, Task.output_pydantic is mapped to response_model and passed into the agent executor's tool-calling loop. This causes the LLM to receive both tools and response_format simultaneously on every iteration. For LLMs served via vLLM (and likely other OpenAI-compatible servers), this results in the model skipping tool calls entirely and returning structured JSON directly — tools never execute.

Prior to v1.9.0, output_pydantic was applied as a post-processing step via Task._export_output() / converter.py, which allowed tools to run freely before formatting the final answer. The v1.9.0 change ("Structured outputs and response_format support across providers") regressed this behavior.

Root Cause

In agent/core.py, create_agent_executor (line ~869) and _update_executor_parameters (line ~898) set:

response_model = task.response_model if task else None

This response_model is then passed to get_llm_response() on every iteration of the tool-calling loop in:

  • CrewAgentExecutor._invoke_loop_react() (crew_agent_executor.py:337-347)
  • CrewAgentExecutor._invoke_loop_native_tools() (crew_agent_executor.py:498-510)
  • AgentExecutor.call_llm_and_parse() (agent_executor.py:361-371)
  • AgentExecutor.call_llm_native_tools() (agent_executor.py:439-451)

When the LLM returns a valid BaseModel, the loop immediately exits via AgentFinish without ever parsing for tool calls.

Additionally, for LiteLLM-backed LLMs (is_litellm=True), the _handle_non_streaming_response() method (llm.py:1111-1160) routes directly to InternalInstructor when response_model is set — this makes a completely separate LLM call with no tools at all, bypassing the tool-calling mechanism entirely.

Steps to Reproduce

  1. Create a CrewAI Agent with tools and a non-OpenAI LLM (e.g., vLLM-served model)
  2. Create a Task with output_pydantic=MyOutput (any Pydantic model)
  3. Run the crew
  4. Observe that tool _run() methods are never called
  5. The agent returns structured JSON from the LLM's own knowledge instead of using tools

Minimal CrewAI reproduction

from crewai import Agent, Crew, LLM, Process, Task
from crewai.tools import BaseTool
from pydantic import BaseModel, Field

class AddInput(BaseModel):
    a: float = Field(..., description="First number")
    b: float = Field(..., description="Second number")

class AddTool(BaseTool):
    name: str = "Add Numbers"
    description: str = "Adds two numbers."
    args_schema: type[BaseModel] = AddInput  # annotation required for Pydantic v2 fields

    def _run(self, a: float, b: float) -> str:
        print(f"[AddTool] {a} + {b} = {a + b}")  # Never prints
        return str(a + b)

class MyOutput(BaseModel):
    answer: str = Field(..., description="The answer")

# Example config only: any OpenAI-compatible vLLM endpoint reproduces this;
# adjust the model name and base_url to your deployment.
llm = LLM(model="hosted_vllm/Qwen3.5-27B", base_url="http://localhost:8000/v1")

agent = Agent(
    role="Helper",
    goal="Use tools to answer questions.",
    backstory="You must use tools.",
    tools=[AddTool()],
    llm=llm,
    verbose=True,
)

task = Task(
    description="What is 42 + 58? Use the Add Numbers tool.",
    expected_output="The sum.",
    agent=agent,
    output_pydantic=MyOutput,  # This causes tools to be skipped
)

crew = Crew(agents=[agent], tasks=[task], process=Process.sequential, verbose=True)
result = crew.kickoff()
# AddTool._run() is NEVER called
# agent.tools_results is empty

Remove output_pydantic=MyOutput and tools work correctly.

Expected behavior

Task.output_pydantic should be applied as a post-processing step after the tool loop completes, not during it. The agent should:

  1. Run the tool-calling loop with tools only (no response_model / response_format)
  2. Get the raw text result from the agent
  3. Convert the raw result to the Pydantic model via Task._export_output() / converter.py (which already exists and works)

This was the behavior prior to v1.9.0, and tools were invoked correctly.
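
For illustration, a minimal sketch of that ordering, using plain Pydantic validation as a stand-in for the converter.py machinery (agent_executor.invoke is a hypothetical handle, not CrewAI's actual executor API):

from pydantic import BaseModel, Field

class MyOutput(BaseModel):
    answer: str = Field(..., description="The answer")

def finish_task(agent_executor, messages: list[dict]) -> MyOutput:
    # Steps 1-2: the tool-calling loop runs with tools only (no response_model),
    # so the model is free to emit tool calls, and returns raw text.
    raw_text: str = agent_executor.invoke(messages)
    # Step 3: structured conversion happens after the loop, the way
    # Task._export_output() / converter.py already handle it.
    return MyOutput.model_validate_json(raw_text)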

Screenshots/Code snippets

Verified code paths in CrewAI source (v0.2.149+)

1. response_model assignment in agent/core.py:

Line 869 (create_agent_executor):

self.agent_executor = self.executor_class(
    llm=cast(BaseLLM, self.llm),
    task=task,
    ...
    response_model=task.response_model if task else None,
)

Line 898 (_update_executor_parameters):

self.agent_executor.response_model = task.response_model if task else None

2. response_model passed on every loop iteration in crew_agent_executor.py:

_invoke_loop_react() (line ~344):

answer = get_llm_response(
    llm=self.llm,
    messages=self.messages,
    ...
    response_model=self.response_model,  # Passed every iteration
)

_invoke_loop_native_tools() (line ~507):

answer = get_llm_response(
    llm=self.llm,
    messages=self.messages,
    ...
    response_model=self.response_model,  # Passed every iteration
)

3. LiteLLM bypass in llm.py _handle_non_streaming_response() (line ~1134):

if response_model and self.is_litellm:
    from crewai.utilities.internal_instructor import InternalInstructor
    # ... makes separate LLM call with NO tools at all
    instructor_instance = InternalInstructor(
        content=combined_content,
        model=response_model,
        llm=self,
    )
    result = instructor_instance.to_pydantic()
    return structured_response  # RETURNS EARLY - BYPASSES TOOLS

4. Gemini already has a workaround for this exact conflict (gemini/completion.py:37):

STRUCTURED_OUTPUT_TOOL_NAME = "structured_output"
# When both tools and response_model are present, injects a pseudo-tool

curl proof (no CrewAI needed)

# WITH response_format — tools SKIPPED
curl -s "$VLLM_ENDPOINT/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "Qwen3.5-27B",
    "messages": [
      {"role": "system", "content": "You MUST use tools to answer."},
      {"role": "user", "content": "What is 42 + 58?"}
    ],
    "tools": [{"type": "function", "function": {"name": "add", "parameters": {"type": "object", "properties": {"a": {"type": "number"}, "b": {"type": "number"}}}}}],
    "response_format": {"type": "json_object"},
    "temperature": 0.1
  }'
# Result: {"result": 100}, tool_calls: []  <-- tools skipped

# WITHOUT response_format — tools CALLED correctly
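curl -s "$VLLM_ENDPOINT/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "Qwen3.5-27B",
    "messages": [
      {"role": "system", "content": "You MUST use tools to answer."},
      {"role": "user", "content": "What is 42 + 58?"}
    ],
    "tools": [{"type": "function", "function": {"name": "add", "parameters": {"type": "object", "properties": {"a": {"type": "number"}, "b": {"type": "number"}}}}}],
    "temperature": 0.1
  }'
# Same request with only the "response_format" field removed.
# Result: tool_calls contains a call to "add"  <-- tool invoked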

Operating System

macOS Sonoma

Python Version

3.12

crewAI Version

1.14.1 (confirmed), affects v1.9.0+

crewAI Tools Version

Latest

Virtual Environment

Venv

Evidence

Affected Providers

Any LLM provider where tools and response_format conflict:

  • vLLM (confirmed) — with tool_choice: "auto", response_format takes priority over tools
  • Likely any OpenAI-compatible server that doesn't natively support both simultaneously

Note: CrewAI already has a provider-specific workaround for Gemini via STRUCTURED_OUTPUT_TOOL_NAME in gemini/completion.py, acknowledging this exact conflict exists for other providers too.

Evidence from code inspection

  1. crew_agent_executor.py passes response_model=self.response_model on every loop iteration in both _invoke_loop_react() and _invoke_loop_native_tools()
  2. llm.py:_handle_non_streaming_response() early-returns via InternalInstructor when response_model is set for LiteLLM-backed models — tools are never invoked
  3. Post-processing conversion already exists in Task._export_output() (task.py:961-986) and converter.py — the infrastructure to apply output_pydantic after the tool loop is already in place
  4. When output_pydantic is removed from the Task, tools execute correctly

Possible Solution

Option A (recommended): Don't pass output_pydantic as response_model into the executor loop. Keep it only in Task._export_output() as a post-processing conversion step (pre-v1.9.0 behavior). This is how LangGraph handles it — tools run in the ReAct loop, structured formatting happens in a separate generate_structured_response node after the loop.

Option B: Generalize the Gemini STRUCTURED_OUTPUT_TOOL_NAME pattern for all providers. When both tools and response_model are needed, automatically inject a set_model_response tool and instruct the LLM to use it for the final answer. Remove response_format from the LLM call params. (This is what Google ADK does with set_model_response and what LangChain does with ToolStrategy.)
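
A rough sketch of what that generalization could look like at the call-preparation layer (illustrative only; the tool name set_model_response mirrors Google ADK, and none of this is current CrewAI API):

from pydantic import BaseModel

def inject_structured_output_tool(tools: list[dict], response_model: type[BaseModel]) -> list[dict]:
    """Append a pseudo-tool carrying the output schema; response_format stays unset."""
    structured_tool = {
        "type": "function",
        "function": {
            "name": "set_model_response",
            "description": "Call this exactly once with your final answer.",
            "parameters": response_model.model_json_schema(),
        },
    }
    # The schema travels as a tool; response_format is never sent to the provider.
    return tools + [structured_tool]

The loop would then validate a set_model_response tool call via response_model.model_validate(args) and treat it as AgentFinish, instead of ever setting response_format.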

Option C: Add a flag like structured_output_mode="post_process" | "native" on Task, defaulting to "post_process" for compatibility. "native" would use the current v1.9.0+ behavior for providers that support it (OpenAI, Anthropic).
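
Hypothetical usage if Option C were adopted (the structured_output_mode flag does not exist today):

task = Task(
    description="What is 42 + 58? Use the Add Numbers tool.",
    expected_output="The sum.",
    agent=agent,
    output_pydantic=MyOutput,
    structured_output_mode="post_process",  # proposed; "native" would keep v1.9.0+ behavior
)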

Additional context

  • Workaround 1: Don't use output_pydantic / response_model on Task or Agent. Parse the raw output manually.
  • Workaround 2 (ToolStrategy pattern): Expose the output schema as a tool (e.g., FinalAnswerTool). The model calls regular tools for work, then calls the schema-tool to submit structured results. No response_format needed (see the sketch after this list).
  • LLM tested: Qwen3.5-27B served via vLLM
  • LiteLLM: latest
  • The bug is a regression introduced in v1.9.0 with the "Structured outputs and response_format support across providers" change
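
Sketch of Workaround 2 from user space (names illustrative; the key is omitting output_pydantic so the loop never receives response_model):

from crewai.tools import BaseTool
from pydantic import BaseModel, Field

class MyOutput(BaseModel):
    answer: str = Field(..., description="The answer")

class FinalAnswerTool(BaseTool):
    name: str = "submit_final_answer"
    description: str = "Submit the final structured answer after using the other tools."
    args_schema: type[BaseModel] = MyOutput

    def _run(self, answer: str) -> str:
        # Returns a validated, serialized payload; parse it back out of the raw task output.
        return MyOutput(answer=answer).model_dump_json()

# Attach it to the agent alongside the work tools and omit output_pydantic on the Task:
# agent = Agent(..., tools=[AddTool(), FinalAnswerTool()])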
