Bug Report: Server tool input reconstruction missing in streaming
Summary
The SDK correctly reconstructs input fields for tool_use blocks during streaming via input_json_delta events, but fails to do the same for server_tool_use blocks (e.g., the code execution tool). This creates inconsistent behavior between client and server tools and breaks legitimate use cases such as extracting code from streaming responses. If confirmed, I can submit a PR.
Use Case & Context
I was building a math solver application that uses Claude's code execution tool with streaming for better user experience. The application needed to:
- Stream the response for real-time feedback
- Extract the executed code blocks for logging/analysis
- Provide a smooth user experience with both streaming and code extraction
Problem Description
When using streaming with server tools (like code_execution_20250522), the final message contains server_tool_use blocks with empty input dictionaries, making it impossible to extract the actual code that was executed.
Expected Behavior
# After streaming completion
for item in final_message.content:
    if item.type == "server_tool_use" and item.name == "code_execution":
        print(item.input)  # Should contain: {"code": "print(2 + 2)"}
Actual Behavior
# After streaming completion
for item in final_message.content:
    if item.type == "server_tool_use" and item.name == "code_execution":
        print(item.input)  # Actually contains: {}
Investigation & Root Cause
What We Tried
- Non-streaming vs Streaming comparison: Non-streaming works perfectly, streaming fails
- Different streaming approaches: Both simple text_stream and complex event handling fail
- current_message_snapshot inspection: Same empty inputs (it's the same object as get_final_message())
- Manual delta reconstruction: Successfully implemented by tracking input_json_delta events
- Client vs Server tool comparison: Client tools work, server tools don't
Root Cause in SDK Source Code
Found in src/anthropic/lib/streaming/_messages.py, line 431:
elif event.delta.type == "input_json_delta":
    if content.type == "tool_use":  # ← Only handles CLIENT tools
        from jiter import from_json

        # JSON reconstruction logic...
        json_buf = cast(bytes, getattr(content, JSON_BUF_PROPERTY, b""))
        json_buf += bytes(event.delta.partial_json, "utf-8")

        if json_buf:
            content.input = from_json(json_buf, partial_mode=True)

        setattr(content, JSON_BUF_PROPERTY, json_buf)
    # Missing: elif content.type == "server_tool_use": block
The SDK only reconstructs inputs for tool_use (client tools), completely ignoring server_tool_use (server tools).
Reproduction Steps
Minimal Test Case
import os

from anthropic import Anthropic

client = Anthropic(
    api_key=os.getenv("ANTHROPIC_API_KEY"),
    default_headers={"anthropic-beta": "code-execution-2025-05-22"}
)

# Test with streaming
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Calculate 2+2 using Python"}],
    tools=[{"type": "code_execution_20250522", "name": "code_execution"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

    final_message = stream.get_final_message()

for item in final_message.content:
    if item.type == "server_tool_use":
        print(f"\nServer tool input: {item.input}")  # Shows: {}

# Compare with non-streaming (works correctly)
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Calculate 2+2 using Python"}],
    tools=[{"type": "code_execution_20250522", "name": "code_execution"}],
)

for item in response.content:
    if item.type == "server_tool_use":
        print(f"Non-streaming input: {item.input}")  # Shows: {"code": "print(2 + 2)"}
Client vs Server Tool Comparison
# CLIENT TOOL (works with streaming)
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "name": "get_weather",
        "description": "Get weather",
        "input_schema": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"]
        }
    }],
    tool_choice={"type": "tool", "name": "get_weather"}
) as stream:
    # get_final_message() reads the stream to completion before returning
    final_message = stream.get_final_message()

for item in final_message.content:
    if item.type == "tool_use":
        print(item.input)  # ✅ Shows: {"location": "Paris"}

# SERVER TOOL (broken with streaming)
# Same request as in the minimal test case, using the code_execution tool
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Calculate 2+2 using Python"}],
    tools=[{"type": "code_execution_20250522", "name": "code_execution"}],
) as stream:
    final_message = stream.get_final_message()

for item in final_message.content:
    if item.type == "server_tool_use":
        print(item.input)  # ❌ Shows: {}
Evidence from API Documentation
The official streaming documentation clearly shows that input_json_delta events are sent for server tools:
// Code execution streaming example from docs
event: content_block_delta
data: {"type": "content_block_delta", "index": 1, "delta": {"type": "input_json_delta", "partial_json": "{\"code\":\"import pandas as pd\\ndf = pd.read_csv('data.csv')\\nprint(df.head())\""}}
The API sends the data, but the SDK ignores it for server tools.
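To make the shape of that event concrete, here is a small sketch that decodes the data payload quoted above using only the standard library. It shows that partial_json is itself a fragment of a JSON object rather than a complete document, which is why fragments have to be accumulated before they can be parsed. The closing brace appended at the end is an assumption for illustration; the excerpt does not show the later deltas.

```python
import json

# The `data:` payload from the documentation excerpt above, copied verbatim.
sse_data = r'''{"type": "content_block_delta", "index": 1, "delta": {"type": "input_json_delta", "partial_json": "{\"code\":\"import pandas as pd\\ndf = pd.read_csv('data.csv')\\nprint(df.head())\""}}'''

event = json.loads(sse_data)
fragment = event["delta"]["partial_json"]

# The fragment on its own is not valid JSON: the object is still open.
try:
    json.loads(fragment)
    fragment_is_complete = True
except json.JSONDecodeError:
    fragment_is_complete = False

# Assumption for illustration only: a later delta would close the object.
completed = json.loads(fragment + "}")
print(completed["code"])
```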
Impact
This affects any application that needs to:
- Extract executed code for logging/analysis
- Build debugging tools for AI code execution
- Implement code history/replay features
- Provide transparency about what code was run
- Create educational tools showing step-by-step code execution
Recommended Fix
Extend the existing reconstruction logic to handle server tools:
elif event.delta.type == "input_json_delta":
    if content.type == "tool_use":
        # existing client tool logic
        from jiter import from_json

        json_buf = cast(bytes, getattr(content, JSON_BUF_PROPERTY, b""))
        json_buf += bytes(event.delta.partial_json, "utf-8")

        if json_buf:
            content.input = from_json(json_buf, partial_mode=True)

        setattr(content, JSON_BUF_PROPERTY, json_buf)
    elif content.type == "server_tool_use":  # ← Add this block
        # Same reconstruction logic for server tools
        from jiter import from_json

        json_buf = cast(bytes, getattr(content, JSON_BUF_PROPERTY, b""))
        json_buf += bytes(event.delta.partial_json, "utf-8")

        if json_buf:
            content.input = from_json(json_buf, partial_mode=True)

        setattr(content, JSON_BUF_PROPERTY, json_buf)
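Since the two branches in the proposed fix are identical, one alternative might be a single membership test covering both block types. The sketch below is a self-contained simulation of that idea, not the SDK code: TOOL_BLOCK_TYPES, JSON_BUF_ATTR, and apply_input_json_delta are hypothetical names, and the standard library's json.loads stands in for jiter's partial-mode parsing, so input only updates once the buffer holds complete JSON.

```python
import json
from types import SimpleNamespace

# Block types whose `input` arrives as input_json_delta fragments.
TOOL_BLOCK_TYPES = ("tool_use", "server_tool_use")
JSON_BUF_ATTR = "_json_buf"  # hypothetical stand-in for the SDK's JSON_BUF_PROPERTY


def apply_input_json_delta(content, partial_json):
    """Append one fragment to the block's buffer and refresh content.input."""
    if content.type not in TOOL_BLOCK_TYPES:
        return  # other block types carry no JSON input
    buf = getattr(content, JSON_BUF_ATTR, "") + partial_json
    setattr(content, JSON_BUF_ATTR, buf)
    try:
        content.input = json.loads(buf)
    except json.JSONDecodeError:
        pass  # buffer is still incomplete; keep the previous input


# Simulated server tool block receiving its input in two fragments.
block = SimpleNamespace(type="server_tool_use", name="code_execution", input={})
for fragment in ['{"code": "pri', 'nt(2 + 2)"}']:
    apply_input_json_delta(block, fragment)

print(block.input)
```

With a shared branch, any future change to the buffering logic would apply to client and server tools alike.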
Workaround (Manual Implementation)
We successfully implemented manual delta tracking as a workaround:
def extract_code_blocks_streaming_fixed(response):
    """Working code extraction with manual delta reconstruction."""
    code_blocks = []
    accumulated_deltas = {}  # Track by content block index

    # During streaming, accumulate input_json_delta events
    # Then manually parse and reconstruct after completion
    # (Full implementation available if needed)

    return code_blocks
But this should not be necessary - the SDK should handle this automatically like it does for client tools.
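For readers who need the workaround in the meantime, here is one possible shape for it. This is a minimal sketch, not the implementation referred to above: extract_server_tool_inputs is a hypothetical helper, and it assumes events expose type, index, content_block, and delta.partial_json as in the raw streaming event format. The events fed to it here are hand-built stand-ins so the example runs without an API call.

```python
import json
from types import SimpleNamespace


def extract_server_tool_inputs(events):
    """Rebuild server_tool_use inputs from stream events, keyed by block index."""
    buffers = {}  # block index -> accumulated partial JSON text
    inputs = {}   # block index -> parsed input dict
    for event in events:
        if event.type == "content_block_start":
            if event.content_block.type == "server_tool_use":
                buffers[event.index] = ""
        elif event.type == "content_block_delta":
            if event.delta.type == "input_json_delta" and event.index in buffers:
                buffers[event.index] += event.delta.partial_json
        elif event.type == "content_block_stop":
            if event.index in buffers:
                raw = buffers.pop(event.index)
                inputs[event.index] = json.loads(raw) if raw else {}
    return inputs


# Hand-built events imitating a code execution call split across two deltas.
ns = SimpleNamespace
fake_events = [
    ns(type="content_block_start", index=1,
       content_block=ns(type="server_tool_use", name="code_execution")),
    ns(type="content_block_delta", index=1,
       delta=ns(type="input_json_delta", partial_json='{"code": "pri')),
    ns(type="content_block_delta", index=1,
       delta=ns(type="input_json_delta", partial_json='nt(2 + 2)"}')),
    ns(type="content_block_stop", index=1),
]

result = extract_server_tool_inputs(fake_events)
print(result)
```

In a real application the same function could in principle be fed the events obtained by iterating over the stream object inside the with block; event types it does not recognize are skipped.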
Environment
- anthropic-sdk-python: Latest version
- Python: 3.9+
- Model: claude-sonnet-4-20250514
- Tool: code_execution_20250522
Conclusion
This appears to be an oversight in the SDK implementation rather than intentional design. The API sends input_json_delta events for server tools, the documentation shows examples of it, and the SDK already has the reconstruction logic - it just doesn't apply it consistently to both tool types.
The fix would be minimal, low-risk, and would restore API consistency while enabling legitimate use cases.