Korean README: see README_ko.md
A Flask-based proxy server that enables seamless integration of AI models with different tool call formats by automatically converting them to OpenAI's standard format. Perfect for using models like GLM with OpenAI-compatible clients.
I don't generally do vibe coding, but this seemed like a decent place to get my feet wet. I've added Qwen3-Coder and Qwen3 parsing through codex and GPT-OSS-20B. It'll also pick up the .env file, which is nice.
Features:

- Automatic Tool Call Conversion: Converts model-specific tool call formats (like GLM's `<tool_call>` syntax) to OpenAI's standard format
- Streaming Support: Full support for both streaming and non-streaming responses
- Model-Specific Handling: Modular converter system that automatically detects and handles different model formats
- Full OpenAI API Compatibility: Supports all major endpoints (`/v1/chat/completions`, `/v1/completions`, `/v1/models`, `/v1/embeddings`)
- Configurable: Environment-based configuration for easy deployment
- Extensible: Easy to add support for new model formats
Requirements:

- Python 3.7+
- A running AI model server (LM Studio, Ollama, etc.)
Quick start:

- Clone and set up:

```bash
git clone <repository-url>
cd proxy
pip install -r requirements.txt
```
- Configure your backend (optional):

```bash
# Set environment variables or modify config.py
export BACKEND_HOST=localhost
export BACKEND_PORT=8888
```
- Start the proxy:

```bash
python app.py
```
- Use with any OpenAI-compatible client:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5000", api_key="your-key")
response = client.chat.completions.create(
    model="glm-4.5-air-hi-mlx",
    messages=[{"role": "user", "content": "Search for information about Python"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "search_wikipedia",
            "description": "Search Wikipedia",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"]
            }
        }
    }]
)
```
Supported models:

| Model Type | Tool Call Format | Status |
|---|---|---|
| GLM Models | `<tool_call>` syntax | ✅ Full Support |
| OpenAI Models | Standard format | ✅ Pass-through |
| Claude Models | `<invoke>` syntax | ✅ Example Implementation |
Input (GLM format):
```
I'll search for that information.
<tool_call>fetch_wikipedia_content
<arg_key>search_query</arg_key>
<arg_value>Python programming</arg_value>
</tool_call>
```
Output (OpenAI format):
```json
{
  "tool_calls": [{
    "id": "123456789",
    "type": "function",
    "function": {
      "name": "fetch_wikipedia_content",
      "arguments": "{\"search_query\": \"Python programming\"}"
    }
  }],
  "finish_reason": "tool_calls"
}
```

Configuration environment variables:

| Variable | Default | Description |
|---|---|---|
| `BACKEND_HOST` | `localhost` | Backend server hostname |
| `BACKEND_PORT` | `8888` | Backend server port |
| `BACKEND_PROTOCOL` | `http` | Backend protocol |
| `PROXY_HOST` | `0.0.0.0` | Proxy server bind address |
| `PROXY_PORT` | `5000` | Proxy server port |
| `REQUEST_TIMEOUT` | `3600` | Regular request timeout (seconds) |
| `STREAMING_TIMEOUT` | `3600` | Streaming request timeout (seconds; use `none` to disable) |
| `ENABLE_TOOL_CALL_CONVERSION` | `true` | Enable/disable tool call conversion |
| `REMOVE_THINK_TAGS` | `true` | Remove complete `<think>...</think>` blocks from responses |
| `LOG_LEVEL` | `INFO` | Logging level |
| `FLASK_ENV` | `development` | Environment (development/production/testing) |
Create a .env file or modify config.py directly:
```python
# config.py
BACKEND_HOST = 'localhost'
BACKEND_PORT = 8888
PROXY_PORT = 5000
ENABLE_TOOL_CALL_CONVERSION = True
REMOVE_THINK_TAGS = True  # Set to False to preserve <think> content
```
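Since the proxy also picks up a `.env` file, an equivalent `.env` using the defaults from the table above could look like this (a sketch; trim it to just the values you want to override):

```
# .env -- example values matching the defaults in the table above
BACKEND_HOST=localhost
BACKEND_PORT=8888
BACKEND_PROTOCOL=http
PROXY_HOST=0.0.0.0
PROXY_PORT=5000
REQUEST_TIMEOUT=3600
STREAMING_TIMEOUT=3600
ENABLE_TOOL_CALL_CONVERSION=true
REMOVE_THINK_TAGS=true
LOG_LEVEL=INFO
FLASK_ENV=development
```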
Predefined backend presets are available via `get_backend_config`:

```python
from config import get_backend_config

# Use with LM Studio
lmstudio_config = get_backend_config('lmstudio')   # localhost:8888

# Use with Ollama
ollama_config = get_backend_config('ollama')       # localhost:11434

# Use with OpenAI API
openai_config = get_backend_config('openai')       # api.openai.com:443
```
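The layout of these presets isn't shown in this README; purely as an illustration, and assuming a simple dict of named backends (`BACKEND_PRESETS` and the function body here are hypothetical, not the actual `config.py`), they could look like:

```python
# Hypothetical sketch only -- the real config.py may organize this differently.
BACKEND_PRESETS = {
    'lmstudio': {'host': 'localhost',      'port': 8888,  'protocol': 'http'},
    'ollama':   {'host': 'localhost',      'port': 11434, 'protocol': 'http'},
    'openai':   {'host': 'api.openai.com', 'port': 443,   'protocol': 'https'},
}

def get_backend_config(name: str) -> dict:
    """Return the host/port/protocol settings for a named backend preset."""
    return BACKEND_PRESETS[name]
```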
Architecture overview:

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│     Client      │────▶│   Proxy Server   │────▶│  Backend Model  │
│  (OpenAI API)   │     │                  │     │   (GLM/etc.)    │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                  │
                                  ▼
                        ┌────────────────────┐
                        │ Converter Factory  │
                        │ - GLM Converter    │
                        │ - OpenAI Converter │
                        │ - Claude Converter │
                        │ - Custom Converter │
                        └────────────────────┘
```
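The Converter Factory in the diagram picks a converter based on the requested model name. A minimal sketch of that dispatch, assuming the `register_converter`/`can_handle_model` interface shown in the next section (`get_converter` and the internals here are assumptions, not the actual `factory.py`):

```python
# Hypothetical sketch of the factory's dispatch -- the real factory.py may differ.
class ConverterFactory:
    def __init__(self):
        self._converters = []

    def register_converter(self, converter) -> None:
        # Converters are tried in registration order.
        self._converters.append(converter)

    def get_converter(self, model_name: str):
        # Return the first registered converter that claims the model;
        # None means no conversion (pass-through).
        for converter in self._converters:
            if converter.can_handle_model(model_name):
                return converter
        return None


converter_factory = ConverterFactory()
```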
To add support for a new model format:

- Create a converter:

```python
# converters/mymodel.py
from typing import Dict, List

from .base import ToolCallConverter


class MyModelConverter(ToolCallConverter):
    def can_handle_model(self, model_name: str) -> bool:
        return 'mymodel' in model_name.lower()

    def parse_tool_calls(self, content: str) -> List[Dict]:
        # Your parsing logic here
        return []
```
- Register the converter:

```python
# In factory.py or at runtime
from converters.factory import converter_factory

converter_factory.register_converter(MyModelConverter())
```
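As a concrete illustration of what a parser can return, here is a hedged sketch of `parse_tool_calls` for the GLM-style `<tool_call>` blocks shown earlier. The output shape matches the OpenAI-format example above; the regexes and id scheme are assumptions, not the shipped GLM converter:

```python
# Hypothetical sketch -- the shipped GLM converter may differ. Inside a converter
# class this would be a method taking (self, content).
import json
import re
import uuid
from typing import Dict, List

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
ARG_RE = re.compile(r"<arg_key>(.*?)</arg_key>\s*<arg_value>(.*?)</arg_value>", re.DOTALL)


def parse_tool_calls(content: str) -> List[Dict]:
    tool_calls = []
    for block in TOOL_CALL_RE.findall(content):
        # The first line of the block is the function name.
        name = block.strip().splitlines()[0].strip()
        arguments = {k.strip(): v.strip() for k, v in ARG_RE.findall(block)}
        tool_calls.append({
            "id": uuid.uuid4().hex[:12],             # any unique id is fine
            "type": "function",
            "function": {
                "name": name,
                "arguments": json.dumps(arguments),  # OpenAI expects a JSON string
            },
        })
    return tool_calls
```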
Available endpoints:

- `POST /v1/chat/completions` - Chat completions with tool call conversion
- `POST /chat/completions` - Alternative endpoint
- `GET /v1/models` - List available models
- `POST /v1/completions` - Text completions
- `POST /v1/embeddings` - Text embeddings
- `GET /health` - Health check
Example request:

```bash
curl -X POST http://localhost:5000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "glm-4.5-air-hi-mlx",
"messages": [
{"role": "user", "content": "Search for Python tutorials"}
],
"tools": [
{
"type": "function",
"function": {
"name": "web_search",
"description": "Search the web",
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string"}
},
"required": ["query"]
}
}
}
],
"stream": false
}'
```

Run the test suite:

```bash
# Test modular converters
python test_modular_converters.py
# Test tool call conversion
python test_tool_call_real.py
# Test streaming functionality
python test_streaming_tools.py
# Test full API compatibility
python test_full_api.py
```

Manual end-to-end test:

```bash
# Start your backend model server (e.g., LM Studio on port 8888)
# Start the proxy server
python app.py
# Test with the example client
python lmstudio-tooluse-test.py
```

Troubleshooting:

- Connection Refused:

```bash
# Check if backend server is running
curl http://localhost:8888/v1/models

# Check proxy server
curl http://localhost:5000/health
```
- Tool Calls Not Converting:

```bash
# Check if conversion is enabled
export ENABLE_TOOL_CALL_CONVERSION=true

# Check model detection
# Make sure your model name matches the converter patterns
```
- Import Errors:

```bash
# Make sure you're in the correct directory
cd /path/to/proxy
python app.py
```
Run with debug logging:

```bash
export FLASK_ENV=development
export LOG_LEVEL=DEBUG
python app.py
```

Contributing:

- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Add tests for your changes
- Commit your changes: `git commit -m 'Add amazing feature'`
- Push to the branch: `git push origin feature/amazing-feature`
- Open a Pull Request
We welcome contributions for new model formats! Please:
- Create a converter in `converters/`
- Add comprehensive tests
- Update documentation
- Submit a PR with examples
This project is licensed under the MIT License - see the LICENSE file for details.
The proxy server can handle GLM model `<think>` tags for better response formatting. Set the `REMOVE_THINK_TAGS` environment variable to control the behavior:

- `REMOVE_THINK_TAGS=true` (default): Remove complete `<think>...</think>` blocks
- `REMOVE_THINK_TAGS=false`: Preserve `<think>` content for debugging/transparency
With REMOVE_THINK_TAGS=true (default):
Input: "I need to analyze this. <think>Let me think step by step...</think> Here's my answer."
Output: "I need to analyze this. Here's my answer."
With REMOVE_THINK_TAGS=false:
Input: "I need to analyze this. <think>Let me think step by step...</think> Here's my answer."
Output: "I need to analyze this. Let me think step by step... Here's my answer."
Note: Malformed/orphaned think tags are always cleaned up regardless of the setting:
- `</think>` without an opening tag → removed
- `<think>` without a closing tag → removed
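For illustration, the removal behavior described above can be expressed with a simple regex. This is a sketch of the documented behavior, not necessarily the proxy's actual implementation:

```python
# Sketch of the behavior described above (illustrative; the proxy's actual
# implementation may differ).
import re


def handle_think_tags(content: str, remove_think_tags: bool = True) -> str:
    if remove_think_tags:
        # REMOVE_THINK_TAGS=true: drop complete <think>...</think> blocks entirely.
        content = re.sub(r"<think>.*?</think>\s*", "", content, flags=re.DOTALL)
    # REMOVE_THINK_TAGS=false keeps the content, and malformed/orphaned tags are
    # always cleaned up, so any remaining bare tags are stripped either way.
    return content.replace("<think>", "").replace("</think>", "")
```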
The proxy server supports different timeout settings for regular and streaming requests:
- `REQUEST_TIMEOUT=3600`: Regular request timeout (60 minutes)
- `STREAMING_TIMEOUT=3600`: Streaming request timeout (60 minutes)
For long-running streaming requests, you can disable the timeout:
```bash
export STREAMING_TIMEOUT=none
# or
export STREAMING_TIMEOUT=0
# or
export STREAMING_TIMEOUT=false
```

Short timeout for quick responses:

```bash
export REQUEST_TIMEOUT=30
export STREAMING_TIMEOUT=120
```

No timeout for long streaming sessions:

```bash
export REQUEST_TIMEOUT=3600
export STREAMING_TIMEOUT=none
```

Note: Disabling the streaming timeout is useful for:
- Long document generation
- Complex reasoning tasks
- Large dataset processing
- Extended conversations
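As an illustration of how the disable values shown above (`none`, `0`, `false`) might be interpreted, here is a hedged sketch; the proxy's actual parsing code may differ, and `resolve_streaming_timeout` is an assumed helper name:

```python
# Sketch only -- not the proxy's actual implementation.
import os
from typing import Optional


def resolve_streaming_timeout(default: float = 3600.0) -> Optional[float]:
    """Map STREAMING_TIMEOUT to a timeout in seconds, or None for 'no timeout'."""
    raw = os.getenv("STREAMING_TIMEOUT", str(default)).strip().lower()
    if raw in ("none", "0", "false"):
        return None  # e.g. requests treats timeout=None as 'wait indefinitely'
    return float(raw)
```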
- Built for seamless integration with LM Studio
- Compatible with OpenAI Python SDK
- Inspired by the need for universal AI model compatibility
- Bug Reports: Open an issue
- Feature Requests: Start a discussion
- Documentation: Wiki
- Korean Documentation: README_ko.md
Made with ❤️ for the AI community