PoC: InferenceClient is also a MCPClient #2986


Merged: 26 commits, May 20, 2025. The diff below shows changes from 8 of the 26 commits.

Commits:
c720d86
Add extra dependency
julien-c Apr 8, 2025
a0be544
PoC: `InferenceClient` is also a `MCPClient`
julien-c Apr 8, 2025
2c7329c
[using Claude] change the code to make MCPClient inherit from AsyncIn…
julien-c Apr 8, 2025
cef1bba
Update mcp_client.py
julien-c Apr 8, 2025
42f036e
mcp_client: Support multiple servers (#2987)
julien-c Apr 8, 2025
990a926
Revert "[using Claude] change the code to make MCPClient inherit from…
julien-c Apr 9, 2025
879d2ee
`add_mcp_server`: the env should not be hardcoded here
julien-c Apr 11, 2025
9ee3c68
Handle the "no tool call" case
julien-c Apr 11, 2025
c827256
Merge branch 'main' into mcp-client
Wauplin May 13, 2025
e5d205b
Update setup.py
Wauplin May 13, 2025
7c08143
Merge branch 'mcp-client' of github.com:huggingface/huggingface_hub i…
Wauplin May 13, 2025
67304ce
Async mcp client + example + code quality
Wauplin May 13, 2025
3d422f8
docstring
Wauplin May 13, 2025
1a12eb5
accept ChatCompletionInputMessage as input
Wauplin May 13, 2025
1f2181c
Merge branch 'main' into mcp-client
Wauplin May 13, 2025
5313d8b
lazy loading
Wauplin May 13, 2025
ff1d39b
style
Wauplin May 13, 2025
bc8448d
better type
Wauplin May 13, 2025
b03ef86
no need mcp for dev
Wauplin May 13, 2025
5d9af3a
code quality on Python 3.8
Wauplin May 13, 2025
ee648eb
Merge branch 'main' into mcp-client
Wauplin May 20, 2025
0d6981a
address feedback
Wauplin May 20, 2025
63a37f9
address feedback
Wauplin May 20, 2025
b273cba
do not close client inside of `process_single_turn_with_tools`
Wauplin May 20, 2025
834cef2
docstring, no more warning, garbage collection
Wauplin May 20, 2025
b3ea2ee
docs
Wauplin May 20, 2025
152 changes: 152 additions & 0 deletions mcp_client.py
@@ -0,0 +1,152 @@
import asyncio
import json
import os
from contextlib import AsyncExitStack
from typing import Dict, List, Optional, TypeAlias

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

from huggingface_hub import AsyncInferenceClient, ChatCompletionInputTool, ChatCompletionOutput
from huggingface_hub.inference._providers import PROVIDER_T


# Type alias for tool names
ToolName: TypeAlias = str


class MCPClient:
    def __init__(
        self,
        *,
        provider: PROVIDER_T,
        model: str,
        api_key: Optional[str] = None,
    ):
        self.client = AsyncInferenceClient(
            provider=provider,
            api_key=api_key,
        )
        self.model = model
        # Initialize MCP sessions as a dictionary of ClientSession objects
        self.sessions: Dict[ToolName, ClientSession] = {}
        self.exit_stack = AsyncExitStack()
        self.available_tools: List[ChatCompletionInputTool] = []

    async def add_mcp_server(self, command: str, args: List[str], env: Dict[str, str]):
[Review thread]

Reviewer: You would need to lighten the requirements on your args a bit if you want to make it work with SSE, or is the intent just to support stdio? I see the rest seems to focus on stdio, so maybe it's by design.

julien-c (Member Author): For now, just stdio, but in the future Streaming HTTP, from what I've understood, should be the way to go?

Reviewer: Yes, this is the new spec, but it is backward compatible, and at the level you are working with in this PR I wouldn't expect much to change: the internals of the client will probably change, but the client interface would remain the same. Which means that if today you accept something like `add_mcp_server(StdioParameters | dict)` (the dict being the arguments of `sse_client` from the Python SDK), you could already support all the SSE servers, plus potentially future Streaming HTTP servers, with minor adjustments at most.
"""Connect to an MCP server

Args:
todo
"""
server_params = StdioServerParameters(command=command, args=args, env=env)

stdio_transport = await self.exit_stack.enter_async_context(stdio_client(server_params))
stdio, write = stdio_transport
session = await self.exit_stack.enter_async_context(ClientSession(stdio, write))

await session.initialize()

# List available tools
response = await session.list_tools()
tools = response.tools
print("\nConnected to server with tools:", [tool.name for tool in tools])

# Map tool names to their server for later lookup
for tool in tools:
self.sessions[tool.name] = session

self.available_tools += [
{
"type": "function",
"function": {
"name": tool.name,
"description": tool.description,
"parameters": tool.inputSchema,
[Review thread]

Reviewer: Just a note that I have seen some MCP servers with jsonref in their tool descriptions, which sometimes confuses the model. In mcpadapt I had to resolve the jsonref before passing it to the model. Might be minor for now.

Reviewer: Confused, or sometimes plainly unsupported by the model SDK, like Google GenAI...

julien-c (Member Author, Apr 17, 2025): Interesting, does the spec mention anything about whether jsonref is allowed or not?

Reviewer: I don't think the spec mentions it; however, it gets auto-generated if you use pydantic models with the official MCP Python SDK's FastMCP syntax. I hit the case with one of the MCP servers I use to test things: https://github.com/grll/pubmedmcp
                },
            }
            for tool in tools
        ]

    async def process_query(self, query: str) -> ChatCompletionOutput:
        """Process a query using `self.model` and available tools"""
        messages = [{"role": "user", "content": query}]

        response = await self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            tools=self.available_tools,
            tool_choice="auto",
        )

        # Process response and handle tool calls
        tool_calls = response.choices[0].message.tool_calls
        if tool_calls is None or len(tool_calls) == 0:
            return response

        # Record the assistant turn containing the tool calls before appending
        # the tool results (chat-completion APIs expect this ordering)
        messages.append(response.choices[0].message)

        for tool_call in tool_calls:
            function_name = tool_call.function.name
            function_args = json.loads(tool_call.function.arguments)

            # Get the appropriate session for this tool
            session = self.sessions.get(function_name)
            if session:
                # Execute tool call with the appropriate session
                result = await session.call_tool(function_name, function_args)
                messages.append(
                    {
                        "tool_call_id": tool_call.id,
                        "role": "tool",
                        "name": function_name,
                        "content": result.content[0].text,
                    }
                )
            else:
                error_msg = f"No session found for tool: {function_name}"
                print(f"Error: {error_msg}")
                messages.append(
                    {
                        "tool_call_id": tool_call.id,
                        "role": "tool",
                        "name": function_name,
                        "content": f"Error: {error_msg}",
                    }
                )

        enriched_response = await self.client.chat.completions.create(
            model=self.model,
            messages=messages,
        )

        return enriched_response

    async def cleanup(self):
        """Clean up resources"""
        await self.exit_stack.aclose()


async def main():
    client = MCPClient(
        provider="together",
        model="Qwen/Qwen2.5-72B-Instruct",
        api_key=os.environ["HF_TOKEN"],
    )
    try:
        await client.add_mcp_server(
            "node",
            ["--disable-warning=ExperimentalWarning", f"{os.path.expanduser('~')}/Desktop/hf-mcp/index.ts"],
            {"HF_TOKEN": os.environ["HF_TOKEN"]},
        )
        response = await client.process_query(
            """
            find an app that generates 3D models from text,
            and also get the best paper about transformers
            """
        )
        print("\n" + response.choices[0].message.content)
    finally:
        await client.cleanup()


if __name__ == "__main__":
    asyncio.run(main())
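The jsonref concern raised in the review thread above can be worked around by inlining local `$ref` pointers before handing `tool.inputSchema` to the model. A minimal sketch, not part of the PR: the helper name is illustrative, it only handles refs local to the schema (the common case for pydantic-generated `#/$defs/...`), and it has no protection against cyclic refs.

```python
from typing import Any, Dict


def resolve_local_refs(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Inline local '#/...' $ref pointers so the model SDK never sees jsonref.

    Remote refs are left untouched; cyclic refs would recurse forever.
    """

    def lookup(pointer: str) -> Any:
        # '#/$defs/Pet' -> walk ['$defs', 'Pet'] from the schema root
        node: Any = schema
        for part in pointer.lstrip("#/").split("/"):
            node = node[part]
        return node

    def walk(node: Any) -> Any:
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith("#/"):
                return walk(lookup(ref))
            # Drop '$defs' once everything is inlined
            return {k: walk(v) for k, v in node.items() if k != "$defs"}
        if isinstance(node, list):
            return [walk(item) for item in node]
        return node

    return walk(schema)
```

With this, the tool registration could pass `resolve_local_refs(tool.inputSchema)` instead of the raw schema.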
5 changes: 5 additions & 0 deletions setup.py
@@ -57,6 +57,11 @@ def get_version() -> str:

extras["hf_xet"] = ["hf_xet>=0.1.4"]

extras["mcp"] = [
    "mcp>=1.6.0",
    "aiohttp",  # for AsyncInferenceClient
]

extras["testing"] = (
    extras["cli"]
    + extras["inference"]
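The first review thread suggests accepting `StdioParameters | dict` so SSE servers work without changing the public interface. A hedged sketch of that dispatch; the dataclass is a stand-in for `mcp.StdioServerParameters` and the function is illustrative, not the PR's API:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class StdioServerParameters:
    # Stand-in for mcp.StdioServerParameters (assumption: same field names)
    command: str
    args: List[str] = field(default_factory=list)
    env: Dict[str, str] = field(default_factory=dict)


# A dict is interpreted as kwargs for the SDK's sse_client (e.g. {"url": ...})
ServerParams = Union[StdioServerParameters, Dict[str, str]]


def pick_transport(params: ServerParams) -> str:
    """Choose the transport from the parameter type, keeping one entry point."""
    if isinstance(params, StdioServerParameters):
        return f"stdio:{params.command}"
    if isinstance(params, dict) and "url" in params:
        return f"sse:{params['url']}"
    raise TypeError(f"unsupported server parameters: {params!r}")
```

With this shape, `add_mcp_server` could route internally to `stdio_client` or `sse_client` while the caller-facing signature stays unchanged, which is what the reviewer anticipates for future Streaming HTTP support.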