PoC: InferenceClient is also a MCPClient #2986


Merged May 20, 2025 (26 commits)
Changes from 3 commits
c720d86
Add extra dependency
julien-c Apr 8, 2025
a0be544
PoC: `InferenceClient` is also a `MCPClient`
julien-c Apr 8, 2025
2c7329c
[using Claude] change the code to make MCPClient inherit from AsyncIn…
julien-c Apr 8, 2025
cef1bba
Update mcp_client.py
julien-c Apr 8, 2025
42f036e
mcp_client: Support multiple servers (#2987)
julien-c Apr 8, 2025
990a926
Revert "[using Claude] change the code to make MCPClient inherit from…
julien-c Apr 9, 2025
879d2ee
`add_mcp_server`: the env should not be hardcoded here
julien-c Apr 11, 2025
9ee3c68
Handle the "no tool call" case
julien-c Apr 11, 2025
c827256
Merge branch 'main' into mcp-client
Wauplin May 13, 2025
e5d205b
Update setup.py
Wauplin May 13, 2025
7c08143
Merge branch 'mcp-client' of github.com:huggingface/huggingface_hub i…
Wauplin May 13, 2025
67304ce
Async mcp client + example + code quality
Wauplin May 13, 2025
3d422f8
docstring
Wauplin May 13, 2025
1a12eb5
accept ChatCompletionInputMessage as input
Wauplin May 13, 2025
1f2181c
Merge branch 'main' into mcp-client
Wauplin May 13, 2025
5313d8b
lazy loading
Wauplin May 13, 2025
ff1d39b
style
Wauplin May 13, 2025
bc8448d
better type
Wauplin May 13, 2025
b03ef86
no need mcp for dev
Wauplin May 13, 2025
5d9af3a
code quality on Python 3.8
Wauplin May 13, 2025
ee648eb
Merge branch 'main' into mcp-client
Wauplin May 20, 2025
0d6981a
address feedback
Wauplin May 20, 2025
63a37f9
address feedback
Wauplin May 20, 2025
b273cba
do not close client inside of `process_single_turn_with_tools`
Wauplin May 20, 2025
834cef2
docstring, no more warning, garbage collection
Wauplin May 20, 2025
b3ea2ee
docs
Wauplin May 20, 2025
125 changes: 125 additions & 0 deletions mcp_client.py
@@ -0,0 +1,125 @@
import asyncio
import json
import os
from contextlib import AsyncExitStack
from typing import List, Optional

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

from huggingface_hub import AsyncInferenceClient, ChatCompletionInputTool, ChatCompletionOutput
from huggingface_hub.inference._providers import PROVIDER_T


class MCPClient(AsyncInferenceClient):
def __init__(
self,
*,
provider: PROVIDER_T,
model: str,
api_key: Optional[str] = None,
):
super().__init__(
provider=provider,
api_key=api_key,
)
self.model = model
# Initialize MCP session and client objects
self.session: Optional[ClientSession] = None
self.exit_stack = AsyncExitStack()
self.available_tools: List[ChatCompletionInputTool] = []

async def add_mcp_server(self, command: str, args: List[str]):
mishig25 (Contributor), Apr 8, 2025:

Why not name this method `connect_to_server`?
We can't add multiple MCP servers to a single client instance, can we?

julien-c (Member, Author):

Yes we can.

julien-c (Member, Author):

I mean we would need to store a map of sessions, but there's nothing preventing us from doing it, conceptually.

mishig25 (Contributor), Apr 8, 2025:

Let me know if the following question is out of scope. How would I connect to multiple MCP servers? Would it look like Option 1 or Option 2?

Option 1:

client1 = MCPClient()
client1.add_mcp_server()

client2 = MCPClient()
client2.add_mcp_server()

Option 2:

client = MCPClient()
client.add_mcp_server(server1)
client.add_mcp_server(server2)

Contributor:

Another design question: `class MCPClient(AsyncInferenceClient)` vs. `class AsyncInferenceClient(...args, mcp_clients: MCPClient[])`

Member:

Sorry to chime in unannounced, but from a very removed external-user standpoint, I find this all very confusing: I just don't think what you coded should be called MCPClient 😄

When I came to this PR I was fully expecting MCPClient to be passed as a parameter to InferenceClient, though I hear @Wauplin above, so why not a wrapper. But the end result is really more of an InferenceClientWithEmbeddedMCP to me, not an MCPClient.

That being said, it's just semantics, but I'm kind of a semantics extremist, sorry about that (feel free to completely disregard this message, as is very likely XD)

julien-c (Member, Author):

> I was fully expecting MCPClient to be passed as parameter to InferenceClient

What do you mean as a parameter? Do you have an example signature?

Member:

The second option of #2986 (comment).

julien-c (Member, Author):

Ah yes, sure, we can probably add this I guess.

julien-c (Member, Author):

Ah, actually, with the async/await stuff I'm not so sure.

"""Connect to an MCP server

Args:
todo
"""
server_params = StdioServerParameters(command=command, args=args, env={"HF_TOKEN": os.environ["HF_TOKEN"]})

stdio_transport = await self.exit_stack.enter_async_context(stdio_client(server_params))
self.stdio, self.write = stdio_transport
self.session = await self.exit_stack.enter_async_context(ClientSession(self.stdio, self.write))

await self.session.initialize()

# List available tools
response = await self.session.list_tools()
tools = response.tools
print("\nConnected to server with tools:", [tool.name for tool in tools])
self.available_tools += [
{
"type": "function",
"function": {
"name": tool.name,
"description": tool.description,
"parameters": tool.inputSchema,
Comment:

Just a note that I have seen some MCP servers with jsonref in their descriptions, which sometimes confuses the model. In mcpadapt I had to resolve the jsonref before passing it to the model; might be minor for now.

Comment:

Confused, or sometimes plain unsupported by the model SDK, like Google GenAI...

julien-c (Member, Author), Apr 17, 2025:

Interesting, does the spec mention anything about whether jsonref is allowed or not?

Comment:

I don't think the spec mentions it; however, it gets auto-generated if you use Pydantic models with the official MCP Python SDK's FastMCP syntax. I had the case for one of the MCP servers I use to test things: https://github.com/grll/pubmedmcp

},
}
for tool in tools
]
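Following up on the review discussion about jsonref in tool schemas: a minimal, stdlib-only sketch of inlining local `$ref` pointers in a tool's `inputSchema` before passing it to the model. The helper name and the sample schema are illustrative, not part of the PR, and circular references are not handled.

```python
import json
from typing import Any


def resolve_local_refs(schema: dict) -> dict:
    """Inline local JSON pointers ("#/...") found in "$ref" entries.

    Hypothetical helper, not part of the PR; circular refs are not handled.
    """
    root = schema

    def lookup(pointer: str) -> Any:
        # "#/$defs/Query" -> root["$defs"]["Query"]
        node: Any = root
        for part in pointer.lstrip("#/").split("/"):
            node = node[part]
        return node

    def walk(node: Any) -> Any:
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith("#/"):
                return walk(lookup(ref))
            return {k: walk(v) for k, v in node.items()}
        if isinstance(node, list):
            return [walk(v) for v in node]
        return node

    return walk(schema)


# Sample schema of the shape FastMCP generates from a Pydantic model (illustrative)
schema = {
    "type": "object",
    "properties": {"query": {"$ref": "#/$defs/Query"}},
    "$defs": {"Query": {"type": "string", "description": "search terms"}},
}
print(json.dumps(resolve_local_refs(schema)["properties"]["query"]))
# prints {"type": "string", "description": "search terms"}
```

Applying such a resolver to `tool.inputSchema` before building `available_tools` would sidestep SDKs that reject `$ref`.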

async def process_query(self, query: str) -> ChatCompletionOutput:
"""Process a query using Claude and available tools"""
messages = [{"role": "user", "content": query}]

response = await self.chat.completions.create(
model=self.model,
messages=messages,
tools=self.available_tools,
tool_choice="auto",
)

# Process response and handle tool calls
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
for tool_call in tool_calls:
function_name = tool_call.function.name
function_args = json.loads(tool_call.function.arguments)

# Execute tool call
result = await self.session.call_tool(function_name, function_args)
messages.append(
{
"tool_call_id": tool_call.id,
"role": "tool",
"name": function_name,
"content": result.content[0].text,
}
)

function_enriched_response = await self.chat.completions.create(
model=self.model,
messages=messages,
)

return function_enriched_response

async def cleanup(self):
"""Clean up resources"""
await self.exit_stack.aclose()


async def main():
client = MCPClient(
provider="together",
model="Qwen/Qwen2.5-72B-Instruct",
api_key=os.environ["HF_TOKEN"],
)
try:
await client.add_mcp_server(
"node", ["--disable-warning=ExperimentalWarning", f"{os.path.expanduser('~')}/Desktop/hf-mcp/index.ts"]
)
response = await client.process_query(
"""
find an app that generates 3D models from text,
and also get the best paper about transformers
"""
)
print("\n" + response.choices[0].message.content)
finally:
await client.cleanup()


if __name__ == "__main__":
asyncio.run(main())
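For reference, the "Option 2" design from the review thread (one client holding several server sessions) comes down to keeping a tool-name-to-session map and dispatching each tool call to the session that registered it. A stdlib-only sketch with a stand-in session class; all names here are hypothetical, and the real multi-server support landed separately in #2987.

```python
from typing import Dict, List


class DummySession:
    """Stand-in for mcp.ClientSession, for illustration only."""

    def __init__(self, tools: List[str]):
        self.tools = tools

    def call_tool(self, name: str, args: dict) -> str:
        return f"{name} handled with {args}"


class MultiServerRegistry:
    def __init__(self) -> None:
        # Maps each tool name to the session of the server that provides it
        self.tool_to_session: Dict[str, DummySession] = {}

    def add_server(self, session: DummySession) -> None:
        # Later-added servers win on tool-name collisions, as with a plain dict update
        for tool in session.tools:
            self.tool_to_session[tool] = session

    def call(self, tool: str, args: dict) -> str:
        return self.tool_to_session[tool].call_tool(tool, args)


registry = MultiServerRegistry()
registry.add_server(DummySession(["search_apps"]))
registry.add_server(DummySession(["get_papers"]))
print(registry.call("get_papers", {"topic": "transformers"}))
# prints get_papers handled with {'topic': 'transformers'}
```

In the PoC above, `process_query` always dispatches through the single `self.session`; with a map like this, the dispatch line would instead pick the session registered for `function_name`.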
5 changes: 5 additions & 0 deletions setup.py
@@ -57,6 +57,11 @@ def get_version() -> str:

extras["hf_xet"] = ["hf_xet>=0.1.4"]

extras["mcp"] = [
"mcp>=1.6.0",
"aiohttp", # for AsyncInferenceClient
]

extras["testing"] = (
extras["cli"]
+ extras["inference"]