Questions on the Future of mlx-lm.server: Production Readiness, Feature Roadmap, and Competitive Positioning #371
Replies: 1 comment 1 reply
-
We intend to continue to focus on both!
We'll prioritize features on a case-by-case basis. If there is something you need, I encourage you to file an issue. Regarding context length, you can set the maximum size by specifying
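On the Python API side, something along these lines should work to bound KV-cache memory (this is from memory, so treat the exact names — `make_prompt_cache`, its `max_kv_size` argument, and the example model repo — as assumptions and double-check them against the mlx-lm version you have installed):

```python
from mlx_lm import load, generate
from mlx_lm.models.cache import make_prompt_cache

# Load a quantized model (any local path or mlx-community repo works).
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

# Cap the KV cache so memory stays bounded even for very long sessions:
# once the cache fills up, older entries are rotated out instead of the
# cache growing without limit.
prompt_cache = make_prompt_cache(model, max_kv_size=4096)

response = generate(
    model,
    tokenizer,
    prompt="Summarize the benefits of a bounded KV cache.",
    max_tokens=256,
    prompt_cache=prompt_cache,
)
print(response)
```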
It would be great if you could file an issue with steps to reproduce. In general, tool calling is a fast-moving target and model providers haven't really converged on a standard, so it's not yet clear where in the stack we should support it. But I think we will, at the very least, aim to support the common cases when possible, and hopefully more.
We'll continue to maintain
-
Hello MLX Team,
I have a few questions about the current status and future direction of mlx-lm and its broader positioning in the self-hosted landscape. I would appreciate your insights on the following points:
1. Role as Inference Provider vs Fine-Tuning Utility
Is the future direction of mlx-lm focused solely on fine-tuning and training models, or is there an intention for it to become a robust inference provider—on par with solutions like llama.cpp and Ollama?
2. Feature Development: Context Length and Advanced Controls
Are there plans to add more advanced controls, such as context length or session parameters, on `mlx-lm.server`, as supported in llama.cpp? At present, it appears context length is not adjustable and is dynamically allocated. I have yet to face any issues with this, but I wonder if there is a possibility of running out of memory with long sessions.

3. Tool Calling Support
How is support for tool calling currently handled? I've attempted to use the Qwen3-Coder-30B 4-bit model with Cline, but unfortunately it fails to call tools reliably, unlike comparable GGUF models.
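For context, this is roughly the kind of OpenAI-style tools request that Cline and similar clients send. It is just the standard payload shape, so whether `mlx-lm.server` injects the tools into the chat template and parses the model's tool-call output presumably depends on the model and the mlx-lm version; the host, port, and model name below are examples from my setup:

```python
import json
import requests

# Standard OpenAI-style tool-calling request, pointed at a local
# mlx-lm.server instance (default host/port assumed; adjust as needed).
payload = {
    "model": "mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit",  # example name; use your local model
    "messages": [
        {"role": "user", "content": "List the files in the src directory."}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "list_files",
                "description": "List files in a directory",
                "parameters": {
                    "type": "object",
                    "properties": {"path": {"type": "string"}},
                    "required": ["path"],
                },
            },
        }
    ],
}

resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload)
# With the GGUF backends I tried, the assistant message comes back with a
# populated "tool_calls" field; with mlx-lm.server I often get plain text instead.
print(json.dumps(resp.json(), indent=2))
```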
4. Production Readiness
Could you share how central `mlx-lm.server` is to the future of the MLX project? Are there concrete plans to further mature the server, aim for production readiness, and ensure continued compliance with the OpenAI API specification? (A minimal example of the kind of client call I have in mind is at the end of this post.)

Thank you for your dedication and for advancing the open-source AI community on Apple Silicon! ❤️
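For reference, the sketch below is the minimal OpenAI-compatible usage I'm referring to in point 4. The base URL, port, and model name are just the defaults/examples from my setup, not something prescribed by mlx-lm:

```python
from openai import OpenAI

# mlx-lm.server speaks the OpenAI wire format, so the official client can
# point at it directly. The api_key is unused locally, but the client
# requires some value to be set.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

completion = client.chat.completions.create(
    model="mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit",  # whatever model the server is serving
    messages=[{"role": "user", "content": "Write a haiku about Apple Silicon."}],
    max_tokens=64,
)
print(completion.choices[0].message.content)
```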