Questions on the Future of mlx-lm.server: Production Readiness, Feature Roadmap, and Competitive Positioning #371
Replies: 1 comment 1 reply
-
We intend to continue to focus on both!
We'll prioritize features on a case-by-case basis. If there is something you need, I encourage you to file an issue. Regarding context length, you can set the maximum size by specifying
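On the Python API side, something along these lines should work to bound KV-cache memory (this is from memory, so treat the exact names — `make_prompt_cache`, its `max_kv_size` argument, and the example model repo — as assumptions and double-check them against the mlx-lm version you have installed):

```python
from mlx_lm import load, generate
from mlx_lm.models.cache import make_prompt_cache

# Load a quantized model (any local path or mlx-community repo works).
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

# Cap the KV cache so memory stays bounded even for very long sessions:
# once the cache fills up, older entries are rotated out instead of the
# cache growing without limit.
prompt_cache = make_prompt_cache(model, max_kv_size=4096)

response = generate(
    model,
    tokenizer,
    prompt="Summarize the benefits of a bounded KV cache.",
    max_tokens=256,
    prompt_cache=prompt_cache,
)
print(response)
```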
It would be great if you could file an issue with steps to reproduce. In general, tool calling is a fast-moving target and model providers haven't really converged on a standard, so it's not yet clear where in the stack we should support it. But I think we will, at the very least, aim to support the common cases when possible, and hopefully more.
We'll continue to maintain
-
Hello MLX Team,
I have a few questions about the current status and future direction of mlx-lm and its broader positioning in the self-hosted landscape. I would appreciate your insights on the following points:
1. Role as Inference Provider vs Fine-Tuning Utility
Is the future direction of mlx-lm focused solely on fine-tuning and training models, or is there an intention for it to become a robust inference provider—on par with solutions like llama.cpp and Ollama?
2. Feature Development: Context Length and Advanced Controls
Are there plans to add more advanced controls, such as context length or session parameters, on `mlx-lm.server`, as supported in llama.cpp? At present, it appears context length is not adjustable and is dynamically allocated. I have yet to face any issues with this, but I wonder if there is a possibility of running out of memory with long sessions.

3. Tool Calling Support
How is support for tool calling currently handled? I've attempted to use the Qwen3-Coder-30B 4-bit model with Cline, but unfortunately it fails to call tools reliably, unlike comparable GGUF models.
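For context, this is roughly the kind of OpenAI-style tools request that Cline and similar clients send. It is just the standard payload shape, so whether `mlx-lm.server` injects the tools into the chat template and parses the model's tool-call output presumably depends on the model and the mlx-lm version; the host, port, and model name below are examples from my setup:

```python
import json
import requests

# Standard OpenAI-style tool-calling request, pointed at a local
# mlx-lm.server instance (default host/port assumed; adjust as needed).
payload = {
    "model": "mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit",  # example name; use your local model
    "messages": [
        {"role": "user", "content": "List the files in the src directory."}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "list_files",
                "description": "List files in a directory",
                "parameters": {
                    "type": "object",
                    "properties": {"path": {"type": "string"}},
                    "required": ["path"],
                },
            },
        }
    ],
}

resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload)
# With the GGUF backends I tried, the assistant message comes back with a
# populated "tool_calls" field; with mlx-lm.server I often get plain text instead.
print(json.dumps(resp.json(), indent=2))
```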
4. Production Readiness
Could you share how central `mlx-lm.server` is to the future of the MLX project? Are there concrete plans to further mature the server, aim for production readiness, and ensure continued compliance with the OpenAI API specification? (A minimal example of the kind of client call I have in mind is at the end of this post.)

Thank you for your dedication and for advancing the open-source AI community on Apple Silicon! ❤️
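For reference, the sketch below is the minimal OpenAI-compatible usage I'm referring to in point 4. The base URL, port, and model name are just the defaults/examples from my setup, not something prescribed by mlx-lm:

```python
from openai import OpenAI

# mlx-lm.server speaks the OpenAI wire format, so the official client can
# point at it directly. The api_key is unused locally, but the client
# requires some value to be set.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

completion = client.chat.completions.create(
    model="mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit",  # whatever model the server is serving
    messages=[{"role": "user", "content": "Write a haiku about Apple Silicon."}],
    max_tokens=64,
)
print(completion.choices[0].message.content)
```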