Are you planning to support this feature? I want to use FastGen in my app, but it does not currently support serving requests asynchronously through a RESTful API. vLLM supports this very well: https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/api_server.py

I've also used Ray to deploy a server that uses MIIPipeline with dynamic batching, but its performance falls far behind vLLM running with default settings.