As of May 19th 2025, we are halting active development on torchchat.
The original intent of torchchat was to both demonstrate how to run LLM inference using PyTorch and improve the performance and functionality of the entire PyTorch ecosystem.
Since torchchat’s launch, we’ve seen vLLM become the dominant player for server-side LLM inference. We’re ecstatic to have vLLM join the PyTorch Ecosystem and recommend it for hosting LLMs in server production environments. Given the growth of vLLM and others, we no longer see the need to maintain an active demonstration of how to run LLM inference using PyTorch.
We are very proud of the performance and functionality improvements we saw in the PyTorch ecosystem over the last year, including:
- LLM inference performance increased by multiples on every device we support (CUDA, CPU, MPS, ARM, etc.)
- Working code demonstrating how to run LLM inference in all the major execution modes (Eager, Compile, AOTI, and ExecuTorch), giving users a starting point for using PyTorch for LLM inference from server to embedded devices and everything in between (a minimal sketch of the first two modes follows this list)
- Quantization expanded to support the most popular schemes and bit widths
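To make the execution-mode point concrete, here is a minimal sketch (not torchchat code) of greedy LLM-style decoding in Eager and Compile modes. The toy model, sizes, and random prompt are illustrative assumptions; it only needs PyTorch 2.x, and AOTI/ExecuTorch use separate export flows not shown here.

```python
import torch
import torch.nn as nn

class TinyCausalLM(nn.Module):
    """Illustrative toy decoder-only model (not a torchchat model)."""
    def __init__(self, vocab_size=256, d_model=128, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        seq_len = tokens.size(1)
        # Causal mask so each position only attends to earlier positions.
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
        x = self.embed(tokens)
        x = self.blocks(x, mask=mask)
        return self.lm_head(x)

@torch.no_grad()
def generate(model, tokens, max_new_tokens=16):
    # Greedy decoding: append the argmax token of the last position each step.
    for _ in range(max_new_tokens):
        logits = model(tokens)
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        tokens = torch.cat([tokens, next_token], dim=1)
    return tokens

model = TinyCausalLM().eval()
prompt = torch.randint(0, 256, (1, 8))  # random "prompt" for illustration

# Eager mode: the model runs op by op, exactly as written.
eager_out = generate(model, prompt)

# Compile mode: torch.compile captures and optimizes the forward pass.
# (AOTI and ExecuTorch builds go through ahead-of-time export/packaging instead.)
compiled = torch.compile(model)
compiled_out = generate(compiled, prompt)

print(eager_out.shape, compiled_out.shape)  # both (1, 24)
```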
There’s still plenty of exciting work to do across the LLM inference space, and PyTorch will stay invested in improving it.
We appreciate and thank everyone in the community for all that you’ve contributed.
Thanks to our contributors:
@mikekgfb @Jack-Khuu @metascroy @malfet @larryliu0820 @kirklandsign @swolchok @vmpuri @kwen2501 @Gasoonjia @orionr @guangy10 @byjlw @lessw2020 @mergennachin @GregoryComer @shoumikhin @kimishpatel @manuelcandales @lucylq @desertfire @gabe-l-hart @seemethere @iseeyuan @jerryzh168 @leseb @yanbing-j @mreso @fduwjj @Olivia-liu @angelayi @JacobSzwejbka @ali-khosh @nlpfollower @songhappy @HDCharles @jenniew @silverguo @zhenyan-zhang-meta @ianbarber @dbort @kit1980 @mcr229 @georgehong @krammnic @xuedinge233 @anirudhs001 @shreyashah1903 @soumith @TheBetterSolution @codereba @jackzhxng @KPCOFGS @kuizhiqing @kartikayk @nobelchowdary @mike94043 @vladoovtcharov @prideout @sanchitintel @cbilgin @jeffdaily @infil00p @msaroufim @zhxchen17 @vmoens @wjunLu
- PyTorch Team