Self-contained Navi 3 (gfx110x) and Strix Halo (gfx1151) PyTorch Wheels #655
-
Nice! That link is currently broken - drop the trailing `|`.
-
Hey Scott, what is the best way to get torchvision and torchaudio from your wheel? Also, thanks for this!
-
Thank you for your wonderful work! Can we use Triton and SageAttention with this wheel? I tried it on my side, but it failed…
-
This is for the second(?) post: https://github.com/scottt/rocm-TheRock/releases/tag/v6.5.0rc-pytorch-gfx110x

First of all, thank you so much already. Second, sorry for the wall of text; I just want to drop some info and thoughts in case they're helpful.

I've been using this for 3 days now with an RX 7800 XT on Windows 11, AMD Adrenalin driver 25.4.1, and Python 3.12.10, in a ComfyUI install under the Stability Matrix program.

On the first day I got a few errors at the VAE encode and/or decode of latent upscales, and at one point about 5-6 AMD driver-timeout errors in quick succession, but after that it somehow just worked and never threw another error. Either that error cascade fixed something, or disabled something that wasn't needed, or maybe it just needed the PC restart I didn't do at first; I'm not sure. For the next 2 days it has just worked.

I went from around 1.8 it/s for a raw SDXL gen with ZLUDA to around 2.4 it/s (17 s down to 13 s for 32 steps), which is already a great improvement. Much more important is how much faster upscale / latent upscale / hires fix and face detailer are for me now: around 2.4 s/it (s/it, not it/s like before, but still fast enough for this) for a full 2x-upscale hires-fix run, compared to around 8 s/it before. And that was only when I was lucky and ZLUDA didn't bug out, which happened more and more often. Sometimes a simple hires fix would randomly take around 10-20 TIMES as long; I could never figure out exactly why, nothing I tried helped, and I couldn't find much about that specific issue, only similar ones. Anyway, I already suspected that was on ZLUDA's side, and it very likely was, because I have not hit that bug with ROCm a single time. After that first day I've had no other errors at all (except one more driver timeout because I tried a much too big/wrong upscale model; that one was on me), and it now just works fast and consistently.

I want to make clear how much of an impact even this early version made for me: I was THIS close to getting an overpriced NVIDIA card because I was so tired of all the problems I've had with AMD on Windows, and ZLUDA seemed to work less and less reliably for me (consistently, across many reinstalls, updates, downgrades, different WebUIs, forks, etc.)... and then someone showed me this. I just installed a fresh ComfyUI and pip-installed 3 files. Now it's faster, and the main hires-fix bug seems to be gone. This alone likely saved me from throwing almost a grand at team green just to have things work the way they do now. Thank you all so much; I can only hope AMD itself takes projects like this more seriously in the future.

I'd also like to try getting Triton and SageAttention working with this, but I'm too happy right now to finally be able to generate normally, so I might just wait. I'm also 100% clueless about coding and programming, so I would just bumble around until something might or might not work; better to leave it to people who actually know what they're doing. But let me know if you need something tested on the 7800 XT; I want to help the little I can. I'll also help make this a quickly accessible option in the Stability Matrix app, because this kind of just... seems to completely replace ZLUDA-based forks of WebUIs, at least for the GPUs supported so far.
(And by help I mean that I've mentioned it to the devs already and will test whatever they want me to test, since I'm one of the few AMD GPU test guys there.) C:
-
Hello, thank you for the builds; unfortunately, under Linux I get:
-
@scottt thanks for your builds, but I'm on the latest Python version, 3.13. Can you build for that version?
-
I'm trying to use the PyTorch wheel for my GPU (gfx1201, RX 9070, Python 3.12) on Windows. I'm getting the following error: OSError: [WinError 1114] A dynamic link library (DLL) initialization routine failed. Error loading "C:\Users\Administrator\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\lib\c10.dll" or one of its dependencies. I can see that the DLL file is present at that location, and I've also verified its dependencies using Dependency Walker.
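One way to narrow down a WinError 1114 like this is to load torch's bundled DLLs directly and see which one actually fails to initialize. A minimal sketch (the lib_dir path is just the one from the error message above; adjust it for your install):

```python
# Sketch: probe torch's bundled DLLs one by one to surface the failing dependency.
# lib_dir is illustrative, copied from the error above; adjust for your environment.
import ctypes
import glob
import os

lib_dir = r"C:\Users\Administrator\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\lib"
for dll in sorted(glob.glob(os.path.join(lib_dir, "*.dll"))):
    try:
        ctypes.WinDLL(dll)
        print("ok  ", os.path.basename(dll))
    except OSError as e:
        print("FAIL", os.path.basename(dll), "->", e)
```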
-
Thanks for the great work on this! I have been using Fedora 42 with
-
Thanks, got it working on an RX 7900 XTX on Windows, with ComfyUI running Flux Dev. It works well with --use-quad-cross-attention (2.39 s/it for 1336x768 with a 1 GB LoRA), but with --use-pytorch-cross-attention it's slow (37.40 s/it). It's not a problem since quad cross-attention works fine, but is it normal for PyTorch cross-attention to be slower? Is there any way to make the --fast option work with it in ComfyUI? The code seems to modify PyTorch in some way.
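If it helps narrow down the slow --use-pytorch-cross-attention path: PyTorch exposes flags for which scaled_dot_product_attention backends are enabled, so you can check whether the fast kernels are actually active on a given build. A purely diagnostic sketch using the standard torch.backends API:

```python
# Diagnostic sketch: which SDPA backends does this torch build report as enabled?
# If only the math fallback is on, SDPA-based cross-attention will be slow.
import torch

print("flash sdp:          ", torch.backends.cuda.flash_sdp_enabled())
print("mem-efficient sdp:  ", torch.backends.cuda.mem_efficient_sdp_enabled())
print("math sdp (fallback):", torch.backends.cuda.math_sdp_enabled())
```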
-
Thanks! I realized that the RDNA3 PyTorch wheel release is almost ready… If possible, could you provide the latest wheel? I would like to do a ComfyUI performance comparison…
-
I've just installed this in Python 3.12 (with Miniconda) on a GMKtec EVO X2 and I'm seeing a NumPy incompatibility. I get: `If you are a user of the module, the easiest solution will be to … (Triggered internally at D:\src\torch\csrc\utils\tensor_numpy.cpp:81.)`
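For what it's worth, that warning usually appears when torch was built against NumPy 1.x but is running under NumPy 2.x. A tiny round-trip test confirms whether the torch↔NumPy bridge initialized at all (just a sketch; pinning `numpy<2` is the common workaround, not necessarily what these wheels require):

```python
# Sketch: verify the torch<->numpy bridge. If this warns or fails,
# try `pip install "numpy<2"` as a first workaround.
import numpy as np
import torch

print("numpy:", np.__version__, "| torch:", torch.__version__)
t = torch.arange(4)
a = t.numpy()  # this call needs the bridge that the warning complains about
print("roundtrip ok:", bool((torch.from_numpy(a) == t).all()))
```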
-
Then I installed ComfyUI with requirements.txt. Then I load a basic image-generation model and I get an error, but in reality, pip freeze shows me NumPy is installed. I think this NumPy incompatibility relates to the message I see just by importing torch.
-
(p312) PS D:\work\python\torch> python .\test.py
Where can I download rocm6.5rc?
-
After trying to compile this everywhere and failing to even run a basic torch test, you saved my day. Really appreciated!
-
What else needs to be installed on Windows besides torch and torchaudio wheels?
-
Hi guys, can anyone tell me where to download ROCm 6.5 RC? Thank you.
-
@scottt @jammm could you help address that?
-
I'm trying to install Stable Diffusion Automatic1111 or Forge with these wheels. The problem I'm encountering: Automatic1111 and Forge were made for Python 3.10.6. Despite the text in TheRock, I tried running them on Python 3.11.9, but it didn't work. Any options for other SD forks? Like I said, ComfyUI is way too complicated for me; I need a simple interface.
-
For anyone coming here looking for gfx1151 things, TheRock's release docs now have full ROCm and Torch Python packages: https://github.com/ROCm/TheRock/blob/main/RELEASES.md

```
python -m pip install --index-url https://d2awnip2yjpvqn.cloudfront.net/v2/gfx1151/ rocm[libraries,devel]
python -m pip install --index-url https://d2awnip2yjpvqn.cloudfront.net/v2/gfx1151/ torch torchaudio torchvision pytorch-triton-rocm numpy
```

Until recently I was using everything from this thread with custom-built extra torch stuff, and it was working pretty well other than some stability issues when heavily loaded. Now I've migrated to the above and can confirm that an AMD Ryzen AI MAX+ 395 with Strix Halo Radeon 8060S works well, and is now more stable than it was, running various LLMs and other AI workloads, including image/video gen via Torch + ROCm. I've additionally got it running in a VM with PCIe passthrough on Proxmox, and it's all working pretty smoothly now. Just wanted to leave this here for anyone who comes after.

Other releases on that page as of writing are: gfx94X-dcgpu, gfx950-dcgpu, gfx110X-dgpu, gfx120X-all
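As a quick smoke test after those installs, something like the following is enough to confirm the GPU is visible and a real kernel runs (a minimal sketch, nothing specific to these wheels; ROCm devices appear through torch's CUDA API):

```python
# Post-install smoke test: confirm the GPU is visible and a real kernel runs.
import torch

print("torch:", torch.__version__)
print("gpu available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
    x = torch.randn(2048, 2048, device="cuda")
    print("matmul checksum:", (x @ x).sum().item())
```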
-
Tried to figure out why the performance of the official builds is so bad...
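For anyone wanting to reproduce that kind of comparison, a rough throughput check is enough to A/B-test one wheel build against another on the same GPU (a sketch; sizes and dtype are arbitrary choices, not a standard benchmark):

```python
# Rough fp16 matmul throughput check for comparing wheel builds on the same GPU.
import time
import torch

a = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
b = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
torch.cuda.synchronize()  # finish setup before timing

t0 = time.perf_counter()
for _ in range(50):
    a @ b
torch.cuda.synchronize()  # wait for all 50 matmuls to complete
dt = time.perf_counter() - t0

# 2*N^3 FLOPs per NxN matmul, 50 iterations
print(f"{2 * 4096**3 * 50 / dt / 1e12:.2f} TFLOP/s")
```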
-
Mine is at 64/64 as well. It sure will be nice when it's fixed. Even some of the text-to-image blows up if upscaled to a higher resolution: it freaks out at the end of the workflow and walks all over display memory. But if I change the upscale to be the same size as the input, it works fine.

On Thu, Aug 14, 2025, 1:29 PM Ben Jamin wrote:
> @RSabbagh52 I have the same box :) What split have you got the RAM/VRAM set to in the BIOS?
> I've got mine running at 64 GB/64 GB, as most of the time I can't get larger-than-RAM models to load into VRAM, so having 96 GB of VRAM isn't that useful (except for having several models cached when using Ollama serve or similar). An exception to that was LM Studio, where I could reliably get it to load 80 GB of model into VRAM. The half-and-half split does seem to work well in ComfyUI for the text/image-to-video you're wanting, though.
-
Hey guys, today when I was using the ROCm 7 rc20250814 build, I got a core dump when starting ComfyUI. Earlier builds started normally... I'm very sad. When will a stable version appear? After all, it's been many days since I bought my AI Max 395 (ㄒoㄒ)
-
Hello, and thank you for your incredible work on developing and providing these PyTorch builds. I am trying to get ComfyUI running on a Windows PC with a Strix Point APU (Radeon 890M, gfx1150), but I'm encountering a persistent `HIP error: invalid device function` during image generation (KSampler execution) that I haven't been able to solve. I was hoping to get some advice.

Environment:
- APU: AMD Ryzen AI 9 HX 370 (Radeon 890M, gfx1150)
- OS: Windows 11
- Python: 3.12.10 (in a venv)
- HIP SDK: 5.7.1 (with lshqqytiger's DLL patch applied)
- PyTorch build: I have tried several native ROCm builds from this repository, including v6.5.0rc and v6.0.0.

Problem description: ComfyUI starts up and loads models correctly. However, when the KSampler begins the image generation process, it always fails with the HIP error above.

What I've tried: I have attempted nearly every possible workaround to resolve this error:
- Environment variable: set HSA_OVERRIDE_GFX_VERSION=11.0.0.
- Command-line arguments: --use-pytorch-cross-attention (to switch the attention implementation), --force-fp32 (to force 32-bit precision), --disable-ipex-optimize (to disable Intel optimizations).
- ComfyUI settings: tried multiple samplers and schedulers (e.g., euler, dpmpp_2m, karras).
- Library dependencies: resolved all version conflicts with NumPy and OpenCV.
- PyTorch builds: tested multiple .whl versions from this repository, but the result was identical.

Question: Since the error persists after all these countermeasures, I suspect there might be a fundamental incompatibility between the current PyTorch builds and the gfx1150 architecture. Is there anything else I could try, or any point I might have missed? Any help or insight would be greatly appreciated.
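One more thing worth checking for `HIP error: invalid device function`: whether the wheel ships compiled kernels for your gfx target at all. A small diagnostic sketch using torch's standard arch-list API:

```python
# Diagnostic sketch: list the GPU architectures this torch build was compiled for.
# If gfx1150 is absent from the list, "invalid device function" is the
# expected failure mode on that APU.
import torch

print("torch:", torch.__version__)
print("compiled for:", torch.cuda.get_arch_list())
if torch.cuda.is_available():
    print("detected device:", torch.cuda.get_device_name(0))
```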
-
Download the gfx1151 wheels from https://github.com/scottt/rocm-TheRock/releases/v6.5.0rc-pytorch
Download the Windows gfx110x and gfx1201 wheels from https://github.com/scottt/rocm-TheRock/releases/tag/v6.5.0rc-pytorch-gfx110x
Features and Known Problems
- torch.nn.functional.scaled_dot_product_attention backed by aotriton 0.9.2
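For reference, a minimal call into that SDPA path (standard PyTorch API; the shapes and dtype below are arbitrary):

```python
# Minimal scaled_dot_product_attention call; on these wheels it should
# dispatch to the aotriton-backed kernels noted above.
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 128, 64])
```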