rocm-6.4.0
rocBLAS 4.4.0 for ROCm 6.4.0
Added
rocTX support in rocBLAS (not available on Windows or in the static library version on Linux)
On gfx12, all functions now support the full rocblas_int dynamic range for batch_count
The --ninja build option
Support for the GPU_TARGETS CMake variable
Changed
The rocblas-test client now omits the stress tests unless YAML-based testing or gtest_filter adds them
The default OpenMP thread count in the rocBLAS clients is reduced to fewer than the number of logical cores
gemm_ex testing and timing now reuse device memory
gemm_ex timing now initializes matrices on the device
Optimized
Significantly reduced workspace memory requirements for the Level 1 ILP64 functions iamax and iamin
Reduced workspace memory requirements for the Level 1 ILP64 functions dot, asum, and nrm2
Improved the performance of Level 2 gemv for the problem sizes (TransA == N && m > 2*n) and (TransA == T)
Improved the performance of Level 3 syrk and herk for the problem size (k > 500 && n < 4000)
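The gemv and syrk/herk improvements apply only to specific problem sizes. The minimal C++ sketch below writes those size conditions as compile-time predicates so you can check whether a given call falls in the improved range. The `Trans` enum and both function names are hypothetical and exist only for this example; they are not rocBLAS APIs (rocBLAS itself uses `rocblas_operation`). The predicates only restate the conditions listed in these notes and say nothing about how rocBLAS selects kernels internally.

```cpp
#include <cstdint>

// Hypothetical stand-in for the transpose argument; NOT a rocBLAS type.
enum class Trans { N, T };

// True when a gemv call matches the listed improved sizes:
// (TransA == N && m > 2*n) or (TransA == T).
constexpr bool gemv_in_improved_range(Trans trans_a, std::int64_t m, std::int64_t n) {
    return (trans_a == Trans::N && m > 2 * n) || trans_a == Trans::T;
}

// True when a syrk/herk call matches the listed improved size: k > 500 && n < 4000.
constexpr bool syrk_herk_in_improved_range(std::int64_t n, std::int64_t k) {
    return k > 500 && n < 4000;
}

// Compile-time usage examples.
static_assert(gemv_in_improved_range(Trans::N, 10000, 1000), "tall non-transposed gemv qualifies");
static_assert(!gemv_in_improved_range(Trans::N, 1000, 1000), "square non-transposed gemv does not");
static_assert(syrk_herk_in_improved_range(2048, 1024), "n < 4000 and k > 500 qualifies");
```

Because the predicates are `constexpr`, the checks run entirely at compile time, and no GPU or rocBLAS installation is needed to try them.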
Resolved issues
gfx12: fixes for ger, geam, geam_ex, dgmm, trmm, symm, hemm, ILP64 gemm, and larger data support
Added a gfortran package dependency for Azure Linux OS
Outdated SLES OS package dependencies (cxxtools and joblib) in install.sh -d
Code object stripping for RPM packages
Upcoming changes
Deprecated the CMake variable AMDGPU_TARGETS. Use GPU_TARGETS instead.