Releases: ROCm/rocBLAS
rocBLAS 5.1.0 for ROCm 7.1.0
Added
- Sample for clients using OpenMP threads calling rocBLAS functions.
- gfx1103, gfx1150, and gfx1151 enabled.
Changed
- By default, the Tensile build is no longer based on tensile_tag.txt but uses the same commit from shared/tensile in the rocm-libraries repository. The rmake or install -t option can build from another local path with a different commit.
Optimized
- Improved the performance of Level 2 gemv transposed (TransA != N) for the problem sizes where m is small and n is large on gfx90a and gfx942.
rocBLAS 5.0.2 for ROCm 7.0.2
Added
- Enabled gfx1150 and gfx1151.
- The ROCBLAS_USE_HIPBLASLT_BATCHED variable to independently control the batched hipblaslt backend. Set ROCBLAS_USE_HIPBLASLT_BATCHED=0 to disable batched GEMM use of the hipblaslt backend.
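As a minimal sketch of how the variable might be applied to a single run, the following Python launcher copies the current environment, sets ROCBLAS_USE_HIPBLASLT_BATCHED=0, and starts a child process with it. The helper names are hypothetical and not part of rocBLAS, and the child command here is a stand-in that only echoes the variable. The sketch also assumes the variable is set before the application starts, which is the conservative choice.

```python
import os
import subprocess
import sys


def env_without_batched_hipblaslt(base_env=None):
    """Return a copy of the environment with ROCBLAS_USE_HIPBLASLT_BATCHED=0.

    Hypothetical helper for illustration; it does not call rocBLAS itself.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ROCBLAS_USE_HIPBLASLT_BATCHED"] = "0"
    return env


def run_with_batched_hipblaslt_disabled(command):
    """Run `command` (a list of arguments) with the variable set to 0."""
    return subprocess.run(command, env=env_without_batched_hipblaslt(), check=True)


if __name__ == "__main__":
    # Stand-in for a real rocBLAS application: the child only prints the variable.
    run_with_batched_hipblaslt_disabled(
        [sys.executable, "-c",
         "import os; print(os.environ['ROCBLAS_USE_HIPBLASLT_BATCHED'])"]
    )
```

Setting the variable on a copied environment keeps the change scoped to the child process, so other programs started from the same shell are unaffected.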
rocBLAS 4.4.1 for ROCm 6.4.4
rocBLAS code for ROCm 6.4.4 did not change. The library was rebuilt for the updated ROCm 6.4.4 stack.
rocBLAS 5.0.0 for ROCm 7.0.1
rocBLAS code for ROCm 7.0.1 did not change. The library was rebuilt for the updated ROCm 7.0.1 stack.
rocBLAS 5.0.0 for ROCm 7.0.0
Added
- gfx950 support
- ROCBLAS_LAYER = 8 internal API logging for gemm debugging
- Support for AOCL 5.0 gcc build as a client reference library
- Allow PkgConfig for client reference library fallback detection
Changed
- CMAKE_CXX_COMPILER is now passed on during compilation for a Tensile build
- Change default atomics mode from allowed to not allowed
Removed
- Support code for non-production gfx targets
- rocblas_hgemm_kernel_name, rocblas_sgemm_kernel_name, and rocblas_dgemm_kernel_name API functions
- Use of warpSize as a constexpr
- Use of deprecated behavior of hipPeekLastError
- rocblas_float8.h and rocblas_hip_f8_impl.h files
- rocblas_gemm_ex3, rocblas_gemm_batched_ex3, rocblas_gemm_strided_batched_ex3 API functions
Optimized
- Optimized gemm by using gemv kernels when applicable
- Optimized gemv for small m and n with a large batch count on gfx942
- Improved the performance of Level 1 dot for all precisions and variants when N > 100000000 on gfx942
- Improved the performance of Level 1 asum and nrm2 for all precisions and variants on gfx942
- Improved the performance of Level 2 sger (single precision) on gfx942
- Improved the performance of Level 3 dgmm for all precisions and variants on gfx942
Resolved issues
- Fixed environment variable path-based logging to append multiple handle output to the same file
- Support numerics when trsm is running with rocblas_status_perf_degraded
- Fixed the build dependency installation of joblib on some operating systems
- Return rocblas_status_internal_error when rocblas_[set,get]_[matrix,vector] is called with a host pointer in place of a device pointer
- Reduced the default verbosity level for internal GEMM backend information
- Updated from the deprecated rocm-cmake to ROCmCMakeBuildTools
- Corrected AlmaLinux gfortran package dependencies
Upcoming changes
- Deprecated the use of negative indices to indicate the default solution is being used for gemm_ex with rocblas_gemm_algo_solution_index
rocBLAS 4.4.1 for ROCm 6.4.3
rocBLAS code for ROCm 6.4.3 did not change. The library was rebuilt for the updated ROCm 6.4.3 stack.
rocBLAS 4.4.1 for ROCm 6.4.2
Resolved issues
- Zero imaginary portion of diagonal of C matrix for cherk/zherk for gfx90a/gfx942 with problem sizes k > 500
rocBLAS 4.4.0 for ROCm 6.4.1
rocBLAS code for ROCm 6.4.1 did not change. The library was rebuilt for the updated ROCm 6.4.1 stack.
rocBLAS 4.4.0 for ROCm 6.4.0
Added
- rocTX support in rocBLAS (not available on Windows or in the static library version on Linux)
- On gfx12, all functions now support full rocblas_int dynamic range for batch_count
- --ninja build option
- Support for GPU_TARGETS cmake variable
Changed
- rocblas-test client removes the stress tests unless YAML-based testing or gtest_filter adds them
- rocblas clients OpenMP default threading is reduced to be less than the logical core count
- gemm_ex testing and timing reuses device memory
- gemm_ex timing initializes matrices on device
Optimized
- Significantly reduced workspace memory requirements for Level 1 ILP64: iamax and iamin
- Reduced workspace memory requirements for Level 1 ILP64: dot, asum, nrm2
- Improved the performance of Level 2 gemv for the problem sizes (TransA == N && m > 2*n) and (TransA == T)
- Improved the performance of Level 3 syrk and herk for the problem size (k > 500 && n < 4000)
Resolved issues
- gfx12: ger, geam, geam_ex, dgmm, trmm, symm, hemm, ILP64 gemm, and larger data support
- Added a gfortran package dependency for Azure Linux OS
- Outdated SLES OS package dependencies (cxxtools and joblib) in install.sh -d
- Code object stripping for RPM packages
Upcoming changes
- Deprecated the cmake variable AMDGPU_TARGETS. Use GPU_TARGETS instead.
rocBLAS 4.3.0 for ROCm 6.3.3
rocBLAS code for ROCm 6.3.3 did not change. The library was rebuilt for the updated ROCm 6.3.3 stack.