Releases · ROCm/rocBLAS

30 Oct 05:52

rocm-ci

rocm-7.1.0

e36a556

rocBLAS 5.1.0 for ROCm 7.1.0

Added

Sample for clients using OpenMP threads calling rocBLAS functions.
gfx1103, gfx1150, and gfx1151 enabled.

Changed

By default, the Tensile build is no longer based on tensile_tag.txt but uses the same commit from shared/tensile in the rocm-libraries repository. The rmake or install -t option can build from another local path with a different commit.

Optimized

Improved the performance of Level 2 gemv transposed (TransA != N) for the problem sizes where m is small and n is large on gfx90a and gfx942.
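For readers less familiar with the operation, the transposed gemv case computes y := alpha * A^T * x + beta * y, where A is m x n, x has length m, and y has length n. The pure-Python sketch below only illustrates that operation on a "small m, large n" shape. It is not rocBLAS code, and the function name `gemv_transposed` is hypothetical.

```python
def gemv_transposed(alpha, a, x, beta, y):
    """Reference y := alpha * A^T * x + beta * y for an m x n matrix A (list of rows)."""
    m = len(a)
    n = len(a[0]) if m else len(y)
    if len(x) != m or len(y) != n:
        raise ValueError("x must have length m and y must have length n")
    # Each output element j is a dot product of column j of A with x.
    return [
        alpha * sum(a[i][j] * x[i] for i in range(m)) + beta * y[j]
        for j in range(n)
    ]


# A small-m, large-n shape in miniature: 2 rows, 6 columns.
a = [[1, 2, 3, 4, 5, 6],
     [6, 5, 4, 3, 2, 1]]
result = gemv_transposed(1.0, a, [1.0, 1.0], 0.0, [0.0] * 6)
```

With m small, each output element needs only a short reduction, while the number of independent outputs (n) is large.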

10 Oct 12:12

rocm-ci

rocm-7.0.2

ac6c54b

rocBLAS 5.0.2 for ROCm 7.0.2

Added

Enabled gfx1150 and gfx1151.
The ROCBLAS_USE_HIPBLASLT_BATCHED variable to independently control the batched hipblaslt backend. Set ROCBLAS_USE_HIPBLASLT_BATCHED=0 to disable batched GEMM use of the hipblaslt backend.
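As a minimal sketch of how the variable might be applied, the helper below builds an environment for a child process with the batched hipblaslt backend disabled. Only the variable name and the value 0 come from the release note. The helper name and the application path in the comment are hypothetical.

```python
import os


def env_without_batched_hipblaslt(base_env=None):
    """Return a copy of the environment with the batched hipblaslt backend disabled."""
    env = dict(os.environ if base_env is None else base_env)
    # Per the release note, 0 disables batched GEMM use of the hipblaslt backend.
    env["ROCBLAS_USE_HIPBLASLT_BATCHED"] = "0"
    return env


# Hypothetical usage with a rocBLAS-based application:
#   import subprocess
#   subprocess.run(["./my_rocblas_app"], env=env_without_batched_hipblaslt(), check=True)
```

Passing a copied environment to the child process leaves the parent process settings untouched.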

24 Sep 14:02

rocm-ci

rocm-6.4.4

5566dab

rocBLAS 4.4.1 for ROCm 6.4.4

rocBLAS code for ROCm 6.4.4 did not change. The library was rebuilt for the updated ROCm 6.4.4 stack.

17 Sep 16:37

rocm-ci

rocm-7.0.1

c4ee96b

rocBLAS 5.0.0 for ROCm 7.0.1

rocBLAS code for ROCm 7.0.1 did not change. The library was rebuilt for the updated ROCm 7.0.1 stack.

16 Sep 06:32

rocm-ci

rocm-7.0.0

c4ee96b

rocBLAS 5.0.0 for ROCm 7.0.0

Added

gfx950 support
ROCBLAS_LAYER=8 internal API logging for gemm debugging
Support for AOCL 5.0 gcc build as a client reference library
Allow PkgConfig for client reference library fallback detection
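The ROCBLAS_LAYER item above can be read as one more bit in a logging bit mask. The sketch below assumes the values 1, 2, and 4 for trace, bench, and profile logging, as I understand the rocBLAS logging documentation. Only the value 8 for internal API logging is stated in these notes. The constant and function names are hypothetical.

```python
# Assumed bit values for ROCBLAS_LAYER; only INTERNAL = 8 is stated in the release note.
TRACE = 1
BENCH = 2
PROFILE = 4
INTERNAL = 8


def rocblas_layer_value(*layers):
    """Combine logging layers into the string value for the ROCBLAS_LAYER variable."""
    mask = 0
    for layer in layers:
        mask |= layer
    return str(mask)


# Internal API logging only, as described for gemm debugging.
internal_only = rocblas_layer_value(INTERNAL)
# Trace logging combined with internal API logging (assumes the bits can be combined).
trace_and_internal = rocblas_layer_value(TRACE, INTERNAL)
```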

Changed

CMAKE_CXX_COMPILER is now passed on during compilation for a Tensile build
Change default atomics mode from allowed to not allowed

Removed

Support code for non-production gfx targets
rocblas_hgemm_kernel_name, rocblas_sgemm_kernel_name, and rocblas_dgemm_kernel_name API functions
Use of warpSize as a constexpr
Use of deprecated behavior of hipPeekLastError
rocblas_float8.h and rocblas_hip_f8_impl.h files
rocblas_gemm_ex3, rocblas_gemm_batched_ex3, rocblas_gemm_strided_batched_ex3 API functions

Optimized

Optimized gemm by using gemv kernels when applicable
Optimized gemv for small m and n with a large batch count on gfx942
Improved the performance of Level 1 dot for all precisions and variants when N > 100000000 on gfx942
Improved the performance of Level 1 asum and nrm2 for all precisions and variants on gfx942
Improved the performance of Level 2 sger (single precision) on gfx942
Improved the performance of Level 3 dgmm for all precisions and variants on gfx942

Resolved issues

Fixed environment variable path-based logging to append multiple handle output to the same file
Support numerics when trsm is running with rocblas_status_perf_degraded
Fixed the build dependency installation of joblib on some operating systems
Return rocblas_status_internal_error when rocblas_[set,get]_[matrix,vector] is called with a host pointer in place of a device pointer
Reduced the default verbosity level for internal GEMM backend information
Updated from the deprecated rocm-cmake to ROCmCMakeBuildTools
Corrected AlmaLinux gfortran package dependencies

Upcoming changes

Deprecated the use of negative indices to indicate the default solution is being used for gemm_ex with rocblas_gemm_algo_solution_index

07 Aug 14:20

rocm-ci

rocm-6.4.3

f08d23e

rocBLAS 4.4.1 for ROCm 6.4.3

rocBLAS code for ROCm 6.4.3 did not change. The library was rebuilt for the updated ROCm 6.4.3 stack.

21 Jul 16:54

rocm-ci

rocm-6.4.2

f08d23e

rocBLAS 4.4.1 for ROCm 6.4.2

Resolved issues

Zeroed the imaginary part of the diagonal of the C matrix for cherk/zherk on gfx90a/gfx942 for problem sizes with k > 500
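To illustrate what the fix concerns: herk computes C := alpha * A * A^H + beta * C with real alpha and beta, and the result is Hermitian, so its diagonal must be real. The pure-Python reference below is a sketch of that property, not the rocBLAS kernel. The function name `herk_reference` is hypothetical, and it fills the full matrix rather than one triangle for simplicity.

```python
def herk_reference(alpha, a, beta, c):
    """Reference C := alpha * A * A^H + beta * C for an n x k complex matrix A."""
    n = len(a)
    k = len(a[0]) if n else 0
    out = [[0j] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = sum(a[i][p] * a[j][p].conjugate() for p in range(k))
            out[i][j] = alpha * acc + beta * c[i][j]
        # The diagonal of a Hermitian matrix is real: drop any imaginary part,
        # including one carried in from the input C.
        out[i][i] = complex(out[i][i].real, 0.0)
    return out


a = [[1 + 2j, 3 - 1j],
     [0 + 1j, 2 + 0j]]
# Input C deliberately carries stray imaginary parts on its diagonal.
c = [[1 + 0.5j, 0j],
     [0j, 2 - 0.25j]]
result = herk_reference(1.0, a, 1.0, c)
```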

20 May 13:16

rocm-ci

rocm-6.4.1

80e5394

rocBLAS 4.4.0 for ROCm 6.4.1

rocBLAS code for ROCm 6.4.1 did not change. The library was rebuilt for the updated ROCm 6.4.1 stack.

11 Apr 13:35

rocm-ci

rocm-6.4.0

80e5394

rocBLAS 4.4.0 for ROCm 6.4.0

Added

rocTX support in rocBLAS (not available on Windows or in the static library version on Linux)
On gfx12, all functions now support full rocblas_int dynamic range for batch_count
--ninja build option
Support for GPU_TARGETS cmake variable

Changed

rocblas-test client removes the stress tests unless YAML-based testing or gtest_filter adds them
rocblas clients OpenMP default threading is reduced to be less than the logical core count
gemm_ex testing and timing reuses device memory
gemm_ex timing initializes matrices on device

Optimized

Significantly reduced workspace memory requirements for Level 1 ILP64: iamax and iamin
Reduced workspace memory requirements for Level 1 ILP64: dot, asum, nrm2
Improved the performance of Level 2 gemv for the problem sizes (TransA == N && m > 2*n) and (TransA == T)
Improved the performance of Level 3 syrk and herk for the problem size (k > 500 && n < 4000)
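The problem-size conditions quoted in the last two items can be written as simple predicates, which may help when checking whether a given call falls in the improved range. This is a literal transcription of the stated conditions, not a description of rocBLAS kernel selection. The function names and the single-letter encoding of TransA are assumptions.

```python
def gemv_in_improved_range(trans_a, m, n):
    """True when (TransA == N and m > 2*n) or (TransA == T), as stated in the notes."""
    return (trans_a == "N" and m > 2 * n) or trans_a == "T"


def syrk_herk_in_improved_range(n, k):
    """True when k > 500 and n < 4000, as stated in the notes."""
    return k > 500 and n < 4000


# A tall non-transposed gemv and a mid-sized syrk both match the stated ranges.
tall_gemv = gemv_in_improved_range("N", 10000, 100)
mid_syrk = syrk_herk_in_improved_range(2048, 1024)
```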

Resolved issues

gfx12: ger, geam, geam_ex, dgmm, trmm, symm, hemm, ILP64 gemm, and larger data support
Added a gfortran package dependency for Azure Linux OS
Outdated SLES OS package dependencies (cxxtools and joblib) in install.sh -d
Code object stripping for RPM packages

Upcoming changes

Deprecated the cmake variable AMDGPU_TARGETS. Use GPU_TARGETS instead.
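A build script that previously passed AMDGPU_TARGETS could be updated along these lines. The sketch below only assembles a cmake command line and assumes the usual semicolon-separated list form for the target names. The helper name, the source directory, and the chosen targets are hypothetical.

```python
def cmake_configure_args(source_dir, gpu_targets):
    """Build a cmake configure command that uses GPU_TARGETS instead of AMDGPU_TARGETS."""
    targets = ";".join(gpu_targets)  # assumed semicolon-separated cmake list
    return ["cmake", "-S", source_dir, f"-DGPU_TARGETS={targets}"]


args = cmake_configure_args("rocBLAS", ["gfx90a", "gfx942"])
```

Passing the arguments as a list (for example to `subprocess.run`) avoids shell quoting problems with the semicolon.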

19 Feb 17:47

rocm-ci

rocm-6.3.3

8ebd6c1

rocBLAS 4.3.0 for ROCm 6.3.3

rocBLAS code for ROCm 6.3.3 did not change. The library was rebuilt for the updated ROCm 6.3.3 stack.
