Releases: ROCm/rocSPARSE

rocSPARSE 4.1.0 for ROCm 7.1.0

30 Oct 05:52

Added

  • Added brain half float mixed precision to rocsparse_axpby, where X and Y use bfloat16, and the result and compute type use float.
  • Added brain half float mixed precision to rocsparse_spvv, where X and Y use bfloat16, and the result and compute type use float.
  • Added brain half float mixed precision to rocsparse_spmv, where A and X use bfloat16, and Y and the compute type use float.
  • Added brain half float mixed precision to rocsparse_spmm, where A and B use bfloat16, and C and the compute type use float.
  • Added brain half float mixed precision to rocsparse_sddmm, where A and B use bfloat16, and C and the compute type use float.
  • Added brain half float mixed precision to rocsparse_sddmm, where A, B, and C use bfloat16, and the compute type uses float.
  • Added half float mixed precision to rocsparse_sddmm, where A, B, and C use float16, and the compute type uses float.
  • Added brain half float uniform precision to the rocsparse_scatter and rocsparse_gather routines.
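
To make the "bfloat16 inputs, float result and compute type" wording above concrete, here is a minimal CPU-side sketch of a sparse-dense dot product in the spirit of rocsparse_spvv. It is not the rocSPARSE implementation and does not call the library: the `bf16` struct, the conversion helpers, and `spvv_bf16_ref` are hypothetical names introduced only for illustration, and zero-based indices are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical storage type: bfloat16 kept as the upper 16 bits of an IEEE-754 float.
struct bf16 { std::uint16_t bits; };

// float -> bfloat16 with round-to-nearest-even (NaN handling omitted for brevity).
inline bf16 float_to_bf16(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    u += 0x7FFFu + ((u >> 16) & 1u);
    return bf16{static_cast<std::uint16_t>(u >> 16)};
}

// bfloat16 -> float is exact: place the 16 stored bits in the upper half.
inline float bf16_to_float(bf16 h) {
    std::uint32_t u = static_cast<std::uint32_t>(h.bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}

// Reference dot product: x is sparse (values + indices), y is dense.
// Inputs are bfloat16; every product and the accumulator are float.
float spvv_bf16_ref(const std::vector<bf16>& x_val,
                    const std::vector<int>& x_ind,
                    const std::vector<bf16>& y) {
    assert(x_val.size() == x_ind.size());
    float acc = 0.0f; // compute type and result type: float
    for (std::size_t i = 0; i < x_val.size(); ++i) {
        acc += bf16_to_float(x_val[i]) * bf16_to_float(y[x_ind[i]]);
    }
    return acc;
}
```

Storage stays at 16 bits per element, while the accumulation runs in single precision.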

Optimized

  • Improved the user documentation.

Upcoming changes

  • Deprecated trace, debug, and bench logging through the ROCSPARSE_LAYER environment variable.

rocSPARSE 4.0.3 for ROCm 7.0.2

10 Oct 12:12

Resolved issues

  • Resolved an issue causing premature deallocation of internal buffers still in use.

rocSPARSE 4.0.2 for ROCm 7.0.1

17 Sep 16:37

rocSPARSE code for ROCm 7.0.1 did not change. The library was rebuilt for the updated ROCm 7.0.1 stack.

rocSPARSE 4.0.2 for ROCm 7.0.0

16 Sep 06:32

Added

  • Added the SpGEAM generic routine for computing sparse matrix addition in CSR format.
  • Added the v2_SpMV generic routine for computing sparse matrix-vector multiplication. Unlike the deprecated rocsparse_spmv routine, this routine does not fall back to another algorithm when it encounters a configuration that is not implemented; it returns an error instead. For the deprecated rocsparse_spmv routine, users can enable warning messages for cases where a fallback algorithm is used, either by calling rocsparse_enable_debug beforehand or by exporting the ROCSPARSE_DEBUG environment variable (with the shell command export ROCSPARSE_DEBUG=1).
  • Added half float mixed precision to rocsparse_axpby, where X and Y use float16, and the result and compute type use float.
  • Added half float mixed precision to rocsparse_spvv, where X and Y use float16, and the result and compute type use float.
  • Added half float mixed precision to rocsparse_spmv, where A and X use float16, and Y and the compute type use float.
  • Added half float mixed precision to rocsparse_spmm, where A and B use float16, and C and the compute type use float.
  • Added half float mixed precision to rocsparse_sddmm, where A and B use float16, and C and the compute type use float.
  • Added half float uniform precision to the rocsparse_scatter and rocsparse_gather routines.
  • Added half float uniform precision to the rocsparse_sddmm routine.
  • Added the rocsparse_spmv_alg_csr_rowsplit algorithm.
  • Added support for gfx950.
  • Added ROC-TX instrumentation support in rocSPARSE (not available on Windows or in the static library version on Linux).
  • Added the almalinux OS name to correct the gfortran dependency.

Changed

  • rocSPARSE now defaults to C++17 when built from source. Previously, it used C++14 by default.

Optimized

  • Reduced the number of template instantiations in the library to further reduce the shared library binary size and improve compile times.
  • SpGEMM routines can now use more shared memory when it is available. This can improve performance for matrices with a large number of intermediate products.
  • Using the rocsparse_spmv_alg_csr_adaptive or rocsparse_spmv_alg_csr_default algorithm in rocsparse_spmv for transposed sparse matrix-vector multiplication (y = alpha*A^T*x + beta*y) previously triggered an unnecessary analysis of A, which needlessly slowed down the analysis phase. The analysis is now skipped for the transposed operation.
  • Improved the user documentation.
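
For readers unfamiliar with the transposed operation mentioned above, the following CPU reference sketch computes y = alpha*A^T*x + beta*y for a CSR matrix. It only illustrates the mathematical operation and makes no claim about how rocsparse_spmv implements it. The function name `csrmv_transpose_ref` is hypothetical, and zero-based indices are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference y = alpha * A^T * x + beta * y for an m x n CSR matrix A.
// x has length m, y has length n.
void csrmv_transpose_ref(int m, int n, float alpha,
                         const std::vector<int>& row_ptr,
                         const std::vector<int>& col_ind,
                         const std::vector<float>& val,
                         const std::vector<float>& x,
                         float beta,
                         std::vector<float>& y) {
    assert(row_ptr.size() == static_cast<std::size_t>(m) + 1);
    assert(x.size() == static_cast<std::size_t>(m));
    assert(y.size() == static_cast<std::size_t>(n));

    // Scale the existing y once.
    for (int j = 0; j < n; ++j) {
        y[j] *= beta;
    }
    // Each stored entry A(i, col) contributes to y[col], not y[i].
    for (int i = 0; i < m; ++i) {
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            y[col_ind[k]] += alpha * val[k] * x[i];
        }
    }
}
```

In this reference form, each row of A scatters its contributions across y, whereas the non-transposed product gathers into a single y[i] per row.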

Resolved issues

  • Fixed an issue in the public headers where extern "C" was not wrapped by #ifdef __cplusplus, which caused failures when building C programs with rocSPARSE.
  • Fixed a memory access fault in the rocsparse_Xbsrilu0 routines.
  • Fixed failures that could occur in rocsparse_Xbsrsm_solve or rocsparse_spsm with BSR format when using host pointer mode.
  • Fixed ASAN compilation failures.
  • Fixed a failure that occurred when a const descriptor created with rocsparse_create_const_csr_descr was used with the generic routine rocsparse_sparse_to_sparse. The issue was not observed with a non-const descriptor created with rocsparse_create_csr_descr.
  • Fixed a memory leak in the rocSPARSE handle.
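
The header fix in the first item above follows a common pattern for headers shared between C and C++. The sketch below shows that pattern in general form. It is not copied from the rocSPARSE headers, and `example_status` is a hypothetical function used only for illustration.

```cpp
// Header portion: a C compiler never sees extern "C", because the
// linkage block is emitted only when __cplusplus is defined.
#ifdef __cplusplus
extern "C" {
#endif

// Hypothetical C-callable function declaration.
int example_status(int code);

#ifdef __cplusplus
}
#endif

// Implementation portion: returns 0 for success and 1 for any other code.
int example_status(int code) {
    return code == 0 ? 0 : 1;
}
```

Without the guard, a C compiler would reject the header because extern "C" is not valid C syntax.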

Removed

  • The deprecated rocsparse_spmv_ex routine
  • The deprecated rocsparse_sbsrmv_ex, rocsparse_dbsrmv_ex, rocsparse_cbsrmv_ex, and rocsparse_zbsrmv_ex routines
  • The deprecated rocsparse_sbsrmv_ex_analysis, rocsparse_dbsrmv_ex_analysis, rocsparse_cbsrmv_ex_analysis, and rocsparse_zbsrmv_ex_analysis routines

Upcoming changes

  • Deprecated the rocsparse_spmv routine. Users should use the rocsparse_v2_spmv routine going forward.
  • Deprecated rocsparse_spmv_alg_csr_stream algorithm. Users should use the rocsparse_spmv_alg_csr_rowsplit algorithm going forward.
  • Deprecated the rocsparse_itilu0_alg_sync_split_fusion algorithm. Users should use one of rocsparse_itilu0_alg_async_inplace, rocsparse_itilu0_alg_async_split, or rocsparse_itilu0_alg_sync_split going forward.

rocSPARSE 3.4.0 for ROCm 6.4.4

24 Sep 14:02
8fbfc79

rocSPARSE code for ROCm 6.4.4 did not change. The library was rebuilt for the updated ROCm 6.4.4 stack.

rocSPARSE 3.4.0 for ROCm 6.4.3

07 Aug 14:20
8fbfc79

rocSPARSE code for ROCm 6.4.3 did not change. The library was rebuilt for the updated ROCm 6.4.3 stack.

rocSPARSE 3.4.0 for ROCm 6.4.2

21 Jul 16:54
8fbfc79

rocSPARSE code for ROCm 6.4.2 did not change. The library was rebuilt for the updated ROCm 6.4.2 stack.

rocSPARSE 3.4.0 for ROCm 6.4.1

20 May 13:16
4953add

rocSPARSE code for ROCm 6.4.1 did not change. The library was rebuilt for the updated ROCm 6.4.1 stack.

rocSPARSE 3.4.0 for ROCm 6.4.0

11 Apr 13:35
4953add

Added

  • Added support for rocsparse_matrix_type_triangular in rocsparse_spsv.
  • Added the test filters smoke, regression, and extended for emulation tests.
  • Added the rocsparse_[s|d|c|z]csritilu0_compute_ex routines for iterative ILU.
  • Added the rocsparse_[s|d|c|z]csritsv_solve_ex routines for iterative triangular solve.
  • Added GPU_TARGETS to replace the now deprecated AMDGPU_TARGETS in CMake files.
  • Added the BSR format to the SpMM generic routine rocsparse_spmm.

Changed

  • By default, the rocSPARSE shared library is now built with the --offload-compress compiler option, which compresses the fat binary. This significantly reduces the shared library binary size.

Optimized

  • Improved the performance of rocsparse_spmm when used with row order for B and C dense matrices and the row split algorithm, rocsparse_spmm_alg_csr_row_split.
  • Improved the adaptive CSR sparse matrix-vector multiplication algorithm when the sparse matrix has many empty rows at the beginning or at the end of the matrix. This improves the routines rocsparse_spmv and rocsparse_spmv_ex when the adaptive algorithm rocsparse_spmv_alg_csr_adaptive is used.
  • Improved stream CSR sparse matrix-vector multiplication algorithm when the sparse matrix size (number of rows) decreases. This improves the routines rocsparse_spmv and rocsparse_spmv_ex when the stream algorithm rocsparse_spmv_alg_csr_stream is used.
  • Compared to rocsparse_[s|d|c|z]csritilu0_compute, the routines rocsparse_[s|d|c|z]csritilu0_compute_ex introduce a number of free iterations. A free iteration is an iteration that does not compute the evaluation of the stopping criteria, if enabled. This allows the user to tune the algorithm for performance improvements.
  • Compared to rocsparse_[s|d|c|z]csritsv_solve, the routines rocsparse_[s|d|c|z]csritsv_solve_ex introduce a number of free iterations. A free iteration is an iteration that does not compute the evaluation of the stopping criteria. This allows the user to tune the algorithm for performance improvements.
  • Improved the user documentation.
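
The "free iteration" idea described above can be illustrated with a small CPU sketch: a Jacobi-style iterative solve of a lower triangular system that evaluates its stopping criterion only once per group of nfree + 1 iterations. This is an assumption-laden illustration, not the rocSPARSE algorithm. The names `IterResult` and `jacobi_lower_solve`, the dense matrix storage, and the exact scheduling of checks relative to free iterations are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct IterResult {
    int iterations;        // sweeps performed
    int checks;            // stopping-criterion evaluations performed
    std::vector<double> y; // final iterate
};

// Jacobi-style iteration for L * y = b with dense lower triangular L.
// After each group of (nfree + 1) sweeps, the residual is evaluated once.
IterResult jacobi_lower_solve(const std::vector<std::vector<double>>& L,
                              const std::vector<double>& b,
                              int max_iter, int nfree, double tol) {
    const std::size_t n = b.size();
    assert(L.size() == n);
    assert(nfree >= 0);

    std::vector<double> y(n, 0.0), y_new(n, 0.0);
    IterResult r{0, 0, {}};

    while (r.iterations < max_iter) {
        // One sweep: y_new = D^{-1} * (b - (L - D) * y).
        for (std::size_t i = 0; i < n; ++i) {
            double s = b[i];
            for (std::size_t j = 0; j < i; ++j) s -= L[i][j] * y[j];
            y_new[i] = s / L[i][i];
        }
        y.swap(y_new);
        ++r.iterations;

        // Free iterations skip this block entirely.
        if (r.iterations % (nfree + 1) == 0) {
            ++r.checks;
            double res = 0.0; // max-norm of L * y - b
            for (std::size_t i = 0; i < n; ++i) {
                double s = -b[i];
                for (std::size_t j = 0; j <= i; ++j) s += L[i][j] * y[j];
                res = std::max(res, std::fabs(s));
            }
            if (res <= tol) break;
        }
    }
    r.y = y;
    return r;
}
```

The trade-off is that fewer checks cost less per iteration, but a poorly chosen nfree can overshoot the iteration at which convergence was first reached.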

Resolved issues

  • Fixed an issue in rocsparse_spgemm, rocsparse_[s|d|c|z]csrgemm, and rocsparse_[s|d|c|z]bsrgemm where incorrect results could be produced when rocSPARSE was built with optimization level O0. This was caused by a bug in the hash tables that could allow keys to be inserted twice.
  • Fixed an issue in the routine rocsparse_spgemm when using rocsparse_spgemm_stage_symbolic and rocsparse_spgemm_stage_numeric, where the routine would crash when alpha and beta were passed as host pointers and where beta != 0.
  • Fixed an issue in rocsparse_bsrilu0 where the algorithm was running out of bounds of the bsr_val array.

Upcoming changes

  • Deprecated rocsparse_[s|d|c|z]csritilu0_compute routines. Users should use the newly added rocsparse_[s|d|c|z]csritilu0_compute_ex routines going forward.
  • Deprecated rocsparse_[s|d|c|z]csritsv_solve routines. Users should use the newly added rocsparse_[s|d|c|z]csritsv_solve_ex routines going forward.
  • Deprecated the use of AMDGPU_TARGETS in CMake files. Users should use GPU_TARGETS going forward.

rocSPARSE 3.3.0 for ROCm 6.3.3

19 Feb 17:47
9f64dd5

rocSPARSE code for ROCm 6.3.3 did not change. The library was rebuilt for the updated ROCm 6.3.3 stack.