Releases: ROCm/rocSPARSE
rocSPARSE 4.1.0 for ROCm 7.1.0
Added
- Added brain half float mixed precision to `rocsparse_axpby`, where X and Y use bfloat16 and the result and compute type use float.
- Added brain half float mixed precision to `rocsparse_spvv`, where X and Y use bfloat16 and the result and compute type use float.
- Added brain half float mixed precision to `rocsparse_spmv`, where A and X use bfloat16 and Y and the compute type use float.
- Added brain half float mixed precision to `rocsparse_spmm`, where A and B use bfloat16 and C and the compute type use float.
- Added brain half float mixed precision to `rocsparse_sddmm`, where A and B use bfloat16 and C and the compute type use float.
- Added brain half float mixed precision to `rocsparse_sddmm`, where A, B, and C use bfloat16 and the compute type uses float.
- Added half float mixed precision to `rocsparse_sddmm`, where A, B, and C use float16 and the compute type uses float.
- Added brain half float uniform precision to the `rocsparse_scatter` and `rocsparse_gather` routines.
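To make the mixed-precision wording concrete, here is a minimal host-side sketch of a sparse-dense dot product in the spirit of `rocsparse_spvv`: the inputs are stored as bfloat16, while the accumulation and the result use float. It does not call rocSPARSE. The `bf16` struct and the helpers `to_bf16`, `to_float`, and `spvv_bf16` are hypothetical names introduced only for illustration, and bfloat16 is assumed to be the upper 16 bits of an IEEE-754 float32.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical 16-bit storage type: the upper 16 bits of an IEEE-754 float32.
struct bf16 {
    std::uint16_t bits;
};

// float -> bf16 with round-to-nearest-even.
// NaN and overflow handling are omitted to keep the sketch short.
inline bf16 to_bf16(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    u += 0x7FFFu + ((u >> 16) & 1u);
    return bf16{static_cast<std::uint16_t>(u >> 16)};
}

// bf16 -> float is exact: the 16 stored bits become the high half of a float32.
inline float to_float(bf16 h) {
    std::uint32_t u = static_cast<std::uint32_t>(h.bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Dot product of a sparse vector (x_val, x_ind) with a dense vector y.
// Storage type: bf16. Compute and result type: float.
float spvv_bf16(const std::vector<bf16>& x_val,
                const std::vector<int>&  x_ind,
                const std::vector<bf16>& y) {
    float acc = 0.0f; // accumulate in float, not in bf16
    for (std::size_t i = 0; i < x_val.size(); ++i) {
        acc += to_float(x_val[i]) * to_float(y[x_ind[i]]);
    }
    return acc;
}
```

Accumulating in float avoids compounding the rounding error of an 8-bit significand across the whole sum, which is the usual motivation for pairing low-precision storage with a wider compute type.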
Optimized
- Improved the user documentation.
Upcoming changes
- Deprecated trace, debug, and bench logging through the `ROCSPARSE_LAYER` environment variable.
rocSPARSE 4.0.3 for ROCm 7.0.2
Resolved issues
- Resolved an issue causing premature deallocation of internal buffers still in use.
rocSPARSE 4.0.2 for ROCm 7.0.1
rocSPARSE code for ROCm 7.0.1 did not change. The library was rebuilt for the updated ROCm 7.0.1 stack.
rocSPARSE 4.0.2 for ROCm 7.0.0
Added
- Added the `SpGEAM` generic routine for computing sparse matrix addition in CSR format.
- Added the `v2_SpMV` generic routine for computing sparse matrix-vector multiplication. Unlike the deprecated `rocsparse_spmv` routine, this routine does not use a fallback algorithm when it encounters a non-implemented configuration; it returns an error instead. For the deprecated routine `rocsparse_spmv`, the user can enable warning messages in situations where a fallback algorithm is used, either by calling the routine `rocsparse_enable_debug` upfront or by exporting the variable `ROCSPARSE_DEBUG` (with the shell command `export ROCSPARSE_DEBUG=1`).
- Added half float mixed precision to `rocsparse_axpby`, where X and Y use float16 and the result and compute type use float.
- Added half float mixed precision to `rocsparse_spvv`, where X and Y use float16 and the result and compute type use float.
- Added half float mixed precision to `rocsparse_spmv`, where A and X use float16 and Y and the compute type use float.
- Added half float mixed precision to `rocsparse_spmm`, where A and B use float16 and C and the compute type use float.
- Added half float mixed precision to `rocsparse_sddmm`, where A and B use float16 and C and the compute type use float.
- Added half float uniform precision to the `rocsparse_scatter` and `rocsparse_gather` routines.
- Added half float uniform precision to the `rocsparse_sddmm` routine.
- Added the `rocsparse_spmv_alg_csr_rowsplit` algorithm.
- Added support for gfx950.
- Added ROC-TX instrumentation support in rocSPARSE (not available on Windows or in the static library version on Linux).
- Added the `almalinux` OS name to correct the gfortran dependency.
Changed
- Switched the default to C++17 when building rocSPARSE from source. Previously, rocSPARSE used C++14 by default.
Optimized
- Reduced the number of template instantiations in the library to further reduce the shared library binary size and improve compile times.
- Allowed SpGEMM routines to use more shared memory when available. This can speed up performance for matrices with a large number of intermediate products.
- Using the `rocsparse_spmv_alg_csr_adaptive` or `rocsparse_spmv_alg_csr_default` algorithms in `rocsparse_spmv` to perform transposed sparse matrix-vector multiplication (`y = alpha * A^T * x + beta * y`) resulted in unnecessary analysis on A and a needless slowdown during the analysis phase. This has been fixed by skipping the analysis when performing the transposed sparse matrix-vector multiplication.
- Improved the user documentation.
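For readers unfamiliar with the transposed operation mentioned above, the following is a plain host-side reference of what `alpha * A^T * x + beta * y` computes for a CSR matrix. It is only a sketch for checking results by hand, not rocSPARSE's implementation; the function name `csr_spmv_transpose` and its parameter layout are assumptions made for this example.

```cpp
#include <cassert>
#include <vector>

// Reference y = alpha * A^T * x + beta * y for an m x n CSR matrix A.
// x has m entries and y has n entries, because A^T is n x m.
void csr_spmv_transpose(int m, int n, float alpha,
                        const std::vector<int>&   row_ptr, // size m + 1
                        const std::vector<int>&   col_ind, // size nnz
                        const std::vector<float>& val,     // size nnz
                        const std::vector<float>& x,
                        float beta,
                        std::vector<float>& y) {
    // Scale the existing output first.
    for (int j = 0; j < n; ++j) {
        y[j] *= beta;
    }
    // Each stored entry A(i, j) contributes alpha * A(i, j) * x[i] to y[j].
    for (int i = 0; i < m; ++i) {
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            y[col_ind[k]] += alpha * val[k] * x[i];
        }
    }
}
```

Note that the rows of A are traversed as usual, but the results are scattered into `y` by column index rather than reduced per row.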
Resolved issues
- Fixed an issue in the public headers where `extern "C"` was not wrapped by `#ifdef __cplusplus`, which caused failures when building C programs with rocSPARSE.
- Fixed a memory access fault in the `rocsparse_Xbsrilu0` routines.
- Fixed failures that could occur in `rocsparse_Xbsrsm_solve` or `rocsparse_spsm` with BSR format when using host pointer mode.
- Fixed ASAN compilation failures.
- Fixed a failure that occurred when using the const descriptor `rocsparse_create_const_csr_descr` with the generic routine `rocsparse_sparse_to_sparse`. The issue was not observed when using the non-const descriptor `rocsparse_create_csr_descr` with `rocsparse_sparse_to_sparse`.
- Fixed a memory leak in the rocSPARSE handle.
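The header fix in the first item follows a common pattern, sketched below. A C compiler does not understand `extern "C"`, so a header shared between C and C++ has to hide it behind `#ifdef __cplusplus`. The names `example_status` and `example_add` are hypothetical and are not part of rocSPARSE.

```cpp
#include <cassert>

/* Hypothetical header contents (example_api.h). */
#ifdef __cplusplus
extern "C" { /* only seen by C++ compilers; disables C++ name mangling */
#endif

typedef int example_status; /* hypothetical status type: 0 means success */

example_status example_add(int a, int b, int* out);

#ifdef __cplusplus
} /* end of extern "C" */
#endif

/* Hypothetical definition. It inherits C linkage from the declaration above. */
example_status example_add(int a, int b, int* out) {
    if (!out) {
        return 1; /* invalid pointer */
    }
    *out = a + b;
    return 0;
}
```

With the guards in place, the same declarations compile in both languages, and a C program can link against functions implemented in C++.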
Removed
- The deprecated `rocsparse_spmv_ex` routine.
- The deprecated `rocsparse_sbsrmv_ex`, `rocsparse_dbsrmv_ex`, `rocsparse_cbsrmv_ex`, and `rocsparse_zbsrmv_ex` routines.
- The deprecated `rocsparse_sbsrmv_ex_analysis`, `rocsparse_dbsrmv_ex_analysis`, `rocsparse_cbsrmv_ex_analysis`, and `rocsparse_zbsrmv_ex_analysis` routines.
Upcoming changes
- Deprecated the `rocsparse_spmv` routine. Users should use the `rocsparse_v2_spmv` routine going forward.
- Deprecated the `rocsparse_spmv_alg_csr_stream` algorithm. Users should use the `rocsparse_spmv_alg_csr_rowsplit` algorithm going forward.
- Deprecated the `rocsparse_itilu0_alg_sync_split_fusion` algorithm. Users should use one of `rocsparse_itilu0_alg_async_inplace`, `rocsparse_itilu0_alg_async_split`, or `rocsparse_itilu0_alg_sync_split` going forward.
rocSPARSE 3.4.0 for ROCm 6.4.4
rocSPARSE code for ROCm 6.4.4 did not change. The library was rebuilt for the updated ROCm 6.4.4 stack.
rocSPARSE 3.4.0 for ROCm 6.4.3
rocSPARSE code for ROCm 6.4.3 did not change. The library was rebuilt for the updated ROCm 6.4.3 stack.
rocSPARSE 3.4.0 for ROCm 6.4.2
rocSPARSE code for ROCm 6.4.2 did not change. The library was rebuilt for the updated ROCm 6.4.2 stack.
rocSPARSE 3.4.0 for ROCm 6.4.1
rocSPARSE code for ROCm 6.4.1 did not change. The library was rebuilt for the updated ROCm 6.4.1 stack.
rocSPARSE 3.4.0 for ROCm 6.4.0
Added
- Added support for `rocsparse_matrix_type_triangular` in `rocsparse_spsv`.
- Added test filters `smoke`, `regression`, and `extended` for emulation tests.
- Added `rocsparse_[s|d|c|z]csritilu0_compute_ex` routines for iterative ILU.
- Added `rocsparse_[s|d|c|z]csritsv_solve_ex` routines for iterative triangular solve.
- Added `GPU_TARGETS` to replace the now deprecated `AMDGPU_TARGETS` in CMake files.
- Added BSR format to the SpMM generic routine `rocsparse_spmm`.
Changed
- By default, the rocSPARSE shared library is now built with the `--offload-compress` compiler option, which compresses the fat binary. This significantly reduces the shared library binary size.
Optimized
- Improved the performance of `rocsparse_spmm` when used with row order for the `B` and `C` dense matrices and the row split algorithm, `rocsparse_spmm_alg_csr_row_split`.
- Improved the adaptive CSR sparse matrix-vector multiplication algorithm when the sparse matrix has many empty rows at the beginning or at the end of the matrix. This improves the routines `rocsparse_spmv` and `rocsparse_spmv_ex` when the adaptive algorithm `rocsparse_spmv_alg_csr_adaptive` is used.
- Improved the stream CSR sparse matrix-vector multiplication algorithm when the sparse matrix size (number of rows) decreases. This improves the routines `rocsparse_spmv` and `rocsparse_spmv_ex` when the stream algorithm `rocsparse_spmv_alg_csr_stream` is used.
- Compared to `rocsparse_[s|d|c|z]csritilu0_compute`, the routines `rocsparse_[s|d|c|z]csritilu0_compute_ex` introduce a number of free iterations. A free iteration is an iteration that does not evaluate the stopping criteria, if enabled. This allows the user to tune the algorithm for performance improvements.
- Compared to `rocsparse_[s|d|c|z]csritsv_solve`, the routines `rocsparse_[s|d|c|z]csritsv_solve_ex` introduce a number of free iterations. A free iteration is an iteration that does not evaluate the stopping criteria. This allows the user to tune the algorithm for performance improvements.
- Improved the user documentation.
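The idea of free iterations can be illustrated with a toy fixed-point iteration. The sketch below is not the rocSPARSE algorithm: it iterates `x = 0.5 * x + 1` (fixed point 2) and evaluates the stopping criterion only after every `free_iter` unchecked iterations. The names `IterStats`, `iterate`, and `free_iter` are hypothetical and chosen for this example only.

```cpp
#include <cassert>
#include <cmath>

// Bookkeeping for the toy solver.
struct IterStats {
    int    iterations; // total iterations performed
    int    checks;     // how many times the stopping criterion was evaluated
    double x;          // current iterate
};

// Runs x <- 0.5 * x + 1 starting from 0, checking convergence only once
// per group of (free_iter + 1) iterations.
IterStats iterate(int max_iter, int free_iter, double tol) {
    IterStats s{0, 0, 0.0};
    while (s.iterations < max_iter) {
        // Free iterations: update without evaluating the stopping criterion.
        for (int k = 0; k < free_iter && s.iterations < max_iter; ++k) {
            s.x = 0.5 * s.x + 1.0;
            ++s.iterations;
        }
        if (s.iterations >= max_iter) {
            break;
        }
        // Checked iteration: update, then evaluate the stopping criterion.
        const double x_new = 0.5 * s.x + 1.0;
        const double delta = std::fabs(x_new - s.x);
        s.x = x_new;
        ++s.iterations;
        ++s.checks;
        if (delta < tol) {
            break;
        }
    }
    return s;
}
```

In this toy case the check is trivial, but when evaluating the stopping criterion is expensive relative to one iteration, skipping it on most iterations trades a few possibly unnecessary iterations for fewer evaluations.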
Resolved issues
- Fixed an issue in `rocsparse_spgemm`, `rocsparse_[s|d|c|z]csrgemm`, and `rocsparse_[s|d|c|z]bsrgemm` where incorrect results could be produced when rocSPARSE was built with optimization level `O0`. This was caused by a bug in the hash tables that could allow keys to be inserted twice.
- Fixed an issue in the routine `rocsparse_spgemm` when using `rocsparse_spgemm_stage_symbolic` and `rocsparse_spgemm_stage_numeric`, where the routine would crash when `alpha` and `beta` were passed as host pointers and where `beta != 0`.
- Fixed an issue in `rocsparse_bsrilu0` where the algorithm was running out of bounds of the `bsr_val` array.
Upcoming changes
- Deprecated the `rocsparse_[s|d|c|z]csritilu0_compute` routines. Users should use the newly added `rocsparse_[s|d|c|z]csritilu0_compute_ex` routines going forward.
- Deprecated the `rocsparse_[s|d|c|z]csritsv_solve` routines. Users should use the newly added `rocsparse_[s|d|c|z]csritsv_solve_ex` routines going forward.
- Deprecated the use of `AMDGPU_TARGETS` in CMake files. Users should use `GPU_TARGETS` going forward.
rocSPARSE 3.3.0 for ROCm 6.3.3
rocSPARSE code for ROCm 6.3.3 did not change. The library was rebuilt for the updated ROCm 6.3.3 stack.