Release 1.10.0

MarcelKoch released this on 13 Jun 08:57 (commit d4e0e9f)

The Ginkgo team is proud to announce the new Ginkgo minor release 1.10.0.
This release brings new features such as:

  • Support for bfloat16 precision. The type gko::bfloat16 can now be selected in most places as the value type
    of a matrix, solver, preconditioner, etc. If the selected backend supports bfloat16 natively, the native type
    is used within the kernels; otherwise, the kernels may incur a conversion overhead. Support for bfloat16 is enabled by default, but it can be
    turned off during CMake configuration.
  • Mixed precision support in our distributed matrix, provided the underlying matrix formats support mixed precision.
  • New pipelined CG solver. This variant of the CG solver is designed to reduce the communication overhead in
    large-scale distributed computations.
  • New Chebyshev iteration solver.
  • An OpenMP implementation of the merge-path based SpMV algorithm.

And more!
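On a backend without native bfloat16 arithmetic, values have to be widened to float for computation and rounded back for storage, which is the conversion overhead mentioned in the bfloat16 item. The following minimal sketch illustrates such a conversion; it is not Ginkgo's implementation, and the function names are hypothetical. It assumes `float` is IEEE-754 binary32, of which bfloat16 keeps the upper 16 bits (sign, 8-bit exponent, 7-bit mantissa), and rounds to nearest with ties to even.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Illustrative sketch only (hypothetical names, not part of Ginkgo's API).
// Assumes float is IEEE-754 binary32; bfloat16 keeps its upper 16 bits.

// Round a float to the nearest bfloat16 (ties to even) and return raw bits.
std::uint16_t float_to_bf16_bits(float value)
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    if (std::isnan(value)) {
        // Plain truncation could turn a NaN into infinity, so force a
        // quiet NaN while keeping the sign bit.
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
    }
    // Adding 0x7FFF plus the lowest kept bit rounds to nearest and
    // resolves exact ties toward an even result.
    const std::uint32_t lowest_kept_bit = (bits >> 16) & 1u;
    bits += 0x7FFFu + lowest_kept_bit;
    return static_cast<std::uint16_t>(bits >> 16);
}

// Widen bfloat16 bits back to float; this direction is exact.
float bf16_bits_to_float(std::uint16_t raw)
{
    const std::uint32_t bits = static_cast<std::uint32_t>(raw) << 16;
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

In a kernel without native bfloat16 support, a conversion of this kind would be applied on every load and store, while the arithmetic itself runs in float.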

If you face an issue, please first check our known issues page and the list of open issues. If you do not
find a solution there, feel free to open a new issue or ask a question in GitHub Discussions.

Supported systems and requirements:

  • For all platforms, CMake 3.16+
  • C++17-compliant compiler
  • Linux and macOS
    • GCC: 7.0+
    • clang: 5.0+
    • Intel compiler: 2019+
    • Apple Clang: 15.0 is tested. Earlier versions might also work.
    • NVHPC: 22.7+
    • Cray Compiler: 14.0.1+
    • CUDA module: CMake 3.18+, and CUDA 11.0+ or NVHPC 22.7+, Compute Capability 5.3+
    • HIP module: CMake 3.21+, and ROCm 4.5+
    • DPC++ module: Intel oneAPI 2023.1+ with oneMKL and oneDPL. Set the CXX compiler to dpcpp or icpx.
    • MPI: standard version 3.1+, ideally GPU-aware for best performance
  • Windows
    • MinGW: GCC 7.0+
    • Microsoft Visual Studio: VS 2019+
    • CUDA module: CUDA 11.0+, Microsoft Visual Studio
    • OpenMP module: MinGW.

Behavior changes

  • A CMake format style has been added to unify the formatting of CMake files #1755
  • The file config for the Ic and Ilu preconditioners now only takes a value_type parameter, no longer l_solver_type or u_solver_type #1811, #1828
  • The distributed matrix now uses collective neighborhood communication if possible #1589

Deprecations

  • The experimental::EnableDistributedLinOp mixin has been removed; EnableLinOp can be used instead #1751.

Summary of previous deprecations

  • The Executor::run overload without a name as the first parameter has been deprecated #1667
  • The device_reset parameter of CUDA and HIP executors no longer has an effect, and its allocation_mode parameters have been deprecated in favor of the Allocator interface.
  • The CMake parameter GINKGO_BUILD_DPCPP has been deprecated in favor of GINKGO_BUILD_SYCL.
  • The gko::reorder::Rcm interface has been deprecated in favor of gko::experimental::reorder::Rcm based on Permutation.
  • The Permutation class' permute_mask functionality.
  • Multiple functions with typos (set_complex_subpsace(), range functions such as conj_operaton, etc.).
  • gko::lend() is not necessary anymore.
  • The classes RelativeResidualNorm and AbsoluteResidualNorm are deprecated in favor of ResidualNorm.
  • The class AmgxPgm is deprecated in favor of Pgm.
  • Default constructors for the CSR load_balance and automatical strategies
  • The PolymorphicObject's move-semantic copy_from variant
  • The templated SolverBase class.
  • The class MachineTopology is deprecated in favor of machine_topology.
  • Logger constructors and create functions with the executor parameter.
  • The virtual, protected, Dense functions compute_norm1_impl, add_scaled_impl, etc.
  • Logger events for solvers and criteria without the additional implicit_tau_sq parameter.
  • The global gko::solver::default_krylov_dim; use gko::solver::gmres_default_krylov_dim instead.
  • array::get_num_elems() has been renamed to get_size()
  • matrix_data::ensure_row_major_order() has been renamed to sort_row_major()
  • device_matrix_data::get_num_elems() has been renamed to get_num_stored_elements()
  • The CMake parameter GINKGO_COMPILER_FLAGS has been superseded by CMAKE_CXX_FLAGS, and GINKGO_CUDA_COMPILER_FLAGS has been superseded by CMAKE_CUDA_FLAGS
  • The std::initializer_list overloads of matrix create methods and constructors are deprecated in favor of explicit array parameters

Added features

  • Add a pipelined CG solver #1824, #1838, #1859
  • Add Coo Transpose/Conj-Transpose #1816
  • Add Chebyshev iteration solver #1289
  • Add a two-level Schwarz preconditioner #1431
  • Add simplified configuration for stopping criteria #1613
  • Add an example to show the distributed multigrid usage #1769
  • Add half precision support for MPI #1759
  • Add yaml-cpp reader to parse config files in YAML format #1677
  • Add local and distributed L1-Jacobi #1310, #1806
  • Add reusable permutation and transpose operations #1338
  • Add collective communication interface and dense/neighborhood implementation of the interface #1780
  • Add local-to-global index mapping #1707
  • Add Minres solver #975
  • Add array::copy_to_host utility function #1835
  • Add bfloat16 support and corresponding MPI functions #1825, #1827
  • Add mixed precision support for distributed matrix when the underlying matrix also supports mixed precision #1819.
  • Add distributed RowGatherer which is used by the distributed matrix to handle the communication #1589
  • Add complex type support for Dense transpose and Fbcsr on AMD GPUs #1839
  • Add an OpenMP implementation of the merge-path CSR SpMV #1810
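The merge-path idea behind the new OpenMP CSR kernel can be illustrated with a simplified serial sketch in the style of Merrill and Garland's merge-based SpMV. This is not Ginkgo's code; all names are hypothetical, and it assumes `row_ptrs` has `num_rows + 1` entries. The row end offsets and the nonzero indices are treated as two sorted lists whose merge takes `num_rows + nnz` steps. The path is cut into equal-length segments, each segment locates its start with a binary search along a diagonal, and partial sums of rows that cross a segment boundary are carried out and added in a final fix-up pass.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch only (hypothetical names, not Ginkgo's kernel).
// A point on the merge path: completed rows and consumed nonzeros.
struct MergeCoord {
    std::size_t row;
    std::size_t nz;
};

// Binary search along one diagonal. List A holds the row end offsets
// row_ptrs[1..num_rows], list B the nonzero indices 0..nnz-1.
MergeCoord merge_path_search(std::size_t diagonal,
                             const std::vector<int>& row_ptrs,
                             std::size_t num_rows, std::size_t nnz)
{
    std::size_t lo = diagonal > nnz ? diagonal - nnz : 0;
    std::size_t hi = std::min(diagonal, num_rows);
    while (lo < hi) {
        const std::size_t pivot = lo + (hi - lo) / 2;
        // The end of row `pivot` is consumed first if it does not exceed
        // the nonzero index paired with it on this diagonal.
        if (static_cast<std::size_t>(row_ptrs[pivot + 1]) <=
            diagonal - pivot - 1) {
            lo = pivot + 1;
        } else {
            hi = pivot;
        }
    }
    return {lo, diagonal - lo};
}

// y = A * x for a CSR matrix. Segments are independent of each other and
// could each run on their own thread; here they run in a serial loop.
std::vector<double> merge_path_spmv(const std::vector<int>& row_ptrs,
                                    const std::vector<int>& col_idxs,
                                    const std::vector<double>& vals,
                                    const std::vector<double>& x,
                                    std::size_t num_segments)
{
    const std::size_t num_rows = row_ptrs.size() - 1;
    const std::size_t nnz = vals.size();
    const std::size_t path_length = num_rows + nnz;
    num_segments = std::max<std::size_t>(num_segments, 1);
    const std::size_t per_segment =
        (path_length + num_segments - 1) / num_segments;

    std::vector<double> y(num_rows, 0.0);
    std::vector<std::size_t> carry_row(num_segments);
    std::vector<double> carry_val(num_segments);

    for (std::size_t seg = 0; seg < num_segments; ++seg) {
        const std::size_t d_begin = std::min(seg * per_segment, path_length);
        const std::size_t d_end = std::min(d_begin + per_segment, path_length);
        MergeCoord pos = merge_path_search(d_begin, row_ptrs, num_rows, nnz);
        const MergeCoord end = merge_path_search(d_end, row_ptrs, num_rows, nnz);
        double sum = 0.0;
        // Rows whose end lies inside this segment are finished here.
        for (; pos.row < end.row; ++pos.row) {
            for (; pos.nz < static_cast<std::size_t>(row_ptrs[pos.row + 1]);
                 ++pos.nz) {
                sum += vals[pos.nz] * x[col_idxs[pos.nz]];
            }
            y[pos.row] = sum;
            sum = 0.0;
        }
        // Leftover nonzeros belong to a row continuing past this segment.
        for (; pos.nz < end.nz; ++pos.nz) {
            sum += vals[pos.nz] * x[col_idxs[pos.nz]];
        }
        carry_row[seg] = end.row;
        carry_val[seg] = sum;
    }
    // Fix-up pass: add the carried partial sums to their rows.
    for (std::size_t seg = 0; seg < num_segments; ++seg) {
        if (carry_row[seg] < num_rows) {
            y[carry_row[seg]] += carry_val[seg];
        }
    }
    return y;
}
```

Because every segment covers the same number of merge steps, a row with many nonzeros is split across several segments instead of stalling a single thread, which is the load-balancing advantage of merge-path partitioning over plain row-wise partitioning.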

Improvements

  • Improve performance of factorization validation in benchmarks #1766
  • Allow specifying a ValueType instead of a full SolverType in the preconditioners Ic #1811 and Ilu #1828. Note: this introduces a behavior change for config usage; please see the Behavior changes section.
  • Avoid refilling the constant scalar in the workspace in each apply #1846

Fixes

  • Fix a oneMKL GEMM issue on zero-sized matrices #1756
  • Fix error with ILU/IC generation and default algorithm on OpenMP #1783, #1855
  • Avoid NaN values being propagated through multiplications with zero scalars in linear combination apply and simple BLAS operations #1573
  • Fix IR move operation #1812
  • Fix a CUDA 12.2 null rowptr issue when setting the cuSPARSE CSR matrix #1843
  • Fix the COO unsupported exception on an empty matrix with 16-bit precision #1843
  • Fix METIS detection when GKLib is linked into the METIS library #1847
  • Fix a bfloat16 issue with CUDA versions before 12.2 and oneAPI versions before 2024.2 #1848
  • Work around a compiler bug related to warp ballot on H100 GPUs with CUDA 12.2 to 12.4 #1849
  • Fix a race condition in LU factorization #1850
  • Fix the 16-bit precision NaN check in triangular solve #1860