Commit 20823e7

Merge Release v1.1.0 for develop
The Ginkgo team is proud to announce the new minor release of Ginkgo version 1.1.0. This release brings several performance improvements, adds Windows support, adds support for factorizations inside Ginkgo and a new ILU preconditioner based on the ParILU algorithm, among other things. For detailed information, check the respective issue.

Supported systems and requirements:
+ For all platforms, cmake 3.9+
+ Linux and MacOS
  + gcc: 5.3+, 6.3+, 7.3+, 8.1+
  + clang: 3.9+
  + Intel compiler: 2017+
  + Apple LLVM: 8.0+
  + CUDA module: CUDA 9.0+
+ Windows
  + MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, 8.1+
  + Microsoft Visual Studio: VS 2017 15.7+
  + CUDA module: CUDA 9.0+, Microsoft Visual Studio
  + OpenMP module: MinGW or Cygwin.

The current known issues can be found in the [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues).

### Additions
+ Upper and lower triangular solvers ([#327](#327), [#336](#336), [#341](#341), [#342](#342))
+ New factorization support in Ginkgo, and addition of the ParILU algorithm ([#305](#305), [#315](#315), [#319](#319), [#324](#324))
+ New ILU preconditioner ([#348](#348), [#353](#353))
+ Windows MinGW and Cygwin support ([#347](#347))
+ Windows Visual Studio support ([#351](#351))
+ New example showing how to use ParILU as a preconditioner ([#358](#358))
+ New example on using loggers for debugging ([#360](#360))
+ Add two new 9pt and 27pt stencil examples ([#300](#300), [#306](#306))
+ Allow benchmarking CuSPARSE spmv formats through Ginkgo's benchmarks ([#303](#303))
+ New benchmark for sparse matrix format conversions ([#312](https://github.com/ginkgo-project/ginkgo/issues/312), [#317](https://github.com/ginkgo-project/ginkgo/issues/317))
+ Add conversions between CSR and Hybrid formats ([#302](#302), [#310](#310))
+ Support for sorting rows in the CSR format by column indices ([#322](#322))
+ Addition of a CUDA COO SpMM kernel for improved performance ([#345](#345))
+ Addition of a LinOp to handle perturbations of the form (identity + scalar * basis * projector) ([#334](#334))
+ New sparsity matrix representation format with Reference and OpenMP kernels ([#349](#349), [#350](#350))

### Fixes
+ Accelerate GMRES solver for CUDA executor ([#363](#363))
+ Fix BiCGSTAB solver convergence ([#359](#359))
+ Fix CGS logging by reporting the residual for every sub iteration ([#328](#328))
+ Fix CSR,Dense->Sellp conversion's memory access violation ([#295](#295))
+ Accelerate CSR->Ell,Hybrid conversions on CUDA ([#313](#313), [#318](#318))
+ Fixed slowdown of COO SpMV on OpenMP ([#340](#340))
+ Fix gcc 6.4.0 internal compiler error ([#316](#316))
+ Fix compilation issue on Apple clang++ 10 ([#322](#322))
+ Make Ginkgo able to compile on Intel 2017 and above ([#337](#337))
+ Make the benchmarks spmv/solver use the same matrix formats ([#366](#366))
+ Fix self-written isfinite function ([#348](#348))
+ Fix Jacobi issues shown by cuda-memcheck

### Tools and ecosystem improvements
+ Multiple improvements to the CI system and tools ([#296](#296), [#311](#311), [#365](#365))
+ Multiple improvements to the Ginkgo containers ([#328](#328), [#361](#361))
+ Add sonarqube analysis to Ginkgo ([#304](#304), [#308](#308), [#309](#309))
+ Add clang-tidy and iwyu support to Ginkgo ([#298](#298))
+ Improve Ginkgo's support of xSDK M12 policy by adding the `TPL_` arguments to CMake ([#300](#300))
+ Add support for the xSDK R7 policy ([#325](#325))
+ Fix examples in html documentation ([#367](#367))

Related PR: #370
2 parents 1fbacd6 + c47a498 commit 20823e7

34 files changed: 176 additions and 139 deletions

CHANGELOG.md

Lines changed: 71 additions & 20 deletions
@@ -1,29 +1,80 @@
 # Changelog
 
-This file may not always be up to date for the unreleased commits. For a
-comprehensive list, use the following commands:
+This file may not always be up to date in particular for the unreleased
+commits. For a comprehensive list, use the following command:
 ```bash
 git log --first-parent
 ```
 
-## Unreleased
-### Added
-+ [2d3f0318](https://github.com/ginkgo-project/ginkgo/commit/2d3f0318ed9412a3522d12a85b863efad12fd033), [5e744cad](https://github.com/ginkgo-project/ginkgo/commit/5e744cad1ac0a86b58e3a982dfd7fff4123a7ae3), [22e4b07d](https://github.com/ginkgo-project/ginkgo/commit/22e4b07db7642b54e89c026372c9aae7554ff385), [a5d60de9](https://github.com/ginkgo-project/ginkgo/commit/a5d60de994d0d2073d6dfd4b170fccb557ab6663): Code quality tools in the CI system such as IWYU, clang-tidy and sonarqube.
-+ [e1ed14da](https://github.com/ginkgo-project/ginkgo/commit/e1ed14dae236cf4880e1aab418ba8b1784cc8c6e): Fully abide to the xSDK compatibility policies.
-+ [0c60deec](https://github.com/ginkgo-project/ginkgo/commit/0c60deec4ce806394fb3287735fad4fb9e7e5c71), [de51ee9a](https://github.com/ginkgo-project/ginkgo/commit/de51ee9a4fbec45d4af99c877a3a49ab94c8cdb5): Two new examples, a 9pt and 27pt stencil.
-+ [5e0ca656](https://github.com/ginkgo-project/ginkgo/commit/5e0ca656865f2fa8c35c3470bc6e531c7cf95b66): Benchmark support for cuSPARSE SpMVs.
-+ [2f2f09eb](https://github.com/ginkgo-project/ginkgo/commit/2f2f09eb8e653b2552fe97c997e6729c6a3dbcdc), [ec7918f0](https://github.com/ginkgo-project/ginkgo/commit/ec7918f0a3ddb8084a7b2854d4f4d88dc86a1c11): Benchmark support for conversion between SpMV formats.
-+ [c9be4445](https://github.com/ginkgo-project/ginkgo/commit/c9be444527fb985f9646c4ebb1b8fb7b9ef72615), [82e6da60](https://github.com/ginkgo-project/ginkgo/commit/82e6da6022a4a5405ad2b91f0f48ccc2490114cd): CSR conversions to and from Hybrid.
-+ [fce8dad4](https://github.com/ginkgo-project/ginkgo/commit/fce8dad411603fa517e56073c47b0582910a0b1a), [a3307f07](https://github.com/ginkgo-project/ginkgo/commit/a3307f0760174f7f8b9d4edf20688fe5e2ff9d7a): New ParILU preconditioner.
-+ [75a398fc](https://github.com/ginkgo-project/ginkgo/commit/75a398fc64aaa17e8ab343a84f4d8d8caa3ca662): Support for sorting CSR matrices. See also the ParILU commits.
-
-### Changed
-+ [fe58c940](https://github.com/ginkgo-project/ginkgo/commit/fe58c940aa365d1c7434836150c53fdb4832c3ef): Fix the CUDA conversion from CSR and Dense to Sell-P.
-+ [75806c26](https://github.com/ginkgo-project/ginkgo/commit/75806c26ff6af86d2bb436c9b19a6df3d9be76ce), [c6229b80](https://github.com/ginkgo-project/ginkgo/commit/c6229b804e27c4adb02df17af46f925d48f312ff): General fixes to the CI system scripts.
-+ [8bf33e0e](https://github.com/ginkgo-project/ginkgo/commit/8bf33e0e3386d0e6a6c41631444deeea627d1d94), [37dfe3b8](https://github.com/ginkgo-project/ginkgo/commit/37dfe3b865a5902a5e395aa424e13190d1bd2c65): Improve CSR->ELL,Hybrid conversions.
-+ [c4f567eb](https://github.com/ginkgo-project/ginkgo/commit/c4f567ebc80b22252c5c5284a00e4d9f86d22e2c): Fix compilation with GCC 6.4.
-
-### Removed
+## Version 1.1.0
+
+The Ginkgo team is proud to announce the new minor release of Ginkgo version
+1.1.0. This release brings several performance improvements, adds Windows support,
+adds support for factorizations inside Ginkgo and a new ILU preconditioner
+based on the ParILU algorithm, among other things. For detailed information, check the respective issue.
+
+Supported systems and requirements:
++ For all platforms, cmake 3.9+
++ Linux and MacOS
+  + gcc: 5.3+, 6.3+, 7.3+, 8.1+
+  + clang: 3.9+
+  + Intel compiler: 2017+
+  + Apple LLVM: 8.0+
+  + CUDA module: CUDA 9.0+
++ Windows
+  + MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, 8.1+
+  + Microsoft Visual Studio: VS 2017 15.7+
+  + CUDA module: CUDA 9.0+, Microsoft Visual Studio
+  + OpenMP module: MinGW or Cygwin.
+
+
+The current known issues can be found in the [known issues
+page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues).
+
+
+### Additions
++ Upper and lower triangular solvers ([#327](https://github.com/ginkgo-project/ginkgo/issues/327), [#336](https://github.com/ginkgo-project/ginkgo/issues/336), [#341](https://github.com/ginkgo-project/ginkgo/issues/341), [#342](https://github.com/ginkgo-project/ginkgo/issues/342))
++ New factorization support in Ginkgo, and addition of the ParILU
+  algorithm ([#305](https://github.com/ginkgo-project/ginkgo/issues/305), [#315](https://github.com/ginkgo-project/ginkgo/issues/315), [#319](https://github.com/ginkgo-project/ginkgo/issues/319), [#324](https://github.com/ginkgo-project/ginkgo/issues/324))
++ New ILU preconditioner ([#348](https://github.com/ginkgo-project/ginkgo/issues/348), [#353](https://github.com/ginkgo-project/ginkgo/issues/353))
++ Windows MinGW and Cygwin support ([#347](https://github.com/ginkgo-project/ginkgo/issues/347))
++ Windows Visual Studio support ([#351](https://github.com/ginkgo-project/ginkgo/issues/351))
++ New example showing how to use ParILU as a preconditioner ([#358](https://github.com/ginkgo-project/ginkgo/issues/358))
++ New example on using loggers for debugging ([#360](https://github.com/ginkgo-project/ginkgo/issues/360))
++ Add two new 9pt and 27pt stencil examples ([#300](https://github.com/ginkgo-project/ginkgo/issues/300), [#306](https://github.com/ginkgo-project/ginkgo/issues/306))
++ Allow benchmarking CuSPARSE spmv formats through Ginkgo's benchmarks ([#303](https://github.com/ginkgo-project/ginkgo/issues/303))
++ New benchmark for sparse matrix format conversions ([#312](https://github.com/ginkgo-project/ginkgo/issues/312), [#317](https://github.com/ginkgo-project/ginkgo/issues/317))
++ Add conversions between CSR and Hybrid formats ([#302](https://github.com/ginkgo-project/ginkgo/issues/302), [#310](https://github.com/ginkgo-project/ginkgo/issues/310))
++ Support for sorting rows in the CSR format by column indices ([#322](https://github.com/ginkgo-project/ginkgo/issues/322))
++ Addition of a CUDA COO SpMM kernel for improved performance ([#345](https://github.com/ginkgo-project/ginkgo/issues/345))
++ Addition of a LinOp to handle perturbations of the form (identity + scalar *
+  basis * projector) ([#334](https://github.com/ginkgo-project/ginkgo/issues/334))
++ New sparsity matrix representation format with Reference and OpenMP
+  kernels ([#349](https://github.com/ginkgo-project/ginkgo/issues/349), [#350](https://github.com/ginkgo-project/ginkgo/issues/350))
+
+### Fixes
++ Accelerate GMRES solver for CUDA executor ([#363](https://github.com/ginkgo-project/ginkgo/issues/363))
++ Fix BiCGSTAB solver convergence ([#359](https://github.com/ginkgo-project/ginkgo/issues/359))
++ Fix CGS logging by reporting the residual for every sub iteration ([#328](https://github.com/ginkgo-project/ginkgo/issues/328))
++ Fix CSR,Dense->Sellp conversion's memory access violation ([#295](https://github.com/ginkgo-project/ginkgo/issues/295))
++ Accelerate CSR->Ell,Hybrid conversions on CUDA ([#313](https://github.com/ginkgo-project/ginkgo/issues/313), [#318](https://github.com/ginkgo-project/ginkgo/issues/318))
++ Fixed slowdown of COO SpMV on OpenMP ([#340](https://github.com/ginkgo-project/ginkgo/issues/340))
++ Fix gcc 6.4.0 internal compiler error ([#316](https://github.com/ginkgo-project/ginkgo/issues/316))
++ Fix compilation issue on Apple clang++ 10 ([#322](https://github.com/ginkgo-project/ginkgo/issues/322))
++ Make Ginkgo able to compile on Intel 2017 and above ([#337](https://github.com/ginkgo-project/ginkgo/issues/337))
++ Make the benchmarks spmv/solver use the same matrix formats ([#366](https://github.com/ginkgo-project/ginkgo/issues/366))
++ Fix self-written isfinite function ([#348](https://github.com/ginkgo-project/ginkgo/issues/348))
++ Fix Jacobi issues shown by cuda-memcheck
+
+### Tools and ecosystem improvements
++ Multiple improvements to the CI system and tools ([#296](https://github.com/ginkgo-project/ginkgo/issues/296), [#311](https://github.com/ginkgo-project/ginkgo/issues/311), [#365](https://github.com/ginkgo-project/ginkgo/issues/365))
++ Multiple improvements to the Ginkgo containers ([#328](https://github.com/ginkgo-project/ginkgo/issues/328), [#361](https://github.com/ginkgo-project/ginkgo/issues/361))
++ Add sonarqube analysis to Ginkgo ([#304](https://github.com/ginkgo-project/ginkgo/issues/304), [#308](https://github.com/ginkgo-project/ginkgo/issues/308), [#309](https://github.com/ginkgo-project/ginkgo/issues/309))
++ Add clang-tidy and iwyu support to Ginkgo ([#298](https://github.com/ginkgo-project/ginkgo/issues/298))
++ Improve Ginkgo's support of xSDK M12 policy by adding the `TPL_` arguments
+  to CMake ([#300](https://github.com/ginkgo-project/ginkgo/issues/300))
++ Add support for the xSDK R7 policy ([#325](https://github.com/ginkgo-project/ginkgo/issues/325))
++ Fix examples in html documentation ([#367](https://github.com/ginkgo-project/ginkgo/issues/367))
 
 ## Version 1.0.0
 The Ginkgo team is proud to announce the first release of Ginkgo, the next-generation high-performance on-node sparse linear algebra library. Ginkgo leverages the features of modern C++ to give you a tool for the iterative solution of linear systems that is:
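
One of the additions listed in the 1.1.0 changelog is a LinOp for perturbations of the form (identity + scalar * basis * projector). As a minimal sketch of what applying such an operator means, assume the rank-one case, where `basis` is a column vector and `projector` is a row vector; the product then reduces to y = x + scalar * basis * (projector · x). The function `apply_perturbation` and its signature are hypothetical and use plain `std::vector` rather than Ginkgo's own types.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Applies y = (I + scalar * basis * projector) x for the rank-one case,
// where `basis` is a column vector and `projector` a row vector.
// Hypothetical helper for illustration; not part of Ginkgo's API.
std::vector<double> apply_perturbation(double scalar,
                                       const std::vector<double> &basis,
                                       const std::vector<double> &projector,
                                       const std::vector<double> &x)
{
    assert(basis.size() == x.size() && projector.size() == x.size());
    // projector * x collapses to a single number in the rank-one case.
    const double px =
        std::inner_product(projector.begin(), projector.end(), x.begin(), 0.0);
    // Start from the identity part (y = x), then add the perturbation.
    std::vector<double> y(x);
    for (std::size_t i = 0; i < y.size(); ++i) {
        y[i] += scalar * basis[i] * px;
    }
    return y;
}
```

With scalar = 2, basis = (1, 0), projector = (1, 1) and x = (3, 4), the inner product projector · x is 7, so the result is y = (17, 4).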

CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 cmake_minimum_required(VERSION 3.9)
 
-project(Ginkgo LANGUAGES C CXX VERSION 1.0.0 DESCRIPTION "A numerical linear algebra library targeting many-core architectures")
+project(Ginkgo LANGUAGES C CXX VERSION 1.1.0 DESCRIPTION "A numerical linear algebra library targeting many-core architectures")
 set(Ginkgo_VERSION_TAG "develop")
 set(PROJECT_VERSION_TAG ${Ginkgo_VERSION_TAG})

INSTALL.md

Lines changed: 11 additions & 10 deletions
@@ -17,13 +17,14 @@ Ginkgo adds the following additional switches to control what is being built:
 
 * `-DGINKGO_DEVEL_TOOLS={ON, OFF}` sets up the build system for development
   (requires clang-format, will also download git-cmake-format),
-  default is `ON`
+  default is `ON`.
 * `-DGINKGO_BUILD_TESTS={ON, OFF}` builds Ginkgo's tests
-  (will download googletest), default is `ON`
+  (will download googletest), default is `ON`.
 * `-DGINKGO_BUILD_BENCHMARKS={ON, OFF}` builds Ginkgo's benchmarks
-  (will download gflags and rapidjson), default is `ON`
+  (will download gflags and rapidjson), default is `ON`.
 * `-DGINKGO_BUILD_EXAMPLES={ON, OFF}` builds Ginkgo's examples, default is `ON`
-* `-DGINKGO_BUILD_EXTLIB_EXAMPLE={ON, OFF}` builds the interfacing example with deal.II, default is `OFF`
+* `-DGINKGO_BUILD_EXTLIB_EXAMPLE={ON, OFF}` builds the interfacing example
+  with deal.II, default is `OFF`.
 * `-DGINKGO_BUILD_REFERENCE={ON, OFF}` build reference implementations of the
   kernels, useful for testing, default is `ON`
 * `-DGINKGO_BUILD_OMP={ON, OFF}` builds optimized OpenMP versions of the kernels,
@@ -42,21 +43,21 @@ Ginkgo adds the following additional switches to control what is being built:
   CMake package registry. The default is `OFF`.
 * `-DGINKGO_WITH_CLANG_TIDY={ON, OFF}` makes Ginkgo call `clang-tidy` to find
   programming issues. The path can be manually controlled with the CMake
-  variable `-DGINKGO_CLANG_TIDY_PATH=<path>`.
+  variable `-DGINKGO_CLANG_TIDY_PATH=<path>`. The default is `OFF`.
 * `-DGINKGO_WITH_IWYU={ON, OFF}` makes Ginkgo call `iwyu` to find include
   issues. The path can be manually controlled with the CMake variable
-  `-DGINKGO_IWYU_PATH=<path>`.
+  `-DGINKGO_IWYU_PATH=<path>`. The default is `OFF`.
 * `-DGINKGO_VERBOSE_LEVEL=integer` sets the verbosity of Ginkgo.
   * `0` disables all output in the main libraries,
   * `1` enables a few important messages related to unexpected behavior (default).
 * `-DCMAKE_INSTALL_PREFIX=path` sets the installation path for `make install`.
-  The default value is usually something like `/usr/local`
+  The default value is usually something like `/usr/local`.
 * `-DCMAKE_BUILD_TYPE=type` specifies which configuration will be used for
   this build of Ginkgo. The default is `RELEASE`. Supported values are CMake's
   standard build types such as `DEBUG` and `RELEASE` and the Ginkgo specific
   `COVERAGE`, `ASAN` (AddressSanitizer) and `TSAN` (ThreadSanitizer) types.
 * `-DBUILD_SHARED_LIBS={ON, OFF}` builds ginkgo as static libraries (`OFF`)
-  or as dynamic libraries (`ON`), default is `ON`
+  or as dynamic libraries (`ON`), default is `ON`.
 * `-DGINKGO_JACOBI_FULL_OPTIMIZATIONS={ON, OFF}` use all the optimizations
   for the CUDA Jacobi algorithm. `OFF` by default. Setting this option to `ON`
   may lead to very slow compile time (>20 minutes) for the
@@ -92,7 +93,7 @@ Ginkgo adds the following additional switches to control what is being built:
   program, default is `windows_shared_library`.
 * `-DGINKGO_CHECK_PATH={ON, OFF}` checks if the environment variable PATH is valid.
   It is checked only when building shared libraries and executable program,
-  default is `ON`
+  default is `ON`.
 
 For example, to build everything (in debug mode), use:
 
@@ -135,7 +136,7 @@ Information, see the [CMake documentation for
 CMAKE_PREFIX_PATH](https://cmake.org/cmake/help/v3.9/variable/CMAKE_PREFIX_PATH.html)
 for details.
 
-To manually configure the paths Ginkgo relies on the [standard xSDK Installation
+To manually configure the paths, Ginkgo relies on the [standard xSDK Installation
 policies](https://xsdk.info/policies/) for all packages except `CAS` (as it is
 neither a library nor a header, it cannot be expressed through the `TPL`
 format):

README.md

Lines changed: 2 additions & 2 deletions
@@ -57,10 +57,10 @@ The prequirement needs to be verified
 * _cmake 3.9+_
 * C++11 compliant 64-bits compiler:
   * _MinGW : gcc 5.3+, 6.3+, 7.3+, 8.1+_
-  * _CygWin : gcc 5.3+, 6.3+, 7.3+, 8.1+_
+  * _Cygwin : gcc 5.3+, 6.3+, 7.3+, 8.1+_
   * _Microsoft Visual Studio : VS 2017 15.7+_
 
-__NOTE:__ Need to add `--autocrlf=input` after `git clone` in _CygWin_.
+__NOTE:__ Need to add `--autocrlf=input` after `git clone` in _Cygwin_.
 
 The Ginkgo CUDA module has the following __additional__ requirements:

core/device_hooks/cuda_hooks.cpp

Lines changed: 2 additions & 2 deletions
@@ -43,9 +43,9 @@ namespace gko {
 
 version version_info::get_cuda_version() noexcept
 {
-    // We just return 1.0.0 with a special "not compiled" tag in placeholder
+    // We just return 1.1.0 with a special "not compiled" tag in placeholder
     // modules.
-    return {1, 0, 0, "not compiled"};
+    return {1, 1, 0, "not compiled"};
 }

core/device_hooks/omp_hooks.cpp

Lines changed: 2 additions & 2 deletions
@@ -38,9 +38,9 @@ namespace gko {
 
 version version_info::get_omp_version() noexcept
 {
-    // We just return 1.0.0 with a special "not compiled" tag in placeholder
+    // We just return 1.1.0 with a special "not compiled" tag in placeholder
     // modules.
-    return {1, 0, 0, "not compiled"};
+    return {1, 1, 0, "not compiled"};
 }

core/device_hooks/reference_hooks.cpp

Lines changed: 2 additions & 2 deletions
@@ -39,9 +39,9 @@ namespace gko {
 
 version version_info::get_reference_version() noexcept
 {
-    // We just return 1.0.0 with a special "not compiled" tag in placeholder
+    // We just return 1.1.0 with a special "not compiled" tag in placeholder
     // modules.
-    return {1, 0, 0, "not compiled"};
+    return {1, 1, 0, "not compiled"};
 }
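
The three hook files above follow the same pattern: when a module is not built, its version query still returns the library version, marked with a "not compiled" tag. The following is a minimal C++ sketch of how calling code might use such a tag to detect a placeholder module. The struct `sketch_version`, its field names, and the helper `is_module_compiled` are hypothetical and are not taken from Ginkgo's API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a library version record; the field names are
// assumptions for this sketch, not Ginkgo's actual definition.
struct sketch_version {
    std::uint64_t major_num;
    std::uint64_t minor_num;
    std::uint64_t patch_num;
    const char *tag;
};

// Placeholder hook: stands in for a module that was not compiled. It reports
// the library version together with a marker tag.
sketch_version get_placeholder_version() noexcept
{
    return {1, 1, 0, "not compiled"};
}

// Treats a module as available only if its tag is not the placeholder marker.
bool is_module_compiled(const sketch_version &v) noexcept
{
    return std::strcmp(v.tag, "not compiled") != 0;
}
```

Keeping the numeric version in the placeholder means the hooks must be bumped on every release, which is why this commit touches all three files.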

cuda/CMakeLists.txt

Lines changed: 9 additions & 9 deletions
@@ -6,16 +6,16 @@ if (NOT BUILD_SHARED_LIBS)
     set(CMAKE_CUDA_DEVICE_LINK_EXECUTABLE ${CMAKE_CUDA_DEVICE_LINK_EXECUTABLE} PARENT_SCOPE)
 endif()
 
-# MSVC can not find CUDA automatically
-# Use CUDA_COMPILER PATH to define the CUDA TOOLKIT ROOT DIR
-if ("${CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES}" STREQUAL "")
-    string(REPLACE "/bin/nvcc.exe" "" CMAKE_CUDA_ROOT_DIR ${CMAKE_CUDA_COMPILER})
-    set(CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES "${CMAKE_CUDA_ROOT_DIR}/include")
-    set(CMAKE_CUDA_IMPLICIT_LINK_DIRECTORIES "${CMAKE_CUDA_ROOT_DIR}/lib/x64")
-endif()
-
-# This is modified from https://gitlab.kitware.com/cmake/community/wikis/FAQ#dynamic-replace
 if(MSVC)
+    # MSVC can not find CUDA automatically
+    # Use CUDA_COMPILER PATH to define the CUDA TOOLKIT ROOT DIR
+    if("${CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES}" STREQUAL "")
+        string(REPLACE "/bin/nvcc.exe" "" CMAKE_CUDA_ROOT_DIR ${CMAKE_CUDA_COMPILER})
+        set(CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES "${CMAKE_CUDA_ROOT_DIR}/include")
+        set(CMAKE_CUDA_IMPLICIT_LINK_DIRECTORIES "${CMAKE_CUDA_ROOT_DIR}/lib/x64")
+    endif()
+
+    # This is modified from https://gitlab.kitware.com/cmake/community/wikis/FAQ#dynamic-replace
     if(BUILD_SHARED_LIBS)
         ginkgo_switch_to_windows_dynamic("CUDA")
     else()

cuda/components/diagonal_block_manipulation.cuh

Lines changed: 3 additions & 1 deletion
@@ -67,7 +67,7 @@ __device__ __forceinline__ void extract_transposed_diag_blocks(
     auto bid = static_cast<size_type>(blockIdx.x) * warps_per_block *
                    processed_blocks +
                threadIdx.z * processed_blocks;
-    auto bstart = block_ptrs[bid];
+    auto bstart = (bid < num_blocks) ? block_ptrs[bid] : zero<IndexType>();
     IndexType bsize = 0;
 #pragma unroll
     for (int b = 0; b < processed_blocks; ++b, ++bid) {
@@ -84,6 +84,7 @@ __device__ __forceinline__ void extract_transposed_diag_blocks(
             if (threadIdx.y == b && threadIdx.x < max_block_size) {
                 workspace[threadIdx.x] = zero<ValueType>();
             }
+            warp.sync();
             const auto row = bstart + i;
             const auto rstart = row_ptrs[row] + tid;
             const auto rend = row_ptrs[row + 1];
@@ -101,6 +102,7 @@ __device__ __forceinline__ void extract_transposed_diag_blocks(
             if (threadIdx.y == b && threadIdx.x < bsize) {
                 block_row[i * increment] = workspace[threadIdx.x];
             }
+            warp.sync();
         }
     }
 }
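
The first hunk above replaces an unconditional read of `block_ptrs[bid]` with a guarded one, so that a thread whose `bid` lies past the last block gets zero instead of reading out of bounds. The following host-side C++ sketch shows the same guard pattern in isolation. The function `guarded_block_start` and the assumption that the pointer array holds `num_blocks + 1` offsets are illustrative and are not taken from the kernel. The `warp.sync()` additions have no host-side equivalent and are not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns ptrs[idx] when idx addresses a valid block, and 0 otherwise.
// `ptrs` is assumed to hold num_blocks + 1 offsets (hypothetical layout
// chosen for this sketch).
int guarded_block_start(const std::vector<int> &ptrs, std::size_t idx,
                        std::size_t num_blocks)
{
    assert(ptrs.size() == num_blocks + 1);
    // The comparison comes first, so an out-of-range idx never touches ptrs.
    return (idx < num_blocks) ? ptrs[idx] : 0;
}
```

The guard matters whenever the launch grid is rounded up: some threads then hold an index with no block behind it, and they must still execute the surrounding code without touching memory.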

cuda/preconditioner/jacobi_generate_kernel.cu

Lines changed: 2 additions & 0 deletions
@@ -47,6 +47,7 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 #include "cuda/components/thread_ids.cuh"
 #include "cuda/components/uninitialized_array.hpp"
 #include "cuda/components/warp_blas.cuh"
+#include "cuda/components/zero_array.hpp"
 #include "cuda/preconditioner/jacobi_common.hpp"
 
 
@@ -296,6 +297,7 @@ void generate(std::shared_ptr<const CudaExecutor> exec,
               Array<precision_reduction> &block_precisions,
               const Array<IndexType> &block_pointers, Array<ValueType> &blocks)
 {
+    zero_array(blocks.get_num_elems(), blocks.get_data());
     select_generate(compiled_kernels(),
                     [&](int compiled_block_size) {
                         return max_block_size <= compiled_block_size;
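
The second hunk above zeroes the whole `blocks` array before the generation kernels run. The diff does not state the reason; a plausible reading, given the "Fix Jacobi issues shown by cuda-memcheck" entry in the commit message, is that some entries of the block storage are never written by the kernels and would otherwise hold uninitialized values. The host-side C++ sketch below illustrates that idea under this assumption. The functions `write_blocks` and `generate_zeroed` and the `stride`/`used` layout are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Writes `used` entries at the start of every block slot and leaves the
// remaining padding untouched. `stride` is the storage reserved per block;
// both parameters are hypothetical and exist only for this sketch.
void write_blocks(std::vector<double> &storage, std::size_t stride,
                  std::size_t used, double value)
{
    assert(stride > 0 && used <= stride && storage.size() % stride == 0);
    for (std::size_t start = 0; start < storage.size(); start += stride) {
        std::fill_n(storage.begin() + start, used, value);
    }
}

// Zeroes the whole buffer first so that padding entries are well defined,
// then performs the partial writes.
void generate_zeroed(std::vector<double> &storage, std::size_t stride,
                     std::size_t used, double value)
{
    std::fill(storage.begin(), storage.end(), 0.0);
    write_blocks(storage, stride, used, value);
}
```

Zeroing up front costs one extra pass over the buffer, but it makes every later read of the storage deterministic, including reads of padding.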
