* initialize nccl
* change year in header
* add implementation of nccl gbdt
* add nccl topology
* clean up
* clean up
* set nccl info
* support quantized training with categorical features on cpu
* remove white spaces
* add tests for quantized training with categorical features
* skip tests for cuda version
* fix cases when only 1 data block in row-wise quantized histogram construction with 8 inner bits
* remove useless capture
* fix inconsistency of gpu devices
* fix creating boosting object from file
* change num_gpu to num_gpus in test case
* fix objective initialization
* fix lint errors
* fix c++ compilation warning
* fix lint errors
* fix compilation warnings
* change num_gpu to num_gpus in R test case
* add nccl synchronization in tree training
* fix global num data update
* fix ruff-format issues
* use global num data in split finder
* explicit initialization of NCCLInfo members
* fix compilation
* use CUDAVector
* use CUDAVector
* merge master
* use CUDAVector
* use CUDAVector for cuda tree and column data
* update gbdt
* changes for cuda tree
* use CUDAVector for cuda column data
* disable cuda by default
* fix single machine gbdt
* clean up
* fix typo
* fix lint issues
* use num_gpu instead of num_gpus
* fix compilation error
* fix cpp lint errors
* fix reset config for cuda data partition
* fix subrow copy in cuda column data
* fix cmakelint errors
* Update include/LightGBM/config.h
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Update include/LightGBM/config.h
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Update include/LightGBM/cuda/cuda_nccl_topology.hpp
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Update include/LightGBM/config.h
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Update src/treelearner/cuda/cuda_data_partition.cu
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Update src/treelearner/cuda/cuda_data_partition.cu
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Update src/treelearner/cuda/cuda_leaf_splits.cu
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* remove WARPSIZE before #6086 is merged
* Update src/treelearner/cuda/cuda_leaf_splits.cu
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* update docs
* Update src/treelearner/cuda/cuda_leaf_splits.cu
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* update documentation to indicate support for multi-node multi-GPU training in the CUDA version
* add header guard
* update document for parameters
* fix lint errors
* fix header ordering
* update Nccl to NCCL
---------
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
docs/Installation-Guide.rst (+2 −2)

@@ -692,9 +692,9 @@ Refer to `GPU Docker folder <https://github.com/microsoft/LightGBM/tree/master/d
 Build CUDA Version
 ~~~~~~~~~~~~~~~~~~
 
-The `original GPU version <#build-gpu-version>`__ of LightGBM (``device_type=gpu``) is based on OpenCL.
+The `original GPU version <#build-gpu-version>`__ of LightGBM (``device_type=gpu``) is based on OpenCL, and only computes histograms on GPUs, with other parts of training in CPUs.
 
-The CUDA-based version (``device_type=cuda``) is a separate implementation.
+The CUDA-based version (``device_type=cuda``) is a separate implementation that runs significantly faster by putting all the training process on GPUs. It also supports multi-GPU, and multi-node multi-GPU training.
 
 Use this version in Linux environments with an NVIDIA GPU with compute capability 6.0 or higher.
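At the API level, the OpenCL-versus-CUDA distinction described in this diff is selected purely through the ``device_type`` parameter. A minimal sketch of the two parameter sets (plain dictionaries only — actually training with them assumes a LightGBM build with the matching backend, and the specific ID values here are illustrative, not from the PR):

```python
# OpenCL version (``device_type=gpu``): only histogram construction runs on
# the GPU; the rest of training stays on the CPU.
opencl_params = {
    "device_type": "gpu",
    "gpu_platform_id": 0,  # OpenCL platform ID, queried from the system
    "gpu_device_id": 0,    # device ID within that platform
}

# CUDA version (``device_type=cuda``): the whole training loop runs on the
# GPU(s); ``num_gpu`` and ``gpu_device_id_list`` are the parameters this PR
# adds/extends for multi-GPU training.
cuda_params = {
    "device_type": "cuda",
    "num_gpu": 2,                # number of GPUs used for training in this node
    "gpu_device_id_list": "0,1",  # explicit CUDA device IDs
}

print(opencl_params["device_type"], cuda_params["device_type"])
```

Either dictionary would then be passed as the ``params`` argument of ``lightgbm.train()``.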
docs/Parameters.rst (+17 −1)

@@ -1373,8 +1373,18 @@ GPU Parameters
 
    - ``-1`` means the default device in the selected platform
 
+   - in multi-GPU case (``num_gpu>1``) means ID of the master GPU
+
    - **Note**: refer to `GPU Targets <./GPU-Targets.rst#query-opencl-devices-in-your-system>`__ for more details
 
+- ``gpu_device_id_list`` :raw-html:`<a id="gpu_device_id_list" title="Permalink to this parameter" href="#gpu_device_id_list">🔗︎</a>`, default = ``""``, type = string
+
+   - list of CUDA device IDs
+
+   - **Note**: can be used only in CUDA implementation (``device_type="cuda"``) and when ``num_gpu>1``
+
+   - if empty, the devices with the smallest IDs will be used
+
 - ``gpu_use_dp`` :raw-html:`<a id="gpu_use_dp" title="Permalink to this parameter" href="#gpu_use_dp">🔗︎</a>`, default = ``false``, type = bool
 
    - set this to ``true`` to use double precision math on GPU (by default single precision is used)
@@ -1383,10 +1393,16 @@ GPU Parameters
 
 - ``num_gpu`` :raw-html:`<a id="num_gpu" title="Permalink to this parameter" href="#num_gpu">🔗︎</a>`, default = ``1``, type = int, constraints: ``num_gpu > 0``
 
-   - number of GPUs
+   - number of GPUs used for training in this node
 
    - **Note**: can be used only in CUDA implementation (``device_type="cuda"``)
+
+   - if ``0``, only 1 GPU will be used
+
+   - used in both single-machine and distributed learning applications
+
+   - in distributed learning application, each machine can use different number of GPUs
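The interaction documented above — an explicit ``gpu_device_id_list`` wins, an empty list falls back to the ``num_gpu`` devices with the smallest IDs, and ``num_gpu=0`` still uses 1 GPU — can be sketched with a small hypothetical helper (not part of LightGBM's API; it only mirrors the documented semantics):

```python
def resolve_cuda_devices(num_gpu: int, gpu_device_id_list: str = "") -> list:
    """Hypothetical helper mirroring the documented parameter semantics."""
    if gpu_device_id_list:
        # explicit comma-separated CUDA device IDs take precedence
        return [int(tok) for tok in gpu_device_id_list.split(",") if tok.strip()]
    # "if ``0``, only 1 GPU will be used"; otherwise the num_gpu smallest IDs
    return list(range(max(num_gpu, 1)))

print(resolve_cuda_devices(3))         # [0, 1, 2]
print(resolve_cuda_devices(2, "4,7"))  # [4, 7]
print(resolve_cuda_devices(0))         # [0]
```

Note that in distributed learning each machine resolves its own device set, since ``num_gpu`` is a per-node count.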
include/LightGBM/config.h (+10 −1)

@@ -1125,16 +1125,25 @@ struct Config {
 
   // desc = OpenCL device ID in the specified platform or CUDA device ID. Each GPU in the selected platform has a unique device ID
   // desc = ``-1`` means the default device in the selected platform
+  // desc = in multi-GPU case (``num_gpu>1``) means ID of the master GPU
   // desc = **Note**: refer to `GPU Targets <./GPU-Targets.rst#query-opencl-devices-in-your-system>`__ for more details
   int gpu_device_id = -1;
 
+  // desc = list of CUDA device IDs
+  // desc = **Note**: can be used only in CUDA implementation (``device_type="cuda"``) and when ``num_gpu>1``
+  // desc = if empty, the devices with the smallest IDs will be used
+  std::string gpu_device_id_list = "";
+
   // desc = set this to ``true`` to use double precision math on GPU (by default single precision is used)
   // desc = **Note**: can be used only in OpenCL implementation (``device_type="gpu"``), in CUDA implementation only double precision is currently supported
   bool gpu_use_dp = false;
 
   // check = >0
-  // desc = number of GPUs
+  // desc = number of GPUs used for training in this node
   // desc = **Note**: can be used only in CUDA implementation (``device_type="cuda"``)
+  // desc = if ``0``, only 1 GPU will be used
+  // desc = used in both single-machine and distributed learning applications
+  // desc = in distributed learning application, each machine can use different number of GPUs