I got the following error. It happened both when I installed modded-nanogpt directly on the machine and when I ran it via Docker (the Docker run was to rule out a host configuration error).
The machine is an Ubuntu 24 server with 4x RTX 4090 (plain nanoGPT ran fine on it). I don't understand why it reports an invalid device: every device-enumeration test I ran came back normal (four devices, numbered 0..3), and other programs use the GPUs in exactly that way.
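For reference, the device check I mean is roughly the following (a minimal sketch using the standard torch.cuda API, not the project's own code); on this machine it reports four devices with ordinals 0..3:

```python
import torch

# Enumerate the CUDA devices PyTorch can see; on this machine this
# prints device_count: 4 and the four RTX 4090s as cuda:0 .. cuda:3.
print("device_count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"cuda:{i} ->", torch.cuda.get_device_name(i))
```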
==>>> sudo docker run -it --rm --gpus all -v $(pwd):/modded-nanogpt modded-nanogpt sh run.sh
W1226 16:51:59.401000 7 site-packages/torch/distributed/run.py:792]
W1226 16:51:59.401000 7 site-packages/torch/distributed/run.py:792] *****************************************
W1226 16:51:59.401000 7 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W1226 16:51:59.401000 7 site-packages/torch/distributed/run.py:792] *****************************************
Traceback (most recent call last):
File "/modded-nanogpt/train_gpt2.py", line 431, in
torch.cuda.set_device(device)
File "/usr/local/lib/python3.12/site-packages/torch/cuda/init.py", line 476, in set_device
torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
using device: cuda:0
Traceback (most recent call last):
File "/modded-nanogpt/train_gpt2.py", line 431, in
torch.cuda.set_device(device)
File "/usr/local/lib/python3.12/site-packages/torch/cuda/init.py", line 476, in set_device
torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Traceback (most recent call last):
File "/modded-nanogpt/train_gpt2.py", line 431, in
torch.cuda.set_device(device)
File "/usr/local/lib/python3.12/site-packages/torch/cuda/init.py", line 476, in set_device
torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Traceback (most recent call last):
File "/modded-nanogpt/train_gpt2.py", line 431, in
torch.cuda.set_device(device)
File "/usr/local/lib/python3.12/site-packages/torch/cuda/init.py", line 476, in set_device
torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
using device: cuda:3
using device: cuda:1
using device: cuda:2
W1226 16:52:43.205000 7 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 74 closing signal SIGTERM
W1226 16:52:43.216000 7 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 75 closing signal SIGTERM
W1226 16:52:43.226000 7 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 76 closing signal SIGTERM
W1226 16:52:43.234000 7 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 77 closing signal SIGTERM
W1226 16:52:43.243000 7 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 78 closing signal SIGTERM
W1226 16:52:43.250000 7 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 79 closing signal SIGTERM
W1226 16:52:43.255000 7 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 80 closing signal SIGTERM
E1226 16:52:45.454000 7 site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 7 (pid: 81) of binary: /usr/local/bin/python
Traceback (most recent call last):
File "/usr/local/bin/torchrun", line 8, in
sys.exit(main())
^^^^^^
File "/usr/local/lib/python3.12/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 355, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/torch/distributed/run.py", line 918, in main
run(args)
File "/usr/local/lib/python3.12/site-packages/torch/distributed/run.py", line 909, in run
elastic_launch(
File "/usr/local/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 138, in call
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
train_gpt2.py FAILED
Failures:
<NO_OTHER_FAILURES>
Root Cause (first observed failure):
[0]:
time : 2024-12-26_16:52:43
host : 30c961130fdb
rank : 7 (local_rank: 7)
exitcode : 1 (pid: 81)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
===> The state of the machine.
nvidia-smi
Thu Dec 26 16:55:32 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.05 Driver Version: 560.35.05 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4090 Off | 00000000:25:00.0 Off | Off |
| 32% 29C P8 6W / 450W | 4MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 4090 Off | 00000000:41:00.0 Off | Off |
| 31% 30C P8 5W / 450W | 4MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA GeForce RTX 4090 Off | 00000000:A1:00.0 Off | Off |
| 32% 29C P8 4W / 450W | 4MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA GeForce RTX 4090 Off | 00000000:C1:00.0 Off | Off |
| 31% 27C P8 4W / 450W | 4MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
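One more observation from the log above: the elastic agent terminated eight child processes (pids 74..81) and the failing rank is local_rank 7, even though the machine only has four GPUs. My guess (unverified; this is an assumption about run.sh / train_gpt2.py based on the traceback and the "using device: cuda:N" prints, not the verbatim code) is that the launcher spawns one worker per configured process and each worker pins itself to the device matching its LOCAL_RANK, roughly like this sketch, so ranks 4..7 would request ordinals that don't exist on a 4-GPU box:

```python
import os
import torch

# Sketch of what each torchrun worker effectively does (an assumption,
# not the actual modded-nanogpt code): torchrun sets LOCAL_RANK per
# process, and the script binds each rank to the matching CUDA device.
local_rank = int(os.environ["LOCAL_RANK"])   # 0..nproc_per_node-1
device = f"cuda:{local_rank}"
torch.cuda.set_device(device)  # raises "invalid device ordinal" once
                               # local_rank >= torch.cuda.device_count()
print("using device:", device)
```

If that reading is right, the process count the launcher was started with does not match the four GPUs this machine actually has.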