🐛 Bug
I'm trying to get the L-BFGS optimizer to work on TPU, but I'm facing a huge slowdown at the first call to step(). The original code can be found here. I also saw a similar issue with the LAMB optimizer, and I was facing a deprecation warning from _add() in this step. I am unable to pinpoint the exact line in the code that is the major cause of the slowdown.
In my analysis, max_iter used in this loop is 30; the first and second iterations are fast, but the later ones suddenly become very slow. What else can I do? I used the VM configuration specified in the Fairseq tutorial. Should I increase the number of cores for my problem? Do I need to move the variables in the fitting code to the xla_device?
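To narrow down which step() call slows down, I used a timing harness like the following (a plain-CPU sketch with a hypothetical toy model; on TPU, torch_xla's `xm.mark_step()` / `xm.wait_device_ops()` would be needed before reading the clock, to force lazy execution):

```python
import time
import torch

# Hypothetical toy model standing in for the real fitting code.
model = torch.nn.Linear(3, 1)
x, y = torch.randn(32, 3), torch.randn(32, 1)
opt = torch.optim.LBFGS(model.parameters(), max_iter=30)

def closure():
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    return loss

times = []
for i in range(3):
    t0 = time.perf_counter()
    opt.step(closure)
    # On TPU, call xm.wait_device_ops() here before timing.
    times.append(time.perf_counter() - t0)
    print(f"step {i}: {times[-1]:.3f}s")
```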
To Reproduce
Steps to reproduce the behaviour:
- I take a GCP VM instance and a TPU processing node by following these steps.
- Then, I follow the steps given in the SMPL-X repository to install SMPL-X on the VM.
- In the code, I change all instances of `device` as given here to `device = xm.xla_device()`, and `optimizer.step` to `loss = xm.optimizer_step(optimizer, optimizer_args={'closure': closure}, barrier=True)`.
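The two changes above can be sketched roughly as follows (a CPU-runnable toy model stands in for the SMPL-X fitting code; the TPU-specific lines are shown as comments, since torch_xla only works on an XLA-enabled machine):

```python
import torch

torch.manual_seed(0)

# On TPU the device line would become:
#   import torch_xla.core.xla_model as xm
#   device = xm.xla_device()
device = torch.device("cpu")

# Hypothetical toy data/model in place of SMPL-X.
x = torch.randn(64, 3, device=device)
target = torch.randn(64, 1, device=device)
model = torch.nn.Linear(3, 1).to(device)
optimizer = torch.optim.LBFGS(model.parameters(), max_iter=30)

def closure():
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), target)
    loss.backward()
    return loss

initial = closure().item()
optimizer.step(closure)
# On TPU the step call would become:
#   loss = xm.optimizer_step(optimizer,
#                            optimizer_args={'closure': closure},
#                            barrier=True)
final = closure().item()
```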
Environment
- Reproducible on XLA backend [CPU/TPU]: TPU
- torch_xla version: 1.6
- OS: Linux transformer-tutorial 4.9.0-13-amd64 #1 SMP Debian 4.9.228-1 (2020-07-05) x86_64 GNU/Linux
- GCC version: 6.3.0
- Python version: 3.6 (64-bit runtime)
- Is CUDA available: False
- CUDA runtime version: No CUDA
- GPU models and configuration: No CUDA
- Nvidia driver version: No CUDA
- cuDNN version: No CUDA
- numpy 1.16.3 pip
- numpy 1.19.1 py36hbc911f0_0
- numpy-base 1.19.1 py36hfa32c7d_0
- numpydoc 1.1.0 py_0
- torch 1.6.0 pip
- torch-xla 1.6 pip
- torchgeometry 0.1.2 pip
- torchvision 0.7.0 pip
- blas 1.0 mkl
- mkl 2020.3 intel_279 intel
- mkl-service 2.3.0 py36he904b0f_0
- mkl_fft 1.1.0 py36h23d657b_0
- mkl_random 1.1.1 py36h0573a6f_0