Trying to implement L-BFGS optimizers #2545

Closed
@Anirudh257

🐛 Bug

I'm trying to get the L-BFGS optimizer to work on a TPU, but the first call to step() is extremely slow. The original code can be found here. I also saw a similar issue about LAMB optimizers. In addition, I get a deprecation warning from _add() during this step. I cannot tell which line of the code is the main cause of the slowdown.

From my analysis, max_iter in this loop is 30. The 1st and 2nd iterations are fast, but later iterations suddenly become very slow. What else can I do? I am using the VM configuration specified in the Fairseq tutorial. Should I increase the number of cores for my problem? Do I need to move the variables in the fitting code to xla_device?
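One way to narrow down where the time goes is to time each closure call separately. The following is a minimal sketch in plain Python, not part of the original code: `timed` and its `sync` argument are hypothetical names introduced here. Because XLA tensors are evaluated lazily, a timing taken without forcing execution can attribute the cost to the wrong place, so `sync` is an optional hook for that (for example, something like `lambda loss: loss.item()`, which is an assumption about what fits the fitting code).

```python
import time


def timed(closure, durations, sync=None):
    """Wrap a closure so that each call appends its wall-clock time to `durations`."""
    def wrapper():
        start = time.perf_counter()
        result = closure()
        if sync is not None:
            # Optional hook to force pending device work to finish
            # before the clock is stopped.
            sync(result)
        durations.append(time.perf_counter() - start)
        return result
    return wrapper


# Usage with a dummy closure standing in for the real loss computation.
durations = []
timed_closure = timed(lambda: 0.5, durations)
values = [timed_closure() for _ in range(3)]
print(len(durations), "closure calls timed")
```

Passing the wrapped closure to the optimizer in place of the original one would give one duration per closure evaluation, which should show at which call the slowdown starts.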

To Reproduce

Steps to reproduce the behaviour:

  1. I create a GCP VM instance and a TPU node by following these steps.
  2. Then, I follow the steps given in the SMPL-X repository to install SMPL-X on the VM.
  3. In the code, I change every instance of device given here to device = xm.xla_device(), and every optimizer.step call to loss = xm.optimizer_step(optimizer, optimizer_args={'closure': closure}, barrier=True).
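The change described in step 3 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the SMPL-X fitting code: `params`, `target` and the quadratic loss are hypothetical stand-ins, the stock `torch.optim.LBFGS` is used in place of the optimizer from the original code, and the script falls back to CPU when `torch_xla` cannot be imported.

```python
import torch

try:
    import torch_xla.core.xla_model as xm
    device = xm.xla_device()
except ImportError:
    # Assumption for this sketch: run on CPU when torch_xla is unavailable.
    xm = None
    device = torch.device("cpu")

# Hypothetical stand-ins for the fitting variables, created on the target device.
params = torch.zeros(3, device=device, requires_grad=True)
target = torch.tensor([1.0, -2.0, 0.5], device=device)

optimizer = torch.optim.LBFGS([params], lr=1.0, max_iter=30)


def closure():
    # L-BFGS may call this several times within a single step.
    optimizer.zero_grad()
    loss = ((params - target) ** 2).sum()
    loss.backward()
    return loss


if xm is not None:
    # The call from step 3: the closure is forwarded to optimizer.step().
    loss = xm.optimizer_step(
        optimizer, optimizer_args={"closure": closure}, barrier=True
    )
else:
    loss = optimizer.step(closure)

final_loss = float(closure())
print("final loss:", final_loss)
```

In this sketch every tensor that enters the loss lives on the same device as `params`. Whether that alone addresses the slowdown reported above is not something the sketch establishes.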

Environment

  • Reproducible on XLA backend [CPU/TPU]:
  • torch_xla version: 1.6
  • OS: Linux transformer-tutorial 4.9.0-13-amd64 #1 SMP Debian 4.9.228-1 (2020-07-05) x86_64 GNU/Linux
  • GCC version: 6.3.0
  • Python version: 3.6 (64-bit runtime)
  • Is CUDA available: False
  • CUDA runtime version: No CUDA
  • GPU models and configuration: No CUDA
  • Nvidia driver version: No CUDA
  • cuDNN version: No CUDA
  • numpy 1.16.3 pip
  • numpy 1.19.1 py36hbc911f0_0
  • numpy-base 1.19.1 py36hfa32c7d_0
  • numpydoc 1.1.0 py_0
  • torch 1.6.0 pip
  • torch-xla 1.6 pip
  • torchgeometry 0.1.2 pip
  • torchvision 0.7.0 pip
  • blas 1.0 mkl
  • mkl 2020.3 intel_279 intel
  • mkl-service 2.3.0 py36he904b0f_0
  • mkl_fft 1.1.0 py36h23d657b_0
  • mkl_random 1.1.1 py36h0573a6f_0
