# Batching L-BFGS-B (using as much as possible from scipy)
Summary:
This PR changes our acquisition optimization, in the cases where we use L-BFGS-B, to use a properly batched version of L-BFGS-B.
This PR reaches a little into the internals of SciPy to make their Fortran code work as well as possible for us.
For this, I have edited their scipy/optimize/_lbfgsb_py.py file to handle batched functions as well. The Fortran code still runs sequentially: only the evaluations are run in parallel.
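To illustrate the structure (this is a minimal sketch, not the actual scipy internals): `make_state`, `wants_eval`, `step`, `done`, and `fun` below are hypothetical stand-ins for the per-start L-BFGS-B state machine driven by `scipy/optimize/_lbfgsb_py.py`.

```python
import numpy as np

def batched_lbfgsb(f_batched, x0_batch, make_state, max_iter=200):
    """Advance one L-BFGS-B state machine per start point; only the
    function/gradient evaluations are batched across subproblems."""
    states = [make_state(x0) for x0 in x0_batch]
    for _ in range(max_iter):
        # Only subproblems that are still running and asked for an
        # evaluation get one; converged ones drop out of the batch.
        active = [s for s in states if not s.done and s.wants_eval]
        if not active:
            break
        X = np.stack([s.x for s in active])
        F, G = f_batched(X)  # one batched call: F has shape (b,), G (b, d)
        for s, f_val, g in zip(active, F, G):
            # The Fortran side is advanced sequentially per state; this
            # serial part is cheap relative to the batched evaluation.
            s.step(f_val, g)
    return np.stack([s.x for s in states]), np.array([s.fun for s in states])
```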
In a local toy test on Ackley_4d, using it as a drop-in replacement in the standard SingleTaskGP generation strategy led to a **2.5x** speedup in generation time (which clearly dominates the overall time), and better performance (though performance should be judged after the nightly has run, as I only looked at 1 problem and 2 seeds).
The speedups are a lower bound (plus some measurement error), as we can now adapt the batch sizes to be larger and will likely get further speedups from that (these should be very sizable for PFNs, but also for GPs, especially on GPUs).
Why does this speed things up?
1. We converge faster, as each subproblem is simpler.
2. When some subproblem has not converged, only that subproblem is re-evaluated.
How is this slower?
1. Based on my measurements, the L-BFGS-B implementation has really fast step times for "larger" dimensionalities, while we run a Python for-loop there. (It converges much more slowly, though, as one would expect when batching a problem the way we do.) It might be possible to speed this up again via threading/multi-processing (see the sketch below), but *beware*: PyTorch CUDA tends to work *really* badly with multi-processing.
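A speculative sketch of the threading idea (my addition, not part of the PR), reusing the hypothetical `step` interface from the sketch above. Note that threads only pay off here if the wrapped Fortran call releases the GIL; otherwise they serialize anyway.

```python
from concurrent.futures import ThreadPoolExecutor

def step_all_threaded(active, F, G, max_workers=8):
    # Advance each state with its own (f, g) pair in a thread pool
    # instead of a Python for-loop. Helpful only if the compiled
    # L-BFGS-B step releases the GIL.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(lambda sfg: sfg[0].step(sfg[1], sfg[2]),
                      zip(active, F, G)))
```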
### Benchmarking
**[new stop crit]**
Benchmarking (nightly) currently running: https://www.internalfb.com/mlhub/flow/745576495/overview
**[no batch_limit]**
Another benchmark, which does the following:
- going back to our previous defaults for the tolerance on the function optimum (a slightly different stopping criterion)
- no more batch limits, as they are no longer needed. This should yield big speedups. I have removed them for initialization, too, even though that is somewhat unrelated (the speedup there is not very high compared to the speedup via the classic `batch_limit`)
- I have improved the algorithm to build optimal batches, as, unexpectedly to me, the optimizer function sometimes needs to be called multiple times before consuming an evaluation again
https://internalfb.com/intern/fblearner/details/746051698/?notif_channel=cli
**[no batch_limit & 128 restarts]**
Another benchmark, increasing the default `num_restarts` to 128.
https://internalfb.com/intern/fblearner/details/746054543/?notif_channel=cli
**[no batch_limit & new stop crit]**
https://internalfb.com/intern/fblearner/details/746392062/?notif_channel=cli
**[no batch_limit & new stop crit & 32 restarts]**
https://internalfb.com/intern/fblearner/details/746394172/?notif_channel=cli
### Results
| Version | Final Score Improvement Share | Average Score Improvement Share | Gen Time Speedup |
| --- | --- | --- | --- |
| [new stop crit] | 59% | 58% | 1.4x |
| [no batch_limit] | 55% | 52% | 2.4x |
| [no batch_limit & 128 restarts] | 61% | 52% | -2x (slowdown) |
| [no batch_limit & new stop crit] | TBD | TBD | TBD |
| [no batch_limit & new stop crit & 32 restarts] | TBD | TBD | TBD |
Note: The negative value in the [no batch_limit & 128 restarts] row indicates a slowdown rather than a speedup.
### TODO
~~Batching can be further improved by waiting until all states are either done or need to do a function evaluation.~~ (Done; yielded roughly 10-20% speedups on Ackley4. See the sketch below.)
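A rough sketch of that improvement, in terms of the same hypothetical state interface as above: each state is driven forward until it either finishes or genuinely needs new function values, so every batch that is formed is fully consumed.

```python
def ready_for_eval(states):
    # Drive each state machine until it is done or wants an evaluation;
    # the optimizer can require several calls (internal bookkeeping,
    # line-search control) before it consumes the next evaluation.
    needs_eval = []
    for s in states:
        while not s.done and not s.wants_eval:
            s.advance()  # hypothetical: one internal step, no new evaluation
        if not s.done:
            needs_eval.append(s)
    return needs_eval  # every returned state will consume the next batch
```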
### Warnings
1. This method currently uses slightly different defaults than `scipy.optimize.minimize`. I did not test which ones are better; this should probably be done before merging.
2. This is already working, but it is not yet turned on by default, which it should be before shipping (so: DO NOT MERGE JUST YET). There should also be some tests.
3. It has not yet been tested for multi-objective and batched BO, where some shapes are different and thus things might break.
4. The code for `LbfgsInvHessProduct` is currently copied over, as it changes with the newest SciPy version, potentially creating conflicts. But it might still be better to try to copy less. (See the usage example below.)
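For reference, `LbfgsInvHessProduct` is public SciPy API: a `LinearOperator` built from the stored correction pairs of the optimizer. A minimal usage example of the upstream version that this PR vendors (toy values, for illustration only):

```python
import numpy as np
from scipy.optimize import LbfgsInvHessProduct

# Correction pairs from two hypothetical L-BFGS-B iterations:
sk = np.array([[1.0, 0.0], [0.0, 1.0]])  # position differences x_{k+1} - x_k
yk = np.array([[0.9, 0.1], [0.1, 1.1]])  # gradient differences g_{k+1} - g_k

H = LbfgsInvHessProduct(sk, yk)        # implicit inverse-Hessian approximation
print(H.matvec(np.array([1.0, 2.0])))  # apply it to a vector without densifying
```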
Differential Revision: D759776891