Commit cace65c
Fix AdEMAMix scheduler guard and add state_dict round-trip test (#1861)
* Fix AdEMAMix scheduler guard and add state_dict round-trip test (#1382)
Fix potential division-by-zero in AdEMAMix update_step when t_alpha or
t_beta3 is 0 (e.g. from get_config defaults). Change scheduler guards
from `is None` to falsy checks so that 0, None, and False all correctly
skip the scheduler path. Also change get_config defaults for t_alpha and
t_beta3 from 0 to None to match the intended semantics.
Add test_ademamix_state_dict_no_nan which saves and loads AdEMAMix
state_dict (8-bit, 32-bit, with and without schedulers) and verifies:
- loaded state matches original byte-for-byte
- training resumes without NaN or Inf
- two optimizers loaded from the same checkpoint produce identical updates
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* style: Fix ruff format violation in test_linear4bit.py
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>1 parent 505a00a commit cace65c
File tree
3 files changed
+113
-5
lines changed- bitsandbytes/optim
- tests
3 files changed
+113
-5
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
180 | 180 | | |
181 | 181 | | |
182 | 182 | | |
183 | | - | |
| 183 | + | |
184 | 184 | | |
185 | 185 | | |
186 | 186 | | |
| |||
201 | 201 | | |
202 | 202 | | |
203 | 203 | | |
204 | | - | |
| 204 | + | |
205 | 205 | | |
206 | 206 | | |
207 | 207 | | |
208 | 208 | | |
209 | 209 | | |
210 | | - | |
| 210 | + | |
211 | 211 | | |
212 | 212 | | |
213 | 213 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
341 | 341 | | |
342 | 342 | | |
343 | 343 | | |
344 | | - | |
345 | | - | |
| 344 | + | |
| 345 | + | |
346 | 346 | | |
347 | 347 | | |
348 | 348 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
592 | 592 | | |
593 | 593 | | |
594 | 594 | | |
| 595 | + | |
| 596 | + | |
| 597 | + | |
| 598 | + | |
| 599 | + | |
| 600 | + | |
| 601 | + | |
| 602 | + | |
| 603 | + | |
| 604 | + | |
| 605 | + | |
| 606 | + | |
| 607 | + | |
| 608 | + | |
| 609 | + | |
| 610 | + | |
| 611 | + | |
| 612 | + | |
| 613 | + | |
| 614 | + | |
| 615 | + | |
| 616 | + | |
| 617 | + | |
| 618 | + | |
| 619 | + | |
| 620 | + | |
| 621 | + | |
| 622 | + | |
| 623 | + | |
| 624 | + | |
| 625 | + | |
| 626 | + | |
| 627 | + | |
| 628 | + | |
| 629 | + | |
| 630 | + | |
| 631 | + | |
| 632 | + | |
| 633 | + | |
| 634 | + | |
| 635 | + | |
| 636 | + | |
| 637 | + | |
| 638 | + | |
| 639 | + | |
| 640 | + | |
| 641 | + | |
| 642 | + | |
| 643 | + | |
| 644 | + | |
| 645 | + | |
| 646 | + | |
| 647 | + | |
| 648 | + | |
| 649 | + | |
| 650 | + | |
| 651 | + | |
| 652 | + | |
| 653 | + | |
| 654 | + | |
| 655 | + | |
| 656 | + | |
| 657 | + | |
| 658 | + | |
| 659 | + | |
| 660 | + | |
| 661 | + | |
| 662 | + | |
| 663 | + | |
| 664 | + | |
| 665 | + | |
| 666 | + | |
| 667 | + | |
| 668 | + | |
| 669 | + | |
| 670 | + | |
| 671 | + | |
| 672 | + | |
| 673 | + | |
| 674 | + | |
| 675 | + | |
| 676 | + | |
| 677 | + | |
| 678 | + | |
| 679 | + | |
| 680 | + | |
| 681 | + | |
| 682 | + | |
| 683 | + | |
| 684 | + | |
| 685 | + | |
| 686 | + | |
| 687 | + | |
| 688 | + | |
| 689 | + | |
| 690 | + | |
| 691 | + | |
| 692 | + | |
| 693 | + | |
| 694 | + | |
| 695 | + | |
| 696 | + | |
| 697 | + | |
| 698 | + | |
| 699 | + | |
| 700 | + | |
| 701 | + | |
| 702 | + | |
0 commit comments