
lr_warmup should not be passed when adafactor is used as the optimizer #617

@martianunlimited

Description


It's not really a major bug, more of an inconvenience: the GUI passes lr_warmup to the training command whenever it is non-zero, which causes library/train_util.py to raise a ValueError if the optimizer is AdaFactor. The error is raised midway through the run, after the latents are cached and just before the dataloader is created, rather than at the start of train_network. This wastes time and makes it harder for users who are not used to reading stack traces to pick out what caused the error.

ValueError: adafactor:0.0001 does not require num_warmup_steps. Set None or 0.

Suggested fixes, in order of preference:
a) Pop up a warning in the GUI if the user did not set lr_warmup to 0 when the optimizer is set to AdaFactor (recommended)
or
b) Raise an error in train_network.py when an invalid combination of optimizer and lr_warmup is used (see the sketch below)
c) Change train_util.py to raise a warning and ignore the lr_warmup value (not recommended)

Unless there are differing opinions as to why a) and b) are not the way to go, I will go ahead and send a pull request for a) and b) over the weekend with said "fix".
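For reference, here is a rough sketch of what the early check in b) could look like. The function name and the argument names (optimizer_type, lr_warmup_steps) are assumptions about the training script's argparse setup, not the actual implementation:

```python
# Hypothetical fail-fast check, to be called near the top of train_network.py
# before latents are cached. Argument names are assumed, not taken from the repo.
def validate_warmup_args(args):
    optimizer = (getattr(args, "optimizer_type", "") or "").lower()
    warmup_steps = getattr(args, "lr_warmup_steps", 0) or 0
    if optimizer == "adafactor" and warmup_steps > 0:
        raise ValueError(
            "AdaFactor does not use num_warmup_steps; "
            "set lr_warmup to 0 or choose a different optimizer."
        )
```

This would leave the existing ValueError in train_util.py untouched and simply surface the same problem before any time is spent on caching.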
