add GLORA #2568


Open · wants to merge 8 commits into main
Conversation

@not-lain commented Jun 3, 2025

based on main...Arnav0400:peft:main
and #1510 as well as #780

@githubnemo (Collaborator) left a comment

Thanks for the PR, this already looks quite mature. I especially like that you've gone through the trouble of adding tests! I've added some comments on things that I think can still be improved.

I think it would make sense to also add a test case in test_custom_models.py since this will give good coverage of the existing functionality in PEFT. Feel free to ask if you need assistance in doing so.
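
For reference, a hypothetical sketch of such an entry; the exact tuple shape should follow the entries already in test_custom_models.py, and GLoraConfig is the config class added in this PR:

    # Hypothetical addition to the TEST_CASES list in tests/test_custom_models.py.
    # The field names ("r", "target_modules") are assumptions, not the final API.
    TEST_CASES += [
        ("Vanilla MLP GLoRA", "MLP", GLoraConfig, {"target_modules": ["lin0"], "r": 4}),
    ]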

As I understand the paper, they use random sampling + evolutionary search to select the best configuration for the different parameters (i.e., how to structure, say, A: LoRA, vector, scalar, none). PEFT aims not to interfere with the training process, so I would suggest not using random sampling of this configuration during init but instead giving the user the option to set the config once (via GloraConfig, similar to how eval_config works). If the user is keen to try search methods, such as evolutionary search, they can still use either GloraModel.set_adapter_config or, this would need to be added, something like glora.Linear.set_adapter_config, so that the user can do this for each layer individually (to support the ES case). Maybe there's a better name than set_adapter_config to set it apart from PEFT adapters. WDYT?
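
To illustrate, a rough sketch of what such a per-layer override could look like in a search loop; set_adapter_config and the path-config strings ("LoRA_4", "vector", "scalar", "none") are placeholders from this discussion, not an existing PEFT API:

    # Hypothetical per-layer configuration hook, e.g. for an evolutionary search loop.
    def apply_candidate(model, candidate):
        # candidate maps module name -> path config,
        # e.g. {"q_proj": {"A": "LoRA_4", "B": "vector", "C": "none", "D": "none", "E": "none"}}
        for name, module in model.named_modules():
            if name in candidate and hasattr(module, "set_adapter_config"):
                module.set_adapter_config(candidate[name])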

Comment on lines 61 to 64
try:
    rank = int(config.split("_")[1])
except Exception:
    rank = 4

@githubnemo (Collaborator):

This should simply raise an exception since it indicates a user error. Assuming a default will lead to unexpected behavior.
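
For example, the parsing could fail loudly instead of falling back to 4 (a sketch, not the PR's code):

    # Sketch: parse the rank from a path-config string such as "LoRA_4" and raise
    # a descriptive error on malformed input instead of silently defaulting.
    parts = config.split("_")
    if len(parts) != 2 or not parts[1].isdigit():
        raise ValueError(f"Could not parse rank from GLoRA path config {config!r}; expected something like 'LoRA_4'.")
    rank = int(parts[1])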

@not-lain (Author):

Suggested change
-try:
-    rank = int(config.split("_")[1])
-except Exception:
-    rank = 4
+rank = int(config.split("_")[1])

@not-lain (Author):

(Leaving this comment for future me to reflect on.)
IMO the rank should be extracted from config["r"]. I may need to go back to the original implementation and check this in more depth.

Comment on lines +97 to +101
if self.eval_config is None:
    warnings.warn("eval_config not set for GLora layer, using a random config for merge.")
    path_config = random.choice(self.configs)
else:
    path_config = self.eval_config

@githubnemo (Collaborator):

I think this should raise an error explaining what to do to fix it instead of simply assuming a random config which might or might not work.

Ideally there's a test in tests/test_glora.py to test this behavior.
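
Something along these lines, as a rough sketch (the GLoraConfig usage, the error type, and the merge_and_unload method name follow this PR's discussion and other PEFT tuners, so they may differ in the final implementation):

    # Hypothetical test for tests/test_glora.py: merging without an eval_config
    # should raise instead of silently picking a random path config.
    import pytest
    import torch.nn as nn

    from peft import GLoraConfig, get_peft_model  # GLoraConfig is added by this PR


    def test_merge_without_eval_config_raises():
        base_model = nn.Sequential(nn.Linear(10, 10))
        peft_model = get_peft_model(base_model, GLoraConfig(target_modules=["0"]))
        with pytest.raises(ValueError, match="eval_config"):
            peft_model.merge_and_unload()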

@not-lain (Author):

I think it would be better if we move eval_config to the GLoraConfig class and set a default value for it there.
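
A minimal sketch of what that could look like; the field names, the path-config structure (A..E components), and the chosen default are all assumptions from this thread, not the final API:

    from dataclasses import dataclass, field

    from peft.config import PeftConfig


    @dataclass
    class GLoraConfig(PeftConfig):
        # Illustrative only: a deterministic default for eval_config means merging
        # never has to fall back to a random path configuration.
        r: int = field(default=4, metadata={"help": "GLoRA rank used by 'LoRA_r' path configs."})
        eval_config: dict = field(
            default_factory=lambda: {"A": "LoRA_4", "B": "vector", "C": "none", "D": "none", "E": "none"},
            metadata={"help": "Path configuration used at eval/merge time."},
        )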

Comment on lines +74 to +77
def tearDown(self):
    gc.collect()
    torch.cuda.empty_cache()
    gc.collect()

@githubnemo (Collaborator):

We have a test file explicitly for GPU/accelerator tests; all other tests are assumed to run on the CPU, so this is not necessary.

Comment on lines +182 to +187
@unittest.skipIf(
    not torch.cuda.is_available()
    or not hasattr(torch.cuda, "is_bf16_supported")
    or not torch.cuda.is_bf16_supported(),
    "BF16 not supported or no CUDA",
)

@githubnemo (Collaborator):

To also support XPU/Intel:

from accelerate.utils.imports import is_bf16_available
from accelerate.utils.imports import is_xpu_available

[...]

    @unittest.skipIf(
        (not torch.cuda.is_available() and not is_xpu_available())
        or not is_bf16_available(),
        "BF16 not supported or no CUDA/XPU",
    )

return ["decoded text"]


class GLORATester(unittest.TestCase):

@githubnemo (Collaborator):

Great that you added tests! We're in the process of migrating existing tests to pytest. It's totally OK to keep these as they are, but if you want to convert them to pytest already, I wouldn't complain :)
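
For reference, a hedged sketch of the pytest style (plain functions plus parametrize instead of unittest.TestCase); the GLoRA-specific bits are illustrative and assume this PR's GLoraConfig:

    import pytest
    import torch.nn as nn

    from peft import GLoraConfig, get_peft_model


    @pytest.mark.parametrize("target_modules", [["0"], ["0", "2"]])
    def test_glora_replaces_targeted_linears(target_modules):
        base_model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))
        peft_model = get_peft_model(base_model, GLoraConfig(target_modules=target_modules))
        trainable, total = peft_model.get_nb_trainable_parameters()
        assert trainable < total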

Comment on lines +147 to +148
except AttributeError:
    return getattr(self.model, name)

@githubnemo (Collaborator):

In LoRA we catch this case (see #1892); I think this code is susceptible to it as well.

Suggested change
-except AttributeError:
-    return getattr(self.model, name)
+except AttributeError:
+    if name == "model":
+        raise
+    return getattr(self.model, name)

Comment on lines +178 to +193
if isinstance(target, GLoraLinear):
    if target.eval_config is None:
        raise ValueError(
            f"eval_config not set for GLoraLinear layer {key}. Cannot merge deterministically. Please call model.set_adapter_eval_config(...) before merging."
        )

    target.merge()
    new_module = nn.Linear(target.in_features, target.out_features, bias=(target.bias is not None))
    new_module.weight.data = target.weight.data.clone()  # Get merged weight
    if target.bias is not None:
        new_module.bias.data = target.bias.data.clone()  # Get merged bias

    self._replace_module(parent, target_name, new_module.to(target.weight.device), target)

if isinstance(target, ModulesToSaveWrapper):
    pass

@githubnemo (Collaborator):

I think it makes sense to implement this similarly to how LoRA does it:

                if hasattr(target, "unload_and_optionally_merge_module"):
                    # if layers have special unloading method, like MultiheadAttention, use that
                    unloaded_module = target.unload_and_optionally_merge_module(
                        merge=merge, safe_merge=safe_merge, adapter_names=adapter_names
                    )
                    self._replace_module(parent, target_name, unloaded_module, target)
                elif hasattr(target, "base_layer"):
                    if merge:
                        target.merge(safe_merge=safe_merge, adapter_names=adapter_names)
                    self._replace_module(parent, target_name, target.get_base_layer(), target)

Of course this assumes that the target (e.g., glora.Linear) implements unload_and_optionally_merge_module, but I would suggest implementing it anyway since it is a good interface to have.
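
A rough sketch of that interface for glora.Linear, assuming the layer is refactored to wrap its frozen nn.Linear in base_layer (as LoRA layers do); this is illustrative, not the PR's implementation:

    def unload_and_optionally_merge_module(self, merge: bool, safe_merge: bool, adapter_names=None):
        # Hypothetical: fold the selected GLoRA path into the frozen weights, then
        # hand back the plain nn.Linear so the tuner can swap it in.
        if merge:
            if self.eval_config is None:
                raise ValueError("eval_config must be set before merging a GLoRA layer.")
            self.merge(safe_merge=safe_merge, adapter_names=adapter_names)
        return self.get_base_layer()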

Comment on lines +222 to +232
def get_peft_config_as_dict(self, inference: bool = False) -> dict[str, Any]:
    config_dict = {}
    for adapter_name, peft_config_obj in self.peft_config.items():
        config = asdict(peft_config_obj)
        if inference:
            config["inference_mode"] = True
        for k, v in config.items():
            if isinstance(v, Enum):
                config[k] = v.value
        config_dict[adapter_name] = config
    return config_dict

@githubnemo (Collaborator):

Remove?

@not-lain (Author):

I think this should be discussed in another PR, as this method is not part of BaseTuner but is duplicated in each model in the library. IMO we should open a separate PR to move it to BaseTuner.

Comment on lines +108 to +132
def _find_and_replace(self, adapter_name: str):
    glora_config = self.peft_config[adapter_name]
    is_target_modules_in_base_model = False
    key_list = [key for key, _ in self.model.named_modules()]  # Cache keys

    for key in key_list:
        if not self._check_target_module_exists(glora_config, key):
            continue

        is_target_modules_in_base_model = True
        parent, target, target_name = _get_submodules(self.model, key)

        if isinstance(target, GLoraLinear):
            warnings.warn(
                f"Module {key} is already a GLoraLinear. Skipping replacement for new adapter '{adapter_name}'. Multiple GLORA adapters on the same layer might need explicit support in GLoraLinear."
            )
        elif isinstance(target, nn.Linear):
            new_module = self._create_new_module(glora_config, adapter_name, target)
            self._replace_module(parent, target_name, new_module, target)

    if not is_target_modules_in_base_model:
        raise ValueError(
            f"Target modules {glora_config.target_modules} not found in the base model. "
            f"Please check the target modules and try again."
        )

@githubnemo (Collaborator):

This functionality is already implemented in BaseTuner.inject_adapter. I think _find_and_replace as well as add_adapter can be removed entirely?
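
Roughly, GLoraModel would then only implement the per-module hooks that BaseTuner.inject_adapter calls. The method names below follow the existing BaseTuner contract, while the GLoRA-specific body is only a sketch:

    import torch.nn as nn

    from peft.tuners.tuners_utils import BaseTuner, check_target_module_exists


    class GLoraModel(BaseTuner):
        prefix: str = "glora_"  # assumption: parameter prefix used by the GLoRA layers

        @staticmethod
        def _check_target_module_exists(glora_config, key):
            return check_target_module_exists(glora_config, key)

        def _create_and_replace(self, glora_config, adapter_name, target, target_name, parent, current_key):
            # Only plain nn.Linear layers are wrapped in this sketch.
            if isinstance(target, nn.Linear):
                new_module = self._create_new_module(glora_config, adapter_name, target)
                self._replace_module(parent, target_name, new_module, target)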

@not-lain (Author) left a comment

Thanks a lot for the review!
I will ping you once I resolve all the pointers.

Comment on lines +132 to +134
TRANSFORMERS_MODELS_TO_GLORA_TARGET_MODULES_MAPPING = (
    TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING  # need to check this later
)

@not-lain (Author):

I have not tested this yet; I was waiting for the workflow to run to see how it goes. This choice so far is based on intuition and nothing more, and I will inspect it in more detail in the future.
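
In the meantime the mapping can be bypassed by passing target_modules explicitly (illustrative, using this PR's GLoraConfig; module names are just an example):

    # Explicit target modules sidestep the default LoRA-derived mapping until the
    # GLoRA-specific one is verified.
    config = GLoraConfig(target_modules=["q_proj", "v_proj"])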
