Conversation

@molbap
Contributor

@molbap molbap commented Apr 9, 2025

What does this PR do?

A continuation of #36798, now:

  • The debugger will only output the first and last layer of a sequence of layers.
  • mean/stds are added as well, and a ..._SUMMARY.json file will contain only statistics, not full tensors.
  • General printing improvements.
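The summary-vs-full-tensor split above can be sketched roughly like this (a torch-free illustration over plain lists; `tensor_summary` and its exact fields are my invention, not the PR's API):

```python
import statistics

def tensor_summary(values):
    # hypothetical sketch: a ..._SUMMARY.json entry keeps the shape and a few
    # statistics (mean/std) instead of serializing the full tensor values
    return {
        "shape": [len(values)],
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
    }
```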

Collaborator

@ArthurZucker ArthurZucker left a comment


Nice!

if hasattr(value, "_local_tensor"):
    # DTensor-like handling, just use local tensor attribute
    return {
torch.set_printoptions(sci_mode=True)
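The `_local_tensor` duck-typing in the snippet above boils down to something like this (`to_local` is a hypothetical name for illustration, not the PR's helper):

```python
def to_local(value):
    # DTensor-like objects expose their local shard via `_local_tensor`;
    # plain tensors (or anything else) pass through unchanged
    return getattr(value, "_local_tensor", value)
```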
Collaborator

we could have max line width increased as well!

Contributor Author

it's done in the repr_to_list method, will unify this ;)

@ArthurZucker
Collaborator

Can we make sure hooks are removed after we exit the context manager?
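One way to guarantee that is a try/finally inside the context manager; a torch-free sketch of the pattern (`ToyModule`, `_Handle`, and `debugger_context` are stand-ins for illustration, not the PR's actual implementation):

```python
from contextlib import contextmanager

class _Handle:
    # minimal stand-in for the handle torch's register_forward_hook returns
    def __init__(self, hooks, fn):
        self._hooks, self._fn = hooks, fn

    def remove(self):
        self._hooks.remove(self._fn)

class ToyModule:
    # torch-free toy with a torch-like hook API
    def __init__(self):
        self._forward_hooks = []

    def register_forward_hook(self, fn):
        self._forward_hooks.append(fn)
        return _Handle(self._forward_hooks, fn)

@contextmanager
def debugger_context(module):
    handle = module.register_forward_hook(lambda *args: None)
    try:
        yield module
    finally:
        # removal runs even if the body raises, so no hooks leak after exit
        handle.remove()
```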

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@molbap molbap marked this pull request as ready for review April 17, 2025 15:31
@molbap
Contributor Author

molbap commented Apr 18, 2025

cc @eustlb @zucchini-nlp @qubvel @yonigozlan , my fellow model adders, could be helpful! Check out the doc (and @Cyrilvallez but I think you've seen/used it already maybe)

@Cyrilvallez
Member

Hey @molbap, I just had a random thought while reviewing a model and was thinking about your super nice util, so dropping it here right now as I'm seeing the tag (this is NOT intended as a follow-up to your util hahaha, just to see if it could be helpful for everyone/others have ideas about that):

Another super cool helper IMO would be some kind of library scanner to find close models for modular. E.g., I want to find an MLP with only 2 Linear layers and Dropout -> the helper finds the related models.
As far as I know, we cannot easily do this with usual IDE search tools, as most of the time we don't know (and don't really care, since we usually have a weight converter anyway) the actual names of those layers.
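A minimal sketch of such a scanner primitive, assuming we fingerprint modules by the class names of their direct children rather than by attribute names (every name below is invented for illustration; a real version would walk `nn.Module.children()` across the library):

```python
from collections import Counter

class Linear:  # stand-ins for nn.Linear / nn.Dropout
    pass

class Dropout:
    pass

def structure_fingerprint(module):
    # fingerprint = multiset of child class names, so two modules match even
    # when their attributes are named differently
    return tuple(sorted(Counter(type(c).__name__ for c in module.children()).items()))

class ToyMLPA:
    def __init__(self):
        self.fc1, self.fc2, self.drop = Linear(), Linear(), Dropout()

    def children(self):
        return [self.fc1, self.fc2, self.drop]

class ToyMLPB:
    def __init__(self):
        self.up_proj, self.down_proj, self.dropout = Linear(), Linear(), Dropout()

    def children(self):
        return [self.up_proj, self.down_proj, self.dropout]
```

Two MLPs with different attribute names but the same structure produce the same fingerprint, which is exactly the lookup an IDE search cannot do.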

@qubvel
Contributor

qubvel commented Apr 18, 2025

Following @Cyrilvallez's idea, it might indeed be very helpful to run such a tool across the library to find modules that have differently named attributes but are actually identical. Unfortunately, we cannot rename attributes because it would break weight loading (or we would need a hook for renaming), but at least we can choose one as a standard and leave comments for the rest, indicating e.g. that LlamaMLP is identical to this one and is preferred for modular purposes.

Contributor

@qubvel qubvel left a comment


Nice, thanks for updating!

The debugger will only output the first and last layer of a sequence of layers.

Is it a configurable option? I remember for mllama we messed up the self/cross-attention layer order, so the diff appears after layer 4.

Comment on lines 27 to 28
if is_vision_available():
    pass
Contributor

can be removed

Contributor Author

indeed, artifact of make fixup

from torch import nn


class ToyModel(nn.Module):
Contributor

This should also be under the if is_torch_available() guard, otherwise we don't need a guard at all
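For context, the guard pattern being discussed looks like this (using a simplified stand-in for transformers' availability check so the sketch is self-contained and runs with or without torch installed):

```python
import importlib.util

def is_torch_available() -> bool:
    # simplified stand-in for transformers' real is_torch_available utility
    return importlib.util.find_spec("torch") is not None

if is_torch_available():
    from torch import nn

    class ToyModel(nn.Module):
        # only defined when torch can actually be imported, so the test file
        # stays collectable on torch-free environments
        pass
```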

Contributor Author

ah indeed!



[[autodoc]] model_addition_debugger
### Reading results
Contributor

Comment for the above lines actually, but can't comment there.

# call forward method (not .generate!)
with model_addition_debugger_context(model, "optional_path_to_your_output_file.json"):
    output = model.forward(**inputs)

Should we call model.forward? Why not model(**inputs)? Are we avoiding the top-level hooks for some reason?

Contributor Author

no, top-level works as well - it's just to be explicit and contrast model.forward with model.generate! I can explain in the doc (as the latter would create a JSON file of several hundred MB...)

@molbap
Contributor Author

molbap commented Apr 18, 2025

Ah that's true, I'll add a configuration option now to output all the layers and add to the doc.

For the modular scanner: indeed, it'd be a very nice util for model adders as well, agree that we can't easily modularize existing code (we can only if we don't change the names), but I'm pretty sure we could add a name mapping util to make sure to preserve naming. E.g. "this module is identical to that one, but rename self.attn to self.self_attn".
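Such a name-mapping util could be as simple as rewriting dotted state-dict key segments; a hypothetical sketch (`remap_state_dict` is not an existing helper, just an illustration of "rename self.attn to self.self_attn" at load time):

```python
def remap_state_dict(state_dict, segment_renames):
    # hypothetical: rewrite each dotted key segment through a rename table so
    # a modular-reused module can keep loading the original checkpoint names
    remapped = {}
    for key, value in state_dict.items():
        parts = [segment_renames.get(part, part) for part in key.split(".")]
        remapped[".".join(parts)] = value
    return remapped
```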

@molbap
Contributor Author

molbap commented Apr 18, 2025

Added:

  • configurable do_prune_layers and associated test @qubvel (+ torch guarding the whole thing, why not)
  • more documentation
  • docstrings because I'm feeling nice today
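For reference, the behaviour a `do_prune_layers` flag implies could look roughly like this (a hypothetical sketch of the first-and-last-layer pruning, not the merged implementation):

```python
def prune_layer_entries(entries, do_prune_layers=True):
    # hypothetical: with pruning on, a homogeneous run of layer logs is
    # reduced to its first and last entries; with it off, everything is kept
    if not do_prune_layers or len(entries) <= 2:
        return entries
    return [entries[0], entries[-1]]
```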

[image]

merging!

@molbap molbap merged commit 4afd3f4 into main Apr 18, 2025
21 checks passed
@molbap molbap deleted the model_debugger_upgrades branch April 18, 2025 14:45
@ArthurZucker
Collaborator

Very nice!

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
* debugging improvements

* add debugging details

* add more debugging details

* debug more

* clean up layers + output

* add summary json file

* cleanup

* copies 👀

* remove hooks + add documentation

* draft a small test, why not

* respect the format (respect it)

* fixup imports

* nit

* add tests and configurable pruning of layers

6 participants