Add Dia model #38405

Merged: 217 commits merged into huggingface:main on Jun 26, 2025
Conversation

buttercrab (Contributor)

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@vasqu (Contributor) commented Jun 23, 2025

run-slow: dia

Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/dia']
quantizations: [] ...

@ArthurZucker (Collaborator) left a comment

Super nice 🚀


@abstractmethod
def encode(self, input_values: torch.Tensor, *args, **kwargs):
    """Encode raw audio into discrete audio codebooks (with x channels)"""
Collaborator
I would just add whether we expect the audio to be padded or not, batched or not, etc. It should support batched inputs; not sure about lists, but maybe say that the input comes from a feature extractor.

I think it should inherit from PreTrainedModel, or should we say implementations have to inherit from DacPreTrainedModel plus this? Either should be alright.

@vasqu (Contributor) Jun 25, 2025

I think that's more a concern of the feature extractor / the interaction between the model and the feature extractor:

  • The feature extractor outputs the input values (batched or not).
  • Padding is also handled there, e.g. padding to a multiple of hop_length.
  • Padding masks are provided as well if necessary, but models seem to differ in whether they need them.

Added a bit more to the docs to reference feature extraction, though. Changing the inheritance is a good point, trying that!
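
A minimal sketch of the interface being discussed, assuming a hypothetical class name AudioCodecPreTrainedModel; the point is just the inheritance from PreTrainedModel and that encode consumes batched, padded input_values produced by a feature extractor:

# Sketch only: the class name and docstring wording are illustrative, not the PR's final code.
from abc import abstractmethod

import torch
from transformers import PretrainedConfig, PreTrainedModel


class AudioCodecPreTrainedModel(PreTrainedModel):  # hypothetical name
    config_class = PretrainedConfig

    @abstractmethod
    def encode(self, input_values: torch.Tensor, *args, **kwargs):
        """Encode raw audio into discrete audio codebooks (with x channels).

        input_values is expected to come from a feature extractor: batched as
        (batch_size, channels, sequence_length) and already padded, e.g. to a
        multiple of the feature extractor's hop_length.
        """
        raise NotImplementedError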

super().__init__(*args, **kwargs)

# Legacy behaviour just uses the tokenizer while new models use the processor as a whole
# at any given time
self.legacy = legacy
Collaborator
Maybe something like self.no_processor, or a more descriptive name, would help users!

Contributor
Done!
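
For illustration, a minimal sketch of what the rename might look like; use_tokenizer_only is a hypothetical name, and the attribute actually chosen in the PR may differ:

# Hypothetical stand-in for the real processor class, sketching a more descriptive flag.
class DiaProcessorSketch:
    def __init__(self, use_tokenizer_only: bool = False, **kwargs):
        # Legacy behaviour only uses the tokenizer, while new models route inputs
        # through the processor as a whole; a descriptive name makes that distinction
        # visible to users instead of an opaque `legacy` attribute.
        self.use_tokenizer_only = use_tokenizer_only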

Collaborator
let's add a compilation test as well!

Contributor
Discussed offline; this will be moved to another PR. It's mostly needed for torch.export, since the base static-cache + generate path already works with compile (and is covered by a common test).
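
For context, a rough sketch of the static-cache + compile path that already works. This is not the PR's test; the checkpoint id, class names, and exact preprocessing call are assumptions on my part:

# Rough sketch: checkpoint id and processor call are assumptions, not taken from the PR.
import torch
from transformers import AutoProcessor, DiaForConditionalGeneration

checkpoint = "nari-labs/Dia-1.6B-0626"  # hypothetical checkpoint id
processor = AutoProcessor.from_pretrained(checkpoint)
model = DiaForConditionalGeneration.from_pretrained(checkpoint)

inputs = processor(text=["[S1] Dia is an open weights text to dialogue model."], return_tensors="pt")

# A static cache keeps tensor shapes fixed so the compiled decoder graph can be
# reused across decoding steps; torch.export support is what moves to the follow-up PR.
model.forward = torch.compile(model.forward, mode="reduce-overhead")
outputs = model.generate(**inputs, max_new_tokens=64, cache_implementation="static", do_sample=False)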

@vasqu (Contributor) commented Jun 25, 2025

run-slow: dia

Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/dia']
quantizations: [] ...

@vasqu (Contributor) commented Jun 25, 2025

Addressed the review comments, lmk if there's more @ArthurZucker

Will check the slow runs on CI in a sec

@vasqu (Contributor) commented Jun 25, 2025

run-slow: dia

Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/dia']
quantizations: [] ...

@vasqu (Contributor) commented Jun 25, 2025

run-slow: dia

Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/dia']
quantizations: [] ...

vasqu enabled auto-merge (squash) on June 26, 2025 at 09:58
vasqu merged commit 583db52 into huggingface:main on Jun 26, 2025
20 checks passed
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

vasqu mentioned this pull request on Jun 27, 2025