Add Dia model #38405

Merged: 217 commits merged into huggingface:main on Jun 26, 2025
Conversation

buttercrab (Contributor)

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@vasqu (Contributor) commented Jun 23, 2025

run-slow: dia

Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/dia']
quantizations: [] ...

@ArthurZucker (Collaborator) left a comment

Super nice 🚀


@abstractmethod
def encode(self, input_values: torch.Tensor, *args, **kwargs):
    """Encode raw audio into discrete audio codebooks (with x channels)"""
Collaborator
I would just add whether we expect the audio to be padded or not, batched or not, etc. It should support batched inputs; not sure about lists, but maybe say that the input comes from a feature extractor.

I think it should inherit from PreTrainedModel, or should we say implementations have to inherit from DacPreTrainedModel plus this? Either should be alright.

@vasqu (Contributor) Jun 25, 2025

I think that's more a concern of the feature extractor / the interaction between the model and the feature extractor:

  • The feature extractor outputs the input values (batched or not).
  • Padding is also handled there, e.g. padding to a multiple of hop_length.
  • Padding masks are provided as well if necessary, but models seem to differ in whether they need them.

Added a bit more to the docs to reference feature extraction, though. Changing the inheritance is a good point, trying that!
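
A minimal sketch of the interface being discussed, assuming a hypothetical class name AudioCodecPreTrainedModel; the point is just the inheritance from PreTrainedModel and that encode consumes batched, padded input_values produced by a feature extractor:

# Sketch only: the class name and docstring wording are illustrative, not the PR's final code.
from abc import abstractmethod

import torch
from transformers import PretrainedConfig, PreTrainedModel


class AudioCodecPreTrainedModel(PreTrainedModel):  # hypothetical name
    config_class = PretrainedConfig

    @abstractmethod
    def encode(self, input_values: torch.Tensor, *args, **kwargs):
        """Encode raw audio into discrete audio codebooks (with x channels).

        input_values is expected to come from a feature extractor: batched as
        (batch_size, channels, sequence_length) and already padded, e.g. to a
        multiple of the feature extractor's hop_length.
        """
        raise NotImplementedError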

super().__init__(*args, **kwargs)

# Legacy behaviour just uses the tokenizer while new models use the processor as a whole
# at any given time
self.legacy = legacy
Collaborator
Maybe something like self.no_processor, or a more descriptive name, would help users!

Contributor
Done!
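
For illustration, a minimal sketch of what the rename might look like; use_tokenizer_only is a hypothetical name, and the attribute actually chosen in the PR may differ:

# Hypothetical stand-in for the real processor class, sketching a more descriptive flag.
class DiaProcessorSketch:
    def __init__(self, use_tokenizer_only: bool = False, **kwargs):
        # Legacy behaviour only uses the tokenizer, while new models route inputs
        # through the processor as a whole; a descriptive name makes that distinction
        # visible to users instead of an opaque `legacy` attribute.
        self.use_tokenizer_only = use_tokenizer_only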

Collaborator
let's add a compilation test as well!

Contributor
Discussed offline; this will be moved to another PR. It's mostly needed for torch.export, since the base static-cache + generate path already works with compile (and is covered by a common test).
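
For context, a rough sketch of the static-cache + compile path that already works. This is not the PR's test; the checkpoint id, class names, and exact preprocessing call are assumptions on my part:

# Rough sketch: checkpoint id and processor call are assumptions, not taken from the PR.
import torch
from transformers import AutoProcessor, DiaForConditionalGeneration

checkpoint = "nari-labs/Dia-1.6B-0626"  # hypothetical checkpoint id
processor = AutoProcessor.from_pretrained(checkpoint)
model = DiaForConditionalGeneration.from_pretrained(checkpoint)

inputs = processor(text=["[S1] Dia is an open weights text to dialogue model."], return_tensors="pt")

# A static cache keeps tensor shapes fixed so the compiled decoder graph can be
# reused across decoding steps; torch.export support is what moves to the follow-up PR.
model.forward = torch.compile(model.forward, mode="reduce-overhead")
outputs = model.generate(**inputs, max_new_tokens=64, cache_implementation="static", do_sample=False)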

@vasqu (Contributor) commented Jun 25, 2025

run-slow: dia

Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/dia']
quantizations: [] ...

@vasqu (Contributor) commented Jun 25, 2025

Addressed the review comments, lmk if there's more @ArthurZucker

Will check the slow runs on CI in a sec

@vasqu (Contributor) commented Jun 25, 2025

run-slow: dia

Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/dia']
quantizations: [] ...

@vasqu (Contributor) commented Jun 25, 2025

run-slow: dia

Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/dia']
quantizations: [] ...

vasqu enabled auto-merge (squash) on June 26, 2025 at 09:58
vasqu merged commit 583db52 into huggingface:main on Jun 26, 2025
20 checks passed
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

vasqu mentioned this pull request on Jun 27, 2025