2 changes: 1 addition & 1 deletion docs/source/en/attention_interface.md
@@ -108,7 +108,7 @@ If in doubt about what args/kwargs a given model sends to the attention function
## Accessing current available implementations

Most of the time, you will simply need to `register` a new function. If, however, you need to access an existing one,
- and/or perform a few checks, the prefered way is to use the global `ALL_ATTENTION_FUNCTIONS`. It behaves the same way you
+ and/or perform a few checks, the preferred way is to use the global `ALL_ATTENTION_FUNCTIONS`. It behaves the same way you
would expect from a usual Python dictionary:

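The Python example that follows this paragraph in the doc is collapsed in the diff view. Below is a rough sketch of the dictionary-style access described above; the import path and the implementation keys ("sdpa", "flash_attention_2") are assumptions, not taken from this PR.

```python
# Hedged sketch of dict-style access to the attention registry (import path and keys assumed).
from transformers.modeling_utils import ALL_ATTENTION_FUNCTIONS

# List every registered attention implementation
print(list(ALL_ATTENTION_FUNCTIONS.keys()))

# Check whether a given implementation is registered before using it
if "flash_attention_2" in ALL_ATTENTION_FUNCTIONS:
    flash_fn = ALL_ATTENTION_FUNCTIONS["flash_attention_2"]

# Retrieve one implementation, e.g. to wrap it with extra checks or logging
sdpa_fn = ALL_ATTENTION_FUNCTIONS["sdpa"]
```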
2 changes: 1 addition & 1 deletion docs/source/it/perf_train_cpu.md
@@ -19,7 +19,7 @@ Questa guida si concentra su come addestrare in maniera efficiente grandi modell

## Mixed precision con IPEX

- IPEX è ottimizzato per CPU con AVX-512 o superiore, e funziona per le CPU con solo AVX2. Pertanto, si prevede che le prestazioni saranno più vantaggiose per le le CPU Intel con AVX-512 o superiori, mentre le CPU con solo AVX2 (ad esempio, le CPU AMD o le CPU Intel più vecchie) potrebbero ottenere prestazioni migliori con IPEX, ma non sono garantite. IPEX offre ottimizzazioni delle prestazioni per l'addestramento della CPU sia con Float32 che con BFloat16. L'uso di BFloat16 è l'argomento principale delle seguenti sezioni.
+ IPEX è ottimizzato per CPU con AVX-512 o superiore, e funziona per le CPU con solo AVX2. Pertanto, si prevede che le prestazioni saranno più vantaggiose per le CPU Intel con AVX-512 o superiori, mentre le CPU con solo AVX2 (ad esempio, le CPU AMD o le CPU Intel più vecchie) potrebbero ottenere prestazioni migliori con IPEX, ma non sono garantite. IPEX offre ottimizzazioni delle prestazioni per l'addestramento della CPU sia con Float32 che con BFloat16. L'uso di BFloat16 è l'argomento principale delle seguenti sezioni.

Il tipo di dati a bassa precisione BFloat16 è stato supportato in modo nativo su 3rd Generation Xeon® Scalable Processors (aka Cooper Lake) con AVX512 e sarà supportata dalla prossima generazione di Intel® Xeon® Scalable Processors con Intel® Advanced Matrix Extensions (Intel® AMX) instruction set con prestazioni ulteriormente migliorate. L'Auto Mixed Precision per il backende della CPU è stato abilitato da PyTorch-1.10. allo stesso tempo, il supporto di Auto Mixed Precision con BFloat16 per CPU e l'ottimizzazione degli operatori BFloat16 è stata abilitata in modo massiccio in Intel® Extension per PyTorch, and parzialmente aggiornato al branch master di PyTorch. Gli utenti possono ottenere prestazioni migliori ed users experience con IPEX Auto Mixed Precision..

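The rest of this guide (collapsed here) covers how to enable these optimizations during training. As a rough, self-contained sketch of BF16 Auto Mixed Precision on CPU with IPEX, assuming intel_extension_for_pytorch is installed and using the standard ipex.optimize / torch.autocast API rather than anything taken from this PR:

```python
# Hedged sketch: BF16 mixed-precision CPU training with IPEX (API usage assumed, not from this PR).
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Linear(16, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# ipex.optimize prepares weights, optimizer states and fused kernels for BF16 CPU training
model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)

x, y = torch.randn(8, 16), torch.randn(8, 4)
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):  # CPU Auto Mixed Precision
    loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()
```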
4 changes: 2 additions & 2 deletions src/transformers/models/moshi/modeling_moshi.py
@@ -2277,7 +2277,7 @@ def generate(
generation_config, kwargs = self._prepare_generation_config(kwargs.pop("generation_config", None), **kwargs)

input_ids, user_audio_codes, moshi_audio_codes, concat_unconditional_inputs = (
- self._check_and_maybe_initalize_inputs(
+ self._check_and_maybe_initialize_inputs(
input_ids=input_ids,
user_input_values=user_input_values,
user_audio_codes=user_audio_codes,
@@ -2707,7 +2707,7 @@ def get_unconditional_inputs(self, num_samples=1):
attention_mask=attention_mask,
)

- def _check_and_maybe_initalize_inputs(
+ def _check_and_maybe_initialize_inputs(
self,
input_ids=None,
user_input_values=None,
10 changes: 5 additions & 5 deletions src/transformers/models/rag/modeling_rag.py
@@ -593,8 +593,8 @@ def forward(
context_input_ids,
context_attention_mask,
retrieved_doc_embeds,
- retrived_doc_input_ids,
- retrived_doc_attention_mask,
+ retrieved_doc_input_ids,
+ retrieved_doc_attention_mask,
retrieved_doc_ids,
) = (
retriever_outputs["context_input_ids"],
@@ -608,10 +608,10 @@ def forward(
context_input_ids = context_input_ids.to(input_ids)
context_attention_mask = context_attention_mask.to(input_ids)

- retrived_doc_input_ids = retrived_doc_input_ids.to(input_ids)
- retrived_doc_attention_mask = retrived_doc_attention_mask.to(input_ids)
+ retrieved_doc_input_ids = retrieved_doc_input_ids.to(input_ids)
+ retrieved_doc_attention_mask = retrieved_doc_attention_mask.to(input_ids)
retrieved_doc_embeds = self.ctx_encoder(
- retrived_doc_input_ids, attention_mask=retrived_doc_attention_mask, return_dict=True
+ retrieved_doc_input_ids, attention_mask=retrieved_doc_attention_mask, return_dict=True
).pooler_output
retrieved_doc_embeds = retrieved_doc_embeds.view(
-1, n_docs, question_encoder_last_hidden_state.shape[1]
6 changes: 3 additions & 3 deletions src/transformers/models/seamless_m4t/modeling_seamless_m4t.py
@@ -3391,7 +3391,7 @@ def generate(
`Union[SeamlessM4TGenerationOutput, Tuple[Tensor]]`:
- If `return_intermediate_token_ids`, returns [`SeamlessM4TGenerationOutput`].
- If not `return_intermediate_token_ids`, returns a tuple composed of waveforms of shape `(batch_size,
- sequence_length)`and and `waveform_lengths` which gives the length of each sample.
+ sequence_length)` and `waveform_lengths` which gives the length of each sample.
"""
batch_size = len(input_ids) if input_ids is not None else len(kwargs.get("inputs_embeds"))

@@ -3721,7 +3721,7 @@ def generate(
`Union[SeamlessM4TGenerationOutput, Tuple[Tensor]]`:
- If `return_intermediate_token_ids`, returns [`SeamlessM4TGenerationOutput`].
- If not `return_intermediate_token_ids`, returns a tuple composed of waveforms of shape `(batch_size,
- sequence_length)`and and `waveform_lengths` which gives the length of each sample.
+ sequence_length)` and `waveform_lengths` which gives the length of each sample.
"""
batch_size = len(input_features) if input_features is not None else len(kwargs.get("inputs_embeds"))

@@ -4132,7 +4132,7 @@ def generate(
`Union[SeamlessM4TGenerationOutput, Tuple[Tensor], ModelOutput]`:
- If `generate_speech` and `return_intermediate_token_ids`, returns [`SeamlessM4TGenerationOutput`].
- If `generate_speech` and not `return_intermediate_token_ids`, returns a tuple composed of waveforms of
- shape `(batch_size, sequence_length)`and and `waveform_lengths` which gives the length of each sample.
+ shape `(batch_size, sequence_length)` and `waveform_lengths` which gives the length of each sample.
- If `generate_speech=False`, it will returns `ModelOutput`.
"""
if input_ids is None and input_features is None and kwargs.get("inputs_embeds", None) is None:
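The two return modes described in the generate docstrings above (and in the SeamlessM4Tv2 ones below) can be seen in a short usage sketch; the checkpoint name and processor call are illustrative assumptions, not part of this PR.

```python
# Hedged sketch of SeamlessM4TModel.generate return modes (checkpoint and processor usage assumed).
from transformers import AutoProcessor, SeamlessM4TModel

processor = AutoProcessor.from_pretrained("facebook/hf-seamless-m4t-medium")
model = SeamlessM4TModel.from_pretrained("facebook/hf-seamless-m4t-medium")

inputs = processor(text="Hello, how are you?", src_lang="eng", return_tensors="pt")

# With return_intermediate_token_ids=True, a SeamlessM4TGenerationOutput is returned
out = model.generate(**inputs, tgt_lang="fra", return_intermediate_token_ids=True)
waveform, waveform_lengths = out.waveform, out.waveform_lengths

# Without it, a plain tuple (waveform, waveform_lengths) comes back
waveform, waveform_lengths = model.generate(**inputs, tgt_lang="fra")
```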
6 changes: 3 additions & 3 deletions src/transformers/models/seamless_m4t_v2/modeling_seamless_m4t_v2.py
@@ -3691,7 +3691,7 @@ def generate(
`Union[SeamlessM4Tv2GenerationOutput, Tuple[Tensor]]`:
- If `return_intermediate_token_ids`, returns [`SeamlessM4Tv2GenerationOutput`].
- If not `return_intermediate_token_ids`, returns a tuple composed of waveforms of shape `(batch_size,
- sequence_length)`and and `waveform_lengths` which gives the length of each sample.
+ sequence_length)` and `waveform_lengths` which gives the length of each sample.
"""
batch_size = len(input_ids) if input_ids is not None else len(kwargs.get("inputs_embeds"))

@@ -4062,7 +4062,7 @@ def generate(
`Union[SeamlessM4Tv2GenerationOutput, Tuple[Tensor]]`:
- If `return_intermediate_token_ids`, returns [`SeamlessM4Tv2GenerationOutput`].
- If not `return_intermediate_token_ids`, returns a tuple composed of waveforms of shape `(batch_size,
- sequence_length)`and and `waveform_lengths` which gives the length of each sample.
+ sequence_length)` and `waveform_lengths` which gives the length of each sample.
"""
batch_size = len(input_features) if input_features is not None else len(kwargs.get("inputs_embeds"))

@@ -4514,7 +4514,7 @@ def generate(
`Union[SeamlessM4Tv2GenerationOutput, Tuple[Tensor], ModelOutput]`:
- If `generate_speech` and `return_intermediate_token_ids`, returns [`SeamlessM4Tv2GenerationOutput`].
- If `generate_speech` and not `return_intermediate_token_ids`, returns a tuple composed of waveforms of
- shape `(batch_size, sequence_length)`and and `waveform_lengths` which gives the length of each sample.
+ shape `(batch_size, sequence_length)` and `waveform_lengths` which gives the length of each sample.
- If `generate_speech=False`, it will returns `ModelOutput`.
"""
if input_ids is None and input_features is None and kwargs.get("inputs_embeds", None) is None:
2 changes: 1 addition & 1 deletion tests/models/colpali/test_modeling_colpali.py
@@ -275,7 +275,7 @@ def test_model_parallelism(self):
pass

@unittest.skip(
reason="PaliGemmma's SigLip encoder uses the same initialization scheme as the Flax original implementation"
reason="PaliGemma's SigLip encoder uses the same initialization scheme as the Flax original implementation"
)
def test_initialization(self):
pass
4 changes: 2 additions & 2 deletions tests/models/deepseek_v3/test_modeling_deepseek_v3.py
@@ -431,7 +431,7 @@ def test_model_rope_scaling(self):

def test_past_key_values_format(self):
"""
- Overwritting to pass the expected cache shapes (Deepseek-V3 uses MLA so the cache shapes are non-standard)
+ Overwriting to pass the expected cache shapes (Deepseek-V3 uses MLA so the cache shapes are non-standard)
"""
config, inputs = self.model_tester.prepare_config_and_inputs_for_common()
batch_size, seq_length = inputs["input_ids"].shape
@@ -451,7 +451,7 @@ def test_past_key_values_format(self):
@slow
def test_eager_matches_sdpa_generate(self):
"""
- Overwritting the common test as the test is flaky on tiny models
+ Overwriting the common test as the test is flaky on tiny models
"""
max_new_tokens = 30

2 changes: 1 addition & 1 deletion tests/models/marian/test_tokenization_marian.py
@@ -136,7 +136,7 @@ def test_tokenizer_integration(self):
decode_kwargs={"use_source_tokenizer": True},
)

- def test_tokenizer_integration_seperate_vocabs(self):
+ def test_tokenizer_integration_separate_vocabs(self):
tokenizer = MarianTokenizer.from_pretrained("hf-internal-testing/test-marian-two-vocabs")

source_text = "Tämä on testi"
4 changes: 2 additions & 2 deletions tests/models/opt/test_modeling_flax_opt.py
@@ -69,7 +69,7 @@ def __init__(
embed_dim=16,
word_embed_proj_dim=16,
initializer_range=0.02,
- attn_implemetation="eager",
+ attn_implementation="eager",
):
self.parent = parent
self.batch_size = batch_size
@@ -92,7 +92,7 @@ def __init__(
self.word_embed_proj_dim = word_embed_proj_dim
self.initializer_range = initializer_range
self.is_encoder_decoder = False
- self.attn_implementation = attn_implemetation
+ self.attn_implementation = attn_implementation

def prepare_config_and_inputs(self):
input_ids = np.clip(ids_tensor([self.batch_size, self.seq_length - 1], self.vocab_size), 3, self.vocab_size)
2 changes: 1 addition & 1 deletion tests/models/paligemma/test_modeling_paligemma.py
@@ -297,7 +297,7 @@ def test_model_parallelism(self):
pass

@unittest.skip(
reason="PaliGemmma's SigLip encoder uses the same initialization scheme as the Flax original implementation"
reason="PaliGemma's SigLip encoder uses the same initialization scheme as the Flax original implementation"
)
def test_initialization(self):
pass
2 changes: 1 addition & 1 deletion tests/models/paligemma2/test_modeling_paligemma2.py
@@ -294,7 +294,7 @@ def test_model_parallelism(self):
pass

@unittest.skip(
reason="PaliGemmma's SigLip encoder uses the same initialization scheme as the Flax original implementation"
reason="PaliGemma's SigLip encoder uses the same initialization scheme as the Flax original implementation"
)
def test_initialization(self):
pass
4 changes: 2 additions & 2 deletions utils/check_copies.py
@@ -18,7 +18,7 @@
- The list of models in the main README.md matches the ones in the localized READMEs,
- Files that are registered as full copies of one another in the `FULL_COPIES` constant of this script.

- This also checks the list of models in the README is complete (has all models) and add a line to complete if there is
+ This also checks the list of models in the README is complete (has all models) and adds a line to complete if there is
a model missing.

Use from the root of the repo with:
@@ -420,7 +420,7 @@ def find_code_in_transformers(

# Detail: the `Copied from` statement is originally designed to work with the last part of `TRANSFORMERS_PATH`,
# (which is `transformers`). The same should be applied for `MODEL_TEST_PATH`. However, its last part is `models`
- # (to only check and search in it) which is a bit confusing. So we keep the copied statement staring with
+ # (to only check and search in it) which is a bit confusing. So we keep the copied statement starting with
# `tests.models.` and change it to `tests` here.
if base_path == MODEL_TEST_PATH:
base_path = "tests"