
Commit ca93bdb

Merge remote-tracking branch 'origin/main' into tp_llama

2 parents: f312e55 + 1349321
26 files changed: +149 / -204 lines changed

docs/source/en/model_doc/llava.md

Lines changed: 2 additions & 2 deletions
@@ -85,10 +85,10 @@ LLaVa also supports batched inference. Here is how you can do it:
 import requests
 from PIL import Image
 import torch
-from transformers import AutoProcessor, LLavaForConditionalGeneration
+from transformers import AutoProcessor, LlavaForConditionalGeneration
 
 # Load the model in half-precision
-model = LLavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf", torch_dtype=torch.float16, device_map="auto")
+model = LlavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf", torch_dtype=torch.float16, device_map="auto")
 processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")
 
 # Get two different images
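The doc fix above only corrects the class name (`LLavaForConditionalGeneration` → `LlavaForConditionalGeneration`). For context, here is a minimal sketch of the batched-inference flow the corrected snippet leads into; the image URLs, prompts, and generation settings are illustrative, not part of the diff:

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf", torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")

# Two example images (placeholder URLs)
image1 = Image.open(requests.get("https://llava-vl.github.io/static/images/view.jpg", stream=True).raw)
image2 = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)

# One prompt per image, in the LLaVA-1.5 chat format
prompts = [
    "USER: <image>\nWhat is shown in this image? ASSISTANT:",
    "USER: <image>\nHow many animals do you see? ASSISTANT:",
]

# Padding is required for batched generation
inputs = processor(images=[image1, image2], text=prompts, padding=True, return_tensors="pt").to(
    model.device, torch.float16
)
generate_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.batch_decode(generate_ids, skip_special_tokens=True))
```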

docs/source/en/quantization/overview.md

Lines changed: 13 additions & 13 deletions
@@ -45,19 +45,19 @@ In short, supporting a wide range of quantization methods allows you to pick the
 
 Use the table below to help you decide which quantization method to use.
 
-| Quantization method | On the fly quantization | CPU | CUDA GPU | RoCm GPU (AMD) | Metal (Apple Silicon) | torch.compile() support | Number of bits | Supports fine-tuning (through PEFT) | Serializable with 🤗 transformers | 🤗 transformers support | Link to library |
-|-------------------------------------|-------------------------|-----|----------|----------------|-----------------------|-------------------------|----------------|-------------------------------------|--------------|------------------------|---------------------------------------------|
-| [AQLM](./aqlm) | 🔴 | 🟢 | 🟢 | 🔴 | 🔴 | 🟢 | 1 / 2 | 🟢 | 🟢 | 🟢 | https://github.com/Vahe1994/AQLM |
-| [AWQ](./awq) | 🔴 | 🔴 | 🟢 | 🟢 | 🔴 | ? | 4 | 🟢 | 🟢 | 🟢 | https://github.com/casper-hansen/AutoAWQ |
-| [bitsandbytes](./bitsandbytes) | 🟢 | 🟡 * | 🟢 | 🟡 * | 🔴 ** | 🔴 (soon!) | 4 / 8 | 🟢 | 🟢 | 🟢 | https://github.com/bitsandbytes-foundation/bitsandbytes |
-| [compressed-tensors](./compressed_tensors) | 🔴 | 🟢 | 🟢 | 🟢 | 🔴 | 🔴 | 1 - 8 | 🟢 | 🟢 | 🟢 | https://github.com/neuralmagic/compressed-tensors |
-| [EETQ](./eetq) | 🟢 | 🔴 | 🟢 | 🔴 | 🔴 | ? | 8 | 🟢 | 🟢 | 🟢 | https://github.com/NetEase-FuXi/EETQ |
-| GGUF / GGML (llama.cpp) | 🟢 | 🟢 | 🟢 | 🔴 | 🟢 | 🔴 | 1 - 8 | 🔴 | [See GGUF section](../gguf) | [See GGUF section](../gguf) | https://github.com/ggerganov/llama.cpp |
-| [GPTQ](./gptq) | 🔴 | 🔴 | 🟢 | 🟢 | 🔴 | 🔴 | 2 - 3 - 4 - 8 | 🟢 | 🟢 | 🟢 | https://github.com/AutoGPTQ/AutoGPTQ |
-| [HQQ](./hqq) | 🟢 | 🟢 | 🟢 | 🔴 | 🔴 | 🟢 | 1 - 8 | 🟢 | 🔴 | 🟢 | https://github.com/mobiusml/hqq/ |
-| [Quanto](./quanto) | 🟢 | 🟢 | 🟢 | 🔴 | 🟢 | 🟢 | 2 / 4 / 8 | 🔴 | 🔴 | 🟢 | https://github.com/huggingface/quanto |
-| [FBGEMM_FP8](./fbgemm_fp8.md) | 🟢 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 8 | 🔴 | 🟢 | 🟢 | https://github.com/pytorch/FBGEMM |
-| [torchao](./torchao.md) | 🟢 | | 🟢 | 🔴 | partial support (int4 weight only) | | 4 / 8 | | 🟢🔴 | 🟢 | https://github.com/pytorch/ao |
+| Quantization method | On the fly quantization | CPU | CUDA GPU | RoCm GPU (AMD) | Metal (Apple Silicon) | Intel GPU | torch.compile() support | Number of bits | Supports fine-tuning (through PEFT) | Serializable with 🤗 transformers | 🤗 transformers support | Link to library |
+|-------------------------------------|-------------------------|-----|----------|----------------|-----------------------|-----------|-------------------------|----------------|-------------------------------------|--------------|------------------------|---------------------------------------------|
+| [AQLM](./aqlm) | 🔴 | 🟢 | 🟢 | 🔴 | 🔴 | 🔴 | 🟢 | 1 / 2 | 🟢 | 🟢 | 🟢 | https://github.com/Vahe1994/AQLM |
+| [AWQ](./awq) | 🔴 | 🟢 | 🟢 | 🟢 | 🔴 | 🟢 | ? | 4 | 🟢 | 🟢 | 🟢 | https://github.com/casper-hansen/AutoAWQ |
+| [bitsandbytes](./bitsandbytes) | 🟢 | 🟡 * | 🟢 | 🟡 * | 🔴 ** | 🟡 * | 🔴 (soon!) | 4 / 8 | 🟢 | 🟢 | 🟢 | https://github.com/bitsandbytes-foundation/bitsandbytes |
+| [compressed-tensors](./compressed_tensors) | 🔴 | 🟢 | 🟢 | 🟢 | 🔴 | 🔴 | 🔴 | 1 - 8 | 🟢 | 🟢 | 🟢 | https://github.com/neuralmagic/compressed-tensors |
+| [EETQ](./eetq) | 🟢 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | ? | 8 | 🟢 | 🟢 | 🟢 | https://github.com/NetEase-FuXi/EETQ |
+| GGUF / GGML (llama.cpp) | 🟢 | 🟢 | 🟢 | 🔴 | 🟢 | 🔴 | 🔴 | 1 - 8 | 🔴 | [See GGUF section](../gguf) | [See GGUF section](../gguf) | https://github.com/ggerganov/llama.cpp |
+| [GPTQ](./gptq) | 🔴 | 🔴 | 🟢 | 🟢 | 🔴 | 🔴 | 🔴 | 2 - 3 - 4 - 8 | 🟢 | 🟢 | 🟢 | https://github.com/AutoGPTQ/AutoGPTQ |
+| [HQQ](./hqq) | 🟢 | 🟢 | 🟢 | 🔴 | 🔴 | 🔴 | 🟢 | 1 - 8 | 🟢 | 🔴 | 🟢 | https://github.com/mobiusml/hqq/ |
+| [Quanto](./quanto) | 🟢 | 🟢 | 🟢 | 🔴 | 🟢 | 🔴 | 🟢 | 2 / 4 / 8 | 🔴 | 🔴 | 🟢 | https://github.com/huggingface/quanto |
+| [FBGEMM_FP8](./fbgemm_fp8.md) | 🟢 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 8 | 🔴 | 🟢 | 🟢 | https://github.com/pytorch/FBGEMM |
+| [torchao](./torchao.md) | 🟢 | | 🟢 | 🔴 | partial support (int4 weight only) | 🔴 | | 4 / 8 | | 🟢🔴 | 🟢 | https://github.com/pytorch/ao |
 
 <Tip>
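Not part of the diff, but as a usage reminder for the table above: each listed backend is selected by passing a quantization config to `from_pretrained`. A minimal sketch with bitsandbytes 4-bit (the checkpoint name is only an example):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization via bitsandbytes; requires `pip install bitsandbytes`
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # example checkpoint, swap in any causal LM
    quantization_config=quantization_config,
    device_map="auto",
)
```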

setup.py

Lines changed: 1 addition & 1 deletion
@@ -117,7 +117,7 @@
     "fugashi>=1.0",
     "GitPython<3.1.19",
     "hf-doc-builder>=0.3.0",
-    "huggingface-hub>=0.23.2,<1.0",
+    "huggingface-hub>=0.24.0,<1.0",
     "importlib_metadata",
     "ipadic>=1.0.0,<2.0",
     "isort>=5.5.4",

src/transformers/dependency_versions_table.py

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@
     "fugashi": "fugashi>=1.0",
     "GitPython": "GitPython<3.1.19",
     "hf-doc-builder": "hf-doc-builder>=0.3.0",
-    "huggingface-hub": "huggingface-hub>=0.23.2,<1.0",
+    "huggingface-hub": "huggingface-hub>=0.24.0,<1.0",
     "importlib_metadata": "importlib_metadata",
     "ipadic": "ipadic>=1.0.0,<2.0",
     "isort": "isort>=5.5.4",

src/transformers/integrations/integration_utils.py

Lines changed: 1 addition & 1 deletion
@@ -918,7 +918,7 @@ def on_train_end(self, args, state, control, model=None, tokenizer=None, **kwarg
         if self._log_model.is_enabled and self._initialized and state.is_world_process_zero:
             from ..trainer import Trainer
 
-            fake_trainer = Trainer(args=args, model=model, processing_class=tokenizer)
+            fake_trainer = Trainer(args=args, model=model, processing_class=tokenizer, eval_dataset=["fake"])
             with tempfile.TemporaryDirectory() as temp_dir:
                 fake_trainer.save_model(temp_dir)
                 metadata = (
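For context on when this path runs: the `fake_trainer` is only built when model logging to Weights & Biases is enabled. A minimal sketch of enabling it (the output directory and script details are illustrative):

```python
import os

# Log the final model as a W&B artifact at the end of training;
# "checkpoint" would log on every save instead, "false" disables it.
os.environ["WANDB_LOG_MODEL"] = "end"

from transformers import TrainingArguments

args = TrainingArguments(output_dir="out", report_to=["wandb"])
# ...build a Trainer with these args and call trainer.train();
# WandbCallback.on_train_end() then saves and uploads the model.
```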

src/transformers/modeling_utils.py

Lines changed: 1 addition & 87 deletions
@@ -95,7 +95,7 @@
     replace_return_docstrings,
     strtobool,
 )
-from .utils.hub import convert_file_size_to_int, create_and_tag_model_card, get_checkpoint_shard_files
+from .utils.hub import create_and_tag_model_card, get_checkpoint_shard_files
 from .utils.import_utils import (
     ENV_VARS_TRUE_VALUES,
     is_sagemaker_mp_enabled,
@@ -382,92 +382,6 @@ def check_support_param_buffer_assignment(model_to_load, state_dict, start_prefi
     return False
 
 
-def shard_checkpoint(
-    state_dict: Dict[str, torch.Tensor], max_shard_size: Union[int, str] = "10GB", weights_name: str = WEIGHTS_NAME
-):
-    """
-    Splits a model state dictionary in sub-checkpoints so that the final size of each sub-checkpoint does not exceed a
-    given size.
-
-    The sub-checkpoints are determined by iterating through the `state_dict` in the order of its keys, so there is no
-    optimization made to make each sub-checkpoint as close as possible to the maximum size passed. For example, if the
-    limit is 10GB and we have weights of sizes [6GB, 6GB, 2GB, 6GB, 2GB, 2GB] they will get sharded as [6GB], [6+2GB],
-    [6+2+2GB] and not [6+2+2GB], [6+2GB], [6GB].
-
-    <Tip warning={true}>
-
-    If one of the model's weight is bigger than `max_shard_size`, it will end up in its own sub-checkpoint which will
-    have a size greater than `max_shard_size`.
-
-    </Tip>
-
-    Args:
-        state_dict (`Dict[str, torch.Tensor]`): The state dictionary of a model to save.
-        max_shard_size (`int` or `str`, *optional*, defaults to `"10GB"`):
-            The maximum size of each sub-checkpoint. If expressed as a string, needs to be digits followed by a unit
-            (like `"5MB"`).
-        weights_name (`str`, *optional*, defaults to `"pytorch_model.bin"`):
-            The name of the model save file.
-    """
-    logger.warning(
-        "Note that `shard_checkpoint` is deprecated and will be removed in v4.44. We recommend you using "
-        "split_torch_state_dict_into_shards from huggingface_hub library"
-    )
-    max_shard_size = convert_file_size_to_int(max_shard_size)
-
-    sharded_state_dicts = [{}]
-    last_block_size = 0
-    total_size = 0
-    storage_id_to_block = {}
-
-    for key, weight in state_dict.items():
-        # when bnb serialization is used the weights in the state dict can be strings
-        # check: https://github.com/huggingface/transformers/pull/24416 for more details
-        if isinstance(weight, str):
-            continue
-        else:
-            storage_id = id_tensor_storage(weight)
-
-        # If a `weight` shares the same underlying storage as another tensor, we put `weight` in the same `block`
-        if storage_id in storage_id_to_block and weight.device != torch.device("meta"):
-            block_id = storage_id_to_block[storage_id]
-            sharded_state_dicts[block_id][key] = weight
-            continue
-
-        weight_size = weight.numel() * dtype_byte_size(weight.dtype)
-        # If this weight is going to tip up over the maximal size, we split, but only if we have put at least one
-        # weight in the current shard.
-        if last_block_size + weight_size > max_shard_size and len(sharded_state_dicts[-1]) > 0:
-            sharded_state_dicts.append({})
-            last_block_size = 0
-
-        sharded_state_dicts[-1][key] = weight
-        last_block_size += weight_size
-        total_size += weight_size
-        storage_id_to_block[storage_id] = len(sharded_state_dicts) - 1
-
-    # If we only have one shard, we return it
-    if len(sharded_state_dicts) == 1:
-        return {weights_name: sharded_state_dicts[0]}, None
-
-    # Otherwise, let's build the index
-    weight_map = {}
-    shards = {}
-    for idx, shard in enumerate(sharded_state_dicts):
-        shard_file = weights_name.replace(".bin", f"-{idx+1:05d}-of-{len(sharded_state_dicts):05d}.bin")
-        shard_file = shard_file.replace(
-            ".safetensors", f"-{idx + 1:05d}-of-{len(sharded_state_dicts):05d}.safetensors"
-        )
-        shards[shard_file] = shard
-        for key in shard.keys():
-            weight_map[key] = shard_file
-
-    # Add the metadata
-    metadata = {"total_size": total_size}
-    index = {"metadata": metadata, "weight_map": weight_map}
-    return shards, index
-
-
 def load_sharded_checkpoint(model, folder, strict=True, prefer_safe=True):
     """
     This is the same as
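The removed `shard_checkpoint` helper was already deprecated in favor of `split_torch_state_dict_into_shards` from huggingface_hub, which is why the `huggingface-hub>=0.24.0` bump appears above. A minimal sketch of the replacement, following the huggingface_hub documentation; the save directory and helper name are illustrative:

```python
import json
import os

from huggingface_hub import split_torch_state_dict_into_shards
from safetensors.torch import save_file


def save_sharded(state_dict, save_directory, max_shard_size="5GB"):
    # Plan the shards; no tensors are copied at this point.
    split = split_torch_state_dict_into_shards(state_dict, max_shard_size=max_shard_size)

    os.makedirs(save_directory, exist_ok=True)
    for filename, tensor_names in split.filename_to_tensors.items():
        shard = {name: state_dict[name] for name in tensor_names}
        save_file(shard, os.path.join(save_directory, filename), metadata={"format": "pt"})

    # Only sharded checkpoints need an index mapping tensors to shard files.
    if split.is_sharded:
        index = {"metadata": split.metadata, "weight_map": split.tensor_to_filename}
        with open(os.path.join(save_directory, "model.safetensors.index.json"), "w") as f:
            json.dump(index, f, indent=2)
```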

src/transformers/models/blip_2/modeling_blip_2.py

Lines changed: 2 additions & 2 deletions
@@ -2203,7 +2203,7 @@ def forward(
             logger.warning_once(
                 "Expanding inputs for image tokens in BLIP-2 should be done in processing. "
                 "Please follow instruction here (https://gist.github.com/zucchini-nlp/e9f20b054fa322f84ac9311d9ab67042) to update your BLIP-2 model. "
-                "Using processors without these attributes in the config is deprecated and will throw an error in v4.47."
+                "Using processors without these attributes in the config is deprecated and will throw an error in v4.50."
             )
             inputs_embeds = torch.cat([language_model_inputs, inputs_embeds.to(language_model_inputs.device)], dim=1)
             attention_mask = torch.cat(
@@ -2326,7 +2326,7 @@ def generate(
             logger.warning_once(
                 "Expanding inputs for image tokens in BLIP-2 should be done in processing. "
                 "Please follow instruction here (https://gist.github.com/zucchini-nlp/e9f20b054fa322f84ac9311d9ab67042) to update your BLIP-2 model. "
-                "Using processors without these attributes in the config is deprecated and will throw an error in v4.47."
+                "Using processors without these attributes in the config is deprecated and will throw an error in v4.50."
             )
             inputs_embeds = torch.cat([language_model_inputs, inputs_embeds.to(language_model_inputs.device)], dim=1)
             attention_mask = torch.cat(
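These warnings only change the advertised removal version (v4.47 → v4.50); the fallback path itself is unchanged. For reference, with an up-to-date processor the image-token expansion happens during processing and this branch is not taken. A minimal sketch of standard BLIP-2 usage (checkpoint, image path, and prompt are illustrative):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Blip2ForConditionalGeneration

processor = AutoProcessor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("photo.jpg")  # any local image
inputs = processor(
    images=image, text="Question: what is in the photo? Answer:", return_tensors="pt"
).to(model.device, torch.float16)

# With an updated processor, image tokens are already expanded here,
# so the deprecation warning above is not emitted.
out = model.generate(**inputs, max_new_tokens=20)
print(processor.batch_decode(out, skip_special_tokens=True)[0].strip())
```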

src/transformers/models/blip_2/processing_blip_2.py

Lines changed: 1 addition & 1 deletion
@@ -153,7 +153,7 @@ def __call__(
             logger.warning_once(
                 "Expanding inputs for image tokens in BLIP-2 should be done in processing. "
                 "Please follow instruction here (https://gist.github.com/zucchini-nlp/e9f20b054fa322f84ac9311d9ab67042) to update your BLIP-2 model. "
-                "Using processors without these attributes in the config is deprecated and will throw an error in v4.47."
+                "Using processors without these attributes in the config is deprecated and will throw an error in v4.50."
             )
 
         # cast to desired return tensors type

src/transformers/models/instructblip/modeling_instructblip.py

Lines changed: 2 additions & 2 deletions
@@ -1471,7 +1471,7 @@ def forward(
             logger.warning_once(
                 "Expanding inputs for image tokens in InstructBLIP should be done in processing. "
                 "Please follow instruction here (https://gist.github.com/zucchini-nlp/e9f20b054fa322f84ac9311d9ab67042) to update your InstructBLIP model. "
-                "Using processors without these attributes in the config is deprecated and will throw an error in v4.47."
+                "Using processors without these attributes in the config is deprecated and will throw an error in v4.50."
             )
             inputs_embeds = torch.cat([language_model_inputs, inputs_embeds.to(language_model_inputs.device)], dim=1)
             attention_mask = torch.cat(
@@ -1610,7 +1610,7 @@ def generate(
             logger.warning_once(
                 "Expanding inputs for image tokens in InstructBLIP should be done in processing. "
                 "Please follow instruction here (https://gist.github.com/zucchini-nlp/e9f20b054fa322f84ac9311d9ab67042) to update your InstructBLIP model. "
-                "Using processors without these attributes in the config is deprecated and will throw an error in v4.47."
+                "Using processors without these attributes in the config is deprecated and will throw an error in v4.50."
             )
             inputs_embeds = torch.cat([language_model_inputs, inputs_embeds.to(language_model_inputs.device)], dim=1)
             attention_mask = torch.cat(

src/transformers/models/instructblip/processing_instructblip.py

Lines changed: 1 addition & 1 deletion
@@ -148,7 +148,7 @@ def __call__(
             logger.warning_once(
                 "Expanding inputs for image tokens in InstructBLIP should be done in processing. "
                 "Please follow instruction here (https://gist.github.com/zucchini-nlp/e9f20b054fa322f84ac9311d9ab67042) to update your InstructBLIP model. "
-                "Using processors without these attributes in the config is deprecated and will throw an error in v4.47."
+                "Using processors without these attributes in the config is deprecated and will throw an error in v4.50."
            )
 
         # cast to desired return tensors type after concatenating
