[generate] remove cache v4.47 deprecations #36212
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
zucchini-nlp left a comment:
Thanks a lot! I believe the v4.49 deprecations can also be removed, since we've almost released it?
I'm running into an issue after this change. Here is a minimal reproducer:

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

model_id = "hf-internal-testing/tiny-random-MistralForCausalLM"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

def process(samples):
    tokenized = tokenizer(samples["quote"], truncation=True, max_length=128)
    return tokenized

data = load_dataset("ybelkada/english_quotes_copy")
data = data.map(process, batched=True)

trainer = Trainer(
    model=model,
    train_dataset=data["train"],
    args=TrainingArguments(
        num_train_epochs=1,
        max_steps=5,
        per_device_train_batch_size=4,
        output_dir="/tmp/mistral",
    ),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The error I get is: […]. Re-adding the removed `DynamicCache.__init__` argument fixes it. I know that it has been deprecated, but given the reproducer, I'm not sure how I would even account for that. Is the checkpoint outdated, or what else can I do?
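For what it's worth, here is how I would try to isolate this outside of `Trainer` — a minimal sketch, assuming the failure comes from the `torch.nn.DataParallel` wrapping that `Trainer` applies when it sees more than one GPU (it needs at least 2 visible CUDA devices to actually hit the gather path):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hf-internal-testing/tiny-random-MistralForCausalLM"
model = AutoModelForCausalLM.from_pretrained(model_id).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# Trainer does this internally when args.n_gpu > 1
dp_model = torch.nn.DataParallel(model)
inputs = tokenizer(
    ["a quote", "another quote"] * 2, return_tensors="pt", padding=True
).to("cuda")

# Each replica returns an output whose past_key_values is a DynamicCache;
# if this theory is right, the crash happens while gathering those caches
out = dp_model(**inputs)
```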
ping @gante
Writing down some findings: When we compute the loss, the model returns a model output whose `past_key_values` is a `DynamicCache`. When we iterate over the fields while unpacking them (i.e. inside torch's gather logic), the cache gets treated as a plain sequence. Placing a breakpoint inside the gather code confirms that this is where things go wrong. At a first glance, and given my very limited knowledge of torch parallel frameworks, it seems like it was expecting the model to return a tuple-of-tensors cache. The fact that it was working before was luck: the old argument in `DynamicCache.__init__` happened to accept the data that gather passes back in.
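To make the "treated as a plain sequence" part concrete, here is a quick sketch (assuming `DynamicCache` still exposes the legacy-compat `__iter__`) of how the cache unrolls into per-layer `(key, value)` pairs, which is exactly what the downstream `zip(*outputs)` consumes:

```python
import torch
from transformers import DynamicCache

# Build a toy 2-layer cache; shapes stand in for (batch, heads, seq, head_dim)
cache = DynamicCache()
for layer_idx in range(2):
    cache.update(torch.ones(1, 1, 4, 8), torch.zeros(1, 1, 4, 8), layer_idx)

# Iterating the cache yields one (key, value) tuple per layer, so generic
# container-handling code (like torch's gather) sees it as a sequence
for layer_idx, (key, value) in enumerate(cache):
    print(layer_idx, key.shape, value.shape)
```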
The issue is, `Gather` receives a list of `DynamicCache` objects, one per device. In this scenario:

```python
outputs = [DynamicCache(...), DynamicCache(...)]
y = list(zip(*outputs))  # PyTorch does this in Gather
# y = [(k1, v1), (k2, v2)]
```

Because PyTorch casts the gathered result back to the input's type (`type(out)(...)`), it ends up calling `DynamicCache(...)` with the zipped data, which the now argument-less `__init__` rejects. I'm not totally familiar with the API we want to pursue, but `Gather` on the cache currently basically doesn't work, as the resulting object is not a valid `DynamicCache`. I do see only one option to fix this without touching PyTorch:

```python
def __init__(self, cache: Optional[Tuple[torch.Tensor, torch.Tensor]] = None) -> None:
    super().__init__()
    ...
```
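For illustration only, a sketch of how such a restored argument could consume what `Gather` hands back — `_DynamicCacheCompat` is a hypothetical stand-in for this idea, not the actual proposal:

```python
from typing import Iterable, Optional, Tuple

import torch
from transformers import DynamicCache

class _DynamicCacheCompat(DynamicCache):
    """Hypothetical: accept the per-layer (key, value) pairs Gather produces."""

    def __init__(
        self,
        cache: Optional[Iterable[Tuple[torch.Tensor, torch.Tensor]]] = None,
    ) -> None:
        super().__init__()
        if cache is not None:
            # zip(*outputs) in Gather yields one (key, value) pair per layer,
            # already concatenated across devices by the recursive gather
            for layer_idx, (key, value) in enumerate(cache):
                self.update(key, value, layer_idx)
```

Note that `DynamicCache.from_legacy_cache` already converts tuple-format caches, but `Gather` calls the constructor directly (`type(out)(...)`), so a classmethod alone wouldn't be picked up on this code path.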
What does this PR do?
(see title)