
Conversation

@avishaiElmakies
Contributor

@avishaiElmakies avishaiElmakies commented Sep 4, 2024

Adds SDPA to the OPT model.

The implementation is inspired by Gemma2 and Llama.

Part of #28005.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

I think @amyeroberts @fxmarty

some notes:

  • I did some refactoring of the model code: I created a _update_key_and_values function (that code was shared by all three attention implementations), renamed self._shape to _shape, and moved the mask logic into self._update_causal_mask. A rough sketch of the key/value helper is shown after these notes.
  • I created a test that makes sure generation with eager and SDPA attention is equivalent. The test is similar to the one in Llama.
  • I seem to fail three of the common tests: test_eager_matches_sdpa_inference_0_float16, test_eager_matches_sdpa_inference_1_bfloat16 and test_eager_matches_sdpa_inference_2_float32. I took inspiration from Gemma, which seems to skip those tests as well; should I skip them too? I think this is also related to #32086 (The implementations of LlamaAttention and LlamaSdpaAttention are not equivalent), since my code is similar to that. It also seems to affect equivalence with Flax/TF, because the default attention implementation will become SDPA.
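A minimal sketch of what such a shared key/value update helper could look like (the name comes from the notes above; the exact signature here is illustrative, not necessarily the PR's final code):

```python
from typing import Optional, Tuple

import torch


def _update_key_and_values(
    key_states: torch.Tensor,
    value_states: torch.Tensor,
    past_key_value: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
) -> Tuple[torch.Tensor, torch.Tensor]:
    """Append the new key/value states to the cached ones, if a cache is present."""
    if past_key_value is not None:
        # cached shapes: (batch, num_heads, past_seq_len, head_dim)
        key_states = torch.cat([past_key_value[0], key_states], dim=2)
        value_states = torch.cat([past_key_value[1], value_states], dim=2)
    return key_states, value_states
```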

would love some feedback!

Contributor

@amyeroberts amyeroberts left a comment


Thanks for working on adding this!

@avishaiElmakies
Contributor Author

avishaiElmakies commented Sep 10, 2024

@amyeroberts I addressed the review comments. Would love to understand what to do with the failing tests.

@amyeroberts
Contributor

@avishaiElmakies Similar to my comment for DINOv2 -- other PRs can be a good reference for how to fix this. There you will see that many of the tests force the test model to use eager attention, which will resolve the TF/Flax equivalence issues.

For the SDPA equivalence tests, I'm not sure. As the flash attention implementation follows Llama, following Llama's tests for SDPA seems reasonable. cc @ArthurZucker here, who might know more about this.
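A rough sketch of what forcing eager attention in a model tester's config can look like (the tiny sizes are placeholders, not the actual tester values):

```python
from transformers import OPTConfig, OPTModel

# Illustrative only: pin the test config to eager attention so the PT/TF/Flax
# equivalence tests all exercise the same attention code path.
config = OPTConfig(
    vocab_size=99,
    hidden_size=16,
    num_hidden_layers=2,
    num_attention_heads=4,
    ffn_dim=16,
    max_position_embeddings=64,
    attn_implementation="eager",
)
model = OPTModel(config)
print(model.config._attn_implementation)  # "eager"
```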

@avishaiElmakies
Contributor Author

avishaiElmakies commented Sep 17, 2024

Hi @amyeroberts, I looked at the failing tests.

I changed the test test_eager_matches_sdpa_generate to match the one in Llama, and fixed test_pt_tf_model_equivalence.

The tests test_xla_generate_contrastive and test_xla_generate_slow in test_modeling_tf_opt.py don't seem related to this PR; they also fail on main.

The only tests I would still love some guidance on are the test_eager_matches_sdpa_inference ones. Generation seems to work the same with both implementations, but from what I understand after looking into it, the behavior of the two implementations differs in the inference test, as stated in #32086. Fixing this might require a refactor to make the behaviors match.

Would love some guidance.

EDIT: I noticed that when running the tests with your setup, test_eager_matches_sdpa_generate fails, but when I run the test on my machine it passes.
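For reference, a standalone sketch of the kind of eager vs. SDPA generation check being discussed (not the repo's actual test code; checkpoint and prompt are arbitrary):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "facebook/opt-350m"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
inputs = tokenizer(["The quick brown fox"], return_tensors="pt")

generations = {}
for impl in ("eager", "sdpa"):
    model = AutoModelForCausalLM.from_pretrained(
        checkpoint, attn_implementation=impl, torch_dtype=torch.float32
    )
    with torch.no_grad():
        generations[impl] = model.generate(**inputs, max_new_tokens=20, do_sample=False)

# Greedy decoding should produce identical token ids for both implementations.
assert torch.equal(generations["eager"], generations["sdpa"])
```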

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@vasqu
Contributor

vasqu commented Sep 21, 2024

The code looks eerily similar to Bart's implementation. It might be more beneficial to let it "inherit" from Bart via # Copied from ... statements and adjust it to fit Bart's use of attention masks etc., instead of manually figuring things out 👀

Hence, it's also more appropriate to look into encoder-decoder models like Bart, rather than Llama, which is decoder-only, regarding the attention implementation.
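For anyone unfamiliar with the convention, a sketch of what such a statement looks like (the target class and rename pattern here are illustrative):

```python
from torch import nn


# The `# Copied from` marker lets `make fix-copies` / utils/check_copies.py keep this
# class in sync with the referenced implementation, applying the rename pattern.
# Copied from transformers.models.bart.modeling_bart.BartAttention with Bart->OPT
class OPTAttention(nn.Module):
    """Multi-headed attention (body omitted in this sketch)."""
```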

@vasqu
Contributor

vasqu commented Sep 21, 2024

Ok, #17437 seems to be the reason why the Copied from doesn't exist in the first place. Is there a way to just ignore a few lines? Removing the whole Copied from seems a bit too much to me 👀

@avishaiElmakies
Contributor Author

avishaiElmakies commented Sep 21, 2024

@vasqu I can take inspiration from Bart instead of Llama, if that helps. What do you think?

ArthurZucker and others added 2 commits September 25, 2024 16:08
HFQuantizer implementation for compressed-tensors library (huggingface#31704)
@avishaiElmakies
Contributor Author

avishaiElmakies commented Sep 25, 2024

@amyeroberts is there some code to run those benchmarks?

EDIT: Never mind, I think I found it.

@avishaiElmakies
Contributor Author

avishaiElmakies commented Sep 25, 2024

@amyeroberts I ran the benchmarks.

I used the code from #31031.

Local resources: L40S (45 GB), PyTorch 2.4.0, Debian GNU/Linux 11.

Training

| batch_size | seq_len | Time per batch (eager - s) | Time per batch (sdpa - s) | Speedup (%) | Eager peak mem (MB) | sdpa peak mem (MB) | Mem saving (%) |
|---|---|---|---|---|---|---|---|
| 1 | 128 | 0.047 | 0.037 | 26.360 | 1474.611 | 1474.32 | 0.019 |
| 1 | 256 | 0.046 | 0.037 | 24.335 | 1498.541 | 1499.49 | -0.063 |
| 1 | 512 | 0.046 | 0.037 | 24.959 | 1973.544 | 1551.35 | 27.215 |
| 1 | 1024 | 0.062 | 0.038 | 65.135 | 4867.113 | 1698.35 | 186.578 |
| 1 | 2048 | 0.230 | 0.039 | 483.933 | 15662.224 | 2715.75 | 476.718 |
| 2 | 128 | 0.045 | 0.037 | 20.455 | 1498.164 | 1499.49 | -0.089 |
| 2 | 256 | 0.046 | 0.037 | 24.027 | 1569.367 | 1551.35 | 1.161 |
| 2 | 512 | 0.045 | 0.037 | 20.965 | 3257.074 | 1698.35 | 91.778 |
| 2 | 1024 | 0.122 | 0.038 | 225.958 | 9054.405 | 2715.75 | 233.403 |
| 2 | 2048 | 0.464 | 0.067 | 593.646 | 30572.058 | 4750.55 | 543.548 |
| 4 | 128 | 0.045 | 0.037 | 21.918 | 1549.448 | 1551.35 | -0.123 |
| 4 | 256 | 0.044 | 0.038 | 18.084 | 2451.768 | 1698.35 | 44.361 |
| 4 | 512 | 0.069 | 0.037 | 84.421 | 5833.180 | 2715.75 | 114.791 |
| 4 | 1024 | 0.262 | 0.062 | 319.475 | 17427.842 | 4750.55 | 266.860 |
| 4 | 2048 | OOM | 0.062 | Eager OOM | OOM | 4750.55 | Eager OOM |
| 8 | 128 | 0.044 | 0.037 | 18.436 | 2049.115 | 1697.78 | 20.694 |
| 8 | 256 | 0.048 | 0.036 | 32.887 | 4222.567 | 2715.75 | 55.484 |
| 8 | 512 | 0.153 | 0.06 | 154.862 | 10985.391 | 4750.55 | 131.245 |
| 8 | 1024 | 0.526 | 0.122 | 330.697 | 34175.763 | 8821.18 | 287.428 |
| 8 | 2048 | OOM | 0.122 | Eager OOM | OOM | 8821.18 | Eager OOM |

Inference

| batch_size | seq_len | Per token latency eager (ms) | Per token latency SDPA (ms) | Speedup (%) | Mem eager (MB) | Mem BT (MB) | Mem saved (%) |
|---|---|---|---|---|---|---|---|
| 1 | 128 | 11.634 | 8.647 | 34.546 | 717.676 | 717.674 | 0 |
| 1 | 256 | 11.593 | 8.86 | 30.851 | 742.852 | 742.845 | 0.001 |
| 1 | 512 | 11.515 | 8.816 | 30.614 | 798.232 | 799.593 | -0.17 |
| 1 | 1024 | 11.556 | 8.915 | 29.628 | 917.265 | 895.538 | 2.426 |
| 2 | 128 | 12.724 | 11.002 | 15.659 | 762.434 | 762.431 | 0 |
| 2 | 256 | 12.704 | 11.063 | 14.83 | 816.809 | 816.733 | 0.009 |
| 2 | 512 | 12.757 | 10.947 | 16.535 | 917.383 | 918.339 | -0.104 |
| 2 | 1024 | 13.018 | 11.018 | 18.147 | 1162.65 | 1114.81 | 4.291 |
| 4 | 128 | 12.739 | 10.959 | 16.243 | 856.335 | 856.483 | -0.017 |
| 4 | 256 | 12.718 | 10.837 | 17.355 | 957.298 | 957.674 | -0.039 |
| 4 | 512 | 12.813 | 10.822 | 18.393 | 1158.44 | 1158.45 | -0.001 |
| 4 | 1024 | 13.416 | 11.06 | 21.301 | 1653.42 | 1557.19 | 6.18 |
| 8 | 128 | 12.763 | 10.891 | 17.193 | 1036.13 | 1036.51 | -0.036 |
| 8 | 256 | 12.89 | 11.104 | 16.085 | 1236.98 | 1236.87 | 0.01 |
| 8 | 512 | 13.327 | 10.939 | 21.836 | 1642.29 | 1641.78 | 0.031 |
| 8 | 1024 | 15.181 | 11.175 | 35.848 | 2634.98 | 2443.35 | 7.843 |

I updated the model card opt.md as well.
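A usage sketch along the lines of what the updated model card describes, i.e. loading OPT with SDPA requested (checkpoint and dtype here are just examples):

```python
import torch
from transformers import OPTForCausalLM

# Request PyTorch's scaled_dot_product_attention kernel; half precision mirrors
# the fp16 setting typically used when benchmarking.
model = OPTForCausalLM.from_pretrained(
    "facebook/opt-350m",
    attn_implementation="sdpa",
    torch_dtype=torch.float16,
)
```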

@vasqu
Contributor

vasqu commented Sep 25, 2024

@avishaiElmakies Can you make the [slow-run] opt commit? It's needed to start the slow runs. Great benchmarks btw!

The CI errors are not related.

@avishaiElmakies
Contributor Author

@vasqu Committed.

Yeah, the benchmarks look great. I was not expecting this much of an improvement for training.

@vasqu
Contributor

vasqu commented Sep 25, 2024

slow runs cc @amyeroberts

@avishaiElmakies
Contributor Author

avishaiElmakies commented Sep 25, 2024

@amyeroberts it seems it has skipped the tests, do you know what happened?

@amyeroberts
Contributor

@avishaiElmakies I think it's because the marker in the commit message is [slow-run] rather than [run-slow]. Could you try again with [run-slow] opt?

@avishaiElmakies
Contributor Author

@amyeroberts Sorry about that, committed with [run-slow] opt.

@vasqu
Contributor

vasqu commented Oct 1, 2024

The failures seem unrelated to me, tbh. XLA should be TF stuff, no? @amyeroberts

@amyeroberts
Contributor

@vasqu Yes, it should be TF stuff, although the TF files are touched in this PR so they should be passing too. Could you confirm if the tests are passing on main or not?

cc @Rocketknight1 @ArthurZucker

@avishaiElmakies
Contributor Author

@amyeroberts at the time those tests were failing in main as well

@Rocketknight1
Member

Confirmed that test_xla_generate_contrastive and test_xla_generate_slow are failing on main for me in TF - I think it's fine to just add skips to those tests in this PR for now, since it's not related to this PR.

@vasqu
Contributor

vasqu commented Oct 2, 2024

Passing locally on my machine with a GPU (tf_opt_logs.txt) and not with a CPU (tf_opt_logs_cpu.txt).

Seems like a CPU issue then? FYI @Rocketknight1
cc @amyeroberts

@ArthurZucker
Collaborator

Yep, we can add the wrapper require_accelerator!

@vasqu
Contributor

vasqu commented Oct 4, 2024

@ArthurZucker There's a proper fix in #33903, since in TF the GPU just silently ignores the error instead of raising it, even though we go beyond the possible positional embedding range.
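A small illustration of the TF behaviour referenced here, assuming the failure indeed comes from out-of-range embedding indices: gather raises on CPU but is silently tolerated on GPU.

```python
import tensorflow as tf

table = tf.random.normal((10, 4))  # stand-in for a positional embedding table
bad_ids = tf.constant([3, 12])     # index 12 is out of range for a table of size 10

with tf.device("/CPU:0"):
    try:
        tf.gather(table, bad_ids)
    except tf.errors.InvalidArgumentError as err:
        print("CPU raises:", err)

# On a GPU device the same gather succeeds: TensorFlow documents that a 0 is
# written to the corresponding output instead of raising an error.
```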

@avishaiElmakies
Contributor Author

@ArthurZucker @amyeroberts do I need to do anything else, or can this be merged?

@ArthurZucker ArthurZucker merged commit a265600 into huggingface:main Oct 10, 2024
@ArthurZucker
Collaborator

Sorry for the delay and thanks for the contribution! 🤗

@avishaiElmakies avishaiElmakies deleted the spda_opt branch October 10, 2024 10:08
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
* add sdpa to OPT

* chore: remove redundant whitespace in OPTDecoder class

* fixup

* bug fix

* add sdpa and attention generate test

* fixup

* Refactor OPTAttention forward method for improved readability and maintainability

* undo refactor for _shape and key,val states

* add OPT to doc, fixup didn't find it for some reason

* change order

* change default attn_implemntation in testing to eager

* [run-slow] opt

* change test_eager_matches_sdpa_generate to the one llama

* Update default attention implementation in testing common

* [run-slow] opt

* remove uneeded print

* [run-slow] opt

* refactor model testers to have attn_implementation="eager"

* [run-slow] opt

* convert test_eager_matches_sdpa_generate to opt-350M

* bug fix when creating mask for opt

* [run-slow] opt

* if layer head mask default to eager

* if head mask is not none fall to eager

* [run-slow] opt

* Update src/transformers/models/opt/modeling_opt.py

Co-authored-by: amyeroberts <[email protected]>

* Clean up Unpack imports (huggingface#33631)

clean up Unpack imports

* Fix DPT /Dinov2 sdpa regression on main (huggingface#33660)

* fallback to eager if output attentions.

* fix copies

* handle dependency errors in check_imports (huggingface#33622)

* handle dependency errors in check_imports

* change log level to warning

* add back self.max_position_embeddings = config.max_position_embeddings (huggingface#33550)

* add back self.max_position_embeddings = config.max_position_embeddings

* fix-copies

* Fix Llava conversion for LlavaQwen2ForCausalLM with Clip vision tower (huggingface#33613)

fix llavaqwen2 model conversion

* Uniformize kwargs for Udop processor and update docs (huggingface#33628)

* Add optional kwargs and uniformize udop

* cleanup Unpack

* nit Udop

* Generation: deprecate `PreTrainedModel` inheriting from `GenerationMixin`  (huggingface#33203)

* Enable BNB multi-backend support (huggingface#31098)

* enable cpu bnb path

* fix style

* fix code style

* fix 4 bit path

* Update src/transformers/utils/import_utils.py

Co-authored-by: Aarni Koskela <[email protected]>

* add multi backend refactor tests

* fix style

* tweak 4bit quantizer + fix corresponding tests

* tweak 8bit quantizer + *try* fixing corresponding tests

* fix dequant bnb 8bit

* account for Intel CPU in variability of expected outputs

* enable cpu and xpu device map

* further tweaks to account for Intel CPU

* fix autocast to work with both cpu + cuda

* fix comments

* fix comments

* switch to testing_utils.torch_device

* allow for xpu in multi-gpu tests

* fix tests 4bit for CPU NF4

* fix bug with is_torch_xpu_available needing to be called as func

* avoid issue where test reports attr err due to other failure

* fix formatting

* fix typo from resolving of merge conflict

* polish based on last PR review

Co-authored-by: Marc Sun <[email protected]>

* fix CI

* Update src/transformers/integrations/integration_utils.py

Co-authored-by: Arthur <[email protected]>

* Update src/transformers/integrations/integration_utils.py

Co-authored-by: Arthur <[email protected]>

* fix error log

* fix error msg

* add \n in error log

* make quality

* rm bnb cuda restriction in doc

* cpu model don't need dispatch

* fix doc

* fix style

* check cuda avaliable in testing

* fix tests

* Update docs/source/en/model_doc/chameleon.md

Co-authored-by: Marc Sun <[email protected]>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Aarni Koskela <[email protected]>

* Update tests/quantization/bnb/test_4bit.py

Co-authored-by: Aarni Koskela <[email protected]>

* Update tests/quantization/bnb/test_4bit.py

Co-authored-by: Aarni Koskela <[email protected]>

* fix doc

* fix check multibackends

* fix import sort

* remove check torch in bnb

* docs: update bitsandbytes references with multi-backend info

* docs: fix small mistakes in bnb paragraph

* run formatting

* reveret bnb check

* move bnb multi-backend check to import_utils

* Update src/transformers/utils/import_utils.py

Co-authored-by: Aarni Koskela <[email protected]>

* fix bnb check

* minor fix for bnb

* check lib first

* fix code style

* Revert "run formatting"

This reverts commit ac108c6.

* fix format

* give warning when bnb version is low and no cuda found]

* fix device assignment check to be multi-device capable

* address akx feedback on get_avlbl_dev fn

* revert partially, as we don't want the function that public, as docs would be too much (enforced)

---------

Co-authored-by: Aarni Koskela <[email protected]>
Co-authored-by: Titus von Koeller <[email protected]>
Co-authored-by: Marc Sun <[email protected]>
Co-authored-by: Arthur <[email protected]>

* Fix error string after refactoring into get_chat_template (huggingface#33652)

* Fix error string after refactoring into get_chat_template

* Take suggestion from CR

Co-authored-by: Matt <[email protected]>

---------

Co-authored-by: Matt <[email protected]>

* uniformize git processor (huggingface#33668)

* uniformize git processor

* update doctring

* Modular `transformers`: modularity and inheritance for new model additions (huggingface#33248)

* update exampel

* update

* push the converted diff files for testing and ci

* correct one example

* fix class attributes and docstring

* nits

* oups

* fixed config!

* update

* nitd

* class attributes are not matched against the other, this is missing

* fixed overwriting self.xxx now onto the attributes I think

* partial fix, now order with docstring

* fix docstring order?

* more fixes

* update

* fix missing docstrings!

* examples don't all work yet

* fixup

* nit

* updated

* hick

* update

* delete

* update

* update

* update

* fix

* all default

* no local import

* fix more diff

* some fix related to "safe imports"

* push fixed

* add helper!

* style

* add a check

* all by default

* add the

* update

* FINALLY!

* nit

* fix config dependencies

* man that is it

* fix fix

* update diffs

* fix the last issue

* re-default to all

* alll the fixes

* nice

* fix properties vs setter

* fixup

* updates

* update dependencies

* make sure to install what needs to be installed

* fixup

* quick fix for now

* fix!

* fixup

* update

* update

* updates

* whitespaces

* nit

* fix

* simplify everything, and make it file agnostic (should work for image processors)

* style

* finish fixing all import issues

* fixup

* empty modeling should not be written!

* Add logic to find who depends on what

* update

* cleanup

* update

* update gemma to support positions

* some small nits

* this is the correct docstring for gemma2

* fix merging of docstrings

* update

* fixup

* update

* take doc into account

* styling

* update

* fix hidden activation

* more fixes

* final fixes!

* fixup

* fixup instruct  blip video

* update

* fix bugs

* align gemma2 with the rest as well

* updats

* revert

* update

* more reversiom

* grind

* more

* arf

* update

* order will matter

* finish del stuff

* update

* rename to modular

* fixup

* nits

* update makefile

* fixup

* update order of the checks!

* fix

* fix docstring that has a call inside

* fiix conversion check

* style

* add some initial documentation

* update

* update doc

* some fixup

* updates

* yups

* Mostly todo gimme a minut

* update

* fixup

* revert some stuff

* Review docs for the modular transformers (huggingface#33472)

Docs

* good update

* fixup

* mmm current updates lead to this code

* okay, this fixes it

* cool

* fixes

* update

* nit

* updates

* nits

* fix doc

* update

* revert bad changes

* update

* updates

* proper update

* update

* update?

* up

* update

* cool

* nits

* nits

* bon bon

* fix

* ?

* minimise changes

* update

* update

* update

* updates?

* fixed gemma2

* kind of a hack

* nits

* update

* remove `diffs` in favor of `modular`

* fix make fix copies

---------

Co-authored-by: Lysandre Debut <[email protected]>

* Fix CIs post merging modular transformers (huggingface#33681)

update

* Fixed docstring for cohere model regarding unavailability of prune_he… (huggingface#33253)

* Fixed docstring for cohere model regarding unavailability of prune_head() methods

The docstring mentions that cohere model supports prune_heads() methods. I have fixed the docstring by explicitly mentioning that it doesn't support that functionality.

* Update src/transformers/models/cohere/modeling_cohere.py

---------

Co-authored-by: Lysandre Debut <[email protected]>

* Generation tests: update imagegpt input name, remove unused functions (huggingface#33663)

* Improve Error Messaging for Flash Attention 2 on CPU (huggingface#33655)

Update flash-attn error message on CPU

Rebased to latest branch

* Gemma2: fix config initialization (`cache_implementation`) (huggingface#33684)

* Fix ByteLevel alphabet missing when Sequence pretokenizer is used (huggingface#33556)

* Fix ByteLevel alphabet missing when Sequence pretokenizer is used

* Fixed formatting with `ruff`.

* Uniformize kwargs for image-text-to-text processors (huggingface#32544)

* uniformize FUYU processor kwargs

* Uniformize instructblip processor kwargs

* Fix processor kwargs and tests Fuyu, InstructBlip, Kosmos2

* Uniformize llava_next processor

* Fix save_load test for processor with chat_template only as extra init args

* Fix import Unpack

* Fix Fuyu Processor import

* Fix FuyuProcessor import

* Fix FuyuProcessor

* Add defaults for specific kwargs kosmos2

* Fix Udop to return BatchFeature instead of BatchEncoding and uniformize kwargs

* Add tests processor Udop

* remove Copied from in processing Udop as change of input orders caused by BatchEncoding -> BatchFeature

* Fix overwrite tests kwargs processors

* Add warnings and BC for changes in processor inputs order, change docs, add BC for text_pair as arg for Udop

* Fix processing test fuyu

* remove unnecessary pad_token check in instructblip ProcessorTest

* Fix BC tests and cleanup

* FIx imports fuyu

* Uniformize Pix2Struct

* Fix wrong name for FuyuProcessorKwargs

* Fix slow tests reversed inputs align fuyu llava-next, change udop warning

* Fix wrong logging import udop

* Add check images text input order

* Fix copies

* change text pair handling when positional arg

* rebase on main, fix imports in test_processing_common

* remove optional args and udop uniformization from this PR

* fix failing tests

* remove unnecessary test, fix processing utils and test processing common

* cleanup Unpack

* cleanup

* fix conflict grounding dino

* 🚨🚨 Setting default behavior of assisted decoding (huggingface#33657)

* tests: fix pytorch tensor placement errors (huggingface#33485)

This commit fixes the following errors:
* Fix "expected all tensors to be on the same device" error
* Fix "can't convert device type tensor to numpy"

According to pytorch documentation torch.Tensor.numpy(force=False)
performs conversion only if tensor is on CPU (plus few other restrictions)
which is not the case. For our case we need force=True since we just
need a data and don't care about tensors coherency.

Fixes: huggingface#33517
See: https://pytorch.org/docs/2.4/generated/torch.Tensor.numpy.html

Signed-off-by: Dmitry Rogozhkin <[email protected]>

* bump tokenizers, fix added tokens fast (huggingface#32535)

* update based on tokenizers release

* update

* nits

* update

* revert re addition

* don't break that yet

* fmt

* revert unwanted

* update tokenizers version

* update dep table

* update

* update in conversion script as well

* some fix

* revert

* fully revert

* fix training

* remove set trace

* fixup

* update

* update

* [Pixtral] Improve docs, rename model (huggingface#33491)

* Improve docs, rename model

* Fix style

* Update repo id

* fix code quality after merge

* HFQuantizer implementation for compressed-tensors library (huggingface#31704)

* Add compressed-tensors HFQuantizer implementation

* flag serializable as False

* run

* revive lines deleted by ruff

* fixes to load+save from sparseml, edit config to quantization_config, and load back

* address satrat comment

* compressed_tensors to compressed-tensors and revert back is_serializable

* rename quant_method from sparseml to compressed-tensors

* tests

* edit tests

* clean up tests

* make style

* cleanup

* cleanup

* add test skip for when compressed tensors is not installed

* remove pydantic import + style

* delay torch import in test

* initial docs

* update main init for compressed tensors config

* make fix-copies

* docstring

* remove fill_docstring

* Apply suggestions from code review

Co-authored-by: Marc Sun <[email protected]>

* review comments

* review comments

* comments - suppress warnings on state dict load, tests, fixes

* bug-fix - remove unnecessary call to apply quant lifecycle

* run_compressed compatability

* revert changes not needed for compression

* no longer need unexpected keys fn

* unexpected keys not needed either

* Apply suggestions from code review

Co-authored-by: Marc Sun <[email protected]>

* add to_diff_dict

* update docs and expand testing

* Update _toctree.yml with compressed-tensors

* Update src/transformers/utils/quantization_config.py

Co-authored-by: Arthur <[email protected]>

* update doc

* add note about saving a loaded model

---------

Co-authored-by: George Ohashi <[email protected]>
Co-authored-by: Marc Sun <[email protected]>
Co-authored-by: Sara Adkins <[email protected]>
Co-authored-by: Sara Adkins <[email protected]>
Co-authored-by: Arthur <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Dipika <[email protected]>

* update model card for opt

* add batch size to inference table

* [slow-run] opt

* [run-slow] opt

---------

Signed-off-by: Dmitry Rogozhkin <[email protected]>
Co-authored-by: Avishai Elmakies <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: Pablo Montalvo <[email protected]>
Co-authored-by: chengchengpei <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Co-authored-by: Yoni Gozlan <[email protected]>
Co-authored-by: Joao Gante <[email protected]>
Co-authored-by: jiqing-feng <[email protected]>
Co-authored-by: Aarni Koskela <[email protected]>
Co-authored-by: Titus von Koeller <[email protected]>
Co-authored-by: Marc Sun <[email protected]>
Co-authored-by: Arthur <[email protected]>
Co-authored-by: Tibor Reiss <[email protected]>
Co-authored-by: Matt <[email protected]>
Co-authored-by: Lysandre Debut <[email protected]>
Co-authored-by: Muhammad Naufil <[email protected]>
Co-authored-by: sizhky <[email protected]>
Co-authored-by: Umar Butler <[email protected]>
Co-authored-by: Jonathan Mamou <[email protected]>
Co-authored-by: Dmitry Rogozhkin <[email protected]>
Co-authored-by: NielsRogge <[email protected]>
Co-authored-by: Arthur Zucker <[email protected]>
Co-authored-by: Benjamin Fineran <[email protected]>
Co-authored-by: George Ohashi <[email protected]>
Co-authored-by: Sara Adkins <[email protected]>
Co-authored-by: Sara Adkins <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Dipika <[email protected]>