
Conversation

@avishaiElmakies
Contributor

@avishaiElmakies avishaiElmakies commented Sep 4, 2024

Adds SDPA to the OPT model.

The implementation is inspired by Gemma2 and Llama.

Part of #28005.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

I think @amyeroberts @fxmarty

some notes:

  • I did some refactoring of the model code: I created a _update_key_and_values function (that code was shared by all three attention implementations), renamed self._shape to _shape, and moved the mask logic into self._update_causal_mask. A rough sketch of the key/value helper is shown after these notes.
  • I created a test that makes sure generation with eager and SDPA attention is equivalent. The test is similar to the one in Llama.
  • I seem to fail three of the common tests: test_eager_matches_sdpa_inference_0_float16, test_eager_matches_sdpa_inference_1_bfloat16 and test_eager_matches_sdpa_inference_2_float32. I took inspiration from Gemma, which seems to skip those tests as well; should I skip them too? I think this is also related to #32086 (The implementations of LlamaAttention and LlamaSdpaAttention are not equivalent), since my code is similar to that. It also seems to affect equivalence with Flax/TF, because the default attention implementation will become SDPA.
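A minimal sketch of what such a shared key/value update helper could look like (the name comes from the notes above; the exact signature here is illustrative, not necessarily the PR's final code):

```python
from typing import Optional, Tuple

import torch


def _update_key_and_values(
    key_states: torch.Tensor,
    value_states: torch.Tensor,
    past_key_value: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
) -> Tuple[torch.Tensor, torch.Tensor]:
    """Append the new key/value states to the cached ones, if a cache is present."""
    if past_key_value is not None:
        # cached shapes: (batch, num_heads, past_seq_len, head_dim)
        key_states = torch.cat([past_key_value[0], key_states], dim=2)
        value_states = torch.cat([past_key_value[1], value_states], dim=2)
    return key_states, value_states
```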

would love some feedback!

Contributor

@amyeroberts amyeroberts left a comment


Thanks for working on adding this!

@avishaiElmakies
Contributor Author

avishaiElmakies commented Sep 10, 2024

@amyeroberts I addressed the review comments. Would love to understand what to do with the failing tests.

@amyeroberts
Contributor

@avishaiElmakies Similar to my comment for DINOv2 -- other PRs can be a good reference for how to fix this. There you will see that many of the tests force the test model to use eager attention, which will resolve the TF/Flax equivalence issues.

For the SDPA equivalence tests, I'm not sure. As the flash attention implementation follows Llama, following Llama's tests for SDPA seems reasonable. cc @ArthurZucker here, who might know more about this.
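A rough sketch of what forcing eager attention in a model tester's config can look like (the tiny sizes are placeholders, not the actual tester values):

```python
from transformers import OPTConfig, OPTModel

# Illustrative only: pin the test config to eager attention so the PT/TF/Flax
# equivalence tests all exercise the same attention code path.
config = OPTConfig(
    vocab_size=99,
    hidden_size=16,
    num_hidden_layers=2,
    num_attention_heads=4,
    ffn_dim=16,
    max_position_embeddings=64,
    attn_implementation="eager",
)
model = OPTModel(config)
print(model.config._attn_implementation)  # "eager"
```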

@avishaiElmakies
Contributor Author

avishaiElmakies commented Sep 17, 2024

Hi @amyeroberts, I looked at the failing tests.

I changed the test test_eager_matches_sdpa_generate to match the one in Llama, and fixed test_pt_tf_model_equivalence.

The tests test_xla_generate_contrastive and test_xla_generate_slow in test_modeling_tf_opt.py don't seem related to this PR; they also fail on main.

The only tests I would still love some guidance on are the test_eager_matches_sdpa_inference ones. Generation seems to work the same with both implementations, but from what I understand after looking into it, the behavior of the two implementations differs in the inference test, as stated in #32086. Fixing this might require a refactor to make the behaviors match.

Would love some guidance.

EDIT: I noticed that when running the tests with your setup, test_eager_matches_sdpa_generate fails, but when I run the test on my machine it passes.
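For reference, a standalone sketch of the kind of eager vs. SDPA generation check being discussed (not the repo's actual test code; checkpoint and prompt are arbitrary):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "facebook/opt-350m"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
inputs = tokenizer(["The quick brown fox"], return_tensors="pt")

generations = {}
for impl in ("eager", "sdpa"):
    model = AutoModelForCausalLM.from_pretrained(
        checkpoint, attn_implementation=impl, torch_dtype=torch.float32
    )
    with torch.no_grad():
        generations[impl] = model.generate(**inputs, max_new_tokens=20, do_sample=False)

# Greedy decoding should produce identical token ids for both implementations.
assert torch.equal(generations["eager"], generations["sdpa"])
```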

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@vasqu
Contributor

vasqu commented Sep 21, 2024

The code looks eerily similar to Bart's implementation. It might be more beneficial to let it "inherit" from Bart via # Copied from ... statements and adjust it to fit Bart's use of attention masks etc., instead of manually figuring things out 👀

Hence, it's also more appropriate to look into encoder-decoder models like Bart, rather than Llama, which is decoder-only, regarding the attention implementation.
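For anyone unfamiliar with the convention, a sketch of what such a statement looks like (the target class and rename pattern here are illustrative):

```python
from torch import nn


# The `# Copied from` marker lets `make fix-copies` / utils/check_copies.py keep this
# class in sync with the referenced implementation, applying the rename pattern.
# Copied from transformers.models.bart.modeling_bart.BartAttention with Bart->OPT
class OPTAttention(nn.Module):
    """Multi-headed attention (body omitted in this sketch)."""
```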

@vasqu
Contributor

vasqu commented Sep 21, 2024

Ok, #17437 seems to be the reason why the Copied from doesn't exist in the first place. Is there a way to just ignore a few lines? Removing the whole Copied from seems a bit too much to me 👀

@avishaiElmakies
Contributor Author

avishaiElmakies commented Sep 21, 2024

@vasqu I can take inspiration from Bart instead of Llama, if that helps. What do you think?

ArthurZucker and others added 2 commits September 25, 2024 16:08
HFQuantizer implementation for compressed-tensors library (huggingface#31704)
@avishaiElmakies
Contributor Author

avishaiElmakies commented Sep 25, 2024

@amyeroberts is there some code to run those benchmarks?

EDIT: Never mind, I think I found it.

@avishaiElmakies
Contributor Author

avishaiElmakies commented Sep 25, 2024

@amyeroberts I ran the benchmarks.

I used the code from #31031.

Local resources: L40S (45 GB), PyTorch 2.4.0, Debian GNU/Linux 11.

Training

| batch_size | seq_len | Time per batch (eager - s) | Time per batch (sdpa - s) | Speedup (%) | Eager peak mem (MB) | sdpa peak mem (MB) | Mem saving (%) |
|---|---|---|---|---|---|---|---|
| 1 | 128 | 0.047 | 0.037 | 26.360 | 1474.611 | 1474.32 | 0.019 |
| 1 | 256 | 0.046 | 0.037 | 24.335 | 1498.541 | 1499.49 | -0.063 |
| 1 | 512 | 0.046 | 0.037 | 24.959 | 1973.544 | 1551.35 | 27.215 |
| 1 | 1024 | 0.062 | 0.038 | 65.135 | 4867.113 | 1698.35 | 186.578 |
| 1 | 2048 | 0.230 | 0.039 | 483.933 | 15662.224 | 2715.75 | 476.718 |
| 2 | 128 | 0.045 | 0.037 | 20.455 | 1498.164 | 1499.49 | -0.089 |
| 2 | 256 | 0.046 | 0.037 | 24.027 | 1569.367 | 1551.35 | 1.161 |
| 2 | 512 | 0.045 | 0.037 | 20.965 | 3257.074 | 1698.35 | 91.778 |
| 2 | 1024 | 0.122 | 0.038 | 225.958 | 9054.405 | 2715.75 | 233.403 |
| 2 | 2048 | 0.464 | 0.067 | 593.646 | 30572.058 | 4750.55 | 543.548 |
| 4 | 128 | 0.045 | 0.037 | 21.918 | 1549.448 | 1551.35 | -0.123 |
| 4 | 256 | 0.044 | 0.038 | 18.084 | 2451.768 | 1698.35 | 44.361 |
| 4 | 512 | 0.069 | 0.037 | 84.421 | 5833.180 | 2715.75 | 114.791 |
| 4 | 1024 | 0.262 | 0.062 | 319.475 | 17427.842 | 4750.55 | 266.860 |
| 4 | 2048 | OOM | 0.062 | Eager OOM | OOM | 4750.55 | Eager OOM |
| 8 | 128 | 0.044 | 0.037 | 18.436 | 2049.115 | 1697.78 | 20.694 |
| 8 | 256 | 0.048 | 0.036 | 32.887 | 4222.567 | 2715.75 | 55.484 |
| 8 | 512 | 0.153 | 0.06 | 154.862 | 10985.391 | 4750.55 | 131.245 |
| 8 | 1024 | 0.526 | 0.122 | 330.697 | 34175.763 | 8821.18 | 287.428 |
| 8 | 2048 | OOM | 0.122 | Eager OOM | OOM | 8821.18 | Eager OOM |

Inference

| batch_size | seq_len | Per token latency eager (ms) | Per token latency SDPA (ms) | Speedup (%) | Mem eager (MB) | Mem BT (MB) | Mem saved (%) |
|---|---|---|---|---|---|---|---|
| 1 | 128 | 11.634 | 8.647 | 34.546 | 717.676 | 717.674 | 0 |
| 1 | 256 | 11.593 | 8.86 | 30.851 | 742.852 | 742.845 | 0.001 |
| 1 | 512 | 11.515 | 8.816 | 30.614 | 798.232 | 799.593 | -0.17 |
| 1 | 1024 | 11.556 | 8.915 | 29.628 | 917.265 | 895.538 | 2.426 |
| 2 | 128 | 12.724 | 11.002 | 15.659 | 762.434 | 762.431 | 0 |
| 2 | 256 | 12.704 | 11.063 | 14.83 | 816.809 | 816.733 | 0.009 |
| 2 | 512 | 12.757 | 10.947 | 16.535 | 917.383 | 918.339 | -0.104 |
| 2 | 1024 | 13.018 | 11.018 | 18.147 | 1162.65 | 1114.81 | 4.291 |
| 4 | 128 | 12.739 | 10.959 | 16.243 | 856.335 | 856.483 | -0.017 |
| 4 | 256 | 12.718 | 10.837 | 17.355 | 957.298 | 957.674 | -0.039 |
| 4 | 512 | 12.813 | 10.822 | 18.393 | 1158.44 | 1158.45 | -0.001 |
| 4 | 1024 | 13.416 | 11.06 | 21.301 | 1653.42 | 1557.19 | 6.18 |
| 8 | 128 | 12.763 | 10.891 | 17.193 | 1036.13 | 1036.51 | -0.036 |
| 8 | 256 | 12.89 | 11.104 | 16.085 | 1236.98 | 1236.87 | 0.01 |
| 8 | 512 | 13.327 | 10.939 | 21.836 | 1642.29 | 1641.78 | 0.031 |
| 8 | 1024 | 15.181 | 11.175 | 35.848 | 2634.98 | 2443.35 | 7.843 |

I updated the model card opt.md as well.
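A usage sketch along the lines of what the updated model card describes, i.e. loading OPT with SDPA requested (checkpoint and dtype here are just examples):

```python
import torch
from transformers import OPTForCausalLM

# Request PyTorch's scaled_dot_product_attention kernel; half precision mirrors
# the fp16 setting typically used when benchmarking.
model = OPTForCausalLM.from_pretrained(
    "facebook/opt-350m",
    attn_implementation="sdpa",
    torch_dtype=torch.float16,
)
```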

@vasqu
Contributor

vasqu commented Sep 25, 2024

@avishaiElmakies Can you make the [slow-run] opt commit? It's needed to start the slow runs. Great benchmarks btw!

The CI errors are not related.

@avishaiElmakies
Contributor Author

@vasqu Committed.

Yeah, the benchmarks look great. I was not expecting this much of an improvement for training.

@vasqu
Contributor

vasqu commented Sep 25, 2024

slow runs cc @amyeroberts

@avishaiElmakies
Contributor Author

avishaiElmakies commented Sep 25, 2024

@amyeroberts it seems it has skipped the tests, do you know what happened?

@amyeroberts
Contributor

@avishaiElmakies I think it's because the marker in the commit message is [slow-run] rather than [run-slow]. Could you try again with [run-slow] opt?

@avishaiElmakies
Contributor Author

@amyeroberts Sorry about that, committed with [run-slow] opt.

@vasqu
Contributor

vasqu commented Oct 1, 2024

The failures seem unrelated to me, tbh. XLA should be TF stuff, no? @amyeroberts

@amyeroberts
Contributor

@vasqu Yes, it should be TF stuff, although the TF files are touched in this PR so they should be passing too. Could you confirm if the tests are passing on main or not?

cc @Rocketknight1 @ArthurZucker

@avishaiElmakies
Contributor Author

@amyeroberts at the time those tests were failing in main as well

@Rocketknight1
Member

Confirmed that test_xla_generate_contrastive and test_xla_generate_slow are failing on main for me in TF - I think it's fine to just add skips to those tests in this PR for now, since it's not related to this PR.

@vasqu
Contributor

vasqu commented Oct 2, 2024

Passing locally on my machine with a GPU (tf_opt_logs.txt) and not with a CPU (tf_opt_logs_cpu.txt).

Seems like a CPU issue then? FYI @Rocketknight1
cc @amyeroberts

@ArthurZucker
Collaborator

Yep, we can add the wrapper require_accelerator!

@vasqu
Contributor

vasqu commented Oct 4, 2024

@ArthurZucker There's a proper fix in #33903, since in TF the GPU just silently ignores the error instead of raising it, even though we go beyond the possible positional embedding range.
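A small illustration of the TF behaviour referenced here, assuming the failure indeed comes from out-of-range embedding indices: gather raises on CPU but is silently tolerated on GPU.

```python
import tensorflow as tf

table = tf.random.normal((10, 4))  # stand-in for a positional embedding table
bad_ids = tf.constant([3, 12])     # index 12 is out of range for a table of size 10

with tf.device("/CPU:0"):
    try:
        tf.gather(table, bad_ids)
    except tf.errors.InvalidArgumentError as err:
        print("CPU raises:", err)

# On a GPU device the same gather succeeds: TensorFlow documents that a 0 is
# written to the corresponding output instead of raising an error.
```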

@avishaiElmakies
Contributor Author

@ArthurZucker @amyeroberts do I need to do anything else, or can this be merged?

@ArthurZucker ArthurZucker merged commit a265600 into huggingface:main Oct 10, 2024
@ArthurZucker
Collaborator

Sorry for the delay and thanks for the contribution! 🤗

@avishaiElmakies avishaiElmakies deleted the spda_opt branch October 10, 2024 10:08
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
* add sdpa to OPT

* chore: remove redundant whitespace in OPTDecoder class

* fixup

* bug fix

* add sdpa and attention generate test

* fixup

* Refactor OPTAttention forward method for improved readability and maintainability

* undo refactor for _shape and key,val states

* add OPT to doc, fixup didn't find it for some reason

* change order

* change default attn_implemntation in testing to eager

* [run-slow] opt

* change test_eager_matches_sdpa_generate to the one llama

* Update default attention implementation in testing common

* [run-slow] opt

* remove uneeded print

* [run-slow] opt

* refactor model testers to have attn_implementation="eager"

* [run-slow] opt

* convert test_eager_matches_sdpa_generate to opt-350M

* bug fix when creating mask for opt

* [run-slow] opt

* if layer head mask default to eager

* if head mask is not none fall to eager

* [run-slow] opt

* Update src/transformers/models/opt/modeling_opt.py

Co-authored-by: amyeroberts <[email protected]>

* Clean up Unpack imports (huggingface#33631)

clean up Unpack imports

* Fix DPT /Dinov2 sdpa regression on main (huggingface#33660)

* fallback to eager if output attentions.

* fix copies

* handle dependency errors in check_imports (huggingface#33622)

* handle dependency errors in check_imports

* change log level to warning

* add back self.max_position_embeddings = config.max_position_embeddings (huggingface#33550)

* add back self.max_position_embeddings = config.max_position_embeddings

* fix-copies

* Fix Llava conversion for LlavaQwen2ForCausalLM with Clip vision tower (huggingface#33613)

fix llavaqwen2 model conversion

* Uniformize kwargs for Udop processor and update docs (huggingface#33628)

* Add optional kwargs and uniformize udop

* cleanup Unpack

* nit Udop

* Generation: deprecate `PreTrainedModel` inheriting from `GenerationMixin`  (huggingface#33203)

* Enable BNB multi-backend support (huggingface#31098)

* enable cpu bnb path

* fix style

* fix code style

* fix 4 bit path

* Update src/transformers/utils/import_utils.py

Co-authored-by: Aarni Koskela <[email protected]>

* add multi backend refactor tests

* fix style

* tweak 4bit quantizer + fix corresponding tests

* tweak 8bit quantizer + *try* fixing corresponding tests

* fix dequant bnb 8bit

* account for Intel CPU in variability of expected outputs

* enable cpu and xpu device map

* further tweaks to account for Intel CPU

* fix autocast to work with both cpu + cuda

* fix comments

* fix comments

* switch to testing_utils.torch_device

* allow for xpu in multi-gpu tests

* fix tests 4bit for CPU NF4

* fix bug with is_torch_xpu_available needing to be called as func

* avoid issue where test reports attr err due to other failure

* fix formatting

* fix typo from resolving of merge conflict

* polish based on last PR review

Co-authored-by: Marc Sun <[email protected]>

* fix CI

* Update src/transformers/integrations/integration_utils.py

Co-authored-by: Arthur <[email protected]>

* Update src/transformers/integrations/integration_utils.py

Co-authored-by: Arthur <[email protected]>

* fix error log

* fix error msg

* add \n in error log

* make quality

* rm bnb cuda restriction in doc

* cpu model don't need dispatch

* fix doc

* fix style

* check cuda avaliable in testing

* fix tests

* Update docs/source/en/model_doc/chameleon.md

Co-authored-by: Marc Sun <[email protected]>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Aarni Koskela <[email protected]>

* Update tests/quantization/bnb/test_4bit.py

Co-authored-by: Aarni Koskela <[email protected]>

* Update tests/quantization/bnb/test_4bit.py

Co-authored-by: Aarni Koskela <[email protected]>

* fix doc

* fix check multibackends

* fix import sort

* remove check torch in bnb

* docs: update bitsandbytes references with multi-backend info

* docs: fix small mistakes in bnb paragraph

* run formatting

* reveret bnb check

* move bnb multi-backend check to import_utils

* Update src/transformers/utils/import_utils.py

Co-authored-by: Aarni Koskela <[email protected]>

* fix bnb check

* minor fix for bnb

* check lib first

* fix code style

* Revert "run formatting"

This reverts commit ac108c6.

* fix format

* give warning when bnb version is low and no cuda found]

* fix device assignment check to be multi-device capable

* address akx feedback on get_avlbl_dev fn

* revert partially, as we don't want the function that public, as docs would be too much (enforced)

---------

Co-authored-by: Aarni Koskela <[email protected]>
Co-authored-by: Titus von Koeller <[email protected]>
Co-authored-by: Marc Sun <[email protected]>
Co-authored-by: Arthur <[email protected]>

* Fix error string after refactoring into get_chat_template (huggingface#33652)

* Fix error string after refactoring into get_chat_template

* Take suggestion from CR

Co-authored-by: Matt <[email protected]>

---------

Co-authored-by: Matt <[email protected]>

* uniformize git processor (huggingface#33668)

* uniformize git processor

* update doctring

* Modular `transformers`: modularity and inheritance for new model additions (huggingface#33248)

* update exampel

* update

* push the converted diff files for testing and ci

* correct one example

* fix class attributes and docstring

* nits

* oups

* fixed config!

* update

* nitd

* class attributes are not matched against the other, this is missing

* fixed overwriting self.xxx now onto the attributes I think

* partial fix, now order with docstring

* fix docstring order?

* more fixes

* update

* fix missing docstrings!

* examples don't all work yet

* fixup

* nit

* updated

* hick

* update

* delete

* update

* update

* update

* fix

* all default

* no local import

* fix more diff

* some fix related to "safe imports"

* push fixed

* add helper!

* style

* add a check

* all by default

* add the

* update

* FINALLY!

* nit

* fix config dependencies

* man that is it

* fix fix

* update diffs

* fix the last issue

* re-default to all

* alll the fixes

* nice

* fix properties vs setter

* fixup

* updates

* update dependencies

* make sure to install what needs to be installed

* fixup

* quick fix for now

* fix!

* fixup

* update

* update

* updates

* whitespaces

* nit

* fix

* simplify everything, and make it file agnostic (should work for image processors)

* style

* finish fixing all import issues

* fixup

* empty modeling should not be written!

* Add logic to find who depends on what

* update

* cleanup

* update

* update gemma to support positions

* some small nits

* this is the correct docstring for gemma2

* fix merging of docstrings

* update

* fixup

* update

* take doc into account

* styling

* update

* fix hidden activation

* more fixes

* final fixes!

* fixup

* fixup instruct  blip video

* update

* fix bugs

* align gemma2 with the rest as well

* updats

* revert

* update

* more reversiom

* grind

* more

* arf

* update

* order will matter

* finish del stuff

* update

* rename to modular

* fixup

* nits

* update makefile

* fixup

* update order of the checks!

* fix

* fix docstring that has a call inside

* fiix conversion check

* style

* add some initial documentation

* update

* update doc

* some fixup

* updates

* yups

* Mostly todo gimme a minut

* update

* fixup

* revert some stuff

* Review docs for the modular transformers (huggingface#33472)

Docs

* good update

* fixup

* mmm current updates lead to this code

* okay, this fixes it

* cool

* fixes

* update

* nit

* updates

* nits

* fix doc

* update

* revert bad changes

* update

* updates

* proper update

* update

* update?

* up

* update

* cool

* nits

* nits

* bon bon

* fix

* ?

* minimise changes

* update

* update

* update

* updates?

* fixed gemma2

* kind of a hack

* nits

* update

* remove `diffs` in favor of `modular`

* fix make fix copies

---------

Co-authored-by: Lysandre Debut <[email protected]>

* Fix CIs post merging modular transformers (huggingface#33681)

update

* Fixed docstring for cohere model regarding unavailability of prune_he… (huggingface#33253)

* Fixed docstring for cohere model regarding unavailability of prune_head() methods

The docstring mentions that cohere model supports prune_heads() methods. I have fixed the docstring by explicitly mentioning that it doesn't support that functionality.

* Update src/transformers/models/cohere/modeling_cohere.py

---------

Co-authored-by: Lysandre Debut <[email protected]>

* Generation tests: update imagegpt input name, remove unused functions (huggingface#33663)

* Improve Error Messaging for Flash Attention 2 on CPU (huggingface#33655)

Update flash-attn error message on CPU

Rebased to latest branch

* Gemma2: fix config initialization (`cache_implementation`) (huggingface#33684)

* Fix ByteLevel alphabet missing when Sequence pretokenizer is used (huggingface#33556)

* Fix ByteLevel alphabet missing when Sequence pretokenizer is used

* Fixed formatting with `ruff`.

* Uniformize kwargs for image-text-to-text processors (huggingface#32544)

* uniformize FUYU processor kwargs

* Uniformize instructblip processor kwargs

* Fix processor kwargs and tests Fuyu, InstructBlip, Kosmos2

* Uniformize llava_next processor

* Fix save_load test for processor with chat_template only as extra init args

* Fix import Unpack

* Fix Fuyu Processor import

* Fix FuyuProcessor import

* Fix FuyuProcessor

* Add defaults for specific kwargs kosmos2

* Fix Udop to return BatchFeature instead of BatchEncoding and uniformize kwargs

* Add tests processor Udop

* remove Copied from in processing Udop as change of input orders caused by BatchEncoding -> BatchFeature

* Fix overwrite tests kwargs processors

* Add warnings and BC for changes in processor inputs order, change docs, add BC for text_pair as arg for Udop

* Fix processing test fuyu

* remove unnecessary pad_token check in instructblip ProcessorTest

* Fix BC tests and cleanup

* FIx imports fuyu

* Uniformize Pix2Struct

* Fix wrong name for FuyuProcessorKwargs

* Fix slow tests reversed inputs align fuyu llava-next, change udop warning

* Fix wrong logging import udop

* Add check images text input order

* Fix copies

* change text pair handling when positional arg

* rebase on main, fix imports in test_processing_common

* remove optional args and udop uniformization from this PR

* fix failing tests

* remove unnecessary test, fix processing utils and test processing common

* cleanup Unpack

* cleanup

* fix conflict grounding dino

* 🚨🚨 Setting default behavior of assisted decoding (huggingface#33657)

* tests: fix pytorch tensor placement errors (huggingface#33485)

This commit fixes the following errors:
* Fix "expected all tensors to be on the same device" error
* Fix "can't convert device type tensor to numpy"

According to pytorch documentation torch.Tensor.numpy(force=False)
performs conversion only if tensor is on CPU (plus few other restrictions)
which is not the case. For our case we need force=True since we just
need a data and don't care about tensors coherency.

Fixes: huggingface#33517
See: https://pytorch.org/docs/2.4/generated/torch.Tensor.numpy.html

Signed-off-by: Dmitry Rogozhkin <[email protected]>

* bump tokenizers, fix added tokens fast (huggingface#32535)

* update based on tokenizers release

* update

* nits

* update

* revert re addition

* don't break that yet

* fmt

* revert unwanted

* update tokenizers version

* update dep table

* update

* update in conversion script as well

* some fix

* revert

* fully revert

* fix training

* remove set trace

* fixup

* update

* update

* [Pixtral] Improve docs, rename model (huggingface#33491)

* Improve docs, rename model

* Fix style

* Update repo id

* fix code quality after merge

* HFQuantizer implementation for compressed-tensors library (huggingface#31704)

* Add compressed-tensors HFQuantizer implementation

* flag serializable as False

* run

* revive lines deleted by ruff

* fixes to load+save from sparseml, edit config to quantization_config, and load back

* address satrat comment

* compressed_tensors to compressed-tensors and revert back is_serializable

* rename quant_method from sparseml to compressed-tensors

* tests

* edit tests

* clean up tests

* make style

* cleanup

* cleanup

* add test skip for when compressed tensors is not installed

* remove pydantic import + style

* delay torch import in test

* initial docs

* update main init for compressed tensors config

* make fix-copies

* docstring

* remove fill_docstring

* Apply suggestions from code review

Co-authored-by: Marc Sun <[email protected]>

* review comments

* review comments

* comments - suppress warnings on state dict load, tests, fixes

* bug-fix - remove unnecessary call to apply quant lifecycle

* run_compressed compatability

* revert changes not needed for compression

* no longer need unexpected keys fn

* unexpected keys not needed either

* Apply suggestions from code review

Co-authored-by: Marc Sun <[email protected]>

* add to_diff_dict

* update docs and expand testing

* Update _toctree.yml with compressed-tensors

* Update src/transformers/utils/quantization_config.py

Co-authored-by: Arthur <[email protected]>

* update doc

* add note about saving a loaded model

---------

Co-authored-by: George Ohashi <[email protected]>
Co-authored-by: Marc Sun <[email protected]>
Co-authored-by: Sara Adkins <[email protected]>
Co-authored-by: Sara Adkins <[email protected]>
Co-authored-by: Arthur <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Dipika <[email protected]>

* update model card for opt

* add batch size to inference table

* [slow-run] opt

* [run-slow] opt

---------

Signed-off-by: Dmitry Rogozhkin <[email protected]>
Co-authored-by: Avishai Elmakies <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: Pablo Montalvo <[email protected]>
Co-authored-by: chengchengpei <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Co-authored-by: Yoni Gozlan <[email protected]>
Co-authored-by: Joao Gante <[email protected]>
Co-authored-by: jiqing-feng <[email protected]>
Co-authored-by: Aarni Koskela <[email protected]>
Co-authored-by: Titus von Koeller <[email protected]>
Co-authored-by: Marc Sun <[email protected]>
Co-authored-by: Arthur <[email protected]>
Co-authored-by: Tibor Reiss <[email protected]>
Co-authored-by: Matt <[email protected]>
Co-authored-by: Lysandre Debut <[email protected]>
Co-authored-by: Muhammad Naufil <[email protected]>
Co-authored-by: sizhky <[email protected]>
Co-authored-by: Umar Butler <[email protected]>
Co-authored-by: Jonathan Mamou <[email protected]>
Co-authored-by: Dmitry Rogozhkin <[email protected]>
Co-authored-by: NielsRogge <[email protected]>
Co-authored-by: Arthur Zucker <[email protected]>
Co-authored-by: Benjamin Fineran <[email protected]>
Co-authored-by: George Ohashi <[email protected]>
Co-authored-by: Sara Adkins <[email protected]>
Co-authored-by: Sara Adkins <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Dipika <[email protected]>