Merged
73 commits
17bc171
update ort CIs
IlyasMoutawwakil Sep 14, 2024
fbaa980
fix train ci
IlyasMoutawwakil Sep 14, 2024
90aa85d
fix gpu ci
IlyasMoutawwakil Sep 14, 2024
87c9f3e
gpus all
IlyasMoutawwakil Sep 14, 2024
0c1c6bd
devel
IlyasMoutawwakil Sep 14, 2024
430260e
enable trt
IlyasMoutawwakil Sep 15, 2024
00e51c7
fix
IlyasMoutawwakil Sep 15, 2024
3fc5486
fix
IlyasMoutawwakil Sep 15, 2024
8044232
fix
IlyasMoutawwakil Sep 16, 2024
2fd4d47
test
IlyasMoutawwakil Sep 27, 2024
1f322fc
rename
IlyasMoutawwakil Sep 27, 2024
6f7c599
change instance
IlyasMoutawwakil Sep 27, 2024
806faca
test
IlyasMoutawwakil Sep 27, 2024
3eecee6
use available
IlyasMoutawwakil Sep 28, 2024
ab62319
Merge branch 'main' into enable-ort-gpu-tests
IlyasMoutawwakil Dec 10, 2024
1b7e652
Merge branch 'main' into enable-ort-gpu-tests
IlyasMoutawwakil Jan 10, 2025
cebe6bf
update
IlyasMoutawwakil Jan 10, 2025
d0f62b0
shorter labels as well
IlyasMoutawwakil Jan 10, 2025
d001b9b
add onnxruntime-traning
IlyasMoutawwakil Jan 10, 2025
d271637
Merge branch 'main' into enable-ort-gpu-tests
IlyasMoutawwakil Jan 13, 2025
a318c0a
fix onnxruntime package checking
IlyasMoutawwakil Jan 13, 2025
7597692
Merge branch 'enable-ort-gpu-tests' of https://github.com/huggingface…
IlyasMoutawwakil Jan 13, 2025
a6b3a8e
fix typo
IlyasMoutawwakil Jan 13, 2025
a5c76c4
fix typo
IlyasMoutawwakil Jan 13, 2025
745ad8d
remove torch version
IlyasMoutawwakil Jan 13, 2025
bb48c4d
fix trainer
IlyasMoutawwakil Jan 13, 2025
0518dfd
fixed trt ep by using trt docker image (the only way to make sure eve…
IlyasMoutawwakil Jan 13, 2025
9635ec4
latest trt version
IlyasMoutawwakil Jan 13, 2025
cb9cb7f
remove pkv speedup timing since never used
IlyasMoutawwakil Jan 13, 2025
eb25460
trust remote code for training datasets
IlyasMoutawwakil Jan 13, 2025
0a7a23d
remove rocm from diffusers tests
IlyasMoutawwakil Jan 13, 2025
64e9c86
move ort training tests to onnxruntime-training
IlyasMoutawwakil Jan 13, 2025
bbed6bc
fix ort training
IlyasMoutawwakil Jan 14, 2025
1334200
fix
IlyasMoutawwakil Jan 14, 2025
84bf7ee
style
IlyasMoutawwakil Jan 14, 2025
be10d26
always assert closenes and not equality
IlyasMoutawwakil Jan 14, 2025
7ba72a6
fixed perceiver
IlyasMoutawwakil Jan 14, 2025
eceba5b
fixed missing position ids when attn mask is given
IlyasMoutawwakil Jan 14, 2025
9150e05
remove num_labels from output shapes as it's not a dynamic axis
IlyasMoutawwakil Jan 14, 2025
198ce06
raise error on missing mandatory inputs
IlyasMoutawwakil Jan 14, 2025
930103f
added atol and rtol as part of the ORTModelTestMixin class
IlyasMoutawwakil Jan 14, 2025
49cfdc0
fix segformer image segmentation
IlyasMoutawwakil Jan 14, 2025
5b8efd4
style
IlyasMoutawwakil Jan 14, 2025
941484a
fix vision encoder io binding
IlyasMoutawwakil Jan 14, 2025
18e887d
hot fix io binding, remove its dependency to the order of inputs and …
IlyasMoutawwakil Jan 15, 2025
88a7e8b
fix
IlyasMoutawwakil Jan 15, 2025
e9abe6a
typo
IlyasMoutawwakil Jan 15, 2025
c9b45ee
unify io binding api with non io binding
IlyasMoutawwakil Jan 15, 2025
aad9aaf
force evaluated shape to int
IlyasMoutawwakil Jan 15, 2025
a29706e
mark pix2struct io binding tests
IlyasMoutawwakil Jan 15, 2025
821c997
force contiguity in forward pass
IlyasMoutawwakil Jan 16, 2025
cc2e124
fixed cryptic contiguity problems
IlyasMoutawwakil Jan 16, 2025
3a2bcee
fix some
IlyasMoutawwakil Jan 16, 2025
f0ea288
fix vision2seq modeling and testing
IlyasMoutawwakil Jan 16, 2025
7e122c0
Merge branch 'main' into enable-ort-gpu-tests
IlyasMoutawwakil Jan 28, 2025
dc2361d
Update setup.py
IlyasMoutawwakil Jan 28, 2025
4eb95f1
update import utils
IlyasMoutawwakil Jan 28, 2025
7f1fc40
Update optimum/onnxruntime/modeling_ort.py
IlyasMoutawwakil Jan 28, 2025
696cc95
fix vision encoder decoder io binding
IlyasMoutawwakil Jan 28, 2025
1827450
enable bigbird and bigbirg pegasus and seperate timm slow tests to un…
IlyasMoutawwakil Jan 28, 2025
41abf7f
use bigger machine for slow tests
IlyasMoutawwakil Jan 28, 2025
6f3084a
lower atol and rtol for image classification logits
IlyasMoutawwakil Jan 28, 2025
010030e
fix
IlyasMoutawwakil Jan 28, 2025
445b291
large
IlyasMoutawwakil Jan 28, 2025
04c8904
enable more Longformer and MCTCT
IlyasMoutawwakil Jan 29, 2025
18e1844
enable commented models in export as well
IlyasMoutawwakil Jan 29, 2025
4487c74
uncomment timm slow models, big bird optimization and marian pkv comp…
IlyasMoutawwakil Jan 29, 2025
24d682e
Merge branch 'main' into enable-ort-gpu-tests
IlyasMoutawwakil Jan 29, 2025
def5fdb
Merge branch 'main' into enable-ort-gpu-tests
IlyasMoutawwakil Jan 29, 2025
458355d
fix whisper/speech_to_text test and make convolution deterministic
IlyasMoutawwakil Jan 29, 2025
881015c
pin torch for ort training
IlyasMoutawwakil Jan 29, 2025
7c8c56f
ctc and speech also uses convolution so has to be deterministic
IlyasMoutawwakil Jan 29, 2025
3a4bac9
revert vison2seq atol
IlyasMoutawwakil Jan 29, 2025
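
Several commits above move the tests from exact equality to tolerance-based comparison (`always assert closenes and not equality`, `added atol and rtol as part of the ORTModelTestMixin class`). The test code itself is not part of the diff shown here, so the following is only a minimal sketch of the idea, assuming the usual `allclose` rule `|a - e| <= atol + rtol * |e|`. The helper name `assert_close` and the default tolerances are hypothetical.

```python
from typing import Sequence


def assert_close(actual: Sequence[float], expected: Sequence[float],
                 atol: float = 1e-4, rtol: float = 1e-4) -> None:
    """Raise AssertionError unless every pair satisfies |a - e| <= atol + rtol * |e|."""
    assert len(actual) == len(expected), "length mismatch"
    for i, (a, e) in enumerate(zip(actual, expected)):
        tolerance = atol + rtol * abs(e)
        assert abs(a - e) <= tolerance, f"index {i}: {a} vs {e} exceeds {tolerance}"


# Two backends rarely agree bit for bit, so exact equality is too strict...
reference = [0.1 + 0.2, 1.0]
candidate = [0.3, 1.00001]
exactly_equal = reference == candidate
# ...while a tolerance-based comparison accepts the small numerical drift.
assert_close(candidate, reference)
```

Keeping `atol` and `rtol` as parameters lets a test class loosen or tighten them per task, which is presumably why the PR attaches them to the test mixin.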
22 changes: 14 additions & 8 deletions .github/workflows/test_export_onnx_cli.yml
@@ -2,9 +2,11 @@ name: Exporters ONNX CLI / Python - Test

on:
push:
- branches: [main]
+ branches:
+ - main
  pull_request:
- branches: [main]
+ branches:
+ - main

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
@@ -19,16 +21,20 @@ jobs:
os: [ubuntu-20.04]

runs-on: ${{ matrix.os }}

steps:
- - uses: actions/checkout@v2
+ - name: Checkout repository
+ uses: actions/checkout@v4

  - name: Setup Python ${{ matrix.python-version }}
- uses: actions/setup-python@v2
+ uses: actions/setup-python@v5
  with:
  python-version: ${{ matrix.python-version }}
- - name: Install dependencies for pytorch export

+ - name: Install dependencies
  run: |
  pip install .[tests,exporters,diffusers]
- - name: Test with unittest
- working-directory: tests

+ - name: Test with pytest
  run: |
- pytest exporters/onnx/test_exporters_onnx_cli.py -n auto -m "not tensorflow_test and not timm_test" -s --durations=0
+ pytest tests/exporters/onnx/test_exporters_onnx_cli.py -n auto -m "not tensorflow_test and not timm_test" -s --durations=0
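
The `concurrency.group` expression used in this workflow, `${{ github.workflow }}-${{ github.head_ref || github.run_id }}`, falls back to the run ID whenever `head_ref` is empty. A rough Python model of how that key resolves, assuming `head_ref` is only populated for pull request events; `concurrency_group` and the run IDs are hypothetical and not part of GitHub Actions:

```python
def concurrency_group(workflow: str, head_ref: str, run_id: int) -> str:
    """Mimic `${{ github.workflow }}-${{ github.head_ref || github.run_id }}`."""
    # `||` in an Actions expression yields the first truthy operand,
    # so an empty head_ref falls through to the run ID.
    return f"{workflow}-{head_ref or run_id}"


name = "Exporters ONNX CLI / Python - Test"
# Two pushes to the same PR branch resolve to the same group.
pr_first = concurrency_group(name, "enable-ort-gpu-tests", 101)
pr_second = concurrency_group(name, "enable-ort-gpu-tests", 102)
# Pushes to main have no head_ref, so each run gets its own group.
main_first = concurrency_group(name, "", 103)
main_second = concurrency_group(name, "", 104)
```

Under this model, runs on the same PR branch share a group and can supersede one another, while runs on `main` never collide.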
12 changes: 6 additions & 6 deletions .github/workflows/test_onnxruntime.yml
@@ -1,12 +1,12 @@
- # This workflow will install Python dependencies, run tests and lint with a variety of Python versions
- # For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions
  name: ONNX Runtime / Python - Test

  on:
  push:
- branches: [main]
+ branches:
+ - main
  pull_request:
- branches: [main]
+ branches:
+ - main

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
@@ -58,10 +58,10 @@ jobs:

  - name: Test with pytest (in series)
  run: |
- pytest tests/onnxruntime -m "run_in_series" --durations=0 -vvvv -s
+ pytest tests/onnxruntime -m "run_in_series" --durations=0 -vvvv

  - name: Test with pytest (in parallel)
  run: |
- pytest tests/onnxruntime -m "not run_in_series" --durations=0 -vvvv -s -n auto
+ pytest tests/onnxruntime -m "not run_in_series" --durations=0 -vvvv -n auto
env:
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
58 changes: 41 additions & 17 deletions .github/workflows/test_onnxruntime_gpu.yml
@@ -1,30 +1,54 @@
- name: ONNX Runtime / Test GPU
+ name: ONNX Runtime GPU / Python - Test

on:
workflow_dispatch:
schedule:
- - cron: 0 1 */3 * * # at 1am every 3 days
+ - cron: 0 7 * * * # every day at 7am UTC
  pull_request:
- types: [opened, synchronize, reopened, labeled]
- # uncomment to enable on PR merge on main branch:
- #push:
- # branches:
- # - main
+ branches:
+ - main
+ types:
+ - opened
+ - labeled
+ - reopened
+ - unlabeled
+ - synchronize

+ concurrency:
+ group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
+ cancel-in-progress: true

jobs:
- do-the-job:
- if: ${{ (github.event_name == 'workflow_dispatch') || (github.event_name == 'schedule') || contains( github.event.pull_request.labels.*.name, 'gpu-test') }}
- name: Start self-hosted EC2 runner
+ build:
+ if: ${{
+ (github.event_name == 'push') ||
+ (github.event_name == 'workflow_dispatch') ||
+ contains(github.event.pull_request.labels.*.name, 'gpu') ||
+ contains(github.event.pull_request.labels.*.name, 'onnxruntime-gpu')
+ }}

runs-on:
group: aws-g6-4xlarge-plus
- env:
- AWS_REGION: us-east-1

+ container:
+ image: nvcr.io/nvidia/tensorrt:24.12-py3
+ options: --gpus all

  steps:
  - name: Checkout
- uses: actions/checkout@v2
- - name: Build image
+ uses: actions/checkout@v4

+ - name: Setup Python
+ uses: actions/setup-python@v5
+ with:
+ python-version: "3.9"

+ - name: Install dependencies
  run: |
- docker build -f tests/onnxruntime/docker/Dockerfile_onnxruntime_gpu -t onnxruntime-gpu .
- - name: Test with unittest within docker container
+ pip install --upgrade pip
+ pip install --no-cache-dir torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
+ pip install .[tests,onnxruntime-gpu,diffusers]

+ - name: Test with pytest
  run: |
- docker run --rm --gpus all -v /mnt/cache/.cache/huggingface:/root/.cache/huggingface --workdir=/workspace/optimum/tests onnxruntime-gpu:latest
+ pytest tests/onnxruntime -m "cuda_ep_test or trt_ep_test" --durations=0 -vvvv -n auto
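
The job-level `if:` in the GPU workflow above decides whether the job runs at all. As a rough model of that expression in plain Python (the function name `should_run` and the sample inputs are illustrative assumptions, not GitHub's API):

```python
from typing import Iterable

# Labels named in the workflow's `if:` expression.
RUN_LABELS = {"gpu", "onnxruntime-gpu"}


def should_run(event_name: str, pr_labels: Iterable[str] = ()) -> bool:
    """Approximate the job-level `if:` of test_onnxruntime_gpu.yml."""
    if event_name in ("push", "workflow_dispatch"):
        return True
    # `contains(labels.*.name, 'x')` matches a whole label; GitHub compares
    # case-insensitively, which the lowercasing here imitates.
    return any(label.lower() in RUN_LABELS for label in pr_labels)


manual = should_run("workflow_dispatch")
labeled_pr = should_run("pull_request", ["gpu"])
unlabeled_pr = should_run("pull_request", ["documentation"])
scheduled = should_run("schedule")
```

Under this model a `schedule` event does not satisfy the condition as written, since `schedule` is not among the listed event names and a scheduled run carries no pull request labels. Whether that is intended is not stated in the diff.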
57 changes: 37 additions & 20 deletions .github/workflows/test_onnxruntime_slow.yml
@@ -1,33 +1,50 @@
- name: ONNX Runtime slow / Python - Test
+ name: ONNX Runtime Slow / Python - Test

on:
workflow_dispatch:
schedule:
- - cron: 0 7 * * * # every day at 7am
+ - cron: 0 7 * * * # every day at 7am UTC
+ pull_request:
+ branches:
+ - main
+ types:
+ - opened
+ - labeled
+ - reopened
+ - unlabeled
+ - synchronize

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true

jobs:
build:
- strategy:
- fail-fast: false
- matrix:
- python-version: ["3.9"]
- os: [ubuntu-20.04]
+ if: ${{
+ (github.event_name == 'push') ||
+ (github.event_name == 'workflow_dispatch') ||
+ contains(github.event.pull_request.labels.*.name, 'slow') ||
+ contains(github.event.pull_request.labels.*.name, 'onnxruntime-slow')
+ }}

+ runs-on:
+ group: aws-general-8-plus

- runs-on: ${{ matrix.os }}
  steps:
- - uses: actions/checkout@v2
- - name: Setup Python ${{ matrix.python-version }}
- uses: actions/setup-python@v2
- with:
- python-version: ${{ matrix.python-version }}
- - name: Install dependencies for export
- run: |
- pip install .[tests,onnxruntime,diffusers]
- - name: Test with unittest
- working-directory: tests
- run: |
- RUN_SLOW=1 pytest onnxruntime -s -m "run_slow" --durations=0
+ - name: Checkout
+ uses: actions/checkout@v4

+ - name: Setup Python 3.9
+ uses: actions/setup-python@v5
+ with:
+ python-version: "3.9"

+ - name: Install dependencies
+ run: |
+ pip install --upgrade pip
+ pip install --no-cache-dir torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
+ pip install .[tests,onnxruntime,diffusers]

+ - name: Test with pytest
+ run: |
+ RUN_SLOW=1 pytest tests/onnxruntime -m "run_slow" --durations=0 -vvvv
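
The slow workflow sets `RUN_SLOW=1` before invoking pytest. How the test suite reads that variable is not shown in this diff; a common pattern is to parse it as a boolean flag and skip slow tests when it is unset. A minimal sketch under that assumption (`env_flag` and the accepted spellings are hypothetical):

```python
import os

# Spellings treated as "true" in this sketch; the real test suite may differ.
_TRUTHY = {"1", "true", "yes", "on", "y", "t"}


def env_flag(name: str, default: bool = False) -> bool:
    """Read an environment variable as a boolean flag."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in _TRUTHY


# Without the variable, slow tests would be skipped.
os.environ.pop("RUN_SLOW", None)
skipped_by_default = not env_flag("RUN_SLOW")
# The workflow exports RUN_SLOW=1, which turns them on.
os.environ["RUN_SLOW"] = "1"
enabled_in_ci = env_flag("RUN_SLOW")
```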
26 changes: 0 additions & 26 deletions .github/workflows/test_onnxruntime_train.yml

This file was deleted.

66 changes: 66 additions & 0 deletions .github/workflows/test_onnxruntime_training.yml
@@ -0,0 +1,66 @@
+ name: ONNX Runtime Training / Python - Test

+ on:
+ workflow_dispatch:
+ schedule:
+ - cron: 0 7 * * * # every day at 7am UTC
+ pull_request:
+ branches:
+ - main
+ types:
+ - opened
+ - labeled
+ - reopened
+ - unlabeled
+ - synchronize

+ concurrency:
+ group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
+ cancel-in-progress: true

+ jobs:
+ build:
+ if: ${{
+ (github.event_name == 'push') ||
+ (github.event_name == 'workflow_dispatch') ||
+ contains( github.event.pull_request.labels.*.name, 'training') ||
+ contains( github.event.pull_request.labels.*.name, 'onnxruntime-training')
+ }}

+ runs-on:
+ group: aws-g6-4xlarge-plus

+ container:
+ image: nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
+ options: --gpus all

+ steps:
+ - name: Checkout
+ uses: actions/checkout@v4

+ - name: Setup Python
+ uses: actions/setup-python@v5
+ with:
+ python-version: "3.9"

+ - name: Install dependencies
+ env:
+ TORCH_CUDA_ARCH_LIST: "5.0 6.0 7.0 7.5 8.0 8.6 9.0+PTX"
+ run: |
+ pip install --upgrade pip
+ pip install --no-cache-dir "torch<2.6" torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
+ pip install --no-cache-dir torch-ort onnxruntime-training && python -m torch_ort.configure
+ pip install --no-cache-dir evaluate absl-py rouge_score seqeval sacrebleu nltk scikit-learn
+ pip install .[tests,onnxruntime-training]

+ - name: Test with pytest (trainer)
+ run: |
+ RUN_SLOW=1 pytest tests/onnxruntime-training/test_trainer.py --durations=0 -vvvv
+ env:
+ HF_DATASETS_TRUST_REMOTE_CODE: 1

+ - name: Test with pytest (examples)
+ run: |
+ RUN_SLOW=1 pytest tests/onnxruntime-training/test_examples.py --durations=0 -vvvv
+ env:
+ HF_DATASETS_TRUST_REMOTE_CODE: 1
@@ -333,6 +333,7 @@ def compute_metrics(p):
token=model_args.token,
trust_remote_code=model_args.trust_remote_code,
ignore_mismatched_sizes=model_args.ignore_mismatched_sizes,
+ attn_implementation="eager",
)
image_processor = AutoImageProcessor.from_pretrained(
model_args.image_processor_name or model_args.model_name_or_path,
5 changes: 4 additions & 1 deletion examples/onnxruntime/training/language-modeling/run_clm.py
@@ -442,9 +442,12 @@ def main():
trust_remote_code=model_args.trust_remote_code,
torch_dtype=torch_dtype,
low_cpu_mem_usage=model_args.low_cpu_mem_usage,
+ attn_implementation="eager",
  )
  else:
- model = AutoModelForCausalLM.from_config(config, trust_remote_code=model_args.trust_remote_code)
+ model = AutoModelForCausalLM.from_config(
+ config, trust_remote_code=model_args.trust_remote_code, attn_implementation="eager"
+ )
n_params = sum({p.data_ptr(): p.numel() for p in model.parameters()}.values())
logger.info(f"Training new model from scratch - Total size={n_params/2**20:.2f}M params")

5 changes: 4 additions & 1 deletion examples/onnxruntime/training/language-modeling/run_mlm.py
@@ -430,10 +430,13 @@ def main():
token=model_args.token,
trust_remote_code=model_args.trust_remote_code,
low_cpu_mem_usage=model_args.low_cpu_mem_usage,
+ attn_implementation="eager",
  )
  else:
  logger.info("Training new model from scratch")
- model = AutoModelForMaskedLM.from_config(config, trust_remote_code=model_args.trust_remote_code)
+ model = AutoModelForMaskedLM.from_config(
+ config, trust_remote_code=model_args.trust_remote_code, attn_implementation="eager"
+ )

# We resize the embeddings only when necessary to avoid index errors. If you are creating a model from scratch
# on a small vocab and want a smaller embedding size, remove this test.
1 change: 1 addition & 0 deletions examples/onnxruntime/training/question-answering/run_qa.py
@@ -364,6 +364,7 @@ def main():
revision=model_args.model_revision,
token=model_args.token,
trust_remote_code=model_args.trust_remote_code,
+ attn_implementation="eager",
)

# Tokenizer check: this script requires a fast tokenizer.
@@ -458,6 +458,7 @@ def main():
revision=model_args.model_revision,
token=model_args.token,
trust_remote_code=model_args.trust_remote_code,
+ attn_implementation="eager",
)

if model.config.decoder_start_token_id is None and isinstance(tokenizer, (MBartTokenizer, MBartTokenizerFast)):
@@ -527,6 +527,7 @@ def main():
token=model_args.token,
trust_remote_code=model_args.trust_remote_code,
ignore_mismatched_sizes=model_args.ignore_mismatched_sizes,
+ attn_implementation="eager",
)
model.config.pad_token_id = model.config.eos_token_id

@@ -404,6 +404,7 @@ def main():
token=model_args.token,
trust_remote_code=model_args.trust_remote_code,
ignore_mismatched_sizes=model_args.ignore_mismatched_sizes,
+ attn_implementation="eager",
)

# Preprocessing the raw_datasets
@@ -405,6 +405,7 @@ def get_label_list(labels):
token=model_args.token,
trust_remote_code=model_args.trust_remote_code,
ignore_mismatched_sizes=model_args.ignore_mismatched_sizes,
+ attn_implementation="eager",
)

if tokenizer.pad_token is None:
@@ -408,6 +408,7 @@ def main():
revision=model_args.model_revision,
token=model_args.token,
trust_remote_code=model_args.trust_remote_code,
+ attn_implementation="eager",
)

# Set decoder_start_token_id
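
Every example script touched here gains the same `attn_implementation="eager"` keyword in its model-loading call. If one wanted to apply that in a single place, a small wrapper over the keyword arguments could look like the sketch below. `with_eager_attention` is a hypothetical helper, not part of this PR or of `transformers`, and the sample arguments are placeholders.

```python
from typing import Any, Dict


def with_eager_attention(load_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the kwargs with attn_implementation forced to "eager"."""
    merged = dict(load_kwargs)  # leave the caller's dict untouched
    merged["attn_implementation"] = "eager"
    return merged


# Placeholder arguments standing in for the values the scripts read from model_args.
base_kwargs = {"revision": "main", "trust_remote_code": False}
load_kwargs = with_eager_attention(base_kwargs)
# The result would then be passed on to the loader, e.g.
# AutoModelForCausalLM.from_pretrained(name, **load_kwargs)
```

The PR instead repeats the keyword inline in each script, which keeps every example self-contained.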