
Initialize TileGym CI#4

Merged
arjkesh merged 28 commits into NVIDIA:main from arjkesh:tilegym_ci_init on Dec 12, 2025

arjkesh (Collaborator) commented Dec 10, 2025

Description

Add build/test CI, a code formatter (along with the resulting formatting changes), PR config, and a GHCR cleanup job.

Relevant files:
Most of the diff is formatting churn from adding the darker formatter. Only the files below have functional changes:

  • tests/benchmark/run_all.sh --> modify the benchmark runner script to improve reporting
  • .github/* --> home of the CI infra and associated tests
  • modeling/transformers/Dockerfile --> restructure the Dockerfile to maximize layer reuse from one build to the next

Features:

  • parallelize ops tests with xdist (added in da1f37a)
  • debug parser logic (fixed in 42c632f)
  • add junitxml reporting (added in da1f37a)
  • debug image tagging (fixed in 42c632f)
  • use darker instead of black (fixed in d653790)
  • speed up builds through caching and Dockerfile refactoring (build time reduced by 60% in da1f37a)
  • swap out main push flows for nightly (fixed in d653790)
  • add tests for infra scripts (passing in 8de6ac0)
  • add a "verified" tag after build/test completes to signal that testing succeeded on the image (will run nightly)

CI Configuration

config:
  build: true
  # valid options are "ops" and "benchmark"
  test: ["ops", "benchmark"]
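The PR does not show how CI reads this block, so the following is only a minimal sketch of one way a script could pull the two keys out of a PR description. The function name `parse_ci_config`, the defaults, and the line-by-line parsing approach are all assumptions made for illustration; the actual parser in .github may work differently.

```python
import json

# The two test suites named as valid in the PR description.
VALID_TESTS = {"ops", "benchmark"}


def parse_ci_config(pr_body: str) -> dict:
    """Extract `build` and `test` from a `config:` block in a PR description.

    Hypothetical helper: it understands only the two keys shown in this PR,
    not general YAML, and the defaults below are assumptions.
    """
    config = {"build": True, "test": sorted(VALID_TESTS)}
    in_block = False
    for raw in pr_body.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop trailing comments
        if not line.strip():
            continue  # skip blank and comment-only lines
        if not in_block:
            in_block = line.strip() == "config:"
            continue
        if not line.startswith((" ", "\t")):
            break  # a dedented line ends the block
        key, _, value = line.strip().partition(":")
        value = value.strip()
        if key == "build":
            if value not in ("true", "false"):
                raise ValueError(f"build must be true or false, got {value!r}")
            config["build"] = value == "true"
        elif key == "test":
            # An inline list such as ["ops", "benchmark"] is also valid JSON.
            tests = json.loads(value)
            if not isinstance(tests, list) or set(tests) - VALID_TESTS:
                raise ValueError(f"test must be a list drawn from {sorted(VALID_TESTS)}")
            config["test"] = tests
    return config
```

Rejecting unknown suite names up front means a typo in the PR description fails fast instead of silently skipping tests.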

Checklist

  • Documentation updated (if needed)
  • CI configuration reviewed

arjkesh (Collaborator, Author) commented Dec 10, 2025

/ok to test 6264b2c

azazhu previously approved these changes Dec 10, 2025
--gpus all \
-w /workspace/tilegym \
tilegym-transformers:latest \
pytest -s tests/ops -v -k test_op

We can try running the tests in parallel, as long as that does not cause an OOM.
Install pytest-xdist (pip install pytest-xdist) and run pytest -n job_number to distribute the tests across that many workers automatically.
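As a rough sketch of that suggestion, the helper below assembles a pytest command line that uses pytest-xdist's -n option, plus pytest's --junitxml option for the JUnit reporting this PR also adds. The helper name `build_pytest_command` and the report file name are hypothetical, and the worker count of 4 is only an example; since the concern raised is OOM, a fixed small count may be safer than -n auto.

```python
import shlex


def build_pytest_command(test_path="tests/ops", workers="auto", keyword=None, junit_xml=None):
    """Build a pytest invocation that distributes tests with pytest-xdist.

    Hypothetical helper; `workers` is passed straight to xdist's -n option.
    """
    cmd = ["pytest", "-v", "-n", str(workers), test_path]
    if keyword:
        cmd += ["-k", keyword]  # same -k filter as the command under review
    if junit_xml:
        cmd.append(f"--junitxml={junit_xml}")  # write a JUnit-style XML report
    return cmd


# Example: four workers, filtered to test_op, with an XML report.
command = build_pytest_command(workers=4, keyword="test_op", junit_xml="ops-results.xml")
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run or appended to the docker run arguments shown above.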

copy-pr-bot (bot) commented Dec 10, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

arjkesh (Collaborator, Author) commented Dec 10, 2025

/ok to test 9c9e570

arjkesh (Collaborator, Author) commented Dec 10, 2025

/ok to test 0987a19

NVIDIA deleted a comment from copy-pr-bot (bot) on Dec 10, 2025

arjkesh (Collaborator, Author) commented Dec 10, 2025

/ok to test 1a2e7ce

arjkesh (Collaborator, Author) commented Dec 10, 2025

Remaining changes

  • parallelize ops tests with xdist (added in da1f37a)
  • parallelize benchmark tests (added in da1f37a)
  • debug parser logic (fixed in 42c632f)
  • add junitxml reporting (added in da1f37a)
  • debug image tagging (fixed in 42c632f)
  • use darker instead of black (fixed in d653790)
  • speed up builds through caching and Dockerfile refactoring (build time reduced by 60% in da1f37a)
  • swap out main push flows for nightly (fixed in d653790)

arjkesh (Collaborator, Author) commented Dec 10, 2025

/ok to test cdeba44

arjkesh (Collaborator, Author) commented Dec 10, 2025

/ok to test 42c632f

arjkesh (Collaborator, Author) commented Dec 10, 2025

/ok to test da1f37a

arjkesh (Collaborator, Author) commented Dec 10, 2025

/ok to test da1f37a

copy-pr-bot (bot) commented Dec 10, 2025

/ok to test da1f37a

@arjkesh, there was an error processing your request: E2

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

arjkesh (Collaborator, Author) commented Dec 10, 2025

/ok to test 173ca7b

arjkesh (Collaborator, Author) commented Dec 10, 2025

/ok to test d653790

arjkesh (Collaborator, Author) commented Dec 11, 2025

/ok to test 00e7f0e

arjkesh (Collaborator, Author) commented Dec 11, 2025

/ok to test 5e36b90

hannahli-nv (Collaborator) commented

Could you please rename the README.md file in the .github directory or move it elsewhere? This will ensure that the README.md file in the root directory is displayed properly on the project's homepage.


env:
  # PR images go to a temp repo, main/nightly go to main repo
  IMAGE_NAME_PR: tilegym-transformers-pr
arjkesh (Collaborator, Author) commented:

Change the image names to tilegym and tilegym-pr.

arjkesh (Collaborator, Author) commented Dec 11, 2025

/ok to test d3ba035

arjkesh (Collaborator, Author) commented Dec 11, 2025

/ok to test c03370b

arjkesh (Collaborator, Author) commented Dec 11, 2025

/ok to test 11c023b

arjkesh enabled auto-merge (squash) on December 11, 2025 16:39

arjkesh (Collaborator, Author) commented Dec 12, 2025

/ok to test a2f303b

arjkesh disabled auto-merge on December 12, 2025 04:17

arjkesh (Collaborator, Author) commented Dec 12, 2025

/ok to test 0daef68

arjkesh (Collaborator, Author) commented Dec 12, 2025

/ok to test fd88ad2

arjkesh (Collaborator, Author) commented Dec 12, 2025

/ok to test a276ae1

xjmxyt (Collaborator) left a comment

Overall LGTM.

arjkesh merged commit c15b012 into NVIDIA:main on Dec 12, 2025
9 checks passed
arjkesh deleted the tilegym_ci_init branch on December 12, 2025 14:36
4 participants