Thank you for your interest in contributing to TileGym! This document explains the main ways you can help and what we expect from contributions.
- Report issues
  - Use the issue tracker to report bugs, request features, or suggest improvements.
  - Include clear steps to reproduce, expected vs. actual behavior, and environment details when possible.
- Contribute code
  - See Code contributions for the end-to-end workflow and expectations.
  - See Contributing kernels for kernel work (new kernels vs. optimizing existing kernels).
Before starting non-trivial work, please check for existing issues and consider opening a discussion/issue so we can align on scope and design.
If you plan to submit code changes (new features, bug fixes, refactors):
- Review the Roadmap
  - Before contributing, please review `ROADMAP.md` to understand:
    - Current operator support status (what's available, in progress, or planned)
    - Contribution opportunities and priority areas
    - Which kernels need help (marked as "🙋 Help Wanted")
  - This helps ensure your contribution aligns with project priorities and avoids duplicate work.
- Read the project README
  - Review the project-level `README.md` for build, install, and basic usage instructions.
- Pick or propose an issue
  - Look for existing issues that match what you want to do, or create a new issue describing your proposal.
  - Comment on the issue to indicate you are working on it.
- Discuss significant changes first
  - For larger features or intrusive refactors, outline your approach in the issue so maintainers can provide feedback early.
- Implement the change
  - Follow the existing coding style and patterns in the affected modules.
  - Add or update tests to cover new behavior.
  - If you are contributing kernel code, follow Contributing kernels.
- Format your code
  - Run `./format.sh` from the repository root to ensure your changes pass CI format checks.
- Open a pull request
  - Keep PRs focused on a single logical change.
  - Describe what the PR does, how you tested it, and any potential user-facing impact.
- Respond to review
  - Address comments, push updates, and keep the discussion on the PR/issue.
There are two common situations when contributing kernel code:
If you are adding a new kernel (new @ct.kernel / new op implementation) that is not yet validated by the core team, it should go through the experimental-kernel flow.
New cuTile kernel contributions should first be placed in the experimental/ directories. Once the TileGym team has fully verified functional correctness and performance, kernels will be promoted from experimental/ into the main source tree.
We provide an `adding-cutile-kernel` skill that AI agents can use to add new kernels to this repo.
src/tilegym/ops/cutile/experimental/<kernel_name>.py # kernel implementation
tests/ops/experimental/test_<kernel_name>.py # correctness tests
tests/benchmark/experimental/bench_<kernel_name>.py # performance benchmarks
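The layout above can be checked mechanically before opening a PR. The following is a minimal sketch and not part of TileGym: `expected_experimental_files` and `missing_files` are hypothetical helper names, `my_kernel` is a placeholder, and only the three path patterns come from this guide.

```python
from pathlib import Path


def expected_experimental_files(kernel_name: str) -> list[Path]:
    """Return the three files this guide expects for an experimental kernel.

    Hypothetical helper for illustration; it is not part of TileGym.
    """
    return [
        Path("src/tilegym/ops/cutile/experimental") / f"{kernel_name}.py",
        Path("tests/ops/experimental") / f"test_{kernel_name}.py",
        Path("tests/benchmark/experimental") / f"bench_{kernel_name}.py",
    ]


def missing_files(repo_root: Path, kernel_name: str) -> list[Path]:
    """List the expected files that do not exist yet under repo_root."""
    return [
        path
        for path in expected_experimental_files(kernel_name)
        if not (repo_root / path).is_file()
    ]


if __name__ == "__main__":
    # "my_kernel" is a placeholder kernel name; run from the repository root.
    for path in missing_files(Path.cwd(), "my_kernel"):
        print(f"missing: {path}")
```

Run from the repository root, this prints one line per file that still needs to be created and nothing once all three exist.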
Create src/tilegym/ops/cutile/experimental/<kernel_name>.py.
- Import `experimental_kernel` via `from tilegym.experimental import experimental_kernel`.
- Apply the `@experimental_kernel` decorator before `@ct.kernel`.
- Use `@register_impl` on the op entry-point function.
- See `src/tilegym/ops/cutile/experimental/mhc.py` for an example.
The @experimental_kernel decorator marks a kernel as experimental so that a one-time message is printed the first time the kernel is launched via ct.launch. Three usage forms:
@experimental_kernel # bare — auto-generates message from kernel name (recommended)
@experimental_kernel() # empty parens — same as bare
@experimental_kernel("Custom message text") # custom message
- The bare form (recommended) auto-generates a message that includes the kernel's function name.
- The message prints once per kernel per process at the first `ct.launch` call.
- The decorator must be placed before `@ct.kernel`.
If your kernel introduces a new public op name (a new dispatch key that does not already exist), you need to add it to the unified API layer in src/tilegym/ops/ops.py:
- Add a new function decorated with `@dispatch("<op_name>")`.
- Keep the public API signature stable and aligned with your implementation.
- Ensure the dispatch key string matches exactly: `@dispatch("<op_name>")` in `ops.py` must match `@register_impl("<op_name>", backend=...)` in your backend implementation.
If you are providing a new backend implementation for an op that already exists in ops.py, you usually do not need to modify ops.py.
In src/tilegym/ops/cutile/__init__.py:
- Add an import for your module (for example `from .experimental import <module>`) so the module is loaded and your `@register_impl(...)` registration runs.
- If you want functions to be directly accessible from `tilegym.ops.cutile`, add `from .experimental.<module> import <function>`.
- Add any directly-exported function names to `__all__`.
Create tests/ops/experimental/test_<kernel_name>.py.
- Inherit from `common.PyTestCase`.
- Implement a `reference()` method (or similar) that computes the expected result using PyTorch.
- Use `@pytest.mark.parametrize` for shape/dtype coverage and `self.assertCorrectness()` for validation.
- Add cases that cover typical shapes, edge cases, and mixed-precision scenarios where relevant.
- Make sure tests pass locally with `pytest` before opening the PR.
- See `tests/ops/README.md` for full testing guidelines.
Create tests/benchmark/experimental/bench_<kernel_name>.py.
Benchmarks in tests/benchmark/experimental/ are auto-discovered by run_all.sh and run_all_json.py — no extra registration is needed.
If you are optimizing or updating an existing kernel (for performance, correctness, maintainability, readability, portability, etc.) for an op that already exists, you typically only need to update the existing implementation file(s) and tests/benchmarks.
- No `@experimental_kernel` marker needed: you do not need to add `@experimental_kernel` when you are improving an already-existing kernel implementation. If the kernel already carries `@experimental_kernel`, keep the existing behavior unless maintainers request otherwise.
- No `ops.py` or registration changes needed in most cases, because the op is already dispatched. This includes helper kernels that are called inside an already-registered implementation.
- Follow the existing coding style and patterns in the affected modules.
- Add or update tests and benchmarks to verify your changes if needed: correctness tests under `tests/ops/` and benchmarks under `tests/benchmark/`.
To accept your contribution, we need a signed Contributor License Agreement (CLA) on file.
- Locate the CLA at `LICENSES/CLA.md` in this repository.
- Fill it out and sign.
- Email the signed CLA to `TileGym@nvidia.com` with subject: `TileGym CLA Submission`.
- Wait for confirmation from the TileGym team before your PR can be merged.
- Maintainers will review your PR, suggest changes if needed, and approve once it meets project standards.
- CI and tests must pass before merge.
- Focused, well-described, and well-tested PRs are much easier and faster to review.
If anything in this document is unclear or missing, feel free to comment on issues and ask for clarifications!