
build(jit-cache): split flashinfer-jit-cache wheels by SM family #3265

Open
dierksen wants to merge 9 commits into flashinfer-ai:main from dierksen:worktree-wheel-size

Conversation


@dierksen dierksen commented May 7, 2026

The cu130 flashinfer-jit-cache wheel grew to 2.0 GB and started failing to upload as a GitHub Release asset (per-asset 2 GiB limit; see #3257). Each new SM target appended ~150-200 MB compressed to every wheel, and cu130 carries 8 (sm75/80/89/90a/100a/103a/110a/120f).

Split each (CUDA, CPU-arch) wheel into three by GPU SM family:

  • sm9x - Ampere/Ada/Hopper (<= sm90a)
  • sm10x - Datacenter Blackwell (sm100a/103a/110a)
  • sm12x - Consumer Blackwell (sm120f, future sm121a)

Same package name everywhere; the family is encoded in the PEP 440 local version, so wheels resolve as e.g. 'flashinfer-jit-cache==0.6.11+cu130.sm10x'. A plain 'pip install flashinfer-jit-cache' still works once the matching pin is supplied.

Driven by a new 'flashinfer install-jit-cache-wheel' subcommand that detects FlashInfer version, CUDA version, and GPU compute capability (via torch.cuda.get_device_capability) and runs the matching pip install. Honors --cuda-version, --sm-family, --nightly, --dry-run. Modeled on the CLI scaffolding from #3142 with the family dimension added.
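For illustration, a minimal sketch of that detection step, with the family boundaries taken from the bullet list above (function names and predicates here are hypothetical, not the PR's exact code):

```python
import torch

def sm_family_for(major: int) -> str:
    # Family boundaries follow the bullet list above; purely illustrative.
    if major <= 9:
        return "sm9x"   # <= sm90a: Turing/Ampere/Ada/Hopper
    if major in (10, 11):
        return "sm10x"  # datacenter Blackwell
    if major == 12:
        return "sm12x"  # consumer Blackwell
    raise RuntimeError(f"unsupported compute capability major {major}")

def detect_sm_family() -> str:
    major, _minor = torch.cuda.get_device_capability()  # e.g. (12, 1) on GB10
    return sm_family_for(major)
```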

Build side: the 'FLASHINFER_JIT_CACHE_SM_FAMILY' env var, when set, filters 'FLASHINFER_CUDA_ARCH_LIST' to the family's archs and appends '.<family>' to the local-version suffix. Release / nightly workflows gain an 'sm_family' matrix dimension; the upload-to-release loop iterates over all three families. The wheel-index regex accepts the new local-version shape and remains compatible with the legacy '+cuXY' format.
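As a sketch of the family filter (the walkthrough below attributes this to filter_arch_list_for_sm_family in build_utils.py; the parsing rules and 8.0-base handling here are reconstructed from the testing notes, so treat the details as assumptions):

```python
FAMILY_MAJORS = {"sm9x": {7, 8, 9}, "sm10x": {10, 11}, "sm12x": {12}}

def filter_arch_list(arch_list: str, family: str) -> str:
    kept = []
    for entry in arch_list.split():
        # Entries look like '9.0a', '12.0f', or integer-style '90' / '120'.
        numeric = entry.rstrip("af")
        major = int(float(numeric)) if "." in numeric else int(numeric) // 10
        if major in FAMILY_MAJORS[family]:
            kept.append(entry)
    # Per the review follow-up below, Blackwell families also keep 8.0 as a
    # base arch when a native arch for the family survived the filter.
    if family in ("sm10x", "sm12x") and kept and "8.0" in arch_list.split():
        kept.insert(0, "8.0")
    if not kept:
        raise ValueError(f"no archs in {arch_list!r} match family {family}")
    return " ".join(kept)
```

For the CUDA 13.0 release list this reproduces the outputs noted in the testing section below, e.g. filter_arch_list("7.5 8.0 8.9 9.0a 10.0a 10.3a 12.0f", "sm12x") == "8.0 12.0f".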

Closes #3257

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Testing Done

Local DGX Spark validation:

  • Confirmed host is aarch64 with NVIDIA GB10, compute capability 12.1; PyTorch sees CUDA 13.0 and reports the device as (12, 1).
  • Ran flashinfer install-jit-cache-wheel --dry-run from the source checkout. It now resolves from version.txt when package metadata is 0.0.0+unknown, detects CUDA 13.0, selects sm12x, and prints flashinfer-jit-cache==0.6.11+cu130.sm12x.
  • Verified uv-style venv behavior: this environment has no python -m pip, so the CLI now falls back to uv pip install --python ....
  • Ran the live install path against https://flashinfer.ai/whl/cu130; it reached the resolver cleanly and failed only because 0.6.11+cu130.sm12x is not yet published.
  • Downloaded the PR-built jit-cache-cu129-aarch64-sm12x artifact from Release run 25528598737, served it through a local simple index, installed it with explicit --cuda-version cu129 --sm-family sm12x --index-url ..., imported flashinfer_jit_cache, verified FLASHINFER_AOT_DIR points at the installed package cache, then uninstalled it because the local machine is CUDA 13.0.
  • Checked the Release run artifacts: the matching jit-cache-cu130-aarch64-sm12x artifact is absent because that job failed while downloading build dependencies (IncompleteRead for nvidia_cublas), not because of the CLI install path.

Merge/conflict validation:

  • Merged current upstream/main and resolved conflicts in pr-test.yml, release.yml, and nightly-release.yml, preserving both FLASHINFER_JIT_CACHE_SM_FAMILY forwarding and upstream sccache/NVCC env forwarding.
  • Parsed the touched workflow YAML files with yaml.safe_load.
  • Ran git diff --cached --check.
  • Ran python -m py_compile flashinfer/__main__.py flashinfer/jit/env.py tests/cli/test_cli_cmds.py.
  • Ran python -m pytest tests/cli/test_cli_cmds.py -q (18 passed).

Review-comment follow-up:

  • Reviewed the unresolved Gemini and CodeRabbit comments. Addressed Gemini's arch-list parsing case by accepting integer-style entries like 90, 100, and 120 in the SM-family filter, and moved the duplicated SM-family helpers into build_utils.py for reuse by both the CLI and jit-cache build backend.
  • Addressed CodeRabbit's --nightly concern by rejecting nightly installs when the resolved FlashInfer version is not a dev release; dev versions still exact-pin the matching SM-family wheel, e.g. flashinfer-jit-cache==0.6.11.dev20260508+cu130.sm12x. A sketch of the guard follows this list.
  • Ran python -m py_compile build_utils.py flashinfer/__main__.py flashinfer-jit-cache/build_backend.py tests/cli/test_cli_cmds.py.
  • Ran python -m pytest tests/cli/test_cli_cmds.py -q (21 passed, with the expected PyTorch GB10 capability warning from this host's torch build).
  • Ran git diff --check.
  • Ran release CLI dry-run: flashinfer install-jit-cache-wheel --cuda-version cu130 --sm-family sm12x --dry-run, which resolves flashinfer-jit-cache==0.6.11+cu130.sm12x and the uv pip install --python ... command.
  • Ran nightly CLI dry-runs for both paths: stable 0.6.11 now fails early with the new explanatory error, while explicit 0.6.11.dev20260508 resolves flashinfer-jit-cache==0.6.11.dev20260508+cu130.sm12x against https://flashinfer.ai/whl/nightly/cu130 with --pre.
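A sketch of that nightly guard, assuming packaging is available and using click.ClickException as the CLI already does elsewhere (the exact error text is illustrative):

```python
import click
from packaging.version import Version

def ensure_nightly_is_dev(resolved_version: str) -> None:
    # --nightly only makes sense when the resolved version is a dev release.
    if not Version(resolved_version).is_devrelease:
        raise click.ClickException(
            f"--nightly given but resolved version {resolved_version} is not "
            "a dev release; install a nightly flashinfer build or drop --nightly"
        )
```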

Human feedback follow-up:

  • Updated flashinfer install-jit-cache-wheel autodetection to inspect every visible CUDA device instead of only device 0 (see the sketch after this list). It selects a wheel only when the visible GPUs are covered by one jit-cache SM-family wheel, and otherwise fails with guidance to pass --sm-family or build from source with an explicit FLASHINFER_CUDA_ARCH_LIST.
  • Treated Blackwell-family wheels as sm80 base arch plus native Blackwell archs. The build-side family filter now keeps/adds 8.0 for sm10x and sm12x only when a native arch for that family is present.
  • Kept sm12x default arch lists on 12.0f; the family-specific sm120f target covers DGX Spark / GB10 (sm121) without adding an exact 12.1a target by default.
  • Added installed-wheel compatibility validation for flashinfer-jit-cache local-version SM suffixes. On CUDA hosts, a wrong-family installed wheel now fails fast; on this DGX Spark, 0.6.11+cu130.sm12x validates and 0.6.11+cu130.sm9x fails with an expected-family error.
  • Reused the shared Python family filter in the AOT build/import test script instead of maintaining a second shell implementation.
  • Ran python -m py_compile build_utils.py flashinfer/__main__.py flashinfer/jit/env.py flashinfer-jit-cache/build_backend.py tests/cli/test_cli_cmds.py.
  • Ran python -m pytest tests/cli/test_cli_cmds.py -q (27 passed, with the expected PyTorch GB10 capability warning from this host's torch build).
  • Ran uvx ruff check ... and uvx ruff format --check ... over the touched Python files.
  • Ran git diff --check, bash -n scripts/task_test_jit_cache_package_build_import.sh, and parsed the touched workflow YAML files with yaml.safe_load.
  • Verified the family filter outputs sm10x: 8.0 10.0a 10.3a 11.0a and sm12x: 8.0 12.0f for the CUDA 13.0 release arch list.
  • Re-ran the DGX Spark CLI dry-run; it still detects sm12x and resolves flashinfer-jit-cache==0.6.11+cu130.sm12x using the uv pip install --python ... fallback.
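A sketch of the all-visible-devices check from the first item above, reusing sm_family_for_capability, which the walkthrough later attributes to build_utils.py (its import path and signature here are assumptions):

```python
import click
import torch

from build_utils import sm_family_for_capability  # assumed import path

def select_family_for_all_devices() -> str:
    families = {
        sm_family_for_capability(*torch.cuda.get_device_capability(i))
        for i in range(torch.cuda.device_count())
    }
    if len(families) != 1:
        raise click.ClickException(
            f"visible GPUs span SM families {sorted(families)}; pass "
            "--sm-family explicitly or build from source with an explicit "
            "FLASHINFER_CUDA_ARCH_LIST"
        )
    return families.pop()
```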

SM121 target cleanup:

  • Removed the temporary 12.1a additions from release/nightly/default jit-cache arch lists and docs; sm12x now defaults to 8.0 12.0f.
  • Kept explicit 12.1a support in the parser/filter if a user supplies it manually, but release artifacts no longer build it by default.
  • Re-ran python -m pytest tests/cli/test_cli_cmds.py -q (27 passed), uvx ruff check tests/cli/test_cli_cmds.py, uvx ruff format --check tests/cli/test_cli_cmds.py, bash -n scripts/task_test_jit_cache_package_build_import.sh, git diff --check, workflow YAML parsing, and the DGX Spark CLI dry-run.

SM110 architecture split:

  • Limited 11.0a / sm110 jit-cache build coverage to CUDA 13.0 aarch64 release, nightly, and PR AOT build/import arch lists. CUDA 13.0 x86_64 lists now omit 11.0a.
  • Updated source-build docs to omit 11.0a from the generic x86-oriented examples and call out adding it for Jetson AGX Thor / T5000 aarch64 targets.
  • Re-ran workflow YAML parsing, bash -n scripts/task_test_jit_cache_package_build_import.sh, git diff --check, uvx ruff check tests/cli/test_cli_cmds.py, and python -m pytest tests/cli/test_cli_cmds.py -q (27 passed).
  • Verified the generated CUDA 13 arch lists: x86_64 -> 7.5 8.0 8.9 9.0a 10.0a 10.3a 12.0f; aarch64 -> 7.5 8.0 8.9 9.0a 10.0a 10.3a 11.0a 12.0f.

Summary by CodeRabbit

  • New Features

    • Added a CLI command to detect CUDA/SM family and install per‑SM‑family JIT‑cache wheels (nightly, dry‑run, and explicit overrides supported).
  • Documentation

    • Updated README and installation docs with the new CLI, per‑SM‑family wheel scheme, offline init, source‑build flags, and nightly install guidance.
  • Tests / CI

    • CI matrices, workflows, and tests updated to build, name, upload, and consume JIT‑cache artifacts per (CUDA, arch, SM‑family).
  • Chores

    • Build/index tooling and version handling improved to produce and recognize SM‑family segmented wheels.


coderabbitai Bot commented May 7, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds SM-family-aware JIT-cache wheel distribution: utilities to map/filter CUDA arches by SM family, build backend and metadata changes, a new install CLI with autodetection, CI matrix/artifact updates, wheel-index parsing, docs updates, and corresponding tests.

Changes

SM Family JIT Cache Wheels

  • SM Family Utilities (build_utils.py): adds SM_FAMILY_ORDER, sm_family_for_capability, parse_cuda_arch_entry, and filter_arch_list_for_sm_family to parse and filter CUDA arch lists by SM family.
  • Build Backend SM Filtering (flashinfer-jit-cache/build_backend.py): reads and validates FLASHINFER_JIT_CACHE_SM_FAMILY, filters FLASHINFER_CUDA_ARCH_LIST, errors if the filtered list is empty, appends the family suffix to the local build metadata, and regenerates metadata during prepare.
  • Install CLI & Detection (flashinfer/__main__.py): adds the install-jit-cache-wheel command with PyTorch-based SM-family detection, CUDA-version parsing/normalization, requirement/index construction, and install execution (supports --nightly and --dry-run).
  • Release Workflow (.github/workflows/release.yml): adds an sm_family axis to the release build matrix, prints sm_family, sets FLASHINFER_JIT_CACHE_SM_FAMILY in the container env, includes -{sm_family} in artifact names, and expands the release upload loop to iterate over sm_family.
  • Nightly Workflow (.github/workflows/nightly-release.yml): extends the nightly jit-cache matrix with sm_family, echoes and injects FLASHINFER_JIT_CACHE_SM_FAMILY, includes sm_family in artifact names, and iterates downloads/uploads per (cuda, arch, sm_family) with cleanup.
  • PR Test Matrix & Job Wiring (.github/workflows/pr-test.yml): expands the AOT Build Import test matrix with sm_family (with excludes), updates job display names, and injects FLASHINFER_JIT_CACHE_SM_FAMILY into test runs and rerun generation.
  • Test Script Arch Filtering (scripts/task_test_jit_cache_package_build_import.sh): optionally filters the computed FLASHINFER_CUDA_ARCH_LIST by FLASHINFER_JIT_CACHE_SM_FAMILY, validates values, errors if none remain, and re-exports the filtered list.
  • Wheel Index Parsing (scripts/update_whl_index.py): extends the wheel-filename regex for flashinfer_jit_cache to accept an optional SM-family qualifier (e.g. .sm12x) after the +cuXXX CUDA metadata segment; a sketch follows this list.
  • JIT Env Version Normalization (flashinfer/jit/env.py): adds _public_package_version() and uses it when comparing package versions for cubin and jit-cache directory selection.
  • Documentation Updates (README.md, docs/installation.rst): replace hardcoded CUDA-specific pip instructions with flashinfer install-jit-cache-wheel CLI guidance; document per-(CUDA, SM family) wheels, supported families, Blackwell/cu13 notes, and source/nightly build flags.
  • CLI & Utility Tests (tests/cli/test_cli_cmds.py): adds tests for install-jit-cache-wheel dry-run/nightly behavior, version resolution from version.txt, pip fallback to uv, public-version normalization, and arch-list filtering.
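As a rough sketch of the extended wheel-index matching (the actual pattern in scripts/update_whl_index.py is not shown in this thread and may differ; the filenames below are hypothetical):

```python
import re

WHEEL_RE = re.compile(
    r"flashinfer_jit_cache-(?P<version>[0-9.]+(?:\.dev[0-9]+)?)"
    r"\+(?P<cuda>cu[0-9]+)(?:\.(?P<family>sm[0-9]+x))?"  # optional new qualifier
    r"-py3-none-(?P<platform>[a-z0-9_]+)\.whl"
)

for name in (
    "flashinfer_jit_cache-0.6.11+cu130.sm12x-py3-none-manylinux_2_28_x86_64.whl",
    "flashinfer_jit_cache-0.6.11+cu129-py3-none-manylinux_2_28_aarch64.whl",  # legacy shape
):
    m = WHEEL_RE.match(name)
    print(m.group("cuda"), m.group("family"))  # cu130 sm12x / cu129 None
```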

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Suggested labels

run-ci

Suggested reviewers

  • yzh119
  • yongwww
  • sricketts
  • cyx-6
  • aleozlx
  • nvmbreughe

"I’m a rabbit in the CI glen, I split the wheels by family then ran,
sm9x, sm10x, sm12x in a tidy band,
builds now filter, CLI finds the right hand,
docs updated, tests hop along—cheers from this fluffy dev land!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 58.06%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Title check: the title 'build(jit-cache): split flashinfer-jit-cache wheels by SM family' clearly and concisely describes the main change, splitting JIT cache wheels by GPU SM family to address size constraints.
  • Linked Issues check: skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: skipped because no linked issues were found for this pull request.
  • Description check: the PR description comprehensively covers changes, testing, validation, and review feedback with detailed technical justification.



@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a new CLI command, flashinfer install-jit-cache-wheel, to automate the installation of flashinfer-jit-cache wheels by autodetecting the CUDA version and GPU SM family. This update supports a new distribution model where wheels are split by SM family to comply with GitHub's asset size limits. Feedback from the review suggests improving the robustness of CUDA architecture parsing to handle various string formats and refactoring duplicated SM family logic into a common utility to enhance maintainability.

Mirrors the per-family split in release/nightly so PR CI actually
exercises the per-family build path. Was previously running one job
per (cuda, arch) which still built every arch; now runs three jobs
per (cuda, arch) — one per SM family — each compiling only its
family's archs.

- pr-test.yml: 'aot-build-import' and 'aot-build-import-rerun' gain
  'sm_family: [sm9x, sm10x, sm12x]'. cu126 is excluded for sm10x and
  sm12x because that toolkit only supports archs <= sm90. The rerun
  matrix builder mirrors the same exclude. FLASHINFER_JIT_CACHE_SM_FAMILY
  is forwarded into the test container via ci/bash.sh's '-e' flag.
- task_test_jit_cache_package_build_import.sh: when
  FLASHINFER_JIT_CACHE_SM_FAMILY is set, filter FLASHINFER_CUDA_ARCH_LIST
  to that family's archs before running the wheel build and
  verify_all_modules_compiled.py. The build-side filter in
  build_backend.py mutates os.environ inside its own process only, so
  doing it once in the parent shell ensures both subprocesses see the
  same arch list.

Also fix black formatting flagged by pre-commit on PR flashinfer-ai#3265:
- build_backend.py: rewrite SM_FAMILIES lambdas as named functions to
  avoid black's awkward multi-line break of '<' chained comparisons.
- __main__.py: collapse a ClickException to single-line per black's
  preference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
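For context, a sketch of the build-backend side this commit refers to; _apply_sm_family_filter is named in the review comments below, but its body here is reconstructed from the walkthrough (read and validate the env var, filter the arch list, fail if nothing remains), so treat it as an assumption:

```python
import os

from build_utils import filter_arch_list_for_sm_family  # shared helper, assumed signature

def _apply_sm_family_filter() -> None:
    # Mutates this process's environment only, which is why the test script
    # also filters once in the parent shell for its subprocesses.
    family = os.environ.get("FLASHINFER_JIT_CACHE_SM_FAMILY")
    if not family:
        return
    arch_list = os.environ.get("FLASHINFER_CUDA_ARCH_LIST", "")
    filtered = filter_arch_list_for_sm_family(arch_list, family)
    if not filtered:
        raise RuntimeError(f"no archs in {arch_list!r} match SM family {family}")
    os.environ["FLASHINFER_CUDA_ARCH_LIST"] = filtered
```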

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@flashinfer-jit-cache/build_backend.py`:
- Around line 38-94: Run the project formatter (e.g. ruff format or pre-commit
run --all-files) and commit the resulting changes so the SM_FAMILIES dict and
the multi-line print in _apply_sm_family_filter are formatted to satisfy ruff;
specifically reformat the SM_FAMILIES declaration and the print(...) call in
_apply_sm_family_filter (and any other affected lines) and push the reformatted
file so CI passes.

In `@flashinfer/__main__.py`:
- Around line 267-279: Re-run the project's formatter (ruff format) to apply the
canonical formatting for the Click exception lines in the CUDA-version parsing
block: ensure the click.ClickException(...) call around the InvalidVersion
exception handling and the earlier validation (the calls that raise
click.ClickException when normalized startswith "cu" and in the except block
that wraps InvalidVersion) are formatted according to ruff so the pre-commit
check passes; after formatting, stage and commit the changes.
- Around line 350-403: The current install_jit_cache_wheel_cmd builds an exact
pinned requirement from resolved_flashinfer_version which breaks when --nightly
points at nightly index but the installed __version__ is a stable release;
modify install_jit_cache_wheel_cmd to detect nightly and, if nightly is True and
resolved_flashinfer_version is a release (no "dev" or "+"), construct a range
requirement instead of an exact pin (e.g.
"flashinfer-jit-cache>={base},<{next_major_or_minor}") by parsing
resolved_flashinfer_version with packaging.version to compute the next version
bound, or alternatively call a new flag-aware helper (update
_build_jit_cache_requirement or add _build_jit_cache_requirement_for_nightly)
that returns the looser requirement when nightly is set; ensure the printed
requirement and pip args use this new requirement variable.
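What the bot is proposing, as a sketch (note the author ultimately took a different route, rejecting --nightly for non-dev versions, per the follow-up in the PR description):

```python
from packaging.version import Version

def nightly_requirement(resolved: str) -> str:
    # Exact-pin dev releases; loosen to a range for stable versions so the
    # nightly index can satisfy the requirement.
    v = Version(resolved)
    if v.is_devrelease or v.local:
        return f"flashinfer-jit-cache=={resolved}"
    return f"flashinfer-jit-cache>={v.base_version},<{v.major}.{v.minor + 1}"
```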

📥 Commits

Reviewing files that changed from the base of the PR and between 1aa32d0 and f5dc1e6.

📒 Files selected for processing (7)
  • .github/workflows/nightly-release.yml
  • .github/workflows/release.yml
  • README.md
  • docs/installation.rst
  • flashinfer-jit-cache/build_backend.py
  • flashinfer/__main__.py
  • scripts/update_whl_index.py


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/pr-test.yml:
- Around line 244-247: The run step invokes ci/bash.sh with an unquoted
${DOCKER_IMAGE}, which triggers SC2086 (word-splitting); update the command to
quote the variable as "$DOCKER_IMAGE" in the invocation (e.g., change ci/bash.sh
${DOCKER_IMAGE} --no-gpu ... to ci/bash.sh "$DOCKER_IMAGE" --no-gpu ...), and
make the same change in the equivalent rerun "Run Test" step that calls the same
command line so both occurrences use "$DOCKER_IMAGE".

📥 Commits

Reviewing files that changed from the base of the PR and between f5dc1e6 and c90d5b7.

📒 Files selected for processing (4)
  • .github/workflows/pr-test.yml
  • flashinfer-jit-cache/build_backend.py
  • flashinfer/__main__.py
  • scripts/task_test_jit_cache_package_build_import.sh

dierksen and others added 3 commits May 7, 2026 23:54
The Release workflow has a 'pull_request: paths: .github/workflows/
release.yml' trigger that runs the build jobs in dry-run mode whenever
release.yml changes. Its checkout used:

  ref: ${{ github.event_name == 'pull_request' && github.head_ref || inputs.tag }}

For a fork PR, github.head_ref resolves to a branch that doesn't exist
on flashinfer-ai/flashinfer (because actions/checkout defaults
'repository:' to the workflow's repo). 'git fetch' fails three times,
and the setup job dies before any actual build work runs.

The bug has been latent since flashinfer-ai#1910 (2025-10-10), where the trigger
and the buggy checkout were introduced together. It only fires on
fork-PRs that touch release.yml; PRs from branches on the main repo
work fine because the default 'repository:' already matches.

Fix all four affected checkouts to set 'repository:' explicitly to
the PR head's repo and pin to head.sha (which is also stable across
re-pushes during the run). On workflow_dispatch the existing
'inputs.tag' path is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

dierksen commented May 8, 2026

/bot run


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@flashinfer/jit/env.py`:
- Around line 126-127: The compatibility check currently uses
_public_package_version(flashinfer_version) !=
_public_package_version(flashinfer_jit_cache_version) but strips the
local-version suffix so different SM-family suffixes (e.g. .sm9x vs .sm12x) are
ignored; update the logic to, when CUDA is available, extract the sm* suffix
from flashinfer_jit_cache_version (e.g. via a small regex on the local-version
segment) and compare it to the detected device family (use your CUDA detection
helper / device-family variable); if the sm suffix is present and does not match
the detected device family, raise the same incompatibility error (or fail fast)
instead of proceeding, while falling back to the existing
_public_package_version check for non-CUDA cases.
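A sketch of the two pieces involved: the PR's _public_package_version normalization plus the suffix extraction this comment asks for (only the name _public_package_version comes from the PR; both bodies are assumptions):

```python
import re
from typing import Optional

from packaging.version import Version

def _public_package_version(version: str) -> str:
    # Drop the local segment: '0.6.11+cu130.sm12x' -> '0.6.11'.
    return Version(version).public

def jit_cache_sm_family(version: str) -> Optional[str]:
    # Pull an SM-family suffix such as 'sm12x' out of the local segment so it
    # can be compared against the detected device family.
    local = Version(version).local or ""
    match = re.search(r"\bsm\d+x\b", local)
    return match.group(0) if match else None
```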

📥 Commits

Reviewing files that changed from the base of the PR and between 67adb48 and 5c7081e.

📒 Files selected for processing (7)
  • .github/workflows/nightly-release.yml
  • .github/workflows/pr-test.yml
  • .github/workflows/release.yml
  • flashinfer/__main__.py
  • flashinfer/jit/env.py
  • scripts/task_test_jit_cache_package_build_import.sh
  • tests/cli/test_cli_cmds.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • scripts/task_test_jit_cache_package_build_import.sh
  • .github/workflows/nightly-release.yml
  • .github/workflows/pr-test.yml
  • .github/workflows/release.yml

@flashinfer-bot

GitLab MR !652 has been created, and the CI pipeline #50715933 is currently running. I'll report back once the pipeline job completes.


@kahyunnam kahyunnam left a comment


re: the 3 GPU SM families in the PR description, I think @aleozlx mentioned earlier in the thread that each device typically requires 8.0 plus its native arch -- should we add sm80 compilation to the sm10x and sm12x subwheels as well?



Development

Successfully merging this pull request may close these issues.

[Bug] GitHub Release upload fails for flashinfer-jit-cache cu130 x86_64 wheel due to 2 GiB asset limit
