build(jit-cache): split flashinfer-jit-cache wheels by SM family #3265
dierksen wants to merge 9 commits into
Conversation
The cu130 flashinfer-jit-cache wheel grew to 2.0 GB and started failing to upload as a GitHub Release asset (per-asset 2 GiB limit; see flashinfer-ai#3257). Each new SM target appended ~150-200 MB compressed to every wheel, and cu130 carries 8 (sm75/80/89/90a/100a/103a/110a/120f).

Split each (CUDA, CPU-arch) wheel into three by GPU SM family:

- sm9x - Ampere/Ada/Hopper (<= sm90a)
- sm10x - Datacenter Blackwell (sm100a/103a/110a)
- sm12x - Consumer Blackwell (sm120f, future sm121a)

Same package name everywhere; the family is encoded in the PEP 440 local-version, so wheels resolve as e.g. 'flashinfer-jit-cache==0.6.11+cu130.sm10x'. Existing 'pip install flashinfer-jit-cache' still works once the right pin is given.

Driven by a new 'flashinfer install-jit-cache-wheel' subcommand that detects FlashInfer version, CUDA version, and GPU compute capability (via torch.cuda.get_device_capability) and runs the matching pip install. Honors --cuda-version, --sm-family, --nightly, --dry-run. Modeled on the CLI scaffolding from flashinfer-ai#3142 with the family dimension added.

Build side: the 'FLASHINFER_JIT_CACHE_SM_FAMILY' env var, when set, filters 'FLASHINFER_CUDA_ARCH_LIST' to the family's archs and appends '.<family>' to the local-version suffix. Release / nightly workflows gain an 'sm_family' matrix dimension; the upload-to-release loop iterates over all three families. The wheel-index regex accepts the new local-version shape and remains compatible with the legacy '+cuXY' format.

Closes flashinfer-ai#3257

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
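A rough sketch of the family split described above. The function names here are illustrative, not the actual build_utils.py / build_backend.py API, and the later sm80 base-arch refinement is ignored:

```python
# Illustrative only: maps a compute capability to the three wheel families
# described above and filters a FLASHINFER_CUDA_ARCH_LIST-style string down
# to one family. The real helpers may differ in names and edge-case handling.
from typing import List


def sm_family_for_capability(major: int, minor: int) -> str:
    if major <= 9:
        return "sm9x"   # Ampere / Ada / Hopper (<= sm90a)
    if major in (10, 11):
        return "sm10x"  # datacenter Blackwell (sm100a / sm103a / sm110a)
    return "sm12x"      # consumer Blackwell (sm120f, ...)


def filter_arch_list(arch_list: str, family: str) -> str:
    """Keep only the FLASHINFER_CUDA_ARCH_LIST entries belonging to `family`."""
    kept: List[str] = []
    for arch in arch_list.split():  # entries look like '9.0a', '12.0f'
        major = int(arch.split(".")[0])
        if sm_family_for_capability(major, 0) == family:
            kept.append(arch)
    return " ".join(kept)


print(filter_arch_list("7.5 8.0 8.9 9.0a 10.0a 10.3a 11.0a 12.0f", "sm10x"))
# -> "10.0a 10.3a 11.0a"
```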
📝 Walkthrough

Adds SM-family-aware JIT-cache wheel distribution: utilities to map/filter CUDA arches by SM family, build backend and metadata changes, a new install CLI with autodetection, CI matrix/artifact updates, wheel-index parsing, docs updates, and corresponding tests.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: 4 passed, 1 failed (warning)
Code Review
This pull request introduces a new CLI command, flashinfer install-jit-cache-wheel, to automate the installation of flashinfer-jit-cache wheels by autodetecting the CUDA version and GPU SM family. This update supports a new distribution model where wheels are split by SM family to comply with GitHub's asset size limits. Feedback from the review suggests improving the robustness of CUDA architecture parsing to handle various string formats and refactoring duplicated SM family logic into a common utility to enhance maintainability.
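For concreteness, "robustness of CUDA architecture parsing" might mean tolerating the separator and suffix variations that arch-list strings allow. A minimal sketch, not the parser actually used in this PR:

```python
# Hypothetical parser, assuming entries such as '9.0a', '12.0f', '8.0+PTX',
# separated by spaces, semicolons, or commas; not the implementation here.
import re
from typing import List, Tuple

_ARCH_RE = re.compile(r"^(\d+)\.(\d+)([a-z]?)(\+PTX)?$")


def parse_arch_entries(raw: str) -> List[Tuple[int, int, str]]:
    parsed = []
    for entry in filter(None, re.split(r"[;,\s]+", raw.strip())):
        m = _ARCH_RE.match(entry)
        if not m:
            raise ValueError(f"unrecognized CUDA arch entry: {entry!r}")
        parsed.append((int(m.group(1)), int(m.group(2)), m.group(3)))
    return parsed


print(parse_arch_entries("7.5 8.0+PTX; 9.0a,12.0f"))
# [(7, 5, ''), (8, 0, ''), (9, 0, 'a'), (12, 0, 'f')]
```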
Mirrors the per-family split in release/nightly so PR CI actually exercises the per-family build path. Was previously running one job per (cuda, arch) which still built every arch; now runs three jobs per (cuda, arch), one per SM family, each compiling only its family's archs.

- pr-test.yml: 'aot-build-import' and 'aot-build-import-rerun' gain 'sm_family: [sm9x, sm10x, sm12x]'. cu126 is excluded for sm10x and sm12x because that toolkit only supports archs <= sm90. The rerun matrix builder mirrors the same exclude. FLASHINFER_JIT_CACHE_SM_FAMILY is forwarded into the test container via ci/bash.sh's '-e' flag.
- task_test_jit_cache_package_build_import.sh: when FLASHINFER_JIT_CACHE_SM_FAMILY is set, filter FLASHINFER_CUDA_ARCH_LIST to that family's archs before running the wheel build and verify_all_modules_compiled.py. The build-side filter in build_backend.py mutates os.environ inside its own process only, so doing it once in the parent shell ensures both subprocesses see the same arch list.

Also fix black formatting flagged by pre-commit on PR flashinfer-ai#3265:

- build_backend.py: rewrite the SM_FAMILIES lambdas as named functions to avoid black's awkward multi-line break of '<' chained comparisons (see the sketch below).
- __main__.py: collapse a ClickException to a single line per black's preference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
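The lambda-to-named-function change mentioned in the last bullet might look roughly like this; the exact predicates and boundaries are assumptions, only the SM_FAMILIES name comes from the commit message:

```python
# Sketch only: replacing per-family lambdas (which black wraps awkwardly when
# they contain chained comparisons) with small named predicates.
def _is_sm9x(major: int) -> bool:
    return major <= 9


def _is_sm10x(major: int) -> bool:
    return 10 <= major <= 11


def _is_sm12x(major: int) -> bool:
    return major >= 12


SM_FAMILIES = {"sm9x": _is_sm9x, "sm10x": _is_sm10x, "sm12x": _is_sm12x}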
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@flashinfer-jit-cache/build_backend.py`:
- Around line 38-94: Run the project formatter (e.g. ruff format or pre-commit
run --all-files) and commit the resulting changes so the SM_FAMILIES dict and
the multi-line print in _apply_sm_family_filter are formatted to satisfy ruff;
specifically reformat the SM_FAMILIES declaration and the print(...) call in
_apply_sm_family_filter (and any other affected lines) and push the reformatted
file so CI passes.
In `@flashinfer/__main__.py`:
- Around line 267-279: Re-run the project's formatter (ruff format) to apply the
canonical formatting for the Click exception lines in the CUDA-version parsing
block: ensure the click.ClickException(...) call around the InvalidVersion
exception handling and the earlier validation (the calls that raise
click.ClickException when normalized startswith "cu" and in the except block
that wraps InvalidVersion) are formatted according to ruff so the pre-commit
check passes; after formatting, stage and commit the changes.
- Around line 350-403: The current install_jit_cache_wheel_cmd builds an exact
pinned requirement from resolved_flashinfer_version which breaks when --nightly
points at nightly index but the installed __version__ is a stable release;
modify install_jit_cache_wheel_cmd to detect nightly and, if nightly is True and
resolved_flashinfer_version is a release (no "dev" or "+"), construct a range
requirement instead of an exact pin (e.g.
"flashinfer-jit-cache>={base},<{next_major_or_minor}") by parsing
resolved_flashinfer_version with packaging.version to compute the next version
bound, or alternatively call a new flag-aware helper (update
_build_jit_cache_requirement or add _build_jit_cache_requirement_for_nightly)
that returns the looser requirement when nightly is set; ensure the printed
requirement and pip args use this new requirement variable.
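A minimal sketch of the looser nightly requirement suggested above, assuming packaging is available; the helper name is illustrative, and the PR ultimately took a different route (rejecting non-dev versions under --nightly, per the testing notes below):

```python
from packaging.version import Version


def build_jit_cache_requirement(version: str, cuda: str, family: str, nightly: bool) -> str:
    v = Version(version)
    if nightly and not v.is_devrelease:
        # Stable flashinfer build against the nightly index: pin a range
        # instead of an exact version that will never exist there.
        return f"flashinfer-jit-cache>={v.base_version},<{v.major}.{v.minor + 1}"
    # Otherwise exact-pin the matching family wheel (assumes `version` has no
    # local segment of its own).
    return f"flashinfer-jit-cache=={version}+{cuda}.{family}"


print(build_jit_cache_requirement("0.6.11", "cu130", "sm10x", nightly=False))
# flashinfer-jit-cache==0.6.11+cu130.sm10x
print(build_jit_cache_requirement("0.6.11", "cu130", "sm10x", nightly=True))
# flashinfer-jit-cache>=0.6.11,<0.7
```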
📒 Files selected for processing (7)
- .github/workflows/nightly-release.yml
- .github/workflows/release.yml
- README.md
- docs/installation.rst
- flashinfer-jit-cache/build_backend.py
- flashinfer/__main__.py
- scripts/update_whl_index.py
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/pr-test.yml:
- Around line 244-247: The run step invokes ci/bash.sh with an unquoted
${DOCKER_IMAGE}, which triggers SC2086 (word-splitting); update the command to
quote the variable as "$DOCKER_IMAGE" in the invocation (e.g., change ci/bash.sh
${DOCKER_IMAGE} --no-gpu ... to ci/bash.sh "$DOCKER_IMAGE" --no-gpu ...), and
make the same change in the equivalent rerun "Run Test" step that calls the same
command line so both occurrences use "$DOCKER_IMAGE".
📒 Files selected for processing (4)
- .github/workflows/pr-test.yml
- flashinfer-jit-cache/build_backend.py
- flashinfer/__main__.py
- scripts/task_test_jit_cache_package_build_import.sh
The Release workflow has a 'pull_request: paths: .github/workflows/
release.yml' trigger that runs the build jobs in dry-run mode whenever
release.yml changes. Its checkout used:
ref: ${{ github.event_name == 'pull_request' && github.head_ref || inputs.tag }}
For a fork PR, github.head_ref resolves to a branch that doesn't exist
on flashinfer-ai/flashinfer (because actions/checkout defaults
'repository:' to the workflow's repo). 'git fetch' fails three times,
and the setup job dies before any actual build work runs.
The bug has been latent since flashinfer-ai#1910 (2025-10-10), where the trigger
and the buggy checkout were introduced together. It only fires on
fork-PRs that touch release.yml; PRs from branches on the main repo
work fine because the default 'repository:' already matches.
Fix all four affected checkouts to set 'repository:' explicitly to
the PR head's repo and pin to head.sha (which is also stable across
re-pushes during the run). On workflow_dispatch the existing
'inputs.tag' path is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/bot run
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@flashinfer/jit/env.py`:
- Around line 126-127: The compatibility check currently uses
_public_package_version(flashinfer_version) !=
_public_package_version(flashinfer_jit_cache_version) but strips the
local-version suffix so different SM-family suffixes (e.g. .sm9x vs .sm12x) are
ignored; update the logic to, when CUDA is available, extract the sm* suffix
from flashinfer_jit_cache_version (e.g. via a small regex on the local-version
segment) and compare it to the detected device family (use your CUDA detection
helper / device-family variable); if the sm suffix is present and does not match
the detected device family, raise the same incompatibility error (or fail fast)
instead of proceeding, while falling back to the existing
_public_package_version check for non-CUDA cases.
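A minimal sketch of that suffix check, assuming the installed jit-cache version string looks like "0.6.11+cu130.sm12x"; this is not the actual flashinfer/jit/env.py code, and the detected-family helper is assumed to exist elsewhere:

```python
import re
from typing import Optional

from packaging.version import Version


def jit_cache_sm_family(jit_cache_version: str) -> Optional[str]:
    """Extract the sm-family suffix ('sm9x', 'sm10x', 'sm12x') if present."""
    local = Version(jit_cache_version).local or ""
    m = re.search(r"\bsm\d+x\b", local)
    return m.group(0) if m else None


def check_family(jit_cache_version: str, detected_family: str) -> None:
    family = jit_cache_sm_family(jit_cache_version)
    # Legacy '+cuXY' wheels carry no suffix and are skipped here.
    if family is not None and family != detected_family:
        raise RuntimeError(
            f"flashinfer-jit-cache was built for {family} but this GPU needs "
            f"{detected_family}; reinstall the matching wheel."
        )


check_family("0.6.11+cu130.sm12x", "sm12x")  # passes
```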
📒 Files selected for processing (7)
- .github/workflows/nightly-release.yml
- .github/workflows/pr-test.yml
- .github/workflows/release.yml
- flashinfer/__main__.py
- flashinfer/jit/env.py
- scripts/task_test_jit_cache_package_build_import.sh
- tests/cli/test_cli_cmds.py
🚧 Files skipped from review as they are similar to previous changes (4)
- scripts/task_test_jit_cache_package_build_import.sh
- .github/workflows/nightly-release.yml
- .github/workflows/pr-test.yml
- .github/workflows/release.yml
re: the 3 GPU SM families in the PR description, I think @aleozlx mentioned earlier in the thread that each device typically requires 8.0 plus their native arch -- should we add sm80a compilation to the sm10x and sm12x subwheels as well?
# Conflicts:
#	scripts/task_test_jit_cache_package_build_import.sh
The cu130 flashinfer-jit-cache wheel grew to 2.0 GB and started failing to upload as a GitHub Release asset (per-asset 2 GiB limit; see #3257). Each new SM target appended ~150-200 MB compressed to every wheel, and cu130 carries 8 (sm75/80/89/90a/100a/103a/110a/120f).
Split each (CUDA, CPU-arch) wheel into three by GPU SM family:

- sm9x - Ampere/Ada/Hopper (<= sm90a)
- sm10x - Datacenter Blackwell (sm100a/103a/110a)
- sm12x - Consumer Blackwell (sm120f, future sm121a)

Same package name everywhere; the family is encoded in the PEP 440 local-version, so wheels resolve as e.g. 'flashinfer-jit-cache==0.6.11+cu130.sm10x'. Existing 'pip install flashinfer-jit-cache' still works once the right pin is given.
Driven by a new 'flashinfer install-jit-cache-wheel' subcommand that detects FlashInfer version, CUDA version, and GPU compute capability (via torch.cuda.get_device_capability) and runs the matching pip install. Honors --cuda-version, --sm-family, --nightly, --dry-run. Modeled on the CLI scaffolding from #3142 with the family dimension added.
Build side: the 'FLASHINFER_JIT_CACHE_SM_FAMILY' env var, when set, filters 'FLASHINFER_CUDA_ARCH_LIST' to the family's archs and appends '.<family>' to the local-version suffix. Release / nightly workflows gain an 'sm_family' matrix dimension; the upload-to-release loop iterates over all three families. The wheel-index regex accepts the new local-version shape and remains compatible with the legacy '+cuXY' format.
Closes #3257
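For reference, the pinned install the description implies would look roughly like this. The index URL layout and flag choice are assumptions based on the testing notes below; the real CLI may also fall back to `uv pip install --python ...` when pip is unavailable:

```python
# Illustrative only: roughly the pip invocation the new subcommand builds for
# a cu130 / sm10x host.
import subprocess
import sys

requirement = "flashinfer-jit-cache==0.6.11+cu130.sm10x"
cmd = [
    sys.executable, "-m", "pip", "install",
    requirement,
    "--extra-index-url", "https://flashinfer.ai/whl/cu130",
]
print("would run:", " ".join(cmd))
# subprocess.check_call(cmd)  # uncomment to actually install
```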
📌 Description
🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
- I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- I have installed the hooks with `pre-commit install`.
- I have run the hooks before committing or used `pre-commit run --all-files` and fixed any reported issues.

🧪 Tests
- Tests have been added or updated as needed.
- All tests are passing (`unittest`, etc.).

Reviewer Notes
Testing Done
Local DGX Spark validation:
- The host is `aarch64` with an NVIDIA GB10, compute capability `12.1`; PyTorch sees CUDA `13.0` and reports the device as `(12, 1)`.
- Ran `flashinfer install-jit-cache-wheel --dry-run` from the source checkout. It now resolves from `version.txt` when package metadata is `0.0.0+unknown`, detects CUDA `13.0`, selects `sm12x`, and prints `flashinfer-jit-cache==0.6.11+cu130.sm12x`.
- This environment has no `python -m pip`, so the CLI now falls back to `uv pip install --python ...`.
- Pointed a real install at `https://flashinfer.ai/whl/cu130`; it reached the resolver cleanly and failed only because `0.6.11+cu130.sm12x` is not yet published.
- Downloaded the `jit-cache-cu129-aarch64-sm12x` artifact from Release run `25528598737`, served it through a local simple index, installed it with explicit `--cuda-version cu129 --sm-family sm12x --index-url ...`, imported `flashinfer_jit_cache`, verified `FLASHINFER_AOT_DIR` points at the installed package cache, then uninstalled it because the local machine is CUDA 13.0.
- The `jit-cache-cu130-aarch64-sm12x` artifact is absent because that job failed while downloading build dependencies (`IncompleteRead` for `nvidia_cublas`), not because of the CLI install path.

Merge/conflict validation:
- Merged `upstream/main` and resolved conflicts in `pr-test.yml`, `release.yml`, and `nightly-release.yml`, preserving both `FLASHINFER_JIT_CACHE_SM_FAMILY` forwarding and upstream sccache/NVCC env forwarding.
- Parsed the touched workflow YAML files with `yaml.safe_load`.
- Ran `git diff --cached --check`.
- Ran `python -m py_compile flashinfer/__main__.py flashinfer/jit/env.py tests/cli/test_cli_cmds.py`.
- Ran `python -m pytest tests/cli/test_cli_cmds.py -q` (18 passed).

Review-comment follow-up:
- Handled `90`, `100`, and `120` in the SM-family filter, and moved the duplicated SM-family helpers into `build_utils.py` for reuse by both the CLI and jit-cache build backend.
- Addressed the `--nightly` concern by rejecting nightly installs when the resolved FlashInfer version is not a dev release; dev versions still exact-pin the matching SM-family wheel, e.g. `flashinfer-jit-cache==0.6.11.dev20260508+cu130.sm12x`.
- Ran `python -m py_compile build_utils.py flashinfer/__main__.py flashinfer-jit-cache/build_backend.py tests/cli/test_cli_cmds.py`.
- Ran `python -m pytest tests/cli/test_cli_cmds.py -q` (21 passed, with the expected PyTorch GB10 capability warning from this host's torch build).
- Ran `git diff --check`.
- Re-ran `flashinfer install-jit-cache-wheel --cuda-version cu130 --sm-family sm12x --dry-run`, which resolves `flashinfer-jit-cache==0.6.11+cu130.sm12x` and the `uv pip install --python ...` command.
- With `--nightly`, `0.6.11` now fails early with the new explanatory error, while an explicit `0.6.11.dev20260508` resolves `flashinfer-jit-cache==0.6.11.dev20260508+cu130.sm12x` against `https://flashinfer.ai/whl/nightly/cu130` with `--pre`.

Human feedback follow-up:
- Reworked `flashinfer install-jit-cache-wheel` autodetection to inspect every visible CUDA device instead of only device 0. It selects a wheel only when the visible GPUs are covered by one jit-cache SM-family wheel, and otherwise fails with guidance to pass `--sm-family` or build from source with an explicit `FLASHINFER_CUDA_ARCH_LIST` (see the sketch after this list).
- Blackwell family wheels now carry the `sm80` base arch plus native Blackwell archs. The build-side family filter now keeps/adds `8.0` for `sm10x` and `sm12x` only when a native arch for that family is present.
- Standardized the `sm12x` default arch lists on `12.0f`; the family-specific `sm120f` target covers DGX Spark / GB10 (sm121) without adding an exact `12.1a` target by default.
- The jit-cache compatibility check now validates `flashinfer-jit-cache` local-version SM suffixes. On CUDA hosts, a wrong-family installed wheel now fails fast; on this DGX Spark, `0.6.11+cu130.sm12x` validates and `0.6.11+cu130.sm9x` fails with an expected-family error.
- Ran `python -m py_compile build_utils.py flashinfer/__main__.py flashinfer/jit/env.py flashinfer-jit-cache/build_backend.py tests/cli/test_cli_cmds.py`.
- Ran `python -m pytest tests/cli/test_cli_cmds.py -q` (27 passed, with the expected PyTorch GB10 capability warning from this host's torch build).
- Ran `uvx ruff check ...` and `uvx ruff format --check ...` over the touched Python files.
- Ran `git diff --check`, `bash -n scripts/task_test_jit_cache_package_build_import.sh`, and parsed the touched workflow YAML files with `yaml.safe_load`.
- Verified the family filter yields `sm10x: 8.0 10.0a 10.3a 11.0a` and `sm12x: 8.0 12.0f` for the CUDA 13.0 release arch list.
- The DGX Spark CLI dry-run still detects `sm12x` and resolves `flashinfer-jit-cache==0.6.11+cu130.sm12x` using the `uv pip install --python ...` fallback.
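A minimal sketch of that multi-GPU autodetection, assuming torch is importable and CUDA is available; the real CLI handles missing torch/CUDA and error wording differently:

```python
# Hypothetical sketch of the multi-device check described above: every visible
# GPU must fall into the same SM family, otherwise the caller should pass
# --sm-family explicitly.
import torch


def detect_single_sm_family() -> str:
    families = set()
    for dev in range(torch.cuda.device_count()):
        major, _minor = torch.cuda.get_device_capability(dev)
        if major <= 9:
            families.add("sm9x")
        elif major in (10, 11):
            families.add("sm10x")
        else:
            families.add("sm12x")
    if len(families) != 1:
        raise RuntimeError(
            f"Visible GPUs span SM families {sorted(families)}; pass --sm-family "
            "explicitly or build from source with an explicit FLASHINFER_CUDA_ARCH_LIST."
        )
    return families.pop()
```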
SM121 target cleanup:

- Removed the `12.1a` additions from release/nightly/default jit-cache arch lists and docs; `sm12x` now defaults to `8.0 12.0f`.
- Kept `12.1a` support in the parser/filter if a user supplies it manually, but release artifacts no longer build it by default.
- Re-ran `python -m pytest tests/cli/test_cli_cmds.py -q` (27 passed), `uvx ruff check tests/cli/test_cli_cmds.py`, `uvx ruff format --check tests/cli/test_cli_cmds.py`, `bash -n scripts/task_test_jit_cache_package_build_import.sh`, `git diff --check`, workflow YAML parsing, and the DGX Spark CLI dry-run.
SM110 architecture split:

- Moved `11.0a` / `sm110` jit-cache build coverage to the CUDA 13.0 `aarch64` release, nightly, and PR AOT build/import arch lists. CUDA 13.0 `x86_64` lists now omit `11.0a`.
- Docs drop `11.0a` from the generic x86-oriented examples and call out adding it for Jetson AGX Thor / T5000 aarch64 targets.
- Re-ran `bash -n scripts/task_test_jit_cache_package_build_import.sh`, `git diff --check`, `uvx ruff check tests/cli/test_cli_cmds.py`, and `python -m pytest tests/cli/test_cli_cmds.py -q` (27 passed).
- Resulting CUDA 13.0 arch lists: `x86_64` -> `7.5 8.0 8.9 9.0a 10.0a 10.3a 12.0f`; `aarch64` -> `7.5 8.0 8.9 9.0a 10.0a 10.3a 11.0a 12.0f`.

Summary by CodeRabbit
New Features
Documentation
Tests / CI
Chores