
Fix [Spark unit test CI]: defer torch._dynamo.disable to avoid import-time crash in CI #3290

Open

kahyunnam wants to merge 1 commit into flashinfer-ai:main from kahyunnam:knam/fix-defer-torch-dynamo-disable

Conversation

@kahyunnam (Member) commented May 11, 2026

📌 Description

  • feat(moe): add SM120 W4A16 b12x kernels #3271 added a @torch._dynamo.disable decorator to current_cuda_stream() in cute_dsl/utils.py. This eagerly imports torch._dynamo at module load time, which triggers getpass.getuser() during cache-dir initialization. That call crashes in CI containers running as unmapped UIDs (e.g. Spark runners started with -u $(id -u):$(id -g), which maps to UID 996, a UID with no /etc/passwd entry).
  • This PR replaces the eager decorator with a self-replacing lazy wrapper that defers torch._dynamo.disable to the first call of current_cuda_stream(), with zero overhead on subsequent calls.
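The self-replacing lazy-wrapper pattern can be sketched in isolation. This is a generic illustration, not the actual FlashInfer code: `heavy_decorator` stands in for `torch._dynamo.disable`, and the `calls` counter is only there to make the deferral observable.

```python
# Generic sketch of the self-replacing lazy-wrapper pattern described above.
# `heavy_decorator` stands in for torch._dynamo.disable; applying it eagerly
# at import time is what triggered the crash this PR avoids.

calls = {"decorated": 0}

def heavy_decorator(fn):
    # Stand-in for torch._dynamo.disable: imagine this has expensive or
    # fragile import-time side effects (e.g. cache-dir setup via
    # getpass.getuser()).
    calls["decorated"] += 1
    def wrapped(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapped

def _current_cuda_stream_impl():
    # Placeholder for the real CUDA stream retrieval logic.
    return "stream"

def current_cuda_stream():
    # First call: build the decorated function and rebind the module-level
    # name to it, so every later call goes straight to the wrapped function
    # with no extra indirection.
    global current_cuda_stream
    current_cuda_stream = heavy_decorator(_current_cuda_stream_impl)
    return current_cuda_stream()

assert calls["decorated"] == 0   # nothing happened at "import time"
assert current_cuda_stream() == "stream"
assert calls["decorated"] == 1   # decorator applied exactly once
current_cuda_stream()
assert calls["decorated"] == 1   # and never again
```

Because the module-level name is rebound on first use, the "zero overhead on subsequent calls" claim holds: later callers invoke the decorated function directly, with no flag check or extra indirection.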

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • Performance
    • Improved module import performance by deferring CUDA stream initialization until first use, reducing startup overhead.

Review Change Stack

… in CI

The @torch._dynamo.disable decorator on current_cuda_stream() triggered
torch._dynamo import at module load time, which initializes
torch._inductor's cache directory via getpass.getuser(). This fails in
CI containers running with -u $(id -u):$(id -g) when the UID has no
/etc/passwd entry (KeyError: 'getpwuid(): uid not found: 996').

Use a self-replacing lazy wrapper so torch._dynamo.disable is applied on
first call rather than at import time.

Co-authored-by: Cursor <cursoragent@cursor.com>
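The failure mode in the commit message can be illustrated with a small sketch. Note this defensive fallback is not what the PR implements (the PR avoids triggering the lookup at import time altogether); `safe_username` is a hypothetical helper shown only to demonstrate the behavior.

```python
# Sketch of the underlying failure mode. getpass.getuser() checks the
# LOGNAME/USER/LNAME/USERNAME environment variables, then falls back to
# pwd.getpwuid(os.getuid()); for a UID with no /etc/passwd entry that
# lookup raises (KeyError on older Pythons, OSError on 3.13+) -- the
# crash seen in the Spark CI containers.
import getpass
import os

def safe_username():
    # Hypothetical defensive helper (not part of this PR): fall back to
    # the numeric UID when the user database has no matching entry.
    try:
        return getpass.getuser()
    except (KeyError, OSError):
        return str(os.getuid())

print(safe_username())
```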
@coderabbitai Bot (Contributor) commented May 11, 2026

📝 Walkthrough

Optimizes module initialization by deferring torch._dynamo.disable wrapper creation until first call to current_cuda_stream(), reducing import-time overhead while preserving function behavior.

Changes

Lazy torch._dynamo Initialization

Deferred Wrapper Creation — flashinfer/cute_dsl/utils.py:
current_cuda_stream() is refactored to defer torch._dynamo.disable application. A private _current_cuda_stream_impl() holds the CUDA stream retrieval logic. The public current_cuda_stream() wraps itself with the decorator on first invocation, then calls the wrapped function, eliminating import-time torch._dynamo initialization.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • kaixih
  • aleozlx
  • yzh119
  • jimmyzho

Poem

🐰 Ah, what cunning deferment!
Load the module, light as air,
Wrap it once, when first you dare,
No torch._dynamo at import,
CUDA streams, lazily caught.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 66.67%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Title check — The title clearly and specifically describes the main change: deferring torch._dynamo.disable to fix CI crashes in Spark unit tests.
  • Linked Issues check — Skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check — Skipped because no linked issues were found for this pull request.
  • Description check — The PR description comprehensively explains the problem and solution, including the root cause of the CI crash and the implementation approach.



@kahyunnam (Member, Author):

/bot run

@gemini-code-assist Bot (Contributor) left a comment:

Code Review

This pull request refactors the current_cuda_stream function in flashinfer/cute_dsl/utils.py to implement a lazy wrapper for the torch._dynamo.disable decorator. This change prevents torch._dynamo from being imported at module load time, which addresses potential failures in container environments running with unmapped UIDs. I have no feedback to provide as no review comments were submitted.

@flashinfer-bot (Collaborator):

GitLab MR !661 has been created, and the CI pipeline #50966858 is currently running. I'll report back once the pipeline job completes.

@nv-yunzheq (Collaborator) left a comment:

Approve, please merge it after CI comes clean

@kahyunnam added the v0.6.11 release blocker label and removed the v0.6.12 label on May 11, 2026.

Labels

  • arch: DGX Spark
  • v0.6.11 release blocker

Projects

None yet

3 participants