Use cudnn 9.23 new API to query workspace with override shape#3291
yanqinz2 wants to merge 2 commits into
📝 Walkthrough
Runner factories (BF16/FP8/MXFP8/FP4) now compute effective AutoTuner mappers internally; override-shape availability and workspace sizing are cuDNN-backend-version aware; standard and override execution helpers resize workspaces and pass explicit cudnn handles; call sites are updated to the new runner signatures.
Changes: cuDNN GEMM Runner Mapper Internalization
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Code Review
This pull request refactors the cuDNN GEMM implementation by moving the effective_m_bucket_mapper logic into runner factories and updating version checks to support cuDNN backend 9.23.0 with frontend 1.24. It also introduces a helper function for workspace size calculation. Feedback focuses on using workspace.resize_() instead of reassigning the local variable to ensure in-place updates are reflected in the caller's reference, which avoids repeated allocations and skewed performance measurements during autotuning.
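The difference between rebinding a local name and resizing in place can be sketched without PyTorch. The snippet below is only an analogy: a `bytearray` stands in for the workspace tensor, `extend` stands in for `workspace.resize_()`, and both function names are hypothetical and do not appear in the PR.

```python
def run_rebinding(workspace: bytearray, needed: int) -> None:
    """Rebinds the local name; the caller's buffer is left unchanged."""
    if len(workspace) < needed:
        workspace = bytearray(needed)  # new object, invisible to the caller


def run_in_place(workspace: bytearray, needed: int) -> None:
    """Grows the caller's own buffer, analogous to tensor.resize_()."""
    if len(workspace) < needed:
        workspace.extend(bytes(needed - len(workspace)))


caller_ws = bytearray(16)

run_rebinding(caller_ws, 64)
size_after_rebinding = len(caller_ws)  # the caller still holds the small buffer

run_in_place(caller_ws, 64)
size_after_in_place = len(caller_ws)   # the caller's buffer has grown
```

With the rebinding variant, every later call sees the undersized buffer again and allocates afresh, which is the repeated-allocation cost the review warns about during autotuning.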
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@flashinfer/gemm/gemm_base.py`:
- Around line 2094-2101: The probe that checks cudnn.backend_version() and
parses cudnn.__version__ currently swallows all exceptions with a bare "except
Exception" which can mask real errors; update the exception handling in that
block (the code referencing backend_version, version_str, major, minor,
required_frontend_version, and cudnn.__version__) to catch only the expected
failure types (e.g., AttributeError, ValueError, TypeError, OSError) when
probing/parsing and let other exceptions propagate (or re-raise) so unexpected
errors aren't hidden—replace the broad except Exception with a tuple of these
specific exceptions and handle/log them accordingly.
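A minimal sketch of the narrower handling suggested above, not the code in `gemm_base.py`. The function name, its parameters, and the threshold values are illustrative assumptions; in particular, `92300` assumes that backend version 9.23.0 is encoded as major*10000 + minor*100 + patch. The backend probe is passed in as a callable so the sketch runs without cuDNN installed.

```python
from typing import Callable, Optional, Tuple


def supports_override_workspace_query(
    get_backend_version: Callable[[], int],
    frontend_version: Optional[str],
    min_backend: int = 92300,               # assumed encoding of 9.23.0
    min_frontend: Tuple[int, int] = (1, 24),
) -> bool:
    """Return True only if both version probes succeed and meet the minimums."""
    try:
        backend_version = get_backend_version()
        # Unpacking raises ValueError if fewer than two components are present.
        major, minor = (int(part) for part in frontend_version.split(".")[:2])
    except (AttributeError, ValueError, TypeError, OSError):
        # Expected probe/parse failures: treat the feature as unavailable.
        return False
    # Anything else (e.g. RuntimeError) propagates to the caller.
    return backend_version >= min_backend and (major, minor) >= min_frontend


available = supports_override_workspace_query(lambda: 92300, "1.24.0")
too_old = supports_override_workspace_query(lambda: 92200, "1.24.0")
unparsable = supports_override_workspace_query(lambda: 92300, "unknown")
```

Listing the expected exception types keeps the "feature unavailable" fallback while letting genuinely unexpected failures surface instead of being silently mapped to `False`.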
📒 Files selected for processing (1)
flashinfer/gemm/gemm_base.py
📌 Description
This MR makes two cuDNN GEMM backend cleanups/improvements:

Move effective M-bucket mapper lookup into the cuDNN runners
The effective `map_to_tuning_buckets` lookup is now owned by the cuDNN GEMM runners instead of the higher-level GEMM dispatch functions. This keeps the bucket-to-`cache_m` logic local to the backend that uses it, while still respecting active autotune overrides such as custom `tuning_buckets` / `round_up`.

Query override-shape workspace size dynamically on cuDNN 9.23+
For cuDNN override-shape GEMM execution, cuDNN 9.23+ can query the workspace requirement for the actual runtime problem shape via `get_workspace_size_plan_at_index(...)` with override shapes and strides. The code now uses this API when available, so workspace allocation matches the executed dynamic problem size. Older cuDNN versions continue to query workspace by execution plan index without override-shape metadata.
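The version-dependent workspace query described above can be sketched as a small dispatch. This is an illustration under stated assumptions, not the PR's helper: the two query callables are hypothetical stand-ins for the cuDNN calls (whose exact arguments are not shown here), and `92300` assumes 9.23.0 is encoded as major*10000 + minor*100 + patch.

```python
from typing import Callable

MIN_OVERRIDE_QUERY_BACKEND = 92300  # assumed integer encoding of cuDNN 9.23.0


def query_workspace_bytes(
    backend_version: int,
    query_by_plan_index: Callable[[], int],   # hypothetical: index-only query
    query_with_override: Callable[[], int],   # hypothetical: override-shape query
) -> int:
    """Pick the workspace query that matches the installed backend."""
    if backend_version >= MIN_OVERRIDE_QUERY_BACKEND:
        # Newer backends: size the workspace for the actual runtime shape.
        return query_with_override()
    # Older backends: fall back to the plan-index query without override metadata.
    return query_by_plan_index()


# Stand-in sizes chosen only for illustration.
new_backend_bytes = query_workspace_bytes(92300, lambda: 4096, lambda: 1024)
old_backend_bytes = query_workspace_bytes(92200, lambda: 4096, lambda: 1024)
```

Passing the queries as callables keeps the version gate in one place, so only the branch that the backend supports is ever invoked.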