
Enable smem_merge_branch_allocs option on branched mega-kernel examples #3304

Open · LongshengDu wants to merge 1 commit into NVIDIA:main from LongshengDu:update_launch_branch_allocs


Conversation

@LongshengDu

Contributor

Version 4.6.dev aligns the kernel shared memory allocator's default behavior with CUDA C++: every static shared memory (smem) allocation is counted toward the kernel's total.

Because the mega-kernel approach has two mutually exclusive code branches, only one path runs per launch. A new launch option, smem_merge_branch_allocs, is therefore introduced for the mega-kernel so that the two paths can reuse the same shared memory region.

```
OFF (default)          ON
┌────────────┐         ┌────────────┐ ┐
│ Branch A   │ 512B    │ Branch A   │ │
├────────────┤         ├────────────┤ ├─ reuse same region
│ Branch B   │ 512B    │ Branch B   │ │
└────────────┘         └────────────┘ ┘
Total = 1024B          Total = 512B
```
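The accounting in the diagram can be sketched as a small model: with the option OFF, each branch's allocations get their own region and the sizes add up; with it ON, the branches share one region sized for the largest branch. This is a hypothetical illustration only, not the actual allocator or launch API. The function `smem_bytes`, the idea of "common" allocations made outside both branches, and the omission of alignment padding are all assumptions.

```python
# Hypothetical model of static shared-memory accounting for a kernel whose
# body splits into mutually exclusive branches. This is NOT the real
# allocator; it ignores alignment padding and dynamic shared memory.
from typing import Dict, List


def smem_bytes(common: List[int],
               branches: Dict[str, List[int]],
               merge_branch_allocs: bool) -> int:
    """Return the total static smem bytes reserved for one launch.

    common: sizes of allocations made outside any branch (always reserved).
    branches: per-branch allocation sizes; only one branch runs per launch.
    merge_branch_allocs: False -> every branch gets its own region
        (all static allocations counted, as in CUDA C++);
        True -> branches overlap in one region sized for the largest branch.
    """
    base = sum(common)
    per_branch = [sum(sizes) for sizes in branches.values()]
    if not per_branch:
        return base
    if merge_branch_allocs:
        # Branches never run together, so one region of max size suffices.
        return base + max(per_branch)
    # Default: every branch's allocations are counted separately.
    return base + sum(per_branch)


if __name__ == "__main__":
    two_paths = {"A": [512], "B": [512]}
    print("OFF:", smem_bytes([], two_paths, merge_branch_allocs=False))
    print("ON: ", smem_bytes([], two_paths, merge_branch_allocs=True))
```

With the 512B-per-branch case from the diagram, the model gives 1024 bytes with the option OFF and 512 bytes with it ON. Taking the maximum instead of the sum is only safe because the branches are mutually exclusive within a launch.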

LongshengDu changed the title from "Enable smem_merge_branch_allocs option to branched mega-kernel launch" to "Enable smem_merge_branch_allocs option on branched mega-kernel examples" on Jun 5, 2026.
