identify device name by chip id#2325

Open
JaxChen29 wants to merge 1 commit into ROCm:main from JaxChen29:fix_device_name_get

@JaxChen29
Contributor

Motivation

In DPX (Dual Partition X-celerator) mode, the visible CU count changes depending on the partition configuration. The previous device identification logic relied on hardcoded CU counts (multiProcessorCount == 304 for MI300, == 80 || == 64 for MI308) to select the correct pre-compiled kernel binary (.co file). In DPX mode the CU count no longer matches these expected values, so device identification fails and the wrong kernel path is selected, resulting in "file not found" errors and kernel launch failures.

Technical Details

Replace CU-count-based MI300/MI308 GPU identification with PCI Chip ID detection via hipDeviceAttributePciChipId. PCI Chip ID is a hardware constant burned into the silicon that never changes regardless of DPX partition mode, CU masking, or container environments.
MI308 device IDs (0x74A2, 0x74A8, 0x74B6, 0x74BC) are identified from the official AMD device ID registry; all other gfx942 devices default to MI300.
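
To make the difference concrete, here is a minimal Python sketch that contrasts the two rules described above. The function names are hypothetical and are not taken from the patch; the chip IDs are the four listed in this description, and the value 152 in the usage note is only an illustrative partitioned CU count.

```python
# Chip IDs listed in this PR description as MI308 parts.
MI308_CHIP_IDS = frozenset({0x74A2, 0x74A8, 0x74B6, 0x74BC})


def name_by_cu_count(cu_count):
    """Old rule (hypothetical helper): exact match on the visible CU count."""
    if cu_count == 304:
        return "MI300"
    if cu_count in (80, 64):
        return "MI308"
    return None  # any other CU count is not recognized


def name_by_chip_id(chip_id):
    """New rule (hypothetical helper): known MI308 IDs, otherwise MI300.

    Assumes the caller has already established that the device is gfx942.
    """
    return "MI308" if chip_id in MI308_CHIP_IDS else "MI300"
```

With a partitioned device that reports, say, 152 CUs, `name_by_cu_count(152)` returns `None`, while `name_by_chip_id` depends only on the ID and is unaffected by the partition mode.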

csrc/include/aiter_hip_common.h: Add get_pci_chip_id() helper using hipDeviceGetAttribute(hipDeviceAttributePciChipId) and is_mi308_device() that checks against known MI308 chip IDs.
csrc/cpp_itfs/mha_fwd.cu: Update get_kernel_co_name() to use is_mi308_device() instead of CU count comparison for selecting the correct .co kernel binary path (MI308/ vs MI300/).
aiter/jit/utils/chip_info.py: Add _get_pci_chip_id() using ctypes to call hipDeviceGetAttribute directly, and update get_device_name() to use chip ID instead of CU count.
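
For readers unfamiliar with the ctypes route mentioned for chip_info.py, the following is a rough sketch of how such a query might look. It is not the code from this PR. The helper name, the library name `libamdhip64.so`, and the fallback to `None` are assumptions, and the numeric value of `hipDeviceAttributePciChipId` is deliberately left as a parameter because it has to come from the installed HIP headers.

```python
import ctypes


def query_hip_device_attribute(attr_value, device_id=0, lib_name="libamdhip64.so"):
    """Return an integer HIP device attribute, or None if it cannot be read.

    attr_value is the numeric hipDeviceAttribute_t enum value (assumption:
    the caller looks it up in the HIP headers for the installed ROCm version).
    """
    try:
        hip = ctypes.CDLL(lib_name)
        fn = hip.hipDeviceGetAttribute
    except (OSError, AttributeError):
        return None  # HIP runtime or symbol not available

    # hipError_t hipDeviceGetAttribute(int* pi, hipDeviceAttribute_t attr, int deviceId)
    fn.argtypes = [ctypes.POINTER(ctypes.c_int), ctypes.c_int, ctypes.c_int]
    fn.restype = ctypes.c_int

    value = ctypes.c_int(0)
    status = fn(ctypes.byref(value), attr_value, device_id)
    return value.value if status == 0 else None  # 0 is hipSuccess
```

Returning `None` on failure lets the caller fall back to another identification path when the HIP runtime is not loadable, for example inside a build container without a GPU.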

Test Plan

Test Result

Submission Checklist

@JaxChen29 JaxChen29 requested a review from a team March 18, 2026 06:57
@JaxChen29 JaxChen29 force-pushed the fix_device_name_get branch from d6252d9 to 6f3c536 Compare March 18, 2026 07:10