Add ROCm support. #393
Changes from all commits: e775828, d6bc36d, 4c2ac64, aba6823, 4f6cf04, 964342c, 9878250, 7b86dd1, 66df97a
The change to `BBoxOverlapsCUDAKernelLauncher` guards the CUDA-specific device guard, stream lookup, and error check so that a ROCm build takes a HIP path instead:

```diff
@@ -8,8 +8,14 @@ void BBoxOverlapsCUDAKernelLauncher(const Tensor bboxes1, const Tensor bboxes2,
   int num_bbox1 = bboxes1.size(0);
   int num_bbox2 = bboxes2.size(0);
 
+#ifdef __NVCC__
   at::cuda::CUDAGuard device_guard(bboxes1.device());
   cudaStream_t stream = at::cuda::getCurrentCUDAStream();
+#endif
+#ifdef __HIP_PLATFORM_HCC__
+  // at::cuda::HIPGuard device_guard(bboxes1.device());
+  hipStream_t stream = at::cuda::getCurrentHIPStream();
+#endif
   AT_DISPATCH_FLOATING_TYPES_AND_HALF(
       bboxes1.scalar_type(), "bbox_overlaps_cuda_kernel", ([&] {
         bbox_overlaps_cuda_kernel<scalar_t>
@@ -18,5 +24,10 @@ void BBoxOverlapsCUDAKernelLauncher(const Tensor bboxes1, const Tensor bboxes2,
           ious.data_ptr<scalar_t>(), num_bbox1, num_bbox2, mode, aligned,
           offset);
       }));
+#ifdef __NVCC__
   AT_CUDA_CHECK(cudaGetLastError());
+#endif
+#ifdef __HIP_PLATFORM_HCC__
+  AT_CUDA_CHECK(hipGetLastError());
+#endif
 }
```

Review thread on the commented-out `HIPGuard` line:

> **Reviewer:** Why comment out `HIPGuard`?
>
> **Author:** Because PyTorch handles ROCm as CUDA, it causes a device-type assertion error if it is not commented out.
>
> **Reviewer:** It seems that `HIPGuardImplMasqueradingAsCUDA.h` and `getCurrentHIPStreamMasqueradingAsCUDA` should be used here?
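As a follow-up to the reviewer's last question, here is a rough sketch of what the masquerading approach might look like. It assumes PyTorch's ROCm build ships `HIPGuardMasqueradingAsCUDA` and `getCurrentHIPStreamMasqueradingAsCUDA` under the header paths shown (not verified against this PR's PyTorch version); the wrappers report the CUDA device type, so the guard should no longer trip the assert and would not need to be commented out:

```cpp
#ifdef __HIP_PLATFORM_HCC__
#include <ATen/ATen.h>
// Header paths and names follow the review suggestion; treat them as
// assumptions, not verified API.
#include <ATen/hip/impl/HIPGuardImplMasqueradingAsCUDA.h>
#include <ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h>

// Hypothetical launcher prologue: same shape as the PR's NVCC branch,
// but with guards that masquerade as CUDA so device-type checks pass.
void LauncherPrologueSketch(const at::Tensor& bboxes1) {
  // Reports device type kCUDA even though the backend is HIP, so the
  // "PyTorch handles ROCm as CUDA" assert described above is satisfied.
  at::cuda::HIPGuardMasqueradingAsCUDA device_guard(bboxes1.device());
  hipStream_t stream = at::cuda::getCurrentHIPStreamMasqueradingAsCUDA();
  (void)stream;  // the kernel would be launched on this stream
}
#endif
```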
Review comment explaining the warp-synchronization change:

> `__syncwarp()` is used to avoid a read-write conflict on `output_val` when `warpReduceSum` reads `output_val` from other threads within the same warp. AMD HIP doesn't support `__syncwarp()` at the moment; using `__syncthreads()` instead works, although it brings a small performance decrease.
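A minimal, hypothetical sketch of that substitution (the kernel, the `MM_WARP_SYNC` macro, and the 256-thread block size below are illustrative, not this PR's actual code): lanes write partial values into shared memory, so a barrier is needed before any lane reads a neighbor's slot. NVCC can use the warp-scope `__syncwarp()`; on HIP the block-wide `__syncthreads()` stands in, which is correct but synchronizes more threads than necessary:

```cpp
// Pick the narrowest barrier each platform offers for warp-internal
// shared-memory traffic.
#ifdef __NVCC__
#define MM_WARP_SYNC() __syncwarp()
#endif
#ifdef __HIP_PLATFORM_HCC__
#define MM_WARP_SYNC() __syncthreads()  // HIP has no __syncwarp() yet
#endif

__global__ void block_sum_sketch(const float* in, float* out, int n) {
  __shared__ float output_val[256];  // one slot per thread in the block
  const int tid = threadIdx.x;
  const int i = blockIdx.x * blockDim.x + tid;
  output_val[tid] = (i < n) ? in[i] : 0.f;

  // Make each lane's write visible before other lanes in the warp read it;
  // this is the read-write conflict the comment describes.
  MM_WARP_SYNC();

  // Tree reduction across each 32-lane group through shared memory.
  for (int offset = 16; offset > 0; offset >>= 1) {
    if ((tid & 31) < offset) output_val[tid] += output_val[tid + offset];
    MM_WARP_SYNC();
  }
  if ((tid & 31) == 0) atomicAdd(out, output_val[tid]);
}
```

The fallback over-synchronizes rather than under-synchronizes, which is why it stays correct even on AMD's 64-wide wavefronts; the cost is only the extra block-wide barrier the comment mentions.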