An out-of-tree Execution Provider for ONNXRuntime that uses AMD's hipDNN library for accelerated inference on AMD GPUs.
Work in Progress - This is a prototype implementation.
Currently supported operations:
- Conv (2D convolution)

Requirements:
- CMake 3.20+
- Ninja build system
- HIP SDK (from TheRock)
- hipDNN library (from TheRock)
- ONNXRuntime (source and built library)
- iree-compile (required by hipDNN backend for code generation)

Set the required environment variables:

```bash
export THEROCK_DIST="/path/to/TheRock/build/dist/rocm"
export ONNXRUNTIME_ROOT="/path/to/onnxruntime"

# iree-compile must be in PATH
export PATH="/path/to/iree/build/tools:$PATH"
```

Configure and build:

```bash
cd hipDNNEP

# Configure
cmake --preset RelWithDebInfo

# Build
cmake --build --preset RelWithDebInfo
```

Run the tests:

```bash
ctest --preset RelWithDebInfo
```

Example usage from an application:

```cpp
#include <onnxruntime_cxx_api.h>

#include <string_view>

int main() {
  Ort::InitApi(OrtGetApiBase()->GetApi(ORT_API_VERSION));
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "example");

  // Register the hipDNN EP library
  OrtStatus* status = Ort::GetApi().RegisterExecutionProviderLibrary(
      env, "HipDNN", "/path/to/libhipdnn_ep.so");
  if (status != nullptr) {
    // Handle error
    Ort::GetApi().ReleaseStatus(status);
    return 1;
  }

  // Get available EP devices
  std::vector<Ort::ConstEpDevice> devices = env.GetEpDevices();

  // Find the HipDNN device
  const OrtEpDevice* hipdnn_device = nullptr;
  for (const auto& device : devices) {
    if (std::string_view(device.EpName()) == "HipDNN") {
      hipdnn_device = static_cast<const OrtEpDevice*>(device);
      break;
    }
  }

  // Create session options and append the EP
  Ort::SessionOptions session_options;
  Ort::GetApi().SessionOptionsAppendExecutionProvider_V2(
      session_options, env, &hipdnn_device, 1, nullptr, nullptr, 0);

  // Create session
  Ort::Session session(env, "model.onnx", session_options);

  // Run inference
  // ...
  return 0;
}
```

This EP uses the ONNXRuntime Plugin EP V2 system, which allows:
- Building as a separate shared library
- Dynamic loading at runtime
- No modifications to ONNXRuntime source
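
Because the library is loaded dynamically, it can also be unregistered once it is no longer needed. The following is a minimal sketch, assuming the same registration name and library path as in the example above (error handling abbreviated):

```cpp
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::InitApi(OrtGetApiBase()->GetApi(ORT_API_VERSION));
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "example");

  // Load the plugin EP library under a registration name of our choosing.
  OrtStatus* status = Ort::GetApi().RegisterExecutionProviderLibrary(
      env, "HipDNN", "/path/to/libhipdnn_ep.so");
  if (status != nullptr) {
    Ort::GetApi().ReleaseStatus(status);
    return 1;
  }

  // ... create sessions and run inference ...

  // Unload the library again once no session is using it any longer.
  status = Ort::GetApi().UnregisterExecutionProviderLibrary(env, "HipDNN");
  if (status != nullptr) {
    Ort::GetApi().ReleaseStatus(status);
    return 1;
  }
  return 0;
}
```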

Main components:
- EP Factory (`HipDNNEpFactory`): Creates EP instances and manages device discovery
- EP (`HipDNNEp`): Main execution provider, handles graph partitioning and compilation
- Kernel (`Kernel`): Builds the hipDNN graph from ONNX nodes and executes inference
- `NodeComputeInfo`: ORT callback interface for kernel lifecycle
- Allocator (`HipDeviceAllocator`): HIP device memory allocation
- Data Transfer (`HipDataTransfer`): CPU <-> GPU data copies
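
For orientation, the allocator and data-transfer pieces boil down to a handful of HIP runtime calls. The sketch below is illustrative only: the class and function names are placeholders, and the real implementations plug into ONNXRuntime's allocator and data-transfer interfaces rather than throwing exceptions:

```cpp
#include <hip/hip_runtime.h>

#include <cstddef>
#include <stdexcept>

// Simplified sketch of a HIP device allocator.
struct DeviceAllocatorSketch {
  void* Alloc(std::size_t size) {
    void* ptr = nullptr;
    if (hipMalloc(&ptr, size) != hipSuccess) {
      throw std::runtime_error("hipMalloc failed");
    }
    return ptr;
  }
  void Free(void* ptr) {
    (void)hipFree(ptr);  // return device memory to the HIP runtime
  }
};

// Simplified sketch of CPU <-> GPU data transfer: synchronous hipMemcpy.
inline void CopyHostToDevice(void* dst, const void* src, std::size_t bytes) {
  if (hipMemcpy(dst, src, bytes, hipMemcpyHostToDevice) != hipSuccess) {
    throw std::runtime_error("hipMemcpy (host to device) failed");
  }
}

inline void CopyDeviceToHost(void* dst, const void* src, std::size_t bytes) {
  if (hipMemcpy(dst, src, bytes, hipMemcpyDeviceToHost) != hipSuccess) {
    throw std::runtime_error("hipMemcpy (device to host) failed");
  }
}
```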
hipDNN uses a graph-based execution model:
- Build operation graph from ONNX nodes (conv_fprop, etc.)
- Validate and create execution plans
- Execute with variant pack (tensor uid -> device pointer mapping)
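
The variant pack in the last step is essentially a lookup table from tensor uid to device buffer. A conceptual sketch (the type alias and helper below are illustrative, not hipDNN's actual API):

```cpp
#include <cstdint>
#include <unordered_map>

// Conceptual variant pack: each tensor in the hipDNN graph is identified by a
// uid, and execution receives the device pointer that currently backs it.
using VariantPackSketch = std::unordered_map<int64_t, void*>;

// Illustrative only: bind the graph's input, weight, and output uids to
// device buffers before invoking the backend's execute entry point.
VariantPackSketch BindBuffers(int64_t input_uid, void* input_dev,
                              int64_t weight_uid, void* weight_dev,
                              int64_t output_uid, void* output_dev) {
  return VariantPackSketch{
      {input_uid, input_dev},
      {weight_uid, weight_dev},
      {output_uid, output_dev},
  };
}
```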
The Kernel class maintains a symbol table mapping ONNX value names to hipDNN TensorAttributes,
enabling multi-node graph construction and future op fusion.
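
A rough sketch of that idea: the table hands back the same tensor object every time an ONNX value name is looked up, so an op that consumes another op's output is automatically wired to the same graph tensor. The types below are placeholders, not the real hipDNN or repository types:

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>

// Placeholder for hipDNN's tensor attribute type.
struct TensorAttributesSketch {
  int64_t uid = 0;
};

class SymbolTableSketch {
 public:
  // Returns the tensor registered for an ONNX value name, creating it on
  // first use. Two nodes that refer to the same value name share the same
  // tensor object, which is what links them in the graph.
  std::shared_ptr<TensorAttributesSketch> GetOrCreate(const std::string& value_name) {
    auto it = table_.find(value_name);
    if (it != table_.end()) {
      return it->second;
    }
    auto tensor = std::make_shared<TensorAttributesSketch>();
    tensor->uid = next_uid_++;
    table_.emplace(value_name, tensor);
    return tensor;
  }

 private:
  std::unordered_map<std::string, std::shared_ptr<TensorAttributesSketch>> table_;
  int64_t next_uid_ = 1;
};
```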
MIT License