Prerequisites
Feature Summary
Add a preflight check (and init container) that runs dcgmi diag at a configurable level
Problem/Use Case
dcgmi diag runs a variety of targeted tests that can uncover latent GPU issues, so it would be helpful to run it as a preflight check
Proposed Solution
This should be a pluggable init container that sends a health event depending on whether the diagnostic passes, fails, or returns a warning
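
A rough sketch of what the init container entrypoint could look like, to make the proposal concrete. It is not tied to any existing code in this project. Assumptions: the level comes from a hypothetical `DCGM_DIAG_LEVEL` environment variable, `dcgmi diag -r <level> -j` is used to get JSON output, per-test results appear under `"status"` keys with values like `Pass`, `Warn`, `Fail`, `Skip` (the exact JSON layout varies across DCGM versions, so the code walks the whole document rather than relying on one schema), and `send_health_event` is a placeholder for however health events are actually published.

```python
import json
import os
import subprocess

# Assumed status strings; anything unrecognized is treated as non-failing.
SEVERITY = {"pass": 0, "skip": 0, "warn": 1, "fail": 2}
OUTCOME = ["pass", "warning", "fail"]


def collect_statuses(node):
    """Recursively yield every string stored under a "status" key."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "status" and isinstance(value, str):
                yield value
            else:
                yield from collect_statuses(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_statuses(item)


def classify(report):
    """Reduce a parsed diag report to the worst outcome found in it."""
    worst = 0
    for status in collect_statuses(report):
        worst = max(worst, SEVERITY.get(status.strip().lower(), 0))
    return OUTCOME[worst]


def run_diag(level):
    """Run dcgmi diag at the given level and return pass/warning/fail."""
    try:
        proc = subprocess.run(
            ["dcgmi", "diag", "-r", str(level), "-j"],
            capture_output=True,
            text=True,
        )
        report = json.loads(proc.stdout)
    except (FileNotFoundError, json.JSONDecodeError):
        # dcgmi missing or output unparseable: treat as a failed preflight.
        return "fail"
    outcome = classify(report)
    # A non-zero exit with no failing status is still suspicious.
    if proc.returncode != 0 and outcome == "pass":
        return "fail"
    return outcome


def send_health_event(outcome):
    """Hypothetical placeholder: emit the event as a JSON line on stdout."""
    print(json.dumps({"check": "dcgmi-diag", "outcome": outcome}))


def main():
    level = int(os.environ.get("DCGM_DIAG_LEVEL", "1"))
    outcome = run_diag(level)
    send_health_event(outcome)
    # Only a hard failure blocks the pod; warnings are reported but allowed.
    return 1 if outcome == "fail" else 0
```

The container entrypoint would call `sys.exit(main())`, so a failing diagnostic keeps the main containers from starting while a warning only produces an event. Whether warnings should also block startup is a policy choice that could be made configurable.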
Component
Health Monitor