[Feature]: Add preflight check for dcgmi diag #763

@XRFXLP

Description

Prerequisites

  • I searched existing issues

Feature Summary

Add a preflight check (and init container) that runs dcgmi diag at a configurable level

Problem/Use Case

dcgmi diag runs various targeted tests to uncover latent GPU issues, so it would be helpful to run it as a pre-flight check

Proposed Solution

This should be a pluggable init container that sends a health event depending on whether the diagnostic result is pass, fail, or warning
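A minimal sketch of what such an init container entrypoint could look like, written in Python purely for illustration. It assumes the per-test outcomes reported by dcgmi diag have already been extracted as strings such as "Pass", "Warn", "Fail", or "Skip"; parsing the JSON output is left out because its layout differs between DCGM versions. The names `overall_status`, `build_health_event`, and `run_diag`, as well as the event fields, are hypothetical and are not part of any existing API in this project.

```python
import subprocess

# Severity order used to fold per-test outcomes into one overall status.
_SEVERITY = {"pass": 0, "skip": 0, "warn": 1, "fail": 2}
_STATUS = {0: "pass", 1: "warning", 2: "fail"}


def overall_status(outcomes):
    """Fold per-test outcome strings into 'pass', 'warning', or 'fail'."""
    worst = 0
    for outcome in outcomes:
        # Unknown outcomes count as failures so they are never silently ignored.
        worst = max(worst, _SEVERITY.get(outcome.strip().lower(), 2))
    return _STATUS[worst]


def build_health_event(node, level, outcomes):
    """Build a health-event payload (the schema here is an assumption)."""
    status = overall_status(outcomes)
    return {
        "check": "dcgmi-diag",
        "node": node,
        "level": level,
        "status": status,
        "healthy": status != "fail",
    }


def run_diag(level):
    """Run `dcgmi diag -r <level> -j` and return (exit code, raw stdout)."""
    proc = subprocess.run(
        ["dcgmi", "diag", "-r", str(level), "-j"],
        capture_output=True,
        text=True,
        check=False,  # a failing diagnostic should be reported, not raised
    )
    return proc.returncode, proc.stdout
```

For example, `build_health_event("node-a", 2, ["Pass", "Warn"])` yields an event with status "warning" that is still marked healthy. Whether a warning should block the pod from starting is a policy choice that would likely need to be configurable alongside the run level.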

Component

Health Monitor

Metadata

Assignees

Labels

enhancement (New feature or request)

Type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests
