- Fine-Tune LLMs with Ray and DeepSpeed on OpenShift AI
- Fine-Tune Stable Diffusion with DreamBooth and Ray Train
- Hyperparameters Optimization with Ray Tune on OpenShift AI
- Admin access to an OpenShift cluster (CRC is fine)
- Installed OpenDataHub or RHOAI, with all Distributed Workload components enabled
- Installed Go 1.21
- `TEST_OUTPUT_DIR` - Output directory for test logs
- `TEST_TIMEOUT_SHORT` - Timeout duration for short tasks
- `TEST_TIMEOUT_MEDIUM` - Timeout duration for medium tasks
- `TEST_TIMEOUT_LONG` - Timeout duration for long tasks
- `TEST_RAY_IMAGE` (Optional) - Ray image used for RayCluster configuration
- `MINIO_CLI_IMAGE` (Optional) - MinIO CLI image used for uploading data to / downloading data from an S3 bucket
- `TEST_TIER` (Optional) - Specifies the test tier to run, skipping tests which don't belong to the specified tier. Supported test tiers: Smoke, Sanity, Tier1, Tier2, Tier3, Pre-Upgrade and Post-Upgrade. The test tier can also be provided using the test parameter `testTier`.

NOTE: `quay.io/modh/ray:2.47.1-py312-cu128` is the default image used for creating a RayCluster resource. If you have your own custom Ray image which suits your purposes, specify it in the `TEST_RAY_IMAGE` environment variable.
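For example, the base configuration can be exported in the shell before running the tests. All values below are illustrative placeholders, not required defaults:

```shell
# Illustrative values only; adjust paths, durations, and images to your environment.
export TEST_OUTPUT_DIR=/tmp/distributed-workloads-logs
export TEST_TIMEOUT_SHORT=3m
export TEST_TIMEOUT_MEDIUM=7m
export TEST_TIMEOUT_LONG=15m
# Optional: pin the RayCluster image and run only the smoke tier.
export TEST_RAY_IMAGE=quay.io/modh/ray:2.47.1-py312-cu128
export TEST_TIER=Smoke
```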
- `FMS_HF_TUNING_IMAGE` - Image tag used in the PyTorchJob CR for model training
- `TEST_NAMESPACE_NAME` (Optional) - Existing namespace where the Training operator GPU tests will be executed
- `HF_TOKEN` - HuggingFace token used to pull models with limited access
- `GPTQ_MODEL_PVC_NAME` - Name of the PersistentVolumeClaim containing downloaded GPTQ models
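A sketch of the extra variables for the Training operator GPU tests; every value below is a hypothetical placeholder (the image tag, namespace, token, and PVC name must match your setup):

```shell
# Placeholders only; substitute your real image tag, token, and PVC name.
export FMS_HF_TUNING_IMAGE=quay.io/my-org/fms-hf-tuning:latest  # hypothetical tag
export TEST_NAMESPACE_NAME=kfto-gpu-tests                       # optional, must already exist
export HF_TOKEN=hf_xxxxxxxxxxxxxxxx                             # HuggingFace access token
export GPTQ_MODEL_PVC_NAME=gptq-models-pvc                      # PVC with downloaded GPTQ models
```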
To upload the trained model into S3-compatible storage, use the environment variables below:

- `AWS_DEFAULT_ENDPOINT` - Storage bucket endpoint to upload the trained model to; if set, the test uploads the model into the S3 bucket
- `AWS_ACCESS_KEY_ID` - Storage bucket access key
- `AWS_SECRET_ACCESS_KEY` - Storage bucket secret key
- `AWS_STORAGE_BUCKET` - Storage bucket name
- `AWS_STORAGE_BUCKET_MODEL_PATH` (Optional) - Path in the storage bucket where the trained model will be stored
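For instance, to point the suite at a MinIO deployment. The endpoint, credentials, and bucket names below are placeholders; per the list above, the upload only happens when `AWS_DEFAULT_ENDPOINT` is set:

```shell
# Placeholder endpoint and credentials; the model upload is skipped unless
# AWS_DEFAULT_ENDPOINT is set.
export AWS_DEFAULT_ENDPOINT=https://minio.example.com:9000
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
export AWS_STORAGE_BUCKET=trained-models
export AWS_STORAGE_BUCKET_MODEL_PATH=kfto/output   # optional subpath
```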
- `ODH_NAMESPACE` - Namespace where ODH components are installed
- `NOTEBOOK_USER_NAME` - Username of the user running the Workbench
- `NOTEBOOK_USER_PASSWORD` - Password of the user running the Workbench
- `NOTEBOOK_USER_TOKEN` - Login token of the user running the Workbench
- `NOTEBOOK_IMAGE` - Image used for running the Workbench
To download the MNIST training script datasets from S3-compatible storage, use the environment variables below:

- `AWS_DEFAULT_ENDPOINT` - Storage bucket endpoint from which to download the MNIST datasets
- `AWS_ACCESS_KEY_ID` - Storage bucket access key
- `AWS_SECRET_ACCESS_KEY` - Storage bucket secret key
- `AWS_STORAGE_BUCKET` - Storage bucket name
- `AWS_STORAGE_BUCKET_MNIST_DIR` - Storage bucket directory from which to download the MNIST datasets
Execute the tests like standard Go unit tests:

```bash
go test -timeout 60m ./tests/kfto/
```
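A minimal wrapper sketch, assuming you want fallback values applied when the variables are unset. The default values and the commented-out invocation are assumptions for illustration, not part of the suite:

```shell
# Apply fallback values only when the variables are unset, then run the suite.
: "${TEST_OUTPUT_DIR:=/tmp/kfto-logs}"
: "${TEST_TIMEOUT_SHORT:=3m}"
: "${TEST_TIMEOUT_MEDIUM:=7m}"
: "${TEST_TIMEOUT_LONG:=15m}"
export TEST_OUTPUT_DIR TEST_TIMEOUT_SHORT TEST_TIMEOUT_MEDIUM TEST_TIMEOUT_LONG
mkdir -p "$TEST_OUTPUT_DIR"
echo "logs will be written to $TEST_OUTPUT_DIR"
# go test -timeout 60m ./tests/kfto/   # uncomment to actually run the suite
```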