| 1 | +# k6 Benchmark Image |
| 2 | + |
| 3 | +This container image packages the [k6](https://k6.io/) load testing tool with |
| 4 | +scripts for benchmarking machine learning inference workloads. |
| 5 | + |
| 6 | +It is designed to run in environments like Google Kubernetes Engine (GKE) to |
| 7 | +generate consistent, reproducible load against target endpoints and output |
| 8 | +granular metrics to a JSONL file for further analysis. It also includes a Python |
| 9 | +script (`extract_metrics.py`) that can be run manually to process the k6 output |
| 10 | +and generate a price/performance report. |
| 11 | + |
| 12 | +## Usage |
| 13 | + |
| 14 | +You can run this container image via Docker or deploy it as a Job in a |
| 15 | +Kubernetes cluster. |
| 16 | + |
| 17 | +### Environment Variables |
| 18 | + |
| 19 | +The container accepts the following optional environment variables for metric |
| 20 | +output naming and processing: |
| 21 | + |
| 22 | +- `ACCELERATOR_NAME`: A string representing the target hardware (e.g., `l4`, |
| 23 | + `a100`, `v5p`). If not provided, it defaults to `accelerator-not-set`. |
| 24 | +- `NODE_HOURLY_COST`: The hourly cost of the underlying node in USD. Used by the |
| 25 | + automatic metric extraction script to compute cost per 1k images. Defaults to |
| 26 | + `0.0`. |
| 27 | + |
| 28 | +The default benchmark script (`k6-diffusers-flux-2-klein-4b.js`) expects the |
| 29 | +following environment variables: |
| 30 | + |
| 31 | +- `TARGET_URL`: The full URL of the inference endpoint to test (e.g., |
| 32 | + `http://model-service:8000/generate`). |
| 33 | +- `BATCH_SIZE`: The batch size to request in the payload (default: `1`). |
| 34 | +- `VUS`: The number of concurrent Virtual Users to simulate (default: `1`). |
| 35 | + |
| 36 | +### Running via Docker |
| 37 | + |
| 38 | +Select the k6 script to run by passing its path as the container command |
| 39 | +(`CMD`) when starting the container: |
| 40 | + |
| 41 | +```bash |
| 42 | +# Example: running a different script mounted into the container |
| 43 | +docker run --rm \ |
| 44 | + -e ACCELERATOR_NAME="custom" \ |
| 45 | + -v "$(pwd)/custom-script.js:/app/custom-script.js" \ |
| 46 | + -v "$(pwd)/output:/output" \ |
| 47 | + k6-benchmark:latest /app/custom-script.js |
| 48 | +``` |
| 49 | + |
| 50 | +The k6 output is saved to the host directory mapped to `/output`. The filename |
| 51 | +is generated in the format |
| 52 | +`<name-of-k6-script>-<ACCELERATOR_NAME>-<experiment-start-timestamp>.jsonl`, |
| 53 | +for example `k6-diffusers-flux-2-klein-4b-l4-20260417T120000Z.jsonl`. |
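The naming scheme above can be sketched in Python. This is an illustration only: the helper name `output_filename` is hypothetical, and the timestamp format is inferred from the example filename rather than taken from the image's entrypoint.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def output_filename(script_path: str, accelerator: str, start: datetime) -> str:
    """Build '<script>-<accelerator>-<timestamp>.jsonl' (hypothetical helper)."""
    # Drop the directory and the '.js' extension from the script path.
    script_name = PurePosixPath(script_path).stem
    # Compact UTC timestamp; format inferred from the README example.
    stamp = start.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{script_name}-{accelerator}-{stamp}.jsonl"


start = datetime(2026, 4, 17, 12, 0, 0, tzinfo=timezone.utc)
print(output_filename("/app/k6-diffusers-flux-2-klein-4b.js", "l4", start))
```

With the inputs shown, the result matches the example filename given above.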
| 54 | + |
| 55 | +#### Supported Benchmarks |
| 56 | + |
| 57 | +The following benchmark scripts are included: |
| 58 | + |
| 59 | +- **`/app/k6-diffusers-flux-2-klein-4b.js`**: Benchmark the FLUX.2-klein-4B |
| 60 | + image generation model. |
| 61 | + |
| 62 | +## Metrics Extraction |
| 63 | + |
| 64 | +The extraction script (`extract_metrics.py`) can be run manually after the |
| 65 | +benchmark finishes to generate a price/performance report. |
| 66 | + |
| 67 | +The script calculates throughput (images/sec) and latencies (p50, p95, p99) |
| 68 | +using only samples from the `benchmark` scenario. If the required dependencies |
| 69 | +are installed and the script runs on Google Cloud, it also fetches the |
| 70 | +corresponding on-node telemetry (peak VRAM, average GPU utilization) from |
| 71 | +Google Cloud Monitoring. |
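To make the scenario filtering concrete, here is a minimal sketch of computing latency percentiles from k6 JSON output. It is not the code in `extract_metrics.py`. It assumes the standard k6 JSON line shape (`type`, `metric`, `data.value`, `data.tags.scenario`), uses `http_req_duration` as the latency metric, and uses a nearest-rank percentile. The `warmup` scenario name and the `point` helper exist only to build sample data.

```python
import json
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of a non-empty ascending list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def latency_summary(lines, scenario="benchmark", metric="http_req_duration"):
    """Return p50/p95/p99 of `metric` for one scenario, or {} if no samples."""
    values = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Skip metric declarations and unrelated metrics.
        if record.get("type") != "Point" or record.get("metric") != metric:
            continue
        data = record.get("data", {})
        tags = data.get("tags") or {}
        # Keep only samples tagged with the requested scenario.
        if tags.get("scenario") == scenario:
            values.append(data["value"])
    if not values:
        return {}
    values.sort()
    return {f"p{p}": percentile(values, p) for p in (50, 95, 99)}


def point(value, scenario):
    """Build one sample line in the k6 JSON output shape (test data only)."""
    return json.dumps({
        "type": "Point",
        "metric": "http_req_duration",
        "data": {"time": "2026-04-17T12:00:00Z", "value": value,
                 "tags": {"scenario": scenario}},
    })


sample = [point(9000, "warmup")] + [point(v, "benchmark") for v in (100, 200, 300, 400)]
print(latency_summary(sample))
```

The slow `warmup` sample is ignored, so it does not distort the reported percentiles.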
| 72 | + |
| 73 | +To ensure accurate hardware metrics when multiple deployments are running in the |
| 74 | +same project, the script can filter by pod, namespace, or node. If the `--pod` |
| 75 | +argument is omitted, the script automatically uses the `deployment_name` |
| 76 | +(extracted from the `TARGET_URL` hostname) as a prefix to filter for relevant |
| 77 | +pods. |
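The fallback pod filter can be pictured as follows. This sketch assumes that the deployment name is the first DNS label of the `TARGET_URL` hostname and that pods are matched by name prefix. The actual extraction rule in `extract_metrics.py` may differ, and both function names are hypothetical.

```python
from urllib.parse import urlparse


def deployment_name_from_url(target_url: str) -> str:
    """Derive a deployment name from the target URL's hostname (assumed rule)."""
    hostname = urlparse(target_url).hostname or ""
    # Assumption: the first DNS label of the Service hostname names the deployment.
    return hostname.split(".")[0]


def matches_deployment(pod_name: str, deployment_name: str) -> bool:
    """Prefix match, so every replica of the deployment is included."""
    return bool(deployment_name) and pod_name.startswith(deployment_name)


name = deployment_name_from_url("http://model-service:8000/generate")
print(name, matches_deployment("model-service-7d9f8b6c5-abcde", name))
```

A prefix match also picks up any unrelated pod whose name starts with the same string, which is why passing `--pod`, `--namespace`, or `--node` explicitly is the safer choice in a busy project.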
| 78 | + |
| 79 | +### Script Arguments |
| 80 | + |
| 81 | +- `--file`: Path to the k6 `.jsonl` output file (Required). |
| 82 | +- `--output-csv`: Path to the output CSV file where aggregated results are |
| 83 | + stored (Optional, default: `k6-benchmark.csv`). |
| 84 | +- `--hourly-cost`: The hourly cost of the underlying GKE node in USD. If set to |
| 85 | + `0.0`, a warning is emitted and cost metrics will be `0.0` (Optional, default: |
| 86 | + `0.0`). |
| 87 | +- `--project-id`: Google Cloud Project ID to query DCGM metrics via Cloud |
| 88 | + Monitoring. If omitted, the script dynamically fetches the project ID from the |
| 89 | + Google Cloud Metadata server (Optional). |
| 90 | +- `--pod`: Filter metrics by a specific pod name. If omitted, the script |
| 91 | + automatically uses the `deployment_name` (derived from the `TARGET_URL` |
| 92 | + hostname) as a prefix filter to match all relevant pods in the deployment |
| 93 | + (Optional). |
| 94 | +- `--namespace`: Filter metrics by a specific namespace (Optional). |
| 95 | +- `--node`: Filter metrics by a specific node name (Optional). |
| 96 | +- `--vram-metric`: The Prometheus metric string for VRAM usage (Optional, |
| 97 | + default: `prometheus.googleapis.com/DCGM_FI_DEV_FB_USED/gauge`). |
| 98 | +- `--util-metric`: The Prometheus metric string for GPU utilization (Optional, |
| 99 | + default: `prometheus.googleapis.com/DCGM_FI_DEV_GPU_UTIL/gauge`). |
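The `--hourly-cost` value feeds the cost-per-1k-images figure. The exact formula used by `extract_metrics.py` is not shown in this README. The sketch below assumes the node cost is prorated over the measured throughput and mirrors the documented behavior of warning and returning `0.0` when no cost is supplied. The function name is hypothetical.

```python
import warnings


def cost_per_1k_images(hourly_cost_usd: float, images_per_sec: float) -> float:
    """Estimate USD per 1,000 images from node cost and throughput (assumed formula)."""
    if hourly_cost_usd <= 0.0:
        # Documented behavior: warn and report 0.0 when no cost is given.
        warnings.warn("hourly cost is 0.0; cost metrics will be 0.0")
        return 0.0
    if images_per_sec <= 0.0:
        # No completed images means no meaningful cost figure.
        return 0.0
    images_per_hour = images_per_sec * 3600
    return hourly_cost_usd / images_per_hour * 1000


# A node costing 3.60 USD/hour that produces 1 image/sec.
print(round(cost_per_1k_images(3.60, 1.0), 4))
```

Under this formula, doubling throughput at the same node price halves the cost per 1k images, which is the trade-off a price/performance report is meant to expose.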