Commit d45d7ea

Merge branch 'main' into dependabot/pip/container-images/gpu/diffusers-flux/src/transformers-5.0.0rc3
2 parents 444ff2f + 73db29f commit d45d7ea

170 files changed

Lines changed: 4942 additions & 272 deletions

.devcontainer/Dockerfile

Lines changed: 3 additions & 3 deletions
@@ -12,9 +12,9 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-FROM hashicorp/terraform:1.5.7 AS terraform
-FROM koalaman/shellcheck:v0.10.0 AS shellcheck
-FROM mvdan/shfmt:v3.10.0 AS shfmt
+FROM hashicorp/terraform:1.14.8 AS terraform
+FROM koalaman/shellcheck:v0.11.0 AS shellcheck
+FROM mvdan/shfmt:v3.13.1 AS shfmt

 FROM python:3.13-bookworm AS python-builder

.devcontainer/devcontainer.json

Lines changed: 4 additions & 1 deletion
@@ -1,6 +1,6 @@
 {
 "$schema": "https://raw.githubusercontent.com/devcontainers/spec/main/schemas/devContainer.schema.json",
-"name": "Cloud Solutions devcontainer",
+"name": "Accelerated Platforms devcontainer",
 "build": {
 "dockerfile": "Dockerfile"
 },
@@ -13,7 +13,9 @@
 "editor.wordWrap": "off",
 "files.insertFinalNewline": true,
 "files.trimFinalNewlines": true,
+"geminicodeassist.displayInlineContextHint": false,
 "prettier.resolveGlobalModules": true,
+"python.defaultInterpreterPath": "/venv/bin/python",
 "redhat.telemetry.enabled": false,
 "telemetry.telemetryLevel": "off",
 "[css]": {
@@ -78,6 +80,7 @@
 "ms-azuretools.vscode-containers",
 "ms-python.black-formatter",
 "ms-python.isort",
+"ms-python.python",
 "streetsidesoftware.code-spell-checker",
 "timonwong.shellcheck"
 ]

.github/workflows/dictionary/python.txt

Lines changed: 17 additions & 0 deletions
@@ -3,10 +3,16 @@ aiohttp
 aqtp
 asctime
 asgi
+asynccontextmanager
 asyncio
+certifi
+cffi
 classmethod
 configparser
+contextlib
 coveragerc
+dataclass
+dataclasses
 dataframe
 dbapi
 dbcommands
@@ -17,6 +23,7 @@ fastapi
 fillna
 fromarray
 frombuffer
+fromisoformat
 fsspec
 ftfy
 functools
@@ -29,11 +36,13 @@ getframerate
 getnchannels
 getnframes
 getsampwidth
+grpcio
 gunicorn
 hasattr
 hashlib
 hexdigest
 httpx
+idna
 iloc
 imgf
 inplace
@@ -59,7 +68,10 @@ pgvector
 pipreqs
 pmap
 prng
+protos
+pyasn
 pycache
+pycparser
 pydantic
 pyenv
 pylint
@@ -69,8 +81,10 @@ pythondontwritebytecode
 pythonpath
 pythonunbuffered
 qualname
+quantiles
 readframes
 removesuffix
+reqs
 rerank
 reranked
 retryable
@@ -83,13 +97,16 @@ shutil
 spacy
 splitlines
 sqlalchemy
+strftime
 tensorboard
 tensorboardx
 thejsonlogger
 tqdm
 unittests
 urllib
+urlopen
 urlretrieve
 uvicorn
 venv
 writerow
+writestr
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+epath
+etils
+grpo
+highmem
+logdir
+logps
+maxtext
+multiproc
+returncode
+sigabrt
+strftime
+tunix
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+lmsysorg
+musa
+nvls
+sglang

.github/workflows/dictionary/shell.txt

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ nslookup
 pipefail
 pkill
 shuf
+subshell
 syscall
 xtrace
 zxvf

.gitignore

Lines changed: 7 additions & 0 deletions
@@ -39,3 +39,10 @@ terraform.tfstate*
 # Test
 test/log/*.log
 test/scripts/environment_files/*
+
+# Generated outputs
+*.log
+k6-*.txt
+k6-*.csv
+k6-*.jsonl
+k6-report.md

README.md

Lines changed: 4 additions & 0 deletions
@@ -62,12 +62,16 @@ the primary runtime.
 - [Benchmarking Online inference performance on Google Kubernetes Engine (GKE)](/docs/platforms/gke/base/use-cases/inference-ref-arch/inference-perf-bench/inf-perf-benchmarking-with-hf-model.md)

 - [Training reference architecture](/docs/platforms/gke/base/use-cases/training-ref-arch/README.md)
+
 - [Model fine tuning](/docs/platforms/gke/base/use-cases/training-ref-arch/model-fine-tuning/README.md)
 - [Data processing](/docs/platforms/gke/base/use-cases/training-ref-arch/model-fine-tuning/data-processing.md)
 - [Data preparation](/docs/platforms/gke/base/use-cases/training-ref-arch/model-fine-tuning/data-preparation.md)
 - [Fine tuning](/docs/platforms/gke/base/use-cases/training-ref-arch/model-fine-tuning/fine-tuning.md)
 - [Model evaluation](/docs/platforms/gke/base/use-cases/training-ref-arch/model-fine-tuning/model-evaluation.md)

+- [Reinforcement Learning reference architecture](/docs/platforms/gke/base/use-cases/reinforcement-learning/README.md)
+- [RL on TPU](/docs/platforms/gke/base/use-cases/reinforcement-learning/single-host-tpu-grpo/README.md)
+
 ### Guides

 - [LLM Inference Optimization: Achieving faster Pod Startup with Google Cloud Storage](/use-cases/inferencing/cost-optimization/gcsfuse/AchievingFasterPodStartup.md)
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+# Copyright 2026 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+FROM grafana/k6:1.7.1
+
+USER root
+
+WORKDIR /app
+# Create the /output directory and ensure k6 owns it, along with /app
+RUN mkdir -p /output && chown -R k6:k6 /app /output
+
+COPY --chown=k6:k6 scripts /app/scripts
+COPY --chmod=a+x --chown=k6:k6 entrypoint.sh /app/entrypoint.sh
+
+# Switch back to the unprivileged k6 user
+USER k6
+
+ENTRYPOINT ["/app/entrypoint.sh"]
+
+CMD ["--help"]
Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
+# k6 Benchmark Image
+
+This container image packages the [k6](https://k6.io/) load testing tool with
+specific scripts to benchmark Machine Learning inference workloads.
+
+It is designed to run in environments like Google Kubernetes Engine (GKE) to
+generate consistent, reproducible load against target endpoints and output
+granular metrics to a JSONL file for further analysis. It also includes a Python
+script (`extract_metrics.py`) that can be run manually to process the k6 output
+and generate a price/performance report.
+
+## Usage
+
+You can run this container image via Docker or deploy it as a Job in a
+Kubernetes cluster.
+
+### Environment Variables
+
+The container accepts the following optional environment variables for metric
+output naming and processing:
+
+- `ACCELERATOR_NAME`: A string representing the target hardware (e.g., `l4`,
+`a100`, `v5p`). If not provided, it defaults to `accelerator-not-set`.
+- `NODE_HOURLY_COST`: The hourly cost of the underlying node in USD. Used by the
+metric extraction script to compute cost per 1k images. Defaults to
+`0.0`.
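
The commit does not show how `extract_metrics.py` turns `NODE_HOURLY_COST` into a cost per 1k images, so the following is only a minimal sketch of one plausible calculation. The function name `cost_per_1k_images`, the formula, and the sample numbers are assumptions for illustration.

```python
def cost_per_1k_images(node_hourly_cost: float, images_per_second: float) -> float:
    """Estimate the USD cost of generating 1,000 images on a single node.

    Assumed formula: (hourly cost / 3600) gives cost per second; dividing by
    throughput gives cost per image; multiplying by 1000 gives cost per 1k.
    """
    if images_per_second <= 0:
        raise ValueError("images_per_second must be positive")
    cost_per_second = node_hourly_cost / 3600.0
    return cost_per_second / images_per_second * 1000.0


if __name__ == "__main__":
    # Hypothetical inputs: a node billed at 0.72 USD/hour producing 2 images/sec.
    print(f"{cost_per_1k_images(0.72, 2.0):.4f} USD per 1k images")
```

With the default `NODE_HOURLY_COST` of `0.0`, this calculation yields `0.0`, which matches the documented behavior that cost metrics are `0.0` when no hourly cost is supplied.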
+
+The default benchmark script (`k6-diffusers-flux-2-klein-4b.js`) expects the
+following environment variables:
+
+- `TARGET_URL`: The full URL of the inference endpoint to test (e.g.,
+`http://model-service:8000/generate`).
+- `BATCH_SIZE`: The batch size to request in the payload (default: `1`).
+- `VUS`: The number of concurrent Virtual Users to simulate (default: `1`).
+
+### Running via Docker
+
+Select the k6 script to run by passing its path as the `CMD` when starting the
+container:
+
+```bash
+# Example: running a different script mounted into the container
+docker run --rm \
+-e ACCELERATOR_NAME="custom" \
+-v "$(pwd)/custom-script.js:/app/custom-script.js" \
+-v "$(pwd)/output:/output" \
+k6-benchmark:latest /app/custom-script.js
+```
+
+The k6 output will be saved in the mapped `/output` directory on your host. The
+filename will be dynamically generated in the format:
+`<name-of-k6-script>-<ACCELERATOR_NAME>-<experiment-start-timestamp>.jsonl`.
+For example: `k6-diffusers-flux-2-klein-4b-l4-20260417T120000Z.jsonl`.
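
The naming itself presumably happens in `entrypoint.sh`, which is not shown in this view. As a rough sketch, the documented pattern can be reproduced as follows; the helper name `output_filename` is hypothetical, and the timestamp format `%Y%m%dT%H%M%SZ` is inferred from the single example above.

```python
from datetime import datetime, timezone
from pathlib import Path


def output_filename(script_path, accelerator_name="accelerator-not-set", start=None):
    """Build '<script-stem>-<accelerator>-<UTC timestamp>.jsonl'."""
    # Use the current time when no experiment start time is supplied.
    start = start or datetime.now(timezone.utc)
    # Normalize to UTC so the trailing 'Z' is accurate.
    stamp = start.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # Path.stem drops the directory and the final extension (".js").
    return f"{Path(script_path).stem}-{accelerator_name}-{stamp}.jsonl"


if __name__ == "__main__":
    print(output_filename("/app/custom-script.js", "custom"))
```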
+
+#### Supported Benchmarks
+
+The following benchmark scripts are included:
+
+- **`/app/k6-diffusers-flux-2-klein-4b.js`**: Benchmarks the FLUX.2-klein-4B
+image generation model.
+
+## Metrics Extraction
+
+The extraction script (`extract_metrics.py`) can be run manually after the
+benchmark finishes to generate a price/performance report.
+
+The extraction script calculates throughput (Images/sec) and latencies (p50,
+p95, p99) strictly from the `benchmark` scenario, and automatically fetches
+corresponding on-node telemetry (Peak VRAM, Avg GPU Utilization) from Google
+Cloud Monitoring if the dependencies are installed and it is running on Google
+Cloud.
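
The source of `extract_metrics.py` is not visible here, so the sketch below only illustrates the kind of aggregation described. It assumes k6's line-delimited JSON output, where each sample is a `Point` record with `metric`, `data.time`, `data.value`, and `data.tags`. It also assumes that latency comes from the `http_req_duration` metric and that throughput is samples times batch size over the span between the first and last sample. The names `parse_time` and `summarize` are hypothetical.

```python
import json
import re
from datetime import datetime
from statistics import quantiles

# k6 timestamps may carry nanosecond precision; keep at most microseconds.
_TIMESTAMP = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?(Z|[+-]\d{2}:\d{2})$"
)


def parse_time(value: str) -> datetime:
    """Parse an RFC 3339 timestamp, truncating fractions to microseconds."""
    match = _TIMESTAMP.match(value)
    if match is None:
        raise ValueError(f"unrecognized timestamp: {value!r}")
    base, fraction, offset = match.groups()
    micros = (fraction or "")[:6].ljust(6, "0")
    offset = "+00:00" if offset == "Z" else offset
    return datetime.fromisoformat(f"{base}.{micros}{offset}")


def summarize(lines, scenario="benchmark", metric="http_req_duration", batch_size=1):
    """Aggregate latency percentiles and throughput for one scenario."""
    durations_ms, timestamps = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Skip metric declarations and samples of other metrics.
        if record.get("type") != "Point" or record.get("metric") != metric:
            continue
        data = record["data"]
        # Keep only samples tagged with the requested scenario.
        if data.get("tags", {}).get("scenario") != scenario:
            continue
        durations_ms.append(float(data["value"]))
        timestamps.append(parse_time(data["time"]))
    if len(durations_ms) < 2:
        raise ValueError("need at least two samples from the scenario")
    # 99 cut points: index 49 is p50, index 94 is p95, index 98 is p99.
    cuts = quantiles(durations_ms, n=100, method="inclusive")
    elapsed = (max(timestamps) - min(timestamps)).total_seconds()
    images = len(durations_ms) * batch_size
    return {
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
        "images_per_sec": images / elapsed if elapsed > 0 else 0.0,
    }
```

A caller could pass an open file handle directly, for example `summarize(open(path), batch_size=4)`, since the function only iterates over lines.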
+
+To ensure accurate hardware metrics when multiple deployments are running in the
+same project, the script can filter by pod, namespace, or node. If the `--pod`
+argument is omitted, the script automatically uses the `deployment_name`
+(extracted from the `TARGET_URL` hostname) as a prefix to filter for relevant
+pods.
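
A minimal sketch of that fallback, assuming the deployment name is the first DNS label of the `TARGET_URL` hostname and that pods are matched by a plain string prefix. Both function names are hypothetical, and the actual script may derive or match the name differently.

```python
from urllib.parse import urlparse


def deployment_name_from_url(target_url: str) -> str:
    """Return the first DNS label of the URL's hostname (assumed convention)."""
    host = urlparse(target_url).hostname
    if not host:
        raise ValueError(f"no hostname in URL: {target_url!r}")
    # "model-service.default.svc.cluster.local" -> "model-service"
    return host.split(".")[0]


def pod_matches(pod_name: str, deployment_name: str) -> bool:
    """Prefix match, e.g. 'model-service-7d9c4-abcde' for 'model-service'."""
    return pod_name.startswith(deployment_name)


if __name__ == "__main__":
    name = deployment_name_from_url("http://model-service:8000/generate")
    print(name, pod_matches("model-service-7d9c4-abcde", name))
```

A bare prefix match would also select pods from a deployment such as `model-service-v2`, which is one reason to pass `--pod`, `--namespace`, or `--node` explicitly when several similarly named deployments share a project.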
+
+### Script Arguments
+
+- `--file`: Path to the k6 `.jsonl` output file (Required).
+- `--output-csv`: Path to the output CSV file where aggregated results are
+stored (Optional, default: `k6-benchmark.csv`).
+- `--hourly-cost`: The hourly cost of the underlying GKE node in USD. If set to
+`0.0`, a warning is emitted and cost metrics will be `0.0` (Optional, default:
+`0.0`).
+- `--project-id`: Google Cloud Project ID to query DCGM metrics via Cloud
+Monitoring. If omitted, the script dynamically fetches the project ID from the
+Google Cloud Metadata server (Optional).
+- `--pod`: Filter metrics by a specific pod name. If omitted, the script
+automatically uses the `deployment_name` (derived from the `TARGET_URL`
+hostname) as a prefix filter to match all relevant pods in the deployment
+(Optional).
+- `--namespace`: Filter metrics by a specific namespace (Optional).
+- `--node`: Filter metrics by a specific node name (Optional).
+- `--vram-metric`: The Prometheus metric string for VRAM usage (Default:
+`prometheus.googleapis.com/DCGM_FI_DEV_FB_USED/gauge`).
+- `--util-metric`: The Prometheus metric string for GPU utilization (Default:
+`prometheus.googleapis.com/DCGM_FI_DEV_GPU_UTIL/gauge`).
