This guide shows how to benchmark image generation APIs using SGLang and AIPerf. You'll learn how to set up the server, create an input file, and run the benchmark. You'll also learn how to view the results, and even extract the generated images!
Log in to Hugging Face and accept the terms of use for the following model: FLUX.1-dev.
Export your Hugging Face token as an environment variable:
export HF_TOKEN=<your-huggingface-token>
Start the SGLang Docker container:
docker run --gpus all \
--shm-size 32g \
-it \
--rm \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=$HF_TOKEN" \
--ipc=host \
lmsysorg/sglang:dev
Note
The following steps are to be performed inside the SGLang Docker container.
Install the dependencies:
uv pip install yunchang remote_pdb imageio diffusers diffusion --system
Set the server arguments:
Important
The following arguments set up the SGLang server to serve the FLUX.1-dev model on a single GPU, on port 30000. You can modify them to use a different model, a different number of GPUs, a different port, and so on. See the SGLang Image Generation CLI for more details.
SERVER_ARGS=(
  --model-path black-forest-labs/FLUX.1-dev
  --text-encoder-cpu-offload
  --pin-cpu-memory
  --num-gpus 1
  --port 30000
  --host 0.0.0.0
)
Start the SGLang server:
sglang serve "${SERVER_ARGS[@]}"
Wait until the server is ready (watch the logs for the following message):
Uvicorn running on http://0.0.0.0:30000 (Press CTRL+C to quit)
Note
The following steps are to be performed on your local machine (outside the SGLang Docker container).
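Optionally, confirm the endpoint responds before benchmarking. Below is a minimal sanity-check sketch using the requests library; the payload fields follow the OpenAI Images API and are assumptions here, not something this guide prescribes.
# sanity_check.py -- one-off smoke test for the image endpoint (a sketch;
# payload fields follow the OpenAI Images API and are assumed, not verified).
import requests

resp = requests.post(
    "http://localhost:30000/v1/images/generations",
    json={
        "model": "black-forest-labs/FLUX.1-dev",
        "prompt": "A serene mountain landscape at sunset",
        "size": "512x512",
        "response_format": "b64_json",  # assumed field; server behavior may differ
    },
    timeout=300,  # image generation can take tens of seconds per request
)
resp.raise_for_status()
print(f"OK: received {len(resp.json().get('data', []))} image(s)")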
Create an input file:
cat > image_prompts.jsonl << 'EOF'
{"text": "A serene mountain landscape at sunset"}
{"text": "A futuristic city with flying cars"}
{"text": "A cute robot playing with a kitten"}
EOF
Run the benchmark:
aiperf profile \
--model black-forest-labs/FLUX.1-dev \
--tokenizer gpt2 \
--url http://localhost:30000 \
--endpoint-type image_generation \
--input-file image_prompts.jsonl \
--custom-dataset-type single_turn \
--extra-inputs size:512x512 \
--extra-inputs quality:standard \
--concurrency 1 \
--request-count 3
Done! This sends 3 requests to http://localhost:30000/v1/images/generations
View the results:
NVIDIA AIPerf | Image Generation Metrics
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┓
┃ Metric ┃ avg ┃ min ┃ max ┃ p99 ┃ p90 ┃ p50 ┃ std ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━┩
│ Request Latency (ms) │ 12,617.58 │ 12,251.41 │ 12,954.04 │ 12,947.91 │ 12,892.69 │ 12,647.29 │ 287.62 │
│ Input Sequence Length (tokens) │ 6.67 │ 6.00 │ 7.00 │ 7.00 │ 7.00 │ 7.00 │ 0.47 │
│ Request Throughput (requests/sec) │ 0.08 │ N/A │ N/A │ N/A │ N/A │ N/A │ N/A │
│ Request Count (requests) │ 3.00 │ N/A │ N/A │ N/A │ N/A │ N/A │ N/A │
└───────────────────────────────────┴───────────┴───────────┴───────────┴───────────┴───────────┴───────────┴────────┘
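The Input Sequence Length row is computed with the tokenizer passed via --tokenizer. You can reproduce the per-prompt counts with a short sketch using the Hugging Face transformers library (this assumes AIPerf simply tokenizes the raw prompt text; exact counts may differ):
# count_tokens.py -- recount prompt lengths with the gpt2 tokenizer (a sketch).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
for prompt in [
    "A serene mountain landscape at sunset",
    "A futuristic city with flying cars",
    "A cute robot playing with a kitten",
]:
    # len(ids) corresponds to the Input Sequence Length reported per request
    print(len(tokenizer.encode(prompt)), prompt)
You can also benchmark with synthetic prompts instead of a fixed input file: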
aiperf profile \
--model black-forest-labs/FLUX.1-dev \
--tokenizer gpt2 \
--url http://localhost:30000 \
--endpoint-type image_generation \
--extra-inputs size:512x512 \
--extra-inputs quality:standard \
--synthetic-input-tokens-mean 150 \
--synthetic-input-tokens-stddev 30 \
--concurrency 1 \
--request-count 3
Done! This sends 3 requests to http://localhost:30000/v1/images/generations
View the results:
NVIDIA AIPerf | Image Generation Metrics
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┓
┃ Metric ┃ avg ┃ min ┃ max ┃ p99 ┃ p90 ┃ p50 ┃ std ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━┩
│ Request Latency (ms) │ 12,173.18 │ 11,918.37 │ 12,503.38 │ 12,495.27 │ 12,422.26 │ 12,097.79 │ 244.71 │
│ Input Sequence Length (tokens) │ 137.00 │ 107.00 │ 153.00 │ 152.96 │ 152.60 │ 151.00 │ 21.23 │
│ Request Throughput (requests/sec) │ 0.08 │ N/A │ N/A │ N/A │ N/A │ N/A │ N/A │
│ Request Count (requests) │ 3.00 │ N/A │ N/A │ N/A │ N/A │ N/A │ N/A │
└───────────────────────────────────┴───────────┴───────────┴───────────┴───────────┴───────────┴───────────┴────────┘
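With --synthetic-input-tokens-mean and --synthetic-input-tokens-stddev, AIPerf generates prompts of varying length instead of reading them from a file. The sketch below illustrates the idea, assuming lengths are drawn from a normal distribution; AIPerf's actual sampling may differ.
# synthetic_lengths.py -- illustrate mean/stddev-driven prompt lengths
# (an assumption-laden sketch, not AIPerf's actual implementation).
import random

random.seed(0)
lengths = [max(1, round(random.gauss(150, 30))) for _ in range(5)]
print(lengths)  # token counts clustered around 150, spread ~30
Next, rerun the file-based benchmark with raw export enabled so the generated images can be recovered: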
Create an input file:
cat > image_prompts.jsonl << 'EOF'
{"text": "A serene mountain landscape at sunset"}
{"text": "A futuristic city with flying cars"}
{"text": "A cute robot playing with a kitten"}
EOF
Run the benchmark:
Important
Use --export-level raw to get the raw input/output payloads.
aiperf profile \
--model black-forest-labs/FLUX.1-dev \
--tokenizer gpt2 \
--url http://localhost:30000 \
--endpoint-type image_generation \
--input-file image_prompts.jsonl \
--custom-dataset-type single_turn \
--extra-inputs size:512x512 \
--extra-inputs quality:standard \
--concurrency 1 \
--request-count 3 \
--export-level raw
Extract the generated images:
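Each line of profile_export_raw.jsonl is a JSON record holding one request and its responses. The extraction script below relies on the following shape (a sketch; unrelated fields omitted):
# Assumed shape of one raw-export record, as parsed by the script below.
record = {
    "responses": [
        # each response's text field holds the OpenAI-style JSON body
        {"text": '{"data": [{"b64_json": "<base64-encoded image>"}]}'},
    ],
}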
Copy the following code into a file called extract_images.py:
#!/usr/bin/env python3
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
"""Extract base64-encoded images from AIPerf JSONL output file."""
import base64
import json
import os
from pathlib import Path
import sys
# Read input file path
input_file = Path(sys.argv[1]) if len(sys.argv) > 1 else Path('artifacts/black-forest-labs_FLUX.1-dev-openai-image_generation-concurrency1/profile_export_raw.jsonl')
output_dir = Path(sys.argv[2]) if len(sys.argv) > 2 else Path('extracted_images')
# Create output directory
os.makedirs(output_dir, exist_ok=True)
# Process each line in the JSONL file
with open(input_file, 'r') as f:
    for line_num, line in enumerate(f, 1):
        record = json.loads(line)
        # Extract images from responses
        for response in record.get('responses', []):
            response_data = json.loads(response.get('text', '{}'))
            for data_idx, item in enumerate(response_data.get('data', [])):
                if b64_image := item.get('b64_json'):
                    # Decode and save image
                    image_data = base64.b64decode(b64_image)
                    filename = output_dir / f"image_{line_num:04d}_{data_idx:02d}.jpg"
                    with open(filename, 'wb') as img_file:
                        img_file.write(image_data)
                    print(f"Extracted: {filename.resolve()}")
Run the script:
Tip
The script is set up to use default paths for the input file and output directory, but you can override them on the command line.
Usage: python extract_images.py <input_file> <output_dir>
python extract_images.py
Output:
Extracted: /path/to/extracted_images/image_0001_00.jpg
Extracted: /path/to/extracted_images/image_0001_01.jpg
Extracted: /path/to/extracted_images/image_0001_02.jpg
View the generated images:
Prompt:
{"text": "A serene mountain landscape at sunset"}
Prompt:
{"text": "A futuristic city with flying cars"}
Prompt:
{"text": "A cute robot playing with a kitten"}
You've successfully set up SGLang, run your first image generation benchmarks, and learned how to extract and view the generated images. You can now experiment with different models, prompts, and concurrency settings to optimize your image generation workloads.
Now go forth and generate!