AIPerf v0.6.0 Release
Summary
AIPerf 0.6.0 delivers a substantial update focused on plugin architecture, multimodal and video support, robustness, and developer experience. This release introduces a Complete Plugin Registry System with YAML definitions and schema-driven tooling, expands endpoint support with NIM Embeddings (text + image), vLLM multimodal chat embeddings, and SGLang video generation, and adds an LLM TCO calculator plus improved GPU telemetry and metrics. Critical fixes address dataloader multiprocessing, security, prefix synthesis, and cross-platform behavior (including macOS). Documentation is expanded with metrics guides, plugin tutorials, and LLM benchmarking updates.
Key Highlights
Plugin System Overhaul
AIPerf 0.6.0 introduces a Complete Plugin Registry System (#616) that replaces the previous factory-based approach. Plugins can now be defined via YAML (#613), with schema files, enums, and an auto-generator tool (#614), plus ExtensibleStrEnum for dynamic enum generation (#615). Plugin lookup normalizes dashes and underscores (#635), and the alphabetical ordering constraint was removed for flexibility (#679).
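The dash/underscore normalization described above can be pictured with a small sketch. This is not AIPerf's actual registry code: `normalize_name`, `PluginRegistry`, and the plugin name used here are hypothetical, and the only behavior taken from the release notes is that dashes and underscores are treated as equivalent during lookup.

```python
from typing import Callable, Dict


def normalize_name(name: str) -> str:
    """Map dashes to underscores so both spellings resolve to one key."""
    return name.strip().replace("-", "_")


class PluginRegistry:
    """Hypothetical registry keyed by normalized plugin name."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], object]] = {}

    def register(self, name: str, factory: Callable[[], object]) -> None:
        key = normalize_name(name)
        if key in self._factories:
            raise ValueError(f"plugin already registered: {name!r}")
        self._factories[key] = factory

    def create(self, name: str) -> object:
        key = normalize_name(name)
        if key not in self._factories:
            raise KeyError(f"unknown plugin: {name!r}")
        return self._factories[key]()


# Register under the dashed spelling, look up with either spelling.
registry = PluginRegistry()
registry.register("chat-embeddings", lambda: "chat embeddings endpoint")
```

With this shape, `registry.create("chat_embeddings")` and `registry.create("chat-embeddings")` resolve to the same factory.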
Multimodal & Video Support
- NIM Embeddings: New NIM Embeddings endpoint with multimodal (text + image) support (#560).
- vLLM Multimodal Chat Embeddings: Multimodal chat embeddings endpoint for vLLM (#637).
- SGLang Video Generation: SGLang `/v1/video` generation with polling support (#658).
Multi-URL, Telemetry & Tooling
- Multiple `--url` Endpoints: Support for multiple `--url` endpoints (#606), with a fix so multi-url advances URL correctly beyond the first turn (#618).
- Local GPU Telemetry: Support for local GPU telemetry via pynvml (#647); warmup is filtered out from GPU telemetry and counter deltas (#596).
- LLM TCO Calculator: New LLM TCO (Total Cost of Ownership) calculator tool (#559).
- Pre-Flight Tokenizer: Pre-flight tokenizer auto-detection and error display (#640).
- OSL Mismatch Detection: OSL mismatch detection and warning in metrics (#644).
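As a rough illustration of what an OSL mismatch check can look like, the sketch below compares the requested output sequence length with the number of tokens actually produced. The function name, the 5% tolerance, and the message format are all assumptions made for this example; the release notes do not state how AIPerf decides that a mismatch is worth a warning.

```python
from typing import Optional


def check_osl_mismatch(
    requested_osl: int,
    actual_osl: int,
    tolerance: float = 0.05,  # hypothetical threshold, not taken from AIPerf
) -> Optional[str]:
    """Return a warning message if actual_osl deviates too far from requested_osl."""
    if requested_osl <= 0:
        raise ValueError("requested_osl must be positive")
    deviation = abs(actual_osl - requested_osl) / requested_osl
    if deviation <= tolerance:
        return None
    return (
        f"OSL mismatch: requested {requested_osl} tokens, "
        f"received {actual_osl} tokens"
    )
```

A caller would typically log the returned message once per run rather than once per request, so that a systematic mismatch (for example, a server ignoring the requested length) does not flood the output.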
Stability & Security
- Dataloader multiprocessing reworked: parallel processing disabled by default (#583), then billiard-based multiprocessing (#584), and finally stdio redirect at OS level with daemon-flag approach replacing billiard (#711).
- Security: Multiple security fixes (#590), SonarQube medium/high/critical cleanups (#605), and regex backtracking fix (#643).
- macOS: Handling of None stdio streams in child process bootstrap (#708).
Features & Enhancements
Plugin System
- Complete Plugin Registry System: Replaces factory-based plugin loading with a registry (#616).
- YAML Plugin Definitions: Plugins can be defined via YAML (#613).
- Plugin Schema & Tooling: Schema files, enums, and auto-generator tool for plugins (#614).
- ExtensibleStrEnum: Dynamic enum generation for plugin and config use (#615).
- Normalize Names: Dashes and underscores normalized in enum and plugin lookup (#635).
- Plugin Ordering: Removed alphabetical ordering constraint for plugins (#679).
Endpoints & Backends
- NIM Embeddings: NIM Embeddings endpoint with multimodal (text + image) support (#560).
- vLLM Multimodal Chat Embeddings: Multimodal chat embeddings endpoint for vLLM (#637).
- SGLang Video: SGLang `/v1/video` generation with polling support (#658).
- Multiple URLs: Support for multiple `--url` endpoints (#606).
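Video generation is usually asynchronous: the client submits a job and then polls until it finishes. The sketch below shows that general polling pattern only. The status strings (`completed`, `failed`), the `fetch_status` callable, and the default interval and timeout are hypothetical and are not taken from SGLang or AIPerf.

```python
import time
from typing import Callable, Dict


def poll_until_complete(
    fetch_status: Callable[[], Dict[str, str]],
    interval_s: float = 1.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, str]:
    """Call fetch_status repeatedly until the job completes, fails, or times out."""
    deadline = clock() + timeout_s
    while True:
        job = fetch_status()
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":
            raise RuntimeError(f"job failed: {job}")
        if clock() >= deadline:
            raise TimeoutError(f"job not finished after {timeout_s} seconds")
        sleep(interval_s)  # wait before asking again
```

Passing `sleep` and `clock` as parameters keeps the loop easy to test without real delays; in production use the defaults apply.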
Telemetry & Metrics
- Local GPU Telemetry: Local GPU telemetry via pynvml (#647).
- Warmup Filtering: Filter warmup from GPU telemetry and counter deltas (#596).
- OSL Mismatch: OSL mismatch detection and warning in metrics (#644).
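To make the warmup filtering concrete, here is a minimal sketch of excluding warmup samples from both gauge metrics and counter deltas. The `Sample` fields and function names are hypothetical, and taking the first post-warmup reading as the counter baseline is a simplification chosen for this example rather than a description of AIPerf's implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    timestamp_s: float      # when the sample was taken
    gpu_utilization: float  # gauge: instantaneous percentage
    energy_mj: float        # counter: monotonically increasing total


def drop_warmup(samples: List[Sample], warmup_end_s: float) -> List[Sample]:
    """Keep only the samples taken at or after the end of warmup."""
    return [s for s in samples if s.timestamp_s >= warmup_end_s]


def mean_utilization(samples: List[Sample], warmup_end_s: float) -> float:
    """Average a gauge over the post-warmup samples only."""
    kept = drop_warmup(samples, warmup_end_s)
    if not kept:
        return 0.0
    return sum(s.gpu_utilization for s in kept) / len(kept)


def energy_delta(samples: List[Sample], warmup_end_s: float) -> float:
    """Counter delta over the post-warmup window: last reading minus first."""
    kept = drop_warmup(samples, warmup_end_s)
    if len(kept) < 2:
        return 0.0
    return kept[-1].energy_mj - kept[0].energy_mj
```

The distinction matters because a counter delta computed from the first sample of the whole run would silently include energy consumed during warmup, even if the gauge averages had already been filtered.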
Tooling & UX
- LLM TCO Calculator: New LLM TCO calculator tool (#559).
- Pre-Flight Tokenizer: Pre-flight tokenizer auto-detection and error display (#640).
- TTY Auto-Detection: Auto-detect TTY for UI type and log formatting (#650).
- Markdown-Accuracy-Auditor: Claude Code agent for markdown accuracy auditing (#641).
Prefix & Synthesis
- Num Prefix Groups: Num prefix groups in prefix analyzer (#627).
Network & Configuration
- IP Version & aiohttp: Ability to specify IP version and `trust_env` for aiohttp (#625).
- Metrics Collector: Use `create_tcp_connector` for metrics collector HTTP sessions (#652).
Refactoring & Infrastructure
- Shared Generator Infrastructure: Extract shared generator infrastructure in tools (#621).
- Remove mkinit: Remove mkinit and all generated init files (#646).
- Version Bump: 0.6.0 version bump and canonical `aiperf.__version__` (#665).
Bug Fixes
Dataloader & Multiprocessing
- Disable Parallel Processing: Disable parallel processing for dataloader to avoid instability (#583).
- Billiard for Multiprocessing: Use billiard for multiprocessing in dataloader (#584).
- Stdio Redirect & Daemon Flag: Redirect stdio at OS level and replace billiard with daemon-flag approach (#711).
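Redirecting stdio "at OS level" generally means swapping the underlying file descriptor rather than reassigning `sys.stdout`, so that output from child processes and native code is redirected too. The sketch below shows that general technique with `os.dup` and `os.dup2`; the helper name `redirect_fd` is hypothetical and this is not the code from #711.

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def redirect_fd(fd: int, target_path: str) -> Iterator[None]:
    """Temporarily point file descriptor fd at target_path, then restore it."""
    saved_fd = os.dup(fd)  # keep a handle on the original destination
    try:
        target_fd = os.open(
            target_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644
        )
        try:
            os.dup2(target_fd, fd)  # fd now refers to target_path
        finally:
            os.close(target_fd)  # fd keeps its own reference to the file
        yield
    finally:
        os.dup2(saved_fd, fd)  # put the original destination back
        os.close(saved_fd)
```

For example, `with redirect_fd(1, os.devnull): ...` would silence anything written to file descriptor 1 inside the block. Python-level buffers such as `sys.stdout` should be flushed before entering and leaving the block, since buffered text is otherwise written to whichever destination is active at flush time.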
Scheduling & Benchmarks
- Fixed Schedule Completion: Fix errors with benchmark completion in fixed schedule (#585).
Security
- Various Security Issues: Fix various security issues (#590).
- SonarQube: Clean up SonarQube medium, high, and critical security findings (#605).
- Regex Backtracking: Fix regex backtracking issue (#643).
Dashboard & Binding
- Dashboard Host Binding: Expose the host to adjust the dashboard service binding address (#572).
Logging & Debug
- Debug Logging: Fix issues with debug logging (#594).
Prefix & Synthesis
- prefix-root-mult: Fix prefix-root-mult feature to behave correctly (#593).
- max_isl / max_osl: Fix max_isl and max_osl for synthesis (#592).
- Prefix Synthesizer Parity: Prefix synthesizer partial feature parity with Dynamo (#636).
- Prefix Synthesis text_input: Fix prefix synthesis error with `text_input` trace (#673).
- Prefix Mult Fractional: Fix prefix mult with fractional multiplier (#692).
Metrics & Messaging
- ISL Token Count & ZMQ: Correct ISL token count and fix ZMQ message size (#597).
Tests & CI
- Curl Not Found: Fix test failure due to curl not found (#600).
Multi-URL
- URL Advancement: Fix multi-url advancing the URL only on the first turn (#618).
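One way to picture the intended behavior is a selector that moves to the next URL on every turn, not only on the first turn of a conversation. The release notes do not say which selection strategy AIPerf uses, so the round-robin policy, the class name, and the example URLs below are assumptions for illustration.

```python
from itertools import cycle
from typing import List, Sequence


class RoundRobinUrls:
    """Hypothetical selector that hands out URLs in rotation."""

    def __init__(self, urls: Sequence[str]) -> None:
        if not urls:
            raise ValueError("at least one URL is required")
        self._urls = cycle(urls)

    def next_url(self) -> str:
        return next(self._urls)


def urls_for_conversation(selector: RoundRobinUrls, num_turns: int) -> List[str]:
    """Pick a URL for every turn, advancing the selector each time."""
    return [selector.next_url() for _ in range(num_turns)]


selector = RoundRobinUrls(["http://host-a:8000", "http://host-b:8000"])
turn_urls = urls_for_conversation(selector, num_turns=3)
```

Here `turn_urls` alternates between the two hosts across the three turns. A bug of the kind fixed in #618 would correspond to calling `next_url()` only for the first turn, which leaves one endpoint receiving most of a multi-turn workload.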
Dependencies & Build
- Numpy Versioning: Update numpy versioning (#634).
- Dockerfile: Upgrade Dockerfile base image and remove setuptools (#651).
- Pillow: Bump Pillow dependency to 12.1.1 (#713).
Plugins & Serialization
- Pickle Enum Fix: Cherry-pick pickle enum fix into release/0.6.0 (#672).
Platform (macOS)
- None stdio Streams: Handle None stdio streams in macOS child process bootstrap (#708).
Documentation
- Failing Examples: Fix failing examples in docs (#581).
- Metrics Documentation: Add comprehensive metrics documentation (#321).
- UI Types Tutorial: Add short tutorial describing different UI types (#611).
- Plugin Documentation: Add comprehensive Plugin documentation (#617).
- Sample Outputs: Add sample outputs to tutorial documentation (#622).
- Chrome/Chromium Note: Add Chrome/Chromium requirement note for plot CLI commands (#649).
- Custom Dataset: Add docs for custom dataset (#642).
- Audio Models: Add docs for audio models (#573).
- LLM Benchmarking Guide: Update comprehensive LLM benchmarking guide for v0.5.0 (#626).
- Architecture Diagram: Simplify architecture diagram (#674).
- Plugin Tutorial: Improve plugin tutorial accuracy and add runnable examples (#681).
Testing
- Pytest Skips: Make pytest skip integration and component integration tests appropriately (#582).
- Mooncake Trace: Add mooncake trace integration test (#603).
New Contributors
A warm welcome to the new contributors who helped make this release possible:
- @liguodongiot made their first contribution in #572
- @oliverholworthy made their first contribution in #560
- @vinhngx made their first contribution in #559
- @brandonpelfrey made their first contribution in #606
- @PeaBrane made their first contribution in #627
- @alperkokmen made their first contribution in #625
- @jzakrzew made their first contribution in #637
- @piotrm-nvidia made their first contribution in #658
Thank you for contributing to AIPerf!
Full Changelog
Full Changelog: v0.5.0...v0.6.0