AIPerf v0.6.0 Release

@saturley-hall saturley-hall released this 10 Mar 19:36
0746a7c

AIPerf - Release 0.6.0

Summary

AIPerf 0.6.0 delivers a substantial update focused on plugin architecture, multimodal and video support, robustness, and developer experience. This release introduces a Complete Plugin Registry System with YAML definitions and schema-driven tooling, expands endpoint support with NIM Embeddings (text + image), vLLM multimodal chat embeddings, and SGLang video generation, and adds an LLM TCO calculator plus improved GPU telemetry and metrics. Critical fixes address dataloader multiprocessing, security, prefix synthesis, and cross-platform behavior (including macOS). Documentation is expanded with metrics guides, plugin tutorials, and LLM benchmarking updates.


Key Highlights

Plugin System Overhaul

AIPerf 0.6.0 introduces a Complete Plugin Registry System (#616) that replaces the previous factory-based approach. Plugins can now be defined via YAML (#613), with schema files, enums, and an auto-generator tool (#614), plus ExtensibleStrEnum for dynamic enum generation (#615). Plugin lookup normalizes dashes and underscores (#635), and the alphabetical ordering constraint was removed for flexibility (#679).
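
As a rough illustration of the dash/underscore normalization described above, the sketch below keys a toy registry on a normalized name so that both spellings resolve to the same plugin. `normalize_name` and `PluginRegistry` are hypothetical names invented for this example; they are not AIPerf's actual classes or API.

```python
def normalize_name(name: str) -> str:
    """Build a lookup key in which dashes and underscores are equivalent."""
    return name.replace("-", "_")


class PluginRegistry:
    """Toy registry keyed by normalized name (hypothetical, not AIPerf's class)."""

    def __init__(self) -> None:
        self._plugins: dict[str, object] = {}

    def register(self, name: str, plugin: object) -> None:
        # Store under the normalized key so either spelling finds it later.
        self._plugins[normalize_name(name)] = plugin

    def get(self, name: str) -> object:
        try:
            return self._plugins[normalize_name(name)]
        except KeyError:
            raise KeyError(f"unknown plugin: {name!r}") from None


registry = PluginRegistry()
registry.register("nim-embeddings", "embeddings-plugin")
```

With this sketch, `registry.get("nim_embeddings")` and `registry.get("nim-embeddings")` return the same object.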

Multimodal & Video Support

  • NIM Embeddings: New NIM Embeddings endpoint with multimodal (text + image) support (#560).
  • vLLM Multimodal Chat Embeddings: Multimodal chat embeddings endpoint for vLLM (#637).
  • SGLang Video Generation: SGLang /v1/video generation with polling support (#658).

Multi-URL, Telemetry & Tooling

  • Multiple --url Endpoints: Support for multiple --url endpoints (#606), with a fix so multi-url advances URL correctly beyond the first turn (#618).
  • Local GPU Telemetry: Support for local GPU telemetry via pynvml (#647); warmup is filtered out from GPU telemetry and counter deltas (#596).
  • LLM TCO Calculator: New LLM TCO (Total Cost of Ownership) calculator tool (#559).
  • Pre-Flight Tokenizer: Pre-flight tokenizer auto-detection and error display (#640).
  • OSL Mismatch Detection: OSL mismatch detection and warning in metrics (#644).
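
To make the OSL mismatch item above concrete, here is a minimal sketch of what such a check could look like: compare the requested output sequence length with the observed one and warn when they diverge. The function name, the relative `tolerance` parameter, and its default are assumptions made for illustration; the release notes do not say how AIPerf decides that a mismatch occurred.

```python
import warnings


def check_osl_mismatch(requested_osl: int, actual_osl: int, tolerance: float = 0.05) -> bool:
    """Warn and return True when the observed OSL is off by more than `tolerance`.

    `tolerance` is a relative threshold (hypothetical; chosen for this sketch).
    """
    if requested_osl <= 0:
        # Nothing was requested, so there is nothing to compare against.
        return False
    relative_diff = abs(actual_osl - requested_osl) / requested_osl
    if relative_diff > tolerance:
        warnings.warn(
            f"OSL mismatch: requested {requested_osl}, observed {actual_osl} "
            f"({relative_diff:.1%} off)",
            stacklevel=2,
        )
        return True
    return False
```

A caller would run this once per response, for example `check_osl_mismatch(requested_osl=256, actual_osl=128)`, and count how often it returns `True`.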

Stability & Security

  • Dataloader Multiprocessing: Reworked in three steps. Parallel processing was first disabled by default (#583), multiprocessing then moved to billiard (#584), and billiard was finally replaced by a daemon-flag approach with stdio redirected at the OS level (#711).
  • Security: Multiple security fixes (#590), SonarQube medium/high/critical cleanups (#605), and regex backtracking fix (#643).
  • macOS: Handling of None stdio streams in child process bootstrap (#708).

Features & Enhancements

Plugin System

  • Complete Plugin Registry System: Replaces factory-based plugin loading with a registry (#616).
  • YAML Plugin Definitions: Plugins can be defined via YAML (#613).
  • Plugin Schema & Tooling: Schema files, enums, and auto-generator tool for plugins (#614).
  • ExtensibleStrEnum: Dynamic enum generation for plugin and config use (#615).
  • Normalize Names: Dashes and underscores normalized in enum and plugin lookup (#635).
  • Plugin Ordering: Removed alphabetical ordering constraint for plugins (#679).

Endpoints & Backends

  • NIM Embeddings: NIM Embeddings endpoint with multimodal (text + image) support (#560).
  • vLLM Multimodal Chat Embeddings: Multimodal chat embeddings endpoint for vLLM (#637).
  • SGLang Video: SGLang /v1/video generation with polling support (#658).
  • Multiple URLs: Support for multiple --url endpoints (#606).

Telemetry & Metrics

  • Local GPU Telemetry: Local GPU telemetry via pynvml (#647).
  • Warmup Filtering: Filter warmup from GPU telemetry and counter deltas (#596).
  • OSL Mismatch: OSL mismatch detection and warning in metrics (#644).
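
The warmup filtering item above can be pictured with a small sketch: samples taken before the end of warmup are dropped, and counter-style metrics are measured as a delta between the first and last remaining samples. The `Sample` fields, the function names, and the sample values are all invented for this example and do not reflect AIPerf's internal data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    timestamp_s: float
    energy_joules: float  # counter: only ever increases
    gpu_util_pct: float   # gauge: instantaneous reading


def filter_warmup(samples: list[Sample], warmup_end_s: float) -> list[Sample]:
    """Keep only samples taken at or after the end of warmup."""
    return [s for s in samples if s.timestamp_s >= warmup_end_s]


def counter_delta(samples: list[Sample], warmup_end_s: float) -> float:
    """Counter increase over the post-warmup window only."""
    kept = filter_warmup(samples, warmup_end_s)
    if len(kept) < 2:
        return 0.0
    return kept[-1].energy_joules - kept[0].energy_joules


samples = [
    Sample(0.0, 0.0, 10.0),
    Sample(5.0, 100.0, 40.0),
    Sample(10.0, 250.0, 90.0),
    Sample(15.0, 400.0, 94.0),
]
post_warmup = filter_warmup(samples, warmup_end_s=10.0)
mean_util = sum(s.gpu_util_pct for s in post_warmup) / len(post_warmup)  # 92.0
energy_used = counter_delta(samples, warmup_end_s=10.0)                  # 150.0
```

Without the filter, the energy delta would be 400.0 and the mean utilization would be pulled down by the two warmup readings.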

Tooling & UX

  • LLM TCO Calculator: New LLM TCO calculator tool (#559).
  • Pre-Flight Tokenizer: Pre-flight tokenizer auto-detection and error display (#640).
  • TTY Auto-Detection: Auto-detect TTY for UI type and log formatting (#650).
  • Markdown-Accuracy-Auditor: Claude Code agent for markdown accuracy auditing (#641).
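
For the TTY auto-detection item above, a minimal sketch of the general technique: ask the output stream whether it is attached to a terminal and pick an interactive or plain presentation accordingly. The function name and the returned labels `"dashboard"` and `"simple"` are placeholders assumed for this example; the release notes do not specify which UI types are selected or how.

```python
import io
import sys
from typing import Optional, TextIO


def detect_ui_type(stream: Optional[TextIO] = None) -> str:
    """Pick a UI label based on whether `stream` is an interactive terminal."""
    stream = sys.stdout if stream is None else stream
    # sys.stdout itself can be None in some environments, so look up isatty defensively.
    isatty = getattr(stream, "isatty", None)
    if callable(isatty) and isatty():
        return "dashboard"  # placeholder label for a rich, interactive UI
    return "simple"         # placeholder label for plain, log-friendly output


# An in-memory buffer is never a terminal, so it gets the plain UI.
ui_for_buffer = detect_ui_type(io.StringIO())
```

The same test is what makes piping output to a file or running under CI fall back to plain formatting.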

Prefix & Synthesis

  • Num Prefix Groups: Num prefix groups in prefix analyzer (#627).

Network & Configuration

  • IP Version & aiohttp: Ability to specify IP version and trust_env for aiohttp (#625).
  • Metrics Collector: Use create_tcp_connector for metrics collector HTTP sessions (#652).

Refactoring & Infrastructure

  • Shared Generator Infrastructure: Extract shared generator infrastructure in tools (#621).
  • Remove mkinit: Remove mkinit and all generated init files (#646).
  • Version Bump: 0.6.0 version bump and canonical aiperf.__version__ (#665).

Bug Fixes

Dataloader & Multiprocessing

  • Disable Parallel Processing: Disable parallel processing for dataloader to avoid instability (#583).
  • Billiard for Multiprocessing: Use billiard for multiprocessing in dataloader (#584).
  • Stdio Redirect & Daemon Flag: Redirect stdio at OS level and replace billiard with daemon-flag approach (#711).
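
"Redirecting stdio at the OS level" generally means swapping the underlying file descriptor rather than reassigning `sys.stdout`, so that output written by child processes and native code is redirected too. The sketch below shows that general technique with `os.dup` and `os.dup2`; the helper name `redirect_fd` is hypothetical, and this is not the code from #711.

```python
import contextlib
import os


@contextlib.contextmanager
def redirect_fd(fd: int, path: str):
    """Temporarily point file descriptor `fd` at `path`, then restore it."""
    saved = os.dup(fd)  # keep a handle on the original destination
    target = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.dup2(target, fd)  # from here on, writes to fd land in `path`
        yield
    finally:
        os.dup2(saved, fd)   # restore the original destination
        os.close(target)
        os.close(saved)


if __name__ == "__main__":
    # Anything written to descriptor 1 inside the block is discarded,
    # including output from code that bypasses sys.stdout.
    with redirect_fd(1, os.devnull):
        os.write(1, b"hidden\n")
    os.write(1, b"visible again\n")
```

Because the swap happens on the descriptor itself, it also covers writers that never consult Python's `sys.stdout` object.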

Scheduling & Benchmarks

  • Fixed Schedule Completion: Fix errors with benchmark completion in fixed schedule (#585).

Security

  • Various Security Issues: Fix various security issues (#590).
  • SonarQube: Clean up SonarQube medium, high, and critical security findings (#605).
  • Regex Backtracking: Fix regex backtracking issue (#643).
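
As general background on the regex backtracking class of bug, the sketch below contrasts a pattern with nested quantifiers, which can take exponential time on non-matching input, with an equivalent pattern that has only one way to split the input. These patterns are generic illustrations chosen for this example; they are not the expression that was changed in #643.

```python
import re

# Nested quantifiers: "(\w+\s?)+" can split a run of word characters in
# exponentially many ways, so a long non-matching input can hang the matcher.
# Defined here only for comparison; it is deliberately never run on hostile input.
VULNERABLE = r"(\w+\s?)+"

# Same language (words separated by single whitespace characters, with an
# optional trailing one), but each character can be consumed in only one way.
SAFE = re.compile(r"\w+(?:\s\w+)*\s?")


def is_word_list(text: str) -> bool:
    """True if `text` is a sequence of words separated by single whitespace."""
    return SAFE.fullmatch(text) is not None


ok = is_word_list("hello big world")
# A long run followed by a rejecting character is the classic worst case for
# the vulnerable pattern; the safe one rejects it quickly.
rejected = is_word_list("a" * 5000 + "!")
```

The usual remedies are the ones shown here: remove ambiguity between an inner and an outer quantifier, or bound the input length before matching.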

Dashboard & Binding

  • Dashboard Host Binding: Expose the host to adjust the dashboard service binding address (#572).

Logging & Debug

  • Debug Logging: Fix issues with debug logging (#594).

Prefix & Synthesis

  • prefix-root-mult: Fix prefix-root-mult feature to behave correctly (#593).
  • max_isl / max_osl: Fix max_isl and max_osl for synthesis (#592).
  • Prefix Synthesizer Parity: Prefix synthesizer partial feature parity with Dynamo (#636).
  • Prefix Synthesis text_input: Fix prefix synthesis error with text_input trace (#673).
  • Prefix Mult Fractional: Fix prefix mult with fractional multiplier (#692).

Metrics & Messaging

  • ISL Token Count & ZMQ: Correct ISL token count and fix ZMQ message size (#597).

Tests & CI

  • Curl Not Found: Fix test failure due to curl not found (#600).

Multi-URL

  • URL Advancement: Fix multi-URL runs advancing to the next URL only on the first turn (#618).
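
To picture the behavior this fix is about, the sketch below advances to the next endpoint on every turn rather than only on the first one. It assumes round-robin selection, which the release notes do not confirm; the class name and the example URLs are likewise invented for illustration.

```python
class RoundRobinUrlSelector:
    """Cycle through endpoints, advancing on every call (assumed strategy)."""

    def __init__(self, urls: list[str]) -> None:
        if not urls:
            raise ValueError("at least one URL is required")
        self._urls = list(urls)
        self._next_index = 0

    def next_url(self) -> str:
        url = self._urls[self._next_index % len(self._urls)]
        self._next_index += 1  # advance on every turn, not just the first
        return url


selector = RoundRobinUrlSelector(["http://localhost:8000", "http://localhost:8001"])
# Three turns of one conversation, each asking the selector for an endpoint.
turn_urls = [selector.next_url() for _ in range(3)]
# → ["http://localhost:8000", "http://localhost:8001", "http://localhost:8000"]
```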

Dependencies & Build

  • Numpy Versioning: Update numpy versioning (#634).
  • Dockerfile: Upgrade Dockerfile base image and remove setuptools (#651).
  • Pillow: Bump Pillow dependency to 12.1.1 (#713).

Plugins & Serialization

  • Pickle Enum Fix: Cherry-pick pickle enum fix into release/0.6.0 (#672).

Platform (macOS)

  • None stdio Streams: Handle None stdio streams in macOS child process bootstrap (#708).
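
In Python, `sys.stdin`, `sys.stdout`, and `sys.stderr` can be `None` when a process starts without usable standard streams, and code that calls methods on them then fails with `AttributeError`. A minimal defensive sketch, assuming the simplest remedy of substituting the null device, is shown below; `ensure_stdio_streams` is a hypothetical helper and not the fix applied in #708.

```python
import os
import sys


def ensure_stdio_streams() -> list[str]:
    """Replace any standard stream that is None with a handle on os.devnull.

    Returns the names of the streams that were replaced.
    """
    replaced = []
    for name, mode in (("stdin", "r"), ("stdout", "w"), ("stderr", "w")):
        if getattr(sys, name) is None:
            setattr(sys, name, open(os.devnull, mode))
            replaced.append(name)
    return replaced
```

A child process would call this once at the start of its bootstrap, before any logging or printing is configured.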

Documentation

  • Failing Examples: Fix failing examples in docs (#581).
  • Metrics Documentation: Add comprehensive metrics documentation (#321).
  • UI Types Tutorial: Add short tutorial describing different UI types (#611).
  • Plugin Documentation: Add comprehensive Plugin documentation (#617).
  • Sample Outputs: Add sample outputs to tutorial documentation (#622).
  • Chrome/Chromium Note: Add Chrome/Chromium requirement note for plot CLI commands (#649).
  • Custom Dataset: Add docs for custom dataset (#642).
  • Audio Models: Add docs for audio models (#573).
  • LLM Benchmarking Guide: Update comprehensive LLM benchmarking guide for v0.5.0 (#626).
  • Architecture Diagram: Simplify architecture diagram (#674).
  • Plugin Tutorial: Improve plugin tutorial accuracy and add runnable examples (#681).

Testing

  • Pytest Skips: Make pytest skip integration and component integration tests appropriately (#582).
  • Mooncake Trace: Add mooncake trace integration test (#603).

New Contributors

A warm welcome to the new contributors who helped make this release possible. Thank you for contributing to AIPerf!


Full Changelog

Full Changelog: v0.5.0...v0.6.0