A framework for comparing multi-agent PyTorch optimization systems, along with multiple optimization strategy implementations.
These components are collectively defined as PyTorch Inference Kernel Evolution (PIKE).
See the paper preprint here: https://arxiv.org/abs/2511.16964
This is a fork of KernelBench by Anne Ouyang, Simon Guo, and Azalia Mirhoseini, and it includes benchmark additions and modifications from KernelBenchFiltered by METR.
This repository contains:
- a refined set of KernelBench benchmarks
- our evaluator setup
- PIKE-B, a multi-agent evolutionary branching strategy for PyTorch optimization
PIKE-O, an OpenEvolve-based PyTorch optimization strategy, is implemented in the pike-openevolve repository; it uses the evaluator in this repository.
Clone this repository, then do the following:
```bash
conda create --name kernel-bench python=3.12
conda activate kernel-bench
pip install -r requirements.txt
pip install -e .

# additional data analysis
pip install matplotlib pandas scipy
```
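As an optional sanity check, the one-liner below just confirms that the installed PyTorch build can see a GPU before any evaluations are run:

```bash
# Optional: verify that PyTorch is installed and CUDA is visible.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```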
Save the following API key environment variables to your ~/.bashrc, then open a new shell (or run source ~/.bashrc) so they take effect:

```bash
export OPENAI_API_KEY=<...>
export GEMINI_API_KEY=<...>
```

Running a PIKE implementation involves three key components. It is recommended to start them in the order listed below; a combined sketch of this startup order follows the list.
- Eval Worker: runs the evaluator in a container and exposes a low-level, filesystem-based communication channel to the containerized worker
- Eval Server: exposes an HTTP server for sending and receiving eval data, managing the low-level communication with the Eval Worker internally
- PIKE implementation (PIKE-B/PIKE-O): implements the LLM-based optimization strategy
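For orientation, the sketch below strings the three components together in this order, using the commands detailed in the rest of this README; the specific --arch value, the backgrounding of the Eval Server, and the output path are assumptions to adapt to your setup.

```bash
# Illustrative startup order only; each step is described in detail below.
# Assumes a Hopper GPU and that each long-running component gets its own
# terminal (or is backgrounded, as sketched for the Eval Server here).

# 1. Eval Worker (containerized evaluator)
python -u sandbox/tools/start_worker_container.py --engine docker --arch Hopper --max_active_tasks 20

# 2. Eval Server (HTTP proxy to the worker)
python scripts/disk_channel_server.py --port 8000 &

# 3. PIKE implementation, e.g. PIKE-B (full flag reference below)
python scripts/parallel_tree_search.py server_type=google model_name=gemini-2.5-pro num_workers=10 \
    level=3-pike task_start=1 task_end=50 num_samples=10 num_phases=30 max_fix_attempts=5 \
    dry_run=False eval_port=8000 run_dir=<path/to/output-dir>
```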
If you are working on a machine where you have root access, install Docker, along with the NVIDIA Container Toolkit (https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
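Once both are installed, a quick way to confirm that containers can see the GPU (a standard sample workload from the NVIDIA Container Toolkit docs; the ubuntu image is just an example) is:

```bash
# Should print the nvidia-smi table from inside a container if the toolkit is configured correctly.
docker run --rm --gpus all ubuntu nvidia-smi
```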
Start the containerized Eval Worker like so, passing in the correct GPU architecture:
```bash
python -u sandbox/tools/start_worker_container.py --engine docker --arch <Ampere/Hopper> --max_active_tasks 20
```

Once the Eval Worker is started, start the Eval Server. The Eval Server is an HTTP server that acts as a proxy between the Eval Worker's low-level communication channel and the PIKE implementation's eval requests.
```bash
python scripts/disk_channel_server.py --port 8000
```
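A quick way to confirm the Eval Server is reachable before launching a PIKE implementation (this only checks that something is listening on the port; the actual routes are defined in scripts/disk_channel_server.py):

```bash
# Assumes the default port 8000 used above.
curl -sS -o /dev/null -w "HTTP %{http_code}\n" http://localhost:8000/ \
  || echo "Eval Server not reachable on port 8000"
```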
To run PIKE-B directly, first try a dry run (a dry run does not require the Eval Worker):

```bash
python scripts/parallel_tree_search.py server_type=google model_name=gemini-2.5-pro num_workers=10 level=3-pike task_start=1 task_end=50 num_samples=10 num_phases=30 max_fix_attempts=5 dry_run=True eval_port=8000 run_dir=<path/to/output-dir>
```

If this works, you can switch to dry_run=False. Run with dry_run=False only after the Eval Worker and Eval Server are running.
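For a first non-dry run, it can help to dial the same flags down to a short end-to-end check before launching the full configuration; the values below are only an example.

```bash
# Example reduced-scope run (illustrative values for the flags documented above).
python scripts/parallel_tree_search.py server_type=google model_name=gemini-2.5-pro num_workers=2 \
    level=3-pike task_start=1 task_end=2 num_samples=2 num_phases=2 max_fix_attempts=5 \
    dry_run=False eval_port=8000 run_dir=<path/to/output-dir>
```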
To run PIKE-O, first clone the following repository: pike-openevolve
In the pike-openevolve directory:
```bash
pip install -e .
```

As with PIKE-B, run the following (from within the pike-openevolve directory) only after the Eval Worker and Eval Server are running:
```bash
python examples/kernelbench/run.py --pike_dir <path/to/this-repo> --level 3-pike --task_start 1 --task_end 50 --max_fix_attempts 5 --eval_port 8000 --run_dir <path/to/output-dir>
```

To further tune the PIKE-O system configuration, edit examples/kernelbench/config.yaml.
To learn more about using PIKE, see docs/README.md
To cite this work:

```bibtex
@misc{nagaitsev2025pike,
  title={Optimizing PyTorch Inference with LLM-Based Multi-Agent Systems},
  author={Kirill Nagaitsev and Luka Grbcic and Samuel Williams and Costin Iancu},
  year={2025},
  eprint={2511.16964},
  archivePrefix={arXiv},
  primaryClass={cs.MA},
  url={https://arxiv.org/abs/2511.16964},
}
```