Integration between Inspect and Weights & Biases, including support for both the Models API for experiment tracking, and Weave for evaluation analysis and transcripts.
Check out this brief demo video for an overview of Inspect WandB.
If you prefer to read, there is a tutorial on the Inspect WandB docs site.
Inspect WandB can be installed with:
```
pip install inspect-wandb
```

To install the optional Weave extra:

```
pip install inspect-wandb[weave]
```

Once Inspect WandB is installed in an environment authenticated with Weights & Biases (either by running `wandb login` or by setting `WANDB_API_KEY`), the integration is enabled by default for subsequent Inspect runs. The Inspect logger output will link to the Models dashboard, where you can track runs, and, if you have enabled the `weave` extra, to the Weave dashboard, where you can visualise eval results.
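As a concrete sketch of the authentication step, either of the following works (the API key value is a placeholder):

```shell
# Interactive login (prompts for and stores your W&B credentials):
wandb login

# Or non-interactively, e.g. in CI environments:
export WANDB_API_KEY="<your-api-key>"
```
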
Some configuration options are available, including adjusting the wandb config, setting tags, and adjusting Weave trace naming. To dive deeper into Inspect WandB, please see the documentation at https://inspect-wandb.readthedocs.io/en/latest/
The following are some examples of the types of data that can be automatically logged to W&B when Inspect WandB is enabled:
The Models integration tracks each Inspect eval or eval-set run as a WandB run. This is useful both as a shared source of truth for which evals have been run and for storing the exact configuration of each run so it can be faithfully reproduced later.
Inspect evals tracked in W&B Runs table
Reproduction information tracked in a W&B Run, including Inspect metadata
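No code changes are needed to enable tracking: once Inspect WandB is installed and authenticated, an ordinary Inspect invocation is logged as a W&B run. A minimal sketch, assuming a hypothetical task file `my_task.py`:

```shell
# A standard Inspect eval; with inspect-wandb installed, this run appears
# in the W&B Runs table automatically (task file name is illustrative):
inspect eval my_task.py --model openai/gpt-4o-mini
```
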
The Weave integration traces Inspect evaluations, allowing you to track and analyse the performance of different models across multiple tasks, visualise and compare result sets, and dig into individual transcripts.
Table of Inspect evaluations with score summaries in Weave
Trace tree of an Inspect task, with the main solver transcript selected for a given sample
Comparison of performance on AgentHarm between Claude Sonnet 4 and GPT-4o mini
Please see our contributing guidelines if you'd like to contribute to Inspect WandB.
We welcome all feedback; the best way to get in touch to discuss the project is the #inspect_wandb channel in the Inspect Community Slack.
This project was primarily developed by DanielPolatajko, Qi Guo, and Matan Shtepel, and supervised by Justin Olive. It was supported through the MARS (Mentorship for Alignment Research Students) program at the Cambridge AI Safety Hub.