HUSAI studies whether sparse autoencoder (SAE) features are trustworthy under strict release criteria: internal reproducibility, stress robustness, and external benchmark competitiveness.
Trained SAE features are indistinguishable from a random baseline: PWMCC is 0.309 for trained SAEs vs 0.300 for untrained SAEs. SAEs reconstruct well but learn arbitrary feature decompositions that do not reproduce across seeds. See paper/sae_stability_paper.md for the full paper.
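The seed-to-seed comparison above rests on PWMCC. The sketch below is a rough illustration and assumes that PWMCC is the symmetrized mean of each feature's best absolute cosine similarity with any feature in the other seed's decoder. The paper's exact definition may differ. The function name pwmcc and the (d_model, n_features) column layout are hypothetical.

```python
import numpy as np


def pwmcc(dec_a: np.ndarray, dec_b: np.ndarray) -> float:
    """Hypothetical PWMCC sketch: symmetrized mean of best-match |cosine|.

    dec_a, dec_b: decoder matrices of shape (d_model, n_features), one
    column per feature direction, e.g. from SAEs trained with two seeds.
    """
    # Normalize every feature direction (column) to unit length.
    a = dec_a / np.linalg.norm(dec_a, axis=0, keepdims=True)
    b = dec_b / np.linalg.norm(dec_b, axis=0, keepdims=True)
    # Absolute cosine between every feature of A and every feature of B;
    # abs() makes the score insensitive to sign flips of a feature.
    sims = np.abs(a.T @ b)  # shape (n_a, n_b)
    # Best match for each feature, averaged, in both directions.
    a_to_b = sims.max(axis=1).mean()
    b_to_a = sims.max(axis=0).mean()
    return float((a_to_b + b_to_a) / 2)


rng = np.random.default_rng(0)
dec = rng.normal(size=(64, 256))
print(pwmcc(dec, dec))                         # identical dictionaries
print(pwmcc(dec, rng.normal(size=(64, 256))))  # unrelated random dictionaries
```

Each feature takes a maximum over many candidates, so even two unrelated random dictionaries score well above zero. For that reason the meaningful comparison is against an untrained or random baseline, not against zero. The score is also unchanged when features are permuted or sign-flipped.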
- Internal consistency signal: positive.
- Stress controls (random_model, transcoder, OOD): passing.
- External transfer (SAEBench, CE-Bench): below strict thresholds.
- Strict release outcome: pass_all=false.
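The strict outcome is conjunctive: a single failing criterion is enough to set pass_all=false, even when the internal and stress signals pass. The minimal sketch below shows that logic with the three outcomes reported above. The GateResult type, the strict_release function, and the criterion names are hypothetical and do not reproduce the repository's gating scripts.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    """Hypothetical record of one release criterion and whether it passed."""
    name: str
    passed: bool


def strict_release(results: list[GateResult]) -> dict:
    """Conjunctive gate: release only if every criterion passes."""
    failed = [r.name for r in results if not r.passed]
    return {"pass_all": not failed, "failed": failed}


# Outcomes as reported in this README (names are illustrative).
outcome = strict_release([
    GateResult("internal_consistency", True),
    GateResult("stress_controls", True),
    GateResult("external_transfer", False),
])
print(outcome)  # {'pass_all': False, 'failed': ['external_transfer']}
```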
Consult EVIDENCE_STATUS.md before citing exact metrics; it separates local from remote evidence tiers.
See START_HERE.md for the full reading order and orientation.
conda env create -f environment.yml && conda activate husai
pip install -r requirements-dev.txt
pytest tests -q
make smoke

Internal baselines and ablations:
- scripts/experiments/run_phase4a_reproduction.py
- scripts/experiments/run_core_ablations.py
- scripts/experiments/run_assignment_consistency_v3.py
Follow-up experiments (Section 4.11 of the paper):
- scripts/experiments/run_all_followup_experiments.sh: run all follow-ups
- scripts/experiments/exp_1layer_ground_truth.py: 1-layer vs 2-layer comparison
- scripts/experiments/exp_subspace_stability.py: subspace vs feature stability
- scripts/experiments/exp_effective_rank_predictor.py: universal stability predictor
- scripts/experiments/exp_contrastive_stability.py: contrastive alignment loss
- scripts/experiments/exp_intervention_stability.py: steering consistency across seeds
- scripts/experiments/exp_dictionary_pinning.py: warm-start with frozen decoder columns
- scripts/experiments/exp_pythia70m_stability.py: scale to Pythia-70M (GPU)
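Subspace stability and feature stability can diverge. Two decoders can span exactly the same subspace while their individual feature directions barely match. The sketch below is a toy illustration of that distinction and is not the method in exp_subspace_stability.py. It measures subspace overlap as the mean squared cosine of the principal angles between the top-k left singular subspaces, and it measures feature agreement as the mean best-match absolute cosine. Both helper names are hypothetical.

```python
import numpy as np


def subspace_overlap(dec_a: np.ndarray, dec_b: np.ndarray, k: int) -> float:
    """Mean squared cosine of principal angles between top-k subspaces.

    1.0 means the two top-k subspaces coincide. Columns are features.
    """
    ua = np.linalg.svd(dec_a, full_matrices=False)[0][:, :k]
    ub = np.linalg.svd(dec_b, full_matrices=False)[0][:, :k]
    # Singular values of ua.T @ ub are the cosines of the principal angles.
    cosines = np.linalg.svd(ua.T @ ub, compute_uv=False)
    return float(np.mean(cosines ** 2))


def mean_best_match(dec_a: np.ndarray, dec_b: np.ndarray) -> float:
    """Mean over A's features of the best |cosine| with any feature of B."""
    a = dec_a / np.linalg.norm(dec_a, axis=0, keepdims=True)
    b = dec_b / np.linalg.norm(dec_b, axis=0, keepdims=True)
    return float(np.abs(a.T @ b).max(axis=1).mean())


rng = np.random.default_rng(0)
dec = rng.normal(size=(64, 32))                   # 32 features in 64 dims
q, _ = np.linalg.qr(rng.normal(size=(32, 32)))    # random orthogonal mixing
dec_rot = dec @ q                                 # same span, mixed features
print(subspace_overlap(dec, dec_rot, k=32))       # same subspace
print(mean_best_match(dec, dec_rot))              # features no longer match
```

Mixing the features with an orthogonal matrix leaves the column space untouched, which is why the subspace overlap stays at 1 while the per-feature match drops sharply. This is one concrete way a decomposition can be arbitrary even when the learned subspace is stable.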
External benchmark program:
- scripts/experiments/run_husai_saebench_custom_eval.py
- scripts/experiments/run_husai_cebench_custom_eval.py
- scripts/experiments/run_architecture_frontier_external.py
Strict gating:
- scripts/experiments/select_release_candidate.py
- scripts/experiments/run_stress_gated_release_policy.py
License: MIT (see LICENSE).