HUSAI: Reliability-First SAE Research

HUSAI studies whether sparse autoencoder (SAE) features are trustworthy under strict release criteria: internal reproducibility, stress robustness, and external benchmark competitiveness.

Key Finding

Trained SAE features are indistinguishable from a random baseline: PWMCC is 0.309 for trained SAEs versus 0.300 for untrained SAEs. The SAEs reconstruct well but learn arbitrary feature decompositions that do not reproduce across seeds. See paper/sae_stability_paper.md for the full paper.
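
To make the PWMCC comparison concrete, here is a minimal sketch of one way such a score could be computed between two SAEs trained with different seeds. It assumes PWMCC is the symmetric mean of each feature's maximum cosine similarity to any feature in the other dictionary; the exact definition used in HUSAI is in the paper and may differ (for example, in whether absolute values are taken). The function name pwmcc and the array layout are illustrative, not the repository's API.

```python
import numpy as np


def pwmcc(decoder_a: np.ndarray, decoder_b: np.ndarray) -> float:
    """Symmetric mean of max cosine similarity between two feature dictionaries.

    ASSUMPTION: this is an illustrative definition, not necessarily HUSAI's.
    Each input has shape (n_features, d_model), one feature direction per row.
    """
    # Normalize every feature direction to unit length.
    a = decoder_a / np.linalg.norm(decoder_a, axis=1, keepdims=True)
    b = decoder_b / np.linalg.norm(decoder_b, axis=1, keepdims=True)

    # cos[i, j] is the cosine similarity of feature i in A and feature j in B.
    cos = a @ b.T

    # For each feature, take its best match in the other dictionary,
    # then average over features in both directions.
    a_to_b = cos.max(axis=1).mean()
    b_to_a = cos.max(axis=0).mean()
    return float((a_to_b + b_to_a) / 2)


# Synthetic stand-ins for two decoders; real use would load trained weights.
rng = np.random.default_rng(0)
sae_seed0 = rng.standard_normal((64, 16))
sae_seed1 = rng.standard_normal((64, 16))

score_random = pwmcc(sae_seed0, sae_seed1)           # unrelated dictionaries
score_identical = pwmcc(sae_seed0, sae_seed0)        # same dictionary
score_permuted = pwmcc(sae_seed0, sae_seed0[::-1])   # same features, reordered
```

Under this definition the score is invariant to feature ordering, so a trained-vs-trained score that sits close to the untrained-vs-untrained score indicates that the learned directions match across seeds no better than chance.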

Current Bottom Line

Internal consistency signal: positive.
Stress controls (random_model, transcoder, OOD): passing.
External transfer (SAEBench, CE-Bench): below strict thresholds.
Strict release outcome: pass_all=false.
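
The bottom line above can be read as a conjunction of gates. The following is a hypothetical sketch of that logic, assuming pass_all is true only when every gate passes; the GateResult class and the gate names are illustrative and are not taken from scripts/experiments/select_release_candidate.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    """Outcome of one release gate (hypothetical structure)."""
    name: str
    passed: bool


def pass_all(gates: list[GateResult]) -> bool:
    """Strict policy: release only if every gate passes."""
    return all(gate.passed for gate in gates)


# Gate outcomes mirroring the summary above; the names are illustrative.
gates = [
    GateResult("internal_consistency", True),
    GateResult("stress_controls", True),
    GateResult("external_transfer", False),
]

failing = [gate.name for gate in gates if not gate.passed]
release_ok = pass_all(gates)
print(f"pass_all={str(release_ok).lower()} failing={failing}")
```

With a strict conjunction, a single failing gate (here, external transfer) is enough to block release even when the other signals are positive.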

Consult EVIDENCE_STATUS.md before citing exact metrics; it distinguishes local from remote evidence tiers.

Start Here

See START_HERE.md for the full reading order and orientation.

Quick Validation

conda env create -f environment.yml && conda activate husai
pip install -r requirements-dev.txt
pytest tests -q
make smoke

Core Scripts

Internal baselines and ablations:

scripts/experiments/run_phase4a_reproduction.py
scripts/experiments/run_core_ablations.py
scripts/experiments/run_assignment_consistency_v3.py

Follow-up experiments (Section 4.11 of the paper):

scripts/experiments/run_all_followup_experiments.sh -- run all follow-ups
scripts/experiments/exp_1layer_ground_truth.py -- 1-layer vs 2-layer comparison
scripts/experiments/exp_subspace_stability.py -- subspace vs feature stability
scripts/experiments/exp_effective_rank_predictor.py -- universal stability predictor
scripts/experiments/exp_contrastive_stability.py -- contrastive alignment loss
scripts/experiments/exp_intervention_stability.py -- steering consistency across seeds
scripts/experiments/exp_dictionary_pinning.py -- warm-start with frozen decoder columns
scripts/experiments/exp_pythia70m_stability.py -- scale to Pythia-70M (GPU)

External benchmark program:

scripts/experiments/run_husai_saebench_custom_eval.py
scripts/experiments/run_husai_cebench_custom_eval.py
scripts/experiments/run_architecture_frontier_external.py

Strict gating:

scripts/experiments/select_release_candidate.py
scripts/experiments/run_stress_gated_release_policy.py

License

MIT (LICENSE).
