Interpretable Next-token Prediction via the Generalized Induction Head (Kim*, Mantena* et al. 2024).
While large transformer models excel in predictive performance, their lack of interpretability restricts their usefulness in high-stakes domains. To remedy this, we propose the Generalized Induction-Head Model (GIM), an interpretable model for next-token prediction inspired by the observation of “induction heads” in LLMs. GIM is a retrieval-based module that identifies similar sequences in the input context by combining exact n-gram matching and fuzzy matching based on a neural similarity metric. We evaluate GIM in two settings: language modeling and fMRI response prediction. In language modeling, GIM improves next-token prediction by up to 25%p over interpretable baselines, significantly narrowing the gap with black-box LLMs. In an fMRI setting, GIM improves neural response prediction by 20% and offers insights into the language selectivity of the brain. GIM represents a significant step toward uniting interpretability and performance across domains. The code is available at https://github.com/ejkim47/generalized-induction-head.
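As a rough illustration of the exact n-gram matching idea described above, here is a toy sketch. It is not the authors' implementation: the function name `induction_next_token` and the `max_n` parameter are hypothetical, tokens are plain whitespace-split strings, and the fuzzy matching based on a neural similarity metric is not covered here.

```python
from collections import Counter


def induction_next_token(context, max_n=4):
    """Toy exact-match induction predictor (illustrative only).

    Finds the longest suffix of `context` (at most `max_n` tokens) that
    also occurs earlier in `context`, then counts the tokens that followed
    each earlier occurrence.

    Returns (match_length, Counter of next-token counts), or
    (0, Counter()) when no suffix reappears.
    """
    for n in range(min(max_n, len(context) - 1), 0, -1):
        suffix = context[-n:]
        counts = Counter()
        # i + n must index a real token, so the occurrence at the very
        # end of the context (the suffix itself) is excluded.
        for i in range(len(context) - n):
            if context[i:i + n] == suffix:
                counts[context[i + n]] += 1
        if counts:
            return n, counts
    return 0, Counter()


tokens = "the cat sat on the mat and the cat".split()
n, counts = induction_next_token(tokens)
print(n, counts.most_common(1))  # 2 [('sat', 1)]
```

The prediction is traceable to a concrete span of the input: here the 2-token suffix `the cat` matched an earlier occurrence that was followed by `sat`, which is the kind of retrieval evidence that makes this style of model inspectable.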
- Clone the repo and run `pip install -e .` to install the `alm` package locally
- Set paths/env variables in `alm/config.py` and `deep_fmri/encoding/config.py` to point to the correct directories to store data
- Experiments
  - For language modeling experiments, please refer to `experiments_lang/readme.md`.
  - For fMRI experiments, please refer to `experiments_fmri/readme.md`.
- `alm` is the main package
  - `alm/config.py` is the configuration file that points to where things should be stored
  - `alm/data` contains data utilities
  - `alm/models` contains source for models
    - `alm/models/build_infinigram_py` is heavily borrowed from the Infini-gram implementation.
- `data` contains code for preprocessing data
- `deep_fmri` is the package for fMRI experiments
  - heavily borrowed from deep-fMRI-dataset
- `experiments_lang` contains code for language modeling experiments
- `experiments_fmri` contains code for fMRI experiments
  - heavily borrowed from deep-fMRI-dataset
If you find this work useful, please cite:
@misc{kim2024inductiongram,
title={Interpretable Next-token Prediction via the Generalized Induction Head},
author={Eunji Kim and Sriya Mantena and Weiwei Yang and Chandan Singh and Sungroh Yoon and Jianfeng Gao},
year={2024},
eprint={2411.00066},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2411.00066},
}