
Neural Attention Search Linear

This repository introduces a framework that searches for optimal token-level hybrid attention models. Currently, we support searching for optimal hybrid Gated DeltaNet-Softmax Transformer models.
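To give a rough intuition for what a token-level hybrid attention model is, the sketch below routes each token either through full softmax attention or through a simplified linear-attention branch, selected by a per-token binary gate. This is an illustrative toy in NumPy, not the repository's actual Gated DeltaNet implementation; the function names (`token_level_hybrid`, `gate`) and the plain linear-attention branch are assumptions made for this example only.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def softmax_attention(q, k, v):
    # Standard causal softmax attention over a single head.
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)  # mask future tokens
    scores = np.where(mask, -np.inf, scores)
    return softmax(scores) @ v

def linear_attention(q, k, v):
    # Simplified causal linear attention: maintain a running
    # state S_t = sum_{s<=t} k_s v_s^T and read it with q_t.
    T, d = q.shape
    S = np.zeros((d, v.shape[1]))
    out = np.zeros((T, v.shape[1]))
    for t in range(T):
        S += np.outer(k[t], v[t])
        out[t] = q[t] @ S
    return out

def token_level_hybrid(q, k, v, gate):
    # gate[t] == 1: token t uses softmax attention;
    # gate[t] == 0: token t uses the linear branch.
    # The search in this repo decides such per-token assignments.
    soft = softmax_attention(q, k, v)
    lin = linear_attention(q, k, v)
    return np.where(gate[:, None] == 1, soft, lin)
```

In this toy version the gate is a fixed 0/1 vector; in the actual framework the per-token choice is what the search optimizes.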

Usage: create a new conda environment and install the package:

conda create -n NAtSL python=3.11
conda activate NAtSL
pip install -e .

We use the flame package to train the model:

cd experiments
git clone https://github.com/fla-org/flame.git
pip install -e flame/
cd ..

Then train the model:

cd experiments
sbatch slurm_train_model.sh

You can then evaluate the pre-trained model with experiments/harness.py.
