Using Transformers to Model Symbol Sequences with Memory
.
├── models/
│ ├── attention.py # Attention mechanism implementation
│ ├── mlp.py # MLP layer implementation
│ ├── transformer.py # Main transformer model
│ ├── autoencoder.py # Base autoencoder implementation
│ └── utils.py # Utility functions for model operations
├── training/
│ ├── trainer.py # Training infrastructure
│ ├── loss.py # Loss functions and metrics
│ ├── sae_trainer.py # Sparse autoencoder training infrastructure
│ └── utils.py # Utility functions for model saving/loading
└── examples/
└── cyclic_sequence/ # Example of training on cyclic sequences
├── README.md # Example-specific documentation
├── train_cyclic.py # Training script for transformer
├── train_sae.py # Training script for sparse autoencoder
├── sae_mechanistic_intervention.py # Intervention experiments
└── check_hallucinations.py # Hallucination testing
First change into the SymbolicMemory directory and follow the instructions below to set up the repository:
- Install uv:
  curl -LsSf https://astral.sh/uv/install.sh | sh
- Install Python 3.11 using uv:
  uv python install 3.11
- Create a virtual environment with Python 3.11 and activate it:
  uv venv --python 3.11
  source .venv/bin/activate
- Install the package:
  - For CPU-only (default, works on all platforms):
    uv sync
  - For CUDA 12.4 support (Linux only):
    uv sync --extra cuda
The project supports both CPU and GPU acceleration:
- CPU Support: Works on all platforms (Linux, macOS, Windows)
- GPU Support: Available on Linux systems with CUDA 12.4
- You can check available devices with:
  import jax
  print(jax.devices())  # Shows available devices (CPU/GPU)
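If both backends are installed and you want to pin JAX to a specific one, the general-purpose JAX_PLATFORMS environment variable can be set before JAX is first imported. This is a standard JAX mechanism rather than anything specific to this project; a minimal sketch:

```python
import os

# Force CPU even when a CUDA build is installed; must be set before importing jax.
os.environ["JAX_PLATFORMS"] = "cpu"

import jax
print(jax.devices())  # now reports CPU devices only
```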
This example demonstrates training a transformer to predict the next token in a repeating sequence, and then training a sparse autoencoder to analyze its internal representations.
- First, train the transformer model:
python examples/cyclic_sequence/train_cyclic.py
- Then, train the sparse autoencoder on the transformer's activations:
python examples/cyclic_sequence/train_sae.py
- Finally, run mechanistic interventions using the trained models:
python examples/cyclic_sequence/sae_mechanistic_intervention.py
The scripts will:
- Generate cyclic sequence datasets
- Train the transformer model
- Train the sparse autoencoder on transformer activations
- Perform mechanistic interventions to analyze the model's behavior
- Show attention and activation visualizations
- Plot training metrics
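As a rough illustration of the first step above (generating cyclic sequence datasets), the sketch below shows one way next-token training pairs can be drawn from a repeating symbol cycle. It is not the project's actual data code; the function name make_cyclic_batch and the cycle, sequence length, and batch size values are placeholders.

```python
import numpy as np

def make_cyclic_batch(cycle, seq_len, batch_size, rng):
    """Sample (input, next-token target) windows from a repeating symbol cycle."""
    cycle = np.asarray(cycle)
    starts = rng.integers(0, len(cycle), size=batch_size)   # random phase per sample
    offsets = starts[:, None] + np.arange(seq_len + 1)
    tokens = cycle[offsets % len(cycle)]                     # shape (batch_size, seq_len + 1)
    return tokens[:, :-1], tokens[:, 1:]

rng = np.random.default_rng(0)
inputs, targets = make_cyclic_batch(cycle=[0, 1, 2, 3], seq_len=8, batch_size=4, rng=rng)
print(inputs[0])   # e.g. [1 2 3 0 1 2 3 0]
print(targets[0])  # the same window shifted left by one token
```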
- Transformer Model (models/transformer.py):
  - Implements a simple transformer architecture
  - Handles sequence prediction tasks
  - Supports mechanistic interventions
- Sparse Autoencoder (models/autoencoder.py):
  - Implements a sparse autoencoder architecture
  - Trains on transformer layer activations
  - Supports expansion factors for different inflation ratios
- Training Infrastructure (training/):
  - trainer.py: Base training infrastructure
  - sae_trainer.py: Specialized trainer for sparse autoencoders
  - loss.py: Loss functions for both models
  - utils.py: Model saving/loading utilities
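To make the sparse autoencoder and loss components above concrete, here is a minimal, illustrative sketch of an autoencoder with an expansion factor and a reconstruction-plus-L1-sparsity objective. It is written from the general description only: the actual architecture and loss live in models/autoencoder.py and training/loss.py, the L1 penalty is an assumption (it is the typical sparse-autoencoder objective), and the names init_sae, sae_forward, sae_loss, and l1_coeff are placeholders.

```python
import jax
import jax.numpy as jnp

def init_sae(key, d_model, expansion_factor=4):
    """Latent width = d_model * expansion_factor (the 'inflation ratio')."""
    d_hidden = d_model * expansion_factor
    k_enc, k_dec = jax.random.split(key)
    return {
        "W_enc": jax.random.normal(k_enc, (d_model, d_hidden)) / jnp.sqrt(d_model),
        "b_enc": jnp.zeros(d_hidden),
        "W_dec": jax.random.normal(k_dec, (d_hidden, d_model)) / jnp.sqrt(d_hidden),
        "b_dec": jnp.zeros(d_model),
    }

def sae_forward(params, activations):
    """Encode transformer activations into sparse latents and reconstruct them."""
    latents = jax.nn.relu(activations @ params["W_enc"] + params["b_enc"])
    recon = latents @ params["W_dec"] + params["b_dec"]
    return recon, latents

def sae_loss(params, activations, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that encourages sparse latents."""
    recon, latents = sae_forward(params, activations)
    recon_loss = jnp.mean((recon - activations) ** 2)
    sparsity = jnp.mean(jnp.abs(latents))
    return recon_loss + l1_coeff * sparsity
```

A trainer along the lines of sae_trainer.py would then take jax.grad of such an objective over batches of cached transformer activations.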
You can modify the following parameters in the training scripts:
- In examples/cyclic_sequence/train_cyclic.py:
  - Model dimensions
  - Number of layers
  - Training steps
  - Learning rate
- In examples/cyclic_sequence/train_sae.py:
  - Expansion factor
  - Layer to analyze
  - Training steps
  - Batch size
- In examples/cyclic_sequence/sae_mechanistic_intervention.py:
  - Intervention strength
  - Number of candidate indices
  - Sequence generation length
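For orientation, the intervention parameters above can be read through a sketch like the one below: pick a candidate latent index, scale it by an intervention strength, and decode back into activation space before the transformer continues generating. This is only a conceptual illustration of SAE-based patching, reusing the parameter layout from the autoencoder sketch earlier; the interpretation of "intervention strength" as a scaling factor is an assumption, the name intervene_on_latent is a placeholder, and the real experiment is implemented in sae_mechanistic_intervention.py.

```python
import jax
import jax.numpy as jnp

def intervene_on_latent(params, activations, feature_idx, strength):
    """Scale one SAE latent by `strength` and decode back to activation space."""
    latents = jax.nn.relu(activations @ params["W_enc"] + params["b_enc"])
    latents = latents.at[..., feature_idx].multiply(strength)
    return latents @ params["W_dec"] + params["b_dec"]

# Example usage with the init_sae sketch shown earlier (hypothetical shapes):
# params = init_sae(jax.random.PRNGKey(0), d_model=64, expansion_factor=4)
# patched = intervene_on_latent(params, jnp.ones((8, 64)), feature_idx=3, strength=5.0)
```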