LauraGomezjurado/adaptive-rope

Head-wise Adaptive RoPE

A proof-of-concept implementation of head-wise adaptive RoPE (Rotary Position Embedding), in which each attention head learns its own frequency and phase scaling. The goal is improved long-context recall and copy-style in-context learning.

Overview

This project implements and evaluates head-wise adaptive RoPE variants where each attention head can learn:

  • Per-head frequency scaling (how fast the rotation changes with position)
  • Per-head phase offset (an initial rotation phase added to every position)
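The idea behind these two knobs can be sketched as a small module. This is a hypothetical illustration, not the repository's `rope.py`: each head h rotates dimension pair i by `freq_scale[h] * theta_i * pos + phase[h]`, and both parameters are initialized so the module reduces exactly to baseline RoPE (scale 1, phase 0).

```python
import torch
import torch.nn as nn


class HeadwiseAdaptiveRoPE(nn.Module):
    """Sketch of head-wise adaptive RoPE (illustrative, not the repo's code).

    Each head h applies rotation angle
        freq_scale[h] * theta_i * pos + phase[h]
    to dimension pair i at position pos.
    """

    def __init__(self, num_heads: int, head_dim: int, base: float = 10000.0):
        super().__init__()
        half = head_dim // 2
        # Standard RoPE inverse frequencies theta_i = base^(-i/half)
        inv_freq = base ** (-torch.arange(half, dtype=torch.float32) / half)
        self.register_buffer("inv_freq", inv_freq)
        # Learnable per-head parameters, initialized to baseline RoPE:
        # exp(0) = 1 frequency scale, 0 phase offset.
        self.log_freq_scale = nn.Parameter(torch.zeros(num_heads))
        self.phase = nn.Parameter(torch.zeros(num_heads))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, heads, seq, head_dim)
        b, h, s, d = x.shape
        pos = torch.arange(s, dtype=x.dtype, device=x.device)
        scale = self.log_freq_scale.exp()  # positive per-head scale, shape (h,)
        # Angles per head/position/pair: (h, s, d/2)
        ang = scale[:, None, None] * pos[None, :, None] * self.inv_freq[None, None, :]
        ang = ang + self.phase[:, None, None]
        cos, sin = ang.cos(), ang.sin()
        x1, x2 = x[..., : d // 2], x[..., d // 2 :]
        # Rotate each (x1, x2) pair by its angle
        return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```

Parameterizing the frequency scale as `exp(log_freq_scale)` keeps it positive; at initialization the module is an exact baseline RoPE, so adaptivity is learned as a deviation from the standard rotation schedule.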

Key Features

  • Baseline RoPE implementation
  • Head-wise adaptive RoPE with learnable scales and phases
  • Synthetic copy and associative recall tasks
  • Mechanistic analysis tools (attention patching, head scale analysis)
  • Evaluation on context lengths from 2k to 16k
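To make the synthetic evaluation concrete, here is a minimal sketch of an associative recall batch generator. It is an assumed illustration, not the repository's `tasks.py`: the model sees interleaved key-value pairs followed by a query key, and must output the value paired with that key.

```python
import torch


def associative_recall_batch(batch: int, num_pairs: int, vocab: int, seed: int = 0):
    """Sketch of an associative recall task (illustrative, not the repo's tasks.py).

    Input sequence: k1 v1 k2 v2 ... kN vN q   (q is one of the keys)
    Label: the value that was paired with q.
    """
    g = torch.Generator().manual_seed(seed)
    # Distinct keys per example so the query is unambiguous.
    keys = torch.stack(
        [torch.randperm(vocab, generator=g)[:num_pairs] for _ in range(batch)]
    )
    vals = torch.randint(0, vocab, (batch, num_pairs), generator=g)
    # Interleave keys and values: (batch, num_pairs, 2) -> (batch, 2 * num_pairs)
    seq = torch.stack([keys, vals], dim=-1).reshape(batch, -1)
    # Pick one key per example as the query.
    q_idx = torch.randint(0, num_pairs, (batch,), generator=g)
    query = keys.gather(1, q_idx[:, None])
    inputs = torch.cat([seq, query], dim=1)  # (batch, 2 * num_pairs + 1)
    labels = vals.gather(1, q_idx[:, None]).squeeze(1)  # (batch,)
    return inputs, labels
```

Stretching `num_pairs` so the sequence fills a 2k-16k context turns this into a direct probe of long-range recall: the query key may refer to a pair seen thousands of tokens earlier.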

Installation

pip install -r requirements.txt

Usage

python train.py --model_size 4 --context_length 2048 --test_length 16384

Project Structure

  • rope.py - RoPE implementations (baseline and adaptive)
  • model.py - Transformer model with adaptive RoPE
  • tasks.py - Synthetic copy and associative recall tasks
  • train.py - Training and evaluation script
  • analysis.py - Head analysis and attention patching tools
  • experiments.py - Experiment configurations and comparisons
