Author: gadwant
This repository contains the implementation for the paper "Subspace Collisions in Knowledge Editing: Orthogonal Low-Rank Updates for Scalable, Stable Model Edits".
This research addresses the instability of existing low-rank knowledge editing methods (like ROME and MEMIT) when scaled to hundreds or thousands of edits. We identify "subspace collisions"—overlapping update directions in the model's representation space—as a primary cause of this instability.
The code provided here implements Orthogonal Low-Rank Editing, a novel approach that:
- Enforces Orthogonality: Ensures new knowledge updates are geometrically separated from existing ones.
- Preserves Stability: Maintains low condition numbers and high effective rank even as the number of edits scales.
- Scales Effectively: Demonstrates robust performance from 1 to 50+ edits where naive methods fail.
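The orthogonality idea above can be sketched in a few lines. This is an illustrative example (not the repo's API): it uses a QR decomposition to replace a set of raw update directions with an orthonormal set spanning the same subspace, so that no two edits share a direction.

```python
# Hypothetical sketch of orthogonalizing edit update directions via QR.
# Function and variable names are illustrative, not the repository's API.
import numpy as np

def orthogonalize_updates(directions):
    """Return an orthonormal set of update directions spanning the
    same subspace as the inputs, eliminating pairwise overlap."""
    D = np.stack(directions, axis=1)   # d x k matrix, one direction per column
    Q, _ = np.linalg.qr(D)             # orthonormal basis for the span
    return [Q[:, i] for i in range(Q.shape[1])]

rng = np.random.default_rng(0)
raw = [rng.normal(size=64) for _ in range(5)]
ortho = orthogonalize_updates(raw)

# Pairwise inner products are now ~0 off the diagonal: no subspace collisions.
gram = np.array([[abs(u @ v) for v in ortho] for u in ortho])
print(np.allclose(gram, np.eye(5), atol=1e-8))
```

The actual implementation applies the same principle to the update directions produced by the base editor before writing them into the weights.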
```
pip install -r requirements.txt
```

```
code/
├── utils/
│   ├── orthogonal_editing.py   # Core orthogonal editing implementation
│   └── evaluation.py           # Evaluation metrics
├── scripts/
│   └── run_experiments.py      # Main experiment script
├── data/                       # Dataset storage
├── models/                     # Model checkpoints
├── notebooks/                  # Jupyter notebooks for analysis
└── requirements.txt            # Python dependencies
```
```
python scripts/run_experiments.py \
    --model_name "EleutherAI/pythia-70m" \
    --dataset counterfact \
    --dataset_path data/counterfact.json \
    --output_dir results \
    --scales 1 3 5 10 25 50 \
    --use_orthogonal \
    --device cpu
```

The main class for applying orthogonal edits:
```python
from utils.orthogonal_editing import OrthogonalLowRankEditor, Edit

editor = OrthogonalLowRankEditor(model, tokenizer, use_qr=True, device="cpu")

# Apply a single edit
edit = Edit(
    subject="Paris",
    relation="capital of",
    old_object="France",
    new_object="Germany",
    layer_idx=6,
)

# Returns u (update direction) and v (projection)
u, v = editor.apply_edit(edit)

# Apply multiple edits (orthogonalization is handled automatically)
edits = [edit1, edit2, ...]
updates = editor.apply_edits_batch(edits)

# Apply to model weights
editor.apply_updates_to_model(updates)
```

Evaluating an edit:

```python
from utils.evaluation import KnowledgeEditingEvaluator

evaluator = KnowledgeEditingEvaluator(model, tokenizer, device="cpu")
result = evaluator.evaluate_edit(
    subject="Paris",
    relation="capital of",
    old_object="France",
    new_object="Germany",
    unrelated_facts=[...],
    paraphrases=[...],
)
```

Download from: CounterFact Dataset (ROME website)
Download from: zsRE Dataset (ROME website)
The paper experiments include:
- Scaling Analysis: Testing edit performance from 1 to 50 edits.
- Baseline Comparison: Comparing against ROME, MEMIT, and naive sequential editing.
- Geometric Analysis: Measuring condition number, interference index, and effective rank.
- Robustness Testing: Testing order invariance and noise robustness.
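The geometric quantities listed above can be computed from a matrix whose columns are the edit update directions. The sketch below is illustrative (not the repo's exact evaluation code) and uses standard definitions: condition number from the singular values, an entropy-based effective rank, and mean absolute pairwise cosine similarity as an interference index.

```python
# Illustrative computation of the geometric diagnostics measured in the
# paper. Names and exact definitions are assumptions, not the repo's API.
import numpy as np

def geometry_stats(U):
    """U: d x k matrix whose columns are edit update directions."""
    s = np.linalg.svd(U, compute_uv=False)
    cond = s[0] / s[-1]                        # condition number
    p = s / s.sum()
    eff_rank = np.exp(-(p * np.log(p)).sum())  # entropy-based effective rank
    # Interference index: mean |cosine| between distinct directions
    Un = U / np.linalg.norm(U, axis=0, keepdims=True)
    G = np.abs(Un.T @ Un)
    k = U.shape[1]
    interference = (G.sum() - k) / (k * (k - 1))
    return cond, eff_rank, interference

rng = np.random.default_rng(1)
U = rng.normal(size=(128, 10))
cond, eff_rank, interference = geometry_stats(U)
```

Orthogonalized updates drive the interference index toward zero and keep the condition number and effective rank stable as edits accumulate.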
- This implementation focuses on the geometric analysis of edit interactions.
- Designed for use with Pythia and GPT-style models.
- Uses `SimpleROME` (gradient-based rank-1 updates) as the base editor signal.
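For reference, a ROME-style rank-1 weight update can be sketched as follows. This is a minimal illustration of the general mechanism, not the repo's `SimpleROME` code: it chooses a rank-1 correction so that a given key vector maps exactly to a desired new value.

```python
# Minimal sketch of a rank-1 knowledge edit: W is treated as a key-to-value
# map and receives a rank-1 correction. Illustrative only.
import numpy as np

d_in, d_out = 32, 48
rng = np.random.default_rng(2)
W = rng.normal(size=(d_out, d_in))

k = rng.normal(size=d_in)        # key: representation of the edited subject
v_new = rng.normal(size=d_out)   # desired new value for that key

# Choose u so that (W + u k^T / (k . k)) k = v_new exactly.
u = (v_new - W @ k) / (k @ k)
W_edited = W + np.outer(u, k)

print(np.allclose(W_edited @ k, v_new))  # True: the key now maps to v_new
```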
gadwant
- Initial implementation and experiments.
This project is licensed under the MIT License - see the LICENSE file for details.