Implement "NorMuon" Optimizer (Spectral Balancing) #10

@vukrosic

Difficulty: Hard

Description:
We should investigate replacing the standard Muon optimizer with NorMuon. This variant modifies the Newton-Schulz iteration steps to effectively balance the spectral norm of the weight matrices. This has shown significant convergence speedups in later speedrun records.

Task:

  1. Port the NorMuon optimization logic (specifically the update step and axis handling).
  2. (optional) Run a baseline training run with standard Muon vs. NorMuon.
  3. (optional) Perform a hyperparameter sweep on the learning rate, as NorMuon often requires different tuning than standard Muon.
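
As a starting point for task 1, here is a minimal NumPy sketch of one NorMuon-style update for a single 2D weight matrix. It follows my reading of the published NorMuon description and should be checked against a reference implementation before porting: as I understand it, the Newton-Schulz iteration itself is the same as in Muon, and the balancing comes from a per-neuron second-moment normalization applied to the orthogonalized update. The names `newton_schulz` and `normuon_step`, the `state` dict, the default hyperparameters, and the `axis` convention (weights stored as `(out_features, in_features)`) are all assumptions made for illustration, not part of this repository. Weight decay is left out.

```python
import numpy as np


def newton_schulz(G, steps=5, eps=1e-7):
    """Approximately orthogonalize G with the quintic Newton-Schulz
    iteration used by Muon (coefficients from the public Muon code)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + eps)  # Frobenius norm bounds the spectral norm
    transposed = X.shape[0] > X.shape[1]
    if transposed:  # iterate on the wide orientation so X @ X.T is small
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X
    return X.T if transposed else X


def normuon_step(W, G, state, lr=0.02, beta1=0.95, beta2=0.95,
                 eps=1e-8, axis=1):
    """One NorMuon-style step (sketch, assumed hyperparameters).

    axis is the dimension reduced when computing per-neuron statistics;
    axis=1 gives one statistic per row, i.e. per output neuron when W has
    shape (out_features, in_features).
    """
    # First-moment (momentum) buffer, as in Muon.
    m = state.setdefault("momentum", np.zeros_like(W))
    m[...] = beta1 * m + (1 - beta1) * G

    # Orthogonalize the momentum.
    O = newton_schulz(m)

    # Per-neuron running mean of squared entries of the orthogonalized update.
    row_ms = np.mean(O * O, axis=axis, keepdims=True)
    v = state.setdefault("second_moment", np.zeros_like(row_ms))
    v[...] = beta2 * v + (1 - beta2) * row_ms

    # Normalize each neuron's update by its own running scale.
    O_hat = O / (np.sqrt(v) + eps)

    # Rescale so the update has RMS 0.2 * lr, whatever the matrix shape.
    scale = 0.2 * np.sqrt(O_hat.size) / (np.linalg.norm(O_hat) + eps)
    return W - lr * scale * O_hat
```

Calling `normuon_step(W, G, state)` repeatedly with the same `state` dict carries the momentum and second-moment buffers between steps. Because the final rescaling fixes the update RMS at `0.2 * lr`, the effective step size differs from a plain Muon update, which is one reason the learning-rate sweep in task 3 is worth running. A PyTorch port would keep the same structure, with the buffers stored in the optimizer's per-parameter state.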

References:


🛠️ General Instructions

1. Environment Setup
To get started with development, follow these steps:

  1. Fork this repository - Click the "Fork" button at the top right.
  2. Clone your fork:
    git clone FORK_URL_HERE
    cd 5-dollar-llm
    (You may also clone it with our coding IDE)

Note: If you have already forked or cloned the repository, sync your fork with this repo and pull the latest changes to your local copy before starting. We make frequent changes.

  3. Install dependencies:
    pip install -r requirements.txt

2. Write your code

3. Verification & Testing

  • Debug Mode: To quickly check if your code runs without errors (on CPU or GPU), use the debug script:
    python debug_moe.py
  • Performance Test (optional): We will run the experiments anyway, but you may also run them yourself (specify a new experiment name so you don't overwrite the baseline):
    python train_moe.py --experiment_name amp_speed_test
    (Note: This will use the GPU24GBMoEModelConfig by default)

4. Submission
Once finished, please create a Pull Request into the development branch, and preferably notify us on Discord as well.

No experiment is guaranteed to yield an improvement; however, you will be credited for your work in any case.
