
Conversation

@kylesayrs
Contributor

@kylesayrs kylesayrs commented Feb 20, 2025

Purpose

  • Reduce runtime when loading missing keys with `low_cpu_mem_usage`
    • This is particularly helpful when loading large models

Changes

  • Do not eagerly convert all keys into a list. Instead, check membership against the dict directly to preserve O(1) lookups within the for loop (a minimal sketch of the pattern follows)
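
A minimal sketch of the pattern (illustrative only, not the actual transformers code; `expected_keys` and `loaded_keys` are hypothetical stand-ins):

    # toy stand-ins for the model's expected keys and the checkpoint's loaded keys
    loaded_keys = {f"layer.{i}.weight" for i in range(100_000)}
    expected_keys = [f"layer.{i}.weight" for i in range(100_000)]

    # before: keys materialized into a list, so each `in` test is a linear scan (O(n) per lookup)
    loaded_list = list(loaded_keys)
    missing = [k for k in expected_keys if k not in loaded_list]

    # after: membership checked directly on the set/dict is a hash lookup (O(1) per lookup)
    missing = [k for k in expected_keys if k not in loaded_keys]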

Testing

  • The script below can be used to load a large model without downloading its weights, for testing
    • Without these changes, loading deepseek_v3 takes 229.1s
    • With these changes, loading deepseek_v3 takes 64.5s (~4x speedup)
load_without_weights_for_testing.py
import os
import torch
import tempfile
import contextlib
from huggingface_hub import snapshot_download
from transformers.utils import SAFE_WEIGHTS_INDEX_NAME, WEIGHTS_INDEX_NAME
from safetensors.torch import save_file

import time
from transformers import AutoModelForCausalLM, AutoConfig, PreTrainedModel
from accelerate import init_empty_weights

## Define utils

@contextlib.contextmanager
def skip_weights_download(model_class: PreTrainedModel = AutoModelForCausalLM):
    """
    Context manager under which models are initialized without having to download
    the model weight files

    :param model_class: model class whose `from_pretrained` is patched, defaults to `AutoModelForCausalLM`
    """
    original_fn = model_class.from_pretrained
    weights_files = [
        "*.bin", "*.safetensors", "*.pth", SAFE_WEIGHTS_INDEX_NAME, WEIGHTS_INDEX_NAME
    ]

    @classmethod
    def patched(cls, *args, **kwargs):
        nonlocal tmp_dir

        # intercept model stub
        model_stub = args[0] if args else kwargs.pop("pretrained_model_name_or_path")

        # download files into tmp dir
        os.makedirs(tmp_dir, exist_ok=True)
        snapshot_download(
            repo_id=model_stub,
            local_dir=tmp_dir,
            ignore_patterns=weights_files
        )

        # make an empty weights file to avoid errors
        weights_file_path = os.path.join(tmp_dir, "model.safetensors")
        save_file({}, weights_file_path, metadata={"format": "pt"})

        # load from tmp dir
        return original_fn(tmp_dir, **kwargs)
    
    with tempfile.TemporaryDirectory() as tmp_dir:
        model_class.from_pretrained = patched
        try:
            yield
        finally:
            model_class.from_pretrained = original_fn


@contextlib.contextmanager
def skip_weights_initialize():
    def skip(tensor: torch.Tensor, *args, **kwargs) -> torch.Tensor:
        return tensor

    # save the original initializers so they can be restored on exit
    kaiming_restore = torch.nn.init.kaiming_uniform_
    uniform_restore = torch.nn.init.uniform_
    normal_restore = torch.nn.init.normal_

    t_uniform_restore = torch.Tensor.uniform_
    t_normal_restore = torch.Tensor.normal_

    # replace the initializers with a no-op that returns the tensor unchanged
    torch.nn.init.kaiming_uniform_ = skip
    torch.nn.init.uniform_ = skip
    torch.nn.init.normal_ = skip

    torch.Tensor.uniform_ = skip
    torch.Tensor.normal_ = skip
    try:
        yield
    finally:
        torch.nn.init.kaiming_uniform_ = kaiming_restore
        torch.nn.init.uniform_ = uniform_restore
        torch.nn.init.normal_ = normal_restore

        torch.Tensor.uniform_ = t_uniform_restore
        torch.Tensor.normal_ = t_normal_restore


## Load model
model_name = "deepseek-ai/DeepSeek-V3"

# needed for deepseek: remove the quantization config so the checkpoint loads without quantization support
config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
del config.quantization_config

# load model
start = time.time()
with skip_weights_download(), skip_weights_initialize():
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        config=config,
        device_map="auto",
        trust_remote_code=True,
    )
    print(f"Loaded model in {time.time() - start:.1f}s")

Signed-off-by: Kyle Sayers <[email protected]>
@kylesayrs kylesayrs changed the title Reduce runtime when loading missing keys [Modeling] Reduce runtime when loading missing keys Feb 20, 2025
Member

@Rocketknight1 Rocketknight1 left a comment


This seems like an obviously correct change, and a clear improvement!

Approving for now, but I see you're pushing additional commits, so ping me whenever it's ready for final review + merging.

@kylesayrs
Contributor Author

@Rocketknight1 All good on my end, thanks!

@Rocketknight1 Rocketknight1 merged commit 05dfed0 into huggingface:main Feb 24, 2025
21 checks passed
@Rocketknight1
Member

Merged, and thank you for the improvement @kylesayrs!

@kylesayrs kylesayrs deleted the kylesayrs/low-memory-load-optimization branch March 7, 2025 20:40