@faaany faaany commented Feb 7, 2025

What does this PR do?

While running the DeepSpeed example code, I got the following error, which indicates that the value of stage3_prefetch_bucket_size in the DeepSpeed config is invalid. The value passed is 3774873.6, a float, but the field must be an integer. This PR fixes that.

[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/sdp/fanli/doc_to_fix.py", line 112, in <module>
[rank0]:     model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
[rank0]:   File "/home/sdp/fanli/transformers/src/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
[rank0]:     return model_class.from_pretrained(
[rank0]:   File "/home/sdp/fanli/transformers/src/transformers/modeling_utils.py", line 261, in _wrapper
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/home/sdp/fanli/transformers/src/transformers/modeling_utils.py", line 4144, in from_pretrained
[rank0]:     deepspeed.zero.Init(config_dict_or_path=deepspeed_config()),
[rank0]:   File "/home/sdp/miniforge3/envs/ipex-ww04/lib/python3.10/site-packages/deepspeed-0.16.3+66d3d3e94-py3.10.egg/deepspeed/runtime/zero/partition_parameters.py", line 949, in __init__
[rank0]:     _ds_config = deepspeed.runtime.config.DeepSpeedConfig(config_dict_or_path,
[rank0]:   File "/home/sdp/miniforge3/envs/ipex-ww04/lib/python3.10/site-packages/deepspeed-0.16.3+66d3d3e94-py3.10.egg/deepspeed/runtime/config.py", line 797, in __init__
[rank0]:     self._initialize_params(copy.copy(self._param_dict))
[rank0]:   File "/home/sdp/miniforge3/envs/ipex-ww04/lib/python3.10/site-packages/deepspeed-0.16.3+66d3d3e94-py3.10.egg/deepspeed/runtime/config.py", line 817, in _initialize_params
[rank0]:     self.zero_config = get_zero_config(param_dict)
[rank0]:   File "/home/sdp/miniforge3/envs/ipex-ww04/lib/python3.10/site-packages/deepspeed-0.16.3+66d3d3e94-py3.10.egg/deepspeed/runtime/zero/config.py", line 73, in get_zero_config
[rank0]:     return DeepSpeedZeroConfig(**zero_config_dict)
[rank0]:   File "/home/sdp/miniforge3/envs/ipex-ww04/lib/python3.10/site-packages/deepspeed-0.16.3+66d3d3e94-py3.10.egg/deepspeed/runtime/config_utils.py", line 57, in __init__
[rank0]:     super().__init__(**data)
[rank0]:   File "/home/sdp/.local/lib/python3.10/site-packages/pydantic/main.py", line 214, in __init__
[rank0]:     validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
[rank0]: pydantic_core._pydantic_core.ValidationError: 1 validation error for DeepSpeedZeroConfig
[rank0]: stage3_prefetch_bucket_size
[rank0]:   Input should be a valid integer, got a number with a fractional part [type=int_from_float, input_value=3774873.6, input_type=float]
[rank0]:     For further information visit https://errors.pydantic.dev/2.10/v/int_from_float
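
For illustration, here is a minimal sketch of the kind of change that avoids this error. It is not the actual diff of this PR. It assumes the value is derived as 0.9 * hidden_size * hidden_size with a hidden size of 2048, which happens to reproduce the 3774873.6 in the traceback; the helper name build_zero3_config is hypothetical.

```python
def build_zero3_config(model_hidden_size: int) -> dict:
    """Build a ZeRO stage 3 config fragment with an integer prefetch bucket size.

    Hypothetical helper for illustration; only the config key name comes
    from the error message above.
    """
    hidden_sq = model_hidden_size * model_hidden_size
    return {
        "zero_optimization": {
            "stage": 3,
            # 0.9 * hidden_sq is a float (3774873.6 for a hidden size of 2048).
            # int() truncates it so integer validation accepts the value.
            "stage3_prefetch_bucket_size": int(0.9 * hidden_sq),
        }
    }


# Assumed hidden size; 0.9 * 2048 * 2048 == 3774873.6
config = build_zero3_config(2048)
prefetch = config["zero_optimization"]["stage3_prefetch_bucket_size"]
print(type(prefetch).__name__, prefetch)  # int 3774873
```

Truncating with int() drops the fractional part rather than rounding, which seems harmless for a bucket size of this magnitude.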

Documentation: @stevhliu

@stevhliu stevhliu left a comment

Thanks! Also pinging @muellerzr for a quick look :)

@stevhliu stevhliu requested a review from muellerzr February 7, 2025 20:04
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

faaany commented Feb 26, 2025

Hi @muellerzr, could you please take a look? Thanks!

@stevhliu stevhliu merged commit 51083d1 into huggingface:main Feb 28, 2025
10 checks passed
garrett361 pushed a commit to garrett361/transformers that referenced this pull request Mar 4, 2025
5 participants