I am using lm-evaluation-harness to evaluate the Qwen3-1.7B model on the ifeval task, but the run fails with the following error:
```
# ./run.sh
2025-09-04:11:04:01 INFO [__main__:446] Selected Tasks: ['ifeval']
2025-09-04:11:04:01 INFO [evaluator:202] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
2025-09-04:11:04:01 INFO [evaluator:240] Initializing hf model, with arguments: {'pretrained': 'models/Qwen3-1.7B', 'dtype': 'auto'}
2025-09-04:11:04:01 WARNING [accelerate.utils.other:512] Detected kernel version 4.18.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
2025-09-04:11:04:01 INFO [models.huggingface:147] Using device 'cuda:2'
2025-09-04:11:04:01 INFO [models.huggingface:414] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda:2'}
`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 2.05it/s]
[nltk_data] Error loading punkt_tab: <urlopen error [Errno 110]
[nltk_data]     Connection timed out>
Downloaded punkt_tab on rank 0
Traceback (most recent call last):
  File "/usr/local/bin/lm_eval", line 8, in <module>
    sys.exit(cli_evaluate())
             ^^^^^^^^^^^^^^
  File "/workspace/lm/lm-evaluation-harness-main/lm_eval/__main__.py", line 455, in cli_evaluate
    results = evaluator.simple_evaluate(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/lm/lm-evaluation-harness-main/lm_eval/utils.py", line 456, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/workspace/lm/lm-evaluation-harness-main/lm_eval/evaluator.py", line 283, in simple_evaluate
    task_dict = get_task_dict(
                ^^^^^^^^^^^^^^
  File "/workspace/lm/lm-evaluation-harness-main/lm_eval/tasks/__init__.py", line 635, in get_task_dict
    task_name_from_string_dict = task_manager.load_task_or_group(
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/lm/lm-evaluation-harness-main/lm_eval/tasks/__init__.py", line 426, in load_task_or_group
    collections.ChainMap(
  File "/workspace/lm/lm-evaluation-harness-main/lm_eval/tasks/__init__.py", line 428, in <lambda>
    lambda task: self._load_individual_task_or_group(task),
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/lm/lm-evaluation-harness-main/lm_eval/tasks/__init__.py", line 326, in _load_individual_task_or_group
    return _load_task(task_config, task=name_or_config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/lm/lm-evaluation-harness-main/lm_eval/tasks/__init__.py", line 286, in _load_task
    task_object = ConfigurableTask(config=config)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/lm/lm-evaluation-harness-main/lm_eval/api/task.py", line 865, in __init__
    self.download(self.config.dataset_kwargs)
  File "/workspace/lm/lm-evaluation-harness-main/lm_eval/api/task.py", line 997, in download
    self.dataset = datasets.load_dataset(
                   ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/datasets/load.py", line 2062, in load_dataset
    builder_instance = load_dataset_builder(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/datasets/load.py", line 1782, in load_dataset_builder
    dataset_module = dataset_module_factory(
                     ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/datasets/load.py", line 1519, in dataset_module_factory
    ).get_module()
      ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/datasets/load.py", line 861, in get_module
    config_name: DatasetInfo.from_dict(dataset_info_dict)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/datasets/info.py", line 284, in from_dict
    return cls(**{k: v for k, v in dataset_info_dict.items() if k in field_names})
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 20, in __init__
  File "/usr/local/lib/python3.12/dist-packages/datasets/info.py", line 170, in __post_init__
    self.features = Features.from_dict(self.features)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/datasets/features/features.py", line 1888, in from_dict
    obj = generate_from_dict(dic)
          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/datasets/features/features.py", line 1468, in generate_from_dict
    return {key: generate_from_dict(value) for key, value in obj.items()}
                 ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/datasets/features/features.py", line 1484, in generate_from_dict
    return class_type(**{k: v for k, v in obj.items() if k in field_names})
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Value.__init__() missing 1 required positional argument: 'dtype'
```
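The final `TypeError` comes from `datasets` rebuilding feature objects from the dataset's stored metadata. Below is a minimal stand-alone sketch of that mechanism, using a stand-in dataclass rather than the real `datasets.features.Value`, under the assumption that the metadata was written in a schema the installed `datasets` release does not recognize:

```python
# Sketch of the failure mode (assumption: a schema/version mismatch).
# datasets' generate_from_dict reconstructs feature objects by passing
# only the keys it recognizes to the class constructor; if the stored
# metadata does not carry a "dtype" key in the expected form, the
# required argument is silently dropped and __init__ raises TypeError.
from dataclasses import dataclass, fields


@dataclass
class Value:  # stand-in for datasets.features.Value
    dtype: str


def generate_from_dict(obj: dict, cls) -> object:
    field_names = {f.name for f in fields(cls)}
    # unrecognized keys are filtered out before calling the constructor
    return cls(**{k: v for k, v in obj.items() if k in field_names})


# well-formed metadata reconstructs fine
ok = generate_from_dict({"dtype": "string", "_type": "Value"}, Value)
print(ok)  # Value(dtype='string')

# metadata lacking a recognizable "dtype" key reproduces the traceback
try:
    generate_from_dict({"_type": "Value"}, Value)
except TypeError as e:
    print(e)
```

If that assumption holds, the installed `datasets` release and the feature metadata fetched from the Hub (or sitting in the local cache) disagree, which is why the `datasets` version is worth attaching to this report.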
run.sh:

```shell
#!/bin/bash
lm_eval --model hf \
    --model_args pretrained=models/Qwen3-1.7B,dtype="auto" \
    --tasks ifeval \
    --device cuda:2 \
    --batch_size auto:4
```
Python version: 3.12.3
PyTorch version: 2.6.0+cu124
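Since the failure happens inside the `datasets` package, its version is probably the most relevant detail not yet listed above. A quick stdlib-only way to collect it (the package names queried are assumptions about what is installed in this environment):

```python
# Collect versions of the packages implicated in the traceback using
# only the standard library; missing packages are reported rather than
# raising, so this is safe to paste into any environment.
import importlib.metadata as md

versions = {}
for pkg in ("datasets", "transformers", "lm_eval"):
    try:
        versions[pkg] = md.version(pkg)
    except md.PackageNotFoundError:
        versions[pkg] = "not installed"
print(versions)
```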