vLLM 0.6.6.post1: There is no module or parameter named 'multi_modal_projector.cross_attn.layer_norm_kv' in AriaForConditionalGeneration #99

Description

@dcrockwell

Using vLLM 0.6.6.post1, loading rhymes-ai/Aria fails: the checkpoint contains a weight named multi_modal_projector.cross_attn.layer_norm_kv, but vLLM's AriaForConditionalGeneration has no module or parameter with that name, so weight loading aborts with a ValueError on every worker.
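
A minimal script along the following lines reproduces it. This is a reconstruction from the engine config in the logs below (model, dtype, tensor_parallel_size, and max_model_len all match the logged config); my actual server.py differs only in details.

    from vllm import LLM

    def main():
        # Engine arguments taken from the config line in the logs below;
        # trust_remote_code is required for rhymes-ai/Aria.
        llm = LLM(
            model="rhymes-ai/Aria",
            tokenizer_mode="slow",
            trust_remote_code=True,
            dtype="bfloat16",
            tensor_parallel_size=4,
            max_model_len=65536,  # matches max_seq_len in the logged config
        )
        # The crash happens inside LLM(...) while loading weights,
        # before any generation is attempted.
        print(llm.generate("Hello"))

    if __name__ == "__main__":
        main()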

Below are the logs from the attempt:

INFO 01-23 16:15:25 config.py:2272] Downcasting torch.float32 to torch.bfloat16.
INFO 01-23 16:15:31 config.py:510] This model supports multiple tasks: {'score', 'classify', 'embed', 'generate', 'reward'}. Defaulting to 'generate'.
INFO 01-23 16:15:31 config.py:1310] Defaulting to use mp for distributed inference
WARNING 01-23 16:15:31 arg_utils.py:1113] The model has a long context length (65536). This may cause OOM errors during the initial memory profiling phase, or result in low performance due to small KV cache space. Consider setting --max-model-len to a smaller value.
INFO 01-23 16:15:31 llm_engine.py:234] Initializing an LLM engine (v0.6.6.post1) with config: model='rhymes-ai/Aria', speculative_config=None, tokenizer='rhymes-ai/Aria', skip_tokenizer_init=False, tokenizer_mode=slow, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=65536, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=4, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=rhymes-ai/Aria, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"candidate_compile_sizes":[],"compile_sizes":[],"capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False, 
WARNING 01-23 16:15:31 tokenizer.py:174] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
WARNING 01-23 16:15:31 multiproc_worker_utils.py:312] Reducing Torch parallelism from 24 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
INFO 01-23 16:15:31 custom_cache_manager.py:17] Setting Triton cache manager to: vllm.triton_utils.custom_cache_manager:CustomCacheManager
INFO 01-23 16:15:32 selector.py:120] Using Flash Attention backend.
(VllmWorkerProcess pid=9457) INFO 01-23 16:15:36 selector.py:120] Using Flash Attention backend.
(VllmWorkerProcess pid=9457) INFO 01-23 16:15:36 multiproc_worker_utils.py:222] Worker ready; awaiting tasks
(VllmWorkerProcess pid=9458) INFO 01-23 16:15:36 selector.py:120] Using Flash Attention backend.
(VllmWorkerProcess pid=9458) INFO 01-23 16:15:36 multiproc_worker_utils.py:222] Worker ready; awaiting tasks
(VllmWorkerProcess pid=9459) INFO 01-23 16:15:36 selector.py:120] Using Flash Attention backend.
(VllmWorkerProcess pid=9459) INFO 01-23 16:15:36 multiproc_worker_utils.py:222] Worker ready; awaiting tasks
INFO 01-23 16:15:37 utils.py:918] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=9457) INFO 01-23 16:15:37 utils.py:918] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=9458) INFO 01-23 16:15:37 utils.py:918] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=9459) INFO 01-23 16:15:37 utils.py:918] Found nccl from library libnccl.so.2
INFO 01-23 16:15:37 pynccl.py:69] vLLM is using nccl==2.22.3
(VllmWorkerProcess pid=9457) INFO 01-23 16:15:37 pynccl.py:69] vLLM is using nccl==2.22.3
(VllmWorkerProcess pid=9458) INFO 01-23 16:15:37 pynccl.py:69] vLLM is using nccl==2.22.3
(VllmWorkerProcess pid=9459) INFO 01-23 16:15:37 pynccl.py:69] vLLM is using nccl==2.22.3
(VllmWorkerProcess pid=9457) WARNING 01-23 16:15:37 custom_all_reduce.py:134] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorkerProcess pid=9458) WARNING 01-23 16:15:37 custom_all_reduce.py:134] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorkerProcess pid=9459) WARNING 01-23 16:15:37 custom_all_reduce.py:134] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
WARNING 01-23 16:15:37 custom_all_reduce.py:134] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
INFO 01-23 16:15:37 shm_broadcast.py:255] vLLM message queue communication handle: Handle(connect_ip='127.0.0.1', local_reader_ranks=[1, 2, 3], buffer_handle=(3, 4194304, 6, 'psm_85a2fc8d'), local_subscribe_port=46095, remote_subscribe_port=None)
INFO 01-23 16:15:37 model_runner.py:1094] Starting to load model rhymes-ai/Aria...
(VllmWorkerProcess pid=9458) INFO 01-23 16:15:37 model_runner.py:1094] Starting to load model rhymes-ai/Aria...
(VllmWorkerProcess pid=9457) INFO 01-23 16:15:37 model_runner.py:1094] Starting to load model rhymes-ai/Aria...
(VllmWorkerProcess pid=9459) INFO 01-23 16:15:37 model_runner.py:1094] Starting to load model rhymes-ai/Aria...
(VllmWorkerProcess pid=9457) INFO 01-23 16:15:37 selector.py:249] Cannot use FlashAttention-2 backend for head size 72.
(VllmWorkerProcess pid=9459) INFO 01-23 16:15:37 selector.py:249] Cannot use FlashAttention-2 backend for head size 72.
(VllmWorkerProcess pid=9458) INFO 01-23 16:15:37 selector.py:249] Cannot use FlashAttention-2 backend for head size 72.
(VllmWorkerProcess pid=9457) INFO 01-23 16:15:37 selector.py:129] Using XFormers backend.
(VllmWorkerProcess pid=9459) INFO 01-23 16:15:37 selector.py:129] Using XFormers backend.
(VllmWorkerProcess pid=9458) INFO 01-23 16:15:37 selector.py:129] Using XFormers backend.
INFO 01-23 16:15:37 selector.py:249] Cannot use FlashAttention-2 backend for head size 72.
INFO 01-23 16:15:37 selector.py:129] Using XFormers backend.
(VllmWorkerProcess pid=9458) INFO 01-23 16:15:38 weight_utils.py:251] Using model weights format ['*.safetensors']
(VllmWorkerProcess pid=9459) INFO 01-23 16:15:38 weight_utils.py:251] Using model weights format ['*.safetensors']
(VllmWorkerProcess pid=9457) INFO 01-23 16:15:38 weight_utils.py:251] Using model weights format ['*.safetensors']
INFO 01-23 16:15:38 weight_utils.py:251] Using model weights format ['*.safetensors']
Loading safetensors checkpoint shards:   0% Completed | 0/12 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:   8% Completed | 1/12 [00:01<00:19,  1.74s/it]
Loading safetensors checkpoint shards:  17% Completed | 2/12 [00:03<00:16,  1.64s/it]
Loading safetensors checkpoint shards:  25% Completed | 3/12 [00:05<00:15,  1.73s/it]
Loading safetensors checkpoint shards:  33% Completed | 4/12 [00:05<00:10,  1.35s/it]
Loading safetensors checkpoint shards:  42% Completed | 5/12 [00:07<00:10,  1.51s/it]
(VllmWorkerProcess pid=9458) ERROR 01-23 16:15:47 multiproc_worker_utils.py:236] Exception in worker VllmWorkerProcess while processing method load_model.
(VllmWorkerProcess pid=9458) ERROR 01-23 16:15:47 multiproc_worker_utils.py:236] Traceback (most recent call last):
(VllmWorkerProcess pid=9458) ERROR 01-23 16:15:47 multiproc_worker_utils.py:236]   File "/opt/conda/envs/pytorch/lib/python3.11/site-packages/vllm/executor/multiproc_worker_utils.py", line 230, in _run_worker_process
(VllmWorkerProcess pid=9458) ERROR 01-23 16:15:47 multiproc_worker_utils.py:236]     output = executor(*args, **kwargs)
(VllmWorkerProcess pid=9458) ERROR 01-23 16:15:47 multiproc_worker_utils.py:236]              ^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=9458) ERROR 01-23 16:15:47 multiproc_worker_utils.py:236]   File "/opt/conda/envs/pytorch/lib/python3.11/site-packages/vllm/worker/worker.py", line 155, in load_model
(VllmWorkerProcess pid=9458) ERROR 01-23 16:15:47 multiproc_worker_utils.py:236]     self.model_runner.load_model()
(VllmWorkerProcess pid=9458) ERROR 01-23 16:15:47 multiproc_worker_utils.py:236]   File "/opt/conda/envs/pytorch/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1096, in load_model
(VllmWorkerProcess pid=9458) ERROR 01-23 16:15:47 multiproc_worker_utils.py:236]     self.model = get_model(vllm_config=self.vllm_config)
(VllmWorkerProcess pid=9458) ERROR 01-23 16:15:47 multiproc_worker_utils.py:236]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=9458) ERROR 01-23 16:15:47 multiproc_worker_utils.py:236]   File "/opt/conda/envs/pytorch/lib/python3.11/site-packages/vllm/model_executor/model_loader/__init__.py", line 12, in get_model
(VllmWorkerProcess pid=9458) ERROR 01-23 16:15:47 multiproc_worker_utils.py:236]     return loader.load_model(vllm_config=vllm_config)
(VllmWorkerProcess pid=9458) ERROR 01-23 16:15:47 multiproc_worker_utils.py:236]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=9458) ERROR 01-23 16:15:47 multiproc_worker_utils.py:236]   File "/opt/conda/envs/pytorch/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 366, in load_model
(VllmWorkerProcess pid=9458) ERROR 01-23 16:15:47 multiproc_worker_utils.py:236]     loaded_weights = model.load_weights(
(VllmWorkerProcess pid=9458) ERROR 01-23 16:15:47 multiproc_worker_utils.py:236]                      ^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=9458) ERROR 01-23 16:15:47 multiproc_worker_utils.py:236]   File "/opt/conda/envs/pytorch/lib/python3.11/site-packages/vllm/model_executor/models/aria.py", line 676, in load_weights
(VllmWorkerProcess pid=9458) ERROR 01-23 16:15:47 multiproc_worker_utils.py:236]     loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
(VllmWorkerProcess pid=9458) ERROR 01-23 16:15:47 multiproc_worker_utils.py:236]   File "/opt/conda/envs/pytorch/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 237, in load_weights
(VllmWorkerProcess pid=9458) ERROR 01-23 16:15:47 multiproc_worker_utils.py:236]     autoloaded_weights = set(self._load_module("", self.module, weights))
(VllmWorkerProcess pid=9458) ERROR 01-23 16:15:47 multiproc_worker_utils.py:236]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=9458) ERROR 01-23 16:15:47 multiproc_worker_utils.py:236]   File "/opt/conda/envs/pytorch/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 198, in _load_module
(VllmWorkerProcess pid=9458) ERROR 01-23 16:15:47 multiproc_worker_utils.py:236]     yield from self._load_module(prefix,
(VllmWorkerProcess pid=9458) ERROR 01-23 16:15:47 multiproc_worker_utils.py:236]   File "/opt/conda/envs/pytorch/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 198, in _load_module
(VllmWorkerProcess pid=9458) ERROR 01-23 16:15:47 multiproc_worker_utils.py:236]     yield from self._load_module(prefix,
(VllmWorkerProcess pid=9458) ERROR 01-23 16:15:47 multiproc_worker_utils.py:236]   File "/opt/conda/envs/pytorch/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 226, in _load_module
(VllmWorkerProcess pid=9458) ERROR 01-23 16:15:47 multiproc_worker_utils.py:236]     raise ValueError(msg)
(VllmWorkerProcess pid=9458) ERROR 01-23 16:15:47 multiproc_worker_utils.py:236] ValueError: There is no module or parameter named 'multi_modal_projector.cross_attn.layer_norm_kv' in AriaForConditionalGeneration
(VllmWorkerProcess pid=9457 and pid=9459 logged the same traceback, ending in the same ValueError; duplicate lines omitted.)
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/ubuntu/server.py", line 58, in <module>
[rank0]:     main()
[rank0]:   File "/home/ubuntu/server.py", line 7, in main
[rank0]:     llm = LLM(
[rank0]:           ^^^^
[rank0]:   File "/opt/conda/envs/pytorch/lib/python3.11/site-packages/vllm/utils.py", line 986, in inner
[rank0]:     return fn(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/pytorch/lib/python3.11/site-packages/vllm/entrypoints/llm.py", line 230, in __init__
[rank0]:     self.llm_engine = self.engine_class.from_engine_args(
[rank0]:                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/pytorch/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 517, in from_engine_args
[rank0]:     engine = cls(
[rank0]:              ^^^^
[rank0]:   File "/opt/conda/envs/pytorch/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 273, in __init__
[rank0]:     self.model_executor = executor_class(vllm_config=vllm_config, )
[rank0]:                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/pytorch/lib/python3.11/site-packages/vllm/executor/distributed_gpu_executor.py", line 26, in __init__
[rank0]:     super().__init__(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/pytorch/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 36, in __init__
[rank0]:     self._init_executor()
[rank0]:   File "/opt/conda/envs/pytorch/lib/python3.11/site-packages/vllm/executor/multiproc_gpu_executor.py", line 83, in _init_executor
[rank0]:     self._run_workers("load_model",
[rank0]:   File "/opt/conda/envs/pytorch/lib/python3.11/site-packages/vllm/executor/multiproc_gpu_executor.py", line 157, in _run_workers
[rank0]:     driver_worker_output = driver_worker_method(*args, **kwargs)
[rank0]:                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/pytorch/lib/python3.11/site-packages/vllm/worker/worker.py", line 155, in load_model
[rank0]:     self.model_runner.load_model()
[rank0]:   File "/opt/conda/envs/pytorch/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1096, in load_model
[rank0]:     self.model = get_model(vllm_config=self.vllm_config)
[rank0]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/pytorch/lib/python3.11/site-packages/vllm/model_executor/model_loader/__init__.py", line 12, in get_model
[rank0]:     return loader.load_model(vllm_config=vllm_config)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/pytorch/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 366, in load_model
[rank0]:     loaded_weights = model.load_weights(
[rank0]:                      ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/pytorch/lib/python3.11/site-packages/vllm/model_executor/models/aria.py", line 676, in load_weights
[rank0]:     loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
[rank0]:   File "/opt/conda/envs/pytorch/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 237, in load_weights
[rank0]:     autoloaded_weights = set(self._load_module("", self.module, weights))
[rank0]:                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/pytorch/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 198, in _load_module
[rank0]:     yield from self._load_module(prefix,
[rank0]:   File "/opt/conda/envs/pytorch/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 198, in _load_module
[rank0]:     yield from self._load_module(prefix,
[rank0]:   File "/opt/conda/envs/pytorch/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 226, in _load_module
[rank0]:     raise ValueError(msg)
[rank0]: ValueError: There is no module or parameter named 'multi_modal_projector.cross_attn.layer_norm_kv' in AriaForConditionalGeneration
ERROR 01-23 16:15:48 multiproc_worker_utils.py:123] Worker VllmWorkerProcess pid 9459 died, exit code: -15
INFO 01-23 16:15:48 multiproc_worker_utils.py:127] Killing local vLLM worker processes
Loading safetensors checkpoint shards:  42% Completed | 5/12 [00:10<00:14,  2.01s/it]

[rank0]:[W123 16:15:49.575131960 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
/opt/conda/envs/pytorch/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 3 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
/opt/conda/envs/pytorch/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
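
For anyone triaging: the ValueError comes out of vLLM's weight autoloader (vllm/model_executor/models/utils.py in the tracebacks), which, after applying hf_to_vllm_mapper from aria.py, walks each checkpoint weight name down the module tree of the instantiated model and raises as soon as a path component matches neither a submodule nor a parameter. The sketch below is my simplified reading of that behavior, not the actual vLLM source; it produces the same error text when an intermediate module is missing.

    # Illustrative sketch only (assumption: a simplified stand-in for
    # the loader's _load_module walk, not the real vLLM implementation).
    import torch.nn as nn

    def resolve_weight(model: nn.Module, weight_name: str):
        """Walk weight_name down the module tree; raise the same way
        vLLM does when a component cannot be resolved."""
        module = model
        parts = weight_name.split(".")
        for i, part in enumerate(parts):
            children = dict(module.named_children())
            params = dict(module.named_parameters(recurse=False))
            if part in children:
                module = children[part]
            elif part in params:
                return params[part]
            else:
                raise ValueError(
                    f"There is no module or parameter named "
                    f"'{'.'.join(parts[:i + 1])}' in {type(model).__name__}")
        return module

    # A checkpoint key like 'multi_modal_projector.cross_attn.layer_norm_kv.weight'
    # walks into cross_attn, finds no 'layer_norm_kv' submodule in vLLM's
    # AriaForConditionalGeneration, and raises the error shown above.

If that reading is right, the Hub checkpoint and the AriaForConditionalGeneration module tree in 0.6.6.post1 disagree about this projector layer-norm name, so either the hf_to_vllm_mapper entries in vllm/model_executor/models/aria.py need a rename for it or the module needs to be added to the model code. Loading is otherwise healthy up to that point (it dies around shard 5-6 of 12). As a cross-check, loading the same repo through transformers with trust_remote_code=True should tell whether the mismatch is on the vLLM side or in the checkpoint itself.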
