Checklist / 检查清单
Bug Description / Bug 描述
It seems that PiSSA doesn't support multi-GPU setups. When training with PiSSA on multiple GPUs, the process hangs after the first checkpoint save; this happens even with plain SFT. The issue is reproducible across multiple versions of ms-swift.
How to Reproduce / 如何复现
I followed https://github.com/modelscope/ms-swift/blob/main/examples/train/lora_sft.sh:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=8 \
swift sft \
--model Qwen/Qwen3.5-2B \
--tuner_type lora \
--dataset 'AI-ModelScope/alpaca-gpt4-data-zh#500' \
'AI-ModelScope/alpaca-gpt4-data-en#500' \
'swift/self-cognition#500' \
--torch_dtype bfloat16 \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--learning_rate 1e-4 \
--lora_rank 8 \
--lora_alpha 8 \
--init_weights pissa \
--target_modules all-linear \
--gradient_accumulation_steps 1 \
--eval_steps 50 \
--save_steps 2 \
--save_total_limit 2 \
--logging_steps 5 \
--max_length 2048 \
--output_dir output \
--system 'You are a helpful assistant.' \
--warmup_ratio 0.05 \
--dataset_num_proc 4 \
--dataloader_num_workers 4 \
--model_author swift \
--model_name swift-robot
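For context, a minimal sketch of what `--init_weights pissa` does conceptually (this is the PiSSA idea in NumPy, not ms-swift's actual implementation): the base weight is SVD-decomposed, the top-r singular directions seed the LoRA adapter factors, and the residual stays frozen in the base layer. The save step has to reconcile these pieces, which is where the hang occurs.

```python
import numpy as np

# Hedged sketch of PiSSA initialization, not ms-swift's code:
# split W into a rank-r principal part (the trainable adapters)
# plus a frozen residual.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))  # toy base weight
r = 8                              # matches --lora_rank 8 above

U, S, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * np.sqrt(S[:r])          # (64, r) adapter factor
B = np.sqrt(S[:r])[:, None] * Vt[:r]   # (r, 64) adapter factor
W_res = W - A @ B                      # frozen residual weight

# The split is exact: residual + adapter product recovers W.
assert np.allclose(W_res + A @ B, W)
```

Because the residual weight differs from the original checkpoint, saving a PiSSA run involves extra conversion work per rank, which may be relevant to why the multi-GPU save deadlocks.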
Additional Information / 补充信息
No response