[data] specify position_ids in PackedSupervisedDatasetProcessor for neat_packing #7318
Conversation
Hi @hiyouga, I noticed this PR has been pending for about two weeks. Would you have time to review the changes? The PR aims to add `position_ids` in `PackedSupervisedDatasetProcessor` for `neat_packing`.
Sorry for the delay! Merged now.
@BlackWingedKing @hiyouga Hello, I may have found a bug in this pull request. I have been debugging this loss-NaN issue for hours. I think there is a bug here, especially when the packed seq_len is longer than a single sample's length. In addition, I also found that without liger_kernel, training fails with an embedding-size error. Maybe related: #7490
@kangqiyue This will be fixed in #7623.
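For context, here is a minimal sketch (the helper below is illustrative, not LLaMA-Factory or transformers code) of why correct `position_ids` matter when packing is combined with FlashAttention-2: varlen attention kernels cannot consume a 4D mask, so packed sequence boundaries are typically recovered from the points where `position_ids` reset to 0. A wrong `position_ids` tensor therefore silently merges or splits sequences inside the attention kernel, which is one plausible source of the NaN losses reported above.

```python
import torch

def cu_seqlens_from_position_ids(position_ids: torch.Tensor) -> torch.Tensor:
    """Return cumulative sequence lengths [0, len_1, len_1 + len_2, ...]."""
    # A new sequence starts wherever position_ids drops back to 0.
    starts = torch.nonzero(position_ids == 0, as_tuple=True)[0]
    total = torch.tensor([position_ids.numel()], dtype=starts.dtype)
    return torch.cat([starts, total]).to(torch.int32)

# Three samples of lengths 3, 2 and 4 packed into one row:
packed = torch.tensor([0, 1, 2, 0, 1, 0, 1, 2, 3])
print(cu_seqlens_from_position_ids(packed))  # tensor([0, 3, 5, 9], dtype=torch.int32)
```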
[data] specify position_ids in PackedSupervisedDatasetProcessor for neat_packing (hiyouga#7318)
* use position_ids for neat_packing with fa2
* revert fa2 changes



What does this PR do?
In the current `neat_packing`, we do not use `position_ids` along with the 4D attention mask. Even though the 4D mask prevents cross-sequence attention, `neat_packing` still differs from no packing: the model perceives the positions of later sequences differently (a position-encoding difference) than it would without packing.

In this PR, we add `position_ids` for the packed sequences when `neat_packing` is specified, so that token positions are encoded properly, as if there were no packing.
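To make the change concrete, below is a minimal sketch of the idea; `neat_packing_position_ids` and `block_diagonal_mask` are illustrative helpers, not the actual `PackedSupervisedDatasetProcessor` code:

```python
from typing import List
import torch

def neat_packing_position_ids(seq_lens: List[int]) -> List[int]:
    """Restart positions at 0 for every sequence in the packed example."""
    position_ids: List[int] = []
    for length in seq_lens:
        position_ids.extend(range(length))
    return position_ids

def block_diagonal_mask(seq_lens: List[int]) -> torch.Tensor:
    """Boolean mask where tokens may only attend within their own sequence."""
    total = sum(seq_lens)
    mask = torch.zeros(total, total, dtype=torch.bool)
    start = 0
    for length in seq_lens:
        mask[start:start + length, start:start + length] = True
        start += length
    return mask  # combine with a causal mask before feeding it to the model

# Packing sequences of lengths 3, 2 and 4:
print(neat_packing_position_ids([3, 2, 4]))
# [0, 1, 2, 0, 1, 0, 1, 2, 3]  -- instead of the naive [0, 1, ..., 8]
```

With these `position_ids`, rotary or absolute position encodings see every packed sequence starting at position 0, matching what the model would see without packing.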