
Allow manual scoring of the jsonl files generated by vllm_infer.py #7419

Merged
hiyouga merged 6 commits into hiyouga:main from SnowFox4004:manually_eval
Mar 23, 2025
Conversation

@SnowFox4004
Contributor

@SnowFox4004 SnowFox4004 commented Mar 22, 2025

What does this PR do?

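Extracts the BLEU and ROUGE scoring logic into a standalone script, scripts/eval_bleu_rouge.py,
so that results generated with vllm_infer.py can be scored manually.
This avoids the slow generation speed seen when running prediction directly from a config file.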
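
As a rough illustration of what scoring a generated file involves, the sketch below reads a jsonl file and averages a character-level ROUGE-L F1 over all rows. It is not the code in scripts/eval_bleu_rouge.py: the `predict` and `label` field names, the helper names, and the stdlib-only LCS metric are assumptions chosen to keep the example dependency-free, whereas the actual script relies on third-party BLEU and ROUGE libraries.

```python
import json
from pathlib import Path


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of two strings."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    """Character-level ROUGE-L F1 in [0, 1]; 0.0 if either side is empty."""
    if not prediction or not reference:
        return 0.0
    lcs = lcs_length(prediction, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(prediction)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


def score_jsonl(path: str) -> float:
    """Average ROUGE-L F1 over every non-empty line of a jsonl file.

    Assumes each line is a JSON object with "predict" and "label" keys.
    """
    scores = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                row = json.loads(line)
                scores.append(rouge_l_f1(row["predict"], row["label"]))
    return sum(scores) / len(scores) if scores else 0.0
```

Calling `score_jsonl("generated_predictions.jsonl")` (a hypothetical file name) would return a single averaged score; a real evaluation would tokenize the text and report BLEU alongside the ROUGE variants.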

Fixes #7418

Before submitting

  • Did you read the contributor guideline?
  • Did you write any new necessary tests?
    Only checked whether the required third-party libraries are installed.

@hiyouga
Owner

hiyouga commented Mar 23, 2025

Could this be sped up with the datasets library's multiprocessing?

@SnowFox4004
Contributor Author

Thanks for the reminder. Updated: scoring is now significantly faster, and multiprocessing is enabled by default with proc=6.
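
A minimal sketch of the kind of change discussed here, assuming the predictions are loaded with the `datasets` JSON loader and scored row by row through `Dataset.map` with `num_proc`. The function names, the `predict` and `label` fields, and the toy exact-match metric are hypothetical placeholders, not the script's actual code.

```python
def exact_match(row: dict) -> dict:
    """Toy per-row metric: 1.0 if prediction equals label after stripping."""
    return {"exact_match": float(row["predict"].strip() == row["label"].strip())}


def score_file(filename: str, num_proc: int = 6) -> float:
    """Score a jsonl file in parallel and return the mean per-row score."""
    # Imported here so the metric above can be reused without `datasets` installed.
    from datasets import load_dataset

    dataset = load_dataset("json", data_files=filename, split="train")
    # `num_proc` shards the rows across worker processes.
    dataset = dataset.map(exact_match, num_proc=num_proc)
    scores = dataset["exact_match"]
    if len(scores) == 0:
        return 0.0
    return sum(scores) / len(scores)
```

The per-row function returns a dict so that `map` can add its keys as new columns; any BLEU or ROUGE computation could be dropped in the same way.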

Comment thread scripts/eval_bleu_rouge.py Outdated
return average_score


def deprecated_main(filename: str):
Owner

remove the deprecated func

Comment thread scripts/eval_bleu_rouge.py Outdated
Comment thread scripts/eval_bleu_rouge.py Outdated
- Use fire.Fire
- Fix code formatting
@SnowFox4004
Contributor Author

Thanks for the reminder, updated.

Deleted the code of using sys.argv
@hiyouga hiyouga self-requested a review March 23, 2025 11:16
@hiyouga hiyouga merged commit 7d4dc25 into hiyouga:main Mar 23, 2025
@hiyouga hiyouga added the solved This problem has been already solved label Mar 23, 2025
Salmon-f42 pushed a commit to IshiKura-a/LLaMA-Factory that referenced this pull request Apr 29, 2025
[scripts] support compute score on vllm's predictions (hiyouga#7419)

* enable manual bleu&rouge eval by adding `scripts/eval_bleu_rouge.py`

* added libraries check

* update: use the datasets library's multiprocessing to speed up processing

* update:
- use fire.Fire
- fix code formatting

* Update eval_bleu_rouge.py: correctly uses fire

Deleted the code of using sys.argv

* Update eval_bleu_rouge.py

---------

Co-authored-by: SnowFox4004 <manba@out>
Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>

yoonseok312 pushed a commit to pensieve-ai/LLaMA-Factory-vlm that referenced this pull request Apr 29, 2025
liu-qingyuan pushed a commit to liu-qingyuan/LLaMA-Factory-Megafake that referenced this pull request Jun 6, 2025
