bugfix: fix longcat-image cache dispatch #638

Merged

hsliuustc0106 merged 6 commits into vllm-project:main
Conversation
Contributor (Author)

@e1ijah1 Can you also take a look?
Collaborator

fix DCO, and add test result
Contributor

We could also bump the cache-dit version.
SamitHuang approved these changes on Jan 5, 2026
Contributor (Author)

done
hsliuustc0106 approved these changes on Jan 5, 2026
tzhouam added a commit to tzhouam/vllm-omni that referenced this pull request on Jan 6, 2026
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: DefTruth <qiustudent_r@163.com>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: Zhou Taichang <tzhouam@connect.ust.hk>
Signed-off-by: samithuang <285365963@qq.com>
Signed-off-by: SamitHuang <285365963@qq.com>
Signed-off-by: Samit <285365963@qq.com>
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Signed-off-by: ZeldaHuang <hzm414167@alibaba-inc.com>
Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: Zhou Taichang <tzhouam@connect.ust.hk>
Co-authored-by: Samit <285365963@qq.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Ziming Huang <hzm414167@alibaba-inc.com>
princepride added a commit to princepride/vllm-omni that referenced this pull request on Jan 10, 2026
sniper35 pushed a commit to sniper35/vllm-omni that referenced this pull request on Jan 10, 2026
ZJY0516 pushed a commit to LawJarp-A/vllm-omni that referenced this pull request on Jan 10, 2026
Fixes #630: a pipeline name mismatch led to the wrong cache backend being dispatched for LongCat-Image.
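The bug class here can be sketched as a name-keyed dispatch table: if the registry key does not exactly match the pipeline's class name, lookup falls through to a generic (wrong) enabler. This is an illustrative sketch only; the registry, decorator, and function names below are hypothetical, not the actual vllm-omni API.

```python
# Hypothetical sketch of name-based cache-backend dispatch.
# All names (CACHE_ENABLERS, register_cache_enabler, dispatch_cache_enabler)
# are illustrative, not from vllm-omni.

CACHE_ENABLERS = {}

def register_cache_enabler(pipeline_name):
    """Register a custom cache enabler under a pipeline class name."""
    def wrap(fn):
        CACHE_ENABLERS[pipeline_name] = fn
        return fn
    return wrap

# The key must match type(pipe).__name__ exactly; a mismatch such as
# "LongCatPipeline" vs. "LongCatImagePipeline" silently falls back to
# the generic path -- the failure mode this PR fixes.
@register_cache_enabler("LongCatImagePipeline")
def enable_longcat_cache(pipe, cache_config):
    return f"custom enabler for {type(pipe).__name__}"

def dispatch_cache_enabler(pipe, cache_config):
    """Dispatch on the pipeline's class name, with a generic fallback."""
    fn = CACHE_ENABLERS.get(type(pipe).__name__)
    if fn is None:
        return "generic enabler"
    return fn(pipe, cache_config)

class LongCatImagePipeline:
    pass

print(dispatch_cache_enabler(LongCatImagePipeline(), {}))
# -> custom enabler for LongCatImagePipeline
```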
Test results with this PR, w/ cache_dit (17.12 s/img):
```shell
python text_to_image.py \
    --model $LONGCAT_IMAGE_DIR \
    --prompt "Add a white art board written with colorful text 'vLLM-Omni' on grassland. Add a paintbrush in the bear's hands. position the bear standing in front of the art board as if painting" \
    --output output_image_edit.png \
    --num_inference_steps 50 \
    --cfg_scale 4.0 \
    --cache_backend cache_dit
```

```text
WARNING 01-05 07:48:45 [mooncake_connector.py:18] Mooncake not available, MooncakeOmniConnector will not work
INFO 01-05 07:48:46 [omni.py:122] Initializing stages for model: /workspace/dev/vipdev/hf_models/LongCat-Image
INFO 01-05 07:48:46 [initialization.py:35] No OmniTransferConfig provided
INFO 01-05 07:48:46 [omni_stage.py:107] [OmniStage] stage_config: {'stage_id': 0, 'stage_type': 'diffusion', 'runtime': {'process': True, 'devices': '0', 'max_batch_size': 1}, 'engine_args': {'model': '/workspace/dev/vipdev/hf_models/LongCat-Image', 'vae_use_slicing': False, 'vae_use_tiling': False, 'cache_backend': 'cache_dit', 'cache_config': {'Fn_compute_blocks': 1, 'Bn_compute_blocks': 0, 'max_warmup_steps': 4, 'residual_diff_threshold': 0.24, 'max_continuous_cached_steps': 3, 'enable_taylorseer': False, 'taylorseer_order': 1, 'scm_steps_mask_policy': None, 'scm_steps_policy': 'dynamic'}, 'parallel_config': {'pipeline_parallel_size': 1, 'data_parallel_size': 1, 'tensor_parallel_size': 1, 'sequence_parallel_size': 1, 'ulysses_degree': 1, 'ring_degree': 1, 'cfg_parallel_size': 1}, 'model_stage': 'diffusion'}, 'final_output': True, 'final_output_type': 'image'}
INFO 01-05 07:48:46 [omni.py:297] [Orchestrator] Waiting for 1 stages to initialize (timeout: 300s)
[Stage-0] INFO 01-05 07:48:54 [omni_stage.py:434] Starting stage worker with model: /workspace/dev/vipdev/hf_models/LongCat-Image
[Stage-0] WARNING 01-05 07:48:55 [envs.py:194] Flash Attention library "flash_attn" not found, using pytorch attention implementation
[Stage-0] INFO 01-05 07:48:55 [diffusion_engine.py:213] Starting server...
[Stage-0] INFO 01-05 07:49:04 [gpu_worker.py:174] Worker 0 created result MessageQueue
[Stage-0] INFO 01-05 07:49:04 [scheduler.py:228] Chunked prefill is enabled with max_num_batched_tokens=2048.
[W105 07:49:04.723704639 socket.cpp:767] [c10d] The client socket cannot be initialized to connect to [localhost]:30089 (errno: 97 - Address family not supported by protocol).
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Stage-0] INFO 01-05 07:49:04 [gpu_worker.py:75] Worker 0: Initialized device and distributed environment.
Loading checkpoint shards: 100% 5/5 [00:04<00:00, 1.02it/s]
The image processor of type `Qwen2VLImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`.
The tokenizer you are loading from '/workspace/dev/vipdev/hf_models/LongCat-Image' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:03<00:00, 3.44s/it]
[Stage-0] INFO 01-05 07:49:15 [diffusers_loader.py:214] Loading weights took 3.87 seconds
[Stage-0] INFO 01-05 07:49:16 [gpu_worker.py:97] Model loading took 37.7358 GiB and 11.258677 seconds
[Stage-0] INFO 01-05 07:49:16 [gpu_worker.py:102] Worker 0: Model loaded successfully.
[Stage-0] INFO 01-05 07:49:16 [cache_dit_backend.py:468] Using custom cache-dit enabler for model: LongCatImagePipeline
[Stage-0] INFO 01-05 07:49:16 [cache_dit_backend.py:211] Enabling cache-dit on LongCatImage transformer with BlockAdapter: Fn=1, Bn=0, W=4
WARNING 01-05 07:49:16 [block_adapters.py:131] pipe is None, use FakeDiffusionPipeline instead.
INFO 01-05 07:49:16 [block_adapters.py:220] Auto fill blocks_name: ['transformer_blocks', 'single_transformer_blocks'].
INFO 01-05 07:49:16 [block_adapters.py:153] Found transformer NOT from diffusers: vllm_omni.diffusion.models.longcat_image.longcat_image_transformer disable check_forward_pattern by default.
INFO 01-05 07:49:16 [cache_adapter.py:77] Adapting Cache Acceleration using custom BlockAdapter!
WARNING 01-05 07:49:16 [block_adapters.py:469] Skipped Forward Pattern Check: ForwardPattern.Pattern_1
INFO 01-05 07:49:16 [cache_adapter.py:142] Use default 'enable_separate_cfg' from block adapter register: False, Pipeline: FakeDiffusionPipeline.
INFO 01-05 07:49:16 [cache_adapter.py:332] Collected Context Config: DBCache_F1B0_W4I1M0MC3_R0.24_CFG0, Calibrator Config: None
INFO 01-05 07:49:16 [pattern_base.py:70] Match Blocks: CachedBlocks_Pattern_0_1_2, for transformer_blocks, cache_context: transformer_blocks_140031439747024, context_manager: FakeDiffusionPipeline_140031439852352.
INFO 01-05 07:49:16 [pattern_base.py:70] Match Blocks: CachedBlocks_Pattern_0_1_2, for single_transformer_blocks, cache_context: single_transformer_blocks_140031439017008, context_manager: FakeDiffusionPipeline_140031439852352.
[Stage-0] INFO 01-05 07:49:16 [cache_dit_backend.py:475] Cache-dit enabled successfully on LongCatImagePipeline
[Stage-0] INFO 01-05 07:49:16 [gpu_worker.py:310] Worker 0: Scheduler loop started.
[Stage-0] INFO 01-05 07:49:16 [gpu_worker.py:233] Worker 0 ready to receive requests via shared memory
[Stage-0] INFO 01-05 07:49:16 [scheduler.py:46] SyncScheduler initialized result MessageQueue
[Stage-0] INFO 01-05 07:49:16 [omni_stage.py:627] Max batch size: 1
INFO 01-05 07:49:16 [omni.py:290] [Orchestrator] Stage-0 reported ready
INFO 01-05 07:49:16 [omni.py:316] [Orchestrator] All stages initialized successfully
============================================================
Generation Configuration:
  Model: /workspace/dev/vipdev/hf_models/LongCat-Image
  Inference steps: 50
  Cache backend: cache_dit
  Parallel configuration: ulysses_degree=1, ring_degree=1
  Image size: 1024x1024
============================================================
[Stage-0] INFO 01-05 07:49:16 [omni_diffusion.py:112] Prepared 1 requests for generation.
[Stage-0] INFO 01-05 07:49:16 [cache_dit_backend.py:496] Refreshing cache context for transformer with num_inference_steps: 50
INFO 01-05 07:49:16 [cache_adapter.py:723] ✅ Refreshed cache context: transformer_blocks_140031439747024, Collected Context Config: DBCache_F1B0_W4I1M0MC3_R0.24_N50_CFG0, Calibrator Config: None
INFO 01-05 07:49:16 [cache_adapter.py:723] ✅ Refreshed cache context: single_transformer_blocks_140031439017008, Collected Context Config: DBCache_F1B0_W4I1M0MC3_R0.24_N50_CFG0, Calibrator Config: None
[Stage-0] INFO 01-05 07:49:33 [diffusion_engine.py:86] Generation completed successfully.
[Stage-0] INFO 01-05 07:49:33 [diffusion_engine.py:109] Post-processing completed in 0.0955 seconds
INFO 01-05 07:49:33 [log_utils.py:549] {'type': 'request_level_metrics',
INFO 01-05 07:49:33 [log_utils.py:549]  'request_id': '0_45567883-3c2e-4cfa-a36e-9aa752e94fb0',
INFO 01-05 07:49:33 [log_utils.py:549]  'e2e_time_ms': 17122.5323677063,
INFO 01-05 07:49:33 [log_utils.py:549]  'e2e_tpt': 0.0,
INFO 01-05 07:49:33 [log_utils.py:549]  'e2e_total_tokens': 0,
INFO 01-05 07:49:33 [log_utils.py:549]  'transfers_total_time_ms': 0.0,
INFO 01-05 07:49:33 [log_utils.py:549]  'transfers_total_bytes': 0,
INFO 01-05 07:49:33 [log_utils.py:549]  'stages': {0: {'stage_gen_time_ms': 17087.39161491394,
INFO 01-05 07:49:33 [log_utils.py:549]  'num_tokens_out': 0,
INFO 01-05 07:49:33 [log_utils.py:549]  'num_tokens_in': 0}}}
Processed prompts: 100% 1/1 [00:17<00:00, 17.12s/img, est. speed stage-0 img/s: 0.00, avg e2e_lat: 0.0ms]
INFO 01-05 07:49:33 [omni.py:711] [Summary] {'e2e_requests': 1,
INFO 01-05 07:49:33 [omni.py:711]  'e2e_total_time_ms': 17123.9755153656,
INFO 01-05 07:49:33 [omni.py:711]  'e2e_sum_time_ms': 17122.5323677063,
INFO 01-05 07:49:33 [omni.py:711]  'e2e_avg_time_per_request_ms': 17122.5323677063,
INFO 01-05 07:49:33 [omni.py:711]  'wall_time_ms': 17123.9755153656,
INFO 01-05 07:49:33 [omni.py:711]  'stages': [{'stage_id': 0, 'requests': 1, 'tokens': 0, 'total_time_ms': 17122.780323028564, 'avg_time_per_request_ms': 17122.780323028564, 'avg_tokens_per_s': 0.0}],
INFO 01-05 07:49:33 [omni.py:711]  'transfers': []}
[Stage-0] INFO 01-05 07:49:33 [omni_stage.py:636] Received shutdown signal
[Stage-0] INFO 01-05 07:49:33 [gpu_worker.py:318] Worker 0: Shutdown complete.
Total generation time: 21.5126 seconds (21512.61 ms)
INFO 01-05 07:49:37 [text_to_image.py:168] Outputs: [OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='image', request_output=[OmniRequestOutput(request_id='0_45567883-3c2e-4cfa-a36e-9aa752e94fb0', finished=True, stage_id=None, final_output_type='image', request_output=None, images=[1 PIL Images], prompt="Add a white art board written with colorful text 'vLLM-Omni' on grassland. Add a paintbrush in the bear's hands. position the bear standing in front of the art board as if painting", latents=None, metrics={})], images=[], prompt=None, latents=None, metrics={})]
Saved generated image to output_image_edit.png
```

w/o cache (42.39 s/img):
```shell
python text_to_image.py \
    --model $LONGCAT_IMAGE_DIR \
    --prompt "Add a white art board written with colorful text 'vLLM-Omni' on grassland. Add a paintbrush in the bear's hands. position the bear standing in front of the art board as if painting" \
    --output output_image_edit_nocache.png \
    --num_inference_steps 50 \
    --cfg_scale 4.0
```

```text
WARNING 01-05 07:52:18 [mooncake_connector.py:18] Mooncake not available, MooncakeOmniConnector will not work
INFO 01-05 07:52:18 [omni.py:122] Initializing stages for model: /workspace/dev/vipdev/hf_models/LongCat-Image
INFO 01-05 07:52:18 [initialization.py:35] No OmniTransferConfig provided
INFO 01-05 07:52:18 [omni_stage.py:107] [OmniStage] stage_config: {'stage_id': 0, 'stage_type': 'diffusion', 'runtime': {'process': True, 'devices': '0', 'max_batch_size': 1}, 'engine_args': {'model': '/workspace/dev/vipdev/hf_models/LongCat-Image', 'vae_use_slicing': False, 'vae_use_tiling': False, 'cache_backend': None, 'cache_config': None, 'parallel_config': {'pipeline_parallel_size': 1, 'data_parallel_size': 1, 'tensor_parallel_size': 1, 'sequence_parallel_size': 1, 'ulysses_degree': 1, 'ring_degree': 1, 'cfg_parallel_size': 1}, 'model_stage': 'diffusion'}, 'final_output': True, 'final_output_type': 'image'}
INFO 01-05 07:52:18 [omni.py:297] [Orchestrator] Waiting for 1 stages to initialize (timeout: 300s)
[Stage-0] INFO 01-05 07:52:27 [omni_stage.py:434] Starting stage worker with model: /workspace/dev/vipdev/hf_models/LongCat-Image
[Stage-0] WARNING 01-05 07:52:28 [envs.py:194] Flash Attention library "flash_attn" not found, using pytorch attention implementation
[Stage-0] INFO 01-05 07:52:28 [diffusion_engine.py:213] Starting server...
[Stage-0] INFO 01-05 07:52:37 [gpu_worker.py:174] Worker 0 created result MessageQueue
[Stage-0] INFO 01-05 07:52:37 [scheduler.py:228] Chunked prefill is enabled with max_num_batched_tokens=2048.
[Stage-0] INFO 01-05 07:52:37 [gpu_worker.py:75] Worker 0: Initialized device and distributed environment.
Loading checkpoint shards: 100% 5/5 [00:04<00:00, 1.09it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:02<00:00, 2.90s/it]
[Stage-0] INFO 01-05 07:52:48 [diffusers_loader.py:214] Loading weights took 3.82 seconds
[Stage-0] INFO 01-05 07:52:48 [gpu_worker.py:97] Model loading took 37.7358 GiB and 10.921257 seconds
[Stage-0] INFO 01-05 07:52:48 [gpu_worker.py:102] Worker 0: Model loaded successfully.
[Stage-0] INFO 01-05 07:52:48 [gpu_worker.py:310] Worker 0: Scheduler loop started.
[Stage-0] INFO 01-05 07:52:48 [omni_stage.py:627] Max batch size: 1
INFO 01-05 07:52:48 [omni.py:290] [Orchestrator] Stage-0 reported ready
INFO 01-05 07:52:48 [omni.py:316] [Orchestrator] All stages initialized successfully
============================================================
Generation Configuration:
  Model: /workspace/dev/vipdev/hf_models/LongCat-Image
  Inference steps: 50
  Cache backend: None (no acceleration)
  Parallel configuration: ulysses_degree=1, ring_degree=1
  Image size: 1024x1024
============================================================
[Stage-0] INFO 01-05 07:52:48 [omni_diffusion.py:112] Prepared 1 requests for generation.
[Stage-0] INFO 01-05 07:53:31 [diffusion_engine.py:86] Generation completed successfully.
[Stage-0] INFO 01-05 07:53:31 [diffusion_engine.py:109] Post-processing completed in 0.0922 seconds
INFO 01-05 07:53:31 [log_utils.py:549] {'type': 'request_level_metrics',
INFO 01-05 07:53:31 [log_utils.py:549]  'request_id': '0_2dc245b9-ac08-43d9-ae97-23e33938d612',
INFO 01-05 07:53:31 [log_utils.py:549]  'e2e_time_ms': 42394.21105384827,
INFO 01-05 07:53:31 [log_utils.py:549]  'e2e_tpt': 0.0,
INFO 01-05 07:53:31 [log_utils.py:549]  'e2e_total_tokens': 0,
INFO 01-05 07:53:31 [log_utils.py:549]  'transfers_total_time_ms': 0.0,
INFO 01-05 07:53:31 [log_utils.py:549]  'transfers_total_bytes': 0,
INFO 01-05 07:53:31 [log_utils.py:549]  'stages': {0: {'stage_gen_time_ms': 42355.54218292236,
INFO 01-05 07:53:31 [log_utils.py:549]  'num_tokens_out': 0,
INFO 01-05 07:53:31 [log_utils.py:549]  'num_tokens_in': 0}}}
Processed prompts: 100% 1/1 [00:42<00:00, 42.39s/img, est. speed stage-0 img/s: 0.00, avg e2e_lat: 0.0ms]
INFO 01-05 07:53:31 [omni.py:711] [Summary] {'e2e_requests': 1,
INFO 01-05 07:53:31 [omni.py:711]  'e2e_total_time_ms': 42396.44718170166,
INFO 01-05 07:53:31 [omni.py:711]  'e2e_sum_time_ms': 42394.21105384827,
INFO 01-05 07:53:31 [omni.py:711]  'e2e_avg_time_per_request_ms': 42394.21105384827,
INFO 01-05 07:53:31 [omni.py:711]  'wall_time_ms': 42396.44718170166,
INFO 01-05 07:53:31 [omni.py:711]  'stages': [{'stage_id': 0, 'requests': 1, 'tokens': 0, 'total_time_ms': 42394.585609436035, 'avg_time_per_request_ms': 42394.585609436035, 'avg_tokens_per_s': 0.0}],
INFO 01-05 07:53:31 [omni.py:711]  'transfers': []}
[Stage-0] INFO 01-05 07:53:31 [omni_stage.py:636] Received shutdown signal
[Stage-0] INFO 01-05 07:53:31 [gpu_worker.py:318] Worker 0: Shutdown complete.
Total generation time: 45.9301 seconds (45930.09 ms)
INFO 01-05 07:53:34 [text_to_image.py:168] Outputs: [OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='image', request_output=[OmniRequestOutput(request_id='0_2dc245b9-ac08-43d9-ae97-23e33938d612', finished=True, stage_id=None, final_output_type='image', request_output=None, images=[1 PIL Images], prompt="Add a white art board written with colorful text 'vLLM-Omni' on grassland. Add a paintbrush in the bear's hands. position the bear standing in front of the art board as if painting", latents=None, metrics={})], images=[], prompt=None, latents=None, metrics={})]
Saved generated image to output_image_edit_nocache.png
```

longcat-image-edit w/ cache-dit: 80.8s
longcat-image-edit w/o cache-dit: 196.4s
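From the timings above, the speedup from cache-dit works out to roughly 2.4-2.5x in both scenarios:

```python
# Speedup arithmetic from the wall times reported in this PR's test runs.
# (no-cache seconds, cache-dit seconds) per scenario.
runs = {
    "text-to-image": (42.39, 17.12),
    "longcat-image-edit": (196.4, 80.8),
}
for name, (no_cache_s, cache_dit_s) in runs.items():
    print(f"{name}: {no_cache_s / cache_dit_s:.2f}x speedup with cache-dit")
# text-to-image: 2.48x speedup with cache-dit
# longcat-image-edit: 2.43x speedup with cache-dit
```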