[Hotfix] solve fp8 w8a8 ci test fail #4531


Merged: 6 commits from fix_fp8_w8a8_ci_test into main on Mar 18, 2025
Conversation

@BBuf (Collaborator) commented Mar 18, 2025

Motivation

Modifications

Checklist

@zhyncs merged commit dd865be into main on Mar 18, 2025
3 of 20 checks passed
@zhyncs deleted the fix_fp8_w8a8_ci_test branch on March 18, 2025 at 06:17
@qeternity (Contributor) commented
This commit has broken loading of older Marlin packed models.

KeyError: 'model.layers.0.mlp.gate_up_proj.B'

Looking into it now.

@qeternity (Contributor) commented Mar 22, 2025

OK, so this is actually related to the deprecation of the SGLang types: these checkpoints now pass the check_marlin_supported check in GPTQMarlinConfig. The early Marlin reference code set the model config's quant method to gptq with flags like is_marlin_format. Now that we use vllm.scalar_type.ScalarType instead of sglang.srt.layers.quantization.utils.ScalarType, the type check passes and causes this error (previously the type check failed, which forced use of the Marlin config).
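
A minimal, self-contained sketch of that dispatch change (the function and key names below are illustrative assumptions, not sglang's actual API):

```python
# Sketch of why the dispatch changed: older Marlin-packed checkpoints declare
# quant_method "gptq" plus an is_marlin_format flag, and previously relied on
# the GPTQ-Marlin compatibility check *failing* so the plain Marlin config
# (which knows about tensors like "...gate_up_proj.B") was used instead.

def check_marlin_supported(quant_cfg: dict) -> bool:
    # Stand-in for GPTQMarlinConfig's compatibility check. With the old
    # sglang ScalarType shim this effectively returned False for these
    # checkpoints; with vllm.scalar_type.ScalarType it now returns True.
    return quant_cfg.get("bits") in (4, 8) and quant_cfg.get("sym", True)

def select_quant_config(quant_cfg: dict) -> str:
    if quant_cfg.get("quant_method") == "gptq" and check_marlin_supported(quant_cfg):
        # New behaviour: GPTQ-Marlin path, which expects GPTQ-style tensor
        # names and raises KeyError on the old Marlin "B" tensors.
        return "gptq_marlin"
    if quant_cfg.get("is_marlin_format") or quant_cfg.get("quant_method") == "marlin":
        # Old behaviour for these checkpoints: plain Marlin loader.
        return "marlin"
    return "gptq"

# Older Marlin-packed checkpoint config: previously resolved to "marlin",
# now resolves to "gptq_marlin" and fails to find the packed "B" weights.
print(select_quant_config({"quant_method": "gptq", "is_marlin_format": True,
                           "bits": 4, "sym": True}))
```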

Changing the quant method in the model config to marlin resolves this.
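
For reference, a minimal sketch of that config edit, assuming the checkpoint ships a Hugging Face style config.json with a quantization_config block (the file path and key names are assumptions; older checkpoints may keep this in a separate quantize_config.json instead):

```python
import json
from pathlib import Path

# Assumed location and key layout -- adjust for how the checkpoint
# actually stores its quantization config.
cfg_path = Path("path/to/model/config.json")
cfg = json.loads(cfg_path.read_text())

quant_cfg = cfg.get("quantization_config", {})
quant_cfg["quant_method"] = "marlin"   # was "gptq" (+ is_marlin_format)
cfg["quantization_config"] = quant_cfg

cfg_path.write_text(json.dumps(cfg, indent=2))
```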

@qeternity (Contributor) commented

Not sure if we want to fix this or just have people change older configs, but the PR is here: #4675. Feel free to close it if it's out of scope.
