@mdvoretc-intel
Contributor

Details:

  • The change allows parameters to be recognized alongside constants as valid weight inputs for transformations producing FullyConnectedCompressed nodes

Description of the issue:

At present, the FC_COMPRESSED_WEIGHT_PATTERN macro contains a pattern for dequantization of a constant integer weight. This pattern is used to recognize and fold cases where fused weight dequantization can be used, replacing them with FullyConnectedCompressed nodes. Because it expects a constant weight input, the pattern fails to recognize quantized LoRA weights, which are provided as parameters:
[Image: fc_compressed_param_before]
With the changes in this patch, these weights are recognized, and the transformations can proceed and produce nodes that leverage oneDNN fused QGEMM for execution:
[Image: fc_compressed_param_after]
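
As a rough illustration of the matching change described above, the sketch below models a dequantization chain on a toy graph and contrasts a constant-only weight predicate with one that also accepts parameters. It is not the OpenVINO pattern API and not the contents of FC_COMPRESSED_WEIGHT_PATTERN: the types Kind and Node and the helpers make, is_weight_source_const_only, is_weight_source and matches_dequant are hypothetical names introduced only for this example, and zero-point subtraction is left out.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for graph nodes; these are NOT OpenVINO types.
enum class Kind { Constant, Parameter, Convert, Multiply };

struct Node {
    Kind kind;
    std::vector<std::shared_ptr<Node>> inputs;
};
using NodePtr = std::shared_ptr<Node>;

// Convenience constructor for toy nodes.
NodePtr make(Kind kind, std::vector<NodePtr> inputs = {}) {
    return std::make_shared<Node>(Node{kind, std::move(inputs)});
}

// Before the change: only a constant is accepted as the compressed weight.
bool is_weight_source_const_only(const NodePtr& n) {
    return n->kind == Kind::Constant;
}

// After the change: a parameter (for example a LoRA weight supplied at
// run time) is accepted as well.
bool is_weight_source(const NodePtr& n) {
    return n->kind == Kind::Constant || n->kind == Kind::Parameter;
}

// Matches Multiply(Convert(weight), scale), the simplest dequantization
// chain. The predicate decides which node kinds may act as the weight.
template <typename Pred>
bool matches_dequant(const NodePtr& n, Pred weight_ok) {
    assert(n != nullptr);
    if (n->kind != Kind::Multiply || n->inputs.size() != 2) {
        return false;
    }
    const NodePtr& convert = n->inputs[0];
    if (convert->kind != Kind::Convert || convert->inputs.size() != 1) {
        return false;
    }
    return weight_ok(convert->inputs[0]);
}
```

With the constant-only predicate, a chain rooted at a parameter is rejected; with the relaxed predicate, the same chain matches and could be folded.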

Tickets:

@github-actions github-actions bot added the category: GPU (OpenVINO GPU plugin) and category: transformations (OpenVINO Runtime library - Transformations) labels Oct 2, 2025
@sys-openvino-ci sys-openvino-ci added the ExternalIntelPR (External contributor from Intel) label Oct 2, 2025
@mdvoretc-intel
Contributor Author

build_jenkins

@mdvoretc-intel mdvoretc-intel marked this pull request as ready for review October 29, 2025 11:36
@mdvoretc-intel mdvoretc-intel requested review from a team as code owners October 29, 2025 11:36
@mdvoretc-intel mdvoretc-intel requested review from CuriousPanCake and removed request for a team October 29, 2025 11:36
@mdvoretc-intel
Contributor Author

build_jenkins

@mdvoretc-intel mdvoretc-intel force-pushed the param_quant_weight branch 2 times, most recently from b82b902 to 476f80f Compare October 30, 2025 09:25
Contributor

@mklimenk mklimenk left a comment

The two branches of the if (pattern_map.count(weights_const_m)) { condition share a lot of code. Please consider refactoring them to avoid duplication.

This change enables quantized LoRA weights, passed as parameters during
execution, to be recognized by the transformations that produce
FullyConnectedCompressed nodes for QGEMM execution.
The test previously expected the transformation to fail due to the use of input2
as a weight. The new logic allows use of parameters as weights, so the test has
been adjusted to expect a successful transformation.
Contributor

@mklimenk mklimenk left a comment

Looks much cleaner now, thanks!

@mdvoretc-intel
Contributor Author

@CuriousPanCake please review.

@github-actions github-actions bot removed the category: transformations (OpenVINO Runtime library - Transformations) label Nov 11, 2025
@mdvoretc-intel
Contributor Author

@CuriousPanCake please review.

@CuriousPanCake
Contributor

@Lyamin-Roman please take a look

This required handling a previously uncovered case where a reshape follows the
float conversion, and adding logic to reshape the parameter weight when the
reshape covers the entire decompression pattern.
@mdvoretc-intel
Contributor Author

Consider adding new tests

Tests added.

@mdvoretc-intel
Contributor Author

@Lyamin-Roman please review.

}
}

TEST_F(TransformationTestsF, ConvertFCToCompressed11) {
Contributor

[random spot] Please add functional accuracy tests
It looks like you can extend an existing test with additional parameters
src/plugins/intel_gpu/tests/functional/subgraph_tests/dynamic/matmul_weights_decompression.cpp

Contributor Author

Attempted this. The tests have a single parameter for input precision, which prevents creating weight parameters in compressed types.

Contributor Author

Tests added with a configure_model() override to ensure weight parameters are provided in the appropriate type. The tests currently fail because u4 transposition is not supported; a fix is in progress.
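
The u4 transposition gap mentioned above arises because two 4-bit elements share one byte, so a transpose cannot simply move whole bytes. The sketch below shows one way such a transpose could work on a packed buffer. It is a standalone illustration, not the plugin's implementation: the helpers get_u4, set_u4 and transpose_u4 are hypothetical names, and the packing order (even-indexed element in the low nibble of each byte) is an assumption of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads element i from a packed 4-bit buffer.
// Assumption: even indices occupy the low nibble of each byte.
std::uint8_t get_u4(const std::vector<std::uint8_t>& buf, std::size_t i) {
    const std::uint8_t byte = buf[i / 2];
    return static_cast<std::uint8_t>((i % 2 == 0) ? (byte & 0x0F) : (byte >> 4));
}

// Writes the low 4 bits of v as element i, leaving the neighbouring nibble intact.
void set_u4(std::vector<std::uint8_t>& buf, std::size_t i, std::uint8_t v) {
    std::uint8_t& byte = buf[i / 2];
    if (i % 2 == 0) {
        byte = static_cast<std::uint8_t>((byte & 0xF0) | (v & 0x0F));
    } else {
        byte = static_cast<std::uint8_t>((byte & 0x0F) | ((v & 0x0F) << 4));
    }
}

// Transposes a row-major rows x cols matrix of packed u4 values into a
// row-major cols x rows matrix, one nibble at a time.
std::vector<std::uint8_t> transpose_u4(const std::vector<std::uint8_t>& src,
                                       std::size_t rows,
                                       std::size_t cols) {
    assert(src.size() == (rows * cols + 1) / 2);
    std::vector<std::uint8_t> dst(src.size(), 0);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            set_u4(dst, c * rows + r, get_u4(src, r * cols + c));
        }
    }
    return dst;
}
```

Under the assumed nibble order, the 2x3 matrix [[1, 2, 3], [4, 5, 6]] is packed as 0x21, 0x43, 0x65, and transpose_u4(src, 2, 3) yields 0x41, 0x52, 0x63, the packing of [[1, 4], [2, 5], [3, 6]].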

@Lyamin-Roman Lyamin-Roman self-requested a review November 28, 2025 16:10
@mryzhov mryzhov self-assigned this Dec 1, 2025
Contributor

@mryzhov mryzhov left a comment

Please address the requested changes

@p-durandin
Contributor

build_jenkins

@susbhere
Contributor

susbhere commented Dec 6, 2025

build_jenkins

@p-durandin
Contributor

build_jenkins

void configure_model() override {
    ov::preprocess::PrePostProcessor p(function);
    {
        auto& params = function->get_parameters();
Contributor

Please remove this unused variable; it causes the build to fail.
@ZackyLake

@github-actions
Contributor

This PR will be closed in a week because of 2 weeks of no activity.

@github-actions github-actions bot added the Stale label Dec 29, 2025
@github-actions github-actions bot added the category: Core (OpenVINO Core, aka ngraph) label Jan 6, 2026
@github-actions github-actions bot removed the category: Core (OpenVINO Core, aka ngraph) label Jan 8, 2026
@mdvoretc-intel
Contributor Author

@mryzhov All requested changes have been addressed.

Labels

category: GPU (OpenVINO GPU plugin), ExternalIntelPR (External contributor from Intel)