@@ -90,6 +90,7 @@
 #include "transformations/op_conversions/convert_scatter_elements_to_scatter.hpp"
 #include "transformations/op_conversions/convert_subtract.hpp"
 #include "transformations/op_conversions/convert_ti_to_sequences.hpp"
+#include "transformations/op_conversions/einsum_decomposition.hpp"
 #include "transformations/resolve_names_collisions.hpp"
 #include "transformations/smart_reshape/lstm_states_broadcast.hpp"
 #include "transformations/smart_reshape/matmul_sr.hpp"
@@ -163,6 +164,11 @@ bool ov::pass::MOCTransformations::run_on_model(const std::shared_ptr<ov::Model>
     REGISTER_PASS(manager, PushConstantToSubgraph)
     REGISTER_PASS(manager, ConstantFolding)
     REGISTER_PASS(manager, Validate)
+    // the order is important

Contributor: Please add a better comment explaining before which transformation this pass should be called.

Contributor Author: Done!

+    const char* enable_einsum = std::getenv("OV_ENABLE_EINSUM_DECOMPOSITION");
+    if (enable_einsum) {
+        REGISTER_PASS(manager, EinsumDecomposition)

Contributor: I don't think this is a good way to fix this. Doing this in MOC means we will have decomposed Einsum in the IR.

As I understand it, this is really needed only for Einsum ops that have constant inputs, so that they can be constant-folded before reaching the plugin. Can we do it differently? Maybe modify this transformation to work only on constant inputs for the offline step? @CuriousPanCake

Contributor Author: @mvafin I updated it to check whether at least one of the inputs is a constant, and that worked too.
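
A minimal sketch of what such a check might look like, assuming OpenVINO's public node API (the helper name and its exact placement in the matcher are illustrative, not necessarily what the PR does):

    #include <memory>

    #include "openvino/core/node.hpp"
    #include "openvino/core/type.hpp"
    #include "openvino/op/constant.hpp"

    // Returns true when at least one direct input of the node is a Constant,
    // so the decomposed Einsum subgraph has something ConstantFolding can
    // actually collapse during the offline step.
    bool has_constant_input(const std::shared_ptr<ov::Node>& node) {
        for (const auto& input : node->input_values()) {
            if (ov::as_type_ptr<ov::op::v0::Constant>(input.get_node_shared_ptr())) {
                return true;
            }
        }
        return false;
    }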

Contributor Author: Memory usage went from (before the change):

================================================================================
FIXED MEMORY TEST: KERAS GPT2 + OPENVINO
================================================================================
[STAGE] 0_INITIAL: 775.24 MB (swap: 0.00 MB) - Initial state after imports

>>> Loading GPT2 model from preset...
[STAGE] 1_MODEL_LOADED: 2314.67 MB (swap: 0.00 MB) - gpt2_medium_en model loaded (10.0s)
[STAGE] 2_BEFORE_INFERENCE: 2314.67 MB (swap: 0.00 MB) - Before first inference

>>> Running first inference (compilation + execution)...
    ⏳ Converting Keras -> OPENVINO and compiling...
[STAGE] 3_FIRST_INFERENCE: 4512.82 MB (swap: 0.00 MB) - First inference completed via generate (7.7s)

>>> Second inference (no compilation)...
[STAGE] 4_SECOND_INFERENCE: 4510.38 MB (swap: 0.00 MB) - Second inference (2.0s)
[STAGE] 5_FINAL: 4510.38 MB (swap: 0.00 MB) - Final state

================================================================================
PERFORMANCE RESULTS
================================================================================
✅ Generated text: 'Hello everyone,

We've been busy'
✅ Second generation: 'Testimony before the House Judiciary Committee on April'
Backend: openvino
First inference latency: 7.69s
Second inference latency: 2.045s
Throughput: 0.65 tokens/sec
Speedup: 3.8x

📊 DETAILED MEMORY ANALYSIS:
+---------------------+------------+-------------+--------------+---------------+
| STAGE               |   RAM (MB) |   SWAP (MB) | RAM CHANGE   | SWAP CHANGE   |
+=====================+============+=============+==============+===============+
| Initial             |      775.2 |           0 | -            | -             |
+---------------------+------------+-------------+--------------+---------------+
| After model load    |     2314.7 |           0 | +1539.4      | +0.0          |
+---------------------+------------+-------------+--------------+---------------+
| Before inference    |     2314.7 |           0 | +0.0         | +0.0          |
+---------------------+------------+-------------+--------------+---------------+
| After 1st inference |     4512.8 |           0 | +2198.1      | +0.0          |
+---------------------+------------+-------------+--------------+---------------+
| After 2nd inference |     4510.4 |           0 | -2.4         | +0.0          |
+---------------------+------------+-------------+--------------+---------------+
| Final               |     4510.4 |           0 | +0.0         | +0.0          |
+---------------------+------------+-------------+--------------+---------------+
| Peak recorded       |     4522.9 |           0 | +3747.7      | +0.0          |
+---------------------+------------+-------------+--------------+---------------+

🔍 MAIN MEMORY CONSUMERS:
   📚 Model loading:         +1539.4 MB RAM      +0.0 MB swap  (41.2% of total)
   ⚡ Compilation/inference:  +2198.1 MB RAM      +0.0 MB swap  (58.9% of total)

📈 SUMMARY:
   💾 Total RAM growth:      +3735.1 MB
   💿 Total swap change:        +0.0 MB
   📊 Peak RAM consumption:  +3747.7 MB above initial
   🔥 Highest RAM recorded: 4522.9 MB
   💿 Peak swap consumption:     +0.0 MB above initial
   🔥 Highest swap recorded: 0.0 MB

🎯 MEMORY HEALTH CHECK:
   ❌ CRITICAL: RAM usage 3748 MB is very high (target <1GB)
   ✅ GOOD: Low peak swap usage 0 MB
   🚨 ALERT: Combined memory impact 4523 MB is very high

🎯 Test completed: {'success': True, 'model_loading_mb': 1539.4296875, 'compilation_mb': 2198.1484375, 'total_mb': 3735.13671875, 'peak_mb': 3747.6640625, 'peak_swap_mb': 0.0}

to (after the change):

[STAGE] 0_INITIAL: 781.90 MB (swap: 0.00 MB) - Initial state after imports
>>> Loading GPT2 model from preset...
[STAGE] 1_MODEL_LOADED: 2321.91 MB (swap: 0.00 MB) - gpt2_medium_en model loaded (13.4s)
[STAGE] 2_BEFORE_INFERENCE: 2321.91 MB (swap: 0.00 MB) - Before first inference
>>> Running first inference (compilation + execution)...
    ⏳ Converting Keras -> OPENVINO and compiling...
[STAGE] 3_FIRST_INFERENCE: 3548.79 MB (swap: 0.00 MB) - First inference completed via generate (7.6s)
>>> Second inference (no compilation)...
[STAGE] 4_SECOND_INFERENCE: 3546.42 MB (swap: 0.00 MB) - Second inference (2.7s)
[STAGE] 5_FINAL: 3546.42 MB (swap: 0.00 MB) - Final state

================================================================================
PERFORMANCE RESULTS
================================================================================
✅ Generated text: 'Hello! I'm a student studying computer programming'
✅ Second generation: 'Testimonials

I was a new'
Backend: openvino
First inference latency: 7.62s
Second inference latency: 2.673s
Throughput: 0.92 tokens/sec
Speedup: 2.9x

📊 DETAILED MEMORY ANALYSIS:
+---------------------+------------+-------------+--------------+---------------+
| STAGE               |   RAM (MB) |   SWAP (MB) | RAM CHANGE   | SWAP CHANGE   |
+=====================+============+=============+==============+===============+
| Initial             |      781.9 |           0 | -            | -             |
+---------------------+------------+-------------+--------------+---------------+
| After model load    |     2321.9 |           0 | +1540.0      | +0.0          |
+---------------------+------------+-------------+--------------+---------------+
| Before inference    |     2321.9 |           0 | +0.0         | +0.0          |
+---------------------+------------+-------------+--------------+---------------+
| After 1st inference |     3548.8 |           0 | +1226.9      | +0.0          |
+---------------------+------------+-------------+--------------+---------------+
| After 2nd inference |     3546.4 |           0 | -2.4         | +0.0          |
+---------------------+------------+-------------+--------------+---------------+
| Final               |     3546.4 |           0 | +0.0         | +0.0          |
+---------------------+------------+-------------+--------------+---------------+
| Peak recorded       |     3567.8 |           0 | +2785.9      | +0.0          |
+---------------------+------------+-------------+--------------+---------------+

🔍 MAIN MEMORY CONSUMERS:
   📚 Model loading:         +1540.0 MB RAM      +0.0 MB swap  (55.7% of total)
   ⚡ Compilation/inference:  +1226.9 MB RAM      +0.0 MB swap  (44.4% of total)

📈 SUMMARY:
   💾 Total RAM growth:      +2764.5 MB
   💿 Total swap change:        +0.0 MB
   📊 Peak RAM consumption:  +2785.9 MB above initial
   🔥 Highest RAM recorded: 3567.8 MB
   💿 Peak swap consumption:     +0.0 MB above initial
   🔥 Highest swap recorded: 0.0 MB

🎯 MEMORY HEALTH CHECK:
   ❌ CRITICAL: RAM usage 2786 MB is very high (target <1GB)
   ✅ GOOD: Low peak swap usage 0 MB

🎯 Test completed: {'success': True, 'model_loading_mb': 1540.0078125, 'compilation_mb': 1226.88671875, 'total_mb': 2764.5234375, 'peak_mb': 2785.86328125, 'peak_swap_mb': 0.0}

+    }

     // FusedFilteringBoxesBySize transformation has the complex pattern
     // which can be affected by further transformations. So we have to
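
For reference, a hypothetical way to opt in to the gated pass from C++ driver or test code (setenv is POSIX; the variable name comes from the getenv call in the diff above):

    #include <cstdlib>

    int main() {
        // Enable the env-gated EinsumDecomposition registration in
        // MOCTransformations before the model is converted/compiled.
        // On Windows, _putenv_s("OV_ENABLE_EINSUM_DECOMPOSITION", "1")
        // is the rough equivalent.
        setenv("OV_ENABLE_EINSUM_DECOMPOSITION", "1", /*overwrite=*/1);
        // ... run the OpenVINO conversion/compilation as usual ...
        return 0;
    }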