@@ -90,6 +90,7 @@
 #include "transformations/op_conversions/convert_scatter_elements_to_scatter.hpp"
 #include "transformations/op_conversions/convert_subtract.hpp"
 #include "transformations/op_conversions/convert_ti_to_sequences.hpp"
+#include "transformations/op_conversions/einsum_decomposition.hpp"
 #include "transformations/resolve_names_collisions.hpp"
 #include "transformations/smart_reshape/lstm_states_broadcast.hpp"
 #include "transformations/smart_reshape/matmul_sr.hpp"
@@ -163,6 +164,11 @@ bool ov::pass::MOCTransformations::run_on_model(const std::shared_ptr<ov::Model>
     REGISTER_PASS(manager, PushConstantToSubgraph)
     REGISTER_PASS(manager, ConstantFolding)
     REGISTER_PASS(manager, Validate)
+    // the order is important

Contributor: Please add a better comment explaining before which transformation this pass should be called.

Contributor Author: Done!

+    const char* enable_einsum = std::getenv("OV_ENABLE_EINSUM_DECOMPOSITION");
+    if (enable_einsum) {
+        REGISTER_PASS(manager, EinsumDecomposition)

Contributor: I don't think this is a good way to fix this. Doing this in MOC means we will have decomposed Einsum in the IR.

As I understand it, this is really needed only for Einsum ops that have constant inputs, so that they can be constant-folded before reaching the plugin. Can we do it differently? Maybe modify this transformation to work only on constant inputs for the offline step? @CuriousPanCake

Contributor Author: @mvafin I updated it to check whether at least one of the inputs is a constant, and that worked too.
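
A minimal sketch of what such a check might look like, assuming OpenVINO's public node API (the helper name and its exact placement in the matcher are illustrative, not necessarily what the PR does):

    #include <memory>

    #include "openvino/core/node.hpp"
    #include "openvino/core/type.hpp"
    #include "openvino/op/constant.hpp"

    // Returns true when at least one direct input of the node is a Constant,
    // so the decomposed Einsum subgraph has something ConstantFolding can
    // actually collapse during the offline step.
    bool has_constant_input(const std::shared_ptr<ov::Node>& node) {
        for (const auto& input : node->input_values()) {
            if (ov::as_type_ptr<ov::op::v0::Constant>(input.get_node_shared_ptr())) {
                return true;
            }
        }
        return false;
    }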

Contributor Author: Memory usage went from (before the change):

================================================================================
FIXED MEMORY TEST: KERAS GPT2 + OPENVINO
================================================================================
[STAGE] 0_INITIAL: 775.24 MB (swap: 0.00 MB) - Initial state after imports

>>> Loading GPT2 model from preset...
[STAGE] 1_MODEL_LOADED: 2314.67 MB (swap: 0.00 MB) - gpt2_medium_en model loaded (10.0s)
[STAGE] 2_BEFORE_INFERENCE: 2314.67 MB (swap: 0.00 MB) - Before first inference

>>> Running first inference (compilation + execution)...
    ⏳ Converting Keras -> OPENVINO and compiling...
[STAGE] 3_FIRST_INFERENCE: 4512.82 MB (swap: 0.00 MB) - First inference completed via generate (7.7s)

>>> Second inference (no compilation)...
[STAGE] 4_SECOND_INFERENCE: 4510.38 MB (swap: 0.00 MB) - Second inference (2.0s)
[STAGE] 5_FINAL: 4510.38 MB (swap: 0.00 MB) - Final state

================================================================================
PERFORMANCE RESULTS
================================================================================
✅ Generated text: 'Hello everyone,

We've been busy'
✅ Second generation: 'Testimony before the House Judiciary Committee on April'
Backend: openvino
First inference latency: 7.69s
Second inference latency: 2.045s
Throughput: 0.65 tokens/sec
Speedup: 3.8x

📊 DETAILED MEMORY ANALYSIS:
+---------------------+------------+-------------+--------------+---------------+
| STAGE               |   RAM (MB) |   SWAP (MB) | RAM CHANGE   | SWAP CHANGE   |
+=====================+============+=============+==============+===============+
| Initial             |      775.2 |           0 | -            | -             |
+---------------------+------------+-------------+--------------+---------------+
| After model load    |     2314.7 |           0 | +1539.4      | +0.0          |
+---------------------+------------+-------------+--------------+---------------+
| Before inference    |     2314.7 |           0 | +0.0         | +0.0          |
+---------------------+------------+-------------+--------------+---------------+
| After 1st inference |     4512.8 |           0 | +2198.1      | +0.0          |
+---------------------+------------+-------------+--------------+---------------+
| After 2nd inference |     4510.4 |           0 | -2.4         | +0.0          |
+---------------------+------------+-------------+--------------+---------------+
| Final               |     4510.4 |           0 | +0.0         | +0.0          |
+---------------------+------------+-------------+--------------+---------------+
| Peak recorded       |     4522.9 |           0 | +3747.7      | +0.0          |
+---------------------+------------+-------------+--------------+---------------+

🔍 MAIN MEMORY CONSUMERS:
   📚 Model loading:         +1539.4 MB RAM      +0.0 MB swap  (41.2% of total)
   ⚡ Compilation/inference:  +2198.1 MB RAM      +0.0 MB swap  (58.9% of total)

📈 SUMMARY:
   💾 Total RAM growth:      +3735.1 MB
   💿 Total swap change:        +0.0 MB
   📊 Peak RAM consumption:  +3747.7 MB above initial
   🔥 Highest RAM recorded: 4522.9 MB
   💿 Peak swap consumption:     +0.0 MB above initial
   🔥 Highest swap recorded: 0.0 MB

🎯 MEMORY HEALTH CHECK:
   ❌ CRITICAL: RAM usage 3748 MB is very high (target <1GB)
   ✅ GOOD: Low peak swap usage 0 MB
   🚨 ALERT: Combined memory impact 4523 MB is very high

🎯 Test completed: {'success': True, 'model_loading_mb': 1539.4296875, 'compilation_mb': 2198.1484375, 'total_mb': 3735.13671875, 'peak_mb': 3747.6640625, 'peak_swap_mb': 0.0}

to (after the change):

[STAGE] 0_INITIAL: 781.90 MB (swap: 0.00 MB) - Initial state after imports
>>> Loading GPT2 model from preset...
[STAGE] 1_MODEL_LOADED: 2321.91 MB (swap: 0.00 MB) - gpt2_medium_en model loaded (13.4s)
[STAGE] 2_BEFORE_INFERENCE: 2321.91 MB (swap: 0.00 MB) - Before first inference
>>> Running first inference (compilation + execution)...
    ⏳ Converting Keras -> OPENVINO and compiling...
[STAGE] 3_FIRST_INFERENCE: 3548.79 MB (swap: 0.00 MB) - First inference completed via generate (7.6s)
>>> Second inference (no compilation)...
[STAGE] 4_SECOND_INFERENCE: 3546.42 MB (swap: 0.00 MB) - Second inference (2.7s)
[STAGE] 5_FINAL: 3546.42 MB (swap: 0.00 MB) - Final state

================================================================================
PERFORMANCE RESULTS
================================================================================
✅ Generated text: 'Hello! I'm a student studying computer programming'
✅ Second generation: 'Testimonials

I was a new'
Backend: openvino
First inference latency: 7.62s
Second inference latency: 2.673s
Throughput: 0.92 tokens/sec
Speedup: 2.9x

📊 DETAILED MEMORY ANALYSIS:
+---------------------+------------+-------------+--------------+---------------+
| STAGE               |   RAM (MB) |   SWAP (MB) | RAM CHANGE   | SWAP CHANGE   |
+=====================+============+=============+==============+===============+
| Initial             |      781.9 |           0 | -            | -             |
+---------------------+------------+-------------+--------------+---------------+
| After model load    |     2321.9 |           0 | +1540.0      | +0.0          |
+---------------------+------------+-------------+--------------+---------------+
| Before inference    |     2321.9 |           0 | +0.0         | +0.0          |
+---------------------+------------+-------------+--------------+---------------+
| After 1st inference |     3548.8 |           0 | +1226.9      | +0.0          |
+---------------------+------------+-------------+--------------+---------------+
| After 2nd inference |     3546.4 |           0 | -2.4         | +0.0          |
+---------------------+------------+-------------+--------------+---------------+
| Final               |     3546.4 |           0 | +0.0         | +0.0          |
+---------------------+------------+-------------+--------------+---------------+
| Peak recorded       |     3567.8 |           0 | +2785.9      | +0.0          |
+---------------------+------------+-------------+--------------+---------------+

🔍 MAIN MEMORY CONSUMERS:
   📚 Model loading:         +1540.0 MB RAM      +0.0 MB swap  (55.7% of total)
   ⚡ Compilation/inference:  +1226.9 MB RAM      +0.0 MB swap  (44.4% of total)

📈 SUMMARY:
   💾 Total RAM growth:      +2764.5 MB
   💿 Total swap change:        +0.0 MB
   📊 Peak RAM consumption:  +2785.9 MB above initial
   🔥 Highest RAM recorded: 3567.8 MB
   💿 Peak swap consumption:     +0.0 MB above initial
   🔥 Highest swap recorded: 0.0 MB

🎯 MEMORY HEALTH CHECK:
   ❌ CRITICAL: RAM usage 2786 MB is very high (target <1GB)
   ✅ GOOD: Low peak swap usage 0 MB

🎯 Test completed: {'success': True, 'model_loading_mb': 1540.0078125, 'compilation_mb': 1226.88671875, 'total_mb': 2764.5234375, 'peak_mb': 2785.86328125, 'peak_swap_mb': 0.0}

+    }

     // FusedFilteringBoxesBySize transformation has the complex pattern
     // which can be affected by further transformations. So we have to
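
For reference, a hypothetical way to opt in to the gated pass from C++ driver or test code (setenv is POSIX; the variable name comes from the getenv call in the diff above):

    #include <cstdlib>

    int main() {
        // Enable the env-gated EinsumDecomposition registration in
        // MOCTransformations before the model is converted/compiled.
        // On Windows, _putenv_s("OV_ENABLE_EINSUM_DECOMPOSITION", "1")
        // is the rough equivalent.
        setenv("OV_ENABLE_EINSUM_DECOMPOSITION", "1", /*overwrite=*/1);
        // ... run the OpenVINO conversion/compilation as usual ...
        return 0;
    }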