
Quartus Streaming Conv, Pooling & Image layers #656


Merged
jmitrevs merged 20 commits into fastmachinelearning:main on Nov 14, 2022

Conversation

@bo3z (Contributor) commented Sep 20, 2022

Description

  • Adds support for image-related layers (Conv 1D & 2D, Avg & Max Pooling, Global Pooling, Zero Padding, Upsampling) in io_stream, in a similar manner to Vivado.
  • Conv 1D & 2D are implemented using a line buffer, similar to Vivado. The main difference is the implementation of padding for Conv layers: Vivado inserts a separate padding layer, while Quartus performs the padding inside the Conv layer. This approach stays in line with the Keras model graph and the total number of layers.
  • Same padding is not supported for Pooling layers.
  • A custom struct was written to act as a shift register in hardware, since Intel HLS does not offer an out-of-the-box shift register (a sketch is given after this list). Any struct with a similar implementation (and meeting certain timing / loop requirements) will be synthesised as a shift register. This can be verified by viewing the synthesis report in report.html > Area Analysis of System.
  • Upsampling and Zero Padding layers are written in a largely similar way to Vivado.
  • Resource usage and latency results coming soon.
  • Transpose layer to be added soon.
  • Fixes a bug in parallel transpose layers introduced by PR #561 (Parallel CNNs, Pooling & Image Layers for Quartus Backend).
  • It is recommended to review this PR commit by commit: each commit adds a single piece of functionality, is self-contained, and the project compiles after each one.
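For illustration, below is a minimal sketch (not the exact struct from this PR; the name, member layout and pragma are assumptions) of the kind of shift-register-like struct that Intel HLS can infer as a shift register when the depth is a compile-time constant and the shift loop is fully unrolled:

template<typename T, int DEPTH>
struct ShiftRegister {
    T data[DEPTH];

    // Shift every element by one position and insert the new value at index 0.
    // With a constant DEPTH and a fully unrolled loop, the compiler can map
    // this access pattern onto a shift register.
    void shift(const T &in) {
        #pragma unroll
        for (int i = DEPTH - 1; i > 0; i--) {
            data[i] = data[i - 1];
        }
        data[0] = in;
    }

    // Read a tap of the register, e.g. one element of a line-buffer row.
    T read(int idx) const {
        return data[idx];
    }
};

Whether a given struct was actually mapped to a shift register can then be confirmed in report.html > Area Analysis of System, as noted above.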

Type of change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change which adds functionality)

Tests

All of the existing tests were expanded to include tests for Quartus in io_stream. No new tests were written. A summary of the tests is given below.

  • test_keras_api.py - Ensures correct parsing of the layers in io_stream and correct syntax (no compilation errors) of Conv 1D & Conv 2D layers.
  • test_cnn_mnist.py, test_cnn_mnist_qkeras.py, test_conv1d.py - Verify the numerical accuracy and compilation of Conv 1D, Conv 2D, Max & Avg Pooling layers.
  • test_upsampling.py and test_zeropadding.py - Ensure numerical accuracy and successful compilation of Zero Padding and Upsampling layers.
  • test_globalpooling.py - Ensures numerical accuracy and successful compilation of Global Pooling layers.

Synthesis results

Below are results obtained through full Quartus synthesis of Conv2D layers for a fixed input (32x32x3) when varying the number of filters and the reuse factors. Other layers were tested for correct synthesis.

[Image: table of Conv2D synthesis results (resources and latency) for a fixed 32x32x3 input with varying filter counts and reuse factors]

Checklist

  • I have read the guidelines for contributing.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have added tests that prove my fix is effective or that my feature works.

@bo3z bo3z marked this pull request as draft September 20, 2022 15:48
@bo3z bo3z marked this pull request as ready for review September 23, 2022 14:37
@bo3z bo3z requested review from vloncar and jmitrevs September 23, 2022 14:44
@jmitrevs (Contributor) commented Oct 5, 2022

pytest.activations is failing:

E       AssertionError: 
E       Not equal to tolerance rtol=0.02, atol=0.02
E       
E       Mismatched elements: 8000 / 8000 (100%)
E       Max absolute difference: 1.12238881
E       Max relative difference: 8914.97600568
E        x: array([[0.793945, 0.791992, 0.798828, ..., 0.804688, 0.791016, 0.799805],
E              [0.791016, 0.802734, 0.804688, ..., 0.799805, 0.799805, 0.794922],
E              [0.795898, 0.808594, 0.803711, ..., 0.793945, 0.796875, 0.801758],...
E        y: array([[-0.227973, -0.279667, -0.045713, ...,  0.226889, -0.28958 ,
E                0.031885],
E              [-0.292061,  0.154492,  0.214236, ...,  0.041079, -0.003215,...
test_activations.py:55: AssertionError

Can you see why?

@bo3z (Contributor, Author) commented Oct 7, 2022

> pytest.activations is failing: [...] Can you see why?

This was addressed in PR #655, which has already been merged. It comes from the fact that the parallel Softsign was optimised in #585 by removing unnecessary values from the LUT, which required corresponding changes in logic.

@jmitrevs (Contributor) commented:

It generally looks good to me so I approved it. I sort of wanted to trigger the pytests again, but couldn't figure out how.

@jmitrevs (Contributor) commented:

I can merge it later today unless someone wants to check more.

@vloncar (Contributor) commented Oct 10, 2022

I need some more time to go through this.

  ])
  def test_mnist_cnn(keras_model, mnist_data, backend, io_type, strategy):
      x_train, y_train, x_test, y_test = mnist_data

-     hls_config = hls4ml.utils.config_from_keras_model(keras_model, granularity='name')
+     hls_config = hls4ml.utils.config_from_keras_model(keras_model, granularity='name', default_precision='ap_fixed<32, 9>')
Contributor:

Hmmm, why do streaming implementations fail without this change and io_parallel ones don't?

Contributor Author:

That's odd, was it a one-off failed test?

Contributor:

I ran all combinations twice, all io_parallel passed, all io_stream failed without this change.

Contributor:

To summarize this issue, after a lot of debugging, it turns out this change hides 4 separate bugs:

  1. The Vivado io_stream implementation of AveragePooling doesn't use a wider type to compute the sum needed for the average, like the io_parallel implementation does, hence the test fails with default settings. This will be addressed with a bit-exact flow in the future, but for now a simple tweak with accum_t would be enough for both io_stream implementations, but...
  2. The line-buffer implementation doesn't use accum_t for the sum, but rather data_T (a short sketch of the accum_t fix is given after this comment).
  3. In the Quartus backend, after the changes from Softmax LUT Optimization #570, the exp() lookup table stores only negative values, but since no saturation bits are used in the io_parallel implementation of the stable softmax strategy (that's a mouthful), we can pass a positive value (a negative value wrapped around) and get incorrect results. This doesn't fail this test, but is incorrect nevertheless.
  4. In the io_stream implementation of the stable softmax strategy, the saturation bits are used, so the most negative value (e.g. 0b1100000.0000 in binary) gets sliced to 0, and exp(0) = 1, so this throws off the result, hence the drop in accuracy and the test fails.

3. and 4. are addressed in this PR; I'll create a new one for 1. and 2. so as not to pollute this PR. Once that is in, we can remove this change and finally merge this PR.
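As an illustration of points 1. and 2., here is a minimal sketch (hypothetical header path, types and widths; not the actual hls4ml code) of accumulating the pooling sum in a wider accum_t before dividing, so that the sum over the pool window cannot overflow the input precision:

#include "HLS/ac_fixed.h"

typedef ac_fixed<16, 6, true> data_T;   // example input precision (assumption)
typedef ac_fixed<18, 8, true> accum_t;  // two extra integer bits for a 2x2 window (assumption)

data_T avg_pool_2x2(const data_T window[4]) {
    accum_t sum = 0;
    #pragma unroll
    for (int i = 0; i < 4; i++) {
        sum += window[i];       // accumulate in the wider type, not in data_T
    }
    return (data_T)(sum >> 2);  // divide by the pool area (4) and narrow back
}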

@vloncar (Contributor) commented Nov 9, 2022

@jmitrevs All the issues have been resolved. Do you want to take another pass at this, or should we merge it?

@jmitrevs (Contributor) commented:

Using a slightly older branch, I noticed that in a project I created, the using stream definition is in both defines.h and nnet_helpers.h. Is that still the case, and is it needed? (I was hacking the definition in one of them and got an error that the two definitions didn't match.)

@vloncar (Contributor) commented Nov 12, 2022

I removed the definitions from nnet_helpers.h. All tests (python compile, make and quartus compile) pass.

The only issue remaining with this PR is that occasionally the padding routines don't work with a cryptic error from the compiler: Compiler Error: Multiple reflexive accesses from stream 'layer2_out' is not allowed. This happens for ZeroPadding1D/2D and Conv1D/2D (with same padding) under certain scenarios. This still needs some understanding, potentially with help from Intel, so I wouldn't block the merge of this just because of that. @jmitrevs?

@jmitrevs (Contributor) commented:

Just for completeness, this alternate, unoptimized 1D padding implementation does not suffer from the error:

template<class data_T, class res_T, typename CONFIG_T>
void zeropad1d_cl(stream<data_T> &data, stream<res_T> &res) {

    res_T res_array[CONFIG_T::out_width];

    ZeroOutputArray:
    for (int i = 0; i < CONFIG_T::out_width; i++) {
        for (int j = 0; j < CONFIG_T::n_chan; j++) {
            res_array[i][j] = 0;
        }
    }

    CopyMain:
    for (int i = 0; i < CONFIG_T::in_width; i++) {
        auto dataval = data.read();
        for (int j = 0; j < CONFIG_T::n_chan; j++) {
            res_array[i+CONFIG_T::pad_left][j] = dataval[j];
        }
    }

    StreamOut:
    for (int i = 0; i < CONFIG_T::out_width; i++) {
        res.write(res_array[i]);
    }
}

Nevertheless, it's not clear to me why what we have fails. I'll leave some time for comments, but if no one objects, we can merge this weekend.

@jmitrevs (Contributor) commented:

There were no comments objecting to the merge, so I'll go ahead and merge.

@jmitrevs jmitrevs merged commit e4a5988 into fastmachinelearning:main Nov 14, 2022

calad0i pushed a commit to calad0i/hls4ml that referenced this pull request Jul 1, 2023 (…g-conv)