Quartus Streaming Conv, Pooling & Image layers #656
Conversation
pytest.activations is failing. Can you see why?
This was addressed in PR #655, which was already merged. It comes from the fact that the parallel Softsign was optimised in #585 by removing unnecessary values from the LUT, which required changes in the logic.
It generally looks good to me, so I approved it. I sort of wanted to trigger the pytests again, but couldn't figure out how.
I can merge it later today unless someone wants to check more.
I need some more time to go through this.
test/pytest/test_cnn_mnist.py (outdated)

```diff
 ])
 def test_mnist_cnn(keras_model, mnist_data, backend, io_type, strategy):
     x_train, y_train, x_test, y_test = mnist_data

-    hls_config = hls4ml.utils.config_from_keras_model(keras_model, granularity='name')
+    hls_config = hls4ml.utils.config_from_keras_model(keras_model, granularity='name', default_precision='ap_fixed<32, 9>')
```
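For context on what the widened default buys: `ap_fixed<W, I>` is a signed fixed-point type with `W` total bits, `I` of them integer (including the sign bit). A quick, illustrative Python sketch (not hls4ml code; the helper name is made up) of the representable range and resolution, comparing the usual `ap_fixed<16, 6>` default with the `ap_fixed<32, 9>` used here:

```python
# Illustrative sketch: range and step of a signed ap_fixed<W, I> type.
# This is not hls4ml API; it just shows why widening the default
# precision gives the accumulations more headroom.

def ap_fixed_range(width, int_bits):
    """Return (min, max, step) of a signed ap_fixed<width, int_bits>."""
    frac_bits = width - int_bits
    step = 2.0 ** -frac_bits
    max_val = 2.0 ** (int_bits - 1) - step
    min_val = -(2.0 ** (int_bits - 1))
    return min_val, max_val, step

print(ap_fixed_range(16, 6))  # narrow default: range roughly [-32, 32)
print(ap_fixed_range(32, 9))  # widened type: range roughly [-256, 256), much finer step
```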
Hmmm, why do streaming implementations fail without this change while io_parallel ones don't?
That's odd, was it a one-off failed test?
I ran all combinations twice, all io_parallel passed, all io_stream failed without this change.
To summarize this issue: after a lot of debugging, it turns out this change hides four separate bugs:

1. The Vivado io_stream implementation of AveragePooling doesn't use a wider type to compute the sum needed for the average, unlike the io_parallel implementation, hence the test fails with default settings. This will be addressed with a bit-exact flow in the future; for now a simple tweak to `accum_t` would be enough for both io_stream implementations, but...
2. The line-buffer implementation doesn't use `accum_t` for the sum, but rather `data_T`.
3. In the Quartus backend, after the changes from Softmax LUT Optimization #570, the `exp()` lookup table stores only negative values. But since no saturation bits are used in the io_parallel implementation of the stable softmax strategy (that's a mouthful), we can pass a positive value (a negative value wrapped around) and get incorrect results. This doesn't make this test fail, but it is incorrect nevertheless.
4. In the io_stream implementation of the stable softmax strategy, the saturation bits are used, so the most negative value (e.g., `0b1100000.0000` in binary) gets sliced to 0, and `exp(0) = 1`, which throws off the result, hence the drop in accuracy and the test failure.

Bugs 3 and 4 are addressed in this PR; I'll create a new one for 1 and 2 so as not to pollute this PR. Once that is in, we can remove this change and finally merge this PR.
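The slicing bug above can be illustrated with a toy model (plain Python, not the actual HLS code; the word and fraction widths are hypothetical): when the upper bits of a two's-complement fixed-point word are discarded without saturating first, the most negative value loses all its set bits and reads back as exactly 0, so the lookup returns `exp(0) = 1` instead of a value near 0.

```python
import math

def slice_without_saturation(value_bits, keep_bits):
    """Toy model: keep only the low `keep_bits` of a two's-complement
    word, discarding the upper bits instead of saturating."""
    return value_bits & ((1 << keep_bits) - 1)

# Hypothetical 11-bit word with 4 fractional bits holding a large
# negative value (cf. 0b1100000.0000 in the discussion).
word = 0b11000000000
kept = slice_without_saturation(word, 4)   # only fractional bits survive
x = kept / 16.0                            # interpret remaining bits: 0.0
print(x, math.exp(x))                      # 0.0 1.0 -- exp(0) = 1, not ~0
```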
@jmitrevs All the issues have been resolved. Do you want to take another pass at this, or shall we merge it?
Using a slightly older branch, I noticed that in a project I created, the `using` stream definition appears in both defines.h and nnet_helpers.h. Is that still the case, and is it needed? (I was hacking the definition in one and got an error that the two definitions didn't match.)
I removed the duplicate definitions. The only issue remaining with this PR is that occasionally the padding routines don't work, with a cryptic error from the compiler:
Just for completeness, this alternate, unoptimized 1D padding implementation does not suffer from the error:

```cpp
template<class data_T, class res_T, typename CONFIG_T>
void zeropad1d_cl(stream<data_T> &data, stream<res_T> &res) {
    res_T res_array[CONFIG_T::out_width];

    // Zero-initialize the full output buffer, including the padded regions
    ZeroOutputArray:
    for (int i = 0; i < CONFIG_T::out_width; i++) {
        for (int j = 0; j < CONFIG_T::n_chan; j++) {
            res_array[i][j] = 0;
        }
    }

    // Copy the input samples into the buffer, offset by the left padding
    CopyMain:
    for (int i = 0; i < CONFIG_T::in_width; i++) {
        auto dataval = data.read();
        for (int j = 0; j < CONFIG_T::n_chan; j++) {
            res_array[i + CONFIG_T::pad_left][j] = dataval[j];
        }
    }

    // Stream the padded result out
    StreamOut:
    for (int i = 0; i < CONFIG_T::out_width; i++) {
        res.write(res_array[i]);
    }
}
```

Nevertheless, it is not clear to me why the implementation we have fails. I'll leave some time for comments, but if no one objects, we can merge this weekend.
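The behaviour the padding routine implements (per-channel zero padding on the left and right of a 1D signal) can be checked against a plain-Python reference; this is an illustrative sketch, not hls4ml code:

```python
def zeropad1d(data, pad_left, pad_right, n_chan):
    """Plain-Python reference for channel-wise 1D zero padding.
    `data` is a list of samples, each a list of `n_chan` values."""
    zero = [0] * n_chan
    return ([zero[:] for _ in range(pad_left)]
            + data
            + [zero[:] for _ in range(pad_right)])

print(zeropad1d([[1, 2], [3, 4]], pad_left=1, pad_right=2, n_chan=2))
# [[0, 0], [1, 2], [3, 4], [0, 0], [0, 0]]
```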
There were no objections to merging, so I'll go ahead and merge.
Description
Type of change
Tests
All of the existing tests were expanded to include tests for Quartus in `io_stream`. No new tests were written. A summary of the tests is given below.

Synthesis results
Below are results obtained through full Quartus synthesis of Conv2D layers for a fixed input (32x32x3) when varying the number of filters and the reuse factors. Other layers were tested for correct synthesis.
Checklist