Changes to ingest-qonnx #461

Merged
merged 14 commits into ingest-qonnx from ingest-qonnx-thesps on Dec 2, 2021

Conversation

@thesps (Contributor) commented Nov 25, 2021

@jmitrevs some updates for you.

Firstly, I've pulled master branch to bring this up to date.

Secondly, the main thing I've changed is that Quant nodes don't get converted to BatchNormalization any more.

Now, a Quant node whose 0th input is a Constant node is replaced with a Constant. It's basically the same logic as what you did previously, but instead of the node going through transformations like Quant to BatchNormalization to Constant, it just goes Quant to Constant. I'm not 100% sure that scale and zeropt are handled properly, but I haven't changed that behaviour just yet.

A Quant node whose 0th input is not a Constant node is replaced with an Activation (linear). If one of these Quant nodes has a scale or zeropt, an ApplyAlpha (aka BatchNormalization) is inserted to take care of that. Again, some more verification is needed that we're handling those correctly.
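To illustrate the two cases, here's a minimal, self-contained sketch. This is schematic only: the `Node` class and `quantize` helper are stand-ins, not the actual hls4ml optimizer code or IR.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                        # e.g. 'Quant', 'Constant', 'Activation', 'ApplyAlpha'
    inputs: list = field(default_factory=list)
    attrs: dict = field(default_factory=dict)

def quantize(values, bits, scale, zeropt):
    # Placeholder only -- stands in for whatever the real scale/zeropt/bitwidth handling ends up being.
    return values

def rewrite_quant(node):
    """Rewrite one Quant node following the scheme in this PR (schematic)."""
    assert node.op == 'Quant'
    data_in = node.inputs[0]
    bits, scale, zeropt = node.attrs['bits'], node.attrs['scale'], node.attrs['zeropt']

    if data_in.op == 'Constant':
        # Quantized weights: fold into a new Constant at compile time.
        return Node('Constant',
                    attrs={'value': quantize(data_in.attrs['value'], bits, scale, zeropt),
                           'quantizer_bits': bits})

    # Quantized activation: explicit linear Activation carrying the quantizer...
    act = Node('Activation', inputs=[data_in],
               attrs={'activation': 'linear', 'quantizer_bits': bits})
    if scale != 1 or zeropt != 0:
        # ...plus an inserted ApplyAlpha (scale-and-shift) to take care of scale/zeropt.
        act = Node('ApplyAlpha', inputs=[act], attrs={'scale': scale, 'bias': zeropt})
    return act

# e.g. a weight quantizer: Constant -> Quant collapses to a single Constant
w = Node('Constant', attrs={'value': [0.5, -0.25]})
q = Node('Quant', inputs=[w], attrs={'bits': 4, 'scale': 1, 'zeropt': 0})
print(rewrite_quant(q).op)   # -> Constant
```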

I've also added a test_qonnx.py with a test of the TFC_2w2a model that works locally, but not yet in CI because I think I messed up the environment. I'll fix that...

jmduarte and others added 13 commits October 11, 2021 16:09
* yaml.safe_load instead of yaml.load

* Use yaml.safe_load in converters __init__.py
* Update `zcu102` and `pynq-z2` `axi-stream` driver
…443)

* fix 2 reshape issues: don't reshape streams for flatten and remove final reshape

* Add a test for a model with Reshape as the final layer

* swap

* only remove for io_parallel; warn for both io_parallel and io_stream

Co-authored-by: Sioni Summers <[email protected]>
… relevant test. Use 5000 MNIST samples rather than full dataset for faster testing
* Support softmax over multidimensional tensors

* Style cleanup

* Added axis part in keras_to_hls.py

* Added some extensions to test_softmax.py but multidimensional softmax is still getting bad performances (i.e. below the one set in the assertion)

* Clean up the softmax test

* Make sure io_parallel softmax is not used on multi-dim input

Co-authored-by: nicologhielmetti <[email protected]>
@jmitrevs (Contributor)

My reasoning for going to BatchNormalization was to make things simple, with few special cases, since you need to have a Constant + BN and BN + BN fusion regardless for other reasons. Quant -> BN in all cases and then make use of generic optimizations. What the Quant node really became was an annotation and a precision applied to the output (and a corresponding quantizer). It's a holiday here in the US today so I am not sure I'll get a chance to look at this carefully until next week.

@thesps force-pushed the ingest-qonnx-thesps branch from 7c79cdb to 95ed2e9 on November 26, 2021
@jmitrevs (Contributor)

Can you explain the reasoning for the Quant changes a bit better? How would it work in the "Reshape -> Mul -> Sub -> Quant" sequence at the beginning of TFC_2W2A_clean, for example? The old scheme proceeds through the following steps:

  1. Reshape -> Mul -> Sub -> Quant
  2. Reshape -> BN -> BN -> BN(w/ quant annotation)
  3. Reshape -> BN(w/ quant annotation)

The MatMul -> Dense optimization checks for a quant annotation of the input to determine the bit size. (Moving the output bitwidth calculation to a separate optimization step is planned but not yet implemented; currently it's all in the MatMul -> Dense optimization.)

The reason I went for Quant -> annotated BatchNormalization was simplicity. There are no special cases. That's what I liked. The real "Quant" part goes into the output annotation, and it can become a part of any node. BN is just for the scale and offset, and it should not add any extra operations after the fusing. (We discussed at a meeting how to handle quantized input and though we did not come to a conclusion, generally the idea of putting a quantizer at the beginning was not favored, so I did not worry about an initial quantizer adding a BN that would not be fused.)
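(The BN -> BN collapse in steps 2 -> 3 is just the composition of two affine scale-and-shift operations; a tiny numpy check, with made-up numbers:)

```python
import numpy as np

# Two scale-and-shift (BatchNormalization-like) layers back to back:
#   (x * s1 + b1) * s2 + b2  ==  x * (s1 * s2) + (b1 * s2 + b2)
rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))
s1, b1 = rng.normal(size=4), rng.normal(size=4)
s2, b2 = rng.normal(size=4), rng.normal(size=4)

assert np.allclose((x * s1 + b1) * s2 + b2,
                   x * (s1 * s2) + (b1 * s2 + b2))
```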

@thesps (Contributor, Author) commented Nov 26, 2021

Essentially, I don't think it's strictly safe to use a layer type for a different purpose than the one it was designed for. We'd introduce a kind of maintenance overhead to the BatchNormalization layer to remember that it needs to be used for Quant nodes too (for example when designing optimizers), which I argue is unexpected. So BatchNormalization layers should be expected to scale-and-shift, and nothing else.

The BatchNormalization fusion works quite neatly in the TFC-2w2a example because there is always a BatchNormalization before a Quant (not counting the weight quantizers for the moment). In the general case there might not be a BatchNormalization layer, so for example the pattern MatMul -> Quant would transform to MatMul -> BatchNormalization (including a multiplication by 1 and an addition of 0). It's neater to go MatMul -> Activation (linear, quantized) to explicitly perform the Quant operation and nothing else.

For the TFC-2w2a case, this:

Reshape -> Mul -> Sub -> Quant
Reshape -> BN -> BN -> BN(w/ quant annotation)
Reshape -> BN(w/ quant annotation)

becomes:

Reshape -> Mul -> Sub -> Quant
Reshape -> BN -> BN -> Activation
Reshape -> BN -> Activation

The weight quantization section changes from

Constant -> Quant -> MatMul
Constant -> BN -> MatMul
BN -> MatMul
...

to

Constant -> Quant -> MatMul
Constant -> MatMul
...

I added a CI test with the TFC-2w2a model, so you can see the full HLS project here.

The reason I went for Quant -> annotated BatchNormalization was simplicity. There are no special cases.

So with my changes there are only two cases: quantized weights (or constants in general) and quantized activations. I think it's sensible to differentiate them anyway because the first is a compile-time operation, while the second is a run-time operation.

BN is just for the scale and offset, and it should not add any extra operations after the fusing.

We actually need to handle scale and offset a bit differently to be correct in the end, I think. We should look at an example with a real scale. But if I've understood properly, a pattern like Constant -> Quant -> MatMul (i.e. weight quantization) with a scale != 1 should eventually become, for example, Dense -> ScaleAndShift (BatchNormalization): the weights of the MatMul need to be scaled into the range representable by the number of bits specified in the Quant, and then the scales need to be 'reinserted' afterwards for correctness. So the scale needs to be handled a bit differently than just multiplying the Constant and then dropping it anyway.
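As a rough numpy sketch of what I mean (the names and numbers here are made up for illustration, not taken from the code):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1, 3))
w = rng.normal(size=(3, 4))               # float weights held by the Constant
s = np.abs(w).max(axis=0) / 7.0           # per-channel scale for a signed 4-bit range
w_int = np.round(w / s)                   # what the low-bitwidth Dense would store

dense_out = X @ w_int                     # 4-bit MatMul/Dense
rescaled = dense_out * s                  # the inserted ScaleAndShift (ApplyAlpha)
# Same result as using the dequantized weights directly:
assert np.allclose(rescaled, X @ (w_int * s))
```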

@jmitrevs (Contributor) commented Nov 26, 2021

Actually, Constant -> BN becomes an annotated constant in my case, not a BN, so the final result of Constant -> Quant is just a Constant in both cases.

I think the main difference is whether one thinks of a Quant as becoming an annotation that can be applied to any node (Constant, BN, Dense, ...) or as a special Activation node that we need to keep around. I treat Quant as an annotation to be added to a node.

In the examples we have, the Quant node is always at the inputs of the MatMul or Conv, not the output. This determines the quantization of the inputs to the MatMul or Conv, so you can set the bit widths of the operations. The form input -> Activation -> Dense seems a bit strange. But as an input quantization it makes sense, and then you can derive an output quantization by propagating, and annotate the Dense with that. Alternately, a Quant following the MatMul or Conv can explicitly quantize the results of the calculation, but it doesn't quantize the actual calculation.
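(For example, a simple worst-case way to propagate an output bit width through a Dense layer; this is just an illustration, not something implemented here:)

```python
import math

def dense_output_bits(n_in, input_bits, weight_bits):
    # Each product needs input_bits + weight_bits bits; summing n_in of them
    # adds at most ceil(log2(n_in)) bits of growth.
    return input_bits + weight_bits + math.ceil(math.log2(n_in))

print(dense_output_bits(784, 2, 2))   # e.g. a TFC-style 2-bit layer -> 14
```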

@jmitrevs (Contributor) commented Nov 26, 2021

By the way, concerning

Dense -> ScaleAndShift (BatchNormalization) because the weights of the MatMul need to be scaled within the range representable by the number of bits specified in the Quant, then the scales need to be 'reinserted' afterwards for correctness.

I always assumed that any scaling a Quant does that needs to be undone has to be explicit in the ONNX. The scale and shift are real. It would not be obvious when to scale back otherwise. A Quant node is local, taking inputs and producing modified outputs.

@jmitrevs (Contributor)

I could see the argument for a special NOP quantization layer if scale is 1 and offset is 0 if it makes merging easier, though. I wasn't sure if it simplifies or complicates things, so I didn't do it, but it is worth revisiting.

The main thing, though, is what is the final result of a quant node in our model? I thought of it as an output annotation specifying the precision.

@thesps (Contributor, Author) commented Nov 26, 2021

Actually, Constant -> BN becomes an annotated constant in my case, not a BN, so the final result of Constant -> Quant is just a Constant in both cases.

Yep, my bad, I'd already seen that in both cases it becomes a Constant.

In the examples we have the Quant node is always at the inputs of the MatMul or Conv, not the output.

It doesn't have to be like that, though; a Quant node can go anywhere. It just happens that the examples do Layer -> BatchNorm -> quantized Activation (linear), but other patterns are possible. If the quantized activation were a quantized ReLU, for example, the ONNX graph could look like Dense (e.g.) -> BatchNormalization -> ReLU -> Quant, and then you can't profit from the BatchNormalization merging anyway.

I could see the argument for a special NOP quantization layer if scale is 1 and offset is 0 if it makes merging easier, though

The idea in the PR is that the quantization of a "run-time tensor" (i.e. not a Constant) is an explicit operation, represented by its own layer.

I treat Quant as an annotation to be added to a node.

I think that's right for quantized weights, but not for activations (or rather non-constant-tensors).

For the scale and zero-point, the idea is that for a quantized-weight we need to do this in the compiler by modifying the Constant, and this in the FPGA. So the scale needs to be propagated as an attribute of the weights. I'm not doing that yet in this PR either, but that's how it needs to be handled.

A Quant node is local, taking inputs and producing modified outputs.

So actually this isn't totally true. Recall for example the conversations that we had with the FINN team about propagating scale factors through a model. The point is to separate the "real value" of the tensor into a part that can be represented with low bit precision, and a part that can be represented as a floating point scale factor that can be moved around (that we handle by inserting an ApplyAlpha aka BatchNormalization layer).

I'm getting started using this code to generate some simple models with a real (!= 1) scale (with a small modification to save the QONNX model).

There's another example of a QONNX model with scale != 1 and quantized ReLUs here.

@jmitrevs (Contributor)

Remember last Friday we were discussing whether it's MatMul->Quant->ReLU or MatMul->ReLU->Quant, and we decided that it could even be MatMul->Quant->ReLU->Quant. They mean different things:

  1. MatMul->Quant->ReLU: quantize the output of MatMul and then do ReLU
  2. MatMul->ReLU->Quant: do ReLU and quantize its output
  3. MatMul->Quant->ReLU->Quant: quantize the output of MatMul and then do ReLU, and quantize again.

(The actual quantization of the MatMul operation is specified upstream in all cases.) In my scheme, the result at the end for the three cases should be:

  1. MatMul (w/ annotated output)->ReLU: quantize the output of MatMul and then do ReLU
  2. MatMul->ReLU (w/ annotated output): do ReLU and quantize its output
  3. MatMul (w/ annotated output)->ReLU (w/ annotated output): quantize the output of MatMul and then do ReLU, and quantize again.

(If it doesn't end up like this, then unless there's a good reason, something should be modified so that it does.) In all cases, though, the Quant becomes an annotation, not a special operation (unless explicit scaling or shifting is required), and the annotation basically determines the quantization of the output. Our model requires a precision for the output in all cases, so it seemed natural to me to apply it there.

Is your proposal that the 3 options above be:

  1. MatMul->Linear->ReLU: quantize the output of MatMul and then do ReLU
  2. MatMul->ReLU->Linear: do ReLU and quantize its output
  3. MatMul->Linear->ReLU->Linear: quantize the output of MatMul and then do ReLU, and quantize again.
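(Purely to make the contrast concrete, here's a toy description of case 1 under each scheme; these dicts are only illustrative, not the hls4ml internal representation, and the precision shown is made up:)

```python
# (a) Quant becomes an annotation on the producing node's output:
annotated = [
    {'op': 'Dense', 'output_precision': 'ap_fixed<8,4>'},   # example precision only
    {'op': 'ReLU'},
]

# (b) Quant becomes an explicit linear Activation node carrying the quantizer:
explicit = [
    {'op': 'Dense'},
    {'op': 'Activation', 'activation': 'linear', 'quantizer_bits': 8},
    {'op': 'ReLU'},
]
```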

@jmitrevs (Contributor)

So if the proposal is to make a Quant a Linear, preceded only if necessary by a BN, I think that would be good. I don't think you necessarily need the Const special case, since I think the optimizations should handle it, though it's not a big deal.

The key question, though, is what should be the final form of the Quant. I still think an annotation and output precision is the way to go, but I could be convinced otherwise.

@jmitrevs (Contributor)

As for the rescaling, I really don't see how that can be inserted automatically. We should maybe discuss this more as a group. I was asking Nhan about it before, and my understanding from that discussion was that the scaling really is not undone automatically. Everything needs to be explicit in the graph.

@thesps (Contributor, Author) commented Nov 27, 2021

I put together a small example of how scaling works here. It's just a single MatMul with a (4 bit) Quant on the weights. Those weights are:

weights:
 [[-0.42833856  0.2461826   0.78714716 -0.7732045 ]
 [ 0.2447649  -0.86163914 -0.11244959 -0.44183114]
 [-0.06119122  0.4923652   0.11244959  0.55228895]]

And the scales are:

scales: [[0.06119122 0.1230913  0.11244959 0.11045779]]

The idea is that only weights / scales are integers:

weights / scales:
 [[-7.  2.  7. -7.]
 [ 4. -7. -1. -4.]
 [-1.  4.  1.  5.]]

Evaluating on some example data:

X: [[ 0.4732726  -0.66137505 -0.6119138 ]]
y_qonnx:            [[-0.3271585   0.385093    0.37809706 -0.41167364]]

To get the correct output, we can either do np.dot(X, w) or np.dot(X, w/s) * s. The point is that since w is float and only w/s is 4-bit integers, the weights of the Dense in the HLSModel should be w/s, and then we need to do the * s in an inserted layer (ApplyAlpha / BatchNormalization, or some better name) after the Dense.

np.dot(X, w/s) * s: [[-0.32715854  0.385093    0.3780971  -0.4116736 ]]
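(As a runnable check of the numbers above:)

```python
import numpy as np

w = np.array([[-0.42833856,  0.2461826 ,  0.78714716, -0.7732045 ],
              [ 0.2447649 , -0.86163914, -0.11244959, -0.44183114],
              [-0.06119122,  0.4923652 ,  0.11244959,  0.55228895]])
s = np.array([0.06119122, 0.1230913, 0.11244959, 0.11045779])
X = np.array([[0.4732726, -0.66137505, -0.6119138]])

w_int = w / s                               # the 4-bit integer weights the Dense should hold
assert np.allclose(w_int, np.round(w_int))  # they really are integers
assert np.allclose(X @ w, (X @ w_int) * s)  # Dense(w/s) followed by *s == Dense(w)
```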

For completeness, here's the hls4ml output (which doesn't work yet in either this ingest-qonnx-thesps branch or ingest-qonnx).

y_hls4ml:           [-5.3447266  3.1308594  3.3583984 -3.7216797]

Everything needs to be explicit in the graph

Hopefully the example shows how all the information is there in the Quant node; we just need to 'factorize' which operations happen where, in order to have both low-bitwidth Dense, Conv, etc. layers and correct results by using the scale factors.

I still think an annotation and output precision is the way to go, but I could be convinced otherwise.

I also think this is probably the way to go; my work here is incomplete in that the 'Activation' I'm inserting should get merged somewhere else later. But with the scale-factor complication, I think these activation quantizers need to be handled differently from weight quantizers, and later in the flow.

@jmitrevs (Contributor)

I will have to try to understand it. Let's talk more next week. I should see what FINN does. The example, though, is for quantized weights, which wouldn't create an Activation node.

@jmitrevs (Contributor) commented Dec 2, 2021

Based on the discussion yesterday I will merge this request.

@jmitrevs merged commit 2bf3afe into ingest-qonnx on Dec 2, 2021
@jmduarte deleted the ingest-qonnx-thesps branch on November 2, 2022