
Argmax Softmax #627


Merged: 3 commits into fastmachinelearning:main on Oct 27, 2022

Conversation

@bo3z (Contributor) commented on Aug 3, 2022

Description

📝 A new implementation of Softmax activation for Vivado and Quartus

  • Softmax is a monotonically increasing function; therefore, in classification problems, the predicted class is the same before and after Softmax activation (i.e. Softmax preserves the argmax of its input; see the sketch below).
  • The default implementation is still the stable one, as we are sometimes interested in the normalized output probabilities.
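
For concreteness, a quick NumPy illustration of this equivalence (a sketch for this write-up, not code from the PR):

```python
import numpy as np

logits = np.array([2.0, -1.0, 0.5, 3.2])
probs = np.exp(logits) / np.exp(logits).sum()  # softmax of the logits

# Softmax is monotonic, so the predicted class is unchanged.
assert np.argmax(probs) == np.argmax(logits)
```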

Two implementations are added (a configuration sketch follows the list):

  • ArgMax - Returns a one-hot encoded vector. This would be set through the hls4ml config, so for example hls_config['LayerName']['softmax']['strategy'] = 'argmax' (very similar to what we do now with stable and latency implementations of Softmax).
  • Logits - Removes the Softmax layer. Again handled through hls4ml config, through an optional boolean attribute skip (defaults to false), so for example: hls_config['LayerName']['softmax']['skip'] = True. There would be an optimizer that removes the Softmax node from the model graph and rewires the network.
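
Putting the two options together, configuration would look roughly like this (a sketch assuming a Keras model whose final layer is named 'softmax'; the lowercase keys follow the description above):

```python
import tensorflow as tf
import hls4ml

# A toy classifier ending in a Softmax layer named 'softmax'.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, input_shape=(16,)),
    tf.keras.layers.Activation('softmax', name='softmax'),
])

config = hls4ml.utils.config_from_keras_model(model, granularity='name')

# Option 1: one-hot ArgMax implementation of Softmax.
config['LayerName']['softmax']['strategy'] = 'argmax'

# Option 2 (instead of option 1): drop Softmax and return the logits.
# config['LayerName']['softmax']['skip'] = True
```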

Type of change

  • New feature (non-breaking change which adds functionality)

Tests

  • Expanded test/pytest/test_softmax.py with the new implementations

Checklist

  • [x] I have read the guidelines for contributing.
  • [ ] I have commented my code, particularly in hard-to-understand areas.
  • [ ] I have made corresponding changes to the documentation.
  • [x] My changes generate no new warnings.
  • [x] I have added tests that prove my fix is effective or that my feature works.

@bo3z requested a review from @vloncar on August 3, 2022 at 10:32
@thesps (Contributor) commented on Aug 3, 2022

I like the idea, but I think the implementation could be handled differently.

Am I right in thinking that this "strategy argmax" effectively replaces the softmax function with a linear activation?
If so, this feels like something that should be done with an optimizer pass rather than in the HLS, i.e. one that looks for a softmax at the end of the model and deletes it; it is also confusing to still call the result softmax. From the description I also expected a layer that returns the actual argmax value, which I think would be nice. That would need some HLS to implement the function, and I'd say also an optimizer pass to insert it. We could even implement a Python 'argmax layer' for Keras (see the sketch below), such that users can add it at the end of the model to say explicitly that this is how the model should be converted.

I also think it is not only softmax that can be replaced in this way, but any activation function that meets the criteria you described (which is most of them).
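
A minimal sketch of such a Keras layer (hypothetical, not part of this PR; the name ArgMaxOneHot is made up for illustration):

```python
import tensorflow as tf

class ArgMaxOneHot(tf.keras.layers.Layer):
    """Return a one-hot encoding of the argmax, keeping the input's shape."""

    def call(self, inputs):
        idx = tf.argmax(inputs, axis=-1)
        depth = tf.shape(inputs)[-1]
        return tf.one_hot(idx, depth, dtype=inputs.dtype)
```

A user could append this layer after (or in place of) the final softmax to state explicitly that only the predicted class matters.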

@jmduarte (Member) commented on Aug 3, 2022

I agree with Sioni's comments, and I would add that there could be two different optimizer passes depending on the choice of strategy (see the sketch after this list):

  • "argmax" -> replaces softmax with an argmax layer (returning either a 1-hot encoded vector or the max index)
  • "logits" -> just removes softmax layer entirely and returns the logits

@bo3z (Contributor, Author) commented on Aug 3, 2022

I agree with the comments. The reason I didn't remove the Softmax layer completely is that the test uses a simple one-layer network, which wouldn't really work if the layer were removed. @thesps you're right, Softmax is essentially replaced with a linear activation in this PR. My proposed change is then two "implementations":

  1. ArgMax - Returns a one-hot encoded vector. This would be set through the hls4ml config, so for example hls_config['LayerName']['softmax']['strategy'] = 'argmax' (very similar to what we do now with the stable and latency implementations of Softmax). I would still handle this implementation through HLS, as it acts as a (rough) approximation of Softmax. I would avoid returning only the index of the ArgMax, as that changes the dimensionality of the output (it can be done, but doesn't feel correct; see the sketch after this list).
  2. Logits - Removes the Softmax layer. Again handled through the hls4ml config, via an optional boolean attribute skip (defaults to false), so for example: hls_config['LayerName']['softmax']['skip'] = True. There would be an optimizer that removes the Softmax node from the model graph and rewires the network.
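
To illustrate the dimensionality point (a NumPy sketch, not code from the PR):

```python
import numpy as np

logits = np.array([0.3, 2.1, -0.7])               # shape (3,)
one_hot = np.eye(len(logits))[np.argmax(logits)]  # shape (3,): preserved
index = np.argmax(logits)                          # scalar: the shape is lost
```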

I would avoid having a custom Keras layer, for two reasons:

  1. Users might not be aware of our implementation (we should include it in the documentation & tutorials) and they might already have a pre-trained model using Keras Softmax (this is a very weak point, as replacing a Keras layer with a custom one is a few lines of code)
  2. hls4ml is not Keras specific, there are converters for PyTorch and ONNX as well

@bo3z requested a review from @thesps on September 2, 2022 at 15:08
@thesps (Contributor) commented on Oct 13, 2022

Apologies for the late follow-up review. This looks really nice now; all of the implemented options are useful. The remaining problems are:

  • some tests fail, including ones testing the new feature. Could you look into that?
  • there are some merge conflicts, although they look easy enough to resolve

@vloncar (Contributor) commented on Oct 18, 2022

Since Benjamin is back at university now, I resolved the two issues. Can you merge now, @thesps?

@thesps merged commit 32ee8dc into fastmachinelearning:main on Oct 27, 2022
calad0i pushed a commit to calad0i/hls4ml that referenced this pull request Jul 1, 2023