add float8 and bfloat8 conversion functions #6495
chloeeyi seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #6495 +/- ##
==========================================
- Coverage 93.27% 92.77% -0.50%
==========================================
Files 845 807 -38
Lines 266119 255130 -10989
==========================================
- Hits 248222 236702 -11520
- Misses 17897 18428 +531
Pull request overview
This PR adds float8 and bfloat8 conversion functions to support 8-bit floating-point formats. The implementation provides bidirectional conversions between float16 and two 8-bit formats: float8 in the E4M3 layout (1 sign bit, 4 exponent bits, 3 mantissa bits) and bfloat8 in the E5M2 layout (1 sign bit, 5 exponent bits, 2 mantissa bits).
Key changes:
- Added float16 ↔ float8 E4M3 conversion functions with proper handling of special values
- Added float16 ↔ bfloat8 E5M2 inline conversion functions using direct bit truncation/extension
- Implemented proper handling for edge cases including zero, infinity, NaN, overflow, and underflow
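The "direct bit truncation/extension" mentioned for E5M2 works because float16 is laid out as 1 sign bit, 5 exponent bits and 10 mantissa bits, so its upper byte already has the E5M2 layout. A minimal sketch of that idea follows; the function names are hypothetical and are not taken from the PR, and the truncation shown here rounds toward zero, which may differ from what the PR does.

```cpp
#include <cstdint>

// Hypothetical names for illustration only; the PR's actual inline
// functions in src/mat.h may be named and implemented differently.

// float16 (1-5-10) -> E5M2 (1-5-2): keep the upper 8 bits.
// Dropping the low 8 mantissa bits rounds toward zero.
static inline uint8_t sketch_float16_to_e5m2(uint16_t h)
{
    return static_cast<uint8_t>(h >> 8);
}

// E5M2 (1-5-2) -> float16 (1-5-10): place the byte in the upper half
// and zero-fill the low 8 mantissa bits. This direction is exact.
static inline uint16_t sketch_e5m2_to_float16(uint8_t b)
{
    return static_cast<uint16_t>(static_cast<uint16_t>(b) << 8);
}
```

One caveat of plain truncation: a float16 NaN whose payload lives only in the low 8 mantissa bits (for example `0x7C01`) truncates to `0x7C`, which is infinity in E5M2, so a production version would need an explicit NaN check.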
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/mat.h | Added function declarations and inline implementations for float8 and bfloat8 conversion functions |
| src/mat.cpp | Implemented float16_to_float8 and float8_to_float16 with comprehensive special value handling |
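For orientation, the decode direction of the E4M3 path can be sketched as below. This is not the code from src/mat.cpp: the function name is hypothetical, and the sketch assumes the OCP FP8 E4M3 variant (exponent bias 7, no infinities, NaN encoded as all exponent and mantissa bits set), which the PR may or may not follow. Every finite E4M3 value, including its subnormals, is exactly representable as a normal float16, so this direction needs no rounding.

```cpp
#include <cstdint>

// Hypothetical name; assumes OCP-style E4M3 (bias 7, no Inf, NaN = S.1111.111).
static inline uint16_t sketch_e4m3_to_float16(uint8_t b)
{
    const uint16_t sign = static_cast<uint16_t>((b & 0x80) << 8);
    int exponent = (b >> 3) & 0x0F; // biased by 7
    int mantissa = b & 0x07;

    // The only NaN encoding in this variant: map to a float16 quiet NaN.
    if (exponent == 0x0F && mantissa == 0x07)
        return static_cast<uint16_t>(sign | 0x7E00);

    if (exponent == 0)
    {
        // Signed zero.
        if (mantissa == 0)
            return sign;

        // Subnormal: value is mantissa * 2^-9. Shift until the implicit
        // leading 1 appears in bit 3, lowering the exponent each step.
        exponent = 1;
        while ((mantissa & 0x08) == 0)
        {
            mantissa <<= 1;
            exponent--;
        }
        mantissa &= 0x07;
    }

    // Rebias from 7 to 15 (+8) and widen the mantissa from 3 to 10 bits.
    return static_cast<uint16_t>(sign | ((exponent + 8) << 10) | (mantissa << 7));
}
```

The encode direction (float16 to E4M3) is where the overflow and underflow handling described in the review comes in, since float16 has both a wider exponent range and more mantissa bits; it is omitted here because the rounding and saturation policy used by the PR is not stated on this page.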
Thanks for your contribution!
No description provided.