HADOOP-19724. [RISC-V] Add rv64 Zbc (CLMUL) bulk CRC32 (CRC32C not optimized) #8031
base: trunk
…etection Co-authored-by: gong-flying <[email protected]>
💔 -1 overall
This message was automatically generated.
@steveloughran could you please review this PR if you have time? Thanks!
Hi @pan3793 @slfan1989, could you please take a look when you have a moment? Happy to address any feedback. Thanks!
Is anyone with a RISC-V setup able to test this?
@PeterPtroc Thank you for your contribution! However, RISC-V is beyond my current knowledge, and I’m sorry I’m unable to assist with reviewing this part of the code. I recommend reaching out to other team members for assistance with the review.
To reviewers: #7924 may help you set up a dev box on an x86 or aarch64 platform by using Docker and QEMU to emulate a RISC-V environment, but it's extremely slow and offers no means of evaluating performance.
steveloughran left a comment
OK, I've tried to review this and asked Google Gemini to look at specific aspects (alignment, assembly, pointers). It's happy: "Yes, the RISC-V assembly code in this pull request looks excellent. It is correct, safe, and follows modern best practices for inline assembly." If I hadn't spent all afternoon arguing with it about LaTeX citations, I'd treat its opinions as valid. But here, its opinions are as good as my own judgement.
I propose adding a comment to each method so that whoever reads this code next understands what it tries to do. The same goes for the handling of misaligned data at the start of an operation, and of any leftover bytes at the end.
That's all: explain things for future developers.
+1 pending these changes
...p-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util/bulk_crc32_riscv.c
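To illustrate the kind of commenting the review asks for, here is a minimal sketch (not the PR's code) of the usual head/body/tail structure: single bytes until the pointer is 8-byte aligned, then whole aligned 8-byte blocks, then the leftover bytes. It assumes the reflected CRC-32 polynomial 0xEDB88320 (the one used by java.util.zip.CRC32); the names crc32_sketch, crc32_update_byte and crc32_update_block8 are hypothetical, and the bitwise update stands in for the CLMUL folding a Zbc kernel would do in the body.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reflected CRC-32 (IEEE) polynomial; an assumption of this sketch. */
#define CRC32_POLY_REFLECTED 0xEDB88320u

/* Bitwise update for one byte: slow, but obviously correct. */
static uint32_t crc32_update_byte(uint32_t crc, uint8_t b) {
    crc ^= b;
    for (int i = 0; i < 8; i++) {
        /* Shift right; XOR in the polynomial if the bit shifted out was 1. */
        crc = (crc >> 1) ^ (CRC32_POLY_REFLECTED & (0u - (crc & 1u)));
    }
    return crc;
}

/* Hypothetical body step for one aligned 8-byte block. A Zbc kernel would
 * do one aligned 64-bit load here and fold it with clmul/clmulh; this
 * sketch just feeds the 8 bytes through the bitwise update. */
static uint32_t crc32_update_block8(uint32_t crc, const uint8_t *p) {
    for (int i = 0; i < 8; i++) {
        crc = crc32_update_byte(crc, p[i]);
    }
    return crc;
}

/* Standard CRC-32 of buf[0..len): initial value 0xFFFFFFFF, final inversion. */
uint32_t crc32_sketch(const uint8_t *buf, size_t len) {
    assert(buf != NULL || len == 0);
    uint32_t crc = 0xFFFFFFFFu;

    /* Head: the caller's pointer may be misaligned, so consume single bytes
     * until it reaches an 8-byte boundary (or the data runs out). */
    while (len > 0 && ((uintptr_t)buf & 7u) != 0) {
        crc = crc32_update_byte(crc, *buf++);
        len--;
    }

    /* Body: buf is now 8-byte aligned; process whole 8-byte blocks. */
    while (len >= 8) {
        crc = crc32_update_block8(crc, buf);
        buf += 8;
        len -= 8;
    }

    /* Tail: fewer than 8 leftover bytes; finish them one at a time. */
    while (len > 0) {
        crc = crc32_update_byte(crc, *buf++);
        len--;
    }
    return ~crc;
}
```

Testing every start offset from 0 to 7 with a length that is not a multiple of 8 exercises all three phases.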
Hi @PeterPtroc,
@PeterPtroc as noted, @leiwen2025 can help here. @leiwen2025, can you look at this PR as is and review it? Ideally, check it out and do a -Pnative build that runs the native tests. If you two are using different instructions, how do they differ? Having just looked at what clmul/clmulh do, I can see why they offer benefits.
Looking at #7912, it calls vclmul.vv, which is generally going to be faster, isn't it? That means that while the code is more complex, it is ultimately going to be the best option on cores with the right feature flags. This makes me think that this one can go in and the vector one can follow up, with the choice of operation depending on the available features, in priority order: vclmul, clmul, classic.
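For reviewers without RISC-V hardware, a portable model of the two Zbc instructions may help when reading the kernel. The loops below follow the reference semantics given in the RISC-V bit-manipulation specification: clmul returns the low 64 bits of the carry-less product (XOR instead of addition) of two 64-bit values, and clmulh returns the high 64 bits. The names clmul_lo and clmul_hi are hypothetical; on a core with Zbc (or Zbkc) each one is a single instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Low 64 bits of the 128-bit carry-less product a * b (models rv64 clmul). */
static uint64_t clmul_lo(uint64_t a, uint64_t b) {
    uint64_t r = 0;
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1u) {
            r ^= a << i;  /* XOR replaces addition: no carries propagate */
        }
    }
    return r;
}

/* High 64 bits of the same product (models rv64 clmulh). */
static uint64_t clmul_hi(uint64_t a, uint64_t b) {
    uint64_t r = 0;
    for (int i = 1; i < 64; i++) {
        if ((b >> i) & 1u) {
            r ^= a >> (64 - i);  /* bits of (a << i) that spill past bit 63 */
        }
    }
    return r;
}
```

Typical CLMUL-based CRC kernels use pairs of these to fold 128-bit chunks of input with precomputed constants (powers of x modulo the CRC polynomial) and then reduce to 32 bits; those constants are opaque to a reader, so they are probably the part most worth commenting.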
@steveloughran Thanks! I’m happy to help. I’ll check out the PR as-is and run a
Hi @steveloughran, I have completed the native tests, and the results are consistent with the data shown in the PR. Should I post the data?
Co-authored-by: gong-flying <[email protected]>
Sorry for the late reply. I am also excited to collaborate with @leiwen2025 in the next phase to integrate the vectorized solution. Our goal is to implement a multi-tiered optimization strategy:
The decision to prioritize the scalar (Zbc/Zbkc) implementation is based on several key factors:
I will update the PR shortly with the comments and documentation @steveloughran requested, so that the implementation is well explained for future developers.
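The priority order suggested in the review (vclmul, then clmul, then the classic code) can be sketched as a one-time selection. The feature flags below are hypothetical inputs standing in for whatever runtime detection the native code already does; Zvbc is the vector extension that provides vclmul.vv, and Zbc/Zbkc provide the scalar clmul/clmulh. None of these names come from the PR, and the two accelerated entries are placeholders that forward to a bitwise CRC-32 so the sketch runs on any host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result of runtime feature detection. */
struct rv_features {
    bool has_zvbc;  /* vector carry-less multiply: vclmul.vv / vclmulh.vv */
    bool has_zbc;   /* scalar carry-less multiply: clmul / clmulh (Zbc or Zbkc) */
};

enum crc32_impl { CRC32_IMPL_CLASSIC, CRC32_IMPL_ZBC, CRC32_IMPL_ZVBC };

/* Pick the fastest implementation the core supports: vclmul > clmul > classic. */
static enum crc32_impl crc32_pick_impl(struct rv_features f) {
    if (f.has_zvbc) return CRC32_IMPL_ZVBC;
    if (f.has_zbc) return CRC32_IMPL_ZBC;
    return CRC32_IMPL_CLASSIC;
}

typedef uint32_t (*crc32_fn)(const uint8_t *buf, size_t len);

/* Portable bitwise CRC-32 (reflected 0xEDB88320) standing in for the classic path. */
static uint32_t crc32_classic(const uint8_t *buf, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Placeholders: a real build would point these at the Zbc and Zvbc kernels. */
static uint32_t crc32_zbc(const uint8_t *buf, size_t len) { return crc32_classic(buf, len); }
static uint32_t crc32_zvbc(const uint8_t *buf, size_t len) { return crc32_classic(buf, len); }

/* Map the chosen tier to its function; intended to be called once, e.g. at load time. */
static crc32_fn crc32_resolve(struct rv_features f) {
    static const crc32_fn table[] = {
        [CRC32_IMPL_CLASSIC] = crc32_classic,
        [CRC32_IMPL_ZBC] = crc32_zbc,
        [CRC32_IMPL_ZVBC] = crc32_zvbc,
    };
    enum crc32_impl impl = crc32_pick_impl(f);
    assert((size_t)impl < sizeof table / sizeof table[0]);
    return table[impl];
}
```

Resolving once keeps the per-call cost to an indirect call, and keeping the classic path as the fallback means cores with neither extension behave as before.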
💔 -1 overall
This message was automatically generated.
Description of PR
Below are the performance changes observed using the built-in CRC32 benchmark. Although performance is poor when bpc (bytes per checksum) <= 64, there are substantial improvements when bpc > 64. To keep the codebase simple and maintainable, I did not add bpc-size-specific handling.
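For context on the bpc numbers: the bulk checksum path produces one checksum per bpc-byte chunk, and each chunk starts a fresh CRC. A minimal sketch of that loop (hypothetical names, with a bitwise CRC-32 standing in for the kernel) shows one plausible reason for the small-bpc result: any fixed per-call cost of the CLMUL path, such as aligning the head, setting up the fold and reducing back to 32 bits, is paid once per chunk, so at 64 bytes or fewer there is little data to amortize it over. That explanation is an inference, not something measured in this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected 0xEDB88320), standing in for the real kernel. */
static uint32_t crc32_chunk(const uint8_t *p, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical bulk loop: one CRC per bpc-byte chunk; the last chunk may be
 * shorter. Writes the checksums to sums[] and returns how many it wrote.
 * sums[] must hold at least ceil(len / bpc) entries. */
static size_t bulk_crc32_sketch(const uint8_t *data, size_t len, size_t bpc,
                                uint32_t *sums) {
    assert(data != NULL || len == 0);
    if (bpc == 0) return 0;
    size_t n = 0;
    while (len > 0) {
        size_t chunk = len < bpc ? len : bpc;
        sums[n++] = crc32_chunk(data, chunk);  /* every chunk restarts the CRC */
        data += chunk;
        len -= chunk;
    }
    return n;
}
```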
How was this patch tested?
Built hadoop-common with the native profile on riscv64 and verified its functionality with TestNativeCrc32.
Ran Hadoop’s CRC32 benchmark on riscv64 (OpenEuler/EulixOS) with JDK 17.
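TestNativeCrc32 exercises the native code from the Java side. A complementary C-level check (a hypothetical harness, not part of the PR) compares an implementation under test against a bitwise reference for every start offset from 0 to 7 and every length up to a few hundred bytes, which drives both the misaligned head and every tail size. Here a table-driven CRC-32 stands in for the Zbc kernel, and a deliberately broken variant shows that the harness catches a bug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*crc32_fn)(const uint8_t *buf, size_t len);

/* Reference: bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t crc32_ref(const uint8_t *buf, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Stand-in for the optimized kernel: byte-at-a-time table lookup. */
static uint32_t crc32_table[256];

static void crc32_table_init(void) {
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c >> 1) ^ (0xEDB88320u & (0u - (c & 1u)));
        crc32_table[n] = c;
    }
}

static uint32_t crc32_fast(const uint8_t *buf, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
        crc = crc32_table[(crc ^ buf[i]) & 0xFFu] ^ (crc >> 8);
    return ~crc;
}

/* Deliberately broken variant: silently drops the last byte. */
static uint32_t crc32_broken(const uint8_t *buf, size_t len) {
    return crc32_fast(buf, len > 0 ? len - 1 : 0);
}

/* Compare fn with the reference for every start offset 0..7 and every
 * length 0..max_len; returns the number of mismatches. */
static int crc32_diff_test(crc32_fn fn, size_t max_len) {
    static uint8_t buf[8 + 512];
    assert(max_len <= 512);
    for (size_t i = 0; i < sizeof buf; i++)
        buf[i] = (uint8_t)(i * 131u + 7u);  /* arbitrary non-constant pattern */
    int mismatches = 0;
    for (size_t off = 0; off < 8; off++)
        for (size_t len = 0; len <= max_len; len++)
            if (fn(buf + off, len) != crc32_ref(buf + off, len))
                mismatches++;
    return mismatches;
}
```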
Here are the commands and results:
Command:
Results
Command:
Results (Original)
Results (With this commit)
For code changes:
LICENSE, LICENSE-binary, NOTICE-binary files?