Add BLAKE3? #77

Open
nirvdrum opened this issue Dec 13, 2024 · 10 comments

Comments

@nirvdrum

The BLAKE3 hash function is growing in popularity due to improvements in both security and performance over other hashing functions. There's an open source plugin for this extension that adds BLAKE3 and works well, but it would be great if BLAKE3 were integrated directly into digest. That way, the gem provides a great experience out of the box. It would also be nicer for JRuby and TruffleRuby, since both of those implementations ship an API-compatible version of digest in their respective distributions.

There's an official implementation of BLAKE3 in C that we could wrap in Ruby.

I can work on a PR to add BLAKE3 support directly in this gem, but I wanted to check if there are any concerns to address or blockers that would prohibit the integration.

@nirvdrum
Author

@knu With BLAKE2, at least, you had expressed a preference to use OpenSSL for hashing. I don't know when BLAKE3 will become available in OpenSSL, but it's been an open issue for over 4.5 years. I suspect it will be a while yet before it's available for use in Ruby.

The blake3-rb gem has proven to be popular, with 220k installations. I know that may seem small compared to something like Rails, but I think it's a fairly large number for a hash function. This indicates there is broad interest in choosing BLAKE3 over the functions available in either the digest or openssl gems. I think it'd be great for the Ruby community if it were even easier to use BLAKE3.

@knu
Member

knu commented Dec 18, 2024

Thanks for the information. I worked hard to make Digest easy to extend, ensuring that adding new algorithms as external modules would be easy. From what I can see of the implementation, the blake3-rb gem you mentioned is a perfect example of that, and it is good news to me that a third-party digest module has become a popular add-on.

That said, what does the author of the gem think or say about the integration? And if we were to import theirs, would it be a soft requirement to keep using Rust? Requiring a Rust compiler to build and install the digest library might be a show-stopper and we would probably need to consider making it optional and/or distributing binary gems, which I'm not really willing to do.
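
For readers unfamiliar with the extension point mentioned in this comment, here is a minimal sketch of its Ruby-facing side: a pure-Ruby subclass of `Digest::Class` that supplies `update`, `reset`, and `finish`. Adler-32 is used only because it is short. The class name `Adler32Digest` is hypothetical, and this is not how blake3-rb plugs in (that gem is a native extension), so treat it as an illustration of the interface rather than of the gem.

```ruby
require "digest"

# Hypothetical example class: an Adler-32 checksum exposed through the
# Digest interface. Digest::Class provides digest/hexdigest on top of the
# three methods implemented here (update, reset, finish).
class Adler32Digest < Digest::Class
  MOD = 65_521 # largest prime below 2**16, as defined by Adler-32

  def initialize
    super()
    reset
  end

  # Feed more data into the running checksum; returns self so calls chain.
  def update(data)
    data.each_byte do |byte|
      @a = (@a + byte) % MOD
      @b = (@b + @a) % MOD
    end
    self
  end
  alias << update

  # Restore the initial state.
  def reset
    @a = 1
    @b = 0
    self
  end

  def digest_length
    4
  end

  private

  # Return the raw 4-byte big-endian digest for the data seen so far.
  def finish
    [(@b << 16) | @a].pack("N")
  end
end

puts Adler32Digest.hexdigest("Wikipedia") # => 11e60398
```

Because the class only relies on `Digest::Class`, the usual helpers (`hexdigest`, `<<`, streaming updates) come along without extra code.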

@nirvdrum
Author

nirvdrum commented Dec 18, 2024

@knu Thanks for the quick reply. Having a plug-in system is nice, but I think Ruby should have robust defaults out of the box. Up until recently I wasn't even aware you could have digest plugins. I bet most people look at what's available in either digest or openssl and give up if they don't find the hash function they're looking for. That, in turn, could look like Ruby is falling behind.

I don't know if we literally need to upstream that gem. I'd suggest using the official BLAKE3 C implementation. From there, we can largely translate the gem to C, although owing to the plugin mechanism, the gem is quite straightforward.

@ianks, do you have any objections to upstreaming your work on blake3-rb? I know the Rust implementation affords some really nice properties, but the official Ruby gems stick to C for ease of distribution. That's a concession I'm willing to make in order to get BLAKE3 available out of the box, but I'd like to confirm that's okay with you. A pleasant side effect of being integrated upstream is that JRuby and TruffleRuby could optimize it for the JVM.

@headius @enebo Would either of you be up for implementing BLAKE3 using the JRuby extension API?

@ianks

ianks commented Dec 18, 2024

Absolutely fine by me. Happy to see official blake3 support come to fruition. Thanks for taking this on @nirvdrum.

@headius

headius commented Dec 18, 2024

> JRuby and TruffleRuby could optimize it for the JVM

I don't see how we'd be able to optimize it for the JVM if it is imported as a C library.

JRuby doesn't support the C extension API so any integration would have to be over FFI, which then requires copying data back and forth. TruffleRuby supports the C extension API, but cannot optimize across that boundary, so the performance would be no faster than in CRuby (and I assume slower because of the overhead of crossing that boundary).

> Would either of you be up for implementing BLAKE3 using the JRuby extension API?

There are Java implementations of BLAKE3 that could be used from JRuby just by writing Ruby code, but I have not evaluated any of them for performance. If they are small enough, it would be preferable to import their Java code directly into this library rather than adding another external dependency to a standard library gem.

We have not heard from any users interested in this feature, so it would not be a high priority. It wouldn't be hard for someone else to integrate it, though.

@enebo

enebo commented Dec 18, 2024

@nirvdrum I'm interested if this is important enough to be needed, but unfortunately it is very low on my priority list as far as free time goes. I agree with @headius that this needs to be a Java implementation, so it requires some extra work to figure out what we can leverage.

@eregon
Member

eregon commented Dec 19, 2024

👍 to add BLAKE3 from me.

As we saw in TruffleRuby some time ago, having a digest implemented through the C API (to be precise, not using Digest's extension mechanism but just a method defined in C via rb_define_method) is many times slower than having it directly supported and written in Java (there was a Twitter thread about it, but I can't find it anymore). That was true even when TruffleRuby was using Sulong and JIT-compiling the C code: the overhead of the C API is simply too high on non-CRuby implementations (e.g. conversions at the boundary, such as copying a managed byte[] to a native char*).

So for good digest performance, they need to be implemented directly as part of TruffleRuby's Java code for TruffleRuby, as a C extension for CRuby, and as a JRuby extension for JRuby.

> I worked hard to make Digest easy to extend, ensuring that adding new algorithms as external modules would be easy.

The main issues I see with that mechanism are that JRuby cannot support that API, because it relies on C extensions, and that TruffleRuby does not currently support it. Maybe TruffleRuby could, but it would be quite slow, as mentioned above: the FFI overhead of calling the native functions alone would be too high (again, it needs at least a byte[]->char* copy).

It seems unfortunately very difficult to create an extension mechanism for Digest which is efficient for the 3 main Ruby implementations.
I think supporting popular and well-established digest algorithms like BLAKE3 here makes a lot of sense and is the best solution in terms of performance. For many use cases, the performance of the digest is critical.

@byroot
Member

byroot commented Feb 6, 2025

@knu any more opinions on this? It would be really nice if digest came with a few more popular algorithms (I'm thinking CRC32).

Yes, it can be extended with external gems, but gems often prefer to minimize dependencies, so they end up using what's in digest. Often that means MD5, which then causes issues with things like FIPS (a certification that removes MD5 from OpenSSL).

So I'd be happy to submit a PR if that's something you'd accept.
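
To make the dependency trade-off above concrete, here is a small sketch of a gem-style helper that avoids MD5 while still depending only on what digest ships. The helper name `cache_key` is hypothetical, and whether SHA-256 suits a particular gem is an assumption; the point is only that the choice is limited to the algorithms digest already bundles.

```ruby
require "digest"

# Hypothetical helper: derive a stable key from a payload using an
# algorithm bundled with the digest gem, instead of Digest::MD5.
def cache_key(payload)
  Digest::SHA256.hexdigest(payload)
end

puts cache_key("abc")
# => ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```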

@knu
Member

knu commented Apr 15, 2025

@byroot As we talked about earlier, let's go ahead with this. CRC32 sounds nice to me, too.

@byroot
Member

byroot commented Apr 15, 2025

Thank you! I'll come up with some PRs after RubyKaigi.

casperisfine pushed a commit to casperisfine/digest that referenced this issue Apr 24, 2025
Ref: ruby#77

`CRC32` is relatively commonly needed for network protocols and
some archive formats like `zip`.

This is a clean implementation derived from the Wikipedia article.
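
As a rough idea of what a table-driven CRC32 looks like, here is a minimal Ruby sketch of the standard reflected algorithm (polynomial `0xEDB88320`, as used by zip and zlib). It is not the code from the referenced commit; the names `CRC32_TABLE` and `crc32` are chosen here for illustration.

```ruby
# Precompute the remainder for every possible byte value using the
# reflected CRC-32 polynomial 0xEDB88320.
CRC32_TABLE = Array.new(256) do |n|
  8.times { n = n.odd? ? (n >> 1) ^ 0xEDB88320 : n >> 1 }
  n
end.freeze

# Compute the CRC-32 of +data+. Passing a previous result as +crc+
# continues the checksum over additional data.
def crc32(data, crc = 0)
  crc ^= 0xFFFFFFFF
  data.each_byte do |byte|
    crc = CRC32_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
  end
  crc ^ 0xFFFFFFFF
end

puts format("%08x", crc32("123456789")) # => cbf43926
```

The value `cbf43926` is the conventional CRC-32 check value for the ASCII string `123456789`, which makes it a convenient sanity test for any implementation.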