Skip to content

Add Encoding::encode_mut_str #136

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
zacknewman opened this issue Apr 5, 2025 · 5 comments · Fixed by #137
Closed

Add Encoding::encode_mut_str #136

zacknewman opened this issue Apr 5, 2025 · 5 comments · Fixed by #137

Comments

@zacknewman
Copy link

zacknewman commented Apr 5, 2025

Would it be possible to add a method like Encoding::encode_mut except have it return &str instead of ()? The purpose of such a method would be to allow for downstream code to use stack-allocated strings without having to use unsafe code or call core::str::from_utf8 and incur unnecessary overhead of UTF-8 validation.

Essentially the method would look similar to below with possible changes to call str::from_utf8_unchecked on a sub-slice of output in the event it's not guaranteed the entirety of output would be written to:

pub fn encode_mut_str<'a: 'b, 'b>(&self, input: &[u8], output: &'a mut [u8]) -> &'b str {
    self.encode_mut(input, output);
    // SAFETY:
    // Encoded data is always valid UTF-8.
    unsafe { str::from_utf8_unchecked(output) }
}

Currently downstream code is forced to do either

use data_encoding::BASE64URL_NOPAD;
pub fn foo() {
    let input = [0];
    let mut output = [0; 2];
    BASE64URL_NOPAD.encode_mut(input.as_slice(), output.as_mut_slice());
    // SAFETY:
    // `data_encoding::Encoding::encode_mut` always encodes UTF-8 data.
    let val = unsafe { str::from_utf8_unchecked(output.as_slice()) };
}

or

use data_encoding::BASE64URL_NOPAD;
pub fn foo() {
    let input = [0];
    let mut output = [0; 2];
    BASE64URL_NOPAD.encode_mut(input.as_slice(), output.as_mut_slice());
    let val = str::from_utf8(output.as_slice()).unwrap_or_else(|_e| unreachable!("there is a bug in data_encoding::Encoding::encode_mut"));
}

With encode_mut_str, then the above could be written like:

use data_encoding::BASE64URL_NOPAD;
pub fn foo() {
    let input = [0];
    let mut output = [0; 2];
    let val = BASE64URL_NOPAD.encode_mut_str(input.as_slice(), output.as_mut_slice());
}

When unsafe code is used, it's a lot better for the code to be entirely localized and not have to rely on an external crate to uphold the safety invariants. I realize this can be considered micro-optimization, but I thought I'd ask.

@ia0
Copy link
Owner

ia0 commented Apr 5, 2025

No I don't consider this a micro-optimization. This is a very legitimate request and something I was even considering for Encoding::encode_mut() for v3. I'm sending #137 to add this to v2. After merge, I'll probably wait a week to sleep on it and in case someone has a concern. Then I'll publish to crates.io.

@ia0 ia0 closed this as completed in #137 Apr 5, 2025
@ia0 ia0 closed this as completed in d816163 Apr 5, 2025
@zacknewman
Copy link
Author

That was fast!

@ia0
Copy link
Owner

ia0 commented Apr 12, 2025

This got released in #138, see Encoding::encode_mut_str() in docs.rs.

@zacknewman
Copy link
Author

zacknewman commented Apr 12, 2025

Thanks for the update. Tangentially related, before 2.0.0 was released, you released "release-candidate" versions (e.g., 2.0.0-rc.1). That was 7–8 years ago though. Since then, I've read your extremely well-thought-out and logical comments pertaining to "pre-release version numbers". As both a user of this crate and someone who will be a maintainer of crates of my own with post 1.0.0 versions that wants to publish "in-progress" newer versions, is this the strategy you will still follow? Specifically, can we expect a 3.0.0-rc.1 version before a 3.0.0 version where there will be no backward compatibility guarantees between 3.0.0-rc.x and 3.0.0-rc.y where x ≤ y?

If so, what stops you from publishing a 3.0.0-rc.1 version now? Do you intend to publish a 3.0.0-alpha.1 that has the same freedom to break backward compatibility, but where it's not only "allowed" but likely for breaking changes to happen in subsequent alpha releases in contrast to rc versions which while allowed to have breaking changes are unlikely?

  • After a few months of testing by your users, you are confident enough to reach the ultimate stage of development: 3.0.0-rc.1.0.
  • Reaching anything else than 3.0.0-rc.1.x would be catastrophic because you would break the promise that this only needs minor adjustments before going stable (i.e. to the stable branch).

I do find ^that interesting. It appears 3.0.0-rc versions are assumed to be backward compatible. If that is the case, then why not publish 3.0.0? Is the logic that the versioning scheme is based on a continuum of "breaking changes will very likely occur" to "breaking changes are guaranteed to not occur (modulo bugs)"? So the alpha tags are assumed to increment in a "breaking way" (e.g., alpha.1.2 -> alpha.2.0). beta tags are somewhat in the middle where changes may occur in a breaking way but the benefit should be well worth it. rc tags will only increment in a "breaking way" when something catastrophic needs to be fixed and can't be done in a non-breaking way (e.g., fixing UB). Finally, production versions truly never increment in a breaking way no matter what (e.g., even to address UB).

@ia0
Copy link
Owner

ia0 commented Apr 13, 2025

is this the strategy you will still follow?

Not exactly, because of how Cargo pre-releases work. Instead I'll follow something along the lines of rust-lang/cargo#2222 (comment): use data-encoding-v3 for the pre-releases.

Specifically, can we expect a 3.0.0-rc.1 version before a 3.0.0 version where there will be no backward compatibility guarantees between 3.0.0-rc.x and 3.0.0-rc.y where x < y?

I don't think I'll publish pre-releases on data-encoding for 3.0.0. Instead data-encoding-v3 should be used, according to the README.

what stops you from publishing a 3.0.0-rc.1 version now?

I'm not satisfied with the current code, but I've still published [email protected] anyway with a clear warning at the top of the readme.

Regarding the alpha/beta/rc thing, it doesn't matter since I'm not going to publish pre-releases. Instead I'll just document the current status regarding stability (and other properties) in the crate readme of data-encoding-v3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants