Unsafetyify From<Vec<char>> #35098
(rust_highfive has picked a reviewer for you, use r? to override)
8c2706d to 8a5d5a2
impl<'a> From<&'a [char]> for String {
    #[inline]
    fn from(v: &'a [char]) -> String {
        let mut s = String::with_capacity(v.len());
What do you think about the suggestion from #35054 (comment)?
char usually implies that we work with human language, so we can use domain knowledge. len + len / 8 (=1.125) or len + len / 16 (=1.0625) covers most European languages.
For reference:
ratio(en) ≈ 1.002
ratio(fr) ≈ 1.040
ratio(de) ≈ 1.016
ratio(hu) ≈ 1.091 (lots of diacritics)
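To make the ratios above concrete, here is a minimal sketch, assuming ratio means UTF-8 bytes divided by the number of chars. The names utf8_ratio and padded_capacity are hypothetical helpers for illustration, not part of the PR or the standard library.

```rust
/// Ratio of UTF-8 bytes to `char`s in `text` (assumed meaning of "ratio").
/// Returns 1.0 for empty input to avoid dividing by zero.
fn utf8_ratio(text: &str) -> f64 {
    let chars = text.chars().count();
    if chars == 0 {
        return 1.0;
    }
    // `str::len` is the length in bytes, not in chars.
    text.len() as f64 / chars as f64
}

/// The capacity guess suggested in the comment: `len + len / 8`.
fn padded_capacity(len: usize) -> usize {
    len + len / 8
}

fn main() {
    // Pure ASCII needs exactly one byte per char.
    println!("{:.3}", utf8_ratio("hello"));
    // Hungarian sample text with many two-byte accented letters.
    println!("{:.3}", utf8_ratio("árvíztűrő tükörfúrógép"));
    // Bytes reserved up front for a 64-char slice under the 1.125 guess.
    println!("{}", padded_capacity(64));
}
```

A short sample sentence will show a higher ratio than a large corpus would, so the figures quoted in the comment are not expected to match this output.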
I’m really uninterested in tackling this implementation in this PR.
But since you’ve asked, I ought to state my opinion on the topic, at least. I agree that nobody would store ASCII-only text as UTF-32 (there are bytestrings, after all), so any ratio > 1 is better than ratio = 1. A ratio of 1.5 to 1.6, coupled with the fact that reallocation doubles capacity, could be a good choice, especially given that ratio(jp), ratio(cn) and ratio(ko) are all somewhere between 2 and a bit over 3.
That being said, my gut tells me that nobody would be using this conversion with any serious expectations towards its performance, thus thinking about this problem is not very productive.
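The interaction between the initial capacity and buffer growth can be observed with a small experiment. This is only a sketch: convert_counting_growth is a hypothetical helper, and the growth strategy of String is an implementation detail of the standard library, so the counts it prints may differ between versions.

```rust
/// Convert `v` to a `String`, starting from `initial_capacity` bytes,
/// and report how many times the buffer's capacity changed along the way.
fn convert_counting_growth(v: &[char], initial_capacity: usize) -> (String, usize) {
    let mut s = String::with_capacity(initial_capacity);
    let mut growths = 0;
    let mut cap = s.capacity();
    for &c in v {
        s.push(c);
        // A capacity change means the buffer had to be reallocated.
        if s.capacity() != cap {
            growths += 1;
            cap = s.capacity();
        }
    }
    (s, growths)
}

fn main() {
    // 1000 copies of a CJK character that takes 3 bytes in UTF-8.
    let v: Vec<char> = std::iter::repeat('語').take(1000).collect();
    let candidates = [
        ("1.0x", v.len()),
        ("1.5x", v.len() * 3 / 2),
        ("3.0x", v.len() * 3),
    ];
    for (label, cap) in candidates {
        let (s, growths) = convert_counting_growth(&v, cap);
        println!("{label}: {} bytes, {growths} capacity change(s)", s.len());
    }
}
```

Reserving three bytes per char never reallocates for this input, at the cost of over-allocating for text that is mostly ASCII.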
We know for sure this method cannot slice out-of-bounds because:
* 0 ≤ self.pos ≤ 3
* self.buf.len() = 4
This way the slicing will always succeed, but LLVM is incapable of figuring out both these conditions hold, resulting in suboptimal code, especially after inlining.
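The reasoning in this commit message can be illustrated with a sketch. EncodedBytes, its right-aligned layout, and both accessor names are assumptions made for the example; only the fields buf and pos and the two invariants come from the message itself.

```rust
/// Hypothetical stand-in for the type the commit message describes:
/// a 4-byte buffer plus a start position.
struct EncodedBytes {
    buf: [u8; 4],
    pos: usize, // invariant: pos <= 3
}

impl EncodedBytes {
    /// Encode `c` right-aligned so that `buf[pos..]` holds its UTF-8 bytes.
    /// The right-aligned layout is an assumption for this sketch.
    fn new(c: char) -> Self {
        let mut tmp = [0u8; 4];
        let len = c.encode_utf8(&mut tmp).len(); // always 1..=4
        let pos = 4 - len; // therefore always 0..=3
        let mut buf = [0u8; 4];
        buf[pos..].copy_from_slice(&tmp[..len]);
        EncodedBytes { buf, pos }
    }

    /// Safe version: the compiler may keep a bounds check here if it
    /// cannot prove `pos <= buf.len()`.
    fn as_slice_checked(&self) -> &[u8] {
        &self.buf[self.pos..]
    }

    /// Unchecked version that relies on the invariant instead.
    fn as_slice_unchecked(&self) -> &[u8] {
        debug_assert!(self.pos <= 3);
        // SAFETY: `new` is the only constructor and guarantees
        // pos <= 3, which is less than buf.len() == 4.
        unsafe { self.buf.get_unchecked(self.pos..) }
    }
}

fn main() {
    for c in ['a', 'é', '語', '😀'] {
        let e = EncodedBytes::new(c);
        assert_eq!(e.as_slice_checked(), e.as_slice_unchecked());
        println!("{c}: {:?}", e.as_slice_unchecked());
    }
}
```

The unchecked accessor is only sound while every constructor upholds the invariant, which is why the fields are private and the safety comment names the single place that establishes it.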
Machine code turned out pretty nicely.
8a5d5a2 to 0ce9323
We discussed this at the libs triage, and @alexcrichton raised some soundness concerns. The allocator is provided with alignment information when deallocating the memory backing the
Fair point. I feel like something like @alexcrichton do you think calling that function is a correct way to “realign” memory?
As far as I know I don't think we have a way to realign memory, unfortunately :(
Closing for now due to the unsafety concerns (and lack of knowledge of a solution to them), but feel free to reopen if a solution is thought of!
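The alignment concern can be shown with std::alloc::Layout. This sketch assumes the worry is that a buffer allocated for a Vec<char> would later be freed as the byte buffer of a String; the helper name layouts is hypothetical.

```rust
use std::alloc::Layout;
use std::mem::size_of;

/// Layouts for the same number of bytes, described once as `n` chars
/// and once as plain bytes.
fn layouts(n: usize) -> (Layout, Layout) {
    let as_chars = Layout::array::<char>(n).expect("size overflow");
    let as_bytes = Layout::array::<u8>(n * size_of::<char>()).expect("size overflow");
    (as_chars, as_bytes)
}

fn main() {
    let (as_chars, as_bytes) = layouts(16);
    // Same size in bytes...
    println!("size:  {} vs {}", as_chars.size(), as_bytes.size());
    // ...but a different alignment, so the two layouts are not
    // interchangeable when handing memory back to an allocator.
    println!("align: {} vs {}", as_chars.align(), as_bytes.align());
}
```

Because deallocation must use the same layout the block was allocated with, freeing char-aligned memory through a byte-aligned container would break that contract even though the sizes agree.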
Follow-up PR for #35054