Skip to content

Some de-~[] work in std::io #13165

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 7, 2014
Merged

Some de-~[] work in std::io #13165

merged 3 commits into from
Apr 7, 2014

Conversation

sfackler
Copy link
Member

Reader, Writer, MemReader, MemWriter, and MultiWriter now work with Vec<u8> instead of ~[u8]. This does introduce some extra copies since from_utf8_owned isn't usable anymore, but I think that can't be helped until ~str's representation changes.

@lilyball
Copy link
Contributor

Would it make sense to introduce a str::from_utf8_vec() method that takes a Vec<u8> and returns a ~str, and use that here? It wouldn't be any more efficient, but when ~str changes its representation to match Vec<u8> it would then automatically make all the users of this API more efficient.

@alexcrichton
Copy link
Member

I grow slightly worried about all of this removal of ~[T], even in a DST world the type will still exist, and it is arguably the appropriate type to return in many of these locations. For example, internally a MemWriter should be use a Vec, but you should unwrap it to a ~[u8] (frozen vector) because it's no longer pushable/poppable. Other apis, such as push_exact should definitely take Vec<u8> rather than ~[u8].

I'm basically worried about possibly unnecessary churn. @nick29581, do you think that ~[T] needs to be removed entirely for DST to get implemented? Can it stay where appropriate?

@lilyball
Copy link
Contributor

@alexcrichton ~[T] and Vec<T> will not have the same memory representation, which means you cannot simply unwrap and return a ~[T], that would require a brand new allocation (thankfully you can skip cloning, but you still need to move all the values into the new ~[T] before returning it).

And, of course, if the client does want to be able to push/pop, they'd have to move the ~[T] back into a brand new Vec<T>.

@alexcrichton
Copy link
Member

@kballard, that is not true, in a DST world ~[T] is a 2-word value like &[T] is today, so it has exactly the same representation as Vec<T>, except that it's missing the capacity word.

Moving from Vec<T> to ~[T] would likely "shrink to fit", but consumption from ~[T] to Vec<T> is free.

@sfackler
Copy link
Member Author

Why would we want to forbid anyone from pushing to a vector returned by MemWriter::unwrap?

@alexcrichton
Copy link
Member

We wouldn't forbid it, DST provides cheap transitions between ~[T] and Vec<T>. Most clients of MemWriter::unwrap() will likely not be deconstructing the vector, and it conceptually seems more like a fozen than pushable vector to me.

@lilyball
Copy link
Contributor

@alexcrichton Ok you're right, I was confused and was thinking that the data was stored inline. But it's stored in a separate allocation, which means it can be moved between Vec<T> and ~[T] without reallocation.

In any case, I am not opposed to returning ~[T] from APIs where it makes sense to do so. But during this transition period, any API that still deals with ~[T] makes it impossible to fix the deprecated_owned_vector lint. My feeling is we should transition the standard libraries away from ~[T] as much as possible, implement DST, and then reintroduce ~[T] where it makes sense to do so.

@olsonjeffery
Copy link
Contributor

@alexcrichton so my issue is that, right now, in my build.. i use -D warnings.. so I have to isolate and/or remove all uses of deprecated ~[] syntax, including in IO. In my local codebase, I have a VecNgWriter that accomplishes what this pull does, wrt MemWriter. I also isolated the ~[u8] interaction w/i a function that I annotated w/ an attribute to allow the lint.

That being said, I don't think it's fair (or right) to pretty much put this on users, pending the change of ~[] to the fixed-sized vector type (which, as you observed, is appropriate in this case).

Should users have to wrangle with either 1) warning spam or 2) workarounds/massaging in their code (as I did) in order to continue using std::io in the midst of this transition (which, btw, we have no firm timeline let alone articulated plan for when/how it'll be resolved)? I think this pull provides an answer. Can you say that it'll change again after DST lands? Because this is addressing a problem that users are experiencing right now. (and for the record, I was considering doing this work myself and I'm glad @sfackler beat me to it, because I'm sloooooow).

@alexcrichton
Copy link
Member

@olsonjeffery, the deprecated_owned_vector lint is turned off by default now, and I starting to think that it shouldn't be turned back on.

Again, I don't want us to get ahead of ourselves. I have not heard from anyone implementing DST as to whether removing ~[T] entirely is necessary. Until that time, I think that blindly removing ~[T] in favor of Vec<T> is unwise and possibly causing more churn than necessary.

Should users have to wrangle with either 1) warning spam or 2) workarounds/massaging in their code (as I did) in order to continue using std::io in the midst of this transition

Definitely not 1 (it's turned off), and I would very much like to deliberately change ~[T] to Vec<T> where necessary instead of blindly doing so.

Can you say that it'll change again after DST lands?

This is what I've been saying about not changing to Vec<T>. I think it may be premature in some cases because we'll eventually just revert it back to ~[T].

@bill-myers
Copy link
Contributor

BTW, it's easy to make DST ~[T] resizeable by just putting the capacity before the data (at negative offset), or even having a memory allocator that can be asked about the size of an allocation, and using that interface to determine the capacity.

Removing Vec<T> in favor of such a thing might well be a far better approach since it means that there's aren't two ways to represent vectors with different but not very different semantics.

OTOH to make it efficient it means that you are forced to round up all ~[T] allocations to a power of k (where k is usually 2, but can be smaller), since otherwise you get quadratic time complexity.

@huonw
Copy link
Member

huonw commented Mar 27, 2014

I think the transition from current ~[] to DST ~[] will be far cleaner if ~[] is gone entirely, temporarily (e.g. we won't need to do staging for the various features in std::slice that are still being used, or carefully audit any transmutes of ~[]... because all of that will be remove).

@alexcrichton
Copy link
Member

@huonw, I'm trying to emphasize that no one here is the one implementing DST, so no one knows for sure that we should remove ~[] entirely, this is all speculation. Removing all usage of this type is a huge amount of churn, and it would be nice to avoid it if we can (again, only if we can, I'm all for removing all usage if we need to).

@thestinger
Copy link
Contributor

I think Vec<T> should still be returned in many cases when it's the type being used internally. If we require free to be passed a length in order to support faster allocators (likely), then converting from Vec<T> to ~[T] is going to require a shrink_to_fit() call which will make future pushes by the caller slower.

It's also a significant performance improvement right now, and switching to ~[T] in the future would be a very negligible one as it only drops it down to 2 words from 3, and it's usually just on the stack. In some cases there are recursive data structures where using the DST ~[T] would be an improvement, but a replacement similar (but not identical) to the old ~[T] like LLVM's TinyPtrVector would probably be even better.

@thestinger
Copy link
Contributor

I can understand wanting to avoid the churn and avoiding a backwards compatibility hazard, but I'm not really convinced that ~[T] will be a useful type outside of cases where it's used in a generic context and the type parameter is [T]. In my opinion, the existence of ~[T] will be an unfortunate consequence of the improvements brought by DST. We'll need to figure out guidelines on when to use Vec<T> and when to use ~[T]. Frequently converting between them would be noisier than using Vec<T> everywhere.

@nrc
Copy link
Member

nrc commented Mar 30, 2014

@alexcrichton Sorry for the late reply, been away from work. I was kind of thinking the same thing myself (whether it is necessary to totally kill [T] now). To be honest, I don't know the answer. I haven't got near to that stage in the implementation. Currently I am adding unsized/sized knowledge into the compiler and that is totally unaffected. My guess is that we don't need to, but it might make things a little bit easier. I would be happy to wait and see - that is do nothing for now and only remove ~[T] when/if we know we have to.

In general I prefer to move from one solution to another, rather than from one solution to an intermediate encoding to a second solution since I think the risk of missing something is lower.

@huonw
Copy link
Member

huonw commented Mar 30, 2014

(Any response to #13165 (comment)? Functions using Vec to build their return value would preferably return that Vec, to allow callers to continue to extend/resize the vector without the pointless overhead of losing any extra capacity that was allocated.)

@lilyball
Copy link
Contributor

@huonw I agree. I don't see any benefit to converting to ~[T]. That seems like it will be a largely useless type once we have DST. The only real use for it I can see is if you need a heap-allocated vector of fixed size but don't know what the size will be at compile-time, then creating a ~[T] would work, but that doesn't really save much over using a Vec<T> even in that case.

@nrc
Copy link
Member

nrc commented Mar 30, 2014

@huonw I hesitate to voice an opinion on library design since I have zero experience there. From the language POV, I would prefer to see [T] used wherever possible (in preference to Vec) because if we provide a nice syntax we should strive to make it the default and avoid the situation where there are non-obvious choices to make (or worse, choices based only on the aesthetic opinions of the user). It seems that the story is [T] if the size is fixed and Vec if the size can change, but then the cases you point out are difficult because one can't know the size-mutability intent ahead of time and we should expose the implementation detail that Vec is used internally to construct the result. We could have two of each method, one returning Vec, one [T], but that seems really bad.

Perhaps the ideal situation is that we return [T] and if the caller changes that immediately to Vec then somehow, magically that is free. Not sure how that could work though. Without such magic, returning Vec seems better since it is more general, but then I have the philosophical objection. Sigh.

So, that was a long-winded way of saying 'I don't know'.

@huonw
Copy link
Member

huonw commented Mar 31, 2014

The way I see it, the only time a ~[T] will have any positive effect is when storing many of them in a (mostly) immutable recursive data type (e.g. Rust's AST), so the memory savings of going from 3 words to 2 are significant. For maximum effect, this also requires a length 0 ~[T] can have no allocation (i.e. be (NULL, 0)).

Of course the philosophical objects remain.


One way to think of this would be flipping it around: if we only had Vec and then added in a new ~[T] type, would we go through and convert return values from Vec to ~[T] in instances where the callers "normally" don't need to modify the Vec?

I would think that we wouldn't do that (because makes the function slower, due to needing a shrink-to-fit to return, and penalises those cases when the caller does want to modify it), and that we would only change things to ~[T] in places like the AST.

@thestinger
Copy link
Contributor

@nick29581: If we use lots of ~[T], then I'm worried that users will need to perform a lot of conversions explicit to and from ~[T] which will add noise.

@lilyball
Copy link
Contributor

@thestinger Ideally, once DST lands, nobody will ever use ~[T] anymore, as there's only a very limited case where it's useful (the case that @huonw described).

@huonw
Copy link
Member

huonw commented Apr 5, 2014

Can we move forward on this and just land it? If necessary, we can convert back to ~[] post-DST; it's not like the churn from that will be particularly different to any of the other changes we make (e.g. priv on structs, the flip-flopping on the name of Chan::new & channel, the renaming of try!, removal of @mut/@).

(I'm keen to put #[deprecated] on the various methods like (~[]).push and std::io is one of the biggest remaining users of ~[] as a public interface; if not, we'd need short functions in the prelude for doing the (expensive) conversion between ~[] and Vec to justify this deprecation.)

@sfackler
Copy link
Member Author

sfackler commented Apr 5, 2014

Rebased

@alexcrichton
Copy link
Member

I commented on #13350 how I'm not fully convinced that this is the appropriate API change to make. Reverting this change will be easy enough for the API layer I believe, but moving away from push/pop on ~[T] is important for now I believe.

sfackler added 3 commits April 6, 2014 15:39
There's a little more allocation here and there now since
from_utf8_owned can't be used with Vec.
bors added a commit that referenced this pull request Apr 7, 2014
`Reader`, `Writer`, `MemReader`, `MemWriter`, and `MultiWriter` now work with `Vec<u8>` instead of `~[u8]`. This does introduce some extra copies since `from_utf8_owned` isn't usable anymore, but I think that can't be helped until `~str`'s representation changes.
@bors bors closed this Apr 7, 2014
@bors bors merged commit fcf9b30 into rust-lang:master Apr 7, 2014
chris-morgan added a commit to chris-morgan/rust-http that referenced this pull request Apr 7, 2014
Kroisse added a commit to Kroisse/rust-mustache that referenced this pull request Apr 9, 2014
@sfackler sfackler deleted the io-vec branch May 15, 2014 05:02
JohnTitor pushed a commit to JohnTitor/rust that referenced this pull request Sep 6, 2022
Properly handle break resolution inside non-breakable expressions

We now diagnose invalid `continue` expressions and properly handle things like `async` blocks which prevent labels from resolving further. Cleaned this up since `label_break_value` is on the way to stabilization in rust (🎉 finally) and we weren't handling breaks for blocks properly yet.
dingxiangfei2009 pushed a commit to dingxiangfei2009/rust that referenced this pull request Jul 28, 2024
chore: fix some comments

fix some comments

*Please write a short comment explaining your change (or "none" for internal only changes)*

changelog: none
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants