Some de-~[] work in std::io #13165

sfackler · 2014-03-27T05:55:41Z

Reader, Writer, MemReader, MemWriter, and MultiWriter now work with Vec<u8> instead of ~[u8]. This does introduce some extra copies since from_utf8_owned isn't usable anymore, but I think that can't be helped until ~str's representation changes.

lilyball · 2014-03-27T06:04:09Z

Would it make sense to introduce a str::from_utf8_vec() method that takes a Vec<u8> and returns a ~str, and use that here? It wouldn't be any more efficient, but when ~str changes its representation to match Vec<u8> it would then automatically make all the users of this API more efficient.

alexcrichton · 2014-03-27T17:07:42Z

I grow slightly worried about all of this removal of ~[T], even in a DST world the type will still exist, and it is arguably the appropriate type to return in many of these locations. For example, internally a MemWriter should be use a Vec, but you should unwrap it to a ~[u8] (frozen vector) because it's no longer pushable/poppable. Other apis, such as push_exact should definitely take Vec<u8> rather than ~[u8].

I'm basically worried about possibly unnecessary churn. @nick29581, do you think that ~[T] needs to be removed entirely for DST to get implemented? Can it stay where appropriate?

lilyball · 2014-03-27T17:10:38Z

@alexcrichton ~[T] and Vec<T> will not have the same memory representation, which means you cannot simply unwrap and return a ~[T], that would require a brand new allocation (thankfully you can skip cloning, but you still need to move all the values into the new ~[T] before returning it).

And, of course, if the client does want to be able to push/pop, they'd have to move the ~[T] back into a brand new Vec<T>.

alexcrichton · 2014-03-27T17:13:10Z

@kballard, that is not true, in a DST world ~[T] is a 2-word value like &[T] is today, so it has exactly the same representation as Vec<T>, except that it's missing the capacity word.

Moving from Vec<T> to ~[T] would likely "shrink to fit", but consumption from ~[T] to Vec<T> is free.

sfackler · 2014-03-27T17:17:09Z

Why would we want to forbid anyone from pushing to a vector returned by MemWriter::unwrap?

alexcrichton · 2014-03-27T17:20:35Z

We wouldn't forbid it, DST provides cheap transitions between ~[T] and Vec<T>. Most clients of MemWriter::unwrap() will likely not be deconstructing the vector, and it conceptually seems more like a fozen than pushable vector to me.

lilyball · 2014-03-27T17:24:03Z

@alexcrichton Ok you're right, I was confused and was thinking that the data was stored inline. But it's stored in a separate allocation, which means it can be moved between Vec<T> and ~[T] without reallocation.

In any case, I am not opposed to returning ~[T] from APIs where it makes sense to do so. But during this transition period, any API that still deals with ~[T] makes it impossible to fix the deprecated_owned_vector lint. My feeling is we should transition the standard libraries away from ~[T] as much as possible, implement DST, and then reintroduce ~[T] where it makes sense to do so.

olsonjeffery · 2014-03-27T17:35:48Z

@alexcrichton so my issue is that, right now, in my build.. i use -D warnings.. so I have to isolate and/or remove all uses of deprecated ~[] syntax, including in IO. In my local codebase, I have a VecNgWriter that accomplishes what this pull does, wrt MemWriter. I also isolated the ~[u8] interaction w/i a function that I annotated w/ an attribute to allow the lint.

That being said, I don't think it's fair (or right) to pretty much put this on users, pending the change of ~[] to the fixed-sized vector type (which, as you observed, is appropriate in this case).

Should users have to wrangle with either 1) warning spam or 2) workarounds/massaging in their code (as I did) in order to continue using std::io in the midst of this transition (which, btw, we have no firm timeline let alone articulated plan for when/how it'll be resolved)? I think this pull provides an answer. Can you say that it'll change again after DST lands? Because this is addressing a problem that users are experiencing right now. (and for the record, I was considering doing this work myself and I'm glad @sfackler beat me to it, because I'm sloooooow).

alexcrichton · 2014-03-27T20:13:54Z

@olsonjeffery, the deprecated_owned_vector lint is turned off by default now, and I starting to think that it shouldn't be turned back on.

Again, I don't want us to get ahead of ourselves. I have not heard from anyone implementing DST as to whether removing ~[T] entirely is necessary. Until that time, I think that blindly removing ~[T] in favor of Vec<T> is unwise and possibly causing more churn than necessary.

Should users have to wrangle with either 1) warning spam or 2) workarounds/massaging in their code (as I did) in order to continue using std::io in the midst of this transition

Definitely not 1 (it's turned off), and I would very much like to deliberately change ~[T] to Vec<T> where necessary instead of blindly doing so.

Can you say that it'll change again after DST lands?

This is what I've been saying about not changing to Vec<T>. I think it may be premature in some cases because we'll eventually just revert it back to ~[T].

bill-myers · 2014-03-27T23:22:33Z

BTW, it's easy to make DST ~[T] resizeable by just putting the capacity before the data (at negative offset), or even having a memory allocator that can be asked about the size of an allocation, and using that interface to determine the capacity.

Removing Vec<T> in favor of such a thing might well be a far better approach since it means that there's aren't two ways to represent vectors with different but not very different semantics.

OTOH to make it efficient it means that you are forced to round up all ~[T] allocations to a power of k (where k is usually 2, but can be smaller), since otherwise you get quadratic time complexity.

huonw · 2014-03-27T23:34:29Z

I think the transition from current ~[] to DST ~[] will be far cleaner if ~[] is gone entirely, temporarily (e.g. we won't need to do staging for the various features in std::slice that are still being used, or carefully audit any transmutes of ~[]... because all of that will be remove).

alexcrichton · 2014-03-28T01:59:09Z

@huonw, I'm trying to emphasize that no one here is the one implementing DST, so no one knows for sure that we should remove ~[] entirely, this is all speculation. Removing all usage of this type is a huge amount of churn, and it would be nice to avoid it if we can (again, only if we can, I'm all for removing all usage if we need to).

thestinger · 2014-03-28T03:09:23Z

I think Vec<T> should still be returned in many cases when it's the type being used internally. If we require free to be passed a length in order to support faster allocators (likely), then converting from Vec<T> to ~[T] is going to require a shrink_to_fit() call which will make future pushes by the caller slower.

It's also a significant performance improvement right now, and switching to ~[T] in the future would be a very negligible one as it only drops it down to 2 words from 3, and it's usually just on the stack. In some cases there are recursive data structures where using the DST ~[T] would be an improvement, but a replacement similar (but not identical) to the old ~[T] like LLVM's TinyPtrVector would probably be even better.

thestinger · 2014-03-28T03:13:01Z

I can understand wanting to avoid the churn and avoiding a backwards compatibility hazard, but I'm not really convinced that ~[T] will be a useful type outside of cases where it's used in a generic context and the type parameter is [T]. In my opinion, the existence of ~[T] will be an unfortunate consequence of the improvements brought by DST. We'll need to figure out guidelines on when to use Vec<T> and when to use ~[T]. Frequently converting between them would be noisier than using Vec<T> everywhere.

nrc · 2014-03-30T21:10:35Z

@alexcrichton Sorry for the late reply, been away from work. I was kind of thinking the same thing myself (whether it is necessary to totally kill [T] now). To be honest, I don't know the answer. I haven't got near to that stage in the implementation. Currently I am adding unsized/sized knowledge into the compiler and that is totally unaffected. My guess is that we don't need to, but it might make things a little bit easier. I would be happy to wait and see - that is do nothing for now and only remove ~[T] when/if we know we have to.

In general I prefer to move from one solution to another, rather than from one solution to an intermediate encoding to a second solution since I think the risk of missing something is lower.

huonw · 2014-03-30T21:45:12Z

(Any response to #13165 (comment)? Functions using Vec to build their return value would preferably return that Vec, to allow callers to continue to extend/resize the vector without the pointless overhead of losing any extra capacity that was allocated.)

lilyball · 2014-03-30T22:01:17Z

@huonw I agree. I don't see any benefit to converting to ~[T]. That seems like it will be a largely useless type once we have DST. The only real use for it I can see is if you need a heap-allocated vector of fixed size but don't know what the size will be at compile-time, then creating a ~[T] would work, but that doesn't really save much over using a Vec<T> even in that case.

nrc · 2014-03-30T22:01:19Z

@huonw I hesitate to voice an opinion on library design since I have zero experience there. From the language POV, I would prefer to see [T] used wherever possible (in preference to Vec) because if we provide a nice syntax we should strive to make it the default and avoid the situation where there are non-obvious choices to make (or worse, choices based only on the aesthetic opinions of the user). It seems that the story is [T] if the size is fixed and Vec if the size can change, but then the cases you point out are difficult because one can't know the size-mutability intent ahead of time and we should expose the implementation detail that Vec is used internally to construct the result. We could have two of each method, one returning Vec, one [T], but that seems really bad.

Perhaps the ideal situation is that we return [T] and if the caller changes that immediately to Vec then somehow, magically that is free. Not sure how that could work though. Without such magic, returning Vec seems better since it is more general, but then I have the philosophical objection. Sigh.

So, that was a long-winded way of saying 'I don't know'.

huonw · 2014-03-31T00:30:50Z

The way I see it, the only time a ~[T] will have any positive effect is when storing many of them in a (mostly) immutable recursive data type (e.g. Rust's AST), so the memory savings of going from 3 words to 2 are significant. For maximum effect, this also requires a length 0 ~[T] can have no allocation (i.e. be (NULL, 0)).

Of course the philosophical objects remain.

One way to think of this would be flipping it around: if we only had Vec and then added in a new ~[T] type, would we go through and convert return values from Vec to ~[T] in instances where the callers "normally" don't need to modify the Vec?

I would think that we wouldn't do that (because makes the function slower, due to needing a shrink-to-fit to return, and penalises those cases when the caller does want to modify it), and that we would only change things to ~[T] in places like the AST.

thestinger · 2014-03-31T02:41:11Z

@nick29581: If we use lots of ~[T], then I'm worried that users will need to perform a lot of conversions explicit to and from ~[T] which will add noise.

lilyball · 2014-03-31T04:10:11Z

@thestinger Ideally, once DST lands, nobody will ever use ~[T] anymore, as there's only a very limited case where it's useful (the case that @huonw described).

huonw · 2014-04-05T12:20:39Z

Can we move forward on this and just land it? If necessary, we can convert back to ~[] post-DST; it's not like the churn from that will be particularly different to any of the other changes we make (e.g. priv on structs, the flip-flopping on the name of Chan::new & channel, the renaming of try!, removal of @mut/@).

(I'm keen to put #[deprecated] on the various methods like (~[]).push and std::io is one of the biggest remaining users of ~[] as a public interface; if not, we'd need short functions in the prelude for doing the (expensive) conversion between ~[] and Vec to justify this deprecation.)

sfackler · 2014-04-05T18:42:04Z

Rebased

alexcrichton · 2014-04-06T07:36:10Z

I commented on #13350 how I'm not fully convinced that this is the appropriate API change to make. Reverting this change will be easy enough for the API layer I believe, but moving away from push/pop on ~[T] is important for now I believe.

There's a little more allocation here and there now since from_utf8_owned can't be used with Vec.

`Reader`, `Writer`, `MemReader`, `MemWriter`, and `MultiWriter` now work with `Vec<u8>` instead of `~[u8]`. This does introduce some extra copies since `from_utf8_owned` isn't usable anymore, but I think that can't be helped until `~str`'s representation changes.

Properly handle break resolution inside non-breakable expressions We now diagnose invalid `continue` expressions and properly handle things like `async` blocks which prevent labels from resolving further. Cleaned this up since `label_break_value` is on the way to stabilization in rust (🎉 finally) and we weren't handling breaks for blocks properly yet.

chore: fix some comments fix some comments *Please write a short comment explaining your change (or "none" for internal only changes)* changelog: none

sfackler added 3 commits April 6, 2014 15:39

De-~[] Reader and Writer

d0e60b7

There's a little more allocation here and there now since from_utf8_owned can't be used with Vec.

De-~[] Mem{Reader,Writer}

49a8081

De-~[] IO utils

fcf9b30

bors closed this Apr 7, 2014

bors merged commit fcf9b30 into rust-lang:master Apr 7, 2014

chris-morgan added a commit to chris-morgan/rust-http that referenced this pull request Apr 7, 2014

[T] ↔ Vec<T> as appropriate, rust-lang/rust#13165.

59cdcdc

Kroisse added a commit to Kroisse/rust-mustache that referenced this pull request Apr 9, 2014

Follow rust-lang/rust#13165

fcc8835

sfackler deleted the io-vec branch May 15, 2014 05:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some de-~[] work in std::io #13165

Some de-~[] work in std::io #13165

sfackler commented Mar 27, 2014

lilyball commented Mar 27, 2014

alexcrichton commented Mar 27, 2014

lilyball commented Mar 27, 2014

alexcrichton commented Mar 27, 2014

sfackler commented Mar 27, 2014

alexcrichton commented Mar 27, 2014

lilyball commented Mar 27, 2014

olsonjeffery commented Mar 27, 2014

alexcrichton commented Mar 27, 2014

bill-myers commented Mar 27, 2014

huonw commented Mar 27, 2014

alexcrichton commented Mar 28, 2014

thestinger commented Mar 28, 2014

thestinger commented Mar 28, 2014

nrc commented Mar 30, 2014

huonw commented Mar 30, 2014

lilyball commented Mar 30, 2014

nrc commented Mar 30, 2014

huonw commented Mar 31, 2014

thestinger commented Mar 31, 2014

lilyball commented Mar 31, 2014

huonw commented Apr 5, 2014

sfackler commented Apr 5, 2014

alexcrichton commented Apr 6, 2014

Some de-~[] work in std::io #13165

Some de-~[] work in std::io #13165

Conversation

sfackler commented Mar 27, 2014

lilyball commented Mar 27, 2014

alexcrichton commented Mar 27, 2014

lilyball commented Mar 27, 2014

alexcrichton commented Mar 27, 2014

sfackler commented Mar 27, 2014

alexcrichton commented Mar 27, 2014

lilyball commented Mar 27, 2014

olsonjeffery commented Mar 27, 2014

alexcrichton commented Mar 27, 2014

bill-myers commented Mar 27, 2014

huonw commented Mar 27, 2014

alexcrichton commented Mar 28, 2014

thestinger commented Mar 28, 2014

thestinger commented Mar 28, 2014

nrc commented Mar 30, 2014

huonw commented Mar 30, 2014

lilyball commented Mar 30, 2014

nrc commented Mar 30, 2014

huonw commented Mar 31, 2014

thestinger commented Mar 31, 2014

lilyball commented Mar 31, 2014

huonw commented Apr 5, 2014

sfackler commented Apr 5, 2014

alexcrichton commented Apr 6, 2014