Skip to content

Clarify cast rules, especially regarding fat pointers. #1052

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 18, 2015

Conversation

arielb1
Copy link
Contributor

@arielb1 arielb1 commented Apr 10, 2015

Fixes issue #1046.

@nrc nrc self-assigned this Apr 10, 2015
@@ -321,18 +321,26 @@ following holds:

* `e` has type `T` and `T` coerces to `U`;

* `e` has type `*T` and `U` is `*U_0` (i.e., between any raw pointers);
* `e` has type `*T`, `U` is `*U_0`, and either `U_0: Sized` or
unsize_kind(`T`) = unsize_kind(`U_0`);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I.e. a fat pointer (*const Trait) can be truncated to thin pointer (*const u8)? I don't think it should be done with as. Fat pointers definitely should have some methods for extracting their components without shady transmutes, but as should convert only between pointers of the same kind IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If U_0: Sized then the cast is legal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that it's legal, it's written clearly, I don't understand why it should be legal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It throws away the DST info - that's a perfectly good cast (same as u32 -> u8).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that as is one of the most overloaded entities in the language and now you are throwing even more semantics into it without sufficient reasons. Ideally even u32 -> u8 would be done with a specialized method, like truncate() and not with omnipotent as.

@nrc
Copy link
Member

nrc commented Apr 10, 2015

LGTM!

@arielb1
Copy link
Contributor Author

arielb1 commented Apr 10, 2015

Are bool -> numeric casts (false as u32) intentional? They work, but are not in any rule.

@petrochenkov
Copy link
Contributor

Assume we have an empty slice or an empty array:

let a: *const [u8] = &[];
let b: &[u8; 0] = &[];

They are perfectly safe per se, but what does happen when we cast them to pointers?

let aa = a as *const u8; // ptr-ptr-cast
let bb = b as *u8; // array-ptr-cast

As a result aa or bb can contain 1) null 2) invalid pointer outside of any range 3) valid pointer outside of the original range. None of these variants looks especially safe.

@arielb1
Copy link
Contributor Author

arielb1 commented Apr 10, 2015

@petrochenkov

You can also do 0x66666666 as *const u8 (addr-ptr-cast), so this being unsafe isn't really relevant.

@petrochenkov
Copy link
Contributor

Yeah, I see, slice.as_ptr() for example is unchecked too and can return garbage as well. I expected some Option-returning approach from such methods.

@arielb1
Copy link
Contributor Author

arielb1 commented Apr 10, 2015

Anyway, do we want to allow usize/ptr -> fn casts? They are not currently allowed and not in any rule.

@petrochenkov
Copy link
Contributor

I guess they are disallowed because fn() is supposed to be safe and non-nullable "function reference" and not "raw function pointer". I.e. for the same reason why usize -> &T isn't allowed.
Edit: I.e. let a = 0 as fn(); a(); would be UB without unsafe block.

@arielb1
Copy link
Contributor Author

arielb1 commented Apr 10, 2015

Questions remaining:
are fat ptr -> usize casts legal (rustc: ICE)
are usize -> fn casts legal (rustc: not)
are thin ptr -> fn casts legal (rustc: not)
are bool -> int casts legal (rustc: yes)

@nrc
Copy link
Member

nrc commented Apr 10, 2015

@arielb1

are fat ptr -> usize casts legal (rustc: ICE)

I think yes, but there are valid reasons for no

are usize -> fn casts legal (rustc: not)

no this should at the least be a double cast usize -> ptr -> fn (but see below)

are thin ptr -> fn casts legal (rustc: not)

I'm not sure about how/if this should work, ping @nikomatsakis for an opinion

are bool -> int casts legal (rustc: yes)

yes, they should be, it's often useful and is C-like

@arielb1
Copy link
Contributor Author

arielb1 commented Apr 11, 2015

Also, there is a char -> integer cast, which is also allowed

@arielb1
Copy link
Contributor Author

arielb1 commented Apr 13, 2015

Today's rustc allows fn -> usize, fn -> *ptr casts. Also, it allows ptr -> any int casts, which are used rather often.

@nikomatsakis
Copy link
Contributor

are thin ptr -> fn casts legal (rustc: not)

I think this should remain illegal, for the reason that @petrochenkov highlighted. That is, casts into safe types like fn ought to be unsafe (vs cases out of such types or into an unsafe ptr etc, where it's less imp't).

@arielb1
Copy link
Contributor Author

arielb1 commented Apr 13, 2015

but fn -> thin ptr/usize casts should be legal (they are, currently).

@nikomatsakis
Copy link
Contributor

@arielb1 right, that seems ok to me.

@nrc
Copy link
Member

nrc commented Apr 20, 2015

I notice that we actually allow casting of raw pointers to and from any integral type, not just usize. I expect this is not what we intend (and of course its backwards incompatible). Does anyone think we should be allow to do this?

@petrochenkov
Copy link
Contributor

IMO, the conversions to/from any integral type except for usize/isize should be prohibited, then there will be one place less where we have to think about truncation.
If someone really wants to convert a pointer to u8, he always can use double as. Even if these conversions are allowed, their semantics will be defined in terms of double conversions anyway, this is one more argument for not merging them into one operation.

@arielb1
Copy link
Contributor Author

arielb1 commented Apr 20, 2015

@petrochenkov

liblibc defines size_t = u64 or u32, and therefore has to use this conversion a lot.

@petrochenkov
Copy link
Contributor

@arielb1
It's a separate problem. Conversions ptr <=> libc::size_t assume that libc::size_t is pointer sized while it is not guaranteed. If libc::size_t is not pointer sized, then these conversions shouldn't compile.
At the same time libc::size_t is indeed pointer sized for wide range of platforms including all the supported ones, so making libc::size_t a typedef for usize would be a reasonable solution. I'll try to make this change and evaluate the impact.

@arielb1
Copy link
Contributor Author

arielb1 commented Apr 20, 2015

@petrochenkov

liblibc does this a lot, through. And I think changing size_t to usize would be a big breaking change.

@petrochenkov
Copy link
Contributor

Another way is to postulate that usize is a newtype for one (implementation defined) of the explicitly sized types and they can be used interchangeably in some contexts, for example casting. So, ptr as u32 will compile on 32-bit machine, but ptr as u8 will not.

Update: making libc::size_t an alias of usize doesn't actually break anything in Rust codebase, but leads to a lot of warnings about using improper types (usize) in FFI. So, the solution from this comment seems to be better than the previous one, given the circumstances.

A mix of the original RFC and facts on the ground.
@arielb1
Copy link
Contributor Author

arielb1 commented May 5, 2015

I now have something that works.


where `&.T` and `*T` are references of either mutability,
and where unsize_kind(`T`) is the kind of the unsize info
in `T` - a vtable or a length (or `()` if `T: Sized`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are potentially two ways to interpret this "definition" for unsize_kind ... either:

  • unsize_kind says whether the type is a trait, or a slice, or a sized type (i.e. there are only three possible values that it returns), or
  • unsize_kind says the same thing but also includes which trait (i.e. which vtable) for the first case, and which particular length for the second case (i.e. that its a much larger range of potential values).

I infer that the intention is the latter; i.e. that it is not legal e.g. to cast from one trait-object to another with an unrelated vtable via as, nor is it legal to cast from one array to another with a different length. But it still would be nice if the text didn't require any thinking on this point.

Anyway, I wouldn't block this change based on this detail; we can always come up with a later PR to try to clarify this later.

@nikomatsakis
Copy link
Contributor

@pnkfelix

I infer that the intention is the latter; i.e. that it is not legal e.g. to cast from one trait-object to another with an unrelated vtable via as, nor is it legal to cast from one array to another with a different length. But it still would be nice if the text didn't require any thinking on this point.

I also took the latter interpretation.

@nrc
Copy link
Member

nrc commented Jun 3, 2015

Our interpretation was the former (iirc). If you're casting its up to you to ensure you aren't doing something stupid.

@nikomatsakis
Copy link
Contributor

Hmm. I'm getting a bit worried about this trait casting point. In particular, if we permit *Foo to be cast to *Bar for two arbitrary traits Foo and Bar, does this mean we just copy the vatble? It's hard to imagine a scenario where this could be correct -- but also, eventually we'd like to support upcasts, which will require adjusting the vtable, and it seems odd for arbitrary *Foo to *Bar casts to be permitted but sometimes cause adjustments to the vtable and sometimes not.

@nikomatsakis
Copy link
Contributor

Hmm, based on the outstanding questions concerning trait object casts and &mut [T] -> *mut T, I'm somewhat inclined to extend the FCP for another week.

@nrc
Copy link
Member

nrc commented Jun 3, 2015

We discussed a bit further on IRC, we agreed that while casting pointer to array seems reasonable, we couldn't come up with a good use case for casting trait objects.

I propose we disallow casting trait objects entirely. We allow casts from *[T] to *[U] and do not check the size of the array in such casts. We continue to allow *Trait to *u8 (etc.). That we allow casts from &mut [T; n] to *mut T.

@pnkfelix
Copy link
Member

pnkfelix commented Jun 3, 2015

I realize now that my note, or at least my inferred interpretation, did not make much sense.

  • I agree that If unsize_kind forced the vtables to be equal, then *Trait1 as *Trait2 would seem pointless, since it effectively would require Trait1 to be identical to Trait2 (modulo things that are not reflected in the vtables, e.g. lifetime parameters that are erased). So that part of my note seems silly. (Though as noted above, the alternative also seems problematic.)
  • Likewise, for unsize_kind returning something with an array length, the problem now is of course that the type [T] does not have a length built in, and thus (expr: *[T1]) as *[T2] cannot actually "check" whether the target type "has" the same array length as the input.
    • So I guess the semantics is ... that it just copies the length over directly? (Or, alternatively, does it attempt to correct by multiplying the input length by size_of::<T2>() / size_of::<T1>() ?)
    • I infer from @nrc's note that we "do not check the size of the array in such casts" is probably meant to imply that we copy the length over directly, but who knows, maybe its the other way.

(I suppose we might consider at least lint-warning that size_of::<T1>() == size_of::<T2> when we encounter (e: *[T1]) as *[T2] ... but we need not hold up this RFC on the definition of such a lint.)

  • Of course if we were to start linting for that, then we probably would also start linting for the same size matching condition for (e: *T1) as *T2 in general.

@pnkfelix
Copy link
Member

pnkfelix commented Jun 9, 2015

@arielb1 As the RFC author, do you have any thoughts on the last minute back-and-forth on the details of trait-objects and &mut casts?

It seems like this RFC is likely to be accepted with amendments, but I am not sure whether you would prefer to make such amendments yourself, or if you are okay with the person doing the merge doing such amendments, or if you in fact disagree with the proposed amendments?

@arielb1
Copy link
Contributor Author

arielb1 commented Jun 9, 2015

@pnkfelix

I am not sure casts between arrays of different types should try to compensate for the length difference - the sizes can be non-multiples of each-other, making it unclear which way the rounding should go.

I expect raw fat pointer casting to be used in the same way as raw thin pointer casting - i.e., the pointer will either be cast back to the right type before being used, or the cast-type is sufficiently representation-equivalent to the expr-type (e.g. casting *const Trait<X> -> *const Trait<Newtype<X>>).

I am fine with making the vtable kind include the trait id (i.e. being Length, Vtable(&TraitDef)).

@nikomatsakis
Copy link
Contributor

I feel pretty strongly that we should not permit *Trait1 to *Trait2. It is possible to do (today) with a transmute, and I would like to support implicit upcasting. Adding the trait-id to the unsize kind would be fine, but of course that means one can only convert from the same trait to the same trait. Presumably this is covered also by the implicit conversion rule, since I think that T -> T "conversions" are permitted.

I'm of two minds about the slice casts. It's hard for me to imagine a scenario where it makes sense to do the cast unless the sizes are equal -- this is a bit different from *T -> *U, because there it could make sense so long as sizeof(U) <= sizeof(T). But I'm sort of ok with letting people shoot themselves in the foot here. I'd love to see an actual use case though.

@arielb1
Copy link
Contributor Author

arielb1 commented Jun 9, 2015

@nikomatsakis

An actual case for casting things of different sizes? I don't think there is really any, but we don't know the sizes at typeck time of course.

@nikomatsakis
Copy link
Contributor

@arielb1 for casting *[T] to *[U]. (Note also that transmute works fine here -- in general, my opinion is we should be moving most "crazy casts" to transmutes, and gradually making the as operator obsolete.)

@nikomatsakis
Copy link
Contributor

@arielb1 I can easily imagine converting e.g. to *[u8] or something, but in that case the length is almost certainly wrong...

@nikomatsakis
Copy link
Contributor

to be clear, I don't really have a problem with allowing *[T] to *[U] casts, I just don't quite know how useful they'll be. But I can imagine it'll be useful here or there.

@nikomatsakis
Copy link
Contributor

So it seems to be me that everyone is roughly on the same page that:

  • *Trait1 -> *Trait1 should be allowed, because it's a no-op.
  • *Trait1 -> *Trait2 should not be allowed, because we want to assign that meaningful semantics later.
  • *[T] -> *[U] could be allowed, but it should not do any magic to alter the length if it is allowed.
    • User beware.

Is this an accurate summary? If so, @arielb1, would you prefer to edit the text appropriately, or should I do so when merging?

cc @rust-lang/lang

@arielb1
Copy link
Contributor Author

arielb1 commented Jun 11, 2015

Changed.

@nikomatsakis
Copy link
Contributor

It's official... the language subteam has decided to accept this RFC.

@nikomatsakis
Copy link
Contributor

It seems that there is at least some impl work remaining here, in that the compiler currently accepts a bit too much, at least with regard to object casts (though I see trans errors, which is nice):

http://is.gd/1jCOSz

@nikomatsakis
Copy link
Contributor

Tracking issue: rust-lang/rust#26391

@nikomatsakis nikomatsakis merged commit 5646632 into rust-lang:master Jun 18, 2015
@petrochenkov
Copy link
Contributor

liblibc defines size_t = u64 or u32, and therefore has to use this conversion a lot.

This argument is no longer valid.
I still think that conversions of pointers to/from non-pointer-sized integers are dangerous and almost always done accidentally by mistake. They should be deprecated at least.

@Centril Centril added A-typesystem Type system related proposals & ideas A-expressions Term language related proposals & ideas A-machine Proposals relating to Rust's abstract machine. A-cast Proposals relating to explicit casts. labels Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-cast Proposals relating to explicit casts. A-expressions Term language related proposals & ideas A-machine Proposals relating to Rust's abstract machine. A-typesystem Type system related proposals & ideas final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants