Enum Layout Guarantees #177

Closed
@jswrenn

The motivation of this issue is two-fold:

What follows reflects my understanding of what Rust already guarantees, based on (1) the reference (which is sparse, sometimes contradictory, and sometimes slightly wrong) and (2) the current observable behavior of Rust in safe code.

I'm hoping for official confirmation that these guarantees are real.

Terminology

C-like Enumerations

A C-like enumeration is one in which every variant is unit-like; for example:

enum CLike {
    VariantA,
    VariantB,
    VariantC,
}

The only enumerations that may specify explicit discriminants without specifying a non-default repr are C-like enums.
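As a small illustration, here is a sketch of a C-like enum that sets explicit discriminants with no repr attribute. The enum `Status` and the helper `code` are hypothetical names chosen for this example.

```rust
// Hypothetical C-like enum: every variant is unit-like, so explicit
// discriminants are accepted under the default representation.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Status {
    Ok = 200,
    NotFound = 404,
    Teapot = 418,
}

// Under the default representation the discriminant is logically an isize.
fn code(s: Status) -> isize {
    s as isize
}

fn main() {
    assert_eq!(code(Status::Ok), 200);
    assert_eq!(code(Status::Teapot), 418);
}
```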

Fieldless Enumerations

All C-like enumerations are field-less, because they cannot include tuple-like or struct-like variants. However, not all field-less enums are C-like. For instance:

enum Fieldless {
    Unit,
    Tuple(),
    Struct{},
}

The only enumerations for which instances may be as-casted to their discriminant values are field-less enums.[as-casting]

Representation

The memory layout of an enumeration is well-specified iff[well-specified]:

  • the enumeration is field-less
  • the enumeration has a primitive repr

For such enumerations, the leading byte(s) reflect the discriminant value.[leading-bytes] The bytes thereafter are either padding, or correspond to the layout of the variant's fields.
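The leading-byte claim can be checked directly for a field-less enum with a primitive repr. The following is a minimal sketch that reuses the `Fieldless` enum from above with an added `#[repr(u8)]`; the helper name `leading_byte` is hypothetical.

```rust
use std::mem;

#[allow(dead_code)]
#[repr(u8)]
enum Fieldless {
    Unit,      // discriminant 0
    Tuple(),   // discriminant 1
    Struct {}, // discriminant 2
}

// Reinterpret the whole enum value as its single byte.
fn leading_byte(e: Fieldless) -> u8 {
    // SAFETY: `Fieldless` is field-less and `#[repr(u8)]`, so it has the
    // size of `u8` and every valid value is one initialized byte.
    unsafe { mem::transmute::<Fieldless, u8>(e) }
}

fn main() {
    // The byte in memory agrees with the value obtained by as-casting.
    assert_eq!(leading_byte(Fieldless::Unit), Fieldless::Unit as u8);
    assert_eq!(leading_byte(Fieldless::Struct {}), Fieldless::Struct {} as u8);
}
```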

Discriminant Representation

For such enumerations:

Discriminant Size

Default Representation

Under the default representation, discriminants are interpreted logically as isize. Accordingly, this is invalid:

enum Enum {
    Variant = 0i128,
    // ERROR  ^^^^^ expected isize, found i128
}

At most size_of::<isize>() bytes will be used to encode the discriminant value. If all discriminant values logically 'fit' into a smaller numeric type T, the compiler may use that smaller type T in the actual layout to encode the discriminant. In that case, exactly size_of::<T>() bytes will be used to encode the discriminant value.[disr-size]
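A quick way to see what the compiler picks on a given target is to print the sizes. This sketch uses two hypothetical enums, `Small` and `Wide`; because the exact size under the default representation is left to the compiler, the code only relies on the upper bound stated above.

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum Small {
    A,
    B,
    C,
}

#[allow(dead_code)]
enum Wide {
    Lo = 0,
    Hi = 100_000, // does not fit in 8 or 16 bits
}

// Sizes chosen by the compiler under the default representation.
fn sizes() -> (usize, usize) {
    (size_of::<Small>(), size_of::<Wide>())
}

fn main() {
    let (small, wide) = sizes();
    // Only the upper bound is checked; the exact sizes are unspecified.
    assert!(small <= size_of::<isize>());
    assert!(wide <= size_of::<isize>());
    println!("Small: {} byte(s), Wide: {} byte(s)", small, wide);
}
```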

Primitive Representation

Under a primitive representation T, exactly size_of::<T>() bytes will be used to encode the discriminant value.[disr-size]
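For field-less enums this is easy to observe with size_of. A minimal sketch, using the hypothetical enums `Tiny`, `Medium`, and `Large`:

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[repr(u8)]
enum Tiny {
    A,
    B,
}

#[allow(dead_code)]
#[repr(u32)]
enum Medium {
    A,
    B,
}

#[allow(dead_code)]
#[repr(i64)]
enum Large {
    A = -1,
    B = 1,
}

// Each field-less enum is exactly as large as its repr type.
fn tag_sizes() -> [usize; 3] {
    [size_of::<Tiny>(), size_of::<Medium>(), size_of::<Large>()]
}

fn main() {
    assert_eq!(
        tag_sizes(),
        [size_of::<u8>(), size_of::<u32>(), size_of::<i64>()]
    );
}
```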

C Representation

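Under a C representation, the discriminant is encoded using the size and alignment that the target platform's C ABI gives enums (typically that of a C int, which is not necessarily size_of::<isize>() bytes).[disr-size]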
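The size under a C representation is target-dependent, so it is worth measuring rather than assuming. The reference describes field-less repr(C) enums as having the size and alignment the target's C ABI gives enums, so on 64-bit targets the number printed below may differ from size_of::<isize>(). The enum `CEnum` and the helper `c_enum_size` are hypothetical names.

```rust
use std::mem::size_of;
use std::os::raw::c_int;

#[allow(dead_code)]
#[repr(C)]
enum CEnum {
    A,
    B,
    C,
}

// Size the compiler assigns to a small field-less repr(C) enum.
fn c_enum_size() -> usize {
    size_of::<CEnum>()
}

fn main() {
    println!("repr(C) enum: {} byte(s)", c_enum_size());
    println!(
        "c_int: {} byte(s), isize: {} byte(s)",
        size_of::<c_int>(),
        size_of::<isize>()
    );
}
```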

Discriminant Value

For such an enumeration, the discriminant value of a variant is[disr-value]:

  • the explicit given value provided in the variant declaration (if any)
  • one greater than the discriminant value of the preceding variant in the enum declaration
  • zero, if the variant is the first one in the declaration and no explicit discriminant value is provided
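The three rules above can be observed via as-casting. A short sketch with a hypothetical enum `Mixed` that mixes explicit and implicit discriminants:

```rust
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Mixed {
    First,       // no explicit value, first variant: 0
    Second = 10, // explicit value: 10
    Third,       // one greater than the preceding variant: 11
    Fourth = 3,  // explicit value: 3
    Fifth,       // one greater than the preceding variant: 4
}

// Collect the discriminants in declaration order.
fn values() -> [isize; 5] {
    [
        Mixed::First as isize,
        Mixed::Second as isize,
        Mixed::Third as isize,
        Mixed::Fourth as isize,
        Mixed::Fifth as isize,
    ]
}

fn main() {
    assert_eq!(values(), [0, 10, 11, 3, 4]);
}
```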

Regardless of repr, the logical value of the discriminant will always match the actual in-memory encoding of the discriminant.[leading-bytes] That is, Rust won't secretly use the byte corresponding to 3 if the logical discriminant value is 2.

(However, to reiterate: under a default representation, it may be encoded using a smaller numeric type, if the value fits.)


[as-casting]

The reference implies that it is only C-like enums that may be as-casted. This is misleading: all field-less enums may be as-casted, e.g.:

enum Fieldless {
    Unit,
    Tuple(),
    Struct{},
}

assert_eq!(Fieldless::Struct{} as isize, 2);

[well-specified]

These two conditions aren't explicitly stated anywhere in this form. The reference and safe code guarantee (or imply) the layout rules in the following sub-sections, and these just happen to be the two conditions for which those rules apparently apply.

[leading-bytes]

It doesn't seem to be stated anywhere that the leading bytes correspond to the discriminant, or that the byte encoding of the discriminant matches its logical value.

[disr-size]

Per the reference:

Under the default representation, the specified discriminant is interpreted as an isize value although the compiler is allowed to use a smaller type in the actual memory layout. The size and thus acceptable values can be changed by using a primitive representation or the C representation.

[disr-value]

The algorithm with which discriminant values are assigned is documented in the reference, and is observable via as-casting in the playground.

I'm extrapolating that these rules should also apply to enums that are not field-less, since I would be surprised (for enums under a primitive repr) if the presence of fields in a variant impacted its discriminant value. (Currently, it doesn't, but there's no safe way to extract the discriminant value of such enums, so this behavior is arguably unspecified at the moment.)
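With unsafe code, the discriminant of such an enum can be read through a pointer. The sketch below assumes that a `#[repr(u8)]` enum with fields stores a u8 tag at offset 0, which is the layout the standard library documentation for `std::mem::discriminant` also relies on. The enum `Shape` and the method `tag` are hypothetical, and explicit discriminants on variants with fields need Rust 1.66 or newer.

```rust
#[allow(dead_code)]
#[repr(u8)]
enum Shape {
    Point,                   // first variant, no explicit value: 0
    Circle(f32) = 5,         // explicit value: 5
    Rect { w: f32, h: f32 }, // one greater than the preceding variant: 6
}

impl Shape {
    // Read the tag byte at the start of the value.
    fn tag(&self) -> u8 {
        // SAFETY: assumes `#[repr(u8)]` places a `u8` discriminant as the
        // first field of the enum, so the first byte can be read as `u8`.
        unsafe { *(self as *const Self as *const u8) }
    }
}

fn main() {
    assert_eq!(Shape::Point.tag(), 0);
    assert_eq!(Shape::Circle(1.0).tag(), 5);
    assert_eq!(Shape::Rect { w: 1.0, h: 2.0 }.tag(), 6);
}
```

If the fields did affect the discriminant, the second and third assertions would not follow the same assignment rules as a field-less enum.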

Labels: A-layout (Topic: Related to data structure layout (`#[repr]`)); C-open-question (Category: An open question that we should revisit)