Skip to content

oneof: generated code uses pointers for non-primitive types #78

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
tamird opened this issue Sep 19, 2015 · 3 comments
Closed

oneof: generated code uses pointers for non-primitive types #78

tamird opened this issue Sep 19, 2015 · 3 comments

Comments

@tamird
Copy link
Contributor

tamird commented Sep 19, 2015

See:

Msg *Strings `protobuf:"bytes,10,opt,name=msg,oneof"`

There's no need for this to be a pointer, since the union interface is implemented for a pointer of the wrapping struct. The downside of using a pointer here is that it results in an additional allocation.

@dsymonds
Copy link
Contributor

While what you say is true, it's a pointer for symmetry with all the other occurrences of messages.

@awalterschulze
Copy link

Please just think about this for another few seconds before dismissing this.

@atombender
Copy link

This should be reopened and addressed.

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jul 3, 2018
…chResponse

All Requests and Responses pass through RequestUnion/ResponseUnion structs
when they are added to BatchRequests/BatchResponses. In order to ensure
that only one Request type can be assigned to one of these RequestUnion
or ResponseUnion structs, we currently use gogoproto's approach to tagged
unions: the `gogoproto.onlyone` option.

This option was introduced before proto3. Proto3
then added the `oneof` option, which for all intents and purposes addresses
the same issue: https://developers.google.com/protocol-buffers/docs/proto#oneof.
However, there is one major difference between the two options, which
is in their generated code. `gogoproto.onlyone` will generate
a single flat struct with pointers to each possible variant type.
`oneof` will generate a union interface and an interface "wrapper"
struct for each variant type. The effect of this is that `onlyone`
will generate code that looks like this:

```
type Union struct {
    Variant1 *Variant1Type
    Variant2 *Variant2Type
    ...
}
```

While `oneof` will generate code the looks like this:

```
type Union struct {
    Value isUnion_Value
}

type isUnion_Value interface {
    ...
}

type Union_Variant1 struct {
    Variant1 *Variant1Type
}

type Union_Variant2 struct {
    Variant2 *Variant2Type
}
```

There are pretty obvious tradeoffs to each. For one, `oneof` introduces an
extra layer of indirection, which forces an extra allocation. It also doesn't
generate particularly useful setters and getters. On the other hand, `onlyone`
creates a large struct that grows linearly with the number of variants.
Neither approach is ideal, and there has been **A LOT** of discussion on this:
- golang/protobuf#78
- golang/protobuf#283
- gogo/protobuf#103
- gogo/protobuf#168

Clearly neither approach is ideal, ergonomically or with regard to performance.
However, over time, the tradeoff has been getting worse for us and its time we
consider switching over to `oneof` in `RequestUnion` and `ResponseUnion`. These
structs have gotten huge as more and more request variants have been added:
`RequestUnion` has grown to 328 bytes and `ResponseUnion` has grown to 320 bytes.
It has gotten to the point where the wasted space is non-negligible.

This change switches over to `oneof` to shrink these union structs down to more
manageable sizes (16 bytes). The downside of this is that in reducing the struct
size we end up introducing an extra allocation. This isn't great, but we can avoid
the extra allocation in some places (like `BatchRequest.CreateReply`) by grouping
the allocation with that of the Request/Response itself. We've seen previous cases
like cockroachdb#4216 where adding in an extra allocation/indirection is a net-win if it
reduces a commonly used struct's size significantly.

The other downside to this change is that the ergonomics of `oneof` aren't quite
as nice as `gogo.onlyone`. Specifically, `gogo.onlyone` generates getters and
setters called `GetValue` and `SetValue` that provide access to the wrapped
`interface{}`, which we can assert to a `Request`. `oneof` doesn't provide
such facilities. This was the cause of a lot of the discussions linked above.
While this isn't ideal, I think we've waited long enough (~3 years) for a
resolution on those discussions. For now, we'll just generate the getters
and setters ourselves.

This change demonstrated about a 5% improvement when running kv95 on my local
laptop. When run on a three-node GCE cluster (4 vCPUs), the improvements were
less pronounced but still present. kv95 showed a throughput improvement of 2.4%.
Running kv100 showed an even more dramatic improvement of 18% on the GCE cluster.
I think this is because kv100 is essentially a hot loop where all reads miss
because the cluster remains empty, so it's the best case for this change. Still,
the impact was shocking.

Release note (performance improvement): Reduce the memory size of commonly used
Request and Response objects.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jul 3, 2018
…chResponse

All Requests and Responses pass through RequestUnion/ResponseUnion structs
when they are added to BatchRequests/BatchResponses. In order to ensure
that only one Request type can be assigned to one of these RequestUnion
or ResponseUnion structs, we currently use gogoproto's approach to tagged
unions: the `gogoproto.onlyone` option.

This option was introduced before proto3. Proto3
then added the `oneof` option, which for all intents and purposes addresses
the same issue: https://developers.google.com/protocol-buffers/docs/proto#oneof.
However, there is one major difference between the two options, which
is in their generated code. `gogoproto.onlyone` will generate
a single flat struct with pointers to each possible variant type.
`oneof` will generate a union interface and an interface "wrapper"
struct for each variant type. The effect of this is that `onlyone`
will generate code that looks like this:

```
type Union struct {
    Variant1 *Variant1Type
    Variant2 *Variant2Type
    ...
}
```

While `oneof` will generate code the looks like this:

```
type Union struct {
    Value isUnion_Value
}

type isUnion_Value interface {
    ...
}

type Union_Variant1 struct {
    Variant1 *Variant1Type
}

type Union_Variant2 struct {
    Variant2 *Variant2Type
}
```

There are pretty obvious tradeoffs to each. For one, `oneof` introduces an
extra layer of indirection, which forces an extra allocation. It also doesn't
generate particularly useful setters and getters. On the other hand, `onlyone`
creates a large struct that grows linearly with the number of variants.
Neither approach is ideal, and there has been **A LOT** of discussion on this:
- golang/protobuf#78
- golang/protobuf#283
- gogo/protobuf#103
- gogo/protobuf#168

Clearly neither approach is ideal, ergonomically or with regard to performance.
However, over time, the tradeoff has been getting worse for us and its time we
consider switching over to `oneof` in `RequestUnion` and `ResponseUnion`. These
structs have gotten huge as more and more request variants have been added:
`RequestUnion` has grown to 328 bytes and `ResponseUnion` has grown to 320 bytes.
It has gotten to the point where the wasted space is non-negligible.

This change switches over to `oneof` to shrink these union structs down to more
manageable sizes (16 bytes). The downside of this is that in reducing the struct
size we end up introducing an extra allocation. This isn't great, but we can avoid
the extra allocation in some places (like `BatchRequest.CreateReply`) by grouping
the allocation with that of the Request/Response itself. We've seen previous cases
like cockroachdb#4216 where adding in an extra allocation/indirection is a net-win if it
reduces a commonly used struct's size significantly.

The other downside to this change is that the ergonomics of `oneof` aren't quite
as nice as `gogo.onlyone`. Specifically, `gogo.onlyone` generates getters and
setters called `GetValue` and `SetValue` that provide access to the wrapped
`interface{}`, which we can assert to a `Request`. `oneof` doesn't provide
such facilities. This was the cause of a lot of the discussions linked above.
While this isn't ideal, I think we've waited long enough (~3 years) for a
resolution on those discussions. For now, we'll just generate the getters
and setters ourselves.

This change demonstrated about a 5% improvement when running kv95 on my local
laptop. When run on a three-node GCE cluster (4 vCPUs), the improvements were
less pronounced but still present. kv95 showed a throughput improvement of 2.4%.
Running kv100 showed an even more dramatic improvement of 18% on the GCE cluster.
I think this is because kv100 is essentially a hot loop where all reads miss
because the cluster remains empty, so it's the best case for this change. Still,
the impact was shocking.

Release note (performance improvement): Reduce the memory size of commonly used
Request and Response objects.
craig bot pushed a commit to cockroachdb/cockroach that referenced this issue Jul 3, 2018
27112: roachpb: replace `gogoproto.onlyone` with `oneof` in BatchRequest/BatchResponse r=nvanbenschoten a=nvanbenschoten

All Requests and Responses pass through RequestUnion/ResponseUnion structs
when they are added to BatchRequests/BatchResponses. In order to ensure
that only one Request type can be assigned to one of these RequestUnion
or ResponseUnion structs, we currently use gogoproto's approach to tagged
unions: the `gogoproto.onlyone` option.

This option was introduced before proto3. Proto3
then added the `oneof` option, which for all intents and purposes addresses
the same issue: https://developers.google.com/protocol-buffers/docs/proto#oneof.
However, there is one major difference between the two options, which
is in their generated code. `gogoproto.onlyone` will generate
a single flat struct with pointers to each possible variant type.
`oneof` will generate a union interface and an interface "wrapper"
struct for each variant type. The effect of this is that `onlyone`
will generate code that looks like this:

```
type Union struct {
    Variant1 *Variant1Type
    Variant2 *Variant2Type
    ...
}
```

While `oneof` will generate code the looks like this:

```
type Union struct {
    Value isUnion_Value
}

type isUnion_Value interface {
    ...
}

type Union_Variant1 struct {
    Variant1 *Variant1Type
}

type Union_Variant2 struct {
    Variant2 *Variant2Type
}
```

There are pretty obvious tradeoffs to each. For one, `oneof` introduces an
extra layer of indirection, which forces an extra allocation. It also doesn't
generate particularly useful setters and getters. On the other hand, `onlyone`
creates a large struct that grows linearly with the number of variants.
Neither approach is great, and there has been **A LOT** of discussion on this:
- golang/protobuf#78
- golang/protobuf#283
- gogo/protobuf#103
- gogo/protobuf#168

Clearly neither approach is ideal, ergonomically or with regard to performance.
However, over time, the tradeoff has been getting worse for us and it's time we
consider switching over to `oneof` in `RequestUnion` and `ResponseUnion`. These
structs have gotten huge as more and more request variants have been added:
`RequestUnion` has grown to **328 bytes** and `ResponseUnion` has grown to **320 bytes**.
It has gotten to the point where the wasted space is non-negligible.

This change switches over to `oneof` to shrink these union structs down to more
manageable sizes (16 bytes each). The downside of this is that in reducing the struct
size we end up introducing an extra allocation. This isn't great, but we can avoid
the extra allocation in some places (like `BatchRequest.CreateReply`) by grouping
the allocation with that of the Request/Response itself. We've seen previous cases
like #4216 where adding in an extra allocation/indirection is a net-win if it
reduces a commonly used struct's size significantly.

The other downside to this change is that the ergonomics of `oneof` aren't quite
as nice as `gogo.onlyone`. Specifically, `gogo.onlyone` generates getters and
setters called `GetValue` and `SetValue` that provide access to the wrapped
`interface{}`, which we can assert to a `Request`. `oneof` doesn't provide
such facilities. This was the cause of a lot of the discussions linked above.
While it we be nice for this to be resolved upstream, I think we've waited long
enough (~3 years) for a resolution to those discussions. For now, we'll just
generate the getters and setters ourselves.

This change demonstrated about a **5%** improvement when running kv95 on my local
laptop. When run on a three-node GCE cluster (4 vCPUs), the improvements were
less pronounced but still present. kv95 showed a throughput improvement of **2.4%**.
Running kv100 showed a much more dramatic improvement of **18%** on the three-node
GCE cluster. I think this is because kv100 is essentially a hot loop where all reads miss
because the cluster remains empty, so it's the best-case scenario for this change. Still,
the impact was shocking.

Release note (performance improvement): Reduce the memory size of commonly used
Request and Response objects.

27114: opt/sql: fix explain analyze missing option r=asubiotto a=asubiotto

ConstructExplain previously ignored the ANALYZE option so any EXPLAIN
ANALYZE statement would result in execution as an EXPLAIN (DISTSQL)
statement. The ANALYZE option is now observed in ConstructExplain.

Additionally, the stmtType field from the explainDistSQLNode has been
removed because it was not necessary and it was unclear how to pass this
from the `execFactory`.

Release note: None

27116: Makefile: learn that roachtest depends on optimizer-generated code r=benesch a=benesch

As mentioned in cd4415c, the Makefile will one day be smart enough to
deduce this on its own, but for now it's simpler to explicitly list the
commands that require generated code. Note that the simple but coarse
solution of assuming that all commands depend on generated code is
inviable as some of these commands are used to generate the code in the
first place.

Release note: None

27119: storage: extract replica unlinking into store method r=tschottdorf a=benesch

Extract some code that was duplicated in three places into a dedicated
helper method. Prerequisite for #27061.

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Alfonso Subiotto Marqués <[email protected]>
Co-authored-by: Nikhil Benesch <[email protected]>
@golang golang locked as resolved and limited conversation to collaborators Jun 26, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants