Skip to content

[BUG] move-assignment operators defaulting to memberwise semantics makes it impossible to cleanup old value #475

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jbatez opened this issue May 30, 2023 · 19 comments
Labels
bug Something isn't working

Comments

@jbatez
Copy link
Contributor

jbatez commented May 30, 2023

Consider a smart pointer type for IUnknown that automatically calls IUnknown::Release where appropriate:

ComPtr: type = {
    m_raw: *IUnknown;

    // ...

    operator=: (inout this, move that) = {
        // Take the raw pointer from `that` first.
        // If `&that == &this`, setting it to nullptr
        // first ensures we don't release it.
        new_raw := std::exchange(that.m_raw, nullptr);
        if m_raw { m_raw*.Release(); }
        m_raw = new_raw;
    }

    // ...
}

cppfront generates the following code for the move-assignment operator:

    auto ComPtr::operator=(ComPtr&& that) noexcept -> ComPtr& {
        m_raw = std::move(that).m_raw;
        // Take the raw pointer from `that` first.
        // If `&that == &this`, setting it to nullptr
        // first ensures we don't release it.
        auto new_raw {std::exchange(std::move(that).m_raw, nullptr)}; 
        if (m_raw) {CPP2_UFCS_0(Release, (*cpp2::assert_not_null(m_raw))); }
        m_raw = std::move(new_raw);
        return *this;
    }

cppfront inserted m_raw = std::move(that).m_raw; before my code, meaning there's no way for me to call Release() on the old value.

@jbatez jbatez added the bug Something isn't working label May 30, 2023
@jbatez jbatez changed the title [BUG] assignment operators defaulting to memberwise semantics makes it impossible to cleanup old value [BUG] move-assignment operators defaulting to memberwise semantics makes it impossible to cleanup old value May 30, 2023
@jbatez
Copy link
Contributor Author

jbatez commented May 30, 2023

I wonder if the general structure of generated move-assignment operators should be something more like:

auto Foo::operator=(Foo&& that) noexcept -> Foo& {
    auto tmp {std::move(that)};
    this->destructor_logic();
    member1 = std::move(tmp.member1);
    member2 = std::move(tmp.member2);
    // ...
    // insert given logic here
    // ...
    return *this;
}

where destructor_logic() is a function that contains the logic explicitly written for the destructor without calling any of the implicit member destructors.

Anecdotally, this would work for all the move-assignment operators I tend to write and would mean I could rely on the auto-generated move-assignment if I only wrote the operator=: (out this, move that) move-constructor.

Alternatively, it could be something like:

auto Foo::operator=(Foo&& that) noexcept -> Foo& {
    if (this != &that) {
        this->destructor_logic();
        member1 = std::move(that.member1);
        member2 = std::move(that.member2);
    }
    // ...
    // insert given logic here
    // ...
    return *this;
}

@JohelEGP
Copy link
Contributor

Some background: operator= always defaults to memberwise.

cppfront already supports assigning out parameters before this' members.
See #369 (comment)'s https://cpp2.godbolt.org/z/j54j65j9q.

meaning there's no way for me to call Release() on the old value.

It's possible if you use an IIFE.

See https://cpp2.godbolt.org/z/e5aEe14b6:

Cpp2
ComPtr: type = {
    m_raw: *IUnknown;

    // ...

    operator=: (inout this, move that) = {
        m_raw = :() -> _ = {
          // Take the raw pointer from `that` first.
          // If `&that == &this`, setting it to nullptr
          // first ensures we don't release it.
          new_raw := std::exchange(that.m_raw, nullptr);
          if m_raw { m_raw*.Release(); }
          return new_raw;
        }();
    }

    // ...
}
Cpp1
    auto ComPtr::operator=(ComPtr&& that) noexcept -> ComPtr& {
        m_raw = [&]() -> auto{auto new_raw {std::exchange(std::move(that).m_raw, nullptr)}; if (m_raw) {CPP2_UFCS_0(Release, (*cpp2::assert_not_null(m_raw)));}return new_raw; }();
        return *this;

          // Take the raw pointer from `that` first.
          // If `&that == &this`, setting it to nullptr
          // first ensures we don't release it.

    }

Normally, that'd require captures.
But notice how it lowers to a Cpp1 lambda with [&] capture.
It's undocumented, AFAIK, but it seems to be supported.

@jbatez
Copy link
Contributor Author

jbatez commented May 30, 2023

I suppose it is possible. But definitely unintuitive, and could be awkward if you're dealing with multiple member variables. I'd argue that doing something destructor-ish with the old value is the norm for custom move-assignment operators and you shouldn't have to jump through IIFE hoops to do it.

This isn't the only case where you want to run code before memberwise assignment. Argument validation is another big one, and that'd be useful for constructors as well as assignment operators.

@JohelEGP
Copy link
Contributor

It makes sense to lift the restriction that the memberwise assignments must come first.
There should be no problems, thanks to guaranteed initialization and the required memberwise order.

JohelEGP referenced this issue Nov 8, 2023
By writing separate construction and assignment, plus the new feature of suppressing assignment to a member by writing `member = _ ;` (now allowed only in assignment operators).

I do realize that's an "opt-out" which I normally prefer to avoid, but:

- I considered and decided against (for now) the alternative of not having assignment be memberwise by default. I want to keep the (new to Cpp2) default of memberwise semantics for assignment as with construction. I think that's a useful feature, and normally if you do assign to a member it doesn't arise, and so I think it makes sense to explicitly call out when we're choosing not to do any assignment at all to a member before doing other assignment processing. We'll get experience with how it goes.

- `_` is arguably natural here, since it's pronounced "don't care." There too, we'll see if that is natural generalized, or feels strained. For now it feels natural to me.
@JohelEGP
Copy link
Contributor

JohelEGP commented Nov 8, 2023

From commit cdf71bd, generalizing member = value; would solve this.

Is it just because it's too complex?
I think that there is a rule "The first statements of the function must be assignments to the member variables, and they must appear in order. If they don't, then the default assignment will be inserted."

Exactly. I'm considering using the definite-first-use logic for these initializations/assignments too (which would allow that if), and a lot of the machinery for that is already enabled, but I haven't gone that far yet.

Actually, would simply using member _ =; work for your exact use case?

@hsutter
Copy link
Owner

hsutter commented Nov 8, 2023

Thanks @jbatez ! Yes, the new member = _ ; feature I just pushed in cdf71bd does cover this use case... here is your code plus the line m_raw = _ ; at the top of the assignment operator:

ComPtr: type = {
    m_raw: *IUnknown;

    // ...

    operator=: (inout this, move that) = {
        m_raw = _ ;
        // Take the raw pointer from `that` first.
        // If `&that == &this`, setting it to nullptr
        // first ensures we don't release it.
        new_raw := std::exchange(that.m_raw, nullptr);
        if m_raw { m_raw*.Release(); }
        m_raw = new_raw;
    }

    // ...
}

and I checked that the assignment operator now correctly lowers to this:

    auto ComPtr::operator=(ComPtr&& that) noexcept -> ComPtr& {

        // Take the raw pointer from `that` first.
        // If `&that == &this`, setting it to nullptr
        // first ensures we don't release it.
        auto new_raw {std::exchange(std::move(that).m_raw, nullptr)}; 
        if (m_raw) {CPP2_UFCS_0(Release, (*cpp2::assert_not_null(m_raw))); }
        m_raw = std::move(new_raw);
        return *this;
    }

@hsutter
Copy link
Owner

hsutter commented Nov 8, 2023

Regarding implementing the assignment operator in terms of destruction + re-construction... that's possible, but has some issues.

The main one is that it could lead to observable difference in behavior between copy assignment and move assignment. We wouldn't want to implement copy assignment in terms of destroy+construct for efficiency (e.g., a string or vector could not reuse storage, it would have to release all its storage and then reallocate new storage again). But if we did do that for move assignment only, then a user could experience different side effects only for move assignment, such as a de-registration by destructor logic and a re-registration if, say, the object is related to other objects and self-registers itself somewhere.

For completeness: See also GotW #23, but your suggestion as written already avoids most of the pitfalls mentioned there, which is great. 👍

@hsutter hsutter closed this as completed Nov 8, 2023
@JohelEGP
Copy link
Contributor

JohelEGP commented Nov 8, 2023

#518 should be closed, too.

@jbatez
Copy link
Contributor Author

jbatez commented Nov 9, 2023

Thanks! m_raw = _; does indeed get the job done.

I still find the opt-out nature of it unintuitive, though, fwiw. And the fact that operator=: (inout this, move that) has different defaults than operator=: (inout this, move other: ComPtr) is surprising. I'm just one data point, though. 🙂

@jcanizales
Copy link

I still find the opt-out nature of it unintuitive, though, fwiw.

The fact that there's magical invisible code that you have to act on is indeed One More Thing To Teach, instead of One Less Thing.

@hsutter
Copy link
Owner

hsutter commented Nov 10, 2023

Thanks @jbatez and @jcanizales, let me ask a clarifying question to make sure I understand what is surprising... given code like this:

x: type = {
    member1: std::string = ();
    member2: std::string = ();

    operator=: (out this, move that) = {}
    operator=: (out this, move other: int) = {}

    operator=: (inout this, move that) = {}
    operator=: (inout this, move other: int) = {}
}

and those four functions compiles to this

x::x(x&& that) noexcept
                                    : member1{ std::move(that).member1 }
                                    , member2{ std::move(that).member2 }
{}
x::x(int&& other){}

auto x::operator=(x&& that) noexcept -> x& {
                                      member1 = std::move(that).member1;
                                      member2 = std::move(that).member2;
                                      return *this;
}
auto x::operator=(int&& other) -> x& {
                                            member1 = {};
                                            member2 = {};
                                            return *this;
}

Is the concern the member = {}; defaults in the fourth function (converting assignment)? If it didn't do those initializations by default, my thought was that would be inconsistent with the other three... for example, right now you would get the identical semantics by omitting the inout this ones:

//  This should currently compile to the identical Cpp1 functions as above
x: type = {
    member1: std::string = ();
    member2: std::string = ();

    operator=: (out this, move that) = {}
    operator=: (out this, move other: int) = {}
}

... so I was concerned about it being inconsistent if the programmer got different defaults if they wrote assignment themselves.

So to me the defaults seem the same... but I totally accept that to you the defaults seem different, and I'd like to understand that more. What you would like these four functions to do differently? I'm all ears!

@jcanizales
Copy link

jcanizales commented Nov 10, 2023

Is the concern the member = {}; defaults in the fourth function (converting assignment)?

I think there's a bit of a disconnect or a jump in reasoning that wasn't written down, because @jbatez wasn't writing a converting assignment.

In my case:

If you tell me, in Cpp2 to get the default [regular / copy / move] constructor instead of operator=: (out this, that) = default you write = {}, that's simple enough. But I don't expect that to also mean "and if you write something inside the {}, the compiler is going to inject invisible code before what you wrote". The natural mental model for me is that if I write anything I'm forfeiting the default code. But maybe the intended way to teach this is "operator= members always get memberwise assignment prepended to the function body", without mention at all of a default implementation (and so = {} has no special meaning, it's just that these functions always get extra code). In that case it is simple enough I guess (although "why is my code after the invisible code and not before", etc.).

Even less intuitive to me is "you can modify the invisible code by knowing what the code is and writing opt-outs for specific lines".

I am aware that I'm not bringing an alternative solution.

Now, re the question about the defaulted converting constructor, which I'm not sure how it's related. I don't think a default exists semantically for that case. An implementation that doesn't use the argument seems obviously wrong to me, a bug; and so I don't see the benefit of providing it.

@jcanizales
Copy link

OTOH, if the restriction that "assignments must appear first" is lifted, like @JohelEGP mentions here, then I think the rule "If the body of an operator= skips a member, the member's default initializer is used." is simple to understand. @jbatez 's code assigns the member variable, so it would work as expected without any strange incantation needed above it.

That still leaves the converting assignment buggy if the user doesn't write a body. But I think a compiler should be able to flag the unused argument?

@gregmarr
Copy link
Contributor

Thanks @jbatez and @jcanizales, let me ask a clarifying question to make sure I understand what is surprising... given code like this:

@hsutter
An important point in how what @jbatez wrote is different than what you wrote:

x: type = {
...
    operator=: (inout this, move that) = {}
// Not this function:
//    operator=: (inout this, move other: int) = {}
// But this function:
    operator=: (inout this, move other: x) = {}
}

So the only difference between the two operators is that with the special name and implicit x vs other: x with a non-special name and explicit x. The difference in code:

    auto x::operator=(x&& that) noexcept -> x& {
                                          member1 = std::move(that).member1;
                                          member2 = std::move(that).member2;
                                          return *this;
    }

    auto x::operator=(x&& other) -> x& {
                                              member1 = {};
                                              member2 = {};
                                              return *this;
    }

@hsutter
Copy link
Owner

hsutter commented Nov 10, 2023

Thanks, let me noodle on these suggestions. Maybe it's time to pick up pursuing the init-before-use rather than init-first semantics.

@gregmarr That's a good example too, and I should probably flag the case where the x parameter is not named that. But I was trying to delve into the case Jo described as "the fact that operator=: (inout this, move that) has different defaults than operator=: (inout this, move other: ComPtr) is surprising" (I changed ComPtr to int but it's a what I call a converting op=).

Again, thanks!

@gregmarr
Copy link
Contributor

gregmarr commented Nov 10, 2023

@hsutter

(I changed ComPtr to int but it's a what I call a converting op=).

Even though ComPtr is the name of the class, so move that and move other: ComPtr result in the same type for the rhs parameter?

@gregmarr
Copy link
Contributor

gregmarr commented Nov 10, 2023

I think flagging move other: Type where Type is the type of the class containing the operator= with "use move that instead" would eliminate the confusion about the differing behavior of the two functions.

@hsutter
Copy link
Owner

hsutter commented Nov 10, 2023

Ah, I missed that this was referring to the same type. OK, I should flag that with a 'use move that instead.' Thanks for clarifying!

hsutter added a commit that referenced this issue Nov 10, 2023
Also require an `operator=` second parameter of the same type to be named `that` - see #475 discussion thread
@jbatez
Copy link
Contributor Author

jbatez commented Nov 11, 2023

I personally find that most the implicit behavior that's exclusive to operator= isn't helpful and just adds to cognitive load:

  • operator=: (out this, other: OtherType) implicitly generates operator=: (inout this, other: OtherType), but no other operators do this afaik, so it's an exception to learn. In-fact, I don't think that's even a normal thing to do explicitly in cpp1 syntax, especially considering cpp2 constructors generated from the above are explicit by default. Take for example std::vector(const Allocator&) which doesn't have a corresponding assignment operator overload. Even for implicit conversion constructors, the corresponding assignment operator is either different enough that you couldn't auto-generate it (e.g. re-using allocated storage), or simple enough that you can just omit it, let the conversion constructor do the conversion, and then let the normal copy or move assignment operator take care of assignment.
  • operator=: (... that) generates implicit member-wise copies/moves whereas all other operator= definitions inherit the defaults from the member declarations. I understand the desire to cut-down on boiler-plate, but since it's different behavior from other very similar-looking code, I'd rather it be opt-in somehow than opt-out.
  • operator=: (inout this, ...) implicitly generates member assignments but other assignment operators (e.g. operator+=) don't. I understand the desire to make operator=: (inout this, ...) match operator=: (out this, ...), but conceptually, I don't think of the implicit member construction in constructors as "behavior" that needs to be duplicated, but rather as the default state of the members before I get to my constructor body. If you look at it that way, then the expected state of inout this members before assignment operator code starts executing is the state they were in before the function was invoked at all, not after some implicit member changes.

I could keep going, but the theme of all my complaints is the same. Getting rid of boilerplate is nice, but when the implied behavior is different from the rest of the language, I think it should be opt-in. My proposal:

  • All operator=: (out this ...) (constructors) should get their member construction defaults the same way. that member-wise copy/move behavior should be opt-in somehow (e.g. operator=: (out this, that) = { this.* = that.*; } or something like that).
  • Only that constructors (and maybe implicit conversion constructors?) should auto-generate corresponding assignment operators, converting the member-wise constructor arguments into assignments. Auto-generating assignment operators for other types of constructors doesn't always make sense.
  • Explicit assignment operators that aren't auto-generated from constructors shouldn't have any implicit member-wise assignment. In my experience, they either always match the corresponding constructor exactly (in which case, just exclude it and let it get auto-generated), or they have some non-trivial behavior and don't need defaults (like my original ComPtr example).

zaucy pushed a commit to zaucy/cppfront that referenced this issue Dec 5, 2023
Also require an `operator=` second parameter of the same type to be named `that` - see hsutter#475 discussion thread
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants