Skip to content

How should we handle repeated calls to the same methods in pattern matching? #2107

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
munificent opened this issue Feb 10, 2022 · 6 comments
Closed
Labels
patterns Issues related to pattern matching.

Comments

@munificent
Copy link
Member

munificent commented Feb 10, 2022

Consider:

switch (something) {
  case A(a: B(b: C(c: D(d: true)): print('held true');
  case A(a: B(b: C(c: D(d: false)): print('held true');
}

So we've got two patterns that are deeply recursing into some object and destructuring it. The current proposal says that each of these named field destructurers like a: and b: are compiled to calling getters on the matched value. There are two problems here:

  1. It's potentially slow. Say the first case doesn't match because d: is false. Then we have to try the second case. That means doing the same a, b, c, and d getter calls all over again. Those can be arbitrarily complex or slow operations.

  2. It defeats exhaustiveness checking. Since each case re-evaluates those getters, there's no guarantee that they are actually reliably stable and return the same result on each invocation. This means it's not safe to rely on these values for exhaustiveness checking. A contrived getter could return differing values such that even an apparently exhaustive set of cases never actually matches one.

    For example, you might expect this to be soundly exhaustive:

    var n = switch (something) {
      case Bitbox(b: true): 1;
      case Bitbox(b: false): 2;
    }

    But Bitbox could be defined like:

    class Bitbox {
      bool get b => Random().nextBool();
    }

I think we can solve both of those by specifying that a switch caches the result of any destructuring method call. Here's a sketch:

A known value is a pair of an object and a destructuring path. A destructuring path is a (possibly empty) series of method calls that led to producing that object. For example, in this pattern:

switch (2) {
  case (named: ([{'key': 3})
}

The destructuring path to 3 is [".named", ".field0", "['key']"]. (It's not directly visible, but list patterns also call length on the list, and that gets cached too.)

The current path when evaluating a subpattern is the series of method calls that led to the value being matched by this subpattern. For the outermost pattern, the current path is empty.

  1. The initial value matched by the switch is added to the set of known values with an empty path.
  2. Before a pattern calls a method on the matched value, look for [currentPath..., method] in the set of known values. If found, use that value instead and don't call the method. Otherwise, call the method and store the result in the set of known values with that path.
  3. When recursing into a subpattern with a subvalue, append the method used to acquire that subvalue to the current path.

The set of methods called by patterns is:

  • Named getters and field_n_ getters for record and extractor patterns.
  • length and the [] operator with constant int indexes for list patterns.
  • The [] operator with constant key arguments for map patterns.
  • The == operator for literal and constant patterns with a constant value for the LHS argument.

This definitely adds some complexity to how patterns are compiled and executed, but I think it's worth it. I'm interested in other approaches.

Thoughts?

cc @leafpetersen

@munificent munificent added the patterns Issues related to pattern matching. label Feb 10, 2022
@mraleph
Copy link
Member

mraleph commented Feb 10, 2022

Do we actually have to support getters? Maybe we could limit ourselves to destructurring only "primitive" properties: meaning it must to be a field.

We can't guarantee this now within a language, but maybe that should be a feature included first, e.g. it has to be a non-virtual/sealed field or a type of object that can have its fields overriden with getters.

If class fields are virtual then I think a good alternative is to have "decomposition" method that returns a primitive object (e.g. a tuple/record/data class) in the same way as Scala unapply works. Though I guess this hits that same problem of with side-effects for nested patterns.

@lrhn
Copy link
Member

lrhn commented Feb 10, 2022

I will really prefer not to introduce a new way to inspect objects. Getters is how you deconstruct objects. They already exist, and work, and they allow you to control which properties of our object are publicly visible, while hiding the implementation.
The point of having getters to begin with is that it's an implementation choice whether that getter is implemented by a field or by something else. (Aka: Class fields are virtual, or rather, their getters are, and that's the only way to access the field contents.)

The one potential feature that gets close to non-virtual fields is stable getters, #1518. Instead of special casing fields, it special-cases some of the getters. Not a feature I'm particularly fond of, I think it's too specialized and requires opt-in from people who are not in a position to know whether it's necessary, but it's better than breaking getter/field symmetry.

So I want to support getters. Supporting anything on top of that is ... possible, but not really desirable.

I'd be fine with a switch being allowed/required to cache access paths.

I expect the switch cases to be deterministically checked from top to bottom, so it's detectable, if not predictable, which reads and checks have been done already.

switch (something) {
  case A(a: B(b: C(c: D(d: true)): print('held true'); 
  case A(a: B(b: C(c: D(d: false)): print('held true');
}

The first line does:

  • tmp = something
  • if tmp is A:
    • tmp2 = tmp.a
    • if tmp2 is B
      • tmp3 = tmp2.b
      • if tmp3 is C
        • tmp4 = tmp3.c
        • if tmp4 is D
          • tmp5 = tmp4.d
            • if tmp5 == false: Goto case 1

The second line can then be completely naive and do:

  • if something is A
    • tmp2 ??= tmp.a // only read if not already read
    • if tmp2 is B
      • tmp3 ??= tmp2.b
      • if tmp3 is C
        • tmp4 ??= tmp3.c
        • if tmp4 is D
          • tmp5 ??= tmp4.d
            • if tmp5 == true: Goto case 2

or it can optimize and recognize shared patterns and just do

  • if ?tmp5 (tmp5 was set)
    • if tmp5 == true: Goto case 2

or even inline it in the first sequence after if tmp5 == false: Goto case 1, as if tmp5 == true: Goto case 2 (or even more efficient if it knows tmp5 is definitely a bool).

That's all optimizations, the point is that it never reads the same getter twice, based on "access path" (selector chain), as @munificent writes. Only access selectors, meaning getters, but treating operator[] as a parameterized getter, optionally null-aware access. Maybe allow the ! operator (but will it throw or will it refute the pattern, or does that depend on whether its' a matcher or a binder?)

I'm not sure I want to cache method calls, because that requires also determining if it's "the same" arguments. That's probably an identity check (only way to be sure it really behaves the same), which breaks with patterns otherwise using == checks. I'd even go as far as saying "no method calls in patterns!", if you need to call a method, put it in a guard, like

  case C(c:D(d:var foo)) if (foo.contains("bar)): ...

I think that's something we can meaningfully specify, and can implement fairly efficiently (and then optimize the crap out of),
and it allows the specification to assume exhaustiveness where possible.

@munificent
Copy link
Member Author

Do we actually have to support getters? Maybe we could limit ourselves to destructurring only "primitive" properties: meaning it must to be a field.

We could, but I think it is so so useful to be able to call any getter in a named record destructuring. It means that once we add this syntax, the entire universe of existing Dart getters can now be accessed through it. You could immediately be able to write code like:

for (var (key:, value:) in someMap.entries) {
  print('$key: $value');
}

var (hour:, minute:, second:) = someDateTime;
print('$hour:$minute:$second');

switch (someUri) {
  case (scheme: 'http', path:): print('HTTP GET $path');
  case (scheme: 'https', path:): print('HTTPS GET $path');
  case (scheme: 'ftp', path:): print('FTP $path');
}

Etc. This to me is a huge part of the value proposition of the feature.

Maybe allow the ! operator (but will it throw or will it refute the pattern, or does that depend on whether its' a matcher or a binder?)

I don't think there are currently any patterns that would require executing a !. We've talked about null-check and null-assertion patterns but they aren't in the proposal (yet). A null-check pattern wouldn't use ! and would just check for null, which isn't a method call. A null-assertion pattern would use !, but if that fails, it throws an exception. It doesn't refute, so no later cases would need to be checked.

I'm not sure I want to cache method calls, because that requires also determining if it's "the same" arguments. That's probably an identity check (only way to be sure it really behaves the same),

I believe it's both safe and important to support this. I think the only method calls needed are [] with constant integer indexes for destructuring elements in list patterns and ==() for checking against literal and constant values in those patterns. For the latter, the value you are checking against is a constant, so I think it's reasonable to cache the result.

Yes, a particularly weird user-defined class could have a const constructor and a == method that returns different results when called repeatedly on the same value but... don't do that. :)

If that's really a problem, we could keep the same restriction as switch has today and require the values in literal and constant patterns to have primitive equality.

@eernstg
Copy link
Member

eernstg commented Jun 28, 2022

@lrhn wrote:

The one potential feature that gets close to non-virtual fields is stable getters,
#1518. Instead of special casing fields, it special-cases some of the
getters. Not a feature I'm particularly fond of, I think it's too specialized and
requires opt-in from people who are not in a position to know whether it's necessary,
but it's better than breaking getter/field symmetry.

I'd like to make sure that the stable getters feature is understood the way it was intended, so I have to add some comments to this characterization. I hope I don't sound too grumpy, but a tiny bit of grump seems to be OK. ;-)

The idea is that being immutable is a fundamental and API-worthy part of the declaration of a getter (any getter, be it an implicitly induced getter that comes with an instance variable declaration, or a getter declared with get). Just like the declared return type of a getter, the declared level of immutability of a getter is crucial for developers who wish to call it. It is not just an implementation detail.

That's the reason why I have to respond to this:

it's too specialized

So 'stable getters' makes it a language property. It can be detected by human beings and compilers alike, and both can use this information when they use the getter.

In particular, when a getter is stable, it is safe to evaluate it several times, and reason about the code based on the assumption that it returns the same result every time it's called on the same receiver (or it throws, but it never returns different results).

This is obviously crucial for the correctness of a large percentage of the source code that we're all reading and/or writing every single day. It just happens to be a property that we need, but currently can't know. We can only assume that we don't have to write tricky code to handle the situation where the value actually changed in the middle of an algorithm, and then we don't generally do that, and then it generally works, because lots of getters are stable. In practice.

Just like null safety, this is a property which can be made precise and sound. 'Stable getters' is a proposal for doing just that. Just like null safety, we will be able to improve on the correctness of our software when we know which cases are trivial ("this is a stable getter, so I can just go ahead and assume that it won't change while I'm using it"), and then we can put extra effort into the remaining ones ("this getter is not stable, so I have to check it 5 times during the execution of this algorithm, and I have to do some tricky recovery work if it actually changes").

Just like null safety, a major point is that lots of references aren't null. Similarly, lots of getters are actually stable. So we could just assume that null never occurs, and the results returned by a getter will never change.

But null safety was introduced because we assumed that it's better to have a precise and sound analysis, and then handle the nulls that we may actually encounter explicitly and exhaustively. This brings down the number of bugs.

So, returning to one claim:

it special-cases some of the getters

That's not a fair description. It equips every single getter with an API/type-like level property which is: This getter is stable, or this getter is not stable. No special cases, only a property that we will strictly (soundly) enforce, if the given declaration says so. It's just like num get g declares that g (and anything that might override it) is guaranteed to return a value of type num: Developers need to know this, and they can rely on it at call sites, so it's an API-level property, and it's useful in the quest to achieve software correctness.

[The stable getters feature] requires opt-in from people who are not in a position to know whether it's necessary

It is an integral part of the design of an abstraction (like an abstract class, or a concrete one for that matter, because they can also have any number of subtypes) whether any given getter in the interface is stable, because

  • It is inherently safe to assume that the value of a stable getter is the same each time it is invoked on the same receiver.
  • Any getter which is not known to be stable needs special attention, because the (very common) assumption that it won't change "in the middle of doing something" is unsafe.

In other words, stable getters vs. non-stable getters are similar to non-nullable expressions vs. expressions that may yield null: With a stable getter, we don't have to worry about changes to the value of the getter at arbitrary points in time, and with the non-nullable expression we don't have to worry about null. With a getter that isn't stable, and with a nullable expression, we need to write extra code that handles the changes at arbitrary times and the value null, just in case they occur.

Saying that nobody would know whether a given getter should be designed to be stable is similar to saying that nobody knows which return type to use with a getter: Some subtypes might need to return something else, so we had better use the return type dynamic for all getters, right? Similarly, some subtypes might want to return different things for this getter each time it is called, so we shouldn't make it stable?

I believe that it is good for the software engineering properties (like readability, maintainability, correctness, and more) that we declare types as a matter of API design, not as an implementation detail that unfortunately leaked into the API.

Similarly, immutability is an appropriate API design property that affects all clients of the given declaration, it is not an implementation detail.

Having stable getters is of course a building block for other notions of immutability: We could include strict rules such that no stable getter can have side effects. (I do think that we shouldn't do that, because we'd want to allow caching, but I also specified that developers cannot rely on having a stable getter executed at run time if it was already executed on the same receiver, because it's OK for the compiler to cache it). Immutability for objects in total could be expressed by requiring that every getter is stable.

On top of this, it's worth noting that a stable getter is known by the compiler to be stable, which means that stable expressions can be promoted (if a is stable and b is a stable getter in the interface of the type of a then a.b is stable, and so on). But the crucial part is to make immutability an API-level property, and promotion is just one of the benefits.

[Stable getters is the one] feature that gets close to non-virtual fields

A non-virtual field wouldn't actually suffice:

A? anA;

class A {
  int? i = 1; // Let's say this one is non-virtual, so the compiler knows that it isn't overridden.
  A() {
    anA = this;
  }
}

void innocentFunction() => anA?.i = null;

void main() {
  var a = A();
  if (a.i != null) {
    innocentFunction();
    print(a.i!.isEven); // Throws, so the compiler _must_ require a null-aware construct like `!` or `?.`.
  }
}

@munificent
Copy link
Member Author

munificent commented Jun 28, 2022

I'd like to make sure that the stable getters feature is understood the way it was intended, so I have to add some comments to this characterization. I hope I don't sound too grumpy, but a tiny bit of grump seems to be OK. ;-)

I think you are at an acceptable level of grump-ness. :D

For what it's worth, I am personally interested in having field/getter immutability visible as a static property of the language specifically for the reason you suggest: because it will allow us to promote some fields. And I think stable getters or a similar feature could be a way to enable that without breaking field/getter symmetry (a property I consider very important in the language). It's something I'd like to explore more if we get time to work on capability modifiers for members.

At the same time, I don't think this is the right feature to rely on for pattern matching. Consider:

class BoolBox {
  bool b;
}

test(BoolBox box) {
  switch (box) {
    case Box(b: true): ...
    case Box(b: false): ...
  }
}

I want users to be able to write code like this and I want the language to treat this set of cases as exhaustive even though b isn't stable and isn't even final. I think the mental model that users have for switch cases is that they all operate on the same value that is sort of frozen in time at the moment that the switch statement is entered. We could restrict exhaustiveness checking to only objects that are statically known to be immutable in order to make that model trivially true, but I think that's too restrictive.

Here's maybe a more compelling example. At some point, I'd like to support ... in list patterns to match the (potentially empty) remainder of the elements. That means you could write something like:

test(List<bool> bools) {
  switch (bools) {
    case []: ..
    case [true, ...rest]: ...
    case [false, ...rest]: ...
  }
}

Intuitively, this switch is exhaustive. There are either no elements, or the first element must be true or false followed by any number of other elements. But if exhaustiveness only works for stable immutable properties, then there is no guarantee that the list passed to this returns the same value on element accesses or that the length doesn't change.

That's why I suggest caching the results of every member called on the matched value. This preserves the mental model users have by having all cases operate on the same view of the matched object. We incrementally build an immutable snapshot of just the properties of the object that are actually seen by the cases. This makes cases more efficient (since we don't re-evaluate the same calls multiple times), makes exhaustiveness work, and allows matching over all sorts of mutable objects.

@eernstg
Copy link
Member

eernstg commented Jun 30, 2022

@munificent wrote:

I want the language to treat this set of cases as exhaustive even though b isn't stable and isn't even final.

As you mention, we'd ensure soundness by specifying that the entire matching process for one switch statement/expression would rely on evaluating each "path" (here: box and box.b) at most once, and caching the results such that they can be used to decide on match/no-match in later cases. So there's no soundness problem.

However, the developer who is reading or writing this kind of code might need to be aware of the fact that even in the body of the case for BoolBox(b: false), the expression box.b could evaluate to true, and hence it might be incorrect (according to the "business logic" of the given application) to do any particular thing. So they'd need to do things like this, because it would be a bug (even if only once in a blue moon) to ignore this situation:

class BoolBox {
  bool b;
}

test(BoolBox box) {
  switch (box) {
    case Box(b: true): ...
    case Box(b: false):
      if (!box.b) {
        ... // Normal code for the case where `b` is false.
      } else {
        ... // Error recovery: `x` was modified during pattern matching!
      }
  }
}

I think this problem can be at an acceptable level (for most non-critical pieces of software we can just say "that won't happen"), but I still find it useful to have the language mechanism of strict, language enforced immutability in order to eliminate the need to deal with those "concurrent updates".

Then, in widely used code where some applications could be life-and-death critical, we could have lints on this case, such that the developers of that kind of code could maintain this particular kind of correctness by only matching on stable getters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
patterns Issues related to pattern matching.
Projects
None yet
Development

No branches or pull requests

4 participants