Skip to content

[Prototype] Better type inference for lambdas (e.g., as used in folds) #9076

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

smarter
Copy link
Member

@smarter smarter commented May 29, 2020

No version of Scala has ever been able to infer the following:

val xs = List(1, 2, 3)
xs.foldLeft(Nil)((acc, x) => x :: acc)

To understand why, let's have a look at the signature of List[A]#foldLeft:

def foldLeft[B](z: B)(op: (B, A) => B): B

When typing the foldLeft call in the previous expression, the compiler
starts by creating an unconstrained type variable ?B, the challenge is then to
successfully type the expression and instantiate ?B := List[Int].

Typing the first argument is easy: Nil is a valid argument if we add a
constraint:

?B >: Nil.type

Typing the second argument is where we get stuck normally: we need to choose a type
for the binding acc, but ?B is a type variable and not a fully-defined type,
this is solved by instantiating ?B to one of its bound, but no matter what
bound we choose, the rest of the expression won't typecheck:

  • if we instantiate ?B := Nil.type, then the body of the lambda x :: acc is
    not a subtype of the expected result type ?B.
  • if we instantiate ?B := Any, then the body of the lambda does not
    typecheck since there is no method :: on Any.

But... what if we just let acc have type ?B without instantiating it first?
This is not completely meaningless: ?B behaves like an abstract type except
that its bounds might be refined as we typecheck code, as long as narrowing
holds (#), this should be safe.

The remaining challenge then is to type the body of the lambda x :: acc which
desugars to acc.::(x), this won't typecheck as-is since :: is not defined on
the upper bound of ?B, so we need to refine this upper bound somehow, the
heuristic we use is:

  1. Look for :: in the lower bound of ?B >: Nil.type, Nil does have such a member!
  2. Find the class where this member is defined: it's List
  3. If the class has type parameters, create one fresh type variable
    for each parameter slot, the resulting type is our new upper bound,
    so here we get ?B <: List[?X] where ?X is a fresh type variable.

We can then proceed to type the body of the lambda:

acc.::(x)

This first creates a type variable ?B2 >: ?X, because :: has type:

def :: [B >: A](elem: B): List[B]

Because the result type of the lambda is ?B, we get an additional constraint:

List[?B2] <: ?B

We know that ?B <: List[?X] so this means that ?B2 <: ?X, but we
also know that B2 >: ?X, so we can instantiate ?B2 := ?X and ?B := List[?X].

Finally, because x has type Int we have ?B2 >: Int which simplifies to:

?X >: Int

Therefore, the result type of the foldLeft is List[?X] where ?X >: Int,
because List is covariant, we instantiate ?X := Int to get the most precise
result type List[Int].

Note that the the use of fresh type variables in 3) was crucial here: if we had
instead used wildcards and added an upper bound ?B <: List[_], then we would
have been able to type acc.::(x), but the result would have type List[Any],
meaning the result of the foldLeft call would be List[Any] when we wanted
List[Int].

Status

All the compiler tests pass, including bootstrapping, but one of third of the
community build breaks currently.

Even if this PR never makes it in, it has been very useful for stress-testing
our constraint solver and lead to several PRs I made over the past few days:
#9012, #9019, #9030, #9053, and there's some more changes in the early commits
of this PR that would be worth getting in by themselves.

Open questions

  • Is this actually sound?
  • Are there other compelling examples where this useful, besides folds?
  • Is the performance impact of this stuff acceptable?
  • How do we deal with overloads?
  • How do we deal with overrides?
  • How does this interact with implicit conversions?
  • How does this interact with implicit search in general, we might find one
    implicit at a given point, but then as we add more constraints to the same
    type variable, the same implicit search could find a different result. How big
    of a problem is that?

(#): narrowing in fact does not hold when @uncheckedVariance is used, which is
why we special-case it in constrainSelectionQualifier

smarter added 6 commits May 28, 2020 18:59
In `mergeIfSuper`, to simplify `tp1 | tp2`, we first check if `tp2` can
be made a subtype of `tp1`. If so we could just return `tp1` but this
isn't what we did before this commit: at that point we checked if `tp1`
could be made a subtype of `tp2` and in that case prefered returning
`tp2` to `tp1`. I haven't been able to find a reason for this (the
comment says "keep existing type if possible" which I don't understand).

On the other hand, I've found cases where this logic broke type
inference because the second subtype check inferred extraneous
constraints (see added testcase). So this commit simply removes this
logic, it does the same for `mergeIfSub` which contains similar logic to
simplify `tp1 & tp2`, though this one is less problematic since it
always runs with frozen constraints.
This is just a workaround, awaiting further discussion in the linked issue.
This is the minimum extra check needed to avoid loops after the inference
changes in this PR and is incomplete, see test case.
This is needed for type inference with a type variable selection (added
in the next commit of this PR) to work properly, but it also affects master.

Because dependencies can now span multiple TypeLambdas,
`addToConstraint` had to be adjusted to properly propagate all
constraints.

i5735 doesn't compile anymore but I don't think it ever really made
sense.
No version of Scala has ever been able to infer the following:

  val xs = List(1, 2, 3)
  xs.foldLeft(Nil)((acc, x) => x :: acc)

To understand why, let's have a look at the signature of `List[A]#foldLeft`:

  def foldLeft[B](z: B)(op: (B, A) => B): B

When typing the foldLeft call in the previous expression, the compiler
starts by creating an unconstrained type variable ?B, the challenge is then to
successfully type the expression and instantiate `?B := List[Int]`.

Typing the first argument is easy: `Nil` is a valid argument if we add a
constraint:

  ?B >: Nil.type

Typing the second argument is where we get stuck normally: we need to choose a type
for the binding `acc`, but `?B` is a type variable and not a fully-defined type,
this is solved by instantiating `?B` to one of its bound, but no matter what
bound we choose, the rest of the expression won't typecheck:
- if we instantiate `?B := Nil.type`, then the body of the lambda `x :: acc` is
  not a subtype of the expected result type `?B`.
- if we instantiate `?B := Any`, then the body of the lambda does not
  typecheck since there is no method `::` on `Any`.

But... what if we just let `acc` have type `?B` without instantiating it first?
This is not completely meaningless: `?B` behaves like an abstract type except
that its bounds might be refined as we typecheck code, as long as narrowing
holds (#), this should be safe.

The remaining challenge then is to type the body of the lambda `x :: acc` which
desugars to `acc.::(x)`, this won't typecheck as-is since `::` is not defined on
the upper bound of `?B`, so we need to refine this upper bound somehow, the
heuristic we use is:

1) Look for `::` in the lower bound of `?B >: Nil.type`, Nil does have such a member!
2) Find the class where this member is defined: it's `List`
3) If the class has type parameters, create one fresh type variable
   for each parameter slot, the resulting type is our new upper bound,
   so here we get `?B <: List[?X]` where `?X` is a fresh type variable.

We can then proceed to type the body of the lambda:

  acc.::(x)

This first creates a type variable `?B2 >: ?X`, because `::` has type:

  def :: [B >: A](elem: B): List[B]

Because the result type of the lambda is `?B`, we get an additional constraint:

  List[?B2] <: ?B

We know that `?B <: List[?X]` so this means that `?B2 <: ?X`, but we
also know that `B2 >: ?X`, so we can instantiate `?B2 := ?X` and `?B := List[?X]`.

Finally,  because `x` has type Int we have `?B2 >: Int` which simplifies to:

  ?X >: Int

Therefore, the result type of the foldLeft is `List[?X]` where `?X >: Int`,
because `List` is covariant, we instantiate `?X := Int` to get the most precise
result type `List[Int]`.

Note that the the use of fresh type variables in 3) was crucial here: if we had
instead used wildcards and added an upper bound `?B <: List[_]`, then we would
have been able to type `acc.::(x)`, but the result would have type `List[Any]`,
meaning the result of the foldLeft call would be `List[Any]` when we wanted
`List[Int]`.

\# Status

All the compiler tests pass, including bootstrapping, but one of third of the
community build breaks currently.

Even if this PR never makes it in, it has been very useful for stress-testing
our constraint solver and lead to several PRs I made over the past few days:
of this PR that would be worth getting in by themselves.

\# Open questions

- Is this actually sound?
- Are there other compelling examples where this useful, besides folds?
- Is the performance impact of this stuff acceptable?
- How do we deal with overloads?
- How do we deal with overrides?
- How does this interact with implicit conversions?
- How does this interact with implicit search in general, we might find one
  implicit at a given point, but then as we add more constraints to the same
  type variable, the same implicit search could find a different result. How big
  of a problem is that?

(#): narrowing in fact does not hold when `@uncheckedVariance` is used, which is
     why we special-case it in `typedSelect` in this commit.
@smarter smarter force-pushed the foldleft-inf-gold branch from 17fb4f9 to e15c580 Compare May 29, 2020 15:19
@smarter
Copy link
Member Author

smarter commented May 29, 2020

@Blaisorblade I make some claims above that what I'm doing is sound as long as narrowing holds, do you think that makes sense? (And do we in fact expect narrowing to hold in Scala?)

@smarter smarter marked this pull request as draft May 29, 2020 15:22
@Blaisorblade
Copy link
Contributor

Blaisorblade commented May 30, 2020

@Blaisorblade I make some claims above that what I'm doing is sound as long as narrowing holds, do you think that makes sense? (And do we in fact expect narrowing to hold in Scala?)

(EDIT) TLDR: I find this cool and let me encourage you! The concerns shouldn't affect soundness, but with lots of question marks.

  • Just to be sure: I guess even in Dotty bugs in inference can't affect soundness if you fully typecheck the result. IIRC, "fully typecheck" requires at least "-Ycheck" and PostTyper and RefChecks after type inference is done.

  • The claims make sense to me, type-soundness-wise; but it seems inference, and especially implicit inference, could violate their analogue of narrowing without lots of care.

  • Scala violates both transitivity and narrowing, but this is an inference problem, so it doesn't threaten soundness but could cause bugs. If x : { A >: L <: U }, then x.A <: U without L <: U, so for instance you cannot narrow T <: x.A to T <: L. The same happens with X :> L <: U. Could that cause a -Ycheck failure? But narrowing to T <: x.A & L should still work, so maybe you can "remember" the old constraint somehow.

Otherwise, narrowing holds without restrictions in both my model and Nada's thesis; only "good bounds" can be broken by narrowing.

I expect it should hold in Scala (ignoring transitivity) because it corresponds to Liskov's substitution principle — but if it doesn't, you should get a soundness bug (at least in a calculus and probably in Scala) as follows:

If some judgment Γ, x : T1 ⊢ e: T breaks when refining the type of x from T1 to a subtype T2 (that is, T2 <: T1), then take (x : T1) ⇒ e, apply it to an argument e2 of type T2, and reduce. Typically, if Γ, x : T2 ⊢ e : T fails, typing Γ ⊢ e[x := e2] : T to fail as well (but it's not guaranteed).

  • Other inference problems might arise if inference is too eager and takes decisions when T is abstract that become wrong when T is concrete.
    For instance, if you solve implicits eagerly, or the constraint solver solves other variables eagerly and throws out solutions early — which can maybe happen because constraint solving is incomplete?

@smarter
Copy link
Member Author

smarter commented May 30, 2020

Just to be sure: I guess even in Dotty bugs in inference can't affect soundness if you fully typecheck the result. IIRC, "fully typecheck" requires at least "-Ycheck" and PostTyper and RefChecks after type inference is done.

In fact, I found a case where Ycheck is slacking on the job: e15c580#diff-7c63b7bfffa9340fb48ac9b61fa56beeR601, but in principle yes. However, anything that is only caught by Ycheck is still a soundness bug since we don't run it by default.

The claims make sense to me, type-soundness-wise; but it seems inference, and especially implicit inference, could violate their analogue of narrowing without lots of care.

Yes that's inevitable I think, I think the best we can do is tune our heuristics (like the one we use to decide which type variables to instantiate before doing an implicit search) to keep existing code behaving more or less the same.

you cannot narrow T <: x.A to T <: L ... but narrowing to T <: x.A & L should still work

Right, I think that's fine since the constraint solver itself will never narrow an upper bound constraint x.A to just L since it also relies on the subtype checker to know whether x.A <:< L.

@Blaisorblade
Copy link
Contributor

However, anything that is only caught by Ycheck is still a soundness bug since we don't run it by default.

Agreed; I was only hoping that those bugs would be easier to catch.

Right, I think that's fine since the constraint solver itself will never narrow an upper bound constraint x.A to just L since it also relies on the subtype checker to know whether x.A <:< L.

But it could narrow X to L if X >: L <: U, right? But I guess transitivity should work there.

@SethTisue
Copy link
Member

SethTisue commented Nov 12, 2020

Note that it matters whether you use Nil or List.empty. With List.empty, in Scala 3.0.0-M1 it compiles, but infers a too-general type:

scala> List(1, 2, 3).foldLeft(List.empty)((x, y) => y :: x)                                                             
val res0: List[Any] = List(3, 2, 1)

I'm pleased to see that this PR improves this example too:

scala> List(1, 2, 3).foldLeft(List.empty)((x, y) => y :: x)
val res0: List[Int] = List(3, 2, 1)

@anatoliykmetyuk
Copy link
Contributor

There was no activity on this one for a long while, so let's close it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants