[Prototype] Better type inference for lambdas (e.g., as used in folds) #9076

smarter · 2020-05-29T15:19:17Z

No version of Scala has ever been able to infer the following:

val xs = List(1, 2, 3)
xs.foldLeft(Nil)((acc, x) => x :: acc)

To understand why, let's have a look at the signature of List[A]#foldLeft:

def foldLeft[B](z: B)(op: (B, A) => B): B

When typing the foldLeft call in the previous expression, the compiler
starts by creating an unconstrained type variable ?B, the challenge is then to
successfully type the expression and instantiate ?B := List[Int].

Typing the first argument is easy: Nil is a valid argument if we add a
constraint:

?B >: Nil.type

Typing the second argument is where we get stuck normally: we need to choose a type
for the binding acc, but ?B is a type variable and not a fully-defined type,
this is solved by instantiating ?B to one of its bound, but no matter what
bound we choose, the rest of the expression won't typecheck:

if we instantiate ?B := Nil.type, then the body of the lambda x :: acc is
not a subtype of the expected result type ?B.
if we instantiate ?B := Any, then the body of the lambda does not
typecheck since there is no method :: on Any.

But... what if we just let acc have type ?B without instantiating it first?
This is not completely meaningless: ?B behaves like an abstract type except
that its bounds might be refined as we typecheck code, as long as narrowing
holds (#), this should be safe.

The remaining challenge then is to type the body of the lambda x :: acc which
desugars to acc.::(x), this won't typecheck as-is since :: is not defined on
the upper bound of ?B, so we need to refine this upper bound somehow, the
heuristic we use is:

Look for :: in the lower bound of ?B >: Nil.type, Nil does have such a member!
Find the class where this member is defined: it's List
If the class has type parameters, create one fresh type variable
for each parameter slot, the resulting type is our new upper bound,
so here we get ?B <: List[?X] where ?X is a fresh type variable.

We can then proceed to type the body of the lambda:

acc.::(x)

This first creates a type variable ?B2 >: ?X, because :: has type:

def :: [B >: A](elem: B): List[B]

Because the result type of the lambda is ?B, we get an additional constraint:

List[?B2] <: ?B

We know that ?B <: List[?X] so this means that ?B2 <: ?X, but we
also know that B2 >: ?X, so we can instantiate ?B2 := ?X and ?B := List[?X].

Finally, because x has type Int we have ?B2 >: Int which simplifies to:

?X >: Int

Therefore, the result type of the foldLeft is List[?X] where ?X >: Int,
because List is covariant, we instantiate ?X := Int to get the most precise
result type List[Int].

Note that the the use of fresh type variables in 3) was crucial here: if we had
instead used wildcards and added an upper bound ?B <: List[_], then we would
have been able to type acc.::(x), but the result would have type List[Any],
meaning the result of the foldLeft call would be List[Any] when we wanted
List[Int].

Status

All the compiler tests pass, including bootstrapping, but one of third of the
community build breaks currently.

Even if this PR never makes it in, it has been very useful for stress-testing
our constraint solver and lead to several PRs I made over the past few days:
#9012, #9019, #9030, #9053, and there's some more changes in the early commits
of this PR that would be worth getting in by themselves.

Open questions

Is this actually sound?
Are there other compelling examples where this useful, besides folds?
Is the performance impact of this stuff acceptable?
How do we deal with overloads?
How do we deal with overrides?
How does this interact with implicit conversions?
How does this interact with implicit search in general, we might find one
implicit at a given point, but then as we add more constraints to the same
type variable, the same implicit search could find a different result. How big
of a problem is that?

(#): narrowing in fact does not hold when @uncheckedVariance is used, which is
why we special-case it in constrainSelectionQualifier

In `mergeIfSuper`, to simplify `tp1 | tp2`, we first check if `tp2` can be made a subtype of `tp1`. If so we could just return `tp1` but this isn't what we did before this commit: at that point we checked if `tp1` could be made a subtype of `tp2` and in that case prefered returning `tp2` to `tp1`. I haven't been able to find a reason for this (the comment says "keep existing type if possible" which I don't understand). On the other hand, I've found cases where this logic broke type inference because the second subtype check inferred extraneous constraints (see added testcase). So this commit simply removes this logic, it does the same for `mergeIfSub` which contains similar logic to simplify `tp1 & tp2`, though this one is less problematic since it always runs with frozen constraints.

This is just a workaround, awaiting further discussion in the linked issue.

This is the minimum extra check needed to avoid loops after the inference changes in this PR and is incomplete, see test case.

This is needed for type inference with a type variable selection (added in the next commit of this PR) to work properly, but it also affects master. Because dependencies can now span multiple TypeLambdas, `addToConstraint` had to be adjusted to properly propagate all constraints. i5735 doesn't compile anymore but I don't think it ever really made sense.

No version of Scala has ever been able to infer the following: val xs = List(1, 2, 3) xs.foldLeft(Nil)((acc, x) => x :: acc) To understand why, let's have a look at the signature of `List[A]#foldLeft`: def foldLeft[B](z: B)(op: (B, A) => B): B When typing the foldLeft call in the previous expression, the compiler starts by creating an unconstrained type variable ?B, the challenge is then to successfully type the expression and instantiate `?B := List[Int]`. Typing the first argument is easy: `Nil` is a valid argument if we add a constraint: ?B >: Nil.type Typing the second argument is where we get stuck normally: we need to choose a type for the binding `acc`, but `?B` is a type variable and not a fully-defined type, this is solved by instantiating `?B` to one of its bound, but no matter what bound we choose, the rest of the expression won't typecheck: - if we instantiate `?B := Nil.type`, then the body of the lambda `x :: acc` is not a subtype of the expected result type `?B`. - if we instantiate `?B := Any`, then the body of the lambda does not typecheck since there is no method `::` on `Any`. But... what if we just let `acc` have type `?B` without instantiating it first? This is not completely meaningless: `?B` behaves like an abstract type except that its bounds might be refined as we typecheck code, as long as narrowing holds (#), this should be safe. The remaining challenge then is to type the body of the lambda `x :: acc` which desugars to `acc.::(x)`, this won't typecheck as-is since `::` is not defined on the upper bound of `?B`, so we need to refine this upper bound somehow, the heuristic we use is: 1) Look for `::` in the lower bound of `?B >: Nil.type`, Nil does have such a member! 2) Find the class where this member is defined: it's `List` 3) If the class has type parameters, create one fresh type variable for each parameter slot, the resulting type is our new upper bound, so here we get `?B <: List[?X]` where `?X` is a fresh type variable. We can then proceed to type the body of the lambda: acc.::(x) This first creates a type variable `?B2 >: ?X`, because `::` has type: def :: [B >: A](elem: B): List[B] Because the result type of the lambda is `?B`, we get an additional constraint: List[?B2] <: ?B We know that `?B <: List[?X]` so this means that `?B2 <: ?X`, but we also know that `B2 >: ?X`, so we can instantiate `?B2 := ?X` and `?B := List[?X]`. Finally, because `x` has type Int we have `?B2 >: Int` which simplifies to: ?X >: Int Therefore, the result type of the foldLeft is `List[?X]` where `?X >: Int`, because `List` is covariant, we instantiate `?X := Int` to get the most precise result type `List[Int]`. Note that the the use of fresh type variables in 3) was crucial here: if we had instead used wildcards and added an upper bound `?B <: List[_]`, then we would have been able to type `acc.::(x)`, but the result would have type `List[Any]`, meaning the result of the foldLeft call would be `List[Any]` when we wanted `List[Int]`. \# Status All the compiler tests pass, including bootstrapping, but one of third of the community build breaks currently. Even if this PR never makes it in, it has been very useful for stress-testing our constraint solver and lead to several PRs I made over the past few days: of this PR that would be worth getting in by themselves. \# Open questions - Is this actually sound? - Are there other compelling examples where this useful, besides folds? - Is the performance impact of this stuff acceptable? - How do we deal with overloads? - How do we deal with overrides? - How does this interact with implicit conversions? - How does this interact with implicit search in general, we might find one implicit at a given point, but then as we add more constraints to the same type variable, the same implicit search could find a different result. How big of a problem is that? (#): narrowing in fact does not hold when `@uncheckedVariance` is used, which is why we special-case it in `typedSelect` in this commit.

smarter · 2020-05-29T15:21:34Z

@Blaisorblade I make some claims above that what I'm doing is sound as long as narrowing holds, do you think that makes sense? (And do we in fact expect narrowing to hold in Scala?)

Blaisorblade · 2020-05-30T16:02:34Z

@Blaisorblade I make some claims above that what I'm doing is sound as long as narrowing holds, do you think that makes sense? (And do we in fact expect narrowing to hold in Scala?)

(EDIT) TLDR: I find this cool and let me encourage you! The concerns shouldn't affect soundness, but with lots of question marks.

Just to be sure: I guess even in Dotty bugs in inference can't affect soundness if you fully typecheck the result. IIRC, "fully typecheck" requires at least "-Ycheck" and PostTyper and RefChecks after type inference is done.
The claims make sense to me, type-soundness-wise; but it seems inference, and especially implicit inference, could violate their analogue of narrowing without lots of care.
Scala violates both transitivity and narrowing, but this is an inference problem, so it doesn't threaten soundness but could cause bugs. If x : { A >: L <: U }, then x.A <: U without L <: U, so for instance you cannot narrow T <: x.A to T <: L. The same happens with X :> L <: U. Could that cause a -Ycheck failure? But narrowing to T <: x.A & L should still work, so maybe you can "remember" the old constraint somehow.

Otherwise, narrowing holds without restrictions in both my model and Nada's thesis; only "good bounds" can be broken by narrowing.

I expect it should hold in Scala (ignoring transitivity) because it corresponds to Liskov's substitution principle — but if it doesn't, you should get a soundness bug (at least in a calculus and probably in Scala) as follows:

If some judgment Γ, x : T1 ⊢ e: T breaks when refining the type of x from T1 to a subtype T2 (that is, T2 <: T1), then take (x : T1) ⇒ e, apply it to an argument e2 of type T2, and reduce. Typically, if Γ, x : T2 ⊢ e : T fails, typing Γ ⊢ e[x := e2] : T to fail as well (but it's not guaranteed).

Other inference problems might arise if inference is too eager and takes decisions when T is abstract that become wrong when T is concrete.
For instance, if you solve implicits eagerly, or the constraint solver solves other variables eagerly and throws out solutions early — which can maybe happen because constraint solving is incomplete?

smarter · 2020-05-30T16:57:51Z

Just to be sure: I guess even in Dotty bugs in inference can't affect soundness if you fully typecheck the result. IIRC, "fully typecheck" requires at least "-Ycheck" and PostTyper and RefChecks after type inference is done.

In fact, I found a case where Ycheck is slacking on the job: e15c580#diff-7c63b7bfffa9340fb48ac9b61fa56beeR601, but in principle yes. However, anything that is only caught by Ycheck is still a soundness bug since we don't run it by default.

The claims make sense to me, type-soundness-wise; but it seems inference, and especially implicit inference, could violate their analogue of narrowing without lots of care.

Yes that's inevitable I think, I think the best we can do is tune our heuristics (like the one we use to decide which type variables to instantiate before doing an implicit search) to keep existing code behaving more or less the same.

you cannot narrow T <: x.A to T <: L ... but narrowing to T <: x.A & L should still work

Right, I think that's fine since the constraint solver itself will never narrow an upper bound constraint x.A to just L since it also relies on the subtype checker to know whether x.A <:< L.

Blaisorblade · 2020-05-30T17:45:44Z

However, anything that is only caught by Ycheck is still a soundness bug since we don't run it by default.

Agreed; I was only hoping that those bugs would be easier to catch.

Right, I think that's fine since the constraint solver itself will never narrow an upper bound constraint x.A to just L since it also relies on the subtype checker to know whether x.A <:< L.

But it could narrow X to L if X >: L <: U, right? But I guess transitivity should work there.

SethTisue · 2020-11-12T05:27:18Z

Note that it matters whether you use Nil or List.empty. With List.empty, in Scala 3.0.0-M1 it compiles, but infers a too-general type:

scala> List(1, 2, 3).foldLeft(List.empty)((x, y) => y :: x)                                                             
val res0: List[Any] = List(3, 2, 1)

I'm pleased to see that this PR improves this example too:

scala> List(1, 2, 3).foldLeft(List.empty)((x, y) => y :: x)
val res0: List[Int] = List(3, 2, 1)

anatoliykmetyuk · 2021-07-27T14:12:35Z

There was no activity on this one for a long while, so let's close it.

smarter added 6 commits May 28, 2020 18:59

Avoid a pickling diff in this PR

586fbcf

This is just a workaround, awaiting further discussion in the linked issue.

Try to disallow illegal match types in F-bounds

06b978a

This is the minimum extra check needed to avoid loops after the inference changes in this PR and is incomplete, see test case.

Fix constraint merging exception in scalatest

23e6d6c

smarter force-pushed the foldleft-inf-gold branch from 17fb4f9 to e15c580 Compare May 29, 2020 15:19

smarter marked this pull request as draft May 29, 2020 15:22

OlivierBlanvillain assigned smarter Jun 12, 2020

smarter mentioned this pull request Sep 9, 2020

Precise apply for enum companion objects #9728

Closed

smarter mentioned this pull request Mar 30, 2021

Type inference of lambdas not working #11954

Open

smarter mentioned this pull request May 9, 2021

Dig deeper on avoidance #8900

Closed

anatoliykmetyuk closed this Jul 27, 2021

smarter mentioned this pull request Mar 8, 2022

Type inference for lambda parameters should be able to take the body of the lambda into account #17771

Closed

smarter mentioned this pull request Nov 8, 2023

Improve type inference for functions like fold #18780

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Prototype] Better type inference for lambdas (e.g., as used in folds) #9076

[Prototype] Better type inference for lambdas (e.g., as used in folds) #9076

smarter commented May 29, 2020 •

edited

Loading

smarter commented May 29, 2020

Blaisorblade commented May 30, 2020 •

edited

Loading

smarter commented May 30, 2020

Blaisorblade commented May 30, 2020

SethTisue commented Nov 12, 2020 •

edited

Loading

anatoliykmetyuk commented Jul 27, 2021

[Prototype] Better type inference for lambdas (e.g., as used in folds) #9076

[Prototype] Better type inference for lambdas (e.g., as used in folds) #9076

Conversation

smarter commented May 29, 2020 • edited Loading

Status

Open questions

smarter commented May 29, 2020

Blaisorblade commented May 30, 2020 • edited Loading

smarter commented May 30, 2020

Blaisorblade commented May 30, 2020

SethTisue commented Nov 12, 2020 • edited Loading

anatoliykmetyuk commented Jul 27, 2021

smarter commented May 29, 2020 •

edited

Loading

Blaisorblade commented May 30, 2020 •

edited

Loading

SethTisue commented Nov 12, 2020 •

edited

Loading