Skip to content
This repository was archived by the owner on Feb 22, 2018. It is now read-only.

task model: inference steps for instance variables are not well-ordered #354

Closed
leafpetersen opened this issue Sep 25, 2015 · 28 comments
Closed

Comments

@leafpetersen
Copy link
Contributor

Our initializer based inference for statics and fields is structured so that inference of static variables and fields happens first, and then the information from that is propagated to instance fields. Dependencies between statics are tracked, so that transitive inference for static variables and fields works predictably and deterministically, with no inference happening between members of a cycle. Instance field inference is currently not done transitively - instance fields use the information from static inference, but not from the inference of other fields. However, inference in DDC is done on a per-library cycle basis in topological order (I believe), so instance field inference results from a previous library cycle are used for transitive inference. This means that for the following program,

a.dart: 
          import 'b.dart';
          class A {
            final a2 = new B().b2;
          }
b.dart
          class B {
            final b2 = 1;
          }
main.dart
          import "a.dart";

          test1() {
            int x = 0;
            x = new A().a2;
          }

The a2 instance field of A does get inferred to have type int, since complete inference for b.dart happens before inference for a.dart.

The task model code does not consider compilation units and library cycles - if there is not a specific dependency to force B.b2 to be inferred before A.a2, then by default A.a2 will be inferred before B.b2.

This is a problem because it makes inference dependent on the order of processing of the files, and similarly makes incremental compilation change the inference results. There are two possible fixes that seem reasonable: one is to impose dependencies to cause instance variable inference to happen in an order which matches existing DDC behavior; the other is to compute actual fine grained dependencies for the instance fields and infer them in dependency order as we do for static variables. This is a little tricky, since it interacts with override inference as well, but would probably produce the most useful results. @sigmundch has a comment discussing this in the DDC resolver.dart file.

@sigmundch, @bwilkerson any thoughts on this?

@bwilkerson
Copy link

What does DDC do if you add import 'main.dart'; to b.dart, creating a library cycle? Does it always get the order of instance fields right? (If needed, I think I could mess it up by adding the fields final a3 = 2; and final b3 = new A().a3;.)

I'm guessing the answer is no. If that's true, then the first option won't solve those problems either.

The second option would solve all of the problems and would minimize the amount of work that needs to be done in other compilation units before completing analysis of the target compilation unit.

As a minor variation, rather than split static and instance fields into two groups, we could build a single graph of all (non-local) variables and use that. It might reduce duplication in the code and shouldn't hurt performance.

@leafpetersen
Copy link
Contributor Author

Within a library cycle, instance fields are inferred independently of each other, so if you add that import no inference will be done between the instance fields. It's only across library cycles that instance fields are transitively inferred. This isn't ideal, but it's at least deterministic.

Doing the fine grained dependency is probably better. I don't think it's as simple as you describe though, because I think you need to prioritize override based inference for the fields. If you have an instance field with no type, but with a type in the superclass, you want to prefer that type to a type inferred from an initializer (since the initializer type might be too specific).

@bwilkerson
Copy link

If you have an instance field with no type, but with a type in the superclass, you want to prefer that type to a type inferred from an initializer ...

Unless the field is marked final, which probably doesn't happen very often.

But that makes sense overall. That implies that we need three phases: static variables, overridden fields, then instance fields.

@sigmundch
Copy link
Contributor

I also like the idea of going with the fine-grain inference. It seems I had a TODO about it here :), but I never got around to do it (we probably didn't have evidence that it was going to add that much more value in the applications we looked at). I agree though that we need to give priority to overrides.

@jmesserly
Copy link
Contributor

@leafpetersen are we planning to fix this before task model integration? Or is it something we could wait on?

@leafpetersen
Copy link
Contributor Author

I've been thinking about this, might take a stab at it tomorrow.

@leafpetersen
Copy link
Contributor Author

This is actually more pervasive than just with fields. Vijay sends this example:

foo.dart:

import 'lib.dart' as lib;

main() {
  lib.const1Global = 1;
}

lib.dart:

var const1Global = const {};

Nothing is imposing a dependency edge from resolving the body of main to doing inference on const1Global, so foo.dart may get fully resolved before inference happens on lib.dart.

@leafpetersen
Copy link
Contributor Author

Another example, illustrating that variable based dependencies are not sufficient:

      '/a.dart': '''
          import 'b.dart';
            void foo() {
              String x= B.f.z;
            }
      ''',
      '/b.dart': '''
        class C {
          var z = 3;
        }

        class B {
          static var f = new C();
        }
      ''',
      '/c.dart': '''
          import 'b.dart';
            void foo() {
              String x = B.f.z;
            }
    '''

This currently resolves B.f.z in a.dart to have type dynamic, but B.f.z in c.dart to have type int. The point is that we cannot resolve the body of foo in a.dart and c.dart until we have fully inferred both B and C from b.dart.

@bwilkerson
Copy link

That shouldn't be too hard. We know all of the libraries that are transitively imported and can require that instance member inference has been performed for all of them. It's a little sad, because it increases the amount of work that's required before we can resolve function bodies, but the dependencies have to be right.

@leafpetersen
Copy link
Contributor Author

Yes, I think the plan, at least for the short term, is to use the import dependency ordering. Cycles will need to be handled as a unit though, which may be a bit annoying.

@bwilkerson
Copy link

Cycles will need to be handled as a unit though ...

Cycles involving or between what?

@leafpetersen
Copy link
Contributor Author

Library cycles. If we require that instance member inference has been performed on all of our imports before we do it on ourselves, then we run into the case that we're part of a cycle. In this case, we want to resolve the right hand sides of all of the instance fields in the cycle, then do inference on all of the instance fields in the cycle.

@bwilkerson
Copy link

So to be clear, this issue has two related but independent discussions in it.

The first discussion was about the fact that instance fields can have dependencies between them in the same way that static fields can. I still believe the solution to that is what I said before:

That implies that we need three [inference] phases: static variables, overridden fields, then instance fields.

where both static variables and instance fields have dependencies between them being explicitly handled by the task model. I believe that we decided that if two or more instance fields form a cycle then we would fail to infer types for any of those fields.

I believe we also decided that if a static variable has a dependency on an instance member that we would not worry about that case and just use the declared type of the instance member.

The second discussion is about references to members being in function bodies and the fact that we don't currently have a dependency in the task model that will ensure that instance members are inferred before function bodies referencing them have been resolved.

While we could set up smaller grained dependencies (from function bodies to the members that they refer to), I think we can get by with a larger grained dependency:

We know all of the libraries that are transitively imported and can require that instance member inference has been performed for all of them.

(before resolving function bodies in the importing library).

@leafpetersen
Copy link
Contributor Author

I think these are both instances of the broader problem that we have not yet pinned down a stable static approximation of the true dependency graph. Concretely, consider this example:

lib1.dart

var a1 = 3;
class A {
  var a2 = 3;
}

lib2.dart

import 'lib1.dart';
var b1 = a1;
var b2 = new A().a2;

In the old DDC code, we used the static import graph to choose a topological ordering on the library cycles, and then used the algorithm you mention above (static variables, then fields) to do inference on a per library-cycle basis. This is stable because anything you refer to from outside of your library cycle will be fully resolved before you see it, and anything you refer to from inside of your library cycle has a well-defined order of inference.

In the current task model, we impose the (static variables, then fields) algorithm on a per unit basis. This is not stable by itself. Under this algorithm, the only ordering dependencies are that a1 must happen before a2 and b1. b2 is free to be resolved/inferred at any time, and hence it is undefined (order dependent) whether it ends up as int or as dynamic. This is not unique to fields - the instance method example is just another example of the same problem.

One solution is to impose the (static variables, then fields, then instance methods) algorithm globally. That is, require all static variables in the whole program to be inferred before moving on to fields. This is stable. For this example, we would end up with a1 and b1 having type int, and b2 having type dynamic. This is ok, and could be an interim solution, but it could lead to odd behavior when we start doing incremental compilation. If I split analysis/compilation in two and do analysis on lib1.dart first, then lib2.dart, then suddenly b2 would start getting inferred as int. This could probably be avoided by keeping track of the pre-inference types, but that starts to get complicated.

Another proposal as I understand it is to try to accurately capture the fine grained dependencies induced by fields. I believe that the example above demonstrates that we would need to capture anti-dependencies between statics and instance variables as well, but leaving that on the side, I don't yet see how to accurately compute this in a manner that does not require an iterative fixed point process. Basically, accurately determining the fine grained dependencies for things involving instance members would require arbitrarily interleaving of inference, resolution, and dependency discovery. Consider for example, the following code fragment:

class A {
   var x = a.b.c.d;
}

What does inferring the type of x depend on? Well, it depends on the type of a, so we'd better add a dependency edge from x to a. But it also depends on the type of a.b - what edge do we add here? Before we can add that edge, we need to do inference on a. Once we've done inference on, a, we now resolve a.b, thereby discovering a new dependency edge from x to the element that a.b resolves to. Once we've ensured that inference has been done to satisfy that dependency edge, we can now do resolution on a.b.c, to discover the type of a.b, and thereby finding the element that a.b.c resolves to, so we can do inference on that element.... etc.

Now, I don't mean to imply this is intractable. Even without going down this path of iteratively discovering dependency edges, it's plausible that we can find a nice static approximation that is stable. But it's not obvious to me what that approximation is right now, and I'm very concerned about making progress on this. So I'm proposing for the time being to encode the library cycle dependencies that DDC was using to order inference into the inference tasks, to get us back to the level of goodness that we were at before. From there, we can try to iterate forward.

@bwilkerson
Copy link

Sorry about the confusion. I understand the problem, but I think I was misunderstanding the goal of the discussion (I was trying to explore what the stable approximation might look like).

I have no problem with your proposal, though I'm not sure exactly how the dependencies you're proposing would be computed/expressed.

@leafpetersen
Copy link
Contributor Author

I'll try to get a first cut of some code together for you to make suggestions on, but I think it could look something like this:

  1. Have a task to compute the direct import/export dependencies of a unit
  2. Have a task to compute the strongly connected component for a unit
  • Maybe could use the task mode cycle detection for this, but I'm not sure what properties it has (e.g. does it guarantee to give the complete cycle)?
  • Otherwise just compute as in the existing library resolver?
  1. For a given unit, the first inference task (static variables) has an input dependency on complete inference having been finished for the import/export dependencies (computed by task 1) of each of the units in its strongly connected component.
  2. For a given unit the second inference task (instance variables) has an input dependency on the step 3 task for each of the units in its strongly connected component.
  3. etc.. for the rest of the inference sub-steps.

Does that seem reasonable?

@bwilkerson
Copy link

  1. Have a task to compute the direct import/export dependencies of a unit

Done. See IMPORTED_LIBRARIES and EXPORTED_LIBRARIES.

  1. Have a task to compute the strongly connected component for a unit
  • Maybe could use the task mode cycle detection for this, but I'm not sure what properties it has (e.g. does it guarantee to give the complete cycle)?

I believe that it's only guaranteed to detect a recursive request for information.

  • Otherwise just compute as in the existing library resolver?

Yes, that's probably what we'd need to do.

  1. For a given unit, the first inference task (static variables) has an input dependency on complete inference having been finished for the import/export dependencies (computed by task 1) of each of the units in its strongly connected component.

I'm assuming you mean "each of the units in the import/export dependencies of each of the units in the strongly connected component minus the units in the strongly connected component." If not, then the task model will get in your way. Also, we still need to resolve static variables in dependency order within the strongly connected set of libraries because the task model doesn't support computing results for multiple targets simultaneously.

My concern with this approach is that it puts us right back where we were: in order to resolve a given unit we have to do almost all of the resolution work for all of the libraries in the transitive closure of imports, whether any of that work is needed or not. The only work we can avoid is resolving the bodies of methods. It seems like that largely negates one of the main advantages that the task model was suppose to give us (although I might be wrong about how much time is spent where).

Unfortunately, without finding the stable approximation I don't know what else to do.

@leafpetersen
Copy link
Contributor Author

  1. Have a task to compute the strongly connected component for a unit

Maybe could use the task mode cycle detection for this, but I'm not sure what properties it has (e.g. does it guarantee to give the complete cycle)?
I believe that it's only guaranteed to detect a recursive request for information.

This comment seems promising:

  /**
   * If a dependency cycle was found while computing the inputs for the task,
   * the set of [WorkItem]s contained in the cycle (if there are overlapping
   * cycles, this is the set of all [WorkItem]s in the entire strongly
   * connected component).  Otherwise, `null`.
   */
  List<WorkItem> dependencyCycle;

Unfortunately, without finding the stable approximation I don't know what else to do.

Agreed that this is really not in the spirit of the task model. I don't know what else to do right now either, and really want to unblock things. We should revisit this once we're back to a good state.

@leafpetersen
Copy link
Contributor Author

I've done a quick experiment which uses the task model cycle detection to find strongly connected components, in the hopes that the comment above is accurate. So far, things look good. I'll upload something tomorrow for @bwilkerson and @stereotype441 to give feedback on. If this approach seems reasonable to them, I think I should be able to knock this off tomorrow.

@leafpetersen
Copy link
Contributor Author

With Vijay's CL and my analyzer CL, I now see all_tests.dart passing. There's a dramatic performance regression on large_class_declaration_test.dart. I believe that incremental resolution of a field is probably linear in the number of fields, and so this test runs quadratically in the number of fields. There should be an easy fix for this.

@bwilkerson
Copy link

[+scheglov]

Konstantin,

You know more about the incremental resolver than I do, so I'm hoping you
have a guess.

Brian

On Tue, Oct 6, 2015 at 11:05 AM, Leaf Petersen [email protected]
wrote:

With Vijay's CL and my analyzer CL, I now see all_tests.dart passing.
There's a dramatic performance regression on
large_class_declaration_test.dart. I believe that incremental resolution of
a field is probably linear in the number of fields, and so this test runs
quadratically in the number of fields. There should be an easy fix for
this.


Reply to this email directly or view it on GitHub
#354 (comment)
.

@vsmenon
Copy link
Contributor

vsmenon commented Oct 6, 2015

Leaf,

I'm also seeing the regression on large_class_declaration_test. I'm also
still seeing a couple inference tests failures locally - looking....

Vijay

On Tue, Oct 6, 2015 at 11:05 AM, Leaf Petersen [email protected]
wrote:

With Vijay's CL and my analyzer CL, I now see all_tests.dart passing.
There's a dramatic performance regression on
large_class_declaration_test.dart. I believe that incremental resolution of
a field is probably linear in the number of fields, and so this test runs
quadratically in the number of fields. There should be an easy fix for
this.


Reply to this email directly or view it on GitHub
#354 (comment)
.

@leafpetersen
Copy link
Contributor Author

I have a fix for the performance issue, I'll upload shortly. Vijay - there are some tests which set the inferTransitively flag to false - we don't support that option (and don't plan to), so those tests will fail (the flag just gets ignored, and transitive checking happens anyway). If you're seeing more than that, let me know.

@leafpetersen
Copy link
Contributor Author

To be clear, the fix is to avoid using the incremental resolver for field re-resolution.

@scheglov
Copy link

scheglov commented Oct 6, 2015

Incremental resolution works only for method bodies, so we don't need to apply it to fields anyway.

@vsmenon
Copy link
Contributor

vsmenon commented Oct 6, 2015

Leaf - just confirming - I'm only seeing failures with that flag off . We
can just delete those tests.

On Tue, Oct 6, 2015 at 11:43 AM, Leaf Petersen [email protected]
wrote:

I have a fix for the performance issue, I'll upload shortly. Vijay - there
are some tests which set the inferTransitively flag to false - we don't
support that option (and don't plan to), so those tests will fail (the flag
just gets ignored, and transitive checking happens anyway). If you're
seeing more than that, let me know.


Reply to this email directly or view it on GitHub
#354 (comment)
.

@leafpetersen
Copy link
Contributor Author

Update: I think have the cache invalidation issue resolved. I'm running into git merge conflicts from hell, but as soon as I get those resolved I'll upload a CL.

@leafpetersen
Copy link
Contributor Author

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

No branches or pull requests

6 participants