Skip to content

Feedback from PhpStorm #5

@wbars

Description

@wbars

Hello, I want to share purely technical feedback about possible support for this RFC in PhpStorm.

Disclaimer

In the following text by "support" I will mean general IDE features like code analysis and completion.
Also, this feedback is preliminary because it's based on the draft, and it's neither a commitment nor a refusal of any of the support, be it current or future one. We still evaluate the necessity of features and needed effort for them individually based on various factors.

Support for correctly parsing and highlighting generic PHP doc tags is out of scope of this feedback because PhpStorm will most certainly be able to support any formal syntax for this kind of job.

Also please note that this is overview of a current PhpStorm engine state and it may change in the future.

Technical problems

In its current form it’s unlikely that PhpStorm will be able to fully support the current proposal. To find out why I want to shed some light on internals of our engine.

  1. Most of our local code analysis and type inference is performed “on the fly”. For the sake of simplicity we can assume that code analysis pass is being restarted on every change of the editor content, for example, typing of a letter. It means that analysis has to be “fast”, and we achieve this by ensuring that we don’t directly access the content of other files during analysis since it is a very expensive operation in our terms (a file needs to be read, parsed, etc.). So instead we use indices built for such files.

Consider the following example:

<?php
// a.php

$a = f();
$a->x();
<?php
// b.php
class A {
function m() {}
}

function f(): A {
return new A();
}

Suppose we run a local inspection "undefined method" on the file "a.php". To know which type the function call f() will have, we need to access the content of the file "b.php". Since we're not allowed to access this content directly, we need to build indices for the file "b.php" and access them instead. In those indices there will be some information like "Function \f has return type \A".

  1. Speaking about indices, they are built for each file separately and are rebuilt for appropriate file changes. It means that if the content of the file 'b.php' changes, then only the indices for the file 'b.php' will be rebuilt and the indices for 'a.php' will stay the same.

It's an important property of indices and to preserve it, there is the following principle for building them:
index information for the file should be computed based solely on the content of the file being indexed.
It means that when indices for "b.php" is built we:

  • can't access other indices (they are not built yet)
  • can't access the content of other files (to ensure granular invalidation).
  1. Since PHP is a dynamically typed language, we also keep the inferred types of a function in indexes in case of no explicit return type declaration.
<?php
function f() { // we still want to be able to work with this function calls in different files, so inferring type locally during indexing
return new A();
}

If the content of the current file is not enough to infer the full type of the function, we encode non-complete type information which will be used as a guide to resolve the full type during code analysis.

<?php
function f() {
return b(); // type of `\b()` is unknown during indexing, so in indexes there will be encoded information about the fact that we need to search for the return type of function `\b`.
            // this info can be used only after all indices are built including local code analysis
}

For this file, such guide may look like this:
Find function \b(); its return type is the return type of \f().
It will be stored on disk in indices, while the original content of the function f will be inaccessible during type resolve.

These 3 points can be summarized as the following statement: the more IDE is able to know about the code based solely on the content of the current file, the better support will be.

Now, with understanding of these points let's try to imagine how support of the current proposal may look like.
Consider the following code:

<?php
function a() {
$f = another_create(new F(), f(), 1);
return create(new A(), f(), 1)->x($f);
}

Suppose we want to index this file and provide a return type of a(). Since the type is not fully known during indexing, we will encode some guiding information on how to resolve this type.
If there are no generics, a guide may look like this:

  1. Find function create(), see its return type
  2. In this return type, find method x(). Its return value is the return type of the function \\a.

The important property of a computed guide is that its size is a linear function of the size of the method being indexed. That said, if we add one more method call into a chain, the size of this "guide" will only grow one step.

Now let's try to solve the same task in the potential presence of generics. It’s irrelevant what the actual declarations of these functions are because an index has to assume every possible scenario based on the content of only the current file. That said, a file will be indexed in a manner that every function call can potentially be a generic one:

  1. Find function 'create'. Since during indexing we don't know whether the function is a generic one and if so, on which parameter, we need to record all argument types:
    1. Class \A
    2. Result of calling \f()
    3. int
  2. For each of this type suppose that ‘create’ can be generic over this type, map these types to an incomplete type for ‘create’ functions, and for each of these types find the method x().
  3. Again, since for the method x we don't know whether the method is a generic one, we need to record all argument types (in this case, $f) and in doing so basically enumerate all pairs from
    parameters lists of 'another_create' call and ‘create’.

The size of such a guide grows way faster than linear progression and to the point where performance becomes unacceptable considering the size of real PHP projects.

This is the problem. There are potential ways to solve this both from the IDE side and via standard means:

  1. IDE may provide a hard limit on added types. From our experience, this produces unstable code analysis results (e.g. flaky behavior) which we consider a major problem.
  2. IDE may take into account generics only during the "global" inspections pass or invent new lifecycle. This will make the task pretty much the same as the task that current static analysis tools are facing. Unfortunately we didn't invent a lot in this direction before, since our users mostly benefit from "on-the-fly" analysis. As a result, a huge effort is required here.
  3. The proposal itself may be adjusted to follow the earlier statement: the more IDE is able to know about the code based solely on the content of the current file, the better support will be.
    This is not an easy task to come up with such a proposal, so I don't have a concrete scheme that may suite us, but I do have general examples which may work and show possible paths to investigate:
    1. Some way to indicate from call-site that this function is a generic one and if so, on which type from. For this example, it may be some PHP syntax construct like create<F>(f()): from this signature, it's clear which type the function is returning.
    2. It may be relaxed a bit like create<>(f()): from this call, there is no way to tell which type is returned, but it serves an important purpose of marking generic calls.

Hope you find it useful.
Kirill.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions