Skip to content
This repository was archived by the owner on Apr 14, 2022. It is now read-only.

Symbol indexer #521

Merged
merged 10 commits into from
Jan 16, 2019
Merged

Symbol indexer #521

merged 10 commits into from
Jan 16, 2019

Conversation

jakebailey
Copy link
Member

@jakebailey jakebailey commented Jan 5, 2019

For #433.
Closes #483.
Closes #484.
Closes #485.

General overview:

  • Indexing happens on each parse, so with the current LS, every file in the workspace gets indexed. This will need to be reworked.
  • All symbol queries go to the index, I have not yet approached Query symbol index for document symbols if analysis on document is not complete #486.
  • I do some extra heuristics to highlight some of the more interesting symbols, namely:
    • __init__ in a class will be marked as a constructor.
    • Functions annotated with a property decorator are given the kind Property.
    • Functions that fit the pattern __.*__ are given the kind Operator.
    • Other functions in a class are given the kind Method.
    • Variables that look like UPPERCASE_UNDERSCORE_123 at the top level or directly inside a class are given the kind Constant.
  • Any parse updates the index. I don't do any version management other than to look up the AST on the parse complete event.
  • VS-specific "functionKind" is preserved.

Things to do in future PRs:

  • Fallback to analysis when it's present. I left all of the functions previously used there.
  • Using the import system to answer the question "is this thing being imported a module or not", probably augmented/modified outside the indexer.
  • Parse version management.
  • Asynchronous behavior (switching the dictionary to be a map from Uri to TaskCompletionSource<HierarchicalSymbol>).


namespace Microsoft.PythonTools.Analysis.Indexing {
internal interface ISymbolIndex {
Task<IEnumerable<HierarchicalSymbol>> HierarchicalDocumentSymbolsAsync(Uri uri, CancellationToken token = default);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Task<IEnumerable<T>> has weird semantic meaning, something like Lazy<Lazy<T>>. It is better to have Task<IReadOnlyList<T>>.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I know. I currently can't do a task which gives back a read only list because I'm using (perhaps abusing) yield return to lazily traverse the index, then a Take at the use of the enumerable to limit how far it goes based on the limits provided by the LS. IAsyncEnumerable hasn't left the proposal phase yet, so I'll have to come up with something else.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you actually use Task?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely I'll just end up passing limits down to the index when doing lookups, it just feels a lot less clean and inflexible. In any case, everything here in the first commit is just prototype code that I wanted to push somewhere so I wasn't sitting on it anymore. 😃

Copy link
Member Author

@jakebailey jakebailey Jan 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed your question, sorry.

I don't actually use Task, but Mikhail had suggested sticking async on things (and it made sense, especially given how async the new analyzer is turning out). The future would be to have another implementation of this interface that instead goes to something like a sqlite database on disk or similar, so I wasn't sure if it was a good idea to make this completely synchronous.

The other thing that I was considering was cancellation; If a user is searching for a symbol, I believe the editor will cancel the request and send a new one when the input changes, so I think I'd need to be passing the cancellation token down so I can allow it to throw when it's cancelled to stop the query. Unfortunately, that has the unfortunate effect of my IEnumerable being tied to that token, so if a user of this interface were to give the query result to something else, that something else may end up being cancelled unexpectedly where it doesn't make sense.

Perhaps I drop the async on this and make it fully synchronous instead, and make a new interface if we need something that's asynchronous (and have the service provide both?).

Copy link

@MikhailArkhipov MikhailArkhipov Jan 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think indexer should be working similar to what async parser and analysis do. I.e.

  • Get notification on text changes (probably push rather than listening on events to be less dependent on the source),
  • Cancel current indexing on the changed document
  • Start internal indexing task asynchronously.
  • Fire event 'indexing complete' on every document and on overall end of indexing.

It can probably run indexing on several files in parallel. Indexing results on each document should be an immutable snapshot. Snapshot is getting replaced when indexing on the file is done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I probably don't understand what this method does. Is it a method that awaits for the most recent snapshot or it runs indexing over single document?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is just the query over the index's state. It does no indexing itself. This is somewhat similar to GetModuleVariables which the indexer will partially replace.

@AlexanderSher
Copy link
Contributor

Design question: is it possible that indexing of one file can trigger indexing of another file?

@jakebailey
Copy link
Member Author

Design question: is it possible that indexing of one file can trigger indexing of another file?

No, I don't think so. Symbols are not analysis dependent, at least not in the way I'm trying to implement it. It's a pure function from file content to symbols.

@MikhailArkhipov
Copy link

@AlexanderSher - no, it shouldn't as indexing only consumes changes.

@AlexanderSher
Copy link
Contributor

AlexanderSher commented Jan 7, 2019

no, it shouldn't as indexing only consumes changes.

Then I don't understand how indexing can be async, unless we plan to run on different parts of the document in parallel.
Public wrapper methods that allow some code to wait until indexing is complete will be async of course, but I'm not sure that those methods should be part of indexer, cause they will be parser-dependent.

@jakebailey
Copy link
Member Author

The end game here is that the indexer does its own parsing instead of relying on some event from the language server to give it the trees. When the new analyzer comes online, there will no longer be a parse event for every file in the project, so the indexer will need to do its own thing in lieu of being given everything. Technically speaking, that doesn't need to be async either, since the walker is fast and the process would generally be "read file, parse, update index", which is just a synchronous action.

There are two sides; the API for querying (ISymbolIndex), and then the other side which provides ASTs, the parsing, etc, which will need to be changed.

@AlexanderSher
Copy link
Contributor

Well, "read file" is an I/O operation, which should be async.

The end game here is that the indexer does its own parsing instead of relying on some event from the language server to give it the trees

Meaning we will be doing the same parsing work twice?

@MikhailArkhipov
Copy link

@AlexanderSher "Meaning we will be doing the same parsing work twice?"

Possibly, but not always. Indexer processes all files in the workspace. New analyzer is only loading open file and dependencies. So there can be AST Provider service where indexer can ask for an AST given URI and if it is not available, create its own. When file does become opened, it can switch then to use live AST and flip back to private when file is closed. Technically analyzer does not need to preserve the AST.

@AlexanderSher
Copy link
Contributor

Like in https://github.com/MikhailArkhipov/python-language-server/blob/analysis10/src/Analysis/Ast/Impl/Modules/PythonModule.cs#L171 and https://github.com/MikhailArkhipov/python-language-server/blob/analysis10/src/Analysis/Ast/Impl/Modules/PythonModule.cs#L244

Ok, so Parse(CancellationToken cancellationToken) runs parser.ParseFile synchronously, so async isn't for the parsing, but for the callers to wait for the newest AST. And here it needs to know more about those callers. For example:

  • If Update(IEnumerable<DocumentChangeSet> changes) is called two times, it is possible that two parser.ParseFile calls are executed concurrently and second one completes before the first one. As a result, GetAstAsync may await on the wrong task and will never get the right one - _parsingTask will be set to null.
  • If Update is called multiple times, we can get several parsers running concurrently, and I'm not sure if we can solve this with throttling.
  • If we have changes in several files (e.g., user has undone a rename refactoring), we shouldn't start analysis immediately after the first parse is completed.

And so on. That's why I thought that we need some kind of a service that takes care of all parsing requests and provides awaiters for the code that needs to wait for the most recent parse or index.

@MikhailArkhipov
Copy link

MikhailArkhipov commented Jan 7, 2019

Next update cancels prev parse. May be helpful to sprinkle parser with cancellation. But in RL parsing takes like 10-20ms tops so cancellation mostly means 'ignore the result'. Throttle is done in standard way in editors, i.e. parse on idle + ~200 ms (not yet implemented). Changes 99% come in a single active file, unless someone stats flipping across files and changing them. So # of parallel parsers depends more on # of imports that can be going in parallel. I haven't seen more ~2-3 so far.

@jakebailey jakebailey changed the title [WIP] Symbol indexer Symbol indexer Jan 15, 2019
@jakebailey
Copy link
Member Author

Dropping the WIP, since I think it's looking alright. Let me know what you think.

All of the old IMember code is still there in the LS symbols file; I didn't delete any of it even though it's no longer referenced.


public SymbolIndex() { }

public IEnumerable<HierarchicalSymbol> HierarchicalDocumentSymbols(Uri uri) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you can easily return IReadOnlyList<HierarchicalSymbol> here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, though my intention was that a future PR allows this data to be persisted in a local database, where not needing to collect the entire result beforehand. Though, I guess it's a bit pointless when the HierarchicalSymbol itself contains an IList<HierarchicalSymbol>.

…thand names, use queue instead of linked list
@jakebailey
Copy link
Member Author

We figured out that I forgot to specialize for-loops and generators, so I'm doing that now.

@jakebailey
Copy link
Member Author

Added for loops and comprehensions. For example:

image

This to me is much better than other outlines where s would get dumped in the wrong scope.

@jakebailey
Copy link
Member Author

Any last opinions on this?

@MikhailArkhipov
Copy link

Ship it.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants