Indexing became extremely slow #57
After recent changes, indexing performance decreased and memory consumption increased dramatically. After indexing 3k-4k files the slowdown becomes even more visible.

Result for the Magento project (24k files): all PHP files parsed in 2904 seconds, 6123 MiB allocated. Previously those values were about 100 seconds and 600 MiB.
That really is a lot. Suggestions?
It would be nice to have some configurable memory limit, and to have the language server be smart enough to fit into it, i.e. if the memory limit is tiny, the language server should optimize memory usage by moving parts to a file system cache. Our use case with Eclipse Che is that the PHP language server runs as an agent in a Docker container with a total memory of 1-2 GB. There are more language servers (JSON, CSS, JavaScript, etc.), the Che workspace agent, etc. I would imagine that the PHP language server in our case should be limited to 100-200 MB of memory.
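A rough sketch of how such a limit could work: keep parsed documents in an in-memory map and spill the oldest ones to a serialized file cache once a soft limit is approached. All names here are illustrative, not the server's actual API:

```php
<?php

class BoundedDocumentCache
{
    /** @var object[] in-memory documents, keyed by URI */
    private $documents = [];
    /** @var int soft memory limit in bytes */
    private $memoryLimit;
    /** @var string directory backing the file system cache */
    private $cacheDir;

    public function __construct(int $memoryLimit, string $cacheDir)
    {
        $this->memoryLimit = $memoryLimit;
        $this->cacheDir = $cacheDir;
    }

    public function set(string $uri, $document)
    {
        $this->documents[$uri] = $document;
        // When real memory usage approaches the limit, spill the
        // oldest entries to disk and free them from RAM.
        while (memory_get_usage(true) > $this->memoryLimit && count($this->documents) > 1) {
            reset($this->documents);
            $oldest = key($this->documents);
            file_put_contents($this->path($oldest), serialize($this->documents[$oldest]));
            unset($this->documents[$oldest]);
        }
    }

    public function get(string $uri)
    {
        if (isset($this->documents[$uri])) {
            return $this->documents[$uri];
        }
        $file = $this->path($uri);
        return is_file($file) ? unserialize(file_get_contents($file)) : null;
    }

    private function path(string $uri): string
    {
        return $this->cacheDir . '/' . sha1($uri) . '.cache';
    }
}
```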
Regarding the memory limit, I think it would be nice to add a command line argument, like "--tcp" for the socket connection, and apply it to the PHP memory_limit setting. As for the performance problem, I will take a look at what is causing such a slowdown.
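A minimal sketch of what that could look like, assuming a hypothetical --memory-limit flag next to the existing --tcp one:

```php
<?php

// Hypothetical CLI option; the flag name is an assumption.
$options = getopt('', ['tcp:', 'memory-limit:']);

if (isset($options['memory-limit'])) {
    // Accepts the same shorthand notation as php.ini, e.g. "256M".
    ini_set('memory_limit', $options['memory-limit']);
}
```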
We have some experience with indexing in Eclipse PDT / Zend Studio. It was impossible to hold all the data in memory. Even a single regular project using a popular framework goes over 10,000 files, and the nature of the Eclipse workspace allows multiple projects at the same time, so memory usage goes really crazy. For many years we used an H2 database to store the data. H2 is an embedded relational database and as such it has some performance limitations. In the last year we switched to Apache Lucene, a text-based search engine, which improved performance significantly. Currently Eclipse PDT has much better performance than we have here, even without the latest changes that caused the big slowdown.
Moving structures to disk would help reduce RAM usage. Any ideas regarding index time? We traverse all ASTs twice at index time.
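If the two traversals do not depend on each other's results, they could be merged into a single pass by registering both visitors on one traverser. A sketch using nikic/php-parser, with made-up visitor names standing in for the indexer's real ones:

```php
<?php

use PhpParser\NodeTraverser;
use PhpParser\ParserFactory;

require 'vendor/autoload.php';

// DefinitionCollector and ReferenceCollector are placeholders for
// whatever NodeVisitor implementations the indexer actually uses.
$path = 'src/Example.php'; // any file to be indexed

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$ast = $parser->parse(file_get_contents($path));

$traverser = new NodeTraverser();
$traverser->addVisitor(new DefinitionCollector());
$traverser->addVisitor(new ReferenceCollector());
$traverser->traverse($ast); // both visitors run in a single walk
```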
@kaloyan-raev I heard from other folks that rebuilding the Eclipse PDT index takes them ~3 min.
@mniewrzal I don't know why it is so extreme for you... I just tried it on Symfony + dependencies, which is 7,100 files.
Let me clone Magento.
OK, I will also check Symfony + dependencies to compare results.
It depends on the project size and on the version of Eclipse PDT. We have done lots of optimizations in the last couple of years. The switch to Apache Lucene was done in PDT 4.0, which was released in June this year.
SQLite is a relational database too, and H2 has much better performance than SQLite. I would count more on a text-based search engine like Apache Lucene. I wonder what the Lucene equivalent in PHP would be.
What do you mean by "text-based search engine"? Isn't this about persisting the index, which is mostly used for resolving definitions (which currently works through the FQN)? In that case a key/value store would give the best performance, but it must be something the user doesn't have to install manually. Actually, saving the objects in files wouldn't be too bad, because that basically is key/value.
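A minimal sketch of the "objects in files" idea, keyed by FQN and with the key hashed so it is safe as a file name (class and directory names are made up):

```php
<?php

class FileCache
{
    /** @var string directory holding one serialized object per key */
    private $dir;

    public function __construct(string $dir)
    {
        $this->dir = $dir;
        if (!is_dir($dir)) {
            mkdir($dir, 0777, true);
        }
    }

    public function set(string $fqn, $definition)
    {
        file_put_contents($this->dir . '/' . sha1($fqn), serialize($definition));
    }

    public function get(string $fqn)
    {
        $file = $this->dir . '/' . sha1($fqn);
        return is_file($file) ? unserialize(file_get_contents($file)) : null;
    }
}
```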
Also I want to note that I have always had XDebug enabled, which the PhpParser docs point out is a performance hog. We should compare this to a run without XDebug and, if it has a significant impact, investigate dynamic php.ini files.

Speaking of XDebug, has anyone ever used it to profile a PHP app? I would like to know which functions take the most CPU time. For example, if it is the file I/O after all, I would like to make that asynchronous.

Benchmarks in CI would also be nice. We could add projects like Magento as git submodules for that.
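For reference, one way to get those CPU-time numbers is XDebug's built-in profiler, which writes cachegrind files that tools like KCachegrind or Webgrind can read. A possible invocation, assuming XDebug 2 and the repository's usual entry point:

```
php -d xdebug.profiler_enable=1 \
    -d xdebug.profiler_output_dir=/tmp \
    bin/php-language-server.php
```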
File content should be kept in memory only for those files for which the language server received a textDocument/didOpen notification.
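A sketch of that policy, with assumed class and method names; only documents the client has opened stay in memory, everything else is read from disk on demand:

```php
<?php

// Illustrative only: the class and handler names are assumptions.
class ContentRetriever
{
    /** @var string[] content of open documents, keyed by URI */
    private $openDocuments = [];

    // Called when the client sends textDocument/didOpen
    public function didOpen(string $uri, string $content)
    {
        $this->openDocuments[$uri] = $content;
    }

    // Called when the client sends textDocument/didClose
    public function didClose(string $uri)
    {
        unset($this->openDocuments[$uri]);
    }

    public function retrieve(string $uri): string
    {
        // Open documents come from memory, everything else from disk.
        return $this->openDocuments[$uri]
            ?? file_get_contents(parse_url($uri, PHP_URL_PATH));
    }
}
```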
@mniewrzal Just tested Magento. It is true that it is dog-slow. For me the LS crashed at around the 20,000th file, which was already several minutes into parsing. But RAM usage wasn't nearly as high as you experienced: it was <600 MB.
Interesting :) Are you using a clean version from master, or do you have some local changes?
My bad, I was working on your PHP_CodeSniffer branch. You are right; I'm now at file 4500/24226, 3 min of parsing, 1.3 GB RAM usage.
One more optimization that could be done is to exclude some folders from indexing, especially those containing tests. I did some file counting on the Magento project:

```
$ find . -type f -name '*.php' | wc -l
24225
$ find . -type f -name '*.php' | grep "[Tt]est" | wc -l
9035
```

9K out of 24K files seem to be test files. That is up to 37% of CPU time and memory saved if we avoid indexing them. The exclusion pattern should be configurable, with some meaningful default.
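A minimal sketch of such a filter, assuming the exclusions arrive as fnmatch-style patterns (the default pattern is only an example):

```php
<?php

// Filter the file list against configurable exclude patterns before
// indexing. Without FNM_PATHNAME, fnmatch's "*" also matches "/",
// so "*[Tt]est*" matches test directories anywhere in the path.
function filterExcluded(array $files, array $excludePatterns = ['*[Tt]est*']): array
{
    return array_filter($files, function (string $path) use ($excludePatterns) {
        foreach ($excludePatterns as $pattern) {
            if (fnmatch($pattern, $path)) {
                return false; // matches an exclusion, skip it
            }
        }
        return true;
    });
}
```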
Personally I want global symbol search to also work for tests, though.
Sure, this is why it should be configurable. Some IDEs (like Eclipse) let users configure which folders in a project are "source folders" and which are not. Such an IDE should ask the language server to index only the source folders and skip the rest. This would just empower users to optimize performance when working with huge projects like Magento.
With the latest master there is a problem with indexing large PHP files, e.g. https://github.com/composer/composer/blob/master/tests/Composer/Test/Autoload/Fixtures/classmap/LargeClass.php
@mniewrzal with an unlimited memory limit? These large files definitely don't need to be parsed. Can we somehow catch the error? Or implement support for ignoring files with glob patterns?
My bad, I didn't recompile the plugin sources. I was testing the memory limit parameter and had a 256M limit set. Now everything is working like before. But I think it would be good to think about some limit on PHP file size; this one file bumps memory consumption from 18 MiB to 542 MiB for the whole Composer project.
Sure, a simple file size check should do.
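A sketch of that check, with an arbitrary 1 MiB cap standing in for whatever configurable default makes sense:

```php
<?php

// Example default only; this would be a configurable setting.
const MAX_INDEX_FILE_SIZE = 1024 * 1024; // 1 MiB

function shouldIndex(string $path): bool
{
    $size = filesize($path);
    // Skip unreadable files and oversized ones (e.g. generated
    // fixtures like LargeClass.php) instead of parsing them.
    return $size !== false && $size <= MAX_INDEX_FILE_SIZE;
}
```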
I think this one can be closed. Other issues can be handled as separate bugs.