
Caching between runs for better performance #530

@JDGrimes

Description


First let me thank you for this great tool. 👍

I've been using this on my PHP projects and have found that sniffing can take a while, especially on larger projects with complex configurations. Performance will naturally depend largely on how well the sniffs in use are written. However, I think performance could be improved by caching hash signatures of the files being sniffed. Then only the files that have changed since the last run would need to be sniffed (with some caveats, which I'll get to in a moment). This wouldn't speed up the initial run (and might even slow it down slightly), but it could drastically speed up later runs.
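To make the idea concrete, here is a minimal sketch of the change check, written in Python purely for illustration (PHP_CodeSniffer itself is PHP). It assumes the cache is a plain mapping from file path to the content hash recorded on the previous run; `file_hash` and `files_to_sniff` are hypothetical names, and SHA-1 is just one reasonable choice of hash.

```python
import hashlib
from pathlib import Path


def file_hash(path: Path) -> str:
    """Return the SHA-1 hex digest of a file's contents."""
    return hashlib.sha1(path.read_bytes()).hexdigest()


def files_to_sniff(paths, cached_hashes):
    """Return only the files that need to be sniffed again.

    cached_hashes is a hypothetical mapping of path (str) -> hash from the
    previous run. A file is re-sniffed if its hash differs from the cached
    one, or if it has no cache entry at all (i.e. it is new).
    """
    changed = []
    for path in paths:
        if cached_hashes.get(str(path)) != file_hash(path):
            changed.append(path)
    return changed
```

Hashing the contents rather than comparing modification times means a file that is touched but not edited will not be re-sniffed, at the cost of reading every file on each run.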

As I noted above, there are some caveats:

  • The cache needs to be invalidated when the configuration changes: for example, when the XML config file is edited, and possibly at other times as well. This could be handled by saving a hash signature of the configuration settings in the cache and checking whether it matches the current configuration when the cache is loaded.
  • If a file has errors, those errors still need to be reported on subsequent runs even if the file hasn't changed. This could be done either by not caching files with errors, or by caching the list of errors found and replaying them to the user without re-sniffing those files.
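Both caveats can be sketched together, again in illustrative Python rather than as an actual PHP_CodeSniffer API. This assumes the effective configuration (standard, sniffs, settings) can be represented as a dict, and that each cached file entry stores its hash alongside the errors found; it follows the second option above (cache and replay errors). `config_hash`, `load_valid_cache`, `report`, and the `sniff` callable are all hypothetical.

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """Hash the effective configuration.

    sort_keys=True makes the hash independent of the order in which the
    settings were supplied.
    """
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def load_valid_cache(cache: dict, config: dict) -> dict:
    """Return the per-file entries, or {} if the configuration has changed."""
    if cache.get("config") != config_hash(config):
        return {}  # config changed: discard everything
    return cache.get("files", {})


def report(path, content_hash, files_cache, sniff):
    """Return the errors for one file, replaying cached errors if possible.

    files_cache maps path -> {"hash": ..., "errors": [...]}; sniff is a
    callable that actually checks the file and returns its errors.
    """
    entry = files_cache.get(path)
    if entry is not None and entry["hash"] == content_hash:
        return entry["errors"]  # unchanged file: replay stored errors
    errors = sniff(path)
    files_cache[path] = {"hash": content_hash, "errors": errors}
    return errors
```

Caching the errors (instead of skipping files that have errors) keeps repeat runs fast even on codebases with many existing violations.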

There are probably other things I haven't thought of, maybe regarding interactive mode, reports, or automatic fixing, all of which I am unfamiliar with. And there would probably need to be an easy way for the user to bypass the cache as needed.

There are probably other things that could be cached between runs on a project as well.

Exactly how the cache is saved is up to you. I was thinking of a .phpcs-cache file in the root of the project being sniffed, containing the cache as a JSON object.
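A rough sketch of reading and writing such a file, in illustrative Python. The layout (a top-level `"config"` hash plus a `"files"` object keyed by path) is an assumption, as are the names `load_cache` and `save_cache`. A missing or corrupt cache file is simply treated as empty, so the worst case is a full re-sniff.

```python
import json
from pathlib import Path

CACHE_NAME = ".phpcs-cache"  # file name suggested above


def load_cache(project_root: Path) -> dict:
    """Read the cache; a missing or unreadable file yields an empty cache."""
    path = project_root / CACHE_NAME
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def save_cache(project_root: Path, cache: dict) -> None:
    """Write the cache to a temporary file, then rename it into place.

    The rename means an interrupted run cannot leave a half-written cache.
    """
    path = project_root / CACHE_NAME
    tmp = path.with_name(CACHE_NAME + ".tmp")
    tmp.write_text(json.dumps(cache, indent=2), encoding="utf-8")
    tmp.replace(path)
```

An easy way to bypass the cache would then amount to skipping `load_cache` (or deleting the file) before a run.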

If this is something that you think could be done, I'd be happy to work up a PR if you'll give me a little guidance on how you'd like this implemented.
