Skip to content

Narrow search range for tree-sitter queries #1769

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
pokey opened this issue Aug 11, 2023 · 11 comments · Fixed by #2714
Closed

Narrow search range for tree-sitter queries #1769

pokey opened this issue Aug 11, 2023 · 11 comments · Fixed by #2714
Assignees
Labels
code quality Improvements to code quality enhancement New feature or request

Comments

@pokey
Copy link
Member

pokey commented Aug 11, 2023

In #1800, we introduced a workaround for tree-sitter/tree-sitter#2484, which means that we no longer restrict the search range for our tree-sitter queries. That workaround has no impact on correctness, but could have performance implications for large files, as we get all matches and then filter them after the fact, rather than restricting to the range we care about. Once tree-sitter/tree-sitter#2484 is fixed, we'd like to reactivate our code that restricts the search range, by rolling back #1800

@pokey pokey added enhancement New feature or request code quality Improvements to code quality labels Aug 11, 2023
@josharian
Copy link
Collaborator

In the meantime, If the file is particularly large, it might be better to disable treesitter scopes entirely, similarly to how VS code disable syntax highlighting on massive files.

@pokey
Copy link
Member Author

pokey commented Aug 11, 2023

Fwiw we're already parsing the file; this issue is just about running the tree query, which is likely many orders of magnitude faster

@pokey pokey mentioned this issue Sep 12, 2023
@josharian
Copy link
Collaborator

#1889 removes some of the code that we will need to resume restricting the search tree.

@pokey
Copy link
Member Author

pokey commented Jan 9, 2025

@AndreasArvidsson while you're looking into performance stuff, fwiw it appears tree-sitter/tree-sitter#2484 is fixed so we should be able to start restricting tree-sitter query range again. I'm not sure it's urgent but figured I'd flag since you're very familiar with performance right now

@AndreasArvidsson AndreasArvidsson self-assigned this Jan 9, 2025
@AndreasArvidsson
Copy link
Member

AndreasArvidsson commented Jan 9, 2025

@AndreasArvidsson while you're looking into performance stuff, fwiw it appears tree-sitter/tree-sitter#2484 is fixed so we should be able to start restricting tree-sitter query range again. I'm not sure it's urgent but figured I'd flag since you're very familiar with performance right now

Doesn't look like it's fixed actually. They just closed it.
I can't reproduce this issue in the playground which also uses web-tree-sitter.

I rolled back the changes and tried to run our test and they are indeed failing.

@pokey
Copy link
Member Author

pokey commented Jan 9, 2025

Strange. Possible we're on an old web-tree-sitter in vscode parse tree. I'll bump at some point and we can try again

@AndreasArvidsson
Copy link
Member

Please do

@pokey
Copy link
Member Author

pokey commented Jan 10, 2025

Done; hopefully it works now 🤞

@AndreasArvidsson
Copy link
Member

AndreasArvidsson commented Jan 10, 2025

No go. Still 4 test that are failing. I will have a look at the position logic to make sure we haven't done anything incorrect.

@AndreasArvidsson
Copy link
Member

AndreasArvidsson commented Jan 10, 2025

Oh I think it might be our logic in getQuerySearchRange that is failing now.

Edit:

Looks like the each capture needs to be overlapping the startPosition-endPosition range. Take this example:
bbb = () => {};|

We are here matching two different captures. We have the field definition where we start our content range and we have the semicolon where we end our content range. getQuerySearchRange takes our position and extends it one character to the left and to the right so we end up with:

bbb = () => {}|;
|

That means that we are not overlapping with the field capture and the pattern is not matched

@AndreasArvidsson
Copy link
Member

AndreasArvidsson commented Jan 10, 2025

This patten does not work

(
  (field_definition
    value: (arrow_function)
  ) @namedFunction.start
  .
  ";"? @namedFunction.end
)

But this does work:

(_
  (field_definition
    value: (arrow_function)
  ) @namedFunction.start
  .
  ";"? @namedFunction.end
) 

The only difference is the inclusion of the wildcard at the top level. Without that we don't have a common ancestor for all captures.

github-merge-queue bot pushed a commit that referenced this issue Jan 10, 2025
…ies (#2714)

Today we generate all captures in the entire document whenever we do a
Tree sitter query. This is of course quite expensive for large documents
and it would be better if we could narrow the search range. That is
what's implemented in this pull request.

I also found some incorrect scm patterns that broke with this
implementation that I have fixed.

Fixes #1769

## Checklist

- [/] I have added
[tests](https://www.cursorless.org/docs/contributing/test-case-recorder/)
- [/] I have updated the
[docs](https://github.com/cursorless-dev/cursorless/tree/main/docs) and
[cheatsheet](https://github.com/cursorless-dev/cursorless/tree/main/cursorless-talon/src/cheatsheet)
- [/] I have not broken the cheatsheet

---------

Co-authored-by: Phil Cohen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code quality Improvements to code quality enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants