Change scopeType matchers to rely on tree-sitter style scheme queries #616

Will-Sommers · 2022-03-23T14:15:20Z

Background

Currently Cursorless uses a custom pattern definition DSL alongside a set of helper functions in nodeMatcher.ts to match various scope types, such as item within a list or argue within a function definition or function invocation.

The tree-sitter project also provides a DSL, written in Scheme which allows a user to query for patterns within syntax trees. Here's a link to the docs and an example usage in JS via the web-tree-sitter project. Each query then is allowed to assign a name to a node, such as @comment or @punctuation.bracket:

(comment) @comment

[
  "("
  ")"
  "["
  "]"
  "{"
  "}"
  "%w("
  "%i("
] @punctuation.bracket

The name can then be read or asserted against.

The thought is that moving towards this approach will be more expressive out of the box. Additionally, many other projects including Neovim and Helix rely on queries for syntax highlighting as well as indentation which might help to make the incremental work for adding a new language a little bit simpler, since there are already partial or full definitions to work from. In particular, Helix already uses these queries for their textobjects, which is a simplified version of Cursorless scope types. Here's an example of a set of textobject definitions; they exist for several other languages as well

The Work

Create a queries directory with a subdirectory for each language, eg queries/python, etc
Create a query file for a language that provides queries for some or all of these ScopeTypes, placing the file in queries/<language>/scopeTypes.scm
Add the ability to load queries on a per-language basis
Queries occur on the tree level (SyntaxNode.Tree.Language) and so are top down rather than bottom up as cursorless node matchers currently work.
Should a query have successful matches, return the smallest range containing the input selection
Note that for some of the auxiliary definitions listed below (eg @<scopeType>.searchScope) we first find a match, and then search within that range

The definitions

Default query tag is just @<scopeType>, so eg @namedFunction
In addition, we support a few other queries:
- @<scopeType>.removalRange indicates a different range that should be used for removal
- @<scopeType>.domain indicates that we should first expand to the smallest containing match for this tag and then search for a rooted instance of @<scopeType> within this region. The canonical example for this one is enabling take value from within the key in a map: we'd set @collectionItem.domain to be the containing pair
- @<scopeType>.iterationScope indicates that when user says "every <scopeType>", we should first expand to the smallest instance of this tag, and then yield all top-level instances of @<scopeType> within this range. Here, top-level means not contained by any other match within the search range. Also, note that when finding the instances in the range, we should use @<scopeType>.domain if it exists. See below for an explanation
- @<scopeType>.interior is used by excludeInterior and interiorOnly stages (see update inside / outside #254)

Migration notes

This will require a replacement of each of the language matcher files with a scopeTypes.scm definition. For this reason, we will want to support both paths while the migration occurs. We can keep doing continuous delivery during migration because every language other than C# is well tested.

Questions

Do we want to change our term scopeType to textObject? That is the term used in both nvim tree-sitter, helix, and by redstart voice
Better term for @<scopeType>.iterationScope?
- @<scopeType>.parent?
How to handle argument lists and collection items? I have a feeling we'll be repeating , stuff for removal ranges a lot. I wonder if we want to add Toml configuration for languages where we can indicate scopes that should be handled as comma-separated lists. Along this direction, it's worth thinking about the connection to Support generic comma-separated lists #357
Do we want to support custom queries? Here's how neovim does it
Do we still want to support the "every" in cases where a scope doesn't explicitly specify a @<scopeType>.domain? We do that today by just iterating the parent. Might be useful to keep this one as a fallback 🤷‍♂️

Challenging cases

Why we need to use `@<scopeType>.domain` when searching within `@<scopeType>.iterationScope`

Consider the following case:

{
   foo: {
      bar: "baz"
   }
}

If the user says "take every key fine", we want to just return foo, excluding the nested key bar. In this case key.iterationScope is object and key.domain is pair. If we just looked for instances of key within the object, we'd get the nested key as well. However, if we search for top-level pair objects we won't, as desired

Why `@<scopeType>` must be rooted within `@<scopeType>.domain`

We can actually use the same code example as above:

{
   foo: {|
      bar: "baz"
   }
}

If the user says "take key" with the cursor at the indicated position (after second opening bracket), we want to select foo. We first expand to the containing pair, as that is the definition of key.domain. Then we need to find the key. If we just look for top-level keys (ie not contained by other keys), we'll end up with both foo and bar. If we require that the key be rooted within the pair, that won't happen

Fwiw, we could possibly instead exclude any @<scopeType> matches which are contained within a lower @<scopeType>.domain

Resources

Helix documentation
Helix implementation
Helix queries
nvim-treesitter documentation
nvim-treesitter implementation (might actually be in neovim itself)
nvim-treesitter queries
neovim documentation
neovim implementation
- https://github.com/neovim/neovim/blob/master/runtime/lua/vim/treesitter.lua
- https://github.com/neovim/neovim/tree/master/runtime/lua/vim/treesitter
neovim tests

Addenda:

The query syntax also allows for the injection of information apart from the name of a node and using this DSL will likely be the approach used to support multi-language documents.
Use tree-sitter queries for smart indentation #685

Attributions

👋 Big H/T 🎩 to @wenkokke for the original idea

The text was updated successfully, but these errors were encountered:

wenkokke · 2022-03-23T17:50:24Z

@<scopeType>.removalRange indicates a different range that should be used for removal

Using things like .removalRange has as a disadvantage that we can only include a node in one such range.

wenkokke · 2022-03-23T17:54:12Z

@<scopeType>.searchScope indicates that we should first expand to the smallest containing match for this tag and then search for @<scopeType> within this region. The canonical example for this one is enabling take value from within the key in a map: we'd set @collectionItem.searchScope to be the containing pair

My proposal featured a simplified query language for delimiters, which we search upwards from the current token. It would be good to somehow merge these proposals, to have a concept like “all results for query Q2 within the node returned by Q1 containing the mark”.

wenkokke · 2022-03-23T17:57:17Z

@<scopeType>.iterationScope indicates that when user says "every ", we should first expand to the smallest instance of this tag, and then yield all top-level instances of @<scopeType> within this range.

I propose we do a slightly different thing, using the delimiter mechanism described above. If we run Q2 within the delimiting scope, then “every” simply means “all matches” and “ordinal” means the ordinal’th match. The default would be to again return the node which contains the mark.

pokey · 2022-03-23T17:59:27Z

@<scopeType>.removalRange indicates a different range that should be used for removal

Using things like .removalRange has as a disadvantage that we can only include a node in one such range.

Sorry not sure I follow. Can you elaborate on the problem?

pokey · 2022-03-23T18:00:27Z

query language for delimiters

Can you elaborate on what you mean by this? What kind of delimiters?

wenkokke · 2022-03-23T20:16:41Z

query language for delimiters

Can you elaborate on what you mean by this? What kind of delimiters?

For, e.g., function parameters in a definition in Haskell, I search upwards to the first function definition node, which will delimit the second query, a downwards search for parameters rooted at that definition.

I think this achieves something similar to @Will-Sommers suggestion of using an .iterationScope annotation, but theirs is cleaner, since it manages to use a single query and a single query language.

Theirs would search the entire document for all function definitions with parameters, then grab the one which contains the mark, and then (probably could) return a list of captures corresponding to the parameters, either taking every, the n’th, or by default the one containing the mark.

Would even, I just realised, open up the suggestion of saying things like “take next arg” to take the parameter one after the one which contains the cursor, and other such relative commands.

pokey · 2022-03-23T21:05:53Z

query language for delimiters

Can you elaborate on what you mean by this? What kind of delimiters?

For, e.g., function parameters in a definition in Haskell, I search upwards to the first function definition node, which will delimit the second query, a downwards search for parameters rooted at that definition.

I think this achieves something similar to @Will-Sommers suggestion of using an .iterationScope annotation, but theirs is cleaner, since it manages to use a single query and a single query language.

Theirs would search the entire document for all function definitions with parameters, then grab the one which contains the mark, and then (probably could) return a list of captures corresponding to the parameters, either taking every, the n’th, or by default the one containing the mark.

Would even, I just realised, open up the suggestion of saying things like “take next arg” to take the parameter one after the one which contains the cursor, and other such relative commands.

Ok I'm confused: are you suggesting that we just do what's described in the issue, or are you proposing that we change something?

pokey · 2022-03-23T21:08:18Z

@Will-Sommers maybe tomorrow let's go through the exercise of writing a language definition? I think we might be able to make some headway in an hour and it should help to make things more concrete

wenkokke · 2022-03-23T21:14:49Z

Ok I'm confused: are you suggesting that we just do what's described in the issue, or are you proposing that we change something?

I propose we use @Will-Sommers approach of marking — what I call — delimiters using a named capture, then select all matches where the delimiter is the smallest possible region containing the mark, and use my approach to assign captures to every/ordinal/default.

pokey · 2022-03-23T21:16:55Z

How is that different from what's described in the issue?

pokey · 2022-03-23T21:18:40Z

Fwiw here is @wenkokke's original proposal:

Cursorless NodeFinder.pdf

And looks like here she started putting together a draft implementation

pokey · 2022-03-23T22:00:54Z

Fwiw the reason that I was leaning away from rooted queries and towards top-level matches was so that we could support "take every key" both with nothing selected, which would expand to containing map, and with something selected, which would search within selection. If we use top-level matches, then we can handle both in the same way

pokey · 2022-03-23T22:12:10Z

Ok updated issue description to clarify iteration stuff a bit

pokey · 2022-03-24T12:59:53Z

Ok I captured our mob programming session in draft PR #620

wenkokke · 2022-03-24T13:13:43Z

Fwiw the reason that I was leaning away from rooted queries and towards top-level matches was so that we could support "take every key" both with nothing selected, which would expand to containing map, and with something selected, which would search within selection. If we use top-level matches, then we can handle both in the same way

What about something like “take every key in funk red made”, where we have “in” as a composing keyword to do the second search (every key) in the scope of the first (funk red made).

That way searches could have a default delimiter set by, e.g, @<scope_type>.delimiter — e.g. for key it could default to the smallest map or block that contains the mark — and we could override it with “in”, e.g., “take every key in funk red made” would set the delimiter to the smallest function definition, and selects all keys in it, even if the mark was in a, e.g., a map.

pokey · 2022-03-24T13:35:47Z

Yes, that's the idea (see here, here, and here), but the issue is that you could end up with subtly different behaviour if we're not careful, because that style of command is unrooted, because the parent doesn't come from a query

pokey · 2022-03-24T13:49:29Z

Also @wenkokke can we align on naming here? I'm using @<scope_type>.searchScope instead of your @<scope_type>.delimiter. When I hear the term "delimiter", I assume something like a ( or ,. To me searchScope is clearer. Absolutely open to discussing terms, but let's try to at least align to avoid confusion 😅

pokey · 2022-03-24T13:51:40Z

Basically the term refers to the maximal range within the document for which the given scope is the canonical instance of that scope type. So for the key scope type, that is the pair that contains it. I believe you're referring to that as @<scope_type>.delimiter. I'm referring to it as @<scope_type>.searchScope. Maybe @<scope_type>.container?

wenkokke · 2022-03-26T18:59:10Z

Also @wenkokke can we align on naming here? I'm using @<scope_type>.searchScope instead of your @<scope_type>.delimiter. When I hear the term "delimiter", I assume something like a ( or ,. To me searchScope is clearer. Absolutely open to discussing terms, but let's try to at least align to avoid confusion 😅

I’m using delimiter in the sense of (parse) trees or e.g. delimited continuations, i.e., a node which delimits a subtree, but iteratorScope is probably more generally clear? I’d prefer to use something like “punctuation” for what you call delimiters, but I’m fine either way.

wenkokke · 2022-03-26T19:00:15Z

I’m using “delimiter” the way @Will-Sommers uses “iteratorScope”, I think? I don’t fully understand “searchScope”?

Will-Sommers · 2022-04-08T15:52:35Z

Just a small note — looks like nvim has added some stuff here as well:
https://github.com/nvim-treesitter/nvim-treesitter-textobjects/blob/master/queries/javascript/textobjects.scm#L81

pokey · 2022-04-08T17:27:00Z

Ooh nice find. Interesting to see how they've handled ranges

Will-Sommers · 2022-04-12T14:49:06Z

https://github.com/nvim-treesitter/nvim-treesitter/blob/a838a35b2b5514cc341edfe398dc5036df554ac4/lua/nvim-treesitter/query.lua#L233-L246

Will-Sommers · 2022-04-12T16:29:24Z

Heyo, so it does look like user supported predicates are supported, in a sense. What ends up happening is that a textPredicate is registered an returned on each query. An operator and then a set of operands is set and it looks like we can match either off of a capture or a string match. The operand would dictate the behavior which is implemented on the TS side.

From that point, the ball is back in our court(see above link).

Going to play around with implementing #make-range! and update the draft PR.

pokey · 2022-04-12T18:56:01Z

Yep sounds good to me. I think ideally we'll just need to implement a few simple generic custom functions

Will-Sommers · 2022-05-10T21:31:22Z

Heyo @pokey —

Just to catch you up on my thoughts. The PR I'm working on will not define searchScope, nor will it implement any matchers which would require a search scope, eg. argue, item, key. Instead, I'm going to keep the current scopes in the file(see below). I've implemented the default "parent" matching, relying on it only if a searchScope isn't defined. I'll clean this up and get a PR up.

All of this will happen within the returned matcher function and we'll implement it there, adding siblings?: boolean to NodeValueMatcher. I think this will be a lot cleaner than relying non-encapsulated behavior within getNodeMatcher.

I'm working on some tests for the code itself now. I'm not sure the philosophy related to unit tests(most everything looks like functional integration tests rather than unit.

(comment) @comment
(method) @function
(if) @if
(call) @call
(method) @namedFunction
(method
  name: (_) @functionName)
(hash) @map

[
  (array)
  (string_array)
  (symbol_array)
] @list

(regex) @regex

(class) @class
(class
  name: (_) @className)

(assignment
  left: (_) @name)
(operator_assignment
  left: (_) @name)
(class
  name: (_) @name)
(method
  name: (_) @name)

pokey · 2022-05-26T19:44:39Z

Just chatted with @AndreasArvidsson and we came up with a nice name for searchScope: enclosure. I think that captures the idea much more clearly. So for example, in the query

(operator_assignment
  left: (_) @name) @name.enclosure

Then internally the object that describes what came back would have enclosureRange and extractionRange, the latter of which would be @name above

Will-Sommers · 2022-05-26T20:02:24Z

@pokey @AndreasArvidsson — this works for me but I'll likely bring it into another PR after the initial one was merged.

Here's the diff of my work on it, the @name.enclosure is flipped in my version but I think you're right in having it this way. It sounds like I might need to read up on the compositional modifiers PR to see how this will tie in with the node(or set of nodes) returned from the matcher.

pokey · 2022-05-27T13:56:56Z

other possible terms for enclosure:

dominion
domain

the notion is that we want to indicate the range within which this scope is the canonical instance of the given scope type. So eg for key, its domain is pair: within a given pair it is the canonical key

wenkokke · 2022-05-27T16:04:48Z

Just chatted with @AndreasArvidsson and we came up with a nice name for searchScope: enclosure. I think that captures the idea much more clearly. So for example, in the query
(operator_assignment
  left: (_) @name) @name.enclosure
Then internally the object that describes what came back would have enclosureRange and extractionRange, the latter of which would be @name above

Why not just @name.scope?
(Because it indicates the scope in which @name is valid.)

My second preference would be domain. Dominion is reserved for @🐯 or @simba. Using enclosure with @🐯 or @simba is animal cruelty.

wenkokke · 2022-05-27T16:10:14Z

Otherwise, what is currently preventing the first pull request for this issue from being merged? Do we have a to do list?

pokey · 2022-05-27T16:38:37Z

scope is a bit too generic; we ended up aligning on domain, so glad to hear you like it as well. Fwiw we're doing some pairing on this one; happy to loop you in if interested

The PR needs some various fixes; I think @Will-Sommers has a todo list based on our most recent pairing session

pokey · 2022-05-27T16:39:22Z

Dominion is reserved for @🐯 or @simba. Using enclosure with @🐯 or @simba is animal cruelty.

😄

- Partially addresses #616 - Partially addresses #436 - Depends on #1396 ## Todo - [x] **[DISCUSS]** What to do about fallback `iterationScope`? That's the only thing that is a regression here. - [x] File issues for FIXMEs - [x] File issue for defining iteration scopes. Can probably reuse most of the code from the regular scope handler other than creating the target - [x] File issue to add unit tests for scope handlers - [x] File issue to add some Python scope types where multiple can end at the same point (due to lack of closing brackets) - [x] Add test that checks no scope types are duplicated between legacy and new definition, or file issue to add test - [x] File PR for my 7783da6 (Add support for domain, leading, trailing, interior) #1427 - [x] Look through comments on this thread for anything worth filing / doing - [x] Open as new PR? - [x] Remove extraneous test cases - [x] Double check #629 (comment); a lot of those tests we already have for the generic modifier code - [x] Make sure changes to parse-tree-extension are shipped - [x] Close #785 if we fix that - [x] Comment on #484 saying the process has started and providing link to example - [x] Close #797 if we fix that --------- Co-authored-by: Pokey Rule <[email protected]>

AndreasArvidsson · 2025-01-25T03:56:25Z

We are today relying on Tree sitter queries and most languages are migrated

pokey added enhancement New feature or request code quality Improvements to code quality labels Mar 23, 2022

pokey added this to the 0.27.0 milestone Mar 23, 2022

pokey self-assigned this Mar 24, 2022

pokey mentioned this issue Mar 24, 2022

Change scopeType matchers to rely on tree-sitter style scheme queries #620

Closed

1 task

pokey mentioned this issue Mar 24, 2022

Add language support for tree-sitter query files #621

Closed

pokey mentioned this issue May 27, 2022

Introduce tree-sitter queries for syntactic scopes #629

Merged

15 tasks

This was referenced Jun 8, 2022

"every key" and "every value" modifiers are broken for html #591

Closed

update inside / outside #254

Open

pokey mentioned this issue Jul 18, 2022

Implement autoformatter for tree-sitter query language #878

Open

pokey mentioned this issue Apr 12, 2023

Automatically migrate old tree-sitter patterns to new query language #1419

Closed

pokey modified the milestones: Short list, Long list Jul 6, 2023

AndreasArvidsson closed this as completed Jan 25, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change scopeType matchers to rely on tree-sitter style scheme queries #616

Change scopeType matchers to rely on tree-sitter style scheme queries #616

Will-Sommers commented Mar 23, 2022 •

edited by pokey

Loading

wenkokke commented Mar 23, 2022

wenkokke commented Mar 23, 2022

wenkokke commented Mar 23, 2022 •

edited

Loading

pokey commented Mar 23, 2022

pokey commented Mar 23, 2022

wenkokke commented Mar 23, 2022 •

edited

Loading

pokey commented Mar 23, 2022 •

edited

Loading

pokey commented Mar 23, 2022

wenkokke commented Mar 23, 2022

pokey commented Mar 23, 2022

pokey commented Mar 23, 2022 •

edited

Loading

pokey commented Mar 23, 2022

pokey commented Mar 23, 2022 •

edited

Loading

pokey commented Mar 24, 2022

wenkokke commented Mar 24, 2022 •

edited

Loading

pokey commented Mar 24, 2022

pokey commented Mar 24, 2022

pokey commented Mar 24, 2022 •

edited

Loading

wenkokke commented Mar 26, 2022

wenkokke commented Mar 26, 2022

Will-Sommers commented Apr 8, 2022

pokey commented Apr 8, 2022

Will-Sommers commented Apr 12, 2022

Will-Sommers commented Apr 12, 2022 •

edited

Loading

pokey commented Apr 12, 2022

Will-Sommers commented May 10, 2022

pokey commented May 26, 2022

Will-Sommers commented May 26, 2022 •

edited

Loading

pokey commented May 27, 2022

wenkokke commented May 27, 2022 •

edited

Loading

wenkokke commented May 27, 2022 •

edited

Loading

pokey commented May 27, 2022

pokey commented May 27, 2022

AndreasArvidsson commented Jan 25, 2025

Change scopeType matchers to rely on tree-sitter style scheme queries #616

Change scopeType matchers to rely on tree-sitter style scheme queries #616

Comments

Will-Sommers commented Mar 23, 2022 • edited by pokey Loading

Background

The Work

The definitions

Migration notes

Questions

Challenging cases

Why we need to use @<scopeType>.domain when searching within @<scopeType>.iterationScope

Why @<scopeType> must be rooted within @<scopeType>.domain

Resources

Addenda:

Attributions

wenkokke commented Mar 23, 2022

wenkokke commented Mar 23, 2022

wenkokke commented Mar 23, 2022 • edited Loading

pokey commented Mar 23, 2022

pokey commented Mar 23, 2022

wenkokke commented Mar 23, 2022 • edited Loading

pokey commented Mar 23, 2022 • edited Loading

pokey commented Mar 23, 2022

wenkokke commented Mar 23, 2022

pokey commented Mar 23, 2022

pokey commented Mar 23, 2022 • edited Loading

pokey commented Mar 23, 2022

pokey commented Mar 23, 2022 • edited Loading

pokey commented Mar 24, 2022

wenkokke commented Mar 24, 2022 • edited Loading

pokey commented Mar 24, 2022

pokey commented Mar 24, 2022

pokey commented Mar 24, 2022 • edited Loading

wenkokke commented Mar 26, 2022

wenkokke commented Mar 26, 2022

Will-Sommers commented Apr 8, 2022

pokey commented Apr 8, 2022

Will-Sommers commented Apr 12, 2022

Will-Sommers commented Apr 12, 2022 • edited Loading

pokey commented Apr 12, 2022

Will-Sommers commented May 10, 2022

pokey commented May 26, 2022

Will-Sommers commented May 26, 2022 • edited Loading

pokey commented May 27, 2022

wenkokke commented May 27, 2022 • edited Loading

wenkokke commented May 27, 2022 • edited Loading

pokey commented May 27, 2022

pokey commented May 27, 2022

AndreasArvidsson commented Jan 25, 2025

Will-Sommers commented Mar 23, 2022 •

edited by pokey

Loading

Why we need to use `@<scopeType>.domain` when searching within `@<scopeType>.iterationScope`

Why `@<scopeType>` must be rooted within `@<scopeType>.domain`

wenkokke commented Mar 23, 2022 •

edited

Loading

wenkokke commented Mar 23, 2022 •

edited

Loading

pokey commented Mar 23, 2022 •

edited

Loading

pokey commented Mar 23, 2022 •

edited

Loading

pokey commented Mar 23, 2022 •

edited

Loading

wenkokke commented Mar 24, 2022 •

edited

Loading

pokey commented Mar 24, 2022 •

edited

Loading

Will-Sommers commented Apr 12, 2022 •

edited

Loading

Will-Sommers commented May 26, 2022 •

edited

Loading

wenkokke commented May 27, 2022 •

edited

Loading

wenkokke commented May 27, 2022 •

edited

Loading