Commit 57f0efd

Support adding languages using only queries
1 parent 16b3844 commit 57f0efd

12 files changed (+55 -79 lines)

.github/PULL_REQUEST_TEMPLATE/new_programming_language.md

+1
@@ -11,6 +11,7 @@ Adds support for the `<language>` programming language
 - [ ] `"chuck arg"` with multiple arguments in list
 - [ ] `"chuck item"` with single argument in list
 - [ ] `"chuck item"` with multiple arguments in list
+- [ ] Added `@textFragment` captures. Usually you want to put these on comment and string nodes. This enables `"take round"` to work within comments and strings.
 - [ ] Added a test for `"change round"` inside a string, eg `"hello (there)"`
 - [ ] Supported` "type"` both for type annotations (eg `foo: string`) and declarations (eg `interface Foo {}`) (and added tests for this behaviour 😊)
 - [ ] Supported` "item"` both for map pairs and list entries (with tests of course)
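
The new checklist item asks for `@textFragment` captures on comment and string nodes. A rough sketch of why that helps, assuming captures can be reduced to a name plus a character range (`Capture` and `findTextFragment` are hypothetical, not Cursorless's API): a pair-finding command such as `"take round"` could look up the innermost text fragment around the cursor and then search for delimiters as plain text inside it.

```typescript
// Hypothetical, simplified capture: a capture name plus a character-offset
// range. This is not Cursorless's real capture type.
interface Capture {
  name: string;
  start: number;
  end: number;
}

// Returns the smallest `textFragment` capture whose range contains `offset`,
// or undefined when the cursor is not inside any comment or string.
function findTextFragment(
  captures: readonly Capture[],
  offset: number,
): Capture | undefined {
  let best: Capture | undefined;
  for (const capture of captures) {
    const contains = capture.start <= offset && offset <= capture.end;
    if (capture.name !== "textFragment" || !contains) {
      continue;
    }
    // Prefer the innermost (shortest) fragment if captures overlap.
    if (best == null || capture.end - capture.start < best.end - best.start) {
      best = capture;
    }
  }
  return best;
}

// Usage: in `f("hello (there)")` the string node spans offsets 2 to 17
// (end-exclusive); "functionCall" is a made-up capture name.
const captures: Capture[] = [
  { name: "functionCall", start: 0, end: 18 },
  { name: "textFragment", start: 2, end: 17 },
];
console.log(findTextFragment(captures, 11)); // { name: 'textFragment', start: 2, end: 17 }
console.log(findTextFragment(captures, 0)); // undefined
```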

docs/contributing/adding-a-new-language.md

+5 -10
@@ -16,20 +16,15 @@ First a few notes / tips:
 - We suggest opening a draft PR as soon as possible to get early feedback. Please use the new language PR template either by adding `?template=new_programming_language` to the end of the URL you used to open the PR, or just by copying and pasting from the [template](https://github.com/cursorless-dev/cursorless/blob/main/.github/PULL_REQUEST_TEMPLATE/new_programming_language.md?plain=1) to your PR body, if that's easier.
 - We suggest adding tests as early as possible, after each language feature you add. Recording tests is quick and painless using the test case recorder described below. We promise 😇
 
-Minimum changes that each language needs:
-
-- new file in `packages/cursorless-engine/src/languages/yourCoolNewLanguage.ts`. Take a look at [existing languages](../../packages/cursorless-engine/src/languages) as a base. At its core you're implementing your language's version of the `nodeMatchers` const, mapping scope types found in [`SimpleScopeTypeType`](../api/modules/common_src_types_command_partialtargetdescriptor_types/#simplescopetypetype) with matching expressions that align with the parse tree output.
-- new entry in [`getNodeMatcher.ts:languageMatchers`](../../packages/cursorless-engine/src/languages/getNodeMatcher.ts), importing your new file above
-- new entry in [`constants.ts`](../../packages/cursorless-engine/src/languages/constants.ts)
-- new text fragment extractor (default is likely fine) in [`getTextFragmentExtractor.ts:textFragmentExtractors`](../../packages/cursorless-engine/src/languages/getTextFragmentExtractor.ts)
+To add a new language, you just need to add a `.scm` file to the [`queries` directory](../../queries). The `.scm` query format is documented [here](https://tree-sitter.github.io/tree-sitter/using-parsers#query-syntax).
 
 The parse trees exposed by tree-sitter are often pretty close to what we're
 looking for, but we often need to look for specific patterns within the parse
-tree to get the scopes that the user expects. Fortunately, we have a
-domain-specific language that makes these definitions fairly compact.
+tree to get the scopes that the user expects. Fortunately, the tree-sitter query language makes these definitions fairly compact.
 
-- Check out the [docs](parse-tree-patterns.md) for the syntax tree pattern
-matcher
+- Check out the [docs](https://tree-sitter.github.io/tree-sitter/using-parsers#query-syntax) for the query language.
+- Have a look at our custom query predicate operators in [`queryPredicateOperators.ts`](../../packages/cursorless-engine/src/languages/TreeSitterQuery/queryPredicateOperators.ts)
+- Look at the existing language definitions in the [`queries` directory](../../queries) for examples.
 - If you look in the debug console, you'll see debug output every time you move
 your cursor, which might be helpful.
 - You will likely want to look at `node-types.json` for your language, (eg [java](https://github.com/tree-sitter/tree-sitter-java/blob/master/src/node-types.json)). This file is generated from `grammar.js`, which might also be helpful to look at (eg [java](https://github.com/tree-sitter/tree-sitter-java/blob/master/grammar.js)).

docs/user/README.md

+1 -1
@@ -143,7 +143,7 @@ Note that if the mark is `"this"`, and you have multiple cursors, the modifiers
 
 ##### Syntactic scopes
 
-For programming languages where Cursorless has rich parse tree support, we support modifiers that expand to the nearest containing function, class, etc. See [the source code](../../packages/cursorless-engine/src/languages/constants.ts) for a list of supported languages. Below is a list of supported scope types, keeping in mind that this table can sometimes lag behind the actual list. Your cheatsheet (say `"cursorless cheatsheet"` with VSCode focused) will have the most up-to-date list.
+For programming languages where Cursorless has rich parse tree support, we support modifiers that expand to the nearest containing function, class, etc. See [the source code](../../queries) for a list of supported languages. Some languages are still supported using our legacy implementation; those will be listed in [here](../../packages/cursorless-engine/src/languages/LegacyLanguageId.ts). Below is a list of supported scope types, keeping in mind that this table can sometimes lag behind the actual list. Your cheatsheet (say `"cursorless cheatsheet"` with VSCode focused) will have the most up-to-date list.
 
 | Term | Syntactic element |
 | -------------- | --------------------------------------------------- |
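
The updated README points to the `queries` directory for query-based languages and to `LegacyLanguageId.ts` for the rest. A small sketch of combining the two sources into one list, assuming a language counts as supported if it has a `<languageId>.scm` file or appears in the legacy list; `listSupportedLanguages` is a hypothetical helper, not part of this commit.

```typescript
// Hypothetical helper: merges languages that have a `<languageId>.scm` query
// file with the legacy language IDs. Inputs are plain arrays so the sketch
// stays independent of the filesystem.
function listSupportedLanguages(
  queryFileNames: readonly string[],
  legacyLanguageIds: readonly string[],
): string[] {
  const queryLanguageIds = queryFileNames
    .filter((fileName) => fileName.endsWith(".scm"))
    .map((fileName) => fileName.slice(0, -".scm".length));
  // A Set removes duplicates when a language appears in both lists.
  return [...new Set([...queryLanguageIds, ...legacyLanguageIds])].sort();
}

console.log(
  listSupportedLanguages(["ruby.scm", "README.md", "python.scm"], ["python", "java"]),
); // [ 'java', 'python', 'ruby' ]
```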

packages/cursorless-engine/src/languages/LanguageDefinition.ts

+1 -2
@@ -8,7 +8,6 @@ import { ide } from "../singletons/ide.singleton";
 import { TreeSitter } from "../typings/TreeSitter";
 import { TreeSitterQuery } from "./TreeSitterQuery";
 import { TEXT_FRAGMENT_CAPTURE_NAME } from "./captureNames";
-import { LanguageId } from "./constants";
 
 /**
  * Represents a language definition for a single language, including the
@@ -35,7 +34,7 @@ export class LanguageDefinition {
    */
   static create(
     treeSitter: TreeSitter,
-    languageId: LanguageId,
+    languageId: string,
   ): LanguageDefinition | undefined {
     const queryPath = join(ide().assetsRoot, "queries", `${languageId}.scm`);
 
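
With `languageId` now a plain `string`, any language whose query file exists at `<assetsRoot>/queries/<languageId>.scm` can get a definition. The hunk does not show what `create` does when that file is missing; the sketch below assumes it simply yields `undefined`, consistent with the `?? LANGUAGE_UNDEFINED` fallback in `LanguageDefinitions.ts`. `loadQuerySource` is a hypothetical name, and it returns the raw query text rather than a parsed `TreeSitterQuery`.

```typescript
import {
  existsSync,
  mkdirSync,
  mkdtempSync,
  readFileSync,
  writeFileSync,
} from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: `assetsRoot` is a plain parameter here, standing in
// for `ide().assetsRoot` in the diff.
function loadQuerySource(
  assetsRoot: string,
  languageId: string,
): string | undefined {
  // Same layout as the diff: <assetsRoot>/queries/<languageId>.scm
  const queryPath = join(assetsRoot, "queries", `${languageId}.scm`);
  if (!existsSync(queryPath)) {
    // No query file, so no query-based definition for this language.
    return undefined;
  }
  return readFileSync(queryPath, "utf8");
}

// Usage: a throwaway assets directory with a single placeholder query file.
const assetsRoot = mkdtempSync(join(tmpdir(), "assets-"));
mkdirSync(join(assetsRoot, "queries"));
writeFileSync(join(assetsRoot, "queries", "ruby.scm"), "(comment) @comment\n");

console.log(loadQuerySource(assetsRoot, "ruby")); // prints the placeholder query
console.log(loadQuerySource(assetsRoot, "cobol")); // undefined
```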

packages/cursorless-engine/src/languages/LanguageDefinitions.ts

+1 -2
@@ -1,6 +1,5 @@
 import { TreeSitter } from "..";
 import { LanguageDefinition } from "./LanguageDefinition";
-import { LanguageId } from "./constants";
 
 /**
  * Sentinel value to indicate that a language doesn't have
@@ -43,7 +42,7 @@ export class LanguageDefinitions {
 
     if (definition == null) {
       definition =
-        LanguageDefinition.create(this.treeSitter, languageId as LanguageId) ??
+        LanguageDefinition.create(this.treeSitter, languageId) ??
         LANGUAGE_UNDEFINED;
 
       this.languageDefinitions.set(languageId, definition);
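
`LanguageDefinitions` caches the result of `LanguageDefinition.create` per language ID and stores the `LANGUAGE_UNDEFINED` sentinel when there is no definition. A minimal sketch of that caching pattern, assuming a `Map` keyed by language ID and a `Symbol` sentinel (the diff shows neither the field's type nor how the sentinel is defined); `DefinitionCache` is a hypothetical name.

```typescript
// Hypothetical sentinel meaning "already looked, no definition exists".
const LANGUAGE_UNDEFINED = Symbol("LANGUAGE_UNDEFINED");

// Minimal memoizing lookup: caching the sentinel means a language without a
// definition is only probed once.
class DefinitionCache<T extends object> {
  private readonly cache = new Map<string, T | typeof LANGUAGE_UNDEFINED>();
  private readonly create: (languageId: string) => T | undefined;

  constructor(create: (languageId: string) => T | undefined) {
    this.create = create;
  }

  get(languageId: string): T | undefined {
    let definition = this.cache.get(languageId);

    if (definition == null) {
      definition = this.create(languageId) ?? LANGUAGE_UNDEFINED;
      this.cache.set(languageId, definition);
    }

    return definition === LANGUAGE_UNDEFINED ? undefined : definition;
  }
}

// Usage: count how often the (possibly expensive) factory actually runs.
let calls = 0;
const definitions = new DefinitionCache<{ languageId: string }>((languageId) => {
  calls++;
  return languageId === "ruby" ? { languageId } : undefined;
});

definitions.get("cobol");
definitions.get("cobol");
console.log(calls); // 1
```

Using a unique `Symbol` rather than `null` keeps "not looked up yet" (`undefined` from the `Map`) distinct from "looked up, nothing found".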

@@ -0,0 +1,28 @@
+/**
+ * The language IDs that we have full tree-sitter support for using our legacy
+ * modifiers.
+ */
+export type LegacyLanguageId =
+  | "c"
+  | "clojure"
+  | "cpp"
+  | "css"
+  | "csharp"
+  | "go"
+  | "html"
+  | "java"
+  | "javascript"
+  | "javascriptreact"
+  | "json"
+  | "jsonc"
+  | "latex"
+  | "markdown"
+  | "php"
+  | "python"
+  | "ruby"
+  | "scala"
+  | "scss"
+  | "rust"
+  | "typescript"
+  | "typescriptreact"
+  | "xml";
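
The lookups in `getNodeMatcher.ts` and `getTextFragmentExtractor.ts` cast an arbitrary `string` to `LegacyLanguageId`. One common alternative, sketched here as a hypothetical and not part of this commit, is to keep the IDs in a runtime array, derive the union from it, and narrow with a type guard.

```typescript
// Hypothetical runtime list with the same 23 IDs as the `LegacyLanguageId`
// union; deriving the type from the array keeps the two in sync.
const legacyLanguageIds = [
  "c", "clojure", "cpp", "css", "csharp", "go", "html", "java",
  "javascript", "javascriptreact", "json", "jsonc", "latex", "markdown",
  "php", "python", "ruby", "scala", "scss", "rust", "typescript",
  "typescriptreact", "xml",
] as const;

type LegacyLanguageId = (typeof legacyLanguageIds)[number];

// Type guard: narrows an arbitrary language ID string without an `as` cast.
function isLegacyLanguageId(
  languageId: string,
): languageId is LegacyLanguageId {
  return (legacyLanguageIds as readonly string[]).includes(languageId);
}

// Usage
const languageId: string = "python";
if (isLegacyLanguageId(languageId)) {
  const legacyId: LegacyLanguageId = languageId; // narrowed, no cast needed
  console.log(`${legacyId} uses the legacy modifiers`);
}
console.log(isLegacyLanguageId("shellscript")); // false
```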

packages/cursorless-engine/src/languages/TreeSitterQuery/queryPredicateOperators.ts

+8 -7
@@ -6,8 +6,9 @@ import { HasSchema } from "./PredicateOperatorSchemaTypes";
 
 /**
  * A predicate operator that returns true if the node is not of the given type.
- * For example, `(not-type? string)` will match any node that is not a string.
- * It is acceptable to pass in multiple types, e.g. `(not-type? string comment)`.
+ * For example, `(not-type? @foo string)` will reject the match if the `@foo`
+ * capture is a `string` node. It is acceptable to pass in multiple types, e.g.
+ * `(not-type? @foo string comment)`.
  */
 class NotType extends QueryPredicateOperator<NotType> {
   name = "not-type?" as const;
@@ -19,9 +20,9 @@ class NotType extends QueryPredicateOperator<NotType> {
 
 /**
  * A predicate operator that returns true if the node's parent is not of the
- * given type. For example, `(not-parent-type? string)` will match any node that
- * is not a child of a string. It is acceptable to pass in multiple types, e.g.
- * `(not-parent-type? string comment)`.
+ * given type. For example, `(not-parent-type? @foo string)` will reject the
+ * match if the `@foo` capture is a child of a `string` node. It is acceptable
+ * to pass in multiple types, e.g. `(not-parent-type? @foo string comment)`.
  */
 class NotParentType extends QueryPredicateOperator<NotParentType> {
   name = "not-parent-type?" as const;
@@ -33,8 +34,8 @@ class NotParentType extends QueryPredicateOperator<NotParentType> {
 
 /**
  * A predicate operator that returns true if the node is the nth child of its
- * parent. For example, `(is-nth-child? 0)` will match the first child of any
- * node.
+ * parent. For example, `(is-nth-child? @foo 0)` will reject the match if the
+ * `@foo` capture is not the first child of its parent.
  */
 class IsNthChild extends QueryPredicateOperator<IsNthChild> {
   name = "is-nth-child?" as const;
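
The updated comments spell out that each operator takes a capture (e.g. `@foo`) and rejects the match when its check fails. The sketch below models only those semantics over a hypothetical `SketchNode` interface; it is not the implementation in `queryPredicateOperators.ts`, whose bodies this diff does not show. Two choices are assumptions: a node without a parent passes `not-parent-type?`, and `is-nth-child?` counts all children rather than only named ones.

```typescript
// Hypothetical, minimal stand-in for a tree-sitter syntax node.
interface SketchNode {
  type: string;
  parent: SketchNode | null;
  children: SketchNode[];
}

// `(not-type? @foo string comment)`: keep the match only if the node's type
// is none of the listed types.
function notType(node: SketchNode, ...types: string[]): boolean {
  return !types.includes(node.type);
}

// `(not-parent-type? @foo string comment)`: keep the match only if the
// parent's type is none of the listed types. Assumption: a root node passes.
function notParentType(node: SketchNode, ...types: string[]): boolean {
  return node.parent == null || !types.includes(node.parent.type);
}

// `(is-nth-child? @foo 0)`: keep the match only if the node is child number
// `n` (zero-based) of its parent. This sketch counts every child.
function isNthChild(node: SketchNode, n: number): boolean {
  return node.parent != null && node.parent.children.indexOf(node) === n;
}

// Usage: a call node with two children, an identifier and a string.
const call: SketchNode = { type: "call", parent: null, children: [] };
const callee: SketchNode = { type: "identifier", parent: call, children: [] };
const arg: SketchNode = { type: "string", parent: call, children: [] };
call.children.push(callee, arg);

console.log(notType(arg, "string", "comment")); // false: match rejected
console.log(notParentType(arg, "string")); // true: parent is a call
console.log(isNthChild(callee, 0)); // true
console.log(isNthChild(arg, 0)); // false
```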

packages/cursorless-engine/src/languages/constants.ts

-37
This file was deleted.

packages/cursorless-engine/src/languages/getNodeMatcher.ts

+3 -3
@@ -9,7 +9,7 @@ import {
 import { notSupported } from "../util/nodeMatchers";
 import { selectionWithEditorFromRange } from "../util/selectionUtils";
 import clojure from "./clojure";
-import { SupportedLanguageId } from "./constants";
+import { LegacyLanguageId } from "./LegacyLanguageId";
 import cpp from "./cpp";
 import csharp from "./csharp";
 import go from "./go";
@@ -31,7 +31,7 @@ export function getNodeMatcher(
   scopeTypeType: SimpleScopeTypeType,
   includeSiblings: boolean,
 ): NodeMatcher {
-  const matchers = languageMatchers[languageId as SupportedLanguageId];
+  const matchers = languageMatchers[languageId as LegacyLanguageId];
 
   if (matchers == null) {
     throw new UnsupportedLanguageError(languageId);
@@ -51,7 +51,7 @@ export function getNodeMatcher(
 }
 
 const languageMatchers: Record<
-  SupportedLanguageId,
+  LegacyLanguageId,
   Record<SimpleScopeTypeType, NodeMatcher>
 > = {
   c: cpp,
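
The lookup above casts the incoming `string` to `LegacyLanguageId` and treats a `null`/`undefined` result as unsupported. A stricter variant, sketched here as a hypothetical generic helper (and a stand-in error class whose message is made up), accepts only the record's own keys, so inherited properties of plain objects are never mistaken for entries.

```typescript
// Hypothetical stand-in for the `UnsupportedLanguageError` used in the diff.
class UnsupportedLanguageError extends Error {
  constructor(languageId: string) {
    super(`Language '${languageId}' is not supported`);
    this.name = "UnsupportedLanguageError";
  }
}

// Looks up a per-language entry, accepting only the record's own keys.
// Indexing with `record[languageId as K]` would also return inherited
// properties such as `Object.prototype.toString`, which `== null` lets through.
function lookupLanguage<K extends string, V>(
  record: Record<K, V>,
  languageId: string,
): V {
  if (!Object.prototype.hasOwnProperty.call(record, languageId)) {
    throw new UnsupportedLanguageError(languageId);
  }
  return record[languageId as K];
}

// Usage with made-up entries.
const matchers: Record<"python" | "ruby", string> = {
  python: "python matchers",
  ruby: "ruby matchers",
};
console.log(lookupLanguage(matchers, "ruby")); // ruby matchers
try {
  lookupLanguage(matchers, "toString");
} catch (error) {
  console.log((error as Error).message); // Language 'toString' is not supported
}
```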

packages/cursorless-engine/src/languages/getTextFragmentExtractor.ts

+6 -6
@@ -3,7 +3,7 @@ import type { SyntaxNode } from "web-tree-sitter";
 import { SelectionWithEditor } from "../typings/Types";
 import { notSupported } from "../util/nodeMatchers";
 import { getNodeInternalRange, getNodeRange } from "../util/nodeSelectors";
-import { SupportedLanguageId } from "./constants";
+import { LegacyLanguageId } from "./LegacyLanguageId";
 import { getNodeMatcher } from "./getNodeMatcher";
 import { stringTextFragmentExtractor as htmlStringTextFragmentExtractor } from "./html";
 import { stringTextFragmentExtractor as jsonStringTextFragmentExtractor } from "./json";
@@ -18,7 +18,7 @@ export type TextFragmentExtractor = (
 ) => Range | null;
 
 function constructDefaultTextFragmentExtractor(
-  languageId: SupportedLanguageId,
+  languageId: LegacyLanguageId,
   stringTextFragmentExtractor?: TextFragmentExtractor,
 ): TextFragmentExtractor {
   const commentNodeMatcher = getNodeMatcher(languageId, "comment", false);
@@ -51,7 +51,7 @@ function constructDefaultTextFragmentExtractor(
 }
 
 function constructDefaultStringTextFragmentExtractor(
-  languageId: SupportedLanguageId,
+  languageId: LegacyLanguageId,
 ): TextFragmentExtractor {
   const stringNodeMatcher = getNodeMatcher(languageId, "string", false);
 
@@ -79,7 +79,7 @@ function constructDefaultStringTextFragmentExtractor(
  * @returns The range of the string text or null if the node is not a string
  */
 function constructHackedStringTextFragmentExtractor(
-  languageId: SupportedLanguageId,
+  languageId: LegacyLanguageId,
 ) {
   const stringNodeMatcher = getNodeMatcher(languageId, "string", false);
 
@@ -105,7 +105,7 @@ function constructHackedStringTextFragmentExtractor(
 export default function getTextFragmentExtractor(
   languageId: string,
 ): TextFragmentExtractor {
-  const extractor = textFragmentExtractors[languageId as SupportedLanguageId];
+  const extractor = textFragmentExtractors[languageId as LegacyLanguageId];
 
   if (extractor == null) {
     throw new UnsupportedLanguageError(languageId);
@@ -121,7 +121,7 @@ type FullDocumentTextFragmentExtractor = null;
 const fullDocumentTextFragmentExtractor = null;
 
 const textFragmentExtractors: Record<
-  SupportedLanguageId,
+  LegacyLanguageId,
   TextFragmentExtractor | FullDocumentTextFragmentExtractor
 > = {
   c: constructDefaultTextFragmentExtractor("c"),
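
The `textFragmentExtractors` table maps each legacy language either to an extractor or to `null`, and the type name `FullDocumentTextFragmentExtractor` suggests that `null` means the whole document is treated as one fragment. This hunk does not show how callers handle that case. Purely as an illustration of modeling the three states (extractor, whole document, unsupported) explicitly, with all names and entries hypothetical:

```typescript
// Hypothetical, simplified types: an extractor maps a cursor offset to the
// range of the fragment around it, or null if there is none.
type FragmentRange = { start: number; end: number };
type Extractor = (offset: number) => FragmentRange | null;

// A `null` entry stands for "treat the whole document as the fragment";
// a missing entry means the language is not supported.
type ExtractorEntry = Extractor | null;

type Resolution =
  | { kind: "extractor"; extractor: Extractor }
  | { kind: "fullDocument" }
  | { kind: "unsupported" };

function resolveExtractor(
  extractors: Record<string, ExtractorEntry>,
  languageId: string,
): Resolution {
  const entry = Object.prototype.hasOwnProperty.call(extractors, languageId)
    ? extractors[languageId]
    : undefined;
  if (entry === undefined) {
    return { kind: "unsupported" };
  }
  if (entry === null) {
    return { kind: "fullDocument" };
  }
  return { kind: "extractor", extractor: entry };
}

// Usage with made-up entries.
const extractors: Record<string, ExtractorEntry> = {
  python: (offset) => (offset < 10 ? { start: 0, end: 10 } : null),
  markdown: null,
};
console.log(resolveExtractor(extractors, "python").kind); // extractor
console.log(resolveExtractor(extractors, "markdown").kind); // fullDocument
console.log(resolveExtractor(extractors, "cobol").kind); // unsupported
```

A discriminated union makes each caller handle all three outcomes explicitly instead of folding two of them into one `== null` check.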

packages/cursorless-engine/src/languages/index.ts

-7
This file was deleted.

packages/cursorless-engine/src/tokenizer/tokenizer.test.ts

+1 -4
@@ -1,7 +1,6 @@
 import * as assert from "assert";
 import { flatten, range } from "lodash";
 import { tokenize } from ".";
-import { LanguageId } from "../languages/constants";
 import { unitTestSetup } from "../test/unitTestSetup";
 
 type TestCase = [string, string[]];
@@ -110,9 +109,7 @@ const shellScriptDialectTokenizerTests: LanguageTokenizerTests = {
   exclusionPredicate: (input: string) => !!input.match("-"),
 };
 
-const languageTokenizerTests: Partial<
-  Record<LanguageId, LanguageTokenizerTests>
-> = {
+const languageTokenizerTests: Record<string, LanguageTokenizerTests> = {
   css: cssDialectTokenizerTests,
   scss: cssDialectTokenizerTests,
   shellscript: shellScriptDialectTokenizerTests,
