Implement a reliable way to test the syntax definition #2
A while ago @Jaykul proposed using https://github-lightshow.herokuapp.com/ for testing. I think it's the right way to go about it. It's fairly editor-agnostic.
I just found this GitHub repo: https://github.com/microsoft/vscode-textmate. Looks like we can use VS Code's TextMate parser as an API, which might be helpful for testing the results of running our TextMate grammar against test files!
The VS Code team uses some custom code to test the syntax definitions they ship (including the one for PowerShell). They use a well-known example code file for each language and then use their syntax tokenizer to output a JSON file, which is then compared against a JSON file from a "known good" tokenization pass. Simple approach, but it seems to work for them. We might be able to use this as a starting point for our own CI tests. Here are some links to the relevant files:
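To make that approach concrete, here is a minimal sketch of such a baseline check. It assumes the vscode-textmate API that shows up later in this thread (`loadGrammarFromPathSync`, `tokenizeLine`); the file names `sample.ps1` and `baseline.json` are hypothetical.

```js
// Minimal sketch of a "known good baseline" check, assuming the vscode-textmate
// API used in the prototype later in this thread. File names are hypothetical.
var fs = require('fs');
var Registry = require('vscode-textmate').Registry;

var registry = new Registry();
var grammar = registry.loadGrammarFromPathSync('./PowerShellSyntax.tmLanguage');

// Tokenize every line of the sample file, carrying the rule stack forward.
var lines = fs.readFileSync('./sample.ps1', 'utf8').split(/\r?\n/);
var state = null;
var actual = lines.map(function (line) {
    var result = grammar.tokenizeLine(line, state);
    state = result.ruleStack;
    return result.tokens.map(function (token) {
        return {
            text: line.substring(token.startIndex, token.endIndex),
            scopes: token.scopes
        };
    });
});

// Compare against the checked-in "known good" tokenization.
var expected = JSON.parse(fs.readFileSync('./baseline.json', 'utf8'));
if (JSON.stringify(actual) !== JSON.stringify(expected)) {
    console.error('Tokenization differs from the baseline.');
    process.exit(1);
}
```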
Looks like the shared syntax definition repo for TypeScript has a similar solution, also using the vscode-textmate library: https://github.com/Microsoft/TypeScript-TmLanguage/tree/master/tests
As a heads-up, I'm planning to spend some time this week and next working on this test automation. Please feel free to use https://gitter.im/PowerShell/EditorSyntax if you have ideas to share about it.
TypeScript is a good example!

**Plan**

**How to evaluate tests**

I don't particularly like the idea of storing serialized regions. I would probably prefer to run the current grammar against the previous release's grammar (a tag or just a commit sha1): if both give the same results on the tests included in the previous release, CI passes. That way we don't have to store and maintain serialized tokenization results.

**How to store tests**

I like the way https://github.com/jgm/CommonMark/blob/master/spec.txt works. In SublimeText/PowerShell we have a single test file, but it's a .ps1 file. That forces everybody to write comments as PowerShell comments, which interfere with the highlighting. If we take an approach similar to CommonMark, we can use a spec document with some loose structure that lets us identify test-case regions. For simplicity and popularity I would say a markdown document.

**Test document**

It could look like the sketch below; every `powershell` code block would be treated as a separate test case. Feedback is welcome! Please let me know if you see any problems with this approach.
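Purely for illustration, such a spec document might look something like this (the section names and snippets below are invented, not taken from the actual proposal):

````markdown
## Functions

```powershell
function Get-Thing {
    param($Name)
}
```

## Strings

```powershell
$s = 'single quoted'
$d = "double quoted with $name"
```
````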
So for the CommonMark approach, do they just have a "last known good" HTML file that they're comparing the Markdown formatting against? If so, that sounds fine to me. I like having a document serve a dual purpose as both an example and a test artifact.
No, they explicitly say what the rendered view should look like, i.e.:
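Roughly, each entry in the CommonMark spec pairs a Markdown input with the exact HTML it must render to, along these lines (shown schematically; the real spec.txt wraps every example in its own delimiters):

```
*foo*
.
<p><em>foo</em></p>
```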
Ahhh! Clever, I missed that.
I started a Node.js-based prototype for the test harness that uses https://github.com/Microsoft/vscode-textmate and opened a couple of issues:
I'm comparing the old grammar from commit 0cabc46 to the current master. The example being parsed is:

```powershell
function foo() {}
function bar() {}
class XXX {}
```

Here is the current output.

**Old**

**Current**

As you can see, they are quite different already, even on this small example.
Here is the code used to produce it (I'm a total node.js newbie):

```js
var exec = require('child_process').exec;
var Parser = require('commonmark').Parser;
var Registry = require('vscode-textmate').Registry;

const gitCommitId = "0cabc46e3a40ce8d300403107b08a70708321ca6";
const grammarPath = "../PowerShellSyntax.tmLanguage";

// Dump every token of a code snippet together with its TextMate scopes.
function tokenize(codeSnippet, grammar) {
    var lineTokens = grammar.tokenizeLine(codeSnippet);
    console.log("Tokenizing:\n" + codeSnippet + "\n\n");
    for (var i = 0; i < lineTokens.tokens.length; i++) {
        var token = lineTokens.tokens[i];
        var text = codeSnippet.substr(token.startIndex, token.endIndex - token.startIndex);
        console.log('  Token "' + text + '" from ' + token.startIndex + ' to ' + token.endIndex + ' with scopes ' + token.scopes);
    }
    console.log("End tokenizing\n");
}

// Tokenize the same snippet with both grammars so the outputs can be compared.
function tokenizeCodeSnippet(codeSnippet, oldGrammarPath, newGrammarPath) {
    var oldRegistry = new Registry();
    var newRegistry = new Registry();
    console.log("oldGrammarPath: " + oldGrammarPath);
    console.log("newGrammarPath: " + newGrammarPath);
    var oldGrammar = oldRegistry.loadGrammarFromPathSync(oldGrammarPath);
    var newGrammar = newRegistry.loadGrammarFromPathSync(newGrammarPath);
    tokenize(codeSnippet, oldGrammar);
    tokenize(codeSnippet, newGrammar);
}

// Walk a markdown document and run every code block through both grammars.
function compareGrammars(oldGrammarPath, newGrammarPath) {
    var mdReader = new Parser();
    var mdDoc = mdReader.parse("Bar\n```powershell\nfunction foo() {}\nfunction bar() {}\nclass XXX {}\n```\n\n\nxxx");
    var mdWalker = mdDoc.walker();
    var mdNode;
    while (mdNode = mdWalker.next()) {
        if (mdNode.node.type == "code_block") {
            tokenizeCodeSnippet(mdNode.node.literal, oldGrammarPath, newGrammarPath);
        }
    }
}

// Extract the old grammar from git history, then compare it with the current one.
function main() {
    var path = "./" + gitCommitId + ".tmLanguage";
    var child = exec('git show ' + gitCommitId + ":" + grammarPath + " > " + path, function (err, stdout, stderr) {});
    child.on('close', (code) => {
        compareGrammars(path, grammarPath);
    });
}

main();
```
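Assuming the snippet above were saved as something like compare.js inside a clone of this repository (so that `git show` can find the grammar), it could presumably be run with `npm install vscode-textmate commonmark` followed by `node compare.js`; the file name and working directory are assumptions, not part of the original prototype.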
Looks good so far!
I forgot you were this far along with a working solution @vors :) Did you get any further on this? I have been working on a very similar solution myself, but I'm using a JSON file to describe the tests, though the markdown approach is of course much easier to author. The only problem I see is that we don't have a way of stating what the correct scopes should be in the markdown file. This is an example of how I would use a JSON file to describe the tests:

```json
[
    {
        "line": "Write-Host 'This is a single quoted string'",
        "tokens": [
            {
                "token": "Write-Host",
                "scopes": [
                    "source.powershell",
                    "meta.command.powershell",
                    "support.function.powershell"
                ]
            },
            {
                "token": "'",
                "scopes": [
                    "source.powershell",
                    "meta.command.powershell",
                    "string.quoted.single.powershell"
                ]
            },
            {
                "token": "This is a single quoted string",
                "scopes": [
                    "source.powershell",
                    "meta.command.powershell",
                    "string.quoted.single.powershell"
                ]
            },
            {
                "token": "'",
                "scopes": [
                    "source.powershell",
                    "meta.command.powershell",
                    "string.quoted.single.powershell"
                ]
            }
        ]
    }
]
```
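Purely as a sketch, a harness could consume such a JSON file with the same vscode-textmate calls used in the prototype above. The file name tests.json, the token-by-token pairing (which assumes the reference lists every token the grammar emits, whitespace included), and the exit-code reporting are all assumptions, not the actual implementation:

```js
var fs = require('fs');
var Registry = require('vscode-textmate').Registry;

var registry = new Registry();
var grammar = registry.loadGrammarFromPathSync('./PowerShellSyntax.tmLanguage');
var testCases = JSON.parse(fs.readFileSync('./tests.json', 'utf8'));

var failures = 0;
testCases.forEach(function (testCase) {
    var actualTokens = grammar.tokenizeLine(testCase.line).tokens;
    testCase.tokens.forEach(function (expected, i) {
        var actual = actualTokens[i];
        var actualText = actual
            ? testCase.line.substring(actual.startIndex, actual.endIndex)
            : undefined;
        // Both the token text and its scope stack have to match the reference file.
        if (!actual ||
            actualText !== expected.token ||
            JSON.stringify(actual.scopes) !== JSON.stringify(expected.scopes)) {
            console.error('Mismatch in "' + testCase.line + '" at token ' + i);
            failures++;
        }
    });
});
process.exit(failures === 0 ? 0 : 1);
```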
@gravejester not really, I dropped it without finishing. I tried to document my reasoning about the desirable test harness here, as well as the source code that produced these results. If you're already working on another approach, don't feel obligated to incorporate mine, but if you find anything suitable, feel free to reuse it.
@vors Ok, I will probably steal some of your code, but I'm going with a JSON file for defining the tests. This way we are not comparing "this version" with "the last version", but always testing against what we have decided should be "the truth(tm)" :) Unless someone has other ideas. It will be a hassle to create all the tests, but once they are created they should rarely change. On a different note, I found a good description of the scope names here: https://www.sublimetext.com/docs/3/scope_naming.html and I suggest we base our naming on this document.
For anyone interested, I have opted to use YAML instead of JSON for the reference file - it makes it a lot easier to read (and edit) :) I have a working version running locally now, with a really small subset of a reference file. So now starts the major job of fleshing this out.
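For illustration, the earlier JSON test case might translate to YAML roughly like this (the actual layout of the reference file may well differ):

```yaml
- line: "Write-Host 'This is a single quoted string'"
  tokens:
    - token: Write-Host
      scopes:
        - source.powershell
        - meta.command.powershell
        - support.function.powershell
    - token: "'"
      scopes:
        - source.powershell
        - meta.command.powershell
        - string.quoted.single.powershell
    - token: This is a single quoted string
      scopes:
        - source.powershell
        - meta.command.powershell
        - string.quoted.single.powershell
    - token: "'"
      scopes:
        - source.powershell
        - meta.command.powershell
        - string.quoted.single.powershell
```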
Closing as we've implemented Jasmine tests and https://github.com/kevinastone/atom-grammar-test. We can create new issues to address changes in the way the tests are written or gaps in coverage. Cleaning up old issues.
We need to find a way to test the syntax definition to ensure that new changes don't break the grammar. Ideally this could be done with as few dependencies as possible so that we can run our tests in AppVeyor when PRs are sent.
I believe @vors might have some initial ideas for how we could do that.
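A minimal sketch of what an AppVeyor configuration for a Node-based test harness could look like; the Node.js version and npm scripts below are assumptions, not settings from this repository:

```yaml
# Hypothetical appveyor.yml; the Node.js version and npm scripts are assumptions.
environment:
  nodejs_version: "6"
install:
  - ps: Install-Product node $env:nodejs_version
  - npm install
test_script:
  - npm test
build: off
```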