Implement a reliable way to test the syntax definition #2
A while ago @Jaykul proposed using https://github-lightshow.herokuapp.com/ for testing. I think it's the right way to go about it. It's fairly editor-agnostic.
I just found this GitHub repo: https://github.com/microsoft/vscode-textmate. Looks like we can use VS Code's TextMate parser as an API, which might be helpful for testing the results of running our TextMate grammar against test files!
The VS Code team uses some custom code to test the syntax definitions they ship (including the one for PowerShell). They use a well-known example code file for each language and then use their syntax tokenizer to output a JSON file, which is then compared against a JSON file from a "known good" tokenization pass. Simple approach, but it seems to work for them. We might be able to use this as a starting point for our own CI tests. Here are some links to the relevant files:
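To make that approach concrete, here is a minimal sketch of such a baseline check. It assumes the vscode-textmate API that shows up later in this thread (`loadGrammarFromPathSync`, `tokenizeLine`); the file names `sample.ps1` and `baseline.json` are hypothetical.

```js
// Minimal sketch of a "known good baseline" check, assuming the vscode-textmate
// API used in the prototype later in this thread. File names are hypothetical.
var fs = require('fs');
var Registry = require('vscode-textmate').Registry;

var registry = new Registry();
var grammar = registry.loadGrammarFromPathSync('./PowerShellSyntax.tmLanguage');

// Tokenize every line of the sample file, carrying the rule stack forward.
var lines = fs.readFileSync('./sample.ps1', 'utf8').split(/\r?\n/);
var state = null;
var actual = lines.map(function (line) {
    var result = grammar.tokenizeLine(line, state);
    state = result.ruleStack;
    return result.tokens.map(function (token) {
        return {
            text: line.substring(token.startIndex, token.endIndex),
            scopes: token.scopes
        };
    });
});

// Compare against the checked-in "known good" tokenization.
var expected = JSON.parse(fs.readFileSync('./baseline.json', 'utf8'));
if (JSON.stringify(actual) !== JSON.stringify(expected)) {
    console.error('Tokenization differs from the baseline.');
    process.exit(1);
}
```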
Looks like the shared syntax definition repo for TypeScript has a similar solution, also using the vscode-textmate library: https://github.com/Microsoft/TypeScript-TmLanguage/tree/master/tests
As a heads-up, I'm planning to spend some time this week and next working on this test automation. Please feel free to use https://gitter.im/PowerShell/EditorSyntax if you have ideas to share about it.
TypeScript is a good example!

**Plan**

**How to evaluate tests**

I don't particularly like the idea of storing serialized regions. I would probably prefer to run the current grammar against the previous release's grammar (a tag or just a commit sha1): if both give the same results on the tests included in the previous release, CI passes. That way we don't have to store and maintain serialized tokenization results.

**How to store tests**

I like the way https://github.com/jgm/CommonMark/blob/master/spec.txt works. In SublimeText/PowerShell we have a single test file, but it's a .ps1 file. That forces everybody to write comments as PowerShell comments, which interfere with the highlighting. If we take an approach similar to CommonMark, we can use a spec document with some loose structure that lets us identify test-case regions. For simplicity and popularity I would say a markdown document.

**Test document**

It could look like the sketch below; every `powershell` code block would be treated as a separate test case. Feedback is welcome! Please let me know if you see any problems with this approach.
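Purely for illustration, such a spec document might look something like this (the section names and snippets below are invented, not taken from the actual proposal):

````markdown
## Functions

```powershell
function Get-Thing {
    param($Name)
}
```

## Strings

```powershell
$s = 'single quoted'
$d = "double quoted with $name"
```
````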
So for the CommonMark approach, do they just have a "last known good" HTML file that they're comparing the Markdown formatting against? If so, that sounds fine to me. I like having a document serve a dual purpose as both an example and a test artifact.
No, they explicitly say what the rendered view should look like, i.e.:
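Roughly, each entry in the CommonMark spec pairs a Markdown input with the exact HTML it must render to, along these lines (shown schematically; the real spec.txt wraps every example in its own delimiters):

```
*foo*
.
<p><em>foo</em></p>
```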
Ahhh! Clever, I missed that.
I started a Node.js-based prototype for the test harness that uses https://github.com/Microsoft/vscode-textmate and opened a couple of issues:
I'm comparing the old grammar from commit 0cabc46 to the current master. The example being parsed is:

```powershell
function foo() {}
function bar() {}
class XXX {}
```

Here is the current output.

**Old**

**Current**

As you can see, they are quite different already, even on this small example.
Here is the code used to produce it (I'm a total node.js newbie):

```js
var exec = require('child_process').exec;
var Parser = require('commonmark').Parser;
var Registry = require('vscode-textmate').Registry;

const gitCommitId = "0cabc46e3a40ce8d300403107b08a70708321ca6";
const grammarPath = "../PowerShellSyntax.tmLanguage";

// Dump every token of a code snippet together with its TextMate scopes.
function tokenize(codeSnippet, grammar) {
    var lineTokens = grammar.tokenizeLine(codeSnippet);
    console.log("Tokenizing:\n" + codeSnippet + "\n\n");
    for (var i = 0; i < lineTokens.tokens.length; i++) {
        var token = lineTokens.tokens[i];
        var text = codeSnippet.substr(token.startIndex, token.endIndex - token.startIndex);
        console.log('  Token "' + text + '" from ' + token.startIndex + ' to ' + token.endIndex + ' with scopes ' + token.scopes);
    }
    console.log("End tokenizing\n");
}

// Tokenize the same snippet with both grammars so the outputs can be compared.
function tokenizeCodeSnippet(codeSnippet, oldGrammarPath, newGrammarPath) {
    var oldRegistry = new Registry();
    var newRegistry = new Registry();
    console.log("oldGrammarPath: " + oldGrammarPath);
    console.log("newGrammarPath: " + newGrammarPath);
    var oldGrammar = oldRegistry.loadGrammarFromPathSync(oldGrammarPath);
    var newGrammar = newRegistry.loadGrammarFromPathSync(newGrammarPath);
    tokenize(codeSnippet, oldGrammar);
    tokenize(codeSnippet, newGrammar);
}

// Walk a markdown document and run every code block through both grammars.
function compareGrammars(oldGrammarPath, newGrammarPath) {
    var mdReader = new Parser();
    var mdDoc = mdReader.parse("Bar\n```powershell\nfunction foo() {}\nfunction bar() {}\nclass XXX {}\n```\n\n\nxxx");
    var mdWalker = mdDoc.walker();
    var mdNode;
    while (mdNode = mdWalker.next()) {
        if (mdNode.node.type == "code_block") {
            tokenizeCodeSnippet(mdNode.node.literal, oldGrammarPath, newGrammarPath);
        }
    }
}

// Extract the old grammar from git history, then compare it with the current one.
function main() {
    var path = "./" + gitCommitId + ".tmLanguage";
    var child = exec('git show ' + gitCommitId + ":" + grammarPath + " > " + path, function (err, stdout, stderr) {});
    child.on('close', (code) => {
        compareGrammars(path, grammarPath);
    });
}

main();
```
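Assuming the snippet above were saved as something like compare.js inside a clone of this repository (so that `git show` can find the grammar), it could presumably be run with `npm install vscode-textmate commonmark` followed by `node compare.js`; the file name and working directory are assumptions, not part of the original prototype.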
Looks good so far!
I forgot you were this far along with a working solution @vors :) Did you get any further on this? I have been working on a very similar solution myself, but I'm using a JSON file to describe the tests, though the markdown approach is of course much easier to author. The only problem I see is that we don't have a way of stating what the correct scopes should be in the markdown file. This is an example of how I would use a JSON file to describe the tests:

```json
[
    {
        "line": "Write-Host 'This is a single quoted string'",
        "tokens": [
            {
                "token": "Write-Host",
                "scopes": [
                    "source.powershell",
                    "meta.command.powershell",
                    "support.function.powershell"
                ]
            },
            {
                "token": "'",
                "scopes": [
                    "source.powershell",
                    "meta.command.powershell",
                    "string.quoted.single.powershell"
                ]
            },
            {
                "token": "This is a single quoted string",
                "scopes": [
                    "source.powershell",
                    "meta.command.powershell",
                    "string.quoted.single.powershell"
                ]
            },
            {
                "token": "'",
                "scopes": [
                    "source.powershell",
                    "meta.command.powershell",
                    "string.quoted.single.powershell"
                ]
            }
        ]
    }
]
```
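Purely as a sketch, a harness could consume such a JSON file with the same vscode-textmate calls used in the prototype above. The file name tests.json, the token-by-token pairing (which assumes the reference lists every token the grammar emits, whitespace included), and the exit-code reporting are all assumptions, not the actual implementation:

```js
var fs = require('fs');
var Registry = require('vscode-textmate').Registry;

var registry = new Registry();
var grammar = registry.loadGrammarFromPathSync('./PowerShellSyntax.tmLanguage');
var testCases = JSON.parse(fs.readFileSync('./tests.json', 'utf8'));

var failures = 0;
testCases.forEach(function (testCase) {
    var actualTokens = grammar.tokenizeLine(testCase.line).tokens;
    testCase.tokens.forEach(function (expected, i) {
        var actual = actualTokens[i];
        var actualText = actual
            ? testCase.line.substring(actual.startIndex, actual.endIndex)
            : undefined;
        // Both the token text and its scope stack have to match the reference file.
        if (!actual ||
            actualText !== expected.token ||
            JSON.stringify(actual.scopes) !== JSON.stringify(expected.scopes)) {
            console.error('Mismatch in "' + testCase.line + '" at token ' + i);
            failures++;
        }
    });
});
process.exit(failures === 0 ? 0 : 1);
```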
@gravejester not really, I dropped it without finishing. I tried to document my reasoning about the desirable test harness here, as well as the source code that produced these results. If you're already working on another approach, don't feel obligated to incorporate mine, but if you find anything suitable, feel free to reuse it.
@vors Ok, I will probably steal some of your code, but I'm going with a JSON file for defining the tests. This way we are not comparing "this version" with "the last version", but always testing against what we have decided should be "the truth(tm)" :) Unless someone has other ideas. It will be a hassle to create all the tests, but once they are created they should rarely change. On a different note, I found a good description of the scope names here: https://www.sublimetext.com/docs/3/scope_naming.html and I suggest we base our naming on this document.
For anyone interested, I have opted to use YAML instead of JSON for the reference file - it makes it a lot easier to read (and edit) :) I have a working version running locally now, with a really small subset of a reference file. So now starts the major job of fleshing this out.
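For illustration, the earlier JSON test case might translate to YAML roughly like this (the actual layout of the reference file may well differ):

```yaml
- line: "Write-Host 'This is a single quoted string'"
  tokens:
    - token: Write-Host
      scopes:
        - source.powershell
        - meta.command.powershell
        - support.function.powershell
    - token: "'"
      scopes:
        - source.powershell
        - meta.command.powershell
        - string.quoted.single.powershell
    - token: This is a single quoted string
      scopes:
        - source.powershell
        - meta.command.powershell
        - string.quoted.single.powershell
    - token: "'"
      scopes:
        - source.powershell
        - meta.command.powershell
        - string.quoted.single.powershell
```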
Closing as we've implemented Jasmine tests and https://github.com/kevinastone/atom-grammar-test. We can create new issues to address changes in the way the tests are written or gaps in coverage. Cleaning up old issues.
We need to find a way to test the syntax definition to ensure that new changes don't break the grammar. Ideally this could be done with as few dependencies as possible so that we can run our tests in AppVeyor when PRs are sent.
I believe @vors might have some initial ideas for how we could do that.
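A minimal sketch of what an AppVeyor configuration for a Node-based test harness could look like; the Node.js version and npm scripts below are assumptions, not settings from this repository:

```yaml
# Hypothetical appveyor.yml; the Node.js version and npm scripts are assumptions.
environment:
  nodejs_version: "6"
install:
  - ps: Install-Product node $env:nodejs_version
  - npm install
test_script:
  - npm test
build: off
```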