Skip to content

Verify lossless parse + serialize (without transforms) #85

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
krlmlr opened this issue Jul 25, 2017 · 10 comments
Closed

Verify lossless parse + serialize (without transforms) #85

krlmlr opened this issue Jul 25, 2017 · 10 comments

Comments

@krlmlr
Copy link
Member

krlmlr commented Jul 25, 2017

A run without any transformers enabled should yield exactly the same results as the input, minus spaces at EOL and conversion of tabs to spaces. This should be tested separately on all input files, for easier isolation of parse/serialize problems from transformer problems during later refactorings.

Reference: #79 (comment)

@krlmlr krlmlr mentioned this issue Jul 25, 2017
@lorenzwalthert
Copy link
Collaborator

This is actually not easily possible at the moment. The reason is that all space information is discarded for the last token on a line, since the column indention will take care of the space between the last token on a line and the next token. If we want to change that, we need to add lag_spaces and change the serialisation. This may not be easy.

@krlmlr
Copy link
Member Author

krlmlr commented Jul 25, 2017

Agreed. The roundtrip test doesn't need to care about spaces at EOF and tabs. We just verify that for each text t, s(p(e(t))) = e(t) with s() = serialize, p() = parse, e = strip spaces at EOF + convert tabs to spaces.

@lorenzwalthert
Copy link
Collaborator

I am not sure if I understand you correctly, but I think that is not easily possible. It is easily possible if we remove all spaces, which you probably don't mean, right? Since we add spaces after the token and not before it (that is actually why we need eol stripping as a transformer) but add new lines before it , applying no transformers and serialising back will not yield the input. If we wanted this to be possible, we had to use lag_spaces instead of spaces which we currently don't do anywhere and adapt the serialization. Also, I think issues like in #89 cannot be detected with the approach you are suggesting, so I am not sure how much we can gain and the complexity of implementing this feature, should not be underestimated.

@krlmlr
Copy link
Member Author

krlmlr commented Jul 25, 2017

I'm fine if we can find a set of transformers that will allow us to test the roundtrip: for each text t, s(x(p(e(t)))) = e(t), with (in addition) x() = minimally sufficient set of transformers.

I think the main reason why we strip spaces at all is that spaces between tokens that also have a newline aren't properly recognized: The parse data isn't different for

a +
  b

and

a +    ~
  b

(substitute ~ for another space in the second example). OTOH, spaces at the end of comments and spaces before newlines embedded into string literals are kept. So, erasing space in the text is potentially destructive.

Let's think a bit more about testing the roundtrip, just remove the two gsub() calls at https://github.com/krlmlr/styler/pull/79/files/15d2e8b5cf094034a1bc607a6f4fec9fab793863#diff-ed8a73a1134b4bd8c65cef9e3365dff5R72.

@krlmlr
Copy link
Member Author

krlmlr commented Jul 27, 2017

I just tried:

styler:::parse_transform_serialize("a <- 1", list())
#> Warning: Unknown or uninitialised column: 'lag_newlines'.

#> Warning: Unknown or uninitialised column: 'lag_newlines'.

#> Warning: Unknown or uninitialised column: 'lag_newlines'.
#> Warning: Unknown or uninitialised column: 'spaces'.
#> Warning: Unknown or uninitialised column: 'lag_newlines'.
#> Warning: Unknown or uninitialised column: 'indent'.
#> Error: Element 4 is not a vector (NULL)

I think this should at least work without warnings or errors, ideally this would return the text unchanged in all cases.

@lorenzwalthert
Copy link
Collaborator

The problems seem to vanish when implementing #94. However, since we are working with named transformers now, some adaptions may still be required to make it work with an empty list (instead of a list with named objects that are empty).

@lorenzwalthert
Copy link
Collaborator

Might have gained relevance with #187.

@lorenzwalthert
Copy link
Collaborator

lorenzwalthert commented Sep 21, 2017

Now it's actually possible to run with an "empty" style guide.

styler::style_text("1+1 + ((3))", style = styler:::create_style_guide)
#> 1+1 + ((3))

We'd still have to implement the roundtrip to make sure we don't have any parsing / serialization problems.

Not sure about tabs? Should we not just convert them to spaces as we used to? I can open a separate issue

@lorenzwalthert
Copy link
Collaborator

Closed in favor of #140.

@lorenzwalthert
Copy link
Collaborator

Reference: #368.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants