Skip to content

Guarantee of correctness? #368

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
HenrikBengtsson opened this issue Mar 13, 2018 · 10 comments
Closed

Guarantee of correctness? #368

HenrikBengtsson opened this issue Mar 13, 2018 · 10 comments

Comments

@HenrikBengtsson
Copy link

First timer of pkg (which looks really neat) here. Without having looked into the details of the code myself, is the following statement correct:

The code produced by styler will always have the same AST ("parse tree") as the code going in?

If the above is correct, do styler functions validate this - something like all.equal(ast_out, ast_in) - before returning?

The reason why ask, is because when I get a big "hard-to-read" set of R code handed to me, I'd like to run it through a code formatter without having to worry that it might introduce bugs on top of what's already in the code. If I know ast_in and ast_out are guaranteed to be identical (formally - not just "yeah, they should be the same") then I don't have to worry about that part.

Thanks

@HenrikBengtsson
Copy link
Author

Hmm... maybe this cannot be guaranteed, e.g. since x = 1 is reformatted as x <- 1:

> styler::style_text("x = 1")
x <- 1

we don't get the same AST:

> str(as.list(parse(text = "x = 1")[[1]]))
List of 3
 $ : symbol =
 $ : symbol x
 $ : num 1

> str(as.list(parse(text = "x <- 1")[[1]]))
List of 3
 $ : symbol <-
 $ : symbol x
 $ : num 1

However, I wonder if there are other cases like this, or the assignment operators = / <- are unique in this sense. If the latter is the case, I guess I could set up a "styler" that preserves x = 1 as is and then I can validate the correctness. Only then, as a last step, I can do transformation of any x = 1 to x <- 1. I really would like to be able to do re-formatting at scale / automatically without having to worry about correctness.

@lorenzwalthert
Copy link
Collaborator

Only then, as a last step, I can do transformation of any x = 1 to x <- 1.

Exactly. Styler validates if it can validate, which is the case if you set scope = "line_breaks". The corresponding source code is here and here.

Hence, I suggest you to:

  1. Check initial code into git.
  2. Get the brand new styler 1.0.1 from CRAN with some relevant enhancements you can find in the release notes.
  3. First style with scope = "line_breaks" and hope you don't get the error mentioned above (to my best knowledge, no one ever reported that error, so I am confident styler does indeed not change the AST). Also, have a look at the other options of the style guide with ?tidyverse_style such as strict before you style.
  4. Review and commit changes styler made to your code.
  5. Style with scope = "tokens" and manually inspect the changes. Should not be that many I hope.

@lorenzwalthert
Copy link
Collaborator

For reference: Early discussion of so-called roundtrip validation were brought up in #85 and later in #140. Roundtrip validation was introduced in mainly in #287, tests were added in #289.

@lorenzwalthert
Copy link
Collaborator

For completness: Note that there are other transformations apart from = to <- that styler performs with scope = "tokens" that change the AST. Here a few examples:

  • Adding curly braces to if statements (if there are none) that are multi-line.
  • Put statements separated with ; on two lines.

In addition, for all levels of scope, comments that are not shebang, knitr chunk headers or roxygen comments are forced to start with at least one space, i.e

styler::style_text("#a", scope = "line_breaks")
#> # a

Created on 2018-03-13 by the reprex package (v0.2.0).

Maybe that should not be the case for scope < "tokens". Note that this does not trigger a validation error because we compare two parsed objects (with styler:::expressions_are_identical()), which drops all comments.

@lorenzwalthert
Copy link
Collaborator

Do you @HenrikBengtsson think we should add a note in the documentation about when a verification can be expected?

@HenrikBengtsson
Copy link
Author

Sorry for the delay and thanks for you detailed description and pointers; I finally got time to test what you describe and I can confirm it works. Below are my observations (in case someone else finds this later).

Yes, I think it would be useful to bring the "roundtrip validation" from in the documentation; it's an important concept. It may even deserve a tiny vignette by itself so it's more visible (e.g. on CRAN pages).

BTW, as seen below, I noticed that styler outputs:

Please review the changes carefully!> 

when using scope = "tokens" but not when using scope = "line_breaks" - is that referring this problem where the AST cannot be guaranteed to be preserved?

BTW2, there is a need for a newline after that message.

Roundtrip validation supported by scope = "line_breaks"

I can confirm that using scope = "line_breaks" preserves the AST;

> expr0 <- parse(file = pathname)

> res <- styler::style_file(pathname, scope = "line_breaks")
Styling  1  files:
 cstruct1.R i 
----------------------------------------
Status	Count	Legend0	File unchanged.1	File changed.0	Styling threw an error.
----------------------------------------

> expr1 <- parse(file = pathname)
> all.equal(as.list(expr1), as.list(expr0), check.attributes = FALSE)
[1] TRUE

Roundtrip validation not possible with scope = "tokens"

Then, if I style with scope = "tokens" (one level up from "line_breaks" and the most invasive as described in ?tidyverse_style)

> res <- styler::style_file(pathname, scope = "tokens")
Styling  1  files:
 cstruct1.R ℹ 
────────────────────────────────────────
Status	Count	Legend0	File unchanged.1	File changed.0	Styling threw an error.
────────────────────────────────────────
Please review the changes carefully!> 

> expr2 <- parse(file = pathname)
> all.equal(as.list(expr2), as.list(expr0), check.attributes=FALSE)
 [1] "Component 2: target, current do not match when deparsed" 
 [2] "Component 9: target, current do not match when deparsed" 
 [3] "Component 10: target, current do not match when deparsed"
...
[30] "Component 41: target, current do not match when deparsed"

and

> diffobj::diffDeparse(as.list(expr2), as.list(expr0))
< as.list(expr2)                                                                
> as.list(expr0)                                                                
@@ 1,5 / 1,5 @@                                                                 
  structure(list(library(MASS), normP <- function(P) {                          
<     P1 <- t(t(P) - colMeans(P))                                               
>     P1 = t(t(P) - colMeans(P))                                                
<     P1 <- 5 * P1/sqrt(max(rowSums(P1^2)))                                     
>     P1 = 5 * P1/sqrt(max(rowSums(P1^2)))                                      
      P1                                                                        
  }, kinetic0 <- function(p0) {                                                 
@@ 17,45 / 17,45 @@                                                             
      p0                                                                        
  }, finistructure <- function(S0, bin) {                                       
<     n <- dim(S0)[1]                                                           
>     n = dim(S0)[1]                                                            
<     N <- length(bin[, 1])                                                     
>     N = length(bin[, 1])                                                      
      if (n == N) {                                                             
          if (dim(S0)[2] == 3) {
[...]
          }                                                                     
<         iternum <- iternum + 1                                                
>         iternum = iternum + 1                                                 
      }                                                                         
      list(Ao, bo, invSigmao, Po, betao)                                        
  }), srcfile = <environment>, wholeSrcref = structure(c(1L, 0L,                
< 1070L, 0L, 0L, 0L, 1L, 1070L), srcfile = <environment>, class = "srcref"))    
> 1038L, 0L, 0L, 0L, 1L, 1038L), srcfile = <environment>, class = "srcref"))    

@lorenzwalthert
Copy link
Collaborator

The message

Please review the changes carefully!

is deliberately only printed for scope = "tokens" and only if the AST was changed. I think we'll add an note in the helpfiles. So I don't understand what your last code example is about? Is there a problem with the changed wholeSrcef?

@HenrikBengtsson
Copy link
Author

Thanks for confirming.

My code examples are just for others stumbling upon this thread and for my future self - everything works as expected. The last piece using diffobj::diffDeparse() is just to show what those 30 diffs look like (i.e. = updated to <-) plus it's a neat way to view the diff from within R (as an alternative to diff / git diff). (I didn't look into the details how to diffDeparse() how to not compare srcref - if that was you question).

@lorenzwalthert
Copy link
Collaborator

Ok. Well my question was just about the very last line. It seemed spurious that 1070L, and 1038L are flipped. But I concluded from your response that this is not of any relevance at this point, right?

@HenrikBengtsson
Copy link
Author

You're saying you'd expect 1038L vs 1070L rather than 1070L vs 1038L? I used "after" vs "before" when diffing. The "tokens" output got more lines (1070) than the "line_breaks" output (1038) because of changes like:

            if(rmoutlier){Po=fcorrect(pbin,Po)}

to

      if (rmoutlier) {
        Po <- fcorrect(pbin, Po)
      }

lorenzwalthert added a commit to lorenzwalthert/styler that referenced this issue Apr 3, 2018
HenrikBengtsson added a commit to HenrikBengtsson/hsa that referenced this issue Apr 3, 2018
This is guaranteed to preserve the AST, i.e. identical to before and after
(r-lib/styler#368).  Validated using:

stopifnot(styler:::expressions_are_identical(expr_after, expr_before))
HenrikBengtsson added a commit to HenrikBengtsson/hsa that referenced this issue Apr 3, 2018
ASTs are not identical before or after because '=' are replaced with '<-'
(r-lib/styler#368)

Validation: Visual inspection
lorenzwalthert added a commit to lorenzwalthert/styler that referenced this issue Apr 6, 2018
HenrikBengtsson added a commit to HenrikBengtsson/reconstruct-accuracy that referenced this issue Jun 12, 2018
> styler::style_file(c("GAM-perm-slice-principal-comp.R", "mFISH_HiC_3D_compare.R"), scope = "line_breaks")
Styling  2  files:
 GAM-perm-slice-principal-comp.R ℹ
 mFISH_HiC_3D_compare.R          ℹ
──────────────────────────────────────
Status Count  Legend
✔      0      File unchanged.
ℹ      2      File changed.
✖      0      Styling threw an error.
──────────────────────────────────────

This is guaranteed to give the exact same abstract syntax tree (AST), cf. r-lib/styler#368 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants