Guarantee of correctness? #368

HenrikBengtsson · 2018-03-13T02:56:39Z

First timer of pkg (which looks really neat) here. Without having looked into the details of the code myself, is the following statement correct:

The code produced by styler will always have the same AST ("parse tree") as the code going in?

If the above is correct, do styler functions validate this - something like all.equal(ast_out, ast_in) - before returning?

The reason why ask, is because when I get a big "hard-to-read" set of R code handed to me, I'd like to run it through a code formatter without having to worry that it might introduce bugs on top of what's already in the code. If I know ast_in and ast_out are guaranteed to be identical (formally - not just "yeah, they should be the same") then I don't have to worry about that part.

Thanks

The text was updated successfully, but these errors were encountered:

HenrikBengtsson · 2018-03-13T03:33:32Z

Hmm... maybe this cannot be guaranteed, e.g. since x = 1 is reformatted as x <- 1:

> styler::style_text("x = 1")
x <- 1

we don't get the same AST:

> str(as.list(parse(text = "x = 1")[[1]]))
List of 3
 $ : symbol =
 $ : symbol x
 $ : num 1

> str(as.list(parse(text = "x <- 1")[[1]]))
List of 3
 $ : symbol <-
 $ : symbol x
 $ : num 1

However, I wonder if there are other cases like this, or the assignment operators = / <- are unique in this sense. If the latter is the case, I guess I could set up a "styler" that preserves x = 1 as is and then I can validate the correctness. Only then, as a last step, I can do transformation of any x = 1 to x <- 1. I really would like to be able to do re-formatting at scale / automatically without having to worry about correctness.

lorenzwalthert · 2018-03-13T08:00:41Z

Only then, as a last step, I can do transformation of any x = 1 to x <- 1.

Exactly. Styler validates if it can validate, which is the case if you set scope = "line_breaks". The corresponding source code is here and here.

Hence, I suggest you to:

Check initial code into git.
Get the brand new styler 1.0.1 from CRAN with some relevant enhancements you can find in the release notes.
First style with scope = "line_breaks" and hope you don't get the error mentioned above (to my best knowledge, no one ever reported that error, so I am confident styler does indeed not change the AST). Also, have a look at the other options of the style guide with ?tidyverse_style such as strict before you style.
Review and commit changes styler made to your code.
Style with scope = "tokens" and manually inspect the changes. Should not be that many I hope.

lorenzwalthert · 2018-03-13T08:05:52Z

For reference: Early discussion of so-called roundtrip validation were brought up in #85 and later in #140. Roundtrip validation was introduced in mainly in #287, tests were added in #289.

lorenzwalthert · 2018-03-13T09:31:58Z

For completness: Note that there are other transformations apart from = to <- that styler performs with scope = "tokens" that change the AST. Here a few examples:

Adding curly braces to if statements (if there are none) that are multi-line.
Put statements separated with ; on two lines.

In addition, for all levels of scope, comments that are not shebang, knitr chunk headers or roxygen comments are forced to start with at least one space, i.e

styler::style_text("#a", scope = "line_breaks")
#> # a

Created on 2018-03-13 by the reprex package (v0.2.0).

Maybe that should not be the case for scope < "tokens". Note that this does not trigger a validation error because we compare two parsed objects (with styler:::expressions_are_identical()), which drops all comments.

lorenzwalthert · 2018-03-23T12:00:17Z

Do you @HenrikBengtsson think we should add a note in the documentation about when a verification can be expected?

HenrikBengtsson · 2018-03-25T20:22:03Z

Sorry for the delay and thanks for you detailed description and pointers; I finally got time to test what you describe and I can confirm it works. Below are my observations (in case someone else finds this later).

Yes, I think it would be useful to bring the "roundtrip validation" from in the documentation; it's an important concept. It may even deserve a tiny vignette by itself so it's more visible (e.g. on CRAN pages).

BTW, as seen below, I noticed that styler outputs:

Please review the changes carefully!>

when using scope = "tokens" but not when using scope = "line_breaks" - is that referring this problem where the AST cannot be guaranteed to be preserved?

BTW2, there is a need for a newline after that message.

Roundtrip validation supported by `scope = "line_breaks"`

I can confirm that using scope = "line_breaks" preserves the AST;

> expr0 <- parse(file = pathname)

> res <- styler::style_file(pathname, scope = "line_breaks")
Styling  1  files:
 cstruct1.R i 
----------------------------------------
Status	Count	Legend 
✔ 	0	File unchanged.
ℹ 	1	File changed.
✖ 	0	Styling threw an error.
----------------------------------------

> expr1 <- parse(file = pathname)
> all.equal(as.list(expr1), as.list(expr0), check.attributes = FALSE)
[1] TRUE

Roundtrip validation not possible with `scope = "tokens"`

Then, if I style with scope = "tokens" (one level up from "line_breaks" and the most invasive as described in ?tidyverse_style)

> res <- styler::style_file(pathname, scope = "tokens")
Styling  1  files:
 cstruct1.R ℹ 
────────────────────────────────────────
Status	Count	Legend 
✔ 	0	File unchanged.
ℹ 	1	File changed.
✖ 	0	Styling threw an error.
────────────────────────────────────────
Please review the changes carefully!> 

> expr2 <- parse(file = pathname)
> all.equal(as.list(expr2), as.list(expr0), check.attributes=FALSE)
 [1] "Component 2: target, current do not match when deparsed" 
 [2] "Component 9: target, current do not match when deparsed" 
 [3] "Component 10: target, current do not match when deparsed"
...
[30] "Component 41: target, current do not match when deparsed"

and

> diffobj::diffDeparse(as.list(expr2), as.list(expr0))

< as.list(expr2)                                                                
> as.list(expr0)                                                                
@@ 1,5 / 1,5 @@                                                                 
  structure(list(library(MASS), normP <- function(P) {                          
<     P1 <- t(t(P) - colMeans(P))                                               
>     P1 = t(t(P) - colMeans(P))                                                
<     P1 <- 5 * P1/sqrt(max(rowSums(P1^2)))                                     
>     P1 = 5 * P1/sqrt(max(rowSums(P1^2)))                                      
      P1                                                                        
  }, kinetic0 <- function(p0) {                                                 
@@ 17,45 / 17,45 @@                                                             
      p0                                                                        
  }, finistructure <- function(S0, bin) {                                       
<     n <- dim(S0)[1]                                                           
>     n = dim(S0)[1]                                                            
<     N <- length(bin[, 1])                                                     
>     N = length(bin[, 1])                                                      
      if (n == N) {                                                             
          if (dim(S0)[2] == 3) {
[...]
          }                                                                     
<         iternum <- iternum + 1                                                
>         iternum = iternum + 1                                                 
      }                                                                         
      list(Ao, bo, invSigmao, Po, betao)                                        
  }), srcfile = <environment>, wholeSrcref = structure(c(1L, 0L,                
< 1070L, 0L, 0L, 0L, 1L, 1070L), srcfile = <environment>, class = "srcref"))    
> 1038L, 0L, 0L, 0L, 1L, 1038L), srcfile = <environment>, class = "srcref"))

lorenzwalthert · 2018-03-25T20:43:31Z

The message

Please review the changes carefully!

is deliberately only printed for scope = "tokens" and only if the AST was changed. I think we'll add an note in the helpfiles. So I don't understand what your last code example is about? Is there a problem with the changed wholeSrcef?

HenrikBengtsson · 2018-03-25T20:55:20Z

Thanks for confirming.

My code examples are just for others stumbling upon this thread and for my future self - everything works as expected. The last piece using diffobj::diffDeparse() is just to show what those 30 diffs look like (i.e. = updated to <-) plus it's a neat way to view the diff from within R (as an alternative to diff / git diff). (I didn't look into the details how to diffDeparse() how to not compare srcref - if that was you question).

lorenzwalthert · 2018-03-25T20:57:19Z

Ok. Well my question was just about the very last line. It seemed spurious that 1070L, and 1038L are flipped. But I concluded from your response that this is not of any relevance at this point, right?

HenrikBengtsson · 2018-03-25T21:12:55Z

You're saying you'd expect 1038L vs 1070L rather than 1070L vs 1038L? I used "after" vs "before" when diffing. The "tokens" output got more lines (1070) than the "line_breaks" output (1038) because of changes like:

            if(rmoutlier){Po=fcorrect(pbin,Po)}

to

      if (rmoutlier) {
        Po <- fcorrect(pbin, Po)
      }

This is guaranteed to preserve the AST, i.e. identical to before and after (r-lib/styler#368). Validated using: stopifnot(styler:::expressions_are_identical(expr_after, expr_before))

ASTs are not identical before or after because '=' are replaced with '<-' (r-lib/styler#368) Validation: Visual inspection

> styler::style_file(c("GAM-perm-slice-principal-comp.R", "mFISH_HiC_3D_compare.R"), scope = "line_breaks") Styling 2 files: GAM-perm-slice-principal-comp.R ℹ mFISH_HiC_3D_compare.R ℹ ────────────────────────────────────── Status Count Legend ✔ 0 File unchanged. ℹ 2 File changed. ✖ 0 Styling threw an error. ────────────────────────────────────── This is guaranteed to give the exact same abstract syntax tree (AST), cf. r-lib/styler#368 (comment)

lorenzwalthert added Priority: Medium Complexity: Low Type: User question Status: Unassigned labels Mar 13, 2018

This was referenced Mar 13, 2018

Verify lossless parse + serialize (without transforms) #85

Closed

Adding roundtrip for nested processing #140

Closed

Add roundtrip #287

Merged

Roundtrip and CLI #289

Merged

lorenzwalthert added a commit to lorenzwalthert/styler that referenced this issue Apr 3, 2018

Notes on guarantee of correctness. Closes r-lib#368.

ca3eb0e

lorenzwalthert closed this as completed in 4339a22 Apr 3, 2018

HenrikBengtsson added a commit to HenrikBengtsson/hsa that referenced this issue Apr 3, 2018

CLEANUP: styler::style_text(code, scope = "tokens") [#1]

e62819b

ASTs are not identical before or after because '=' are replaced with '<-' (r-lib/styler#368) Validation: Visual inspection

lorenzwalthert added a commit to lorenzwalthert/styler that referenced this issue Apr 6, 2018

Notes on guarantee of correctness. Closes r-lib#368.

95d32c6

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Guarantee of correctness? #368

Guarantee of correctness? #368

HenrikBengtsson commented Mar 13, 2018

HenrikBengtsson commented Mar 13, 2018

Uh oh!

lorenzwalthert commented Mar 13, 2018

Uh oh!

lorenzwalthert commented Mar 13, 2018

Uh oh!

lorenzwalthert commented Mar 13, 2018

Uh oh!

lorenzwalthert commented Mar 23, 2018

Uh oh!

HenrikBengtsson commented Mar 25, 2018

Uh oh!

lorenzwalthert commented Mar 25, 2018

Uh oh!

HenrikBengtsson commented Mar 25, 2018

Uh oh!

lorenzwalthert commented Mar 25, 2018

Uh oh!

HenrikBengtsson commented Mar 25, 2018

Uh oh!

Guarantee of correctness? #368

Guarantee of correctness? #368

Comments

HenrikBengtsson commented Mar 13, 2018

HenrikBengtsson commented Mar 13, 2018

Uh oh!

lorenzwalthert commented Mar 13, 2018

Uh oh!

lorenzwalthert commented Mar 13, 2018

Uh oh!

lorenzwalthert commented Mar 13, 2018

Uh oh!

lorenzwalthert commented Mar 23, 2018

Uh oh!

HenrikBengtsson commented Mar 25, 2018

Roundtrip validation supported by scope = "line_breaks"

Roundtrip validation not possible with scope = "tokens"

Uh oh!

lorenzwalthert commented Mar 25, 2018

Uh oh!

HenrikBengtsson commented Mar 25, 2018

Uh oh!

lorenzwalthert commented Mar 25, 2018

Uh oh!

HenrikBengtsson commented Mar 25, 2018

Uh oh!

Roundtrip validation supported by `scope = "line_breaks"`

Roundtrip validation not possible with `scope = "tokens"`