flattening out the parse table #111

lorenzwalthert · 2017-08-03T17:29:27Z

This is the first part of #106.

The nested parse table is turned into a flat parse table (called flattened_pd so it can be distinguished from pd_flat) by propagating all important attributes to the terminals first and then extracting the terminals from the nested structure. Serialization is done on that flat table, very similar to the serialization of the flat approach.
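
The two steps described above (propagate context to the terminals, then extract them) can be sketched in base R. This is only an illustration under simplifying assumptions, not the code in this PR: plain lists stand in for the nested parse table, `node()` and `flatten_nodes()` are hypothetical names, and only `indent` and `lag_newlines` are propagated.

```r
# Hypothetical stand-in for one entry of a nested parse table.
# A terminal is a node without children.
node <- function(text = "", indent = 0L, lag_newlines = 0L, children = list()) {
  list(text = text, indent = indent, lag_newlines = lag_newlines,
       children = children)
}

# Turn relative context into absolute context on the way down,
# then collect the terminals in document order.
flatten_nodes <- function(nd, outer_indent = 0L, outer_lag_newlines = 0L) {
  indent <- nd$indent + outer_indent
  lag_newlines <- nd$lag_newlines + outer_lag_newlines
  if (length(nd$children) == 0L) {
    return(data.frame(text = nd$text, indent = indent,
                      lag_newlines = lag_newlines, stringsAsFactors = FALSE))
  }
  rows <- lapply(seq_along(nd$children), function(i) {
    # Every child inherits the indention, but only the first child
    # inherits the line breaks that precede its parent.
    flatten_nodes(nd$children[[i]], indent, if (i == 1L) lag_newlines else 0L)
  })
  do.call(rbind, rows)
}

tree <- node(children = list(
  node("a"),
  node(indent = 2L, lag_newlines = 1L, children = list(node("b"), node("c")))
))
flat <- flatten_nodes(tree)
```

Here `flat` has one row per terminal, with the parent's indention added to both `b` and `c` and the parent's line break attached to `b` only.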

Uses a specialised visitor-like approach.

codecov · 2017-08-03T17:38:31Z

Codecov Report

❗ No coverage uploaded for pull request base (master@93e529b).
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master     #111   +/-   ##
=========================================
  Coverage          ?   88.29%           
=========================================
  Files             ?       18           
  Lines             ?      658           
  Branches          ?        0           
=========================================
  Hits              ?      581           
  Misses            ?       77           
  Partials          ?        0

Impacted Files	Coverage Δ
R/visit.R	`100% <100%> (ø)`
R/serialize.R	`41.3% <100%> (ø)`
R/nested.R	`98.27% <100%> (ø)`
R/serialized_tests.R	`100% <100%> (ø)`
R/transform.R	`86.66% <100%> (ø)`
R/rules-other.R	`100% <100%> (ø)`
R/parsed.R	`97.72% <100%> (ø)`
R/rules-spacing.R	`94.33% <100%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 93e529b...bcc2218.

krlmlr

Thanks. I'm not in love with the readr import (need to declare it in DESCRIPTION, too), but we can postpone changing the recursive calls to visitors until later if it helps.

krlmlr · 2017-08-03T20:09:38Z

R/visit.R

+#' @return An updated parse table.
+#' @seealso context_to_terminals
+context_towards_terminals <- function(pd_nested,
+                                      passed_lag_newlines,


Maybe outer_ or parent_ instead of passed_?

krlmlr · 2017-08-03T20:14:39Z

R/visit.R

+#'   relative in `pd_nested`) will be converted into absolute.
+#' @inherit context_towards_terminals
+#' @seealso context_towards_terminals visitors
+context_to_terminals <- function(pd_nested,


What do we need to change to use this with pre_visit() instead of a "manual" recursive call?

I tried that, but I could not find an easy way to do it. In contrast to pre_visit(), we pass scalars from one level to the next, not functions. Also, unlike in pre_visit(), we do not map over the children only, but simultaneously over other columns too (pmap() instead of map()). You could probably create a function that accommodates both (a pmap visitor, where the usual case is just p = 1), but I felt it would not make things clearer, since the two tasks are not very similar in their implementation.

How about:

visit_context_to_terminals <- function(pd) {
  pd <- context_towards_terminals(pd, pd$outer_lag_newlines, ...)
  pd$child <- map2(pd$child, ..., function(x, y) {
    x[["outer_lag_newlines"]] <- y
  })
  ...
  pd
}

This stores the information for the next stage of the visitor in the children, where it is then picked up as needed.

I did this:

context_to_terminals <- function(pd_nested) {
  if (is.null(pd_nested)) return()
  pd_transformed <- context_towards_terminals(
    pd_nested,
    pd_nested$outer_lag_newlines,
    pd_nested$outer_indent,
    pd_nested$outer_spaces
  )
  pd_transformed$child <- pmap(
    list(
      pd_transformed$child,
      pd_transformed$lag_newlines,
      pd_transformed$indent,
      pd_transformed$spaces
    ),
    function(child, lag_newlines, indent, spaces) {
      if (is.null(child)) return(NULL)
      child[1, "outer_lag_newlines"] <- lag_newlines
      child[["outer_indent"]] <- indent
      child[nrow(child), "outer_spaces"] <- spaces
      child
    }
  )
  pd_transformed
}

And changed context_towards_terminals to

context_towards_terminals <- function(pd_nested,
                                      outer_lag_newlines,
                                      outer_indent,
                                      outer_spaces) {
  pd_nested$indent <- pd_nested$indent + outer_indent
  pd_nested$lag_newlines <- pd_nested$lag_newlines + outer_lag_newlines
  pd_nested$spaces <- pd_nested$spaces + outer_spaces
  pd_nested
}

This could probably be done more elegantly with a map_if().
In addition, we need to initialise the new columns in create_filler:

pd_flat$outer_lag_newlines <- 0
pd_flat$outer_indent <- 0
pd_flat$outer_spaces <- 0

And in all functions that create new tokens, such as add_brackets_in_pipe.
All tests pass.
What do you think? I think the initial version was clearer, did not need three new columns, and had less (almost-)code duplication. Maybe there is some simplification I missed?

I would only add these columns temporarily in this visitor, and then remove them from the final result.

Anyway, I thought we should keep using visitors instead of recursive calls, but if the code is more difficult to understand with the visitor, let's keep the recursive calls.

Another advantage of using visitors though is that we can allow the user to take control over those transformations if the transformers are passed via the transformers argument. Maybe we can change it later if we consider this to be a manipulation the user should control.

krlmlr · 2017-08-03T20:17:39Z

R/visit.R

+       function(terminal, token, text, lag_newlines, spaces, indent, id,
+                parent, line1, child) {
+         if (terminal) {
+           c(lag_newlines, indent, token, text, spaces, id, parent, line1)


Better to use list() here to preserve types. (Don't need it at all if we switch to a post visitor.)

right, then we can forget about the type conversion with readr.
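
A small base R illustration of the type issue discussed here (the values are made up, only the coercion behaviour matters): `c()` coerces mixed inputs to a common type, so integers become character strings, whereas `list()` keeps each element's type.

```r
# c() coerces everything to the most general type, here character.
as_vector <- c(1L, "SYMBOL", "x")

# list() keeps the integer as an integer.
as_list <- list(lag_newlines = 1L, token = "SYMBOL", text = "x")

# A list with named scalar elements converts to a one-row data frame
# without any later type conversion step.
row <- as.data.frame(as_list, stringsAsFactors = FALSE)

typeof(as_vector)         # "character"
typeof(row$lag_newlines)  # "integer"
```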

krlmlr · 2017-08-03T20:19:13Z

R/visit.R

+#' Helper to extract terminals
+#'
+#' @param pd_nested A nested parse table.
+extract_terminals_helper <- function(pd_nested) {


Again, I think a "post" visitor might do a good job here as well.

I just tried that and it works.

extract_terminals <- function(pd_nested) {
  flattened_pd <- post_visit(pd_nested, list(extract_terminals_helper))
  flattened_pd
}

extract_terminals_helper <- function(pd_nested) {
  is_terminal <- pd_nested$terminal
  if (!any(pd_nested$terminal)) return(bind_rows(pd_nested$child))
  bound <- bind_rows(
    pd_nested$child[!is_terminal],
    pd_nested[is_terminal, ]
  )
  child_terminal <- map(pd_nested$child, function(x) {
    if (is.null(x)) return(FALSE)
    rep(TRUE, nrow(x))
  }) %>%
    flatten_lgl()
  bound[order(c(which(child_terminal), which(!child_terminal))), ]
}

As you can see, the problem is the sorting. We cannot sort according to line1 and col1 anymore because we have a rule that adds () to pipe calls, so col1 in particular becomes outdated. I don't think it's worth updating all line and column information, since we do that once everything is flattened out anyway (and there it's easy).
Do you see a better way?

I just found a bug in add_brackets_in_pipe. We can arrange according to line1 and col1 if there is no collision, i.e.

a %>% b

Will become

a %>% b()

And since there is no token after () on that line, no two tokens share the same col1 and col2 values. If there were, we might run into trouble, so I don't think it's really a good idea. I think that's more of a hack than anything else...

Instead of sorting by (line1, col1), we could create our own sequential sort key; new tokens could then get intermediate values (e.g., for the closing parens, key + 1/3 and key + 2/3).
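
A minimal sketch of such a sort key in base R, assuming a hypothetical `key` column that numbers the original tokens sequentially. Inserted tokens get fractional keys between their neighbours, so ordering by `key` restores document order without touching line1 or col1.

```r
# Original tokens of `a %>% b`, numbered sequentially.
tokens <- data.frame(
  key = c(1, 2, 3),
  text = c("a", "%>%", "b"),
  stringsAsFactors = FALSE
)

# New tokens added after `b` get intermediate keys.
new_tokens <- data.frame(
  key = c(3 + 1 / 3, 3 + 2 / 3),
  text = c("(", ")"),
  stringsAsFactors = FALSE
)

# The order in which rows are bound does not matter, only the key does.
all_tokens <- rbind(new_tokens, tokens)
sorted <- all_tokens[order(all_tokens$key), ]
```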

Yes, that's true. If you consider that to be more elegant than the initial solution, I can implement that.

I think we can leave it for now and file a separate issue.

Ok. Now a post visitor is implemented. Should I still create the issue so that we have a sorting key?

Yes, I think this would be useful, but low prio.

Reference (#112)

krlmlr · 2017-08-03T20:20:57Z

R/visit.R

+    cumsum(flattened_pd$lag_newlines) + flattened_pd$line1[1]
+
+  flattened_pd$newlines <- lead(flattened_pd$lag_newlines, default = 0)
+  flattened_pd$nchar <- nchar(flattened_pd$text)


We might want to use nchar(..., type = "width") here.
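
For context, a short base R comparison of the two `nchar()` types. For plain ASCII tokens they agree; for wide characters `type = "width"` reports display columns, and the exact value can depend on platform and locale, so no result is asserted for the last element.

```r
text <- c("x", "<-", "\u65e5\u672c")

# Number of characters in each string.
chars <- nchar(text, type = "chars")

# Number of columns each string occupies when printed in a
# monospaced font. This is what matters for computing col1 / col2.
width <- nchar(text, type = "width")
```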

krlmlr · 2017-08-03T20:21:23Z

R/visit.R

+#'  `line1`. The same applies for `col1` and `col2`.
+#' @inheritParams choose_indention
+enrich_terminals <- function(flattened_pd, use_raw_indention = FALSE) {
+  flattened_pd$lag_spaces <- lag(flattened_pd$spaces, default = 0)


Perhaps = 0L to remain in the domain of integers?

Oops, changed it elsewhere. Now I think all occurrences of default = 0 are changed to default = 0L.
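
Why the `L` suffix matters can be shown with a base R stand-in for `lag()` (the helper `shift_by_one()` is hypothetical and only mimics a lag of one position): combining an integer column with a double default silently turns the result into a double.

```r
# Hypothetical helper: shift a vector by one position, filling with `default`.
shift_by_one <- function(x, default) {
  c(default, x[-length(x)])
}

spaces <- c(1L, 0L, 2L)

with_double <- shift_by_one(spaces, default = 0)    # result is a double vector
with_integer <- shift_by_one(spaces, default = 0L)  # result stays integer
```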

krlmlr · 2017-08-03T20:24:03Z

R/visit.R

+#' @importFrom readr type_convert col_integer cols
+extract_terminals <- function(pd_nested) {
+  flat_vec <- extract_terminals_helper(pd_nested) %>%
+    unlist()


If the helper uses list(), this result can be simply combined using bind_rows().

Not sure whether I understand you correctly. I used list() instead of c() in the helper in if (terminal). This still gives me some nested list, and if I call bind_rows() on that, I get an error:

'getCharCE' must be called on a CHARSXP

Which I think is due to the fact that we try to bind nested lists. I tried to use purrr::transpose() but I could not find a quick solution.

I vaguely remember problems with bind_rows() that might be resolved in the dev version, but that's not the main point.

Maybe a visitor solution along the following lines would work around the need to sort:

visit_extract_terminals <- function(pd_nested) {
  pd_split <- split(pd_nested, seq_len(nrow(pd_nested)))
  bind_rows(ifelse(pd_nested$terminal, pd_split, pd_nested$child))
}

It may be slow, but we can optimize later.
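
The same idea for a single level, sketched in base R under simplifying assumptions: the children are kept in a separate list rather than in a `child` column, `rbind()` replaces `bind_rows()`, and `extract_one_level()` is a hypothetical name. Each row is either kept (terminal) or replaced by its already flattened child, so the result comes out in document order without sorting.

```r
pd <- data.frame(
  text = c("a", "", "d"),
  terminal = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Already flattened child of the non-terminal in row 2.
child <- data.frame(
  text = c("b", "c"),
  terminal = c(TRUE, TRUE),
  stringsAsFactors = FALSE
)
children <- list(NULL, child, NULL)

extract_one_level <- function(pd, children) {
  # One single-row data frame per row of pd.
  pd_split <- split(pd, seq_len(nrow(pd)))
  # Keep the row if it is a terminal, otherwise take its child.
  pieces <- ifelse(pd$terminal, pd_split, children)
  do.call(rbind, unname(pieces))
}

flat <- extract_one_level(pd, children)
```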

Ok, that's a good idea

Use visitor for terminal extraction instead of complicated and pmap / matrix construct.

krlmlr

Thanks, looks good. I agree with your last comment about converting to a visitor; can you please file an issue?

krlmlr · 2017-08-07T07:24:56Z

NAMESPACE

@@ -21,4 +21,7 @@ importFrom(purrr,pmap)
 importFrom(purrr,pwalk)
 importFrom(purrr,reduce)
 importFrom(purrr,when)
+importFrom(readr,col_integer)


Do we still need this?

krlmlr · 2017-08-07T07:25:54Z

R/serialize.R

@@ -66,6 +68,28 @@ serialize_parse_data_flat <- function(pd_flat) {
        collapse = "")) %>%
    .[["text_ws"]] %>%
    strsplit("\n", fixed = TRUE) %>%
-    .[[1L]]
+    .[[1L]] %>%
+    trimws(which = "right")


Why do we need to trim whitespace after serializing?

Because we don't do it elsewhere anymore and I think it's easiest done here. I think we once had a transformer for it but that does not work if we do not want to touch indention (next topic after this PR is merged) because there, the spaces after the token are the spaces after the token and the line breaks (so spaces = indention for last token on a line). In your initial version of styler, we trimmed the initial text and replaced tabs with two white spaces. Why did we remove it there again?

We did the trimming in the initial text exclusively for the roundtrip test, which we currently don't have. For serialization, we're inserting spaces ourselves, and I'd rather avoid inserting extra space (by adapting newlines_and_spaces() if necessary) instead of trimming after the fact.

Ok, you are right. In fact, it already works like that, just not for one edge case: empty comments. This is because we have the rule that adds a space at the beginning of each comment.

#'

So now I changed the rule to only add a space if the comment is non-empty (see adapted help file) and reverted 3b193f0.
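
One way such a rule could look, as a hedged base R sketch rather than the rule actually committed here: `add_space_after_comment_marker()` is a hypothetical name, and the pattern only handles a single leading `#` or `#'`. A space is inserted only when the marker is followed directly by other content, so empty comments are left alone and no trailing whitespace is produced.

```r
# Hypothetical rule: insert a space after "#" or "#'" only if non-space
# content follows directly.
add_space_after_comment_marker <- function(text) {
  # The lookahead requires a next character that is not whitespace,
  # not "'" and not "#", so "#", "#'" and "# foo" stay unchanged.
  sub("^(#'|#)(?=[^'#\\s])", "\\1 ", text, perl = TRUE)
}

styled <- add_space_after_comment_marker(c("#foo", "#'foo", "#'", "#", "# foo"))
```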

krlmlr · 2017-08-07T07:26:31Z

R/serialize.R

+    .[["text_ws"]] %>%
+    strsplit("\n", fixed = TRUE) %>%
+    .[[1L]] %>%
+    trimws(which = "right")


krlmlr · 2017-08-07T07:27:34Z

R/visit.R

+#'  `line1`. The same applies for `col1` and `col2`.
+#' @inheritParams choose_indention
+enrich_terminals <- function(flattened_pd, use_raw_indention = FALSE) {
+  flattened_pd$lag_spaces <- lag(flattened_pd$spaces, default = 0)


krlmlr · 2017-08-07T07:30:23Z

R/visit.R

+#' @importFrom readr type_convert col_integer cols
+extract_terminals <- function(pd_nested) {
+  if (is.null(pd_nested)) return(pd)
+  pd_splitted <- split(pd_nested, seq_len(nrow(pd_nested)))


splitted -> split

lorenzwalthert · 2017-08-07T08:12:23Z

Yes, issue #113 filed regarding the advantage of using visitors instead of recursive function calls for enhanced user control.

This reverts commit 3b193f0.

lorenzwalthert added 6 commits August 3, 2017 19:23

Propagate context to terminals

ef1ba00

Uses a specialised visitor-like approach.

Extract terminals from nested parse table.

f48fb9d

enrich flattened parse table with line, col etc.

75cab06

serialize flattened parse table

36a87fb

putting it all together: actually call new functions

1d90bde

FIXME. Will be fixed in future commit

7095527

lorenzwalthert requested a review from krlmlr August 3, 2017 17:29

use line1 col1 / col2 instead of line and col.

069b81a

lorenzwalthert force-pushed the pr106_1_flattening_out branch from feb2873 to 069b81a Compare August 3, 2017 18:17

krlmlr reviewed Aug 3, 2017

lorenzwalthert added 5 commits August 4, 2017 08:51

argument renaming: outer_ instead of passed_

92d261b

fix bug in add_brackets_in pipe

02f691d

Use visitor for terminal extraction

0fd8723

Use visitor for terminal extraction instead of complicated and pmap / matrix construct.

trim white space as last serialization step

3b193f0

0L instead of 0, type = "width" in nchar().

13dad41

lorenzwalthert requested a review from krlmlr August 7, 2017 07:17

krlmlr reviewed Aug 7, 2017

0L insted of 0. Now I searched through everything.

bd23590

lorenzwalthert mentioned this pull request Aug 7, 2017

Create key for sorting #112

Closed

remove readr import

4cc9382

lorenzwalthert mentioned this pull request Aug 7, 2017

user-control via transformers #113

Closed

lorenzwalthert added 2 commits August 7, 2017 10:47

Revert "trim white space as last serialization step"

c60bf90

This reverts commit 3b193f0.

don't insert space after # / #' if only spaces follow

0cf43fc

lorenzwalthert requested a review from krlmlr August 7, 2017 08:56

krlmlr approved these changes Aug 7, 2017

grammar

bcc2218

lorenzwalthert force-pushed the pr106_1_flattening_out branch from 8ddc1c1 to bcc2218 Compare August 7, 2017 09:18

lorenzwalthert merged commit 5f57d9f into r-lib:master Aug 7, 2017
