Add 'row.names' into ggplot_build(...)$data very useful for grouped geom_boxplot #4912

DiegoJArg · 2022-07-22T18:37:23Z

Hi.
After seeing in my boxplot that I may have some incomplete groups of data under certain customer_ids, I wanted to see how filtering them out would change the rest of the plot.

My first idea was to take the values already calculated for the drawn boxplot, filter that data frame, and draw the plot again.

Having successfully extracted $data, I realized that my customer_ids are not listed in any column of the resulting data frame. Instead, I saw a $y column with numeric values which, I am guessing, represents the order of the grouping variable.

The plot object probably contains the grouping labels, but the "data" structure does not, and it would help to have them there.

The code below confirms that no labels appear in $data, and then assigns them.

I had to confirm that the sequence in $y matched the order of the grouping variable, which is a factor().
I did that by fixing the seed and inspecting the resulting order in RStudio.
However, this is not ideal, as I don't know how sorting and data types are handled internally.
Ideally, the original group names would be preserved.

library(ggplot2)
library(magrittr)  # provides %>%

set.seed(111)
DF = data.frame(
  id  = factor( rep(LETTERS[1:5], 100), levels=LETTERS[1:5] ),
  COL = sample(1:20, 100, replace=TRUE)   # 100 values, recycled across the 500 rows
)

# A     B       C     D      E
# 6.0   12.0    8.0   12.5   10.0  <-- xmiddle (the median)

bp = ggplot( DF , aes( COL, id ) ) + geom_boxplot()
bp

# === SEARCH FOR categorical labels + median values ===

# Getting the built data: a list with one data frame per layer
Qggbp  = ggplot_build( bp )$data
typeof(Qggbp)    # list
Qggbp            # prints the list holding the layer's data frame
row.names(Qggbp) # -> NULL
Qggbp$y          # -> NULL

# Getting the boxplot layer's data frame
Qggbp  = Qggbp[[1]]
typeof(Qggbp)    # list (a data frame is stored as a list)
Qggbp            # prints as a data frame
row.names(Qggbp) # -> [1] "1" "2" "3" "4" "5"

# Realising that the groups are numbered instead of labeled
Qggbp$y          # -> [1] 1 2 3 4 5   /  attr(,"class")  /  [1] "mapped_discrete" "numeric"
Qggbp$y %>% as.numeric # -> [1] 1 2 3 4 5

# Setting the row names to the group labels they are associated with.
row.names(Qggbp) <- levels( DF$id )
Qggbp

While writing this I found this question from 5 years ago


yutannihilation · 2022-07-23T05:19:15Z

I might not understand your request, but, as the last example in geom_boxplot()'s documentation shows, you can calculate the necessary statistics beforehand if you need more control over the process. ggplot_build(...)$data just exposes the internal data, mainly for debugging, not for usability.

https://ggplot2.tidyverse.org/reference/geom_boxplot.html#ref-examples
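
As a rough illustration of the "calculate the statistics beforehand" route, the sketch below computes one row of box statistics per group in base R, keeps the group label as an ordinary column, and draws the boxes with geom_boxplot(stat = "identity"). The data frame DF mirrors the one in the original post; the helper box_row and the object box_stats are hypothetical names. The whisker rule (most extreme values within 1.5 × IQR of the quartiles) is an assumption meant to approximate the default boxplot, and outliers are not drawn.

```r
library(ggplot2)

set.seed(111)
DF <- data.frame(
  id  = factor(rep(LETTERS[1:5], 20), levels = LETTERS[1:5]),
  COL = sample(1:20, 100, replace = TRUE)
)

# Hypothetical helper: box statistics for one numeric vector.
# Assumption: whiskers reach the most extreme values within 1.5 * IQR of the quartiles.
box_row <- function(v) {
  q      <- quantile(v, c(0.25, 0.5, 0.75), names = FALSE)
  iqr    <- q[3] - q[1]
  inside <- v[v >= q[1] - 1.5 * iqr & v <= q[3] + 1.5 * iqr]
  data.frame(ymin = min(inside), lower = q[1], middle = q[2],
             upper = q[3], ymax = max(inside))
}

# One row per group; the group label is stored explicitly in the `id` column.
stats_list <- lapply(split(DF$COL, DF$id), box_row)
box_stats  <- do.call(rbind, stats_list)
box_stats$id <- factor(names(stats_list), levels = levels(DF$id))

# Draw from the precomputed statistics; coord_flip() gives horizontal boxes.
p <- ggplot(box_stats, aes(x = id)) +
  geom_boxplot(
    aes(ymin = ymin, lower = lower, middle = middle, upper = upper, ymax = ymax),
    stat = "identity"
  ) +
  coord_flip()
```

Because box_stats is a plain data frame with an id column, incomplete groups can be filtered out with ordinary subsetting before plotting, without reading anything back from ggplot_build(). Printing p draws the plot.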

DiegoJArg · 2022-07-23T09:51:50Z

Hi @yutannihilation,
I am not sure why you closed this; nothing changed.
ggplot_build(...)$data is the core of the feature request.

The feature request is just to add row names to ggplot_build(...)$data by default, according to the grouping variable, or alternatively a column with the associated grouping values. It isn't an intrusive feature.

BPdata = ggplot_build(...)$data[[1]]            # BPdata has rows without grouping labels
row.names(BPdata) <- levels( group_factor )     # Workaround: each row now has its respective group label

Also, the help page doesn't mention "only debugging, not for usability".

yutannihilation · 2022-07-23T11:13:39Z

I meant, ggplot_build(...)$data is the as-is data that is used internally. It's not where we modify or add features. If the internal data has no row names, we don't add row names (c.f. #4868 (comment)), sorry.

DiegoJArg · 2022-07-23T12:08:50Z

I have to say that I don't understand the reasoning.
And it won't hurt me if the feature is not accepted. :)

But you already have $y in ggplot_build(...)$data, which, I am guessing, is a numeric row identifier.
It is probably meant for ordering, and the ordering probably comes from the grouping variable.
The grouping variable is also there in the coordinates.

I checked whether $y and the default $data order match the order of the grouping variable's levels, levels( group_factor ), and they did.

My main objective was to get a guaranteed identifier for the rows of $data that maps back to the initial variable group_factor.

Right now, row.names(BPdata) <- levels( group_factor ) is the simplest way I have found, but it relies on $data not coming back in a different order.
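
A slightly less order-dependent variant of this workaround is to index the factor levels by the numeric $y value of each row, rather than assuming the rows arrive in level order. The sketch below is only that, a sketch: it assumes the default discrete y scale, where positions 1..n follow the order of the factor levels present in the data (hence droplevels()), and it relies on the internal $data layout, which the maintainers describe as not being a stable interface. The column name id added to BPdata is a hypothetical choice.

```r
library(ggplot2)

set.seed(111)
DF <- data.frame(
  id  = factor(rep(LETTERS[1:5], 20), levels = LETTERS[1:5]),
  COL = sample(1:20, 100, replace = TRUE)
)

bp     <- ggplot(DF, aes(COL, id)) + geom_boxplot()
BPdata <- ggplot_build(bp)$data[[1]]

# Assumption: with the default discrete scale, y == k corresponds to the
# k-th level of the grouping factor that actually occurs in the data.
group_levels <- levels(droplevels(DF$id))
BPdata$id    <- factor(group_levels[as.integer(BPdata$y)], levels = group_levels)

# Group label next to its numeric position and median
labelled <- BPdata[, c("id", "y", "xmiddle")]
```

Looking up the label through $y means that a reordering of the rows in $data would not silently attach the wrong label, although a custom scale (for example, changed limits) would still break the assumption.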

thomasp85 · 2022-07-23T12:28:00Z

$y is the y aesthetic, nothing more, nothing less. The internal data structure of ggplot2 is without row names because they are 100% unneeded and give a huge performance penalty in R. So we won't add this to support a niche case like this.

DiegoJArg · 2022-07-23T15:20:28Z

My "y" aesthetic is a factor, not a double.
But that's OK, as long as the numeric value is related to it.
Thanks for answering.

yutannihilation · 2022-07-23T15:41:26Z

Again, it's internal data. Numeric is the internal representation. You might feel OK or not OK about it, but it is what it is.

yutannihilation closed this as completed Jul 23, 2022
