
control over discrete_scale when faceting #4180


Closed
ignaczzs opened this issue Aug 21, 2020 · 5 comments
Labels
bug an unexpected problem or unintended behavior scales 🐍

Comments

@ignaczzs

ignaczzs commented Aug 21, 2020

Hi Everyone,
I am trying to plot points from two identically structured data frames on a single faceted graph. I would like full control over the order in which the points are drawn, the order of the axis labels, and which axis labels appear in which facet panel. It seems I can only control two of these three things at once: both the breaks and the limits options of the discrete scale (scale_x_discrete()) cause various issues.
My hunch is that one value of the plotted variable is missing from one of the data frames, and that this causes the problem.
Hopefully my commented reprex will give you a clear idea of the problem.
Thanks in advance for your response,
Zsófia

### Package Load
library(ggplot2)

### Mock data frames
some.df<-data.frame(varname=c("one","two","three"),
                    b=c(0.5,0.6,0.55),
                    vartyp=c("odd","even","odd"),
                    modname=c("some","some","some"))
other.df<-data.frame(varname=c("one","two","three","four"),
                     b=c(-0.5,0.6,-0.55,-1),
                     vartyp=c("odd","even","odd","even"),
                     modname=c("other","other","other","other"))

## Creating Factor variables (relevant later)
some.df$varnameFACTOR<-factor(some.df$varname, levels=c("one","two","three","four"))
other.df$varnameFACTOR<-factor(other.df$varname, levels=c("one","two","three","four"))

### Aspects that I would like to control simultaneously
## a. Control of order of shapes/points: "other" over "some"
## b. Control of order of axis labels: higher numbers higher on the axis
## c. Control of axis labels by panels: only labels with valid data showing in panel

## Personal hack: this graph is the one I want to achieve in the end (see list of aspects) - but could never get it without the doubling of geom_point for other.df
ggplot()+
  geom_point(data=other.df, aes(x=varnameFACTOR,y=b,color=modname,shape=modname), size=4)+
  geom_point(data=some.df, aes(x=varnameFACTOR,y=b,color=modname,shape=modname),size=3)+
  geom_point(data=other.df, aes(x=varnameFACTOR,y=b,color=modname,shape=modname), size=4)+
  coord_flip(clip = "off")+
  facet_grid(rows=vars(vartyp),scales="free",switch = "y")+
  scale_x_discrete(name="", breaks=c("one","two","three","four"))

### Different versions of the graphs (with comments, of why this is not what I want)
## Varname as string

# Default graph
# a. Control of order of shapes/points: correct,
# b. Control of order of axis labels: Labels are not ordered according to "limits" option, just simply alphabetically
# c. Control of axis labels by panels: correct
ggplot()+
  geom_point(data=other.df, aes(x=varname,y=b,color=modname,shape=modname), size=4)+
  geom_point(data=some.df, aes(x=varname,y=b,color=modname,shape=modname),size=3)+
  coord_flip(clip = "off")+
  facet_grid(rows=vars(vartyp),scales="free",switch = "y")

## Graph: scale_x_discrete+limits
# a. Control of order of shapes/points: correct,
# b. Control of order of axis labels: correct
# c. Control of axis labels by panels: Labels in panels when there is no valid information
ggplot()+
  geom_point(data=other.df, aes(x=varname,y=b,color=modname,shape=modname), size=4)+
  geom_point(data=some.df, aes(x=varname,y=b,color=modname,shape=modname),size=3)+
  coord_flip(clip = "off")+
  facet_grid(rows=vars(vartyp),scales="free",switch = "y")+
  scale_x_discrete(name="", limits=c("one","two","three","four"))

## Graph: scale_x_discrete+breaks
# a. Control of order of shapes/points: correct,
# b. Control of order of axis labels: Labels are not ordered according to "limits" option, just simply alphabetically
# c. Control of axis labels by panels: correct
ggplot()+
  geom_point(data=some.df, aes(x=varname,y=b,color=modname,shape=modname),size=3)+
  geom_point(data=other.df, aes(x=varname,y=b,color=modname,shape=modname), size=4)+
  coord_flip(clip = "off")+
  facet_grid(rows=vars(vartyp),scales="free",switch = "y")+
  scale_x_discrete(name="", breaks=c("one","two","three","four"))

## Graph: scale_x_discrete+breaks+flipped around order of geom_points
# a. Control of order of shapes/points: wrong order
# b. Control of order of axis labels: Labels are not ordered according to "limits" option, just simply alphabetically
# c. Control of axis labels by panels: correct
ggplot()+
  geom_point(data=other.df, aes(x=varname,y=b,color=modname,shape=modname), size=4)+
  geom_point(data=some.df, aes(x=varname,y=b,color=modname,shape=modname),size=3)+
  coord_flip(clip = "off")+
  facet_grid(rows=vars(vartyp),scales="free",switch = "y")+
  scale_x_discrete(name="", breaks=c("one","two","three","four"))

### Varname as factor (Using a factor variable brings us closer to the solution - but is still not the solution)
# a. Control of order of shapes/points: wrong order
# b. Control of order of axis labels: Labels are not ordered according to "limits" option, just simply alphabetically
# c. Control of axis labels by panels: correct
ggplot()+
  geom_point(data=some.df, aes(x=varnameFACTOR,y=b,color=modname,shape=modname),size=3)+
  geom_point(data=other.df, aes(x=varnameFACTOR,y=b,color=modname,shape=modname), size=4)+
  coord_flip(clip = "off")+
  facet_grid(rows=vars(vartyp),scales="free",switch = "y")

## Graph: scale_x_discrete+limits
# a. Control of order of shapes/points: correct,
# b. Control of order of axis labels: correct
# c. Control of axis labels by panels: Labels in panels when there is no valid information
ggplot()+
  geom_point(data=some.df, aes(x=varnameFACTOR,y=b,color=modname,shape=modname),size=3)+
  geom_point(data=other.df, aes(x=varnameFACTOR,y=b,color=modname,shape=modname), size=4)+
  coord_flip(clip = "off")+
  facet_grid(rows=vars(vartyp),scales="free",switch = "y")+
  scale_x_discrete(name="", limits=c("one","two","three","four"))

## Graph: scale_x_discrete+breaks
# a. Control of order of shapes/points: wrong order
# b. Control of order of axis labels: Labels are not ordered according to "limits" option, just simply alphabetically
# c. Control of axis labels by panels: correct
ggplot()+
  geom_point(data=some.df, aes(x=varnameFACTOR,y=b,color=modname,shape=modname),size=3)+
  geom_point(data=other.df, aes(x=varnameFACTOR,y=b,color=modname,shape=modname), size=4)+
  coord_flip(clip = "off")+
  facet_grid(rows=vars(vartyp),scales="free",switch = "y")+
  scale_x_discrete(name="", breaks=c("one","two","three","four"))

## Graph: scale_x_discrete+breaks+flipped around order of geom_points
# a. Control of order of shapes/points: wrong order
# b. Control of order of axis labels: correct
# c. Control of axis labels by panels: correct
ggplot()+
  geom_point(data=other.df, aes(x=varnameFACTOR,y=b,color=modname,shape=modname), size=4)+
  geom_point(data=some.df, aes(x=varnameFACTOR,y=b,color=modname,shape=modname),size=3)+
  coord_flip(clip = "off")+
  facet_grid(rows=vars(vartyp),scales="free",switch = "y")+
  scale_x_discrete(name="", breaks=c("one","two","three","four"))

Created on 2020-08-31 by the reprex package (v0.3.0)

@ignaczzs ignaczzs changed the title controll over discrete_scale when faceting control over discrete_scale when faceting Aug 21, 2020
@thomasp85
Member

Thanks for the detailed description. There are indeed some weird interactions that prevent you from getting the plot you want, and AFAIK the correct approach is to encode the data as factors to ensure consistent sorting. However, it seems the factor levels are not kept when the discrete scale is trained, so the scale does not know how to combine them properly when training across multiple layers.

@thomasp85 thomasp85 added bug an unexpected problem or unintended behavior scales 🐍 labels Aug 31, 2020
@ignaczzs
Author

@thomasp85 thanks for the input! The variables in question are already factors with identical levels (see the factor conversion right after the mock data frames), and in the second batch of graphs, where I am already using factors, the scale still does not behave as expected. So I am guessing that the discrete scale uses only the factor levels actually observed in the data, not the full set of levels returned by levels().
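
The distinction suspected above can be seen in base R alone. The sketch below is only an illustration: the variable f is made up for the example, and it says nothing about what ggplot2 or scales do internally. levels() reports every declared level, while dropping unused levels leaves only those that occur in the data.

```r
# Illustrative factor mirroring some.df$varnameFACTOR:
# "four" is a declared level but never occurs in the data.
f <- factor(c("one", "two", "three"),
            levels = c("one", "two", "three", "four"))

declared <- levels(f)              # all declared levels, including the unused "four"
observed <- levels(droplevels(f))  # only the levels that actually occur
```

If a scale were trained on something like observed rather than declared, a level that is absent from one data frame would be unknown to the scale until another layer supplied it.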

@thomasp85 thomasp85 added this to the ggplot2 3.3.4 milestone Mar 25, 2021
@thomasp85
Member

We will move this to the next feature release where the change to using vctrs might provide a fix

@thomasp85 thomasp85 removed this from the ggplot2 3.4.0 milestone May 20, 2022
@thomasp85
Member

Looking into this again, this has little to do with vctrs and more to do with how RangeDiscrete$train() works.

To fix this, we need to attach the levels to the range field in RangeDiscrete after it has been trained, and keep updating them as new levels come in.
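
The idea described above can be sketched in base R. This is a rough illustration, not the actual RangeDiscrete$train() code from scales: the helper train_levels() and the "levels" attribute it uses are hypothetical names chosen for this example. It keeps the full level order alongside the trained range and updates both as each layer is added.

```r
# Hypothetical helper (not part of scales or ggplot2): merge the range trained
# so far with a new factor, remembering the full level order as an attribute.
train_levels <- function(range, x) {
  # Full level order known so far, extended by the levels of the new factor
  all_levels <- union(attr(range, "levels"), levels(x))
  # Values actually observed so far, in any order
  present <- union(range, as.character(unique(x)))
  # Keep only observed values, arranged in the full level order
  out <- all_levels[all_levels %in% present]
  attr(out, "levels") <- all_levels
  out
}

lv <- c("one", "two", "three", "four")
# Values falling into the "even" panel for each of the two layers
even_other <- factor(c("two", "four"), levels = lv)
even_some  <- factor("two", levels = lv)

trained <- train_levels(NULL, even_other)
trained <- train_levels(trained, even_some)
panel_range <- as.vector(trained)  # strip the attribute for display

# For comparison: sorting the combined values alphabetically loses the level order
alphabetical <- sort(unique(c(as.character(even_other), as.character(even_some))))
```

Here panel_range follows the factor level order, whereas alphabetical places "four" before "two".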

@teunbrand
Collaborator

I think there has been some improvement in how {scales} trains discrete ranges, so you should get the correct output with the default factor approach.

library(ggplot2)

some.df<-data.frame(varname=c("one","two","three"),
                    b=c(0.5,0.6,0.55),
                    vartyp=c("odd","even","odd"),
                    modname=c("some","some","some"))
other.df<-data.frame(varname=c("one","two","three","four"),
                     b=c(-0.5,0.6,-0.55,-1),
                     vartyp=c("odd","even","odd","even"),
                     modname=c("other","other","other","other"))

some.df$varnameFACTOR<-factor(some.df$varname, levels=c("one","two","three","four"))
other.df$varnameFACTOR<-factor(other.df$varname, levels=c("one","two","three","four"))

ggplot()+
  geom_point(data=other.df, aes(x=varnameFACTOR,y=b,color=modname,shape=modname), size=4)+
  geom_point(data=some.df, aes(x=varnameFACTOR,y=b,color=modname,shape=modname),size=3)+
  coord_flip(clip = "off")+
  facet_grid(rows=vars(vartyp),scales="free",switch = "y")

If that doesn't work for some reason, you can pass a function to the limits argument of scale_x_discrete() that returns the values in the correct order.

ggplot()+
  geom_point(data=other.df, aes(x=varname,y=b,color=modname,shape=modname), size=4)+
  geom_point(data=some.df, aes(x=varname,y=b,color=modname,shape=modname),size=3)+
  coord_flip(clip = "off")+
  facet_grid(rows=vars(vartyp),scales="free",switch = "y") +
  scale_x_discrete(limits = ~ intersect(levels(some.df$varnameFACTOR), .x))

Created on 2024-07-15 with reprex v2.1.1

As this is now easy to do, I'm going to close this issue.
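
Why the limits function above gives the intended order can be checked in base R. intersect() returns the elements of its first argument that also occur in the second, in the order of the first argument, so putting the full level vector first keeps the factor order while dropping values that are absent. In this sketch, panel_values is a made-up stand-in for the values the scale passes as .x; what ggplot2 actually passes there is an assumption.

```r
# Full intended order, as in levels(some.df$varnameFACTOR)
full_order <- c("one", "two", "three", "four")

# Hypothetical values seen by the scale, in alphabetical order
panel_values <- c("four", "two")

# Keep only the values that are present, arranged in the intended order
ordered_limits <- intersect(full_order, panel_values)
```

Swapping the arguments would instead keep the order of panel_values, which is why the level vector comes first.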

3 participants