as.treedata is not compatible with merge manipulation? #36


Closed
6 tasks done
ETaSky opened this issue Jul 1, 2020 · 4 comments


ETaSky commented Jul 1, 2020

Prerequisites

  • Have you read Feedback and followed the guide?
    • make sure you are using the latest release version
    • read the documents
    • google your question/issue

Describe your issue

I am having trouble with "manipulating tree data using tidy interface". Briefly, I created a tree, converted it to a tibble with as_tibble, and manipulated it; after that, the tibble can no longer be converted back with as.treedata.

  • Make a reproducible example
  • your code should contain comments to describe the problem (e.g. what was expected and what actually happened?)
# This is a tree I created using the taxonomy info of several genera, the branch length is fake
Tre = "(((((D_5_-G1:20)D_4_Enterobacteriaceae:20)D_3_Enterobacteriales:20)D_2_Gammaproteobacteria:20)D_1_Proteobacteria:20,((((D_5_Agathobacter-G2:20,D_5_CAG.56-G3:20)D_4_Lachnospiraceae:20,(D_5_Ruminococcaceae.UCG.010-G10:20)D_4_Ruminococcaceae:20)D_3_Clostridiales:20)D_2_Clostridia:20,(((D_5_Asteroleplasma-G4:20)D_4_Erysipelotrichaceae:20)D_3_Erysipelotrichales:20)D_2_Erysipelotrichia:20,(((D_5_Dialister-G9:20)D_4_Veillonellaceae:20)D_3_Selenomonadales:20)D_2_Negativicutes:20)D_1_Firmicutes:20,((((D_5_Bacteroides-G5:20)D_4_Bacteroidaceae:20)D_3_Bacteroidales:20)D_2_Bacteroidia:20)D_1_Bacteroidetes:20,((((D_5_Candidatus.Lumbricincola-G6:20)D_4_Mycoplasmataceae:20)D_3_Mycoplasmatales:20,((D_5_uncultured.bacterium-G7:20)D_4_uncultured.bacterium:20,(D_5_-G8:20)D_4_:20)D_3_Mollicutes.RF39:20)D_2_Mollicutes:20)D_1_Tenericutes:20)D_0_Bacteria:1;"

# load the packages used below
library(treeio)
library(dplyr)

# convert to treeio tree object
Tre_td <- as.treedata(ape::read.tree(text = Tre))
# convert to tibble
Tre_tb <- as_tibble(Tre_td)
str(Tre_tb)
# test manipulation: join a new column onto the tibble by label
Tre_tb_t <- merge(Tre_tb, Tre_tb %>% select(4) %>% mutate(Test = "AAA"), by.x = 4, by.y = 1) %>% as_tibble()
str(Tre_tb_t)
# expected: a treedata object
# actual: Error in check_edgelist(x) : Cannot find root. network is not a tree!
Tre_tb_t %>% as.treedata()

After some digging, I think the problem is that merge is not compatible with treeio. The output of str(Tre_tb) shows (S3: tbl_tree/tbl_df/tbl/data.frame), whereas the output of str(Tre_tb_t) shows (S3: tbl_df/tbl/data.frame), so the tbl_tree class is lost. This does not happen when the manipulation is performed with dplyr. I guess this is where the problem arises.
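The class loss described above can be reproduced in base R without treeio. The sketch below is only an illustration: `toy_tree` is a made-up class name standing in for `tbl_tree`, and the small node table is invented for the example.

```r
# Toy node table; "toy_tree" is a hypothetical class standing in for tbl_tree.
nodes <- data.frame(
  parent = c(4, 4, 5, 5, 4),
  node   = c(1, 5, 2, 3, 4),
  label  = c("A", "n5", "B", "C", "root")
)
class(nodes) <- c("toy_tree", class(nodes))

# Extra column to join on, mirroring the mutate(Test = "AAA") step.
annot <- data.frame(label = nodes$label, Test = "AAA")

# merge() falls through to merge.data.frame, which returns a plain data.frame.
merged <- merge(nodes, annot, by = "label")
class(merged)                    # the "toy_tree" subclass is no longer present

# The class can be assigned back by hand after the merge.
restored <- merged
class(restored) <- class(nodes)
inherits(restored, "toy_tree")
```

Whether re-assigning the class is enough for a real `tbl_tree` depends on what the downstream method expects from the columns, so treat this as a demonstration of the mechanism rather than a complete workaround.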

As a side note, as_tibble(Tre_td) generates a warning message for some reason:

Warning message:
Unknown or uninitialised column: `node`.

I am not sure why.

Thank you!

Jincheng

Session Info
R version 4.0.0 (2020-04-24)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] forcats_0.5.0   stringr_1.4.0   dplyr_1.0.0     purrr_0.3.4     readr_1.3.1     tidyr_1.1.0     tibble_3.0.1    ggplot2_3.3.2   tidyverse_1.3.0
[10] treeio_1.12.0  

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6     cellranger_1.1.0 pillar_1.4.4     compiler_4.0.0   dbplyr_1.4.4     tools_4.0.0      lubridate_1.7.9  jsonlite_1.7.0   tidytree_0.3.3  
[10] lifecycle_0.2.0  nlme_3.1-147     gtable_0.3.0     lattice_0.20-41  pkgconfig_2.0.3  rlang_0.4.6      reprex_0.3.0     cli_2.0.2        DBI_1.1.0       
[19] rstudioapi_0.11  parallel_4.0.0   haven_2.3.1      withr_2.2.0      xml2_1.3.2       httr_1.4.1       fs_1.4.2         hms_0.5.3        generics_0.0.2  
[28] vctrs_0.3.1      grid_4.0.0       tidyselect_1.1.0 glue_1.4.1       R6_2.4.1         fansi_0.4.1      readxl_1.3.1     modelr_0.1.8     blob_1.2.1      
[37] magrittr_1.5     backports_1.1.8  scales_1.1.1     ellipsis_0.3.1   rvest_0.3.5      assertthat_0.2.1 ape_5.4          colorspace_1.4-1 stringi_1.4.6   
[46] lazyeval_0.2.2   munsell_0.5.0    broom_0.5.6      crayon_1.3.4    
ETaSky changed the title from "as.treedata not compatible with merge manipulation" to "as.treedata is not working after tbl manipulation" on Jul 1, 2020
ETaSky changed the title from "as.treedata is not working after tbl manipulation" to "as.treedata is not compatible with merge manipulation?" on Jul 1, 2020

GuangchuangYu commented Jul 2, 2020 via email


ETaSky commented Jul 2, 2020

@GuangchuangYu Thank you! I tried to make it work by using full_join and it is good to know that the class can be assigned.

It seems that without the "tbl_tree" class, as.treedata dispatches to the as.treedata.tbl_df function, in which check_edgelist raises the error message.

check_edgelist <- function(edgelist) {
    if (dim(edgelist)[2] < 2)
        stop("input should be a matrix of edge list that holds the relationships in the first two columns")
    if (length(unique(edgelist[[1]])) > length(unique(edgelist[[2]]))) {
        children <- edgelist[[1]]
        parents <- edgelist[[2]]
    } else {
        children <- edgelist[[2]]
        parents <- edgelist[[1]]
    }
    root <- unique(parents[!(parents %in% children)])
    if (length(root) != 1)
        stop("Cannot find root. network is not a tree!")

    matrix(c(parents, children), ncol=2)
}

In my case, the line root <- unique(parents[!(parents %in% children)]) returns an empty vector, because every parent also appears among the children. I think this is not uncommon: even the root node appears in the table as a row whose parent and child have the same node number. As a result, I think maybe this function should be updated?
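The empty result can be seen on a small invented edge table in which the root lists itself as its own parent. The second half is only one possible way to handle such rows, shown as a sketch; it is not the fix adopted in treeio.

```r
# Toy parent/node table: node 4 is the root and is recorded as its own parent.
edges <- data.frame(
  parent = c(4, 4, 5, 5, 4),
  node   = c(1, 5, 2, 3, 4)
)

parents  <- edges$parent
children <- edges$node

# Same expression as in check_edgelist: every parent is also a child here,
# so nothing is left over and the root cannot be identified.
root_naive <- unique(parents[!(parents %in% children)])
length(root_naive)

# Hypothetical alternative: ignore self-referencing rows before looking
# for a parent that never occurs as a child.
keep <- edges$parent != edges$node
root_fixed <- unique(parents[keep][!(parents[keep] %in% children[keep])])
root_fixed
```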

Thanks!

GuangchuangYu commented

I noticed this too.

I think a better solution is to implement a merge method.

If you install the GitHub version of tidytree, the following code should work:

# This is a tree I created using the taxonomy info of several genera, the branch length is fake
Tre = "(((((D_5_-G1:20)D_4_Enterobacteriaceae:20)D_3_Enterobacteriales:20)D_2_Gammaproteobacteria:20)D_1_Proteobacteria:20,((((D_5_Agathobacter-G2:20,D_5_CAG.56-G3:20)D_4_Lachnospiraceae:20,(D_5_Ruminococcaceae.UCG.010-G10:20)D_4_Ruminococcaceae:20)D_3_Clostridiales:20)D_2_Clostridia:20,(((D_5_Asteroleplasma-G4:20)D_4_Erysipelotrichaceae:20)D_3_Erysipelotrichales:20)D_2_Erysipelotrichia:20,(((D_5_Dialister-G9:20)D_4_Veillonellaceae:20)D_3_Selenomonadales:20)D_2_Negativicutes:20)D_1_Firmicutes:20,((((D_5_Bacteroides-G5:20)D_4_Bacteroidaceae:20)D_3_Bacteroidales:20)D_2_Bacteroidia:20)D_1_Bacteroidetes:20,((((D_5_Candidatus.Lumbricincola-G6:20)D_4_Mycoplasmataceae:20)D_3_Mycoplasmatales:20,((D_5_uncultured.bacterium-G7:20)D_4_uncultured.bacterium:20,(D_5_-G8:20)D_4_:20)D_3_Mollicutes.RF39:20)D_2_Mollicutes:20)D_1_Tenericutes:20)D_0_Bacteria:1;"

# convert to treeio tree object
Tre_td <- as.treedata(ape::read.tree(text = Tre))
# convert to tibble
Tre_tb <- as_tibble(Tre_td)
str(Tre_tb)
# test manipulation

#############################
## merge(tbl_tree, ...) now outputs a tbl_tree object
#############################
Tre_tb_t <- merge(Tre_tb, Tre_tb %>% select(4) %>% mutate(Test = "AAA"), by.x = 4, by.y =1) 
str(Tre_tb_t)
Tre_tb_t %>% as.treedata() 
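The general idea of a class-preserving merge method can be sketched in base R with S3 dispatch. This is not the tidytree implementation: `toy_tree_m` and `merge.toy_tree_m` are hypothetical names used only for illustration.

```r
# Toy node table with a made-up subclass "toy_tree_m".
nodes <- data.frame(
  parent = c(4, 4, 5, 5, 4),
  node   = c(1, 5, 2, 3, 4),
  label  = c("A", "n5", "B", "C", "root")
)
class(nodes) <- c("toy_tree_m", class(nodes))

# S3 method for the hypothetical class: delegate the actual work to the
# next method (merge.data.frame), then put the original class back.
merge.toy_tree_m <- function(x, y, ...) {
  res <- NextMethod()
  class(res) <- class(x)
  res
}

annot  <- data.frame(label = nodes$label, Test = "AAA")
merged <- merge(nodes, annot, by = "label")
inherits(merged, "toy_tree_m")
```

With a method like this in place, callers keep using `merge()` as before and do not need to restore the class by hand.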


ETaSky commented Jul 10, 2020

Thanks for the quick fix.
