Function to create initial data docs #1681

Open
focardozom opened this issue Sep 20, 2022 · 13 comments
Labels
docs 💡 (documentation, news, vignettes, website, etc)
feature (a feature request or enhancement)
tidy-dev-day 🤓 (Tidyverse Developer Day)

Comments

@focardozom

Hi! Documenting data is always tricky, so having a function to help people document a dataset would be fantastic. This function could create a .R file in the R folder containing information gathered from the dataset. For example, the information can be inserted into a roxygen template, and the @Items values can be filled in using glue(). The template could also include some descriptive information to help users understand the dataset better.

@jennybc
Member

jennybc commented Sep 20, 2022

Would you like to make a PR for consideration? Full disclosure: I'm not 100% convinced that usethis should do this. But this is the topic of an issue I recently closed in R Packages, which contains some concrete ideas to start with.

hadley/r-pkgs#707

@jennybc
Member

jennybc commented Sep 21, 2022

As I continue to work on R Packages, I've learned there are packages that already offer functions to do this. One example is sinew (https://cran.r-project.org/web/packages/sinew/index.html). So given that there are solutions out there already, I don't think it's a priority for us to add this to usethis.

@jennybc closed this as not planned Sep 21, 2022
@ngreifer

ngreifer commented Nov 9, 2022

Just wanted to add to this since I had the same thought. sinew does not fit well into the package development ecosystem for usethis users; it prints a string rather than creating a file, doesn't use Markdown syntax, doesn't check for existing Roxygen etc. usethis offers use_package_doc(), which is a major help, and I think the same could be done by a function, e.g., use_data_doc(), which takes in a dataset and creates a file with a Roxygen skeleton. It could also be an argument to use_data(), e.g., use_data(., doc = TRUE) which saves the dataset to /data and creates a documentation file. Thank you for considering and making such a useful package!
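
For illustration only, here is a minimal base R sketch of what such a helper might do. The names `data_doc_skeleton()` and `write_data_doc()`, the `R/data-<name>.R` file name, and the upper-case placeholders are all assumptions made for this example; none of them is an existing usethis function or an agreed convention.

```r
# Hypothetical helper (not part of usethis): build a roxygen2 skeleton
# for a data frame and return it as a character vector of lines.
data_doc_skeleton <- function(data, name) {
  stopifnot(is.data.frame(data), is.character(name), length(name) == 1)

  # One \item{}{} entry per column, pre-filled with the column's first class.
  col_classes <- vapply(
    data, function(col) class(col)[1], character(1), USE.NAMES = FALSE
  )
  items <- sprintf("#'   \\item{%s}{(%s) DESCRIPTION}", names(data), col_classes)

  c(
    "#' TITLE",
    "#'",
    "#' DESCRIPTION",
    "#'",
    sprintf(
      "#' @format A data frame with %d rows and %d columns:",
      nrow(data), ncol(data)
    ),
    "#' \\describe{",
    items,
    "#' }",
    "#' @source SOURCE",
    sprintf("\"%s\"", name)
  )
}

# Hypothetical wrapper: write the skeleton to <dir>/data-<name>.R and
# refuse to overwrite documentation that already exists.
write_data_doc <- function(data, name, dir = "R") {
  path <- file.path(dir, paste0("data-", name, ".R"))
  if (file.exists(path)) {
    stop("'", path, "' already exists; not overwriting.", call. = FALSE)
  }
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  writeLines(data_doc_skeleton(data, name), path)
  invisible(path)
}

# Preview the skeleton without touching the file system.
cat(data_doc_skeleton(mtcars, "mtcars"), sep = "\n")
```

Returning the lines separately from writing them keeps the skeleton easy to preview and test; a real implementation would presumably reuse the file-writing and project-path helpers that usethis already has.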

@focardozom
Author

> Just wanted to add to this since I had the same thought. sinew does not fit well into the package development ecosystem for usethis users; it prints a string rather than creating a file, doesn't use Markdown syntax, doesn't check for existing Roxygen etc. usethis offers use_package_doc(), which is a major help, and I think the same could be done by a function, e.g., use_data_doc(), which takes in a dataset and creates a file with a Roxygen skeleton. It could also be an argument to use_data(), e.g., use_data(., doc = TRUE) which saves the dataset to /data and creates a documentation file. Thank you for considering and making such a useful package!

Hi, based on @jennybc's suggestions, I created this function to help with the documentation. It is still under construction, so any comments are welcome. I like your idea of use_data(., doc = TRUE), so maybe they will re-open this issue in the future. I am also a big fan of usethis.

@jennybc
Member

jennybc commented Nov 10, 2022

OK, we'll reconsider.

@jennybc reopened this Nov 10, 2022
@hadley changed the title from "[Feature request] it would be nice to have a function to help people create the documentation of the dataset" to "Function to create initial data docs" Jan 17, 2023
@hadley added the feature and docs 💡 labels Jan 17, 2023
@hadley
Member

hadley commented Jan 18, 2023

We probably also need to think a little about how we organise data documentation files. We currently tend to dump all data docs into a single .R file, but that's obviously going to be harder to edit with a script. Maybe we should move to a convention where we have data/foo.rda, data-raw/foo.R, and R/data-foo.R?

@ijlyttle

This comment was marked as outdated.

@hadley

This comment was marked as outdated.

@focardozom
Author

> We probably also need to think a little about how we organise data documentation files. We currently tend to dump all data docs into a single .R file, but that's obviously going to be harder to edit with a script. Maybe we should move to a convention where we have data/foo.rda, data-raw/foo.R, and R/data-foo.R?

The FAIR framework can be a helpful resource for organizing dataset documentation files. In my experience, most researchers only describe the variables in the dataset, but including good metadata can make datasets more valuable and useful. Thank you for considering this issue as a potential feature.

@hadley
Member

hadley commented Jan 19, 2023

@focardozom I'm not familiar with FAIR. Can you please summarise how it might inform a function that automatically creates a documentation template?

@focardozom
Author

@hadley FAIR can be used as a checklist to decide what information should be included in the template created by the function. Following FAIR, the template should include two categories: (1) metadata, meaning information that helps others find, access, and use the data, such as details about how the data were gathered, licensing, file size, format, etc. Some of this information can be extracted automatically from the data object and included in the template, while the rest should be left as prompts for the user to fill in. (2) Spaces to describe the variables. Users can use this template to ensure that they include at least the basic elements recommended by guides like FAIR.
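
As a rough sketch of the split between extracted and user-supplied information, the hypothetical helper below collects what can be read directly from a data frame and leaves placeholders for the rest. The function name `data_metadata()`, the field names, and the TODO wording are assumptions for illustration; they are not taken from FAIR or from usethis.

```r
# Hypothetical helper: gather the metadata that can be read straight from
# a data frame. Provenance and licence cannot be derived from the object,
# so they are returned as placeholders for the author to fill in.
data_metadata <- function(data) {
  stopifnot(is.data.frame(data))

  list(
    n_rows = nrow(data),
    n_cols = ncol(data),
    # Approximate in-memory size, formatted with an automatic unit.
    size = format(utils::object.size(data), units = "auto"),
    # One row per variable: name, first class, and number of missing values.
    variables = data.frame(
      name = names(data),
      class = vapply(
        data, function(col) class(col)[1], character(1), USE.NAMES = FALSE
      ),
      n_missing = vapply(
        data, function(col) sum(is.na(col)), integer(1), USE.NAMES = FALSE
      ),
      stringsAsFactors = FALSE
    ),
    # Not derivable from the object; the author must supply these.
    source = "TODO: how and when the data were collected",
    license = "TODO: licence under which the data are shared"
  )
}

# Inspect the top-level structure for a built-in dataset.
str(data_metadata(airquality), max.level = 1)
```

A template generator could write the extracted fields into the documentation directly and turn the TODO entries into prompts.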

@focardozom
Author

> We probably also need to think a little about how we organise data documentation files. We currently tend to dump all data docs into a single .R file, but that's obviously going to be harder to edit with a script. Maybe we should move to a convention where we have data/foo.rda, data-raw/foo.R, and R/data-foo.R?

I am talking with my mentor @RaymondBalise. We looked at how you documented datasets in ggplot2 and we see your point now. Dr. Balise teaches us to do R/foo.R and R/bar.R. He commented that R/data-foo.R is a great idea.

I have been reviewing https://design.tidyverse.org and I would love to apply what I have learned. Can I see how you are coding this or can I help?

@jennybc
Member

jennybc commented Jul 22, 2024

Labelling for tidyverse dev day. Overall advice: start small, aim for an MVP (so: probably not everything you see discussed above).

@jennybc added the tidy-dev-day 🤓 label Jul 22, 2024
5 participants