add data to the package #6
Comments
I have not succeeded in getting in touch with the corpus creator. It also occurs to me that creating an R object on the fly would replicate the work done in scikit-talk (unless it can be done by invoking that package, e.g. through I think we should look more seriously into the GPL-licensed IFADV corpus — it was specifically designed for free use including commercial use so it would be very painful if it were not possible. The 2008 LREC paper by the corpus creators is very clear about this:
The paper foresees some of the implications of using the GPL (see §6) and specifies the "source" that must be included to make the derivative version GPL-proof:
On one reading, then, it seems we could use a portion of the IFADV data in the package if we also include the original TextGrid or EAF files for that portion. However, it is not clear to me whether this ultimately comes down to the same issue as with the English data, and whether we'd need to create an R object on the fly. Given the express goal of the IFADV project to allow any use, including unlimited distribution, that would be a shame.
I had come to similar conclusions. This does mean we need to change our license to GPL, but that should be OK, especially since we have good reason to do so. (I will nevertheless double-check this.) The source behind the works in this context means code or software; ours is openly available, so that is not a worry. As for manipulations done on the source material, R actually has a neat standard for including any additional code that was required to create the derivative objects we ship with our package: we can include a script in the
Another proposal: instead of including the IFADV dataset in this package, we can package it separately for R users. Given the sheer amount of material, and the fact that our code does not depend on it (and can easily use the dataset once it's packaged), this would make sense and make the This would also allow us to stay outside the GPL for our code (or at the very least avoid complex dual-licensing issues), which is worthwhile (see e.g. the explanations in this discussion on the ggplot2 GitHub). Knowing that the tidyverse/ggplot2 folks made an extensive effort to relicense away from GPL, and that we want to contribute to their ecosystem, is another argument for me to stick with the more permissive Apache license. Do you agree @mdingemanse? If so, I will close this PR and generate a new GPL-licensed data package that includes the IFADV data (we can simply call it
I'm totally happy with that!
The data package is here: https://github.com/elpaco-escience/ifadv
We will attach the Santa Barbara Corpus of Spoken American English to the package. The data is licensed under CC-BY-ND, which means we are not allowed to distribute a derivative. Unfortunately, an R object containing the dataset counts as a derivative, so we can distribute only the raw data and have to create the R object on the fly.
Perhaps it is worthwhile to contact [email protected] and ask for specific permission? The original author named in the license (John W. DuBois) does not seem to be connected to the department anymore (his page is empty).