CelebA download is broken #5705
Comments
It seems the author has intentionally restricted the visibility of these files for some reason. If that is true, I think there is no way for us to provide an automatic download.
It seems something on the GDrive side has changed. The link above now gives a 403 (forbidden) instead of a 404 page when you are logged in. When you are not logged in, you will be prompted to do so and afterwards see the same 403 page. Exporting the link manually gives https://drive.google.com/file/d/0B7EVK8r0v71pZjFTYXZWM3FlRnM/view. While logged in, you get transferred to the download page. If you are not logged in, you get the login prompt but land on a page stating that access needs to be requested. For other datasets, e.g. Caltech101, both link variants are equivalent:
We prefer the first, because it has fewer redirects and checks. In any case, it seems that CelebA is no longer automatically downloadable unless you are logged in. Thus, I propose we disable the download functionality, preferably before the upcoming release. @NicolasHug Thoughts?
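For reference, the two link shapes that appear in this thread can be built from a file ID as in the minimal sketch below. The helper names `direct_download_url` and `viewer_url` are made up for illustration and are not part of torchvision; the URL patterns are copied from the links quoted in this issue.

```python
from urllib.parse import urlencode

DRIVE_HOST = "https://drive.google.com"


def direct_download_url(file_id: str) -> str:
    """Build the `uc?id=...&export=download` variant quoted in the issue."""
    query = urlencode({"id": file_id, "export": "download"})
    return f"{DRIVE_HOST}/uc?{query}"


def viewer_url(file_id: str) -> str:
    """Build the `file/d/<id>/view` variant obtained by exporting the link manually."""
    return f"{DRIVE_HOST}/file/d/{file_id}/view"


# File ID of img_align_celeba.zip, as it appears in the links above.
file_id = "0B7EVK8r0v71pZjFTYXZWM3FlRnM"
print(direct_download_url(file_id))
print(viewer_url(file_id))
```

Neither variant bypasses the access restriction; the sketch only shows how the two links relate to the same ID.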
Thanks for reminding me and for your investigations @pmeier! What do you mean by "being logged in"? Is it logged in from a browser?
Yes. In particular, we need the session cookies. A possible way is to ask users to export the cookies from the browser to a file that we can read in. But this is far from an "automated approach". AFAIK, there is no way to log in from the command line or through env vars or the like.
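To make the cookie-file idea concrete, here is a rough sketch using only the Python standard library. It assumes the browser export is in the Netscape `cookies.txt` format, which is what `http.cookiejar.MozillaCookieJar` reads. The function name `opener_from_cookie_file` is hypothetical, and whether Google Drive would accept cookies replayed this way is not something this thread confirms.

```python
import http.cookiejar
import urllib.request


def opener_from_cookie_file(cookie_path):
    """Load a Netscape-format cookies.txt and return (opener, jar).

    The opener sends the loaded cookies with every request it makes.
    """
    jar = http.cookiejar.MozillaCookieJar(cookie_path)
    # Session cookies are usually marked as "discard", so keep them explicitly.
    jar.load(ignore_discard=True, ignore_expires=True)
    handler = urllib.request.HTTPCookieProcessor(jar)
    return urllib.request.build_opener(handler), jar
```

A caller would then use `opener.open(url)` instead of `urllib.request.urlopen(url)`. This still requires a manual export step from the user, which is why it does not count as an automated download.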
Thanks for the deets. I agree we should deactivate it.
Closed in #6052.
Should we keep it open? Ultimately we'll want to put back the download feature, if the Gdrive becomes available again?
That's reasonable. Should we also ping the owner (send an email or something) to let them know and ask them if they can open up or rehost the dataset?
I don't think it will come back. This is not a limitation by GDrive, but as explained in #5705 (comment) a conscious decision by the author to limit access. I've contacted them twice and asked to revert it, but got no response. If we want to keep it open, we should have some kind of scheduled test or the like that checks whether the download is publicly accessible again. Otherwise we'll just forget about this and will have a stale issue. At least I will forget to regularly check whether the author changed the permissions.
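Such a scheduled check could look roughly like the sketch below. It is an assumption-laden illustration, not an existing torchvision test: the names `fetch_status_and_final_url` and `is_publicly_accessible` are invented here, and treating a redirect to `accounts.google.com` as "login required" is a guess based on the login prompt described earlier in the thread.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit

# Assumption: anonymous requests to a restricted file end up on this host.
LOGIN_HOST = "accounts.google.com"


def fetch_status_and_final_url(url, timeout=10.0):
    """Request `url` without cookies; return (HTTP status, URL after redirects)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, response.url
    except urllib.error.HTTPError as error:
        # 403/404 responses are raised as HTTPError; report their status code.
        return error.code, url


def is_publicly_accessible(url, fetch=fetch_status_and_final_url):
    """True if an anonymous request succeeds and is not sent to the login page."""
    status, final_url = fetch(url)
    redirected_to_login = urlsplit(final_url).hostname == LOGIN_HOST
    return status == 200 and not redirected_to_login
```

The `fetch` parameter is injectable so that the decision logic can be tested offline, while a scheduled CI job would call `is_publicly_accessible(url)` with the default fetcher against the real link.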
Thanks for clarifying Philip. Let me close again then.
If we're not expecting to put it back, then we might want to deprecate the |
Hi there,

It seems the situation is even worse now? Trying to download it like so:

dataset = dset.CelebA(
    root='datasets/celeba',
    download=True,
)

... results in the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-15-75c5cdc7e903> in <cell line: 1>()
----> 1 dataset = dset.CelebA(
      2     root='datasets/celeba',
      3     download=True,
      4 )

2 frames
/usr/local/lib/python3.10/dist-packages/torchvision/datasets/utils.py in download_file_from_google_drive(file_id, root, filename, md5)
    244
    245         if api_response == "Quota exceeded":
--> 246             raise RuntimeError(
    247                 f"The daily quota of the file {filename} is exceeded and it "
    248                 f"can't be downloaded. This is a limitation of Google Drive "

RuntimeError: The daily quota of the file img_align_celeba.zip is exceeded and it can't be downloaded. This is a limitation of Google Drive and can only be overcome by trying again later.

It would be really nice if that could be available out of the box somehow. Perhaps indeed disable the download feature, or point to some other way of obtaining the dataset (Huggingface?)? Thanks for reading in any case!
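If the files are obtained manually instead, a small pre-flight check can report what is still missing before the dataset is constructed without `download=True`. The sketch below is only an illustration: `missing_files` is a hypothetical helper, the folder path is a placeholder because this thread does not state where torchvision expects the files, and the two file names are simply the ones mentioned in this issue, not necessarily the complete set.

```python
from pathlib import Path


def missing_files(folder, expected_names):
    """Return the names from `expected_names` that are not regular files in `folder`."""
    folder = Path(folder)
    return [name for name in expected_names if not (folder / name).is_file()]


# File names taken from this thread; the real dataset may need more files.
expected = ["img_align_celeba.zip", "identity_CelebA.txt"]
# Placeholder path, borrowed from the snippet above.
print(missing_files("datasets/celeba", expected))
```

An empty list means every listed file was found; anything else names what still has to be fetched by hand.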
The download of all CelebA files except identity_CelebA.txt is broken. For example, the URL to download img_align_celeba.zip resolves to https://drive.google.com/uc?id=0B7EVK8r0v71pZjFTYXZWM3FlRnM&export=download. This link is publicly accessible, but you have to be logged into Google. Otherwise you'll see a 404 page.
I'll have a look if it is possible to get a general download link from the ID.
cc @pmeier @YosuaMichael