removed gsutil dependency #21

Closed
wants to merge 1 commit into from

loretoparisi

I have removed the gsutil dependency by using curl and Google Drive. This approach is well known and is used by several frameworks that need to download large model files (FastText, for example).

@FloridaPete

Thanks for doing this, but can you answer a dumb question? How do you download the model with this technique? I get "access denied" when I try to follow the URLs. Thanks.

@loretoparisi
Author

@FloridaPete the URL is public. Basically, curl fetches the file through the public download URL of the Google Drive file. Did you run the shell script?
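
For readers following along, here is a minimal sketch of what "curl against the public download URL" can look like. It is not the code from the PR's script: the function names `gdrive_url` and `gdrive_download` are hypothetical, and the URL shape is the `uc?export=download&id=...` form that appears later in this thread. Note that for large files Google Drive may return an HTML confirmation page instead of the file itself, so a plain request like this one is not guaranteed to work.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not taken from the PR.

# Build the public download URL for a Google Drive file ID.
gdrive_url() {
  printf 'https://drive.google.com/uc?export=download&id=%s\n' "$1"
}

# Download a publicly shared Drive file to a local path.
# --location follows redirects; --fail makes curl exit non-zero on HTTP errors.
gdrive_download() {
  local file_id="$1" out="$2"
  curl --fail --location --silent --show-error \
    --output "$out" "$(gdrive_url "$file_id")"
}

# Example call (not executed here), using the file ID quoted in this thread:
#   gdrive_download 1kY-qc0uCU3uGPhVGTfryvHr5zWh_B9g7 117M.tar.gz
```

If the request is denied or redirected to a web page, curl will still write whatever the server returned, so it is worth inspecting the output file before extracting it.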

@Georgerowberry

I'm getting this error when running the script:

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
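
An error like the one above usually means the downloaded file is not a gzip archive at all, for example an HTML page saved under the tarball's name. The sketch below shows one way a script could guard against that by checking the two gzip magic bytes (`1f 8b`) before calling tar. The function names `is_gzip` and `extract_if_gzip` are hypothetical and are not part of the PR's script.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# Succeed if the file starts with the gzip magic bytes 1f 8b.
is_gzip() {
  [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]
}

# Extract a .tar.gz into a destination directory (default: current directory),
# or print the start of the file to stderr if it is not gzip data.
extract_if_gzip() {
  local archive="$1" dest="${2:-.}"
  if is_gzip "$archive"; then
    tar -xzf "$archive" -C "$dest"
  else
    echo "error: $archive is not in gzip format; it starts with:" >&2
    head -c 200 "$archive" >&2
    echo >&2
    return 1
  fi
}
```

Printing the first bytes of a rejected file makes it easy to see whether the server returned an error page rather than the archive.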

@loretoparisi
Author

@Georgerowberry thank you, let me check the tar file I have uploaded.

@WuTheFWasThat
Contributor

Hi - this is nice, but I'm afraid I can't have us officially pointing to something not controlled by us (in Google Drive)

@loretoparisi
Author

loretoparisi commented Feb 15, 2019

Of course you should upload your own file, but if you prefer to force people to install 200MB of google-cloud-kit, that's up to you. By the way, in case you are not aware of it, most GitHub repositories that deal with large datasets use the Google Drive approach.

The right question is: why doesn't OpenAI have anything better for hosting files?!

@ZeweiChu

ZeweiChu commented Feb 16, 2019

I'm getting this error when running the script:

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now

I had the same error with your script.

I ended up downloading it directly from Google Drive:
https://drive.google.com/uc?export=download&id=1kY-qc0uCU3uGPhVGTfryvHr5zWh_B9g7

@loretoparisi
Author

@ZeweiChu @Georgerowberry I have fixed the link so that it now points to the right 117M.tar.gz file!
By the way, the script is here: https://github.com/loretoparisi/gpt-2/blob/master/download_model.sh

@WuTheFWasThat
Contributor

WuTheFWasThat commented Feb 16, 2019

Sorry, I don't have time to test this right now, but could a version work with simply something like

curl -X GET https://storage.googleapis.com/gpt-2/models/117M/encoder.json?
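
A rough sketch of how that suggestion might be turned into a small script follows. Only the base URL and the `117M/encoder.json` path come from the comment above. The helper names `model_file_url` and `fetch_model_files`, the local `models/<model>/` layout, and the idea that other files sit under the same URL pattern are assumptions, and like the original suggestion this is untested against the real bucket.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and directory layout are assumptions.

# Base URL taken from the curl command quoted in the comment.
BASE_URL="https://storage.googleapis.com/gpt-2/models"

# Build the URL for one file of one model, e.g. 117M + encoder.json.
model_file_url() {
  printf '%s/%s/%s\n' "$BASE_URL" "$1" "$2"
}

# Download the named files for a model into models/<model>/.
# Stops at the first failed download.
fetch_model_files() {
  local model="$1"
  shift
  mkdir -p "models/$model"
  local f
  for f in "$@"; do
    curl --fail --location --silent --show-error \
      --output "models/$model/$f" "$(model_file_url "$model" "$f")" || return 1
  done
}

# Example call (not executed here):
#   fetch_model_files 117M encoder.json
```

Passing the file names as arguments keeps the sketch from guessing which files the bucket actually contains.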

jchwenger pushed a commit to jchwenger/gpt-2 that referenced this pull request Jan 2, 2020