Skip to content

Remove restriction that a crate must be in the database to get a download URL #405

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

carols10cents
Copy link
Member

@carols10cents carols10cents commented Aug 12, 2016

So I've been thinking about/working on making it easier to run a crates.io mirror. The mirror in particular that I'm working on would basically just mirror the crates.io API:

  • A cargo user would change their registry index URL in their .cargo/config to point to an alternate index
  • That index would be a clone of the official index with one extra commit changing the config.json to point to an alternate crates.io instance
  • The crates.io instance would have its S3_BUCKET set to crates-io and its S3_REGION set to us-west-1, so it would NOT be mirroring the actual crate files

The problem I hit was that in order to get a redirect from /crates/:crate_id/:version/download to S3, the crate and version needed to be in the database, because the current code would try to update the download counts.

I thought about creating a script that would load the contents of the index back into a database, but then it would have to keep that in sync as changes to the index happened, so I stepped back from that tactic and decided to try this.

This PR tries to increment the download count, but if that isn't possible for any reason, return the redirect to S3 anyway. Cargo checks the index it has before requesting the URL from the API anyway, so as long as the index is a (possibly lagging behind) mirror, cargo should not be requesting files that don't exist on S3. If it does, cargo will just say "failed to download package whatever from http://...s3". This might cause other problems to go unnoticed, but seeing as this is just the download count and any problems with the database would likely manifest in other ways, I think this is ok.

Is there a more idiomatic way to use the Result but ignore errors? I know that's not recommended in general, but that's really what I'm trying to do here :)

With this patch deployed to https://cratesio-mirror.herokuapp.com/, I'm able to set my .cargo/config to use https://gitlab.com/integer32llc/crates.io-index and install crates!

I'm totally open to other ideas if there's a better way to solve this problem!

@alexcrichton
Copy link
Member

Thanks for the PR @carols10cents! This sounds pretty reasonable to me, and I'm game! Exciting to see work on mirrors :)

I personally prefer to ignore results through either let _ = foo(); or just drop(foo()); with a comment indicating why the error is ignored.

I wonder though, perhaps just the failed-to-lookup error could be ignored, but things like db errors could still get plumbed through?

Also, as a side note, with source replacement now on nightly I believe it should be as simple as:

# in .cargo/config
[source.crates-io]
registry = 'https://github.com/rust-lang/crates.io-index'
replace-with = 'my-mirror'

[source.my-mirror]
registry = 'https://...' 

@alexcrichton
Copy link
Member

And with that as well Cargo will generate lock files that are interoperable with the default registry as well as mirror registries!

@carols10cents
Copy link
Member Author

I wonder though, perhaps just the failed-to-lookup error could be ignored, but things like db errors could still get plumbed through?

If we're a mirror, though, we'd still want to ignore all errors, including db errors, right? And send the download URL regardless?

I have this thought that I haven't researched the implications of yet, that some mirrors might not need to have a database at all.

And if we weren't acting as a mirror (in this case, that we do want to be counting downloads), then we actually don't want to ignore ANY errors from updating the download counts, right? :-/

It almost feels like we'd need to have a setting that said whether we were a mirror or not, but then we'd also want to be able to say what kind of mirror we were: are we trying to be an exact match of crates.io? A superset that allows additional (private) uploads? A subset that has crates that have been audited and approved?

This is where I start getting overwhelmed as to how much to start with, and how to implement a little bit at a time but be extensible :) I'd love ideas!

@carols10cents
Copy link
Member Author

I added a new commit to change to let _ = and have more of an explanation!

And with that as well Cargo will generate lock files that are interoperable with the default registry as well as mirror registries!

This is so awesome!!! My .cargo/config is now:

[source.mirror]
registry = "https://gitlab.com/integer32llc/crates.io-index"

[source.crates-io]
replace-with = "mirror"

and I can remove that, do a clean rebuild, and the Cargo.lock stays the same!!! 😻

@alexcrichton
Copy link
Member

Thanks!

So thinking about this a bit more, this means that if you're trying to download "crate-that-does-not-exist" at "version-that-does-not-exist", you'd get a 302 to go talk to S3? We may want to head that off quickly with a 404 (or at least a 200 with a JSON error like we do today), though?

Or at least crates.io itself may wish to do that even if mirrors do not

@carols10cents
Copy link
Member Author

So thinking about this a bit more, this means that if you're trying to download "crate-that-does-not-exist" at "version-that-does-not-exist", you'd get a 302 to go talk to S3? We may want to head that off quickly with a 404 (or at least a 200 with a JSON error like we do today), though?

Before we even get to that point, though, cargo looks in its locally checked out index and won't even make a request to this route if the crate+version isn't in its index. I'm not sure how/if/how likely it is to get into a situation where a crate+version IS in the index but ISN'T on s3... do you have any ideas?

Or at least crates.io itself may wish to do that even if mirrors do not

The problem I'm having is how to tell if we're crates.io or if we're a mirror. What mechanism would you suggest? Should I add another environment variable that's like... MIRROR?

@alexcrichton
Copy link
Member

Yeah it's true that Cargo itself may not send these requests to S3, but I'm also somewhat worried about other clients who download directly from crates.io for whatever reason (or future Cargo implementations perhaps). Maybe it's not worth worrying about them though?

I guess thinking a bit more about what this should be (in terms of mechanisms), what kind of mirror are you going for here? I believe that this is going to a mirror of the index, but not necessarily the crates themselves, right? Maybe a caching mirror would look different where it'd lazily fill in packages from S3 but it still would require everything to exist locally?

@carols10cents
Copy link
Member Author

For this step, I'm going for an API-only mirror. I thought this would be a good midpoint between what exists now and a mirror that lazily caches the packages to its own storage, but if you'd rather I proceed directly to trying to get a mirror like that working, I'd be into that :)

A lazy cache mirror still would have the same problem, in that when you hit the download route, it needs to have different behavior than crates.io does now, so how do we configure a mirror vs not a mirror?

@alexcrichton
Copy link
Member

Hm yeah so if we want to end up at just an API mirror then this makes sense to me (perhaps with a setting for crates.io to differ), but if that's not the end goal we'd end up changing this anyway, right?

I guess I'd expect a download route to behave with logic like:

  • Check if it's in the database
  • If so, we've probably got it cached somewhere or we've cached that it's not on crates.io
  • If not, we fetch it from crates.io inline and cache it

In any case though I'd be fine merging this for now with something like MIRROR=1 to enable it if it helps move forward!

@carols10cents
Copy link
Member Author

Yeah i have no idea if an API-only mirror is anything that's useful by itself! I mean, it would work, I just don't know if that's anything useful enough for anyone to run. (shrug)

This might fail because we don't have any crates in the database; we
just want to be an API mirror and pass along the redirect URL.
@carols10cents carols10cents force-pushed the remove-download-restriction branch from f54dceb to 4a7d270 Compare August 24, 2016 01:03
@carols10cents
Copy link
Member Author

Added the MIRROR=1 setting! :)

@alexcrichton
Copy link
Member

Looks good to me, thanks!

@alexcrichton alexcrichton merged commit b5d6f06 into rust-lang:master Aug 24, 2016
@shepmaster shepmaster deleted the remove-download-restriction branch January 16, 2017 00:15
bors added a commit that referenced this pull request Dec 18, 2019
….0.0, r=Turbo87

Bump ember-fetch from 6.5.1 to 7.0.0

Bumps [ember-fetch](https://github.com/ember-cli/ember-fetch) from 6.5.1 to 7.0.0.
<details>
<summary>Release notes</summary>

*Sourced from [ember-fetch's releases](https://github.com/ember-cli/ember-fetch/releases).*

> ## Release 7.0.0
> #### 💥 Breaking Change
> * [#292](https://github-redirect.dependabot.com/ember-cli/ember-fetch/pull/292) Added a check to enforce top-level dependency [#290](https://github-redirect.dependabot.com/ember-cli/ember-fetch/issues/290). ([@&#8203;dnalagatla](https://github.com/dnalagatla))
> * [#348](https://github-redirect.dependabot.com/ember-cli/ember-fetch/pull/348) Bump deps, drop node 6 ([@&#8203;xg-wang](https://github.com/xg-wang))
>
> #### 📝 Documentation
> * [#405](https://github-redirect.dependabot.com/ember-cli/ember-fetch/pull/405) Add release-it ([@&#8203;xg-wang](https://github.com/xg-wang))
> * [#399](https://github-redirect.dependabot.com/ember-cli/ember-fetch/pull/399) Add section to the README for deprecations ([@&#8203;locks](https://github.com/locks))
>
> #### 🏠 Internal
> * [#405](https://github-redirect.dependabot.com/ember-cli/ember-fetch/pull/405) Add release-it ([@&#8203;xg-wang](https://github.com/xg-wang))
>
> #### Committers: 4
> - Dinesh Nalagatla ([@&#8203;dnalagatla](https://github.com/dnalagatla))
> - Ricardo Mendes ([@&#8203;locks](https://github.com/locks))
> - Thomas Wang ([@&#8203;xg-wang](https://github.com/xg-wang))
> - [[@&#8203;dependabot-preview](https://github.com/dependabot-preview)[bot]](https://github.com/apps/dependabot-preview)
>
> ## v6.7.2 (2019-11-03)
>
> #### 🐛 Bug Fix
> * [#372](https://github-redirect.dependabot.com/ember-cli/ember-fetch/pull/372) fix: throwing w/ fresh ember-cli-fastboot serve ([@&#8203;xg-wang](https://github.com/x
> g-wang))
>
> #### Committers: 1
> - Thomas Wang ([@&#8203;xg-wang](https://github.com/xg-wang))
>
> ## v6.7.1 (2019-09-12)
>
> #### 🐛 Bug Fix
> * [#358](https://github-redirect.dependabot.com/ember-cli/ember-fetch/pull/358) Enable absolute url transform for Request in FastBoot ([@&#8203;xg-wang](https://github.com/xg-wang))
>
> #### Committers: 2
> - Jan Bobisud ([@&#8203;bobisjan](https://github.com/bobisjan))
> - Thomas Wang ([@&#8203;xg-wang](https://github.com/xg-wang))
>
> ## v6.7.0
> #### 🚀 Enhancement
> * [#303](https://github-redirect.dependabot.com/ember-cli/ember-fetch/pull/303) Add fetch response and error utils ([@&#8203;BarryThePenguin](https://github.com/BarryThePenguin))
>
> #### Committers: 3
> - Jonathan Haines ([@&#8203;BarryThePenguin](https://github.com/BarryThePenguin))
> - Thomas Wang ([@&#8203;xg-wang](https://github.com/xg-wang))
> - [[@&#8203;dependabot-preview](https://github.com/dependabot-preview)[bot]](https://github.com/apps/dependabot-preview)
>
> ## v6.6.0
> #### 📝 Documentation
> * [#308](https://github-redirect.dependabot.com/ember-cli/ember-fetch/pull/308) Deprecate ember-data adapter mixin ([@&#8203;xg-wang](https://github.com/xg-wang))
>
> #### Committers: 2
></tr></table> ... (truncated)
</details>
<details>
<summary>Changelog</summary>

*Sourced from [ember-fetch's changelog](https://github.com/ember-cli/ember-fetch/blob/master/CHANGELOG.md).*

> ## v7.0.0 (2019-11-28)
>
> #### 💥 Breaking Change
> * [#292](https://github-redirect.dependabot.com/ember-cli/ember-fetch/pull/292) Added a check to enforce top-level dependency [#290](https://github-redirect.dependabot.com/ember-cli/ember-fetch/issues/290). ([@&#8203;dnalagatla](https://github.com/dnalagatla))
> * [#348](https://github-redirect.dependabot.com/ember-cli/ember-fetch/pull/348) Bump deps, drop node 6 ([@&#8203;xg-wang](https://github.com/xg-wang))
>
> #### 📝 Documentation
> * [#405](https://github-redirect.dependabot.com/ember-cli/ember-fetch/pull/405) Add release-it ([@&#8203;xg-wang](https://github.com/xg-wang))
> * [#399](https://github-redirect.dependabot.com/ember-cli/ember-fetch/pull/399) Add section to the README for deprecations ([@&#8203;locks](https://github.com/locks))
>
> #### 🏠 Internal
> * [#405](https://github-redirect.dependabot.com/ember-cli/ember-fetch/pull/405) Add release-it ([@&#8203;xg-wang](https://github.com/xg-wang))
>
> #### Committers: 4
> - Dinesh Nalagatla ([@&#8203;dnalagatla](https://github.com/dnalagatla))
> - Ricardo Mendes ([@&#8203;locks](https://github.com/locks))
> - Thomas Wang ([@&#8203;xg-wang](https://github.com/xg-wang))
> - [[@&#8203;dependabot-preview](https://github.com/dependabot-preview)[bot]](https://github.com/apps/dependabot-preview)
>
> ## v6.7.2 (2019-11-03)
>
> #### 🐛 Bug Fix
> * [#372](https://github-redirect.dependabot.com/ember-cli/ember-fetch/pull/372) fix: throwing w/ fresh ember-cli-fastboot serve ([@&#8203;xg-wang]
> g-wang))
>
> #### Committers: 1
> - Thomas Wang ([@&#8203;xg-wang](https://github.com/xg-wang))
>
> ## v6.7.1 (2019-09-12)
>
> #### 🐛 Bug Fix
> * [#358](https://github-redirect.dependabot.com/ember-cli/ember-fetch/pull/358) Enable absolute url transform for Request in FastBoot ([@&#8203;xg-wang](https://github.com/xg-wang))
>
> #### Committers: 2
> - Jan Bobisud ([@&#8203;bobisjan](https://github.com/bobisjan))
> - Thomas Wang ([@&#8203;xg-wang](https://github.com/xg-wang))
>
> ## v6.7.0 (2019-07-08)
>
> #### 🚀 Enhancement
>
> - [#303](https://github-redirect.dependabot.com/ember-cli/ember-fetch/pull/303) Add fetch response and error utils ([@&#8203;BarryThePenguin](https://github.com/BarryThePenguin))
>
> #### Committers: 3
>
> - Jonathan Haines ([@&#8203;BarryThePenguin](https://github.com/BarryThePenguin))
> - Thomas Wang ([@&#8203;xg-wang](https://github.com/xg-wang))
> - [[@&#8203;dependabot-preview](https://github.com/dependabot-preview)[bot]](https://github.com/apps/dependabot-preview)
>
> ## v6.6.0 (2019-06-28)
></tr></table> ... (truncated)
</details>
<details>
<summary>Commits</summary>

- [`e525762`](ember-cli/ember-fetch@e525762) Release 7.0.0
- [`231f20d`](ember-cli/ember-fetch@231f20d) Add publishConfig.registry
- [`424cead`](ember-cli/ember-fetch@424cead) Merge pull request [#405](https://github-redirect.dependabot.com/ember-cli/ember-fetch/issues/405) from xg-wang/ri
- [`4e2fe99`](ember-cli/ember-fetch@4e2fe99) pick v6.7.2 and add release-it
- [`bac5db0`](ember-cli/ember-fetch@bac5db0) Merge pull request [#292](https://github-redirect.dependabot.com/ember-cli/ember-fetch/issues/292) from dnalagatla/dnalagatla/enforce-top-level-addon
- [`914f60a`](ember-cli/ember-fetch@914f60a) Bump @typescript-eslint/eslint-plugin from 2.8.0 to 2.9.0
- [`82a41f9`](ember-cli/ember-fetch@82a41f9) Bump ember-cli-babel from 7.12.0 to 7.13.0
- [`5b9720e`](ember-cli/ember-fetch@5b9720e) Bump ember-source from 3.14.1 to 3.14.2
- [`512fdf8`](ember-cli/ember-fetch@512fdf8) Bump @typescript-eslint/parser from 2.8.0 to 2.9.0
- [`a8eb84a`](ember-cli/ember-fetch@a8eb84a) Bump ember-try from 1.2.1 to 1.3.0
- Additional commits viewable in [compare view](ember-cli/ember-fetch@v6.5.1...v7.0.0)
</details>
<br />

[![Dependabot compatibility score](https://api.dependabot.com/badges/compatibility_score?dependency-name=ember-fetch&package-manager=npm_and_yarn&previous-version=6.5.1&new-version=7.0.0)](https://dependabot.com/compatibility-score.html?dependency-name=ember-fetch&package-manager=npm_and_yarn&previous-version=6.5.1&new-version=7.0.0)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

**Note:** This repo was added to Dependabot recently, so you'll receive a maximum of 5 PRs for your first few update runs. Once an update run creates fewer than 5 PRs we'll remove that limit.

You can always request more updates by clicking `Bump now` in your [Dependabot dashboard](https://app.dependabot.com).

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
- `@dependabot badge me` will comment on this PR with code to add a "Dependabot enabled" badge to your readme

Additionally, you can set the following in the `.dependabot/config.yml` file in this repo:
- Update frequency
- Automerge options (never/patch/minor, and dev/runtime dependencies)
- Out-of-range updates (receive only lockfile updates, if desired)
- Security updates (receive only security updates, if desired)

</details>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants