Skip to content

static.crates.io cache control #1826

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
kornelski opened this issue Sep 5, 2019 · 8 comments · Fixed by #1871
Closed

static.crates.io cache control #1826

kornelski opened this issue Sep 5, 2019 · 8 comments · Fixed by #1871

Comments

@kornelski
Copy link
Contributor

Crate files on static.crates.io don't have any explicit cache lifetime set. Please consider adding caching information, such as Cache-control: max-age=31536000, immutable.

Even though Cargo itself has a non-HTTP cache and doesn't use that information, it would be helpful for users who put a caching proxy in front of crates.io infrastructure.

curl -I 'https://static.crates.io/crates/winapi/winapi-0.2.5.crate'
HTTP/2 200
content-type: application/x-tar
content-length: 478113
date: Thu, 05 Sep 2019 13:21:39 GMT
last-modified: Mon, 09 Nov 2015 21:19:48 GMT
etag: "83c818a44aed9494e7a9f8fa52acb789"
x-amz-version-id: null
accept-ranges: bytes
server: AmazonS3
x-cache: Miss from cloudfront
via: 1.1 21e3028cc4bd3f3af310d648f3748004.cloudfront.net (CloudFront)
x-amz-cf-pop: LHR61-C2
x-amz-cf-id: Qp_nTqbJJXOt85twQ53LRqUcMPu00Vl_D3cajrAItIgF-h-6jtFSBQ==
@smarnach
Copy link
Contributor

smarnach commented Sep 6, 2019

This would also allow Cloudfront to store crates for a longer time, so it may speed up some downloads in some parts of the world.

@sgrif
Copy link
Contributor

sgrif commented Sep 11, 2019

This definitely sounds like it's worth doing, but updating old crates basically requires a script to go through and tell AWS to copy the file to itself overriding metadata -- which is fairly manual and I'll want to spot check a few dozen times as running that is relatively high risk.

That said, I'd be happy to see a pull request which added this header to all new publishes

@jtgeibel
Copy link
Member

This should be safe to add to new publishes now, but I think we will need to make sure we've cleaned up any other issues with the headers before applying this to all old crates.

The main thing I remember is any remaining crates with Content-Encoding: gzip, as we wouldn't want to cache those bad headers for so long. See #1179 for some examples. I'm guessing that crates containing build metadata in their version (after the +) may have been missed in the initial cleanup that was done.

@smarnach
Copy link
Contributor

I've done a quick consistency check for the headers Cloudfront currently serves for all crate versions:

So only five crates in total have any kind of deviation in their headers. (For two crates, all versions weirdly have the content type binary/octet-stream. They aren't even that old.)

It should be possible to update all crates in one go by using the AWS web console for the update. The steps should be

  1. Navigate to the bucket in the web console.
  2. Right-click the crates folder and select "Change metadata".
  3. Set both "Cache-Control" and "Content-Type" and "Content-Encoding" to the correct values.
  4. Click "Save".

This updates the selected headers recursively for all crate files, and does not touch any other headers or permissions. It's also possible to do the same thing with a single invocation of the aws command-line tool, but I personally find this less transparent for this particular use case, and probably more error-prone.

I believe performing the above steps would both fix all remaining issues with the current metadata and set the Cache-Control header, though we probably want to do this in separate steps to be a bit more conservative. I manually tested this in my private AWS account.

@smarnach
Copy link
Contributor

smarnach commented Nov 5, 2019

@pietroalbini I'm not sure we want to set the TTL for CloudFront distributions as you did in the simpleinfra repo. This has several downsides:

  • It only affects the time things are cached in CloudFront, but does not result in a cache-control header being set.
  • If we do set a cache-control header, it will be disregarded by CloudFront, and CloudFront may cache the objects for a different amount of time.
  • We can't control the caching times from the application side anymore. I'd much rather have the logic to decide how long things are cached in the upload requests in the crates.io repo than in the simpleinfra repo where it is hard to discover and change for people on the crates.io team.

So I'd prefer if we can simple leave the TTL at its default, and leave it up to the uploading code to set reasonable headers. we can then go ahead with #1871, even though we would still need to update all existing objects in S3.

@pietroalbini
Copy link
Member

@smarnach good point, changed the terraform configuration to leave default TTLs for static.

@smarnach
Copy link
Contributor

smarnach commented Nov 6, 2019

@pietroalbini Thanks, looks good to me! This also solves the issues with readme files mentioned in the last meeting (specifically that we shouldn't cache them for ten years, since we may want to regenerate them).

@smarnach
Copy link
Contributor

Hm, this issue was closed by merging #1871, but I'm not sure we are completely done. We limited the scope of #1871 to just crate files, but we should also do something for readmes and database dumps. I'll file separate issues for both of these.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants