Use rel=canonical to point to the latest version #1006

Closed
jyn514 opened this issue Aug 26, 2020 · 19 comments
Labels
A-frontend Area: Web frontend P-medium Medium priority

Comments

@jyn514
Member

jyn514 commented Aug 26, 2020

Uses the same mechanism as #74, but for a very different purpose.

A common problem with docs.rs is that the latest version of the docs is hard to find from Google. We mitigate this with a 'go to latest version' button, but if you land on a search result halfway down the page, the header bar isn't shown. A possible way to get better results is to abuse rel=canonical to point to /:crate/latest/:path so it will be picked up by Google.

It'd be feasible to put rel=canonical in the <head> since we inject styles there already.

#74 should be implemented before this to avoid telling Google we are more canonical than the original documentation.
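
As an illustration of the proposal above, here is a minimal Python sketch of deriving the /:crate/latest/:path URL from a versioned docs URL and wrapping it in a rel=canonical tag. The function name `canonical_link` is hypothetical, and the sketch assumes the path always has the shape /:crate/:version/:path; it is not how docs.rs itself generates its pages.

```python
from html import escape
from urllib.parse import urlsplit


def canonical_link(url: str) -> str:
    """Build a <link rel="canonical"> tag for the /latest/ form of a versioned URL.

    Assumes (hypothetically) that the path looks like /:crate/:version/:path.
    """
    parts = urlsplit(url)
    # "/ureq/2.1.0/ureq/" splits into ["", "ureq", "2.1.0", "ureq", ""]
    segments = parts.path.split("/")
    if len(segments) < 3 or not segments[1] or not segments[2]:
        raise ValueError(f"not a versioned docs path: {parts.path!r}")
    segments[2] = "latest"  # swap the explicit version for the alias
    href = f"{parts.scheme}://{parts.netloc}{'/'.join(segments)}"
    return f'<link rel="canonical" href="{escape(href, quote=True)}">'


if __name__ == "__main__":
    print(canonical_link("https://docs.rs/ureq/2.1.0/ureq/"))
```

The resulting tag would go in the page's <head>, alongside the styles that are already injected there.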

@jyn514 jyn514 added A-frontend Area: Web frontend P-medium Medium priority labels Aug 26, 2020
@jsha
Contributor

jsha commented Oct 24, 2020

Linking #1120 as related. I'll note that the "go to latest version" link is part of the menu bar, which is pinned to the top of the browser window, so it's always visible even when mid-page. On mobile it's just a yellow triangle with no link, which is a bit confusing.

@jsha
Contributor

jsha commented Nov 11, 2020

Another thought on this: Right now, even if you click "go to latest version," you'll get a redirect to a page with an explicit version in the URL. I think before working with rel=canonical, which is hard to see if it's working, it would be beneficial to make it so the latest link for each crate (e.g. https://docs.rs/ureq/latest/ureq/) doesn't redirect to a specific version, but loads a copy of the latest docs for that crate. That way each crate's docs have a stable URL that can gradually accumulate ranking. I suspect Google will never consider a URL canonical if it always redirects somewhere else.

@jyn514
Member Author

jyn514 commented Nov 11, 2020

Make it so the latest link for each crate (e.g. https://docs.rs/ureq/latest/ureq/) doesn't redirect to a specific version, but loads a copy of the latest docs for that crate

I'm really hesitant to do this; that will mean the caching from CloudFront is wrong and won't be invalidated when a new release is published.

@pietroalbini
Member

We could do a CloudFront invalidation of /{crate}/latest/* every time a new version is published.
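
A rough sketch of what such an invalidation request could look like, in Python. The helper name `latest_invalidation_batch` is hypothetical; the dictionary follows the shape of CloudFront's `InvalidationBatch` structure (`Paths` with `Quantity` and `Items`, plus a unique `CallerReference`). The sketch only builds the payload and does not talk to AWS.

```python
import time
from typing import Optional


def latest_invalidation_batch(crate: str, caller_reference: Optional[str] = None) -> dict:
    """Build an InvalidationBatch covering every page under /{crate}/latest/.

    CallerReference must be unique per request; a timestamp-based default is
    used here purely as an illustration.
    """
    if caller_reference is None:
        caller_reference = f"{crate}-{time.time_ns()}"
    return {
        "Paths": {
            "Quantity": 1,  # a single wildcard path
            "Items": [f"/{crate}/latest/*"],
        },
        "CallerReference": caller_reference,
    }


if __name__ == "__main__":
    print(latest_invalidation_batch("ureq"))
```

With boto3 installed and credentials configured, this dictionary could be passed as `InvalidationBatch` to `boto3.client("cloudfront").create_invalidation(DistributionId=..., InvalidationBatch=batch)`, where the distribution ID is whatever the deployment uses.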

@jsha
Contributor

jsha commented Nov 11, 2020

Here's the pricing on that, which seems reasonable: https://aws.amazon.com/cloudfront/pricing/

No additional charge for the first 1,000 paths requested for invalidation each month. Thereafter, $0.005 per path requested for invalidation.

And from https://aws.amazon.com/blogs/aws/simplified-multiple-object-invalidation-for-amazon-cloudfront/:

An invalidation path that includes the “*” character incurs the same charge as one that does not.
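
To make the quoted pricing concrete, here is a small Python sketch of the arithmetic. It assumes the figures quoted above (1,000 free paths per month, then $0.005 per path) and one wildcard path per published release; current AWS pricing may differ, and the function name is made up for this example.

```python
from decimal import Decimal

# Figures as quoted in this thread; not checked against current AWS pricing.
FREE_PATHS_PER_MONTH = 1000
PRICE_PER_EXTRA_PATH = Decimal("0.005")  # US dollars


def monthly_invalidation_cost(paths_requested: int) -> Decimal:
    """Cost in dollars for the given number of invalidation paths in one month."""
    billable = max(0, paths_requested - FREE_PATHS_PER_MONTH)
    return billable * PRICE_PER_EXTRA_PATH


if __name__ == "__main__":
    # One wildcard path per release, so 5,000 releases in a month:
    print(monthly_invalidation_cost(5000))
```

Since a wildcard path is billed like any other path, 5,000 releases in a month would mean 4,000 billable paths, or $20.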

@jsha
Contributor

jsha commented Nov 11, 2020

By the way, this approach would help with another developer experience issue: For crates I use a lot, I often find them by opening a new Chrome tab and typing a few characters of the name, then selecting them from the history portion of the Omnibox. Right now that takes me to a specific version - but it becomes outdated once a new version is released, requiring an extra click to get to the latest version. And because Omnibox scores by number of visits, it can take a while before the latest version is the top choice. If most user navigations go to the "latest" URL, their autocomplete suggestions are more usable.

@jyn514
Member Author

jyn514 commented Nov 11, 2020

Right, but it also means that links to docs.rs/:crate/latest/module/x.html can break silently when a new version is published. I guess people can still use a specific version if they want to, though? I'd want to first fix the bug where clicking on the version number takes you to the root of the crate instead of the page you were on.

@jsha
Contributor

jsha commented Nov 11, 2020

Those links should be mostly stable across semver minor and patchlevel revisions, since removing or renaming an item would be an API break.

@jyn514
Member Author

jyn514 commented Nov 11, 2020

latest doesn't care about patch versions though - it always goes to the latest version that's not a pre-release.

I'm not saying we shouldn't do this, but we should take some care to make sure it doesn't make the user experience worse.

@jsha
Contributor

jsha commented Nov 11, 2020

latest doesn't care about patch versions though - it always goes to the latest version that's not a pre-release.

Yep, that makes sense. To flesh out the idea more: The "stable" URL for any given major version could be like docs.rs/:crate/v2/module/x.html (representing the latest release within that major version), with docs.rs/:crate/latest redirecting to /:crate/vX, where X is the latest major version.

Another approach would be to not worry about major/minor versions, and offer permalink icons next to all internal anchors. The permalink would go to whatever the latest version is at that moment in time. This is similar to what GitHub does when you highlight a specific line of code and it offers a permalink in the ... menu to the left.
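
Both ideas above come down to resolving an alias to a concrete version. The Python sketch below is one way that could look, under stated assumptions: the `vN` alias is the hypothetical scheme from this comment, versions are plain `MAJOR.MINOR.PATCH` strings, pre-releases are skipped, and the names `parse_stable`, `resolve` and `permalink` are invented for illustration. It is a simplified stand-in for real semver handling, not docs.rs code.

```python
from typing import Iterable, Optional, Tuple


def parse_stable(version: str) -> Optional[Tuple[int, int, int]]:
    """Parse 'MAJOR.MINOR.PATCH'; return None for pre-releases or malformed input."""
    core = version.split("+", 1)[0]  # drop build metadata, if any
    if "-" in core:  # a pre-release such as 3.0.0-beta.1
        return None
    parts = core.split(".")
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        return None
    return (int(parts[0]), int(parts[1]), int(parts[2]))


def resolve(alias: str, versions: Iterable[str]) -> Optional[str]:
    """Resolve 'latest' or 'vN' to the newest matching stable version."""
    wanted_major = None
    if alias != "latest":
        if not (alias.startswith("v") and alias[1:].isascii() and alias[1:].isdigit()):
            raise ValueError(f"unknown alias: {alias!r}")
        wanted_major = int(alias[1:])
    best = None  # (parsed tuple, original string)
    for version in versions:
        parsed = parse_stable(version)
        if parsed is None:
            continue
        if wanted_major is not None and parsed[0] != wanted_major:
            continue
        if best is None or parsed > best[0]:
            best = (parsed, version)
    return best[1] if best else None


def permalink(path: str, versions: Iterable[str]) -> str:
    """Turn /:crate/:alias/... into /:crate/:version/... for the current release."""
    segments = path.split("/")
    if len(segments) < 3:
        raise ValueError(f"not an aliased docs path: {path!r}")
    concrete = resolve(segments[2], versions)
    if concrete is None:
        raise LookupError(f"no stable version matches {segments[2]!r}")
    segments[2] = concrete
    return "/".join(segments)


if __name__ == "__main__":
    releases = ["1.9.1", "2.0.2", "2.1.0", "3.0.0-beta.1"]
    print(resolve("latest", releases))
    print(permalink("/ureq/latest/ureq/struct.Agent.html", releases))
```

An opt-in permalink would call something like `permalink` at the moment the reader asks for it, so the copied URL stays pinned to the version they were looking at.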

@jyn514
Member Author

jyn514 commented Nov 11, 2020

This is similar to what GitHub does when you highlight a specific line of code and it offers a permalink in the ... menu to the left.

OMG I'm so mad, I've been doing that by scrolling to the top, opening the commit in a new tab, and pasting the commit in place of master. That's so much better!

Yeah, opt-in permalinks sound like the right way to do this.

@jsha
Contributor

jsha commented Nov 11, 2020

OMG I'm so mad, I've been doing that by scrolling to the top, opening the commit in a new tab, and pasting the commit in place of master. That's so much better!

Yay, so glad to have introduced you to this! There's even a keyboard shortcut: y.

@pietroalbini
Member

I also think opt-in permalink is better.

@thombles

Is this really a valid use of rel="canonical"? My interpretation of the RFC is that "duplicative" content would be materially the same, which different versions of modules or types aren't, really.

A wild suggestion as an outsider - what if docs.rs/<crate> was always the latest version, and docs.rs/archive/<crate>/<ver> was where every version was published? Then one could use robots.txt to exclude /archive from search engines. Granted, it's occasionally useful to locate outdated type names via a search engine. My opinion is the trade-off would be worth it.
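
For reference, the robots.txt side of this suggestion could be checked with Python's standard `urllib.robotparser`. The /archive/ layout is the hypothetical scheme from this comment, not an existing docs.rs URL structure, and `build_parser` is a helper name made up for the sketch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the proposed layout: hide /archive/, allow the rest.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

if __name__ == "__main__":
    for url in ("https://docs.rs/ureq/", "https://docs.rs/archive/ureq/2.1.0/ureq/"):
        print(url, parser.can_fetch("*", url))
```

Under these rules a well-behaved crawler would skip every versioned page under /archive/ while still crawling the always-latest docs.rs/<crate> pages.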

@jsha
Contributor

jsha commented Feb 19, 2021

I agree rel="canonical" is not quite the right fit for this, and I like the idea of docs.rs/<crate> always being the latest version. I don't think it's necessary to exclude older versions via robots.txt. As you mention, it can be useful to have old versions searchable. And I think search engines will quickly learn that docs.rs/<crate> is the "most important" URL for <crate>.

@jsha
Contributor

jsha commented Apr 15, 2021

I'm really hesitant to do this; that will mean the caching from CloudFront is wrong and won't be invalidated when a new release is published.

I was looking at the caching story here, and I think we actually wouldn't need to change anything. Looking at https://docs.rs/ureq/2.1.0/ureq/, for instance: that URL itself (i.e. the HTML page) doesn't have a Cache-Control header and consistently returns x-cache: Miss from cloudfront. So the HTML pages aren't being cached today; the same would be true if there was a single "latest" URL.

For subresources - almost everything is per-rustdoc-version, not per-crate-version. That's all cached, and its caching won't be affected. The one exception is search-index{...}.js. That's per-crate-version, but also has a unique URL for each crate version. For example, https://docs.rs/ureq/2.1.0/search-index-20210324-1.53.0-nightly-07e0e2ec2.js. And since the URL for the search index is part of the HTML page, it will get updated correctly on new builds and doesn't need to worry about caching.
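
The header check described above can be written down as a small Python sketch. It classifies a response purely from its headers, using the `x-cache` values CloudFront reports (such as "Miss from cloudfront"); the function names are hypothetical, and the headers would come from whatever HTTP client is used to fetch the page.

```python
from typing import Mapping


def _lowercase_keys(headers: Mapping[str, str]) -> dict:
    """HTTP header names are case-insensitive, so normalise them first."""
    return {name.lower(): value for name, value in headers.items()}


def served_from_cdn_cache(headers: Mapping[str, str]) -> bool:
    """True if x-cache reports a cache hit; any other value counts as uncached."""
    x_cache = _lowercase_keys(headers).get("x-cache", "")
    return x_cache.lower().startswith("hit")


def has_cache_control(headers: Mapping[str, str]) -> bool:
    """True if the origin sent a Cache-Control header at all."""
    return "cache-control" in _lowercase_keys(headers)


if __name__ == "__main__":
    # Headers as described for the HTML page in the comment above.
    html_page = {"X-Cache": "Miss from cloudfront"}
    print(served_from_cdn_cache(html_page), has_cache_control(html_page))
```

A page that consistently reports a miss and carries no Cache-Control header, as described for the HTML pages here, would not be affected by moving to a single "latest" URL.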

@apps4uco

apps4uco commented Jun 9, 2021

Hi, I'd just like to add my 2 cents.

On the part of the page that lists versions, the links to previous versions could have rel="nofollow" or "noindex nofollow", which would indicate that they are not popular.

Also, as far as I know, adding rel="canonical" pointing to the latest version is just a hint to search engines, so it should not break anything.

A similar issue was first reported in 2013 as rust-lang/rust#9461 (and is still open).

I believe implementing this would improve the user experience, i.e. you would see the latest documentation by default.

Also, it would reduce the load on the server as users wouldn't land on one page only to click on the link to the latest version of the documentation. (Also you would reduce the load of bots indexing outdated documentation)

Thanks

@jsha
Contributor

jsha commented Jun 12, 2021

I'd like to propose renaming this issue to "make latest version top result in search engines." The original proposal was to use rel=canonical for this purpose, but from subsequent research I think that won't work. The overall goal is findability from search engines.

@jsha
Contributor

jsha commented Oct 21, 2022

This is done! See also #1438.

@jsha jsha closed this as completed Oct 21, 2022