HTML5 permalinks are not permanent if section header starts with number #8709

Open
abitrolly opened this issue Jan 19, 2021 · 25 comments

@abitrolly

In the pip changelog, the HTML anchors for version headings are not permanently tied to the corresponding version. Instead, they are incremental position numbers starting at #id1, so when a new version of pip is released, all anchors shift and point to a different version.
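
The shifting can be illustrated with a minimal Python sketch. It does not call Sphinx or docutils; `assign_positional_ids` is a hypothetical helper that only mimics numbering headings by position, as seen in the output of this report.

```python
def assign_positional_ids(titles):
    """Give each title the next ``idN`` anchor, in document order."""
    return {title: f"id{n}" for n, title in enumerate(titles, start=1)}


# Changelog before and after a new release is added at the top.
before = assign_positional_ids(["1.2.0", "1.1.0", "1.0.0"])
after = assign_positional_ids(["1.3.0", "1.2.0", "1.1.0", "1.0.0"])

print(before["1.2.0"])  # id1
print(after["1.2.0"])   # id2
```

A link to #id1 saved before the release would land on 1.3.0 afterwards.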

To Reproduce

#!/bin/bash

DOCDIR=testnumslug

rm -rf $DOCDIR
mkdir $DOCDIR

cat <<EOF > $DOCDIR/index.rst
Hi
==

1.2.0
-----

1.1.0
-----

1.0.0
-----
EOF

# sphinx-build requires a conf.py; without it the build fails with:
#   Application error:
#   config directory doesn't contain a conf.py file (testnumslug)
touch $DOCDIR/conf.py

sphinx-build $DOCDIR $DOCDIR/_html

echo -e "\n-----\n"

grep -R 'Permalink' $DOCDIR/_html/index.html

This gives the following output:

<h1>Hi<a class="headerlink" href="#hi" title="Permalink to this headline">¶</a></h1>
<h2>1.2.0<a class="headerlink" href="#id1" title="Permalink to this headline">¶</a></h2>
<h2>1.1.0<a class="headerlink" href="#id2" title="Permalink to this headline">¶</a></h2>
<h2>1.0.0<a class="headerlink" href="#id3" title="Permalink to this headline">¶</a></h2>

Expected behavior

<h1>Hi<a class="headerlink" href="#hi" title="Permalink to this headline">¶</a></h1>
<h2>1.2.0<a class="headerlink" href="#1.2.0" title="Permalink to this headline">¶</a></h2>
<h2>1.1.0<a class="headerlink" href="#1.1.0" title="Permalink to this headline">¶</a></h2>
<h2>1.0.0<a class="headerlink" href="#1.0.0" title="Permalink to this headline">¶</a></h2>

Environment info

  • Python version: 3.9.1
  • Sphinx version: 3.2.1

@uranusjr

uranusjr commented Jan 20, 2021

The current behaviour is likely due to the HTML4 spec:

ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").

So id="1.2.0" is technically invalid (although I suspect many browsers would handle it fine, since HTML5 loosened the restriction).
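
To make the difference concrete, here is a small Python sketch of both rules. The HTML4 pattern follows the sentence quoted above; the HTML5 check assumes the rule that an ID must be non-empty and contain no ASCII whitespace. The function names are made up for this example.

```python
import re

# HTML 4.01: a letter, then letters, digits, "-", "_", ":" or ".".
HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9_:.\-]*")

# HTML5 (assumed rule): at least one character, no ASCII whitespace.
ASCII_WHITESPACE = set(" \t\n\f\r")


def is_valid_html4_id(value: str) -> bool:
    return HTML4_ID.fullmatch(value) is not None


def is_valid_html5_id(value: str) -> bool:
    return bool(value) and not any(ch in ASCII_WHITESPACE for ch in value)


for candidate in ("1.2.0", "id-1_2_0", "v1_2_0"):
    print(candidate, is_valid_html4_id(candidate), is_valid_html5_id(candidate))
```

Only "1.2.0" fails the HTML4 check; all three pass the HTML5 check.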

The behaviour still feels quite unintuitive to me, however. I would expect Sphinx to generate something more stable, such as id="id-1_2_0" instead.

With that said, you can always specify an explicit reference yourself:

Hi
==

.. _v1_2_0:

1.2.0
-----

.. _v1_1_0:

1.1.0
-----

.. _v1_0_0:

1.0.0
-----

This would always work regardless of the section title.

@abitrolly
Author

Sphinx has generated HTML5 by default since 2.0 (#4587).

@tk0miya
Member

tk0miya commented Jan 20, 2021

It comes from the node ID generation rule of docutils, the core library of Sphinx. The rule was defined to support many kinds of output formats.
https://repo.or.cz/docutils.git/blob/HEAD:/docutils/docutils/nodes.py#l2220
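
For readers who do not want to follow the link, the rule can be approximated in a few lines of Python. This is a simplified sketch, not the docutils source: `simplified_make_id` is a hypothetical name, and the real function also applies character translation tables that are omitted here.

```python
import re
import unicodedata

_NON_ID_CHARS = re.compile(r"[^a-z0-9]+")
_NON_ID_AT_ENDS = re.compile(r"^[-0-9]+|-+$")


def simplified_make_id(title: str) -> str:
    """Approximate the docutils normalization of a title into an ID."""
    text = title.lower()
    # Decompose accented characters, then drop everything that is not ASCII.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Replace runs of other characters with a single hyphen.
    text = _NON_ID_CHARS.sub("-", " ".join(text.split()))
    # Strip leading digits and hyphens, and trailing hyphens.
    return _NON_ID_AT_ENDS.sub("", text)


print(repr(simplified_make_id("Hi")))             # 'hi'
print(repr(simplified_make_id("1.2.0")))          # ''
print(repr(simplified_make_id("Release 1.2.0")))  # 'release-1-2-0'
```

A title made only of digits and dots normalizes to an empty string, which is why a generated idN is used instead.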

@abitrolly
Author

What is the role of this function then?

sphinx/sphinx/util/nodes.py

Lines 435 to 439 in 3ed7590

def _make_id(string: str) -> str:
    """Convert `string` into an identifier and return it.

    This function is a modified version of ``docutils.nodes.make_id()`` of
    docutils-0.16.

@tk0miya
Member

tk0miya commented Jan 20, 2021

It's a local ID generator for Sphinx domains. It does not relate to the section IDs.

@abitrolly
Author

It looks like an override by import path, although that doesn't make it any better.

In [2]: from docutils.nodes import make_id
In [3]: make_id('1.2.0')
Out[3]: ''
In [7]: from sphinx.util.nodes import _make_id
In [8]: _make_id('1.2.0')
Out[8]: ''

Is it possible to define a function html5_id(string: str) and delegate HTML id generation to it?
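
As a rough idea of what such a function might look like, here is a minimal sketch. The name `html5_id` comes from the question above and is not an existing Sphinx function. The sketch assumes the only HTML5 constraints are "non-empty" and "no whitespace", and it ignores uniqueness and cross-references.

```python
def html5_id(string: str) -> str:
    """Turn a title into an HTML5-style ID by removing whitespace only."""
    # Join whitespace-separated words with hyphens; keep every other character.
    return "-".join(string.split())


print(html5_id("1.2.0"))         # 1.2.0
print(html5_id("Release 1.2.0")) # Release-1.2.0
```

A title that consists only of whitespace still yields an empty string, so a caller would need a fallback for that case.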

@madjxatw

madjxatw commented Jan 21, 2021

The same happens with non-ASCII headers, which produce idX. If a header mixes ASCII and non-ASCII characters, all non-ASCII parts are removed.

@abitrolly
Author

@madjxatw understood. HTML5 removes almost all restrictions from IDs (a value only needs to be non-empty and free of whitespace), which makes even these valid.

<p id="#">Foo.
<p id="##">Bar.
<p id="♥">Baz.
<p id="©">Inga.
<p id="{}">Lorem.
<p id="“‘’”">Ipsum.
<p id="⌘⌥">Dolor.
<p id="{}">Sit.
<p id="[attr=value]">Amet.
<p id="++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.">Hello world!

https://mathiasbynens.be/notes/html5-id-class

@madjxatw

@abitrolly, exactly, so Sphinx needs to implement an HTML5 version of make_id() to stay consistent with its default HTML5 output.

@abitrolly
Author

@madjxatw not Sphinx, some human needs to sit down and write the code. While the code seems trivial, right now it is unclear where to place the code.

@madjxatw

@abitrolly, hopefully some unicode slugifier (e.g. https://github.com/mozilla/unicode-slugify) could be used as an extension or be integrated somehow into Sphinx.
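
To show what a Unicode-preserving slug could look like, here is a standard-library sketch. It is not the unicode-slugify package linked above and makes no claim to match its output; `unicode_slug` is a hypothetical name.

```python
import re
import unicodedata


def unicode_slug(title: str) -> str:
    """Lowercase a title and replace runs of non-word characters with '-'."""
    text = unicodedata.normalize("NFKC", title).lower()
    # For str patterns, \W excludes Unicode letters, digits and "_",
    # so non-ASCII letters survive the substitution.
    text = re.sub(r"\W+", "-", text)
    return text.strip("-")


print(unicode_slug("Sphinx 入門"))  # sphinx-入門
print(unicode_slug("1.2.0"))        # 1-2-0
```

Unlike the docutils rule, leading digits and non-ASCII letters are kept.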

@tk0miya
Member

tk0miya commented Jan 21, 2021

Sphinx still supports HTML4 output. The HTML Help builder also depends on HTML4. In addition, I can't say the change would not affect other builders. Sphinx is not only for building HTML5.

@madjxatw

@tk0miya, is it possible to have an option that lets users decide whether to enable Unicode permalinks?

@tk0miya
Member

tk0miya commented Jan 21, 2021

I can't promise the option would work for "all" builders. If I added it to Sphinx, I would describe it as "it might work, but no promises. Please don't report it to us even if something is broken" :-p
I don't think I can provide such an option officially. Please hack at your own risk.

@madjxatw

That's all right, it wouldn't be a big problem to hack it ourselves. However, it is still a pity that Unicode IDs are not officially supported, especially for non-English (e.g. East Asian) writers who really need IDs in their own language's characters. :-(

@idnsunset

I don't know the details of how Sphinx works internally, but couldn't a custom Unicode ID maker be invoked only when the HTML5 builder is in use?

@tk0miya
Member

tk0miya commented Jan 21, 2021

> I don't know the details of how Sphinx works internally, but couldn't a custom Unicode ID maker be invoked only when the HTML5 builder is in use?

It's difficult. The node IDs are generated in the reading phase. The result of that phase is cached and used for incremental builds. This means that introducing new IDs would break the incremental build feature.

@abitrolly
Author

> It's difficult. The node IDs are generated in the reading phase. The result of that phase is cached and used for incremental builds. This means that introducing new IDs would break the incremental build feature.

But HTML IDs should not be node IDs. Is it possible, during the initial read, to store IDs in a structure that allows a proper HTML5 slug to be output on writing? For example, if an ID is autogenerated from the title, store the title.
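
A rough Python sketch of that idea follows. It uses no Sphinx or docutils API: the (node_id, title) pairs stand in for whatever the reading phase would store, and `html5_anchors` is a hypothetical writer-side helper. Caching and cross-document references are not addressed.

```python
def html5_anchors(sections):
    """Map internal node IDs to unique HTML5 anchors derived from titles.

    ``sections`` is an iterable of (node_id, title) pairs in document order.
    """
    anchors = {}
    used = set()
    for node_id, title in sections:
        # Remove whitespace only; fall back to the node ID for empty titles.
        base = "-".join(title.split()) or node_id
        anchor = base
        suffix = 1
        while anchor in used:
            # Disambiguate repeated titles with a numeric suffix.
            suffix += 1
            anchor = f"{base}-{suffix}"
        used.add(anchor)
        anchors[node_id] = anchor
    return anchors


sections = [("hi", "Hi"), ("id1", "1.2.0"), ("id2", "1.1.0"), ("id3", "1.2.0")]
print(html5_anchors(sections))
```

Internal references would keep using the node IDs, and the writer would look up the anchor only when emitting id and href attributes.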

@tk0miya
Member

tk0miya commented Jan 23, 2021

> But HTML IDs should not be node IDs. Is it possible, during the initial read, to store IDs in a structure that allows a proper HTML5 slug to be output on writing? For example, if an ID is autogenerated from the title, store the title.

Of course, it's possible if you provide a wonderful patch! (IMO, it's impossible for me, as I commented "it's difficult" above.)

@abitrolly
Author

@tk0miya what do you mean by "it's difficult"? It would help if you could point to the locations where Sphinx reads and caches node IDs, and where to insert a write_html5_id call.

@tk0miya
Member

tk0miya commented Jan 23, 2021

Sphinx's cross-referencing system is based on node IDs, so I can't imagine how we would replace them with Unicode IDs. I guess we would need to rewrite the whole of docutils and Sphinx, so I can't tell you where to do that.

@abitrolly
Author

@tk0miya the idea is not about replacing internal node IDs. It is about writing IDs in HTML5 format on output to HTML5. All IDs written this way will be consistent.

@tk0miya
Member

tk0miya commented Jan 23, 2021

I don't know how to do that. But all contributions are welcome!

@abitrolly
Author

I am afraid I can only continue with funded contributions. Learning a codebase like this in my free time is not sustainable. It's a pity that this seemingly simple generator turned out to be so complex on the inside.

@gmilde

gmilde commented Nov 9, 2021

> It comes from the node ID generation rule of docutils, the core library of Sphinx. It was defined to support many kinds of formats.

The rationale and details of this design decision are explained in https://docutils.sourceforge.io/docs/ref/rst/directives.html#identifier-normalization

There is an open feature request for less restrictive IDs.

For the original question: since Docutils 0.18, setting an id-prefix keeps permanent IDs on section headings that start with a number, so this can provide a workaround in future Sphinx versions.

@abitrolly abitrolly changed the title Permalinks are not permanent if section header starts with number HTML5 permalinks are not permanent if section header starts with number Nov 10, 2021
@AA-Turner AA-Turner added this to the some future version milestone Sep 29, 2022