further thoughts about pip as an HTML5 reader #10880

Open
@ppena-LiveData

Description

As Donald Stufft pointed out, the HTML5 spec has different requirements for HTML5 writers than for parsers. In fact, the 13.1 Writing HTML documents part of the spec explicitly says that the section "does not apply to conformance checkers", so we were looking at the wrong section when we added a DOCTYPE regex to handle_decl(). If pip is going to keep validating the HTML5 (i.e. act as a conformance checker, which some of us have said should not be pip's job, and which @dstufft said his PEP 503 is ambiguous about), then the 13.2 Parsing HTML documents section is what needs to be followed.

It is a complicated spec, so it will take some careful reading to figure out exactly what is required for HTML5 validation. One thing I found is that the parsing section also allows "PUBLIC" to appear in the <!DOCTYPE ...>, which the writing section does not mention, so that is definitely a problem in the regex.
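To make the "PUBLIC" case concrete, here is a minimal sketch of a more permissive check. It is not pip's actual code: the names `_DOCTYPE_HTML`, `DoctypeSniffer`, and `has_html_doctype` are hypothetical, and the pattern is only an approximation that accepts anything after the name "html" rather than a faithful implementation of the 13.2 tokenizer. It relies on the standard library's `html.parser.HTMLParser`, which passes the text between `<!` and `>` to `handle_decl()`.

```python
import re
from html.parser import HTMLParser

# Hypothetical, deliberately permissive pattern (an assumption, not pip's regex):
# the keyword "DOCTYPE", whitespace, the name "html", then optionally anything
# else, such as PUBLIC or SYSTEM identifiers. Matching is case-insensitive.
_DOCTYPE_HTML = re.compile(r"^doctype\s+html(\s.*)?$", re.IGNORECASE | re.DOTALL)


class DoctypeSniffer(HTMLParser):
    """Records whether the page declared a doctype whose name is "html"."""

    def __init__(self):
        super().__init__()
        self.saw_html_doctype = False

    def handle_decl(self, decl):
        # decl is the declaration body without the surrounding "<!" and ">",
        # e.g. 'DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "..."'.
        if _DOCTYPE_HTML.match(decl.strip()):
            self.saw_html_doctype = True


def has_html_doctype(page: str) -> bool:
    """Return True if the page contains an html doctype, with or without PUBLIC."""
    sniffer = DoctypeSniffer()
    sniffer.feed(page)
    sniffer.close()
    return sniffer.saw_html_doctype
```

With this sketch, both `<!DOCTYPE html>` and a doctype carrying a "PUBLIC" identifier are accepted, while a page with no doctype or a non-html doctype is not. Whether pip should be this lenient, or should check the identifiers at all, is exactly the open question above.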

Expected behavior

When using a Simple Repository API repo that outputs a <!DOCTYPE ...> that has "PUBLIC" in it, the expected behavior is not to get a warning saying "does not have a proper HTML doctype declaration," but pip currently gives that warning.

pip version

22.0.3
