/simple serves HTML that can't be parsed by Python's xml.etree if package has yanked releases #7886

@boegel

Describe the bug

Parsing the HTML served by the /simple endpoint with Python's xml.etree raises xml.etree.ElementTree.ParseError for packages that have yanked releases.

Expected behavior

No parse error, as was the case before any releases were yanked, and as is still the case for packages that have no yanked releases (yet).

To Reproduce

  • Python script test.py that contains:

    import requests
    from xml.etree import ElementTree
    simple_pip = requests.get('https://pypi.python.org/simple/pip')
    ElementTree.fromstring(simple_pip.text)
  • Run it with python test.py, for example (on macOS):

    $ python3 test.py
    Traceback (most recent call last):
      File "test.py", line 4, in <module>
        ElementTree.fromstring(simple_pip.text)
      File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/xml/etree/ElementTree.py", line 1315, in XML
        parser.feed(text)
    xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 143, column 306
    

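The problem is the data-yanked attribute, which is written without a value. A bare attribute is valid HTML5, but XML requires every attribute to have a value, so an XML parser rejects lines like: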

<a href="https://files.pythonhosted.org/packages/8c/5c/c18d58ab5c1a702bf670e0bd6a77cd4645e4aeca021c6118ef850895cc96/pip-20.0.tar.gz#sha256=5128e9a9401f1d16c1d15b2ed766a79d7813db1538428d0b0ce74838249e3a41" data-requires-python="&gt;=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*" data-yanked>pip-20.0.tar.gz</a><br/>
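The failure can be shown offline with a shortened anchor. The snippet below is a minimal sketch, not a proposed fix for the endpoint: `BARE` and `WITH_VALUE` are hypothetical, trimmed stand-ins for the line above, and `parses_as_xml`, `AnchorCollector` and `collect_anchors` are illustrative helper names. It assumes Python 3 (`html.parser` is not available under that name in Python 2). It checks that xml.etree rejects the bare attribute but accepts `data-yanked=""`, and shows that the standard library's HTML parser reads the bare form without error.

```python
from html.parser import HTMLParser
from xml.etree import ElementTree

# Hypothetical, shortened stand-ins for the anchor line served by /simple.
BARE = '<a href="pip-20.0.tar.gz" data-yanked>pip-20.0.tar.gz</a>'
WITH_VALUE = '<a href="pip-20.0.tar.gz" data-yanked="">pip-20.0.tar.gz</a>'


def parses_as_xml(snippet):
    """Return True if xml.etree accepts the snippet as well-formed XML."""
    try:
        ElementTree.fromstring(snippet)
    except ElementTree.ParseError:
        return False
    return True


class AnchorCollector(HTMLParser):
    """Collect the attributes of every <a> start tag as a dict."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        # Bare attributes such as data-yanked arrive with the value None.
        if tag == "a":
            self.anchors.append(dict(attrs))


def collect_anchors(html):
    """Parse html leniently and return the attribute dicts of its anchors."""
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return collector.anchors


if __name__ == "__main__":
    print(parses_as_xml(BARE))        # bare attribute: not well-formed XML
    print(parses_as_xml(WITH_VALUE))  # attribute with a value: accepted
    print(collect_anchors(BARE))      # HTML parser tolerates the bare form
```

Clients that only need the links and their attributes could switch to an HTML parser like this one; whether the endpoint itself should emit `data-yanked=""` is a separate question for the maintainers.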

My Platform

  • macOS 10.15.4 with Python 2.7.16 or 3.7.7 (but the same issue occurs on other platforms too)
