
downloading of large files fails with urllib.request with recent Python 3.x #3455

@zao

Description


In download_file we use urllib.request, and url_fd.read() raises an OverflowError when trying to read the whole file in a single call.

It can be reproduced with this script:

import urllib.request

x = urllib.request.urlopen('https://developer.download.nvidia.com/compute/cuda/11.0.2/local_installers/cuda_11.0.2_450.51.05_linux.run')
x.read()
Traceback (most recent call last):
  File "./foo.py", line 5, in <module>
    x.read()
  File "/usr/lib/python3.8/http/client.py", line 467, in read
    s = self._safe_read(self.length)
  File "/usr/lib/python3.8/http/client.py", line 608, in _safe_read
    data = self.fp.read(amt)
  File "/usr/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.8/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.8/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
OverflowError: signed integer is greater than maximum

Until this urllib bug is fixed, we need to either read the response in chunks ourselves, or skip the naive read and stream it straight into the output file via shutil.copyfileobj:

import shutil

with open('/dev/shm/out.raw', 'wb') as fh:
    shutil.copyfileobj(x, fh)

As a bonus, we avoid buffering the whole file in memory, which could otherwise exhaust it.
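For completeness, the other option mentioned above (reading in chunks ourselves) could look roughly like this. This is a minimal sketch, not the project's actual download_file implementation; the chunk size and the download_in_chunks name are illustrative, and the demo uses io.BytesIO as a stand-in for the HTTP response object:

```python
import io

# 16 MiB per read() call; each underlying recv stays far below the
# signed-int limit that the one-shot read() trips over in ssl.py.
CHUNK_SIZE = 16 * 1024 * 1024

def download_in_chunks(url_fd, out_path):
    """Copy a urlopen()-style file object to out_path in fixed-size chunks."""
    with open(out_path, 'wb') as fh:
        while True:
            chunk = url_fd.read(CHUNK_SIZE)
            if not chunk:  # b'' signals end of stream
                break
            fh.write(chunk)

# Demo with an in-memory stand-in for the HTTP response:
src = io.BytesIO(b'x' * (CHUNK_SIZE + 123))
download_in_chunks(src, '/tmp/out.raw')
```

In practice shutil.copyfileobj does exactly this loop internally (with a smaller default chunk size), which is why it sidesteps the OverflowError.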
