Requests from urllib and requests library have long delays #95838

Closed
kgs10 opened this issue Aug 9, 2022 · 6 comments
Labels
OS-mac type-bug An unexpected behavior, bug, or error

Comments


kgs10 commented Aug 9, 2022

Bug report

On multiple macOS systems running 12.4, HTTP requests fail to establish a connection on the initial attempt, blocking in sock.connect.

Your environment

  • CPython versions tested on:
    • Python 3.8.10
    • Python 3.6.8
    • Python 3.9.13
  • Operating system and architecture:
    • macOS 12.4 (Intel)
  • requests package version 2.28.1

Data

Requests made to a site consistently take over a minute to complete, as shown in the examples below. This happens regardless of whether the requests or urllib library is used.


In [2]: time requests.get('http://www.example.com')
CPU times: user 6.4 ms, sys: 4.05 ms, total: 10.5 ms
Wall time: 1min 15s
Out[2]: <Response [200]>

In [3]: time requests.get('https://www.google.com')
CPU times: user 21.9 ms, sys: 7.68 ms, total: 29.5 ms
Wall time: 1min 15s
Out[3]: <Response [200]>

In [4]: import urllib
In [4]: time urllib.request.urlopen('https://google.com')
CPU times: user 26.4 ms, sys: 5.01 ms, total: 31.4 ms
Wall time: 2min 30s
Out[5]: <http.client.HTTPResponse at 0x7ff0d0b46e50>

However, if I add a timeout, the total wall time is roughly the timeout plus the normal request time.

In [6]: time requests.get('https://www.google.com', timeout=1)
CPU times: user 16.4 ms, sys: 3.21 ms, total: 19.6 ms
Wall time: 1.38 s
Out[6]: <Response [200]>
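The timeout-plus-request-time behavior is consistent with how the stdlib connects: socket.create_connection walks the getaddrinfo results in order, so a per-attempt timeout abandons a hung (e.g. blackholed IPv6) attempt and falls through to the next address. A minimal sketch of that loop, assuming a simplified view of the stdlib internals (the function name is illustrative, not a real API):

```python
import socket

# Hedged sketch: roughly what socket.create_connection does internally.
# With no timeout, a blackholed IPv6 address can block connect() for the
# full kernel TCP timeout (~75 s per attempt on macOS); with timeout=1,
# the hung attempt is abandoned after 1 s and the next address is tried.
def connect_first_working(host, port, timeout=None):
    last_err = None
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)  # None means block indefinitely
        try:
            sock.connect(sockaddr)
            sock.settimeout(None)  # restore blocking mode for the caller
            return sock  # first address that answers wins
        except OSError as err:
            last_err = err
            sock.close()
    raise last_err or OSError("no addresses returned for host")
```

With two resolved addresses (IPv6 first, dead; IPv4 second, live) and timeout=1, this yields a wall time of about 1 s plus the IPv4 request, matching the 1.38 s observed above.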

If I interrupt the process during one of these long stalls, I get the following traceback.

In [6]: requests.get('https://www.google.com')
^C---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
Input In [6], in <cell line: 1>()
----> 1 requests.get('https://www.google.com')

File /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/requests/api.py:73, in get(url, params, **kwargs)
     62 def get(url, params=None, **kwargs):
     63     r"""Sends a GET request.
     64
     65     :param url: URL for the new :class:`Request` object.
   (...)
     70     :rtype: requests.Response
     71     """
---> 73     return request("get", url, params=params, **kwargs)

File /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/requests/api.py:59, in request(method, url, **kwargs)
     55 # By using the 'with' statement we are sure the session is closed, thus we
     56 # avoid leaving sockets open which can trigger a ResourceWarning in some
     57 # cases, and look like a memory leak in others.
     58 with sessions.Session() as session:
---> 59     return session.request(method=method, url=url, **kwargs)

File /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/requests/sessions.py:587, in Session.request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    582 send_kwargs = {
    583     "timeout": timeout,
    584     "allow_redirects": allow_redirects,
    585 }
    586 send_kwargs.update(settings)
--> 587 resp = self.send(prep, **send_kwargs)
    589 return resp

File /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/requests/sessions.py:701, in Session.send(self, request, **kwargs)
    698 start = preferred_clock()
    700 # Send the request
--> 701 r = adapter.send(request, **kwargs)
    703 # Total elapsed time of the request (approximately)
    704 elapsed = preferred_clock() - start

File /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/requests/adapters.py:489, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
    487 try:
    488     if not chunked:
--> 489         resp = conn.urlopen(
    490             method=request.method,
    491             url=url,
    492             body=request.body,
    493             headers=request.headers,
    494             redirect=False,
    495             assert_same_host=False,
    496             preload_content=False,
    497             decode_content=False,
    498             retries=self.max_retries,
    499             timeout=timeout,
    500         )
    502     # Send the request.
    503     else:
    504         if hasattr(conn, "proxy_pool"):

File /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/urllib3/connectionpool.py:703, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    700     self._prepare_proxy(conn)
    702 # Make the request on the httplib connection object.
--> 703 httplib_response = self._make_request(
    704     conn,
    705     method,
    706     url,
    707     timeout=timeout_obj,
    708     body=body,
    709     headers=headers,
    710     chunked=chunked,
    711 )
    713 # If we're going to release the connection in ``finally:``, then
    714 # the response doesn't need to know about the connection. Otherwise
    715 # it will also try to release it and we'll have a double-release
    716 # mess.
    717 response_conn = conn if not release_conn else None

File /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/urllib3/connectionpool.py:386, in HTTPConnectionPool._make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    384 # Trigger any extra validation we need to do.
    385 try:
--> 386     self._validate_conn(conn)
    387 except (SocketTimeout, BaseSSLError) as e:
    388     # Py2 raises this as a BaseSSLError, Py3 raises it as socket timeout.
    389     self._raise_timeout(err=e, url=url, timeout_value=conn.timeout)

File /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/urllib3/connectionpool.py:1042, in HTTPSConnectionPool._validate_conn(self, conn)
   1040 # Force connect early to allow us to validate the connection.
   1041 if not getattr(conn, "sock", None):  # AppEngine might not have  `.sock`
-> 1042     conn.connect()
   1044 if not conn.is_verified:
   1045     warnings.warn(
   1046         (
   1047             "Unverified HTTPS request is being made to host '%s'. "
   (...)
   1052         InsecureRequestWarning,
   1053     )

File /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/urllib3/connection.py:358, in HTTPSConnection.connect(self)
    356 def connect(self):
    357     # Add certificate verification
--> 358     self.sock = conn = self._new_conn()
    359     hostname = self.host
    360     tls_in_tls = False

File /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/urllib3/connection.py:174, in HTTPConnection._new_conn(self)
    171     extra_kw["socket_options"] = self.socket_options
    173 try:
--> 174     conn = connection.create_connection(
    175         (self._dns_host, self.port), self.timeout, **extra_kw
    176     )
    178 except SocketTimeout:
    179     raise ConnectTimeoutError(
    180         self,
    181         "Connection to %s timed out. (connect timeout=%s)"
    182         % (self.host, self.timeout),
    183     )

File /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/urllib3/util/connection.py:85, in create_connection(address, timeout, source_address, socket_options)
     83     if source_address:
     84         sock.bind(source_address)
---> 85     sock.connect(sa)
     86     return sock
     88 except socket.error as e:

KeyboardInterrupt:

Last bits of data

This has been a problem on two of my work machines, both freshly reprovisioned. It does not occur on my Windows desktop or my Ubuntu server. Two colleagues, both running macOS, do not have this problem either. The problem occurs both with and without an IDE.

I'm unsure whether this is a problem with a package or with something else installed on the machine.

@kgs10 kgs10 added the type-bug An unexpected behavior, bug, or error label Aug 9, 2022
@JelleZijlstra (Member)

Do network requests using other mechanisms work (e.g., curl, a browser, other languages' web stacks)?

This sounds like a problem with your network stack. There's also some chance it's something IPv6-specific; you can try to figure out whether it's trying to use an IPv6 or IPv4 address.
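One quick way to check which address families are in play is to inspect what the resolver returns, since Python's stdlib tries the results in order. A small diagnostic helper (the function name is my own, not a stdlib API):

```python
import socket

# Diagnostic sketch: list the address families the resolver returns, in
# the order Python will try them. A dead IPv6 route that sorts first
# would explain a long stall before any IPv4 fallback.
def resolved_families(host, port=443):
    out = []
    for family, _, _, _, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        label = "IPv6" if family == socket.AF_INET6 else "IPv4"
        out.append((label, sockaddr[0]))
    return out
```

Calling resolved_families("www.google.com") on a dual-stack machine typically shows IPv6 addresses listed before IPv4 ones.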

kgs10 commented Aug 9, 2022

Ahh thank you, I forgot to add that info. Yes, curl works.

$ time curl -sq 'https://google.com' > /dev/null
curl -sq 'https://google.com' > /dev/null  0.02s user 0.01s system 5% cpu 0.455 total

As for the IPv4/IPv6, that's an interesting thought. I'll try that out and get back to this forum.

@zware zware added the OS-mac label Aug 9, 2022

kgs10 commented Aug 9, 2022

@JelleZijlstra, quick testing suggests that this is indeed related to IPv6, or possibly to running dual stack. The results below are from running with only an IPv4 address.

In [17]: time requests.get('https://www.google.com')
CPU times: user 15.1 ms, sys: 2.37 ms, total: 17.4 ms
Wall time: 160 ms
Out[17]: <Response [200]>

I will continue scoping this to see whether it is a problem with Python or with networking equipment. Any guidance or reading material I could use to debug this further on the Python side? I hit a wall once I found it was stuck in sock.connect; I generally don't work at this low a level.
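One way to debug below the requests layer is to time a connect() to each resolved address separately, which pinpoints exactly which family hangs. A rough sketch, assuming a short per-attempt timeout so a blackholed route cannot stall the loop (the helper name is illustrative):

```python
import socket
import time

# Debugging sketch: attempt a TCP connect to every address getaddrinfo
# returns and record how long each takes. On a dual-stack host with a
# blackholed IPv6 route, the IPv6 entries should show "failed: timed out"
# after ~2 s while IPv4 connects in milliseconds.
def time_connects(host, port, timeout=2.0):
    results = []
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.connect(sockaddr)
            outcome = "connected"
        except OSError as err:
            outcome = f"failed: {err}"
        finally:
            sock.close()
        results.append((sockaddr[0], outcome, time.monotonic() - start))
    return results
```

Running time_connects("www.google.com", 443) and comparing the per-address timings against curl's behavior should separate a Python-level issue from a network-level one.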

@JelleZijlstra (Member)

I don't have great ideas, but the past experience that led me to mention IPv6 is that I've seen something similar inside AWS machines: IPv4 is fine but IPv6 requests hang. You say you're on a Mac, so you're probably not in AWS, but maybe your network stack is having something similar happen. A useful datapoint could be whether IPv6 connections also hang when you use other tools, such as curl.

@ronaldoussoren (Contributor)

#88810 might be relevant here; it proposes adding an implementation of the Happy Eyeballs algorithm to non-asyncio socket creation to improve performance in dual-stack environments.
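The core idea of Happy Eyeballs (RFC 8305) is to race connection attempts across address families instead of trying them strictly in sequence, so one blackholed family cannot stall the whole request. A heavily simplified thread-based sketch of the idea, not the proposal in #88810 and not a conforming RFC 8305 implementation (the real algorithm staggers attempts by ~250 ms and cancels losers):

```python
import queue
import socket
import threading

# Simplified Happy Eyeballs sketch: start a connect attempt to every
# resolved address concurrently and return whichever socket wins first.
# Simplification: sockets that connect after the winner are left in the
# queue unclosed; a real implementation would cancel and close losers.
def happy_eyeballs_connect(host, port, timeout=5.0):
    winners = queue.Queue()

    def attempt(family, socktype, proto, sockaddr):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            winners.put(sock)
        except OSError:
            sock.close()

    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        threading.Thread(
            target=attempt,
            args=(family, socktype, proto, sockaddr),
            daemon=True,
        ).start()
    try:
        return winners.get(timeout=timeout)  # first successful connect
    except queue.Empty:
        raise OSError(f"could not connect to {host}:{port}")
```

With this approach, a hung IPv6 attempt simply loses the race to the working IPv4 one, which is exactly the dual-stack failure mode described in this issue.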

kgs10 commented Aug 10, 2022

After additional testing, I have been able to conclude that this is something specific to my AnyConnect VPN back to HQ. I'll work with IT to see what policy they have that could be blackholing IPv6 traffic.

When I'm not on VPN, everything works fine.

I'll close this out, but to @ronaldoussoren's point, implementing the Happy Eyeballs algorithm would mitigate this. If that can't be done, having a way to alert the user of what's happening would be good, especially since dual stack is only going to become more common in the near future.
