Commit fb92f42

GH-125866: RFC 8089 file URIs in urllib.request

Adjust `urllib.request.pathname2url()` and `url2pathname()` to generate and accept file URIs as described in RFC 8089.

`pathname2url()` gains a new *include_scheme* argument, which defaults to false. When set to true, the returned URL includes a `file:` prefix. `url2pathname()` now automatically removes a `file:` prefix if present.

On Windows, `pathname2url()` now generates URIs that begin with two slashes rather than four when given a UNC path. On other platforms, `pathname2url()` now generates URIs that begin with three slashes rather than one when given an absolute path. `url2pathname()` now performs the opposite transformation, so `file:///etc/hosts` becomes `/etc/hosts`.

Furthermore, on non-Windows platforms `url2pathname()` now ignores a local authority (such as "localhost" or any alias of the local host) and raises `URLError` for a non-local one. On Windows, a non-local authority yields a UNC path.
1 parent 6742f14 commit fb92f42
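
The slash-count change for absolute paths on non-Windows platforms can be illustrated with a small standalone sketch. `sketch_pathname2url` is a hypothetical helper written for this note, not the function added by the commit; it assumes POSIX-style paths and covers only the behaviour described in the message above.

```python
from urllib.parse import quote


def sketch_pathname2url(path, include_scheme=False):
    """Hypothetical POSIX-only illustration of the described behaviour."""
    prefix = 'file:' if include_scheme else ''
    if path.startswith('/'):
        # Absolute path: an empty authority ('//') precedes the path, so the
        # result starts with three slashes instead of one.
        prefix += '//'
    # quote() leaves '/' alone by default and percent-encodes the rest.
    return prefix + quote(path)


print(sketch_pathname2url('/etc/hosts'))
print(sketch_pathname2url('/etc/hosts', include_scheme=True))
print(sketch_pathname2url('relative/dir name'))
```

Relative paths are left without the `//` marker in this sketch, since only absolute paths are said to gain the extra slashes.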

6 files changed: 217 additions, 55 deletions
Doc/library/urllib.request.rst

Lines changed: 23 additions & 8 deletions
@@ -147,18 +147,33 @@ The :mod:`urllib.request` module defines the following functions:
    attribute to modify its position in the handlers list.


-.. function:: pathname2url(path)
+.. function:: pathname2url(path, include_scheme=False)

-   Convert the pathname *path* from the local syntax for a path to the form used in
-   the path component of a URL. This does not produce a complete URL. The return
-   value will already be quoted using the :func:`~urllib.parse.quote` function.
+   Convert the local pathname *path* to a percent-encoded URL. If
+   *include_scheme* is false (the default), the URL is returned without a
+   ``file:`` scheme prefix; set this argument to true to generate a complete
+   URL.

+   .. versionchanged:: 3.14
+      The *include_scheme* argument was added.

-.. function:: url2pathname(path)
+   .. versionchanged:: 3.14
+      Generates :rfc:`8089`-compliant file URLs for absolute paths. URLs for
+      UNC paths on Windows systems begin with two slashes (previously four).
+      URLs for absolute paths on non-Windows systems begin with three slashes
+      (previously one).
+
+
+.. function:: url2pathname(url)
+
+   Convert the percent-encoded *url* to a local pathname.
+
+   .. versionchanged:: 3.14
+      Supports :rfc:`8089`-compliant file URLs. Raises :exc:`URLError` if a
+      scheme other than ``file:`` is used. If the URL uses a non-local
+      authority, then on Windows a UNC path is returned, and on other
+      platforms a :exc:`URLError` exception is raised.

-   Convert the path component *path* from a percent-encoded URL to the local syntax for a
-   path. This does not accept a complete URL. This function uses
-   :func:`~urllib.parse.unquote` to decode *path*.

.. function:: getproxies()
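
The documented non-Windows behaviour of `url2pathname()` (strip an optional `file:` scheme, reject other schemes, reject non-local authorities) can be sketched as follows. `sketch_url2pathname_posix` is a hypothetical name, and the local-host check is deliberately simplified: it assumes only an empty authority or the literal `localhost` counts as local, whereas the commit also resolves host aliases.

```python
from urllib.error import URLError
from urllib.parse import unquote, urlsplit


def sketch_url2pathname_posix(url):
    """Hypothetical, simplified POSIX-only illustration (not the commit's code)."""
    # scheme='file' makes a scheme-less URL such as '/foo/bar' parse as a file URL.
    scheme, authority, path = urlsplit(url, scheme='file')[:3]
    if scheme != 'file':
        raise URLError(f'URI does not use "file" scheme: {url!r}')
    # Assumption: no alias resolution; only '' and 'localhost' are local.
    if authority not in ('', 'localhost'):
        raise URLError(f'file URI not on local host: {url!r}')
    return unquote(path)


print(sketch_url2pathname_posix('file:///etc/hosts'))
print(sketch_url2pathname_posix('//localhost/foo%20bar'))
```

Passing `file://other-machine/etc/hosts` or an `http:` URL to this sketch raises `URLError`.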

Doc/whatsnew/3.14.rst

Lines changed: 22 additions & 0 deletions
@@ -447,6 +447,28 @@ unittest
   (Contributed by Jacob Walls in :gh:`80958`.)


+urllib.request
+--------------
+
+* Improve support for ``file:`` URIs in :mod:`urllib.request`:
+
+  * :func:`~urllib.request.pathname2url` accepts an *include_scheme*
+    argument, which defaults to false. When set to true, a complete URL
+    with a ``file:`` prefix is returned.
+  * :func:`~urllib.request.url2pathname` discards a ``file:`` prefix if given.
+  * On Windows, :func:`~urllib.request.pathname2url` generates URIs that
+    begin with two slashes (rather than four) when given a UNC path.
+  * On non-Windows platforms, :func:`~urllib.request.pathname2url` generates
+    URIs that begin with three slashes (rather than one) when given an
+    absolute path. :func:`~urllib.request.url2pathname` performs the opposite
+    transformation, so ``file:///etc/hosts`` becomes ``/etc/hosts``.
+  * On non-Windows platforms, :func:`~urllib.request.url2pathname` raises
+    :exc:`urllib.error.URLError` if the URI includes a non-local authority,
+    like ``file://other-machine/etc/hosts``.
+
+  (Contributed by Barney Gale in :gh:`125866`.)
+
+
 .. Add improved modules above alphabetically, not here at the end.

 Optimizations

Lib/nturl2path.py

Lines changed: 2 additions & 2 deletions
@@ -1,9 +1,9 @@
 """Convert a NT pathname to a file URL and vice versa.

-This module only exists to provide OS-specific code
+This module previously provided OS-specific code
 for urllib.requests, thus do not use directly.
 """
-# Testing is done through test_urllib.
+# Testing is done through test_nturl2path.

 def url2pathname(url):
     """OS-specific conversion from a relative URL of the 'file' scheme

Lib/test/test_nturl2path.py

Lines changed: 111 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,111 @@
1+
import nturl2path
2+
import unittest
3+
import urllib.parse
4+
5+
6+
class nturl2path_Tests(unittest.TestCase):
7+
"""Test pathname2url() and url2pathname()"""
8+
9+
def test_basic(self):
10+
# Make sure simple tests pass
11+
expected_path = "parts\\of\\a\\path"
12+
expected_url = "parts/of/a/path"
13+
result = nturl2path.pathname2url(expected_path)
14+
self.assertEqual(expected_url, result,
15+
"pathname2url() failed; %s != %s" %
16+
(result, expected_url))
17+
result = nturl2path.url2pathname(expected_url)
18+
self.assertEqual(expected_path, result,
19+
"url2pathame() failed; %s != %s" %
20+
(result, expected_path))
21+
22+
def test_quoting(self):
23+
# Test automatic quoting and unquoting works for pathnam2url() and
24+
# url2pathname() respectively
25+
given = "needs\\quot=ing\\here"
26+
expect = "needs/%s/here" % urllib.parse.quote("quot=ing")
27+
result = nturl2path.pathname2url(given)
28+
self.assertEqual(expect, result,
29+
"pathname2url() failed; %s != %s" %
30+
(expect, result))
31+
expect = given
32+
result = nturl2path.url2pathname(result)
33+
self.assertEqual(expect, result,
34+
"url2pathname() failed; %s != %s" %
35+
(expect, result))
36+
given = "make sure\\using_quote"
37+
expect = "%s/using_quote" % urllib.parse.quote("make sure")
38+
result = nturl2path.pathname2url(given)
39+
self.assertEqual(expect, result,
40+
"pathname2url() failed; %s != %s" %
41+
(expect, result))
42+
given = "make+sure/using_unquote"
43+
expect = "make+sure\\using_unquote"
44+
result = nturl2path.url2pathname(given)
45+
self.assertEqual(expect, result,
46+
"url2pathname() failed; %s != %s" %
47+
(expect, result))
48+
49+
def test_pathname2url(self):
50+
# Test special prefixes are correctly handled in pathname2url()
51+
fn = nturl2path.pathname2url
52+
self.assertEqual(fn('\\\\?\\C:\\dir'), '///C:/dir')
53+
self.assertEqual(fn('\\\\?\\unc\\server\\share\\dir'), '/server/share/dir')
54+
self.assertEqual(fn("C:"), '///C:')
55+
self.assertEqual(fn("C:\\"), '///C:')
56+
self.assertEqual(fn('C:\\a\\b.c'), '///C:/a/b.c')
57+
self.assertEqual(fn('C:\\a\\b%#c'), '///C:/a/b%25%23c')
58+
self.assertEqual(fn('C:\\a\\b\xe9'), '///C:/a/b%C3%A9')
59+
self.assertEqual(fn('C:\\foo\\bar\\spam.foo'), "///C:/foo/bar/spam.foo")
60+
# Long drive letter
61+
self.assertRaises(IOError, fn, "XX:\\")
62+
# No drive letter
63+
self.assertEqual(fn("\\folder\\test\\"), '/folder/test/')
64+
self.assertEqual(fn("\\\\folder\\test\\"), '////folder/test/')
65+
self.assertEqual(fn("\\\\\\folder\\test\\"), '/////folder/test/')
66+
self.assertEqual(fn('\\\\some\\share\\'), '////some/share/')
67+
self.assertEqual(fn('\\\\some\\share\\a\\b.c'), '////some/share/a/b.c')
68+
self.assertEqual(fn('\\\\some\\share\\a\\b%#c\xe9'), '////some/share/a/b%25%23c%C3%A9')
69+
# Round-tripping
70+
urls = ['///C:',
71+
'/////folder/test/',
72+
'///C:/foo/bar/spam.foo']
73+
for url in urls:
74+
self.assertEqual(fn(nturl2path.url2pathname(url)), url)
75+
76+
def test_url2pathname_win(self):
77+
fn = nturl2path.url2pathname
78+
self.assertEqual(fn('/C:/'), 'C:\\')
79+
self.assertEqual(fn("///C|"), 'C:')
80+
self.assertEqual(fn("///C:"), 'C:')
81+
self.assertEqual(fn('///C:/'), 'C:\\')
82+
self.assertEqual(fn('/C|//'), 'C:\\')
83+
self.assertEqual(fn('///C|/path'), 'C:\\path')
84+
# No DOS drive
85+
self.assertEqual(fn("///C/test/"), '\\\\\\C\\test\\')
86+
self.assertEqual(fn("////C/test/"), '\\\\C\\test\\')
87+
# DOS drive paths
88+
self.assertEqual(fn('C:/path/to/file'), 'C:\\path\\to\\file')
89+
self.assertEqual(fn('C|/path/to/file'), 'C:\\path\\to\\file')
90+
self.assertEqual(fn('/C|/path/to/file'), 'C:\\path\\to\\file')
91+
self.assertEqual(fn('///C|/path/to/file'), 'C:\\path\\to\\file')
92+
self.assertEqual(fn("///C|/foo/bar/spam.foo"), 'C:\\foo\\bar\\spam.foo')
93+
# Non-ASCII drive letter
94+
self.assertRaises(IOError, fn, "///\u00e8|/")
95+
# UNC paths
96+
self.assertEqual(fn('//server/path/to/file'), '\\\\server\\path\\to\\file')
97+
self.assertEqual(fn('////server/path/to/file'), '\\\\server\\path\\to\\file')
98+
self.assertEqual(fn('/////server/path/to/file'), '\\\\\\server\\path\\to\\file')
99+
# Localhost paths
100+
self.assertEqual(fn('//localhost/C:/path/to/file'), 'C:\\path\\to\\file')
101+
self.assertEqual(fn('//localhost/C|/path/to/file'), 'C:\\path\\to\\file')
102+
# Round-tripping
103+
paths = ['C:',
104+
r'\\\C\test\\',
105+
r'C:\foo\bar\spam.foo']
106+
for path in paths:
107+
self.assertEqual(fn(nturl2path.pathname2url(path)), path)
108+
109+
110+
if __name__ == '__main__':
111+
unittest.main()

Lib/test/test_urllib.py

Lines changed: 7 additions & 7 deletions
@@ -1551,9 +1551,9 @@ def test_pathname2url_win(self):
                          'test specific to POSIX pathnames')
     def test_pathname2url_posix(self):
         fn = urllib.request.pathname2url
-        self.assertEqual(fn('/'), '/')
-        self.assertEqual(fn('/a/b.c'), '/a/b.c')
-        self.assertEqual(fn('/a/b%#c'), '/a/b%25%23c')
+        self.assertEqual(fn('/'), '///')
+        self.assertEqual(fn('/a/b.c'), '///a/b.c')
+        self.assertEqual(fn('/a/b%#c'), '///a/b%25%23c')

     @unittest.skipUnless(sys.platform == 'win32',
                          'test specific to Windows pathnames.')
@@ -1595,10 +1595,10 @@ def test_url2pathname_win(self):
     def test_url2pathname_posix(self):
         fn = urllib.request.url2pathname
         self.assertEqual(fn('/foo/bar'), '/foo/bar')
-        self.assertEqual(fn('//foo/bar'), '//foo/bar')
-        self.assertEqual(fn('///foo/bar'), '///foo/bar')
-        self.assertEqual(fn('////foo/bar'), '////foo/bar')
-        self.assertEqual(fn('//localhost/foo/bar'), '//localhost/foo/bar')
+        self.assertRaises(urllib.error.URLError, fn, '//foo/bar')
+        self.assertEqual(fn('///foo/bar'), '/foo/bar')
+        self.assertEqual(fn('////foo/bar'), '//foo/bar')
+        self.assertEqual(fn('//localhost/foo/bar'), '/foo/bar')

 class Utility_Tests(unittest.TestCase):
"""Testcase to test the various utility functions in the urllib."""
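
The new POSIX expectations follow from how `urllib.parse.urlsplit()` divides a URL into authority and path once `file` is supplied as the default scheme. The snippet below is only an illustration using the standard library; `split_results` is a name chosen for this example.

```python
from urllib.parse import urlsplit

urls = ['/foo/bar', '//foo/bar', '///foo/bar', '////foo/bar',
        '//localhost/foo/bar']

# Map each URL to the (authority, path) pair that urlsplit() reports.
split_results = {}
for url in urls:
    parts = urlsplit(url, scheme='file')
    split_results[url] = (parts.netloc, parts.path)
    print(f'{url!r}: authority={parts.netloc!r} path={parts.path!r}')
```

`'//foo/bar'` is the only input here with a non-local authority (`foo`), which is why it is the one case expected to raise `URLError`; in the others the authority is empty or `localhost`, and the remaining path is returned.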

Lib/urllib/request.py

Lines changed: 52 additions & 38 deletions
@@ -1448,16 +1448,6 @@ def parse_http_list(s):
     return [part.strip() for part in res]

 class FileHandler(BaseHandler):
-    # Use local file or FTP depending on form of URL
-    def file_open(self, req):
-        url = req.selector
-        if url[:2] == '//' and url[2:3] != '/' and (req.host and
-                req.host != 'localhost'):
-            if not req.host in self.get_names():
-                raise URLError("file:// scheme is supported only on localhost")
-        else:
-            return self.open_local_file(req)
-
     # names for the localhost
     names = None
     def get_names(self):
@@ -1474,8 +1464,7 @@ def get_names(self):
     def open_local_file(self, req):
         import email.utils
         import mimetypes
-        host = req.host
-        filename = req.selector
+        filename = req.full_url
         localfile = url2pathname(filename)
         try:
             stats = os.stat(localfile)
@@ -1485,24 +1474,22 @@ def open_local_file(self, req):
             headers = email.message_from_string(
                 'Content-type: %s\nContent-length: %d\nLast-modified: %s\n' %
                 (mtype or 'text/plain', size, modified))
-            if host:
-                host, port = _splitport(host)
-            if not host or \
-                (not port and _safe_gethostbyname(host) in self.get_names()):
-                if host:
-                    origurl = 'file://' + host + filename
-                else:
-                    origurl = 'file://' + filename
-                return addinfourl(open(localfile, 'rb'), headers, origurl)
+            return addinfourl(open(localfile, 'rb'), headers, filename)
         except OSError as exp:
             raise URLError(exp)
-        raise URLError('file not on local host')

-def _safe_gethostbyname(host):
+    file_open = open_local_file
+
+
+def _is_local_host(host):
+    if not host or host == 'localhost':
+        return True
     try:
-        return socket.gethostbyname(host)
+        name = socket.gethostbyname(host)
     except socket.gaierror:
-        return None
+        return False
+    return name in FileHandler().get_names()
+

 class FTPHandler(BaseHandler):
     def ftp_open(self, req):
@@ -1649,19 +1636,46 @@ def data_open(self, req):

 MAXFTPCACHE = 10 # Trim the ftp cache beyond this size

-# Helper for non-unix systems
-if os.name == 'nt':
-    from nturl2path import url2pathname, pathname2url
-else:
-    def url2pathname(pathname):
-        """OS-specific conversion from a relative URL of the 'file' scheme
-        to a file system path; not recommended for general use."""
-        return unquote(pathname)
-
-    def pathname2url(pathname):
-        """OS-specific conversion from a file system path to a relative URL
-        of the 'file' scheme; not recommended for general use."""
-        return quote(pathname)
+def pathname2url(path, include_scheme=False):
+    """Convert the local pathname *path* to a percent-encoded URL."""
+    prefix = 'file:' if include_scheme else ''
+    if os.name == 'nt':
+        path = path.replace('\\', '/')
+    drive, root, tail = os.path.splitroot(path)
+    if drive:
+        if drive[1:2] == ':':
+            prefix += '///'
+    elif root:
+        prefix += '//'
+    tail = quote(tail)
+    return prefix + drive + root + tail
+
+def url2pathname(url):
+    """Convert the percent-encoded URL *url* to a local pathname."""
+    scheme, authority, path = urlsplit(url, scheme='file')[:3]
+    if scheme != 'file':
+        raise URLError(f'URI does not use "file" scheme: {url!r}')
+    if os.name == 'nt':
+        path = unquote(path)
+        if authority and authority != 'localhost':
+            # e.g. file://server/share/path
+            path = f'//{authority}{path}'
+        elif path.startswith('///'):
+            # e.g. file://///server/share/path
+            path = path[1:]
+        else:
+            # A tuple is used so that an empty slice never matches.
+            if path[0:1] == '/' and path[2:3] in (':', '|'):
+                # e.g. file:///c:/path
+                path = path[1:]
+            if path[1:2] == '|':
+                # e.g. file:///c|/path
+                path = path[:1] + ':' + path[2:]
+        path = path.replace('/', '\\')
+    else:
+        if not _is_local_host(authority):
+            raise URLError(f'file URI not on local host: {url!r}')
+        path = unquote(path)
+    return path


ftpcache = {}
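
The Windows branch of `url2pathname()` is pure string manipulation, so its effect can be tried on any platform. The sketch below is modelled on that branch rather than copied from it: `sketch_url2pathname_windows` is a hypothetical name, and the drive-separator check uses a tuple so that an empty slice (as in `'/a'`) does not count as a match.

```python
from urllib.error import URLError
from urllib.parse import unquote, urlsplit


def sketch_url2pathname_windows(url):
    """Hypothetical illustration of the Windows rules; runs on any OS."""
    scheme, authority, path = urlsplit(url, scheme='file')[:3]
    if scheme != 'file':
        raise URLError(f'URI does not use "file" scheme: {url!r}')
    path = unquote(path)
    if authority and authority != 'localhost':
        # file://server/share/path -> UNC path on that server
        path = f'//{authority}{path}'
    elif path.startswith('///'):
        # file://///server/share/path -> drop one of the three slashes
        path = path[1:]
    else:
        if path[:1] == '/' and path[2:3] in (':', '|'):
            # file:///c:/path -> drop the slash before the drive letter
            path = path[1:]
        if path[1:2] == '|':
            # Legacy 'c|' drive notation -> 'c:'
            path = path[:1] + ':' + path[2:]
    return path.replace('/', '\\')


for example in ('file:///C:/path/to/file',
                'file:///C|/path',
                'file://server/share/path',
                '//localhost/C:/path/to/file'):
    print(example, '->', sketch_url2pathname_windows(example))
```

A bare `C:/path/to/file` is not among the examples because `urlsplit()` would read `C` as the URL scheme, and the scheme check would then raise `URLError`.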
