gh-90385: Add pathlib.Path.walk() method #92517

Merged
39 commits
ac622b7
Add Path.walk and Path.walk_bottom_up methods
zmievsa May 8, 2022
14f031a
Fix errors in Path.walk docstrings and add caching of entries
zmievsa May 9, 2022
b203517
Merge branch 'main' into bpo-46227/add-pathlib.Path.walk-method
Ovsyanka83 May 9, 2022
3ad60a9
Refactor symlink handling
zmievsa May 9, 2022
889d7fe
Merge branch 'bpo-46227/add-pathlib.Path.walk-method' of github.com:O…
zmievsa May 9, 2022
2f98823
Add Path.walk docs and unite Path.walk interfaces
zmievsa May 10, 2022
513030a
Remove Path.walk_bottom_up definition
zmievsa May 10, 2022
5fdd72e
📜🤖 Added by blurb_it.
blurb-it[bot] May 10, 2022
452f24e
Add Path.walk tests
zmievsa May 10, 2022
3702a12
Make Path.walk variable naming consistent
zmievsa May 10, 2022
fabc925
Remove redundant FIXME
zmievsa May 10, 2022
b387b54
Minor Path.walk docs and tests fixes
zmievsa May 10, 2022
097fbbf
Merge branch 'main' into bpo-46227/add-pathlib.Path.walk-method
merwok Jun 27, 2022
76fadfc
Update Doc/library/pathlib.rst
Ovsyanka83 Jun 30, 2022
0c19871
Update Doc/library/pathlib.rst
Ovsyanka83 Jun 30, 2022
50b4a2b
Update Doc/library/pathlib.rst
Ovsyanka83 Jun 30, 2022
cade3e9
Update Doc/library/pathlib.rst
Ovsyanka83 Jun 30, 2022
b32627c
Update Doc/library/pathlib.rst
Ovsyanka83 Jun 30, 2022
d1a0833
Update Doc/library/pathlib.rst
Ovsyanka83 Jun 30, 2022
e367f1f
Update Doc/library/pathlib.rst
Ovsyanka83 Jun 30, 2022
bf8b0eb
Fix 'no blank lines' error
zmievsa Jun 30, 2022
d8667c7
Apply suggestions from code review
Ovsyanka83 Jul 3, 2022
4509797
More code review fixes for Path.walk
zmievsa Jul 3, 2022
20a73ed
Merge branch 'main' into bpo-46227/add-pathlib.Path.walk-method
Ovsyanka83 Jul 3, 2022
e61d57b
Merge branch 'main' into bpo-46227/add-pathlib.Path.walk-method
brettcannon Jul 8, 2022
15d96b9
Apply suggestions from code review
Ovsyanka83 Jul 9, 2022
92e1a7a
Apply suggestions from code review
Ovsyanka83 Jul 9, 2022
c509da3
Merge branch 'main' into bpo-46227/add-pathlib.Path.walk-method
Ovsyanka83 Jul 9, 2022
cfa730d
Code review fixes
zmievsa Jul 10, 2022
7aec96d
Clarify pathlib.Path.walk() error handling
zmievsa Jul 10, 2022
38fe1e5
Apply suggestions from code review
Ovsyanka83 Jul 10, 2022
eef3ba3
Code review fixes
zmievsa Jul 10, 2022
4dfdcd7
Merge branch 'bpo-46227/add-pathlib.Path.walk-method' of github.com:O…
zmievsa Jul 10, 2022
8fe3b62
Apply suggestions from code review
Ovsyanka83 Jul 12, 2022
e8ea6ba
Code review fixes
zmievsa Jul 12, 2022
79cf8fd
Remove backticks around True and False
zmievsa Jul 13, 2022
bed850e
Apply suggestions from code review
Ovsyanka83 Jul 17, 2022
203ec3d
Apply suggestions from code review
zmievsa Jul 17, 2022
eef6054
Apply suggestions from code review
brettcannon Jul 22, 2022
106 changes: 106 additions & 0 deletions Doc/library/pathlib.rst
@@ -920,6 +920,112 @@ call fails (for example because the path doesn't exist).
to the directory after creating the iterator, whether a path object for
that file be included is unspecified.

.. method:: Path.walk(top_down=True, on_error=None, follow_symlinks=False)

Generate the file names in a directory tree by walking the tree
either top-down or bottom-up.

For each directory in the directory tree rooted at *self* (including
*self* but excluding ``'.'`` and ``'..'``), the method yields a 3-tuple
``(dirpath, dirnames, filenames)``.

*dirpath* is a :class:`Path` to the directory, *dirnames* is a list of the names
of the subdirectories in *dirpath* (excluding ``'.'`` and ``'..'``), and
*filenames* is a list of the names of the non-directory files in *dirpath*.
Note that the names in the lists contain no path components. To get a full
path (which begins with *self*) to a file or directory in *dirpath*, use
``dirpath / name``. Whether or not the lists are sorted depends on the file
system. If a file or a directory is removed from or added to *dirpath*
during the generation of *dirnames* and *filenames*, it is unspecified whether
that entry appears in the generated lists.

If optional argument *top_down* is ``True`` or not specified, the triple for a
directory is generated before the triples for any of its subdirectories
(directories are generated top-down). If *top_down* is ``False``, the triple
for a directory is generated after the triples for all of its subdirectories
(directories are generated bottom-up). No matter the value of *top_down*, the
list of subdirectories is retrieved before the triples for the directory and
its subdirectories are generated.

When *top_down* is ``True``, the caller can modify the *dirnames* list in-place
(for example, using :keyword:`del` or slice assignment), and :meth:`Path.walk`
will only recurse into the subdirectories whose names remain in *dirnames*.
This can be used to prune the search, to impose a specific order of visiting,
or even to inform :meth:`Path.walk` about directories the caller creates or
renames before it resumes :meth:`Path.walk` again. Modifying *dirnames* when
*top_down* is ``False`` has no effect on the behavior of :meth:`Path.walk`, since the
directories in *dirnames* have already been generated by the time *dirnames*
is yielded to the caller.

By default, errors from the underlying :func:`os.scandir` call are ignored. If
the optional argument *on_error* is specified, it should be a callable; it
will be called with one argument, an :exc:`OSError` instance. It can
report the error to continue with the walk, or raise the exception
Member

What do you mean by "report the error to continue the walk"? Do you mean suppress/consume the exception, or else re-raise it?

Contributor Author, @zmievsa, Jul 9, 2022

Yes, exactly that. Please, take a look at these lines again. I tried to clarify and simplify it a little bit.

to abort the walk. Note that the filename is available as the
``filename`` attribute of the exception object.

By default, :meth:`Path.walk` will not walk down into symbolic links that
resolve to directories. Set *follow_symlinks* to ``True`` to visit directories
pointed to by symlinks, on systems that support them.

.. note::

Be aware that setting *follow_symlinks* to ``True`` can lead to infinite
recursion if a link points to a parent directory of itself. :meth:`Path.walk`
does not keep track of the directories it has already visited.

.. note::

If *self* is a relative :class:`Path`, don't change the current working directory between
resumptions of :meth:`Path.walk`. :meth:`Path.walk` never changes the current
directory, and assumes that the caller doesn't either.

.. note::

:meth:`Path.walk` assumes the directories have not been modified between
its resumptions. For example, if a directory from *dirnames* has been replaced
with a symlink and *follow_symlinks* is ``False``, :meth:`Path.walk` will
still try to descend into it. To prevent such behavior, remove directories
from *dirnames* if they have been modified and you do not want to
descend into them anymore.
Contributor

This note feels like unnecessary detail to me - replacing a directory with a symlink is an edge case, and it can already be inferred from the prior section on dirnames that entries should be removed from dirnames if they no longer exist, or are no longer directories.

Contributor Author, @zmievsa, Jul 3, 2022

You see, the fact that we still try to descend into these "no-longer-directories" is more of an optimization, and not an intuitive one at that. So I feel like having a short note about it at the end of the documentation of the method won't hurt.

.. note::

Unlike :func:`os.walk`, :meth:`Path.walk` lists symlinks to directories in *filenames* if *follow_symlinks* is ``False``.

This example displays the number of bytes taken by non-directory files in each
directory under the starting directory, except that it doesn't look under any
``__pycache__`` subdirectory::

    from pathlib import Path
    for root, dirs, files in Path("cpython/Lib/concurrent").walk(on_error=print):
        print(
            root,
            "consumes",
            sum((root / file).stat().st_size for file in files),
            "bytes in",
            len(files),
            "non-directory files"
        )
        if '__pycache__' in dirs:
            dirs.remove('__pycache__')

In the next example (a simple implementation of :func:`shutil.rmtree`),
walking the tree bottom-up is essential, as :meth:`Path.rmdir` doesn't allow
deleting a directory before it is empty::

    # Delete everything reachable from the directory "top",
    # assuming there are no symbolic links.
    # CAUTION: This is dangerous! For example, if top == Path('/'),
    # it could delete all your disk files.
    for root, dirs, files in top.walk(top_down=False):
        for name in files:
            (root / name).unlink()
        for name in dirs:
            (root / name).rmdir()

.. versionadded:: 3.12

.. method:: Path.lchmod(mode)

Like :meth:`Path.chmod` but, if the path points to a symbolic link, the
116 changes: 116 additions & 0 deletions Lib/pathlib.py
@@ -1,3 +1,4 @@
from collections import deque
import fnmatch
import functools
import io
@@ -1385,6 +1386,121 @@ def expanduser(self):

        return self

    def walk(self, top_down=True, on_error=None, follow_symlinks=False):
        """Walk the directory tree from this directory, similar to os.walk().

        For each directory in the directory tree rooted at self (including
        self but excluding '.' and '..'), yields a 3-tuple

            dirpath, dirnames, filenames

        dirpath is the Path to the directory. dirnames is a list of
        the names of the subdirectories in dirpath (excluding '.' and '..').
        filenames is a list of the names of the non-directory files in dirpath.
        Note that the names in the lists are just names, with no path components.
        To get a full path (which begins with self) to a file or directory in
        dirpath, do dirpath / name.

        If optional arg 'top_down' is true or not specified, the triple for a
        directory is generated before the triples for any of its subdirectories
        (directories are generated top down). If top_down is false, the triple
        for a directory is generated after the triples for all of its
        subdirectories (directories are generated bottom up).

        When top_down is true, the caller can modify the dirnames list in-place
        (e.g., via del or slice assignment), and walk will only recurse
        into the subdirectories whose names remain in dirnames; this
        can be used to prune the search, or to impose a specific order of
        visiting. Modifying dirnames when top_down is false has no effect on
        the behavior of Path.walk(), since the directories in dirnames have
        already been generated by the time dirnames itself is generated. No
        matter the value of top_down, the list of subdirectories is retrieved
        before the triples for the directory and its subdirectories are
        generated.

        By default, errors from self._scandir() are ignored. If optional arg
        'on_error' is specified, it should be a callable; it will be called
        with one argument, an OSError instance. It can report the error to
        continue with the walk, or raise the exception to abort the walk.
        Note that the filename is available as the filename attribute of the
        exception object.

        By default, Path.walk does not follow symbolic links to subdirectories
        on systems that support them. In order to get this functionality, set
        the optional argument 'follow_symlinks' to true. Unlike os.walk,
        Path.walk only adds symbolic links to directories to dirnames if
        follow_symlinks is true.

        Caution: if self is a relative Path, don't change the
        current working directory between resumptions of walk. walk never
        changes the current directory, and assumes that the caller doesn't
        either.

        Caution: Unlike os.walk, Path.walk assumes the directories have not
        been modified between its resumptions. For example, if a directory
        from dirnames has been replaced with a symlink and
        follow_symlinks=False, walk will still try to descend into it. To
        prevent such behavior, remove directories from dirnames if they have
        been modified and you do not want Path.walk to descend into them
        anymore.

        Example:

            from pathlib import Path
            for root, dirs, files in Path().walk(on_error=print):
                print(
                    root,
                    "consumes",
                    sum((root / file).stat().st_size for file in files),
                    "bytes in",
                    len(files),
                    "non-directory files"
                )
                # don't visit __pycache__ directories
                if '__pycache__' in dirs:
                    dirs.remove('__pycache__')
        """
        sys.audit("pathlib.Path.walk", self, on_error, follow_symlinks)
        return self._walk(top_down, on_error, follow_symlinks)

    def _walk(self, top_down, on_error, follow_symlinks):
        dirs = []
        nondirs = []

        # We may not have read permission for self, in which case we can't
        # get a list of the files the directory contains. os.walk
        # always suppressed the exception then, rather than blow up for a
        # minor reason when (say) a thousand readable directories are still
        # left to visit. That logic is copied here.
        try:
            scandir_it = self._scandir()
        except OSError as error:
            if on_error is not None:
                on_error(error)
            return

        with scandir_it:
            for entry in scandir_it:
                try:
                    is_dir = entry.is_dir(follow_symlinks=follow_symlinks)
                except OSError:
                    # If is_dir() raises an OSError, consider that the entry
                    # is not a directory, same behavior as os.path.isdir()
                    is_dir = False

                if is_dir:
                    dirs.append(entry.name)
                else:
                    nondirs.append(entry.name)

        if top_down:
            yield self, dirs, nondirs

        for dir_name in dirs:
            new_path = self._make_child_relpath(dir_name)
            yield from new_path._walk(top_down, on_error, follow_symlinks)

        if not top_down:
            yield self, dirs, nondirs


class PosixPath(Path, PurePosixPath):
    """Path subclass for non-Windows systems.
@@ -0,0 +1 @@
Added :meth:`~pathlib.Path.walk` to :class:`pathlib.Path` objects as a pathlib alternative to :func:`os.walk`.
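The recursive ``_walk()`` in the ``Lib/pathlib.py`` hunk above nests one generator per directory level, so very deep trees can run into Python's recursion limit. As an illustration only (not code from this PR), here is a stack-based sketch of the same logic that calls :func:`os.scandir` directly; ``walk_iterative`` is a hypothetical name, and the sketch does not depend on :meth:`Path.walk` being available.

```python
import os
import tempfile
from pathlib import Path

def walk_iterative(top, top_down=True, on_error=None, follow_symlinks=False):
    """Hypothetical stack-based equivalent of the recursive _walk() logic."""
    stack = [Path(top)]
    while stack:
        item = stack.pop()
        if isinstance(item, tuple):
            # Deferred bottom-up triple: all of its subdirectories are done.
            yield item
            continue
        try:
            scandir_it = os.scandir(item)
        except OSError as error:
            if on_error is not None:
                on_error(error)
            continue
        dirnames, filenames = [], []
        with scandir_it:
            for entry in scandir_it:
                try:
                    is_dir = entry.is_dir(follow_symlinks=follow_symlinks)
                except OSError:
                    is_dir = False  # same fallback as _walk()
                (dirnames if is_dir else filenames).append(entry.name)
        if top_down:
            # The caller may prune or reorder dirnames before we resume.
            yield item, dirnames, filenames
        else:
            # Pushed below the children, so it is popped after all of them.
            stack.append((item, dirnames, filenames))
        # Reversed so that subdirectories are visited in dirnames order.
        stack.extend(item / name for name in reversed(dirnames))

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a" / "b").mkdir(parents=True)
    (root / "a" / "f.txt").write_text("x")

    def summarize(triples):
        return [(p.relative_to(root).as_posix(), sorted(d), sorted(f))
                for p, d, f in triples]

    td = summarize(walk_iterative(root))
    bu = summarize(walk_iterative(root, top_down=False))

print(td)  # [('.', ['a'], []), ('a', ['b'], ['f.txt']), ('a/b', [], [])]
print(bu)  # [('a/b', [], []), ('a', ['b'], ['f.txt']), ('.', ['a'], [])]
```

Pushing a directory's triple onto the stack before its children is one way to get bottom-up (post-order) output without recursion; whether that extra bookkeeping is worth it over the recursive version is a judgment call.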