-
-
Notifications
You must be signed in to change notification settings - Fork 32k
bpo-29659: Expose copyfileobj() length
arg for public use
#328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Unfortunately our records indicate you have not signed the CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow these steps to rectify the issue:
Thanks again to your contribution and we look forward to looking at it! |
length
arg for public uselength
arg for public use
Hello,
I have completed the above steps :)
Thanks!
Tyler Goodlet
…On Sun, Feb 26, 2017 at 8:19 PM, the-knights-who-say-ni < ***@***.***> wrote:
Hello, and thanks for your contribution!
I'm a bot set up to make sure that the project can legally accept your
contribution by verifying you have signed the PSF contributor agreement
<https://www.python.org/psf/contrib/contrib-form/> (CLA).
Unfortunately our records indicate you have not signed the CLA. For legal
reasons we need you to sign this before we can look at your contribution.
Please follow these steps to rectify the issue:
1. Sign the PSF contributor agreement
<https://www.python.org/psf/contrib/contrib-form/>. The "
bugs.python.org username" requested by the form is the "Login name"
field in "Your Details" at b.p.o
<https://cloud.githubusercontent.com/assets/2680980/23276970/d14a380c-f9d1-11e6-883d-e13b6b211239.png>
2. *Wait at least one US business day* and then check the "Contributor
form received entry under "Your Details" on bugs.python.org to see if
your account has been marked as having signed the CLA (the delay is due to
a person having to manually check your signed CLA)
3. Reply here saying you have completed the above steps
Thanks again to your contribution and we look forward to looking at it!
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#328 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AARzZYMi41HaBN6XwUx3iPTjPSu_myyrks5rgiSogaJpZM4MMmcV>
.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You must update Doc/library/shutil.rst: document the new parameter, but also add a ".. versionchanged:: 3.7" section to document the change.
Lib/shutil.py
Outdated
|
||
An in-memory buffer size can be set with `length`. | ||
""" | ||
length = length or 16*1024 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest:
if not length:
length = 16 * 1024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good!
Lib/shutil.py
Outdated
def copyfileobj(fsrc, fdst, length=None): | ||
"""Copy data from file-like object `fsrc` to file-like object `fdst`. | ||
|
||
An in-memory buffer size can be set with `length`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should document the default size: 16 kB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do 👍
Lib/shutil.py
Outdated
"""Copy data from src to dst. | ||
|
||
If follow_symlinks is not set and src is a symbolic link, a new | ||
symlink will be created instead of copying the file it points to. | ||
|
||
An in memory buffer size can be set with `length`. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same: document default size.
4feede0
to
39a9b83
Compare
@Haypo I believe I've addressed everything! |
Doc/library/shutil.rst
Outdated
@@ -74,6 +74,9 @@ Directory and files operations | |||
Raise :exc:`SameFileError` instead of :exc:`Error`. Since the former is | |||
a subclass of the latter, this change is backward compatible. | |||
|
|||
.. versionchanged:: 3.7 | |||
Expose the `length` arg from `copyfileobj`; default is 16 KB. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn'd say "expose ..." but just "Added length parameter", and versionadded is not the right place to document "default is ...", put it in the documentation body.
You didn't update copyfile() signature. Same remark for the 2 other functions.
Moreover, you should also document the length in the function document, not only in versionadded. The "documentation" can just ask to refer to copyfileobj().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shoot yeah not sure how I missed the signatures...
Lib/shutil.py
Outdated
def copyfileobj(fsrc, fdst, length=None): | ||
"""Copy data from file-like object `fsrc` to file-like object `fdst`. | ||
|
||
An in-memory buffer size can be set with `length`; the default is 16 KB. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please mention the unit, "size in bytes"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Haypo to clarify, do you prefer:
'An in-memory buffer size can be set with
length; the default size in bytes is 16*1024'
,
or
"An in-memory buffer size in bytes can be set with
length; the default is 16 KB.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"An in-memory buffer size in bytes can be set withlength; the default is 16 KB."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I encourage you to use KiB, which other parts of the documentation already use. If you write 16 KB it could easily be interpreted as 16,000 B (lower-case k means 1000, as in km for kilometre).
39a9b83
to
158a71b
Compare
@Haypo hopefully 3rd times the charm ;) If you'd like me to squash the history to one commit let me know! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: leave API of copyfileobj should alone as 'length=16*1024' to document default. For other functions, say something like 'If length not passed, copyfileobj uses its default.' We might change it someday.
Required: new or revised tests that use the expanded api (and fail without patch).
@terryjreedy the only problem with that is that we'll need to have the Yes I totally agree about the test. I guess it goes here? |
Yes I totally agree about the test. I guess it goes here?
|
Doc/library/shutil.rst
Outdated
@@ -74,6 +80,9 @@ Directory and files operations | |||
Raise :exc:`SameFileError` instead of :exc:`Error`. Since the former is | |||
a subclass of the latter, this change is backward compatible. | |||
|
|||
.. versionchanged:: 3.7 | |||
Added `length` parameter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation should use three spaces here:
.. versionchanged:: 3.7
Added the *length* parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`length` -> *length*
(and please finish the sentence with a full stop.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
When copying large files the copy time can be sufficiently shortened by increasing the memory buffer used in the `copyfileobj()` routine. Expose the `length` argument from `copyfileobj()` upwards for use throughout the module.
158a71b
to
bfeaee8
Compare
Doc/library/shutil.rst
Outdated
|
||
Note that if the current file position of the *fsrc* object is not 0, | ||
only the contents from the current file position to the end of the file | ||
will be copied. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO “Note that” is redundant. In fact the whole first line could be dropped, and “Only the contents from the current file position . . .” could be put back in the first paragraph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but please address pending comments before I can approve the change ;-)
Lib/shutil.py
Outdated
"""Copy data from file-like object `fsrc` to file-like object `fdst`. | ||
|
||
An in-memory buffer size in bytes can be set with `length`; the default is | ||
16 KB. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto: KiB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Lib/shutil.py
Outdated
16 KB. | ||
""" | ||
if not length: | ||
length = 16*1024 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP 8: add spaces around operators, "16 * 1024"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Lib/shutil.py
Outdated
@@ -251,10 +262,13 @@ def copy2(src, dst, *, follow_symlinks=True): | |||
If follow_symlinks is false, symlinks won't be followed. This | |||
resembles GNU's "cp -P src dst". | |||
|
|||
An in-memory buffer size in bytes can be set with `length`; the default is | |||
16 KB. | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this empty line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Note: I created PR #4293 to replace KB with KiB in the Python code base ;-) |
bfeaee8
to
0043566
Compare
I think everything has been addressed except for @terryjreedy's:
I don't know how to keep the default value in Whatever you all think is best I'll do 👍 |
Lib/test/test_shutil.py
Outdated
src = os.path.join(src_dir, 'foo') | ||
write_file(src, 'foo' * 10 ** 6) | ||
# buffer with 20 Kib | ||
fn(src, dst_dir, length=20 * 1024) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to use a smaller file: write('x' * 100) and then length=20.
By the way, I suggest to use tempfile.NamedTemporaryFile rather than temporary files:
with tempfile.NamedTemporaryFile() as src:
with tempfile.NamedTemporaryFile() as dst:
write_file(src.name, b'x' * 100, binary=True)
fn(src.name, dst.name, length=20)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hehe thanks @vstinner! You did the work for me 👍
0043566
to
8daf7a8
Compare
@@ -73,8 +73,14 @@ class RegistryError(Exception): | |||
and unpacking registries fails""" | |||
|
|||
|
|||
def copyfileobj(fsrc, fdst, length=16*1024): | |||
"""copy data from file-like object fsrc to file-like object fdst""" | |||
def copyfileobj(fsrc, fdst, length=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def copyfileobj(fsrc, fdst, length=16*1024):
I personally prefer this way in my code to concentrate 2 lines worth of code into one.
if not length:
length = 16 * 1024
It looks a little nicer and not to mention saves lines of code that basically do the same thing.
And yes I am also guilty of doing it this way and then people started hating me for it and calling my a bad programmer for it.
Just to clear this up since it seems my original reasoning isn't showing well and the drive-by comments are piling up. Afaict there are 3 ways to deal with the default value of
If one of these solutions is preferred over the others after acknowledging the trade-offs I'm fine to do it, but just simply saying |
@vstinner looks like the appveyor build failed due to the |
Sorry, I don't know why the test failed. |
Looks to me like an normal permission error as python seems to not have permissions to write to the temp folder? Maybe somehow have python write to somewhere that does not require administrator permissions? It would be nice if python had an nice function that you provide the path and it checks if you have permissions to write to there to avoid having to try / except permission errors. 🤔 That function would be the perfect thing to fix this test on Windows. |
Ok so besides the outstanding:
Is there anything else in the patch that needs to be addressed? |
I dont think so. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dislike length=-1 "feature".
16 KiB. | ||
""" | ||
if not length: | ||
length = 16 * 1024 | ||
while 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that it's ok to use a loop if the length is negative. I suggest to have a special case for negative value calling read() (no parameter) only once.
The integer *length*, if given, is the buffer size; the default value | ||
in bytes is 16 KiB. A negative *length* value means to copy the data without | ||
looping over the source data in chunks; by default the data is read in | ||
chunks to avoid uncontrolled memory consumption. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"A negative length value means to copy the data without looping over the source data in chunks"
I dislike this definition. In practice, negative means "unlimited" buffer size: the whole input file is loaded into memory.
I'm not sure that it's a good practice to try to load files of unknown size into memory.
I suggest to remove this feature which seems more like a side effect than a carefully designed API.
If you want to get fast copy, pass a very large length like 1 GB. But if Python starts to load 1 TB into memory, it's likely to crash the system... At least, to slow down the system, a lot.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Fixed by https://bugs.python.org/issue33671 |
Bumps [gidgethub](https://github.com/brettcannon/gidgethub) from 4.0.0 to 4.1.0. - [Release notes](https://github.com/brettcannon/gidgethub/releases) - [Changelog](https://github.com/brettcannon/gidgethub/blob/master/docs/changelog.rst) - [Commits](https://github.com/brettcannon/gidgethub/commits/v4.1.0) Signed-off-by: dependabot-preview[bot] <[email protected]> Co-authored-by: dependabot-preview[bot] <27856297+dependabot-preview[bot]@users.noreply.github.com> Co-authored-by: Mariatta <[email protected]>
Corresponds to issue bpo-29659
When copying large files the copy time can be sufficiently
shortened by increasing the memory buffer used in the
copyfileobj()
routine. Expose the
length
argument fromcopyfileobj()
upwardsfor use throughout the module.
https://bugs.python.org/issue29659