remove() before os.rename() breaks thread-safety #263
Sorry for the slow response, many thanks for raising this.
It sounds like the least bad option might be to use os.replace() under PY3
and fall back to something else under PY2. I'll give it some thought.
…On Tuesday, 15 May 2018, Remo Goetschi ***@***.***> wrote:
This section in storage.py
<meteotest@3ccdf7d>
breaks thread/process-safety:
# move temporary file into place
if os.path.exists(file_path):
    os.remove(file_path)
os.rename(temp_path, file_path)
Problem is, a "reader" process may fail because the file file_path
sometimes does not exist.
The os.path.exists() check was introduced in order to fix tests on
Windows:
https://github.com/zarr-developers/zarr/blame/fe1c120/zarr/storage.py#L209
Possible solutions:
- Execute os.remove(file_path) on Windows only.
- Use os.replace()
<https://docs.python.org/3.5/library/os.html#os.replace> instead of
os.rename. This breaks Python 2.7 compatibility, though.
Both solutions probably do not guarantee atomic renaming on Windows.
--
If I do not respond to an email within a few days, please feel free to
resend your email and/or contact me by other means.
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health
Big Data Institute
Li Ka Shing Centre for Health Information and Discovery
Old Road Campus
Headington
Oxford
OX3 7LF
United Kingdom
Phone: +44 (0)1865 743596 or +44 (0)7866 541624
Email: [email protected]
Web: http://a <http://purl.org/net/aliman>limanfoo.github.io/
Twitter: @alimanfoo <https://twitter.com/alimanfoo>
|
The problem with
if sys.version_info[0] >= 3:  # Py3
    os.replace(temp_path, file_path)
else:  # Py2
    try:
        os.rename(temp_path, file_path)
    except OSError:
        # on Windows, OSError is raised if destination already exists
        with some_process_synchronizer[file_path + ".rename.lock"]:
            if os.path.exists(file_path):
                os.remove(file_path)
            os.rename(temp_path, file_path)
Like this, renaming on Posix is atomic. On Py2+Windows, checking + renaming must be wrapped in a critical section to avoid race conditions (it's probably still unsafe in a distributed/cloud environment?). BTW, is os.replace/rename safe for zarr cloud stores? |
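The critical-section fallback above could be sketched roughly as follows. This is only an illustration: `_lock_for` and `rename_with_fallback` are hypothetical names, and the per-key `threading.Lock` is a stand-in for `some_process_synchronizer` that only serialises threads within one process, not separate processes or machines. It also only serialises writers; a reader can still find the file missing between `remove()` and `rename()`.

```python
import os
import threading
from collections import defaultdict

# Hypothetical stand-in for some_process_synchronizer: one lock per key.
# threading.Lock protects threads in a single process only.
_locks = defaultdict(threading.Lock)
_locks_guard = threading.Lock()


def _lock_for(key):
    """Return the lock for a key, creating it on first use."""
    with _locks_guard:
        return _locks[key]


def rename_with_fallback(temp_path, file_path):
    """Try a plain rename; on failure, remove and rename under a lock."""
    try:
        # On POSIX this atomically replaces an existing destination.
        os.rename(temp_path, file_path)
    except OSError:
        # On Windows, rename raises if the destination already exists.
        with _lock_for(file_path + ".rename.lock"):
            if os.path.exists(file_path):
                os.remove(file_path)
            os.rename(temp_path, file_path)
```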
That looks sensible in principle, although we might get into difficulty because I believe file locking is not supported on some file systems (e.g., NFSv3).

Making the change to use os.replace on PY3 seems a no-brainer, we should do that. Maybe we can just live with the fact that, on PY2 and Windows, this operation may not be atomic.

I'm not sure it ever would be a problem anyway. Zarr was not intended to support situations where there are both readers and writers working concurrently on the same array. It is intended to support multiple concurrent readers, or multiple concurrent writers. Could this race condition ever occur with concurrent writers? Only if two writers were attempting to update the same chunk at the same time. But if that could happen, then the user should be using a synchronizer anyway, which would also protect against this race condition. So I'm pretty comfortable living with being not fully atomic on PY2+Windows. I may not have thought out all the possibilities here though, so please feel free to discuss.

On cloud stores the operations are not atomic AFAIK, data is updated in-place, although this will depend entirely on how the storage layer is implemented. E.g., GCSFS just opens and writes, although I imagine it would be possible to implement in a different way to be atomic.

Btw the main driver behind the design to write to a temp file then move into place was to avoid file corruptions due to some write failing half way through. The intention was that concurrency issues would be handled in a different layer, via the synchronization features. |
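For illustration, the write-to-temp-then-move design described above might look roughly like this on Python 3.3+. This is a minimal sketch, not zarr's actual storage code: `atomic_write` and the `.partial` suffix are hypothetical names chosen for the example.

```python
import os
import tempfile


def atomic_write(file_path, data):
    """Write bytes to file_path via a temporary file in the same directory."""
    dir_path = os.path.dirname(os.path.abspath(file_path))
    # Create the temp file next to the target so the final move stays on
    # one file system (a move across file systems would not be atomic).
    fd, temp_path = tempfile.mkstemp(suffix=".partial", dir=dir_path)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # Move into place, replacing any existing file (Python >= 3.3).
        os.replace(temp_path, file_path)
    except BaseException:
        # Clean up the temp file if the write or the move failed.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

If the write fails half way through, only the temporary file is affected and the previous version of the target stays in place.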
Thanks @alimanfoo for looking at this. There are two ways the
For our use-case we really need atomic rename. Otherwise we would require read-locks and those are a pain.
That would be ok with us. |
Thanks @sbalmer. PR welcome if you have the time, otherwise happy to schedule this in for the next round of work. At some point I'd be interested to know more about how you're using zarr. The usage pattern of having concurrent readers and writers is not something I've designed for up to now, but if zarr can be made to work then it would be good to know more so I can keep it in mind for future. |
I should get around to do a PR next week. I've discussed with @weatherfrog how we can cover the cases: If Python >= 3.3: Great, use This way we get read-consistency during writes for all situations except Windows pre Python 3.3. Regarding overall read consistency There are also limitations around deletion. Race-conditions for parallel reads during deletes are introduced by |
So there is a backport of os.replace, which we could use. ref: https://pypi.org/project/pyosreplace/ |
Using the backport sounds good to me.
…On 30 May 2018 at 07:08, jakirkham ***@***.***> wrote:
So there is a backport of os.replace, which we could use.
ref: https://pypi.org/project/pyosreplace/
|
Any interest in creating a PR, @weatherfrog @sbalmer? |
Backport looks OK. I'm out of office this week, but I can create a PR next week (together with @sbalmer). |
In case it helps, we added the backport to conda-forge. ref: https://github.com/conda-forge/pyosreplace-feedstock |
When the chunk file is first removed before the new version is moved into place, racing reads may encounter a missing chunk. Using rename() or replace() without remove() avoids the issue on Posix-Systems as the methods are atomic. The fallback of remove() -> rename() is included for Windows pre Python 3.3. Fixes zarr-developers#263
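The race described in this commit message can be observed with a small experiment. The sketch below is an assumption-laden illustration, not part of the fix: `count_missing_reads` and `remove_then_rename` are hypothetical helpers, and the result of the remove-then-rename variant depends on timing. On POSIX, passing `os.replace` should never produce a missing read.

```python
import os
import tempfile
import threading


def count_missing_reads(publish, n_writes=200):
    """Replace a file n_writes times while a reader thread polls it.

    Returns how many reads found the file missing.
    """
    dir_path = tempfile.mkdtemp()
    file_path = os.path.join(dir_path, "chunk")
    temp_path = file_path + ".partial"
    with open(file_path, "wb") as f:
        f.write(b"0")

    done = threading.Event()
    missing = [0]

    def reader():
        while not done.is_set():
            try:
                with open(file_path, "rb") as f:
                    f.read()
            except FileNotFoundError:
                missing[0] += 1

    t = threading.Thread(target=reader)
    t.start()
    try:
        for i in range(n_writes):
            with open(temp_path, "wb") as f:
                f.write(str(i).encode())
            publish(temp_path, file_path)
    finally:
        done.set()
        t.join()
    return missing[0]


def remove_then_rename(temp_path, file_path):
    # The pattern from the issue: the file is briefly absent.
    if os.path.exists(file_path):
        os.remove(file_path)
    os.rename(temp_path, file_path)
```

Calling `count_missing_reads(remove_then_rename)` may return a positive number on some runs, while `count_missing_reads(os.replace)` is expected to return 0 on POSIX.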
The
So I'm not sure it would be useful here. I've made the merge request without that package. |
Says Python 2.7 is supported. We actually build a conda-forge package for it and run the tests. Not sure why the build fails for you, but would try the conda-forge package if you need a working binary. |
Ah, missed this part.
import sys
if sys.version_info >= (3, 3):
    from os import replace
elif sys.platform == "win32":
    from osreplace import replace
else:
    # POSIX rename() is always atomic
    from os import rename as replace |
Thanks @jakirkham for the clarifications. Still I have some reservations about adding a package dependency over this issue:
Since the only limitation is lacking concurrent read/write consistency for some rare combinations of platforms I'd rather keep it simple. After all at the moment nobody using the latest releases gets consistency during concurrent writes and there seems to be little pressure over it. On the other hand if the package is only pulled-in on Windows it doesn't affect us much :-) |
They are not equivalent actually. The reason On Python 2, we don't have this function so we have to do something else. Fortunately The Python lines suggested are not atomic. In fact, Victor Stinner suggested the same lines in Python issue 8828 and was shot down by other core devs because those lines are not atomic. Hence why they wrote the Yes, it is an extra dependency, but it only affects Windows users on Python 2 to ensure correct and consistent behavior. So it doesn't really affect anyone else. Plus we've already done the work for you testing this (have verified previously it pip installs without issues on Windows) and getting the binary packaged. Am not sure I understand the reluctance to add it. |
FWIW I'm happy with adding a dependency on pyosreplace, I like the idea that we could have a cross-platform solution that works on pythons 2 and 3. I do think we want to drop python 2 support at some time in the not too distant future, but we will keep supporting python 2 for the next zarr release at least. @jakirkham how would this dependency work in setup.py? Is there a way to add platform and python version specific entries into |
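Sure. There are a couple ways we can do this.
1. Conditionally add the requirement to install_requires.
2. Using environmental markers <https://www.python.org/dev/peps/pep-0496/#examples> joined with and.
|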
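As a rough sketch of the two options, assuming they would live in a setup.py: the `example-dependency` entry is a placeholder rather than zarr's real requirements list, and the marker string follows the PEP 508 syntax that current installers evaluate.

```python
import sys

# Option 1: decide at build time, inside setup.py.
# "example-dependency" is a placeholder, not a real requirement.
install_requires = ["example-dependency"]
if sys.version_info < (3, 3) and sys.platform == "win32":
    install_requires.append("pyosreplace")

# Option 2: an environment marker, evaluated by the installer on the
# target machine. The two conditions are joined with "and".
marker_requirement = (
    'pyosreplace; python_version < "3.3" and sys_platform == "win32"'
)
```

Option 1 is evaluated on the machine that runs setup.py, so it can give the wrong answer for a pre-built distribution; option 2 defers the decision to install time but needs tooling that understands markers.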
Environmental markers sound good to me, could go in both setup.py and
requirements files.
…On Fri, 9 Nov 2018 at 01:16, jakirkham ***@***.***> wrote:
Sure. There are a couple ways we can do this.
1. Conditionally add the requirement to install_requires.
2. Using environmental markers
<https://www.python.org/dev/peps/pep-0496/#examples> joined with and.
|
@jakirkham sorry I wasn't aware you actually had the need to get this fixed for Python 2. My reluctance to add the package is mainly about the extra complexity introduced during the build. I have no easy means of testing that part. |
No worries. Well as far as testing, we do have CI setup. Testing there should be sufficient. |
* avoid race condition during chunk write

  When the chunk file is first removed before the new version is moved into place, racing reads may encounter a missing chunk. Using rename() or replace() without remove() avoids the issue on Posix-Systems as the methods are atomic. The fallback of remove() -> rename() is included for Windows pre Python 3.3. Fixes #263
* move feature-detection to init-time so it's not repeated on every write
* use pyosreplace to get atomic replace() on Windows
* disable coverage count for legacy branches
* Use conditional instead of env marker

  Because the env markers didn't work. Just guessing at this point.
* add pyosreplace package to requirements
* release notes [ci skip]