Concurrent writes to the same chunk using DirectoryStore #328
Comments
> Thanks for writing this up, @alimanfoo. Agree with everything you have laid out. It's worth pointing out that some filesystems may be slow to update changes. So there could be deadlocks or worse collisions.
Yeah, I'm guessing that relying on opening files in exclusive mode would throw up a bunch of other issues, although I have limited experience at this level. E.g., one question I had is whether NFS or other distributed file systems support it or not. We might want to offer it as an option at least initially to gain some experience.
> Should we start thinking about this in the case of other stores?
Possibly, not sure what can be done though.

In some cases nothing I'm sure, e.g., cloud object stores. People using those stores will just need to understand they either align their writes or use some form of locking outside the store.

In other cases I think stores may already implement their own internal locking, e.g., bdb.
> AFAIK Python 2 doesn't have `mode="x"`. So we may need to use something else for compatibility (though there may be other motivating factors). One option if we want to lock the actual file would be [portalocker](https://github.com/WoLpH/portalocker). Alternatively we could use `fasteners` per usual to solve this with a different file. There are a bunch of other similar packages that could be worth looking at as well.
FWIW I think it would be good to keep the separation between synchronization (locking) and storage. I.e., we don't try to implement any locking within file-system based stores. Instead we continue to provide a mechanism by which users can add a locking layer if they need to, which is decoupled from what type of storage they are using. That way they are free to use different types of locking with different types of stores. Main problem I believe for file systems is that a bunch of them don't support locking, so you couldn't rely on it. And anyway, if you know you are working with a multithreaded app, you might prefer to use thread locks rather than file locks for performance, for example.

So I think the question is not how to implement synchronization within a store, but rather whether there are any opportunities within a store to be defensive and raise errors if writes do collide.
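To make the file-based locking idea concrete, here is a stdlib-only sketch along the lines of what packages like `fasteners` or `portalocker` provide, using exclusive creation of a separate lock file. The class name and polling behaviour are invented for illustration, and it is not robust against stale locks left by crashed processes:

```python
import errno
import os
import time


class FileLock:
    """Minimal advisory lock based on exclusive creation of a lock file.

    A crude stand-in for what fasteners/portalocker offer; illustrative
    only. Does not handle stale lock files from crashed processes.
    """

    def __init__(self, path, poll_interval=0.05):
        self.path = path
        self.poll_interval = poll_interval

    def __enter__(self):
        while True:
            try:
                # O_CREAT | O_EXCL fails with EEXIST if the lock file
                # already exists, i.e. another writer holds the lock
                fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
                os.close(fd)
                return self
            except OSError as e:
                if e.errno != errno.EEXIST:
                    raise
                time.sleep(self.poll_interval)

    def __exit__(self, *exc):
        # releasing the lock = removing the lock file
        os.remove(self.path)
```

Note this locks via a *different* file than the one being written, so it works even on filesystems without native lock support, at the cost of being purely advisory.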
On a bit of a tangent, the filesystem-spec project has the concept of transactions, which use a move in the case of LocalFileSystem once the transaction is completed; this is semi-atomic and single-process, so not really along the lines of what's being talked about here. People may find it interesting to investigate the code I have there. But that mechanism, or something like it, could be an inherent piece of functionality of the storage layer, not of the zarr-specific key-value mapping class. My point is, it's worth considering how this discussion could be extended to arbitrary storage backends, not just local files. Does it make sense to talk about sync mechanisms and races with higher-latency systems such as S3?
I like the idea of providing a detection helper as long as it's reasonably cheap. In my view it needn't work reliably on all platforms. Just having it would be a plus. As for the implementation: wouldn't locking the entire store (for example using a lock on …
Currently, DirectoryStore (and sub-classes) implement a form of atomic write, which means that when a key/value is being set (e.g., data for a chunk are being written) the data will first be written to a temporary file, and then if the write is successful, the temporary file will be moved into place. The rationale for this design is that data should never be left in a half-written state, i.e., a write either succeeds completely, or fails completely.
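As a rough illustration of this write-to-temp-then-move pattern (not the actual DirectoryStore code; the function name and temp-file handling here are made up):

```python
import os
import tempfile


def atomic_write(path, data):
    """Write ``data`` (bytes) to ``path`` atomically: write to a
    temporary file in the same directory, then rename it into place.
    Readers never observe a half-written file."""
    dirname = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the final rename
    # stays within one filesystem (cross-filesystem moves are not atomic).
    fd, tmp_path = tempfile.mkstemp(dir=dirname, suffix='.partial')
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(data)
        # os.replace is an atomic rename on POSIX and overwrites any
        # existing file at ``path``
        os.replace(tmp_path, path)
    except BaseException:
        # on failure, clean up the partial file so nothing half-written
        # is left behind
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Because `tempfile.mkstemp` generates a fresh unique name for every call, two concurrent writers each get their own temporary file and the last rename silently wins, which is exactly the lost-update scenario discussed below.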
Also, DirectoryStore currently aims to be safe to use with multiple threads and/or processes. A question arises, what should happen if two threads or two processes attempt to write to the same key at the same time?
Before discussing that question, it is worth saying that in general, if there is a chance that two threads or processes might write to the same key at the same time, then that is something the user should be trying to avoid. Even if DirectoryStore uses atomic writes, if two write operations occur concurrently, then some data will get lost: one of the concurrent writes will be overwritten by the other. The user can do two things to avoid this. Either (1) they craft their own code carefully to ensure write operations are fully aligned with chunk boundaries, so two threads or processes are never writing to the same chunk, or (2) they use the support provided in Zarr for synchronisation (locking). Zarr decouples storage from synchronisation, which allows different methods of synchronisation (locking) to be used with different types of storage. In other words, synchronisation is not the responsibility of the storage layer.

That said, it can (and does) happen that a user who is relatively new to Zarr might not yet be aware of these issues, and so might try to run a program that does not align writes with chunk boundaries and does not implement any synchronisation. A specific example of that is #277. Ironically, there is currently a bug (#263) in the atomic writing implementation, which means that a race condition can occur during concurrent writes to the same chunks, which generates an error. In the case of #277, that bug was in fact a boon, because it caused an error that ultimately led to the user realising that synchronisation issues were occurring, which then led them to rework their own code.
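The decoupling of storage from synchronisation can be pictured with a small hypothetical wrapper that adds per-key locking around any dict-like store. This is illustrative only; Zarr's actual synchronizer API (e.g., `ThreadSynchronizer`, `ProcessSynchronizer`) differs in detail:

```python
import threading
from collections import defaultdict


class SynchronizedStore:
    """Hypothetical wrapper adding per-key thread locking to any
    dict-like store. Illustrates keeping synchronisation separate from
    storage: the same wrapper works over any store, and other lock
    types (e.g. file locks) could be swapped in for multi-process use.

    Note: this serialises individual get/set operations per key; a full
    read-modify-write cycle would still need the lock held across both.
    """

    def __init__(self, store):
        self.store = store
        self._locks = defaultdict(threading.Lock)
        self._meta_lock = threading.Lock()  # guards the lock registry

    def _lock_for(self, key):
        with self._meta_lock:
            return self._locks[key]

    def __setitem__(self, key, value):
        with self._lock_for(key):
            self.store[key] = value

    def __getitem__(self, key):
        with self._lock_for(key):
            return self.store[key]
```

Because the locking lives in a wrapper rather than in the store itself, a user working purely single-threaded pays no locking cost at all, which is the design rationale discussed in the comments above.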
However, the atomic write bug (#263) is in the process of being fixed (#327). When it is fixed, that will mean that in a situation like #277, the user would not get any error messages, and write operations would silently get lost. Presumably the problem would take much longer to surface, because the user would need to inspect the data to realise something has gone wrong. It might even be so subtle as to go unnoticed, which could be very bad for obvious reasons.
This is causing me some concern, and making me think that DirectoryStore should at least fail (i.e., generate an exception) if an attempt is made by two processes to concurrently write to the same key. That way, a user would realise as in #277 that something was wrong, and they need to implement a solution.
I also wonder whether a possible solution might be that, instead of each atomic write opening a completely new temporary file, there should be just one temporary file for each key to which data are initially written, with this file opened in exclusive mode (i.e., mode='x') so that if two concurrent threads or processes attempt to write, an error is generated.
I.e., the code currently from this line would become something like:
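A rough sketch of the exclusive-mode idea (illustrative only, not the actual DirectoryStore code; the function name and the `.partial` suffix are hypothetical):

```python
import os


def write_chunk_exclusive(path, data):
    """Sketch of the proposed scheme: one well-known temporary file per
    key, opened in exclusive mode so a concurrent writer fails loudly
    instead of silently losing data."""
    tmp_path = path + '.partial'  # hypothetical fixed temp name per key
    try:
        # mode 'xb' raises FileExistsError if tmp_path already exists,
        # i.e. if another writer is mid-write on the same key
        with open(tmp_path, 'xb') as f:
            f.write(data)
    except FileExistsError:
        raise RuntimeError('concurrent write detected for %s' % path)
    # atomic rename of the completed temp file into place
    os.replace(tmp_path, path)
```

One caveat, as noted in the comments above, is that the semantics of exclusive-mode opens on NFS and other distributed file systems would need checking before relying on this.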
This also relates to #325 as there is discussion there about how to open a file for writing to.