Skip to content

std: Return Result from RWLock/Mutex methods #19661

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 30, 2014

Conversation

alexcrichton
Copy link
Member

All of the current std::sync primitives have poisoning enable which means that
when a task fails inside of a write-access lock then all future attempts to
acquire the lock will fail. This strategy ensures that stale data whose
invariants are possibly not upheld are never viewed by other tasks to help
propagate unexpected panics (bugs in a program) among tasks.

Currently there is no way to test whether a mutex or rwlock is poisoned. One
method would be to duplicate all the methods with a sister foo_catch function,
for example. This pattern is, however, against our error guidelines.
As a result, this commit exposes the fact that a task has failed internally
through the return value of a Result.

All methods now return a LockResult<T> or a TryLockResult<T> which
communicates whether the lock was poisoned or not. In a LockResult, both the
Ok and Err variants contains the MutexGuard<T> that is being returned in
order to allow access to the data if poisoning is not desired. This also means
that the lock is always held upon returning from .lock().

A new type, PoisonError, was added with one method into_guard which can
consume the assertion that a lock is poisoned to gain access to the underlying
data.

This is a breaking change because the signatures of these methods have changed,
often incompatible ways. One major difference is that the wait methods on a
condition variable now consume the guard and return it in as a LockResult to
indicate whether the lock was poisoned while waiting. Most code can be updated
by calling .unwrap() on the return value of .lock().

[breaking-change]

@aturon
Copy link
Member

aturon commented Dec 14, 2014

cc @wycats @nikomatsakis

We don't necessarily have to add this now, but it seems like you might want some additional flexibility:

  • The ability to recover use of a lock after it has been poisoned. For example, if you overwrite the data protected by the lock with a new value.
  • Possibly, the ability to read from a poisoned lock. This could be done by returning the previously-held data as part of the Err component -- basically alerting you to the fact that the data is suspect due to another thread's panic, but not preventing you from accessing it outright. Of course, you can work around this by protecting something like () and having the real data stored in an UnsafeCell, but that's not very ergonomic.

What do you think about these cases?

As an aside, I still feel slightly uneasy about the division of places where we try to help with exception safety and places where we don't. For example, a function down the stack could panic while using an &mut reference to data that is owned higher up the stack, and the latter's destructor will see it in whatever state it was left in. Granted, in that case you're talking purely about destructors during unwinding, whereas with mutexes the data may be seen at an arbitrary point in time. But it would be good to try to lay out the distinction very clearly, in docs or a blog post or something.

@aturon
Copy link
Member

aturon commented Dec 14, 2014

(Otherwise, the PR looks like a fine implementation of what was discussed at the workweek.)

@alexcrichton
Copy link
Member Author

Thinking more on this, I may be personally leaning more towards removing poisoning altogether or moving it to an opt-in primitive. For example, I can think of a few drawbacks:

  • In returning Result, I forgot that you can ignore the result of Condvar::wait which means you can read stale data even though it's been panicked over. This seems like a blocker for this PR.
  • We could plausibly add PoisonError::replace, but this causes a bit of a schism in the RWLock api. For example the error type on read() would have to be different than write() (you can only replace on the write lock). It also means that the PoisonError itself would have to carry the payload of the lock guard, which may mean that it's difficult to use with try! as it means that the type can't implement Error and you'd have to get rid of the borrowed reference quickly to propagate upwards.
  • I, too, am a bit worried that we have poisoning on a Mutex but not a RefCell, for example. We've said in the past that intra-task inspection of stale state is semi-ok (why RefCell doesn't poison) while inter-task inspection is bad (which is why Mutex panics). All these issues about panicking and recovering from this with a Mutex and RWLock seem too tricky to stabilize on, however.

@sfackler
Copy link
Member

The more I think about it, the more removing poisoning makes sense to me.

@thestinger
Copy link
Contributor

This is making everyone pay for the fact that some people want to use exception handling. Harming the API for everyone in order to support this niche feature doesn't make sense.

@thestinger
Copy link
Contributor

@sfackler: Poisoning could be removed, but that's the end of restricted exception handling. It would mean catching an exception in the same task would have to be safe too. I think that's saner than pretending Rust doesn't have exception handling.

@alexcrichton alexcrichton force-pushed the mutex-result branch 3 times, most recently from 427c2bc to c16ef0b Compare December 22, 2014 20:44
@alexcrichton
Copy link
Member Author

re-r? @aturon

I've updated the PR description/commit message to reflect the tweaked state of affairs from our discussion.

@alexcrichton alexcrichton force-pushed the mutex-result branch 2 times, most recently from 0070659 to dcc862c Compare December 29, 2014 17:05
panicking: bool,
#[allow(missing_copy_implementations)]
pub struct Guard {
panicking: uint,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why uint?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I think I forgot to remove this after playing around with it. Whatever reason I had to put this here originally is no longer valid!

All of the current std::sync primitives have poisoning enable which means that
when a task fails inside of a write-access lock then all future attempts to
acquire the lock will fail. This strategy ensures that stale data whose
invariants are possibly not upheld are never viewed by other tasks to help
propagate unexpected panics (bugs in a program) among tasks.

Currently there is no way to test whether a mutex or rwlock is poisoned. One
method would be to duplicate all the methods with a sister foo_catch function,
for example. This pattern is, however, against our [error guidelines][errors].
As a result, this commit exposes the fact that a task has failed internally
through the return value of a `Result`.

[errors]: https://github.com/rust-lang/rfcs/blob/master/text/0236-error-conventions.md#do-not-provide-both-result-and-fail-variants

All methods now return a `LockResult<T>` or a `TryLockResult<T>` which
communicates whether the lock was poisoned or not. In a `LockResult`, both the
`Ok` and `Err` variants contains the `MutexGuard<T>` that is being returned in
order to allow access to the data if poisoning is not desired. This also means
that the lock is *always* held upon returning from `.lock()`.

A new type, `PoisonError`, was added with one method `into_guard` which can
consume the assertion that a lock is poisoned to gain access to the underlying
data.

This is a breaking change because the signatures of these methods have changed,
often incompatible ways. One major difference is that the `wait` methods on a
condition variable now consume the guard and return it in as a `LockResult` to
indicate whether the lock was poisoned while waiting. Most code can be updated
by calling `.unwrap()` on the return value of `.lock()`.

[breaking-change]
@aturon
Copy link
Member

aturon commented Dec 29, 2014

r=me after the nit. I think this actually came out really nicely!

@reem
Copy link
Contributor

reem commented Dec 29, 2014

I'm very against this change - in <99% of cases this just replaces .lock() with .lock().unwrap() which I think is a needless ergonomic regression. I think we would be much better served by a family of try_ methods like in RefCell.

@alexcrichton
Copy link
Member Author

Currently we're not planning on stabilizing the try_* variants of methods on RefCell due to them being against our current conventions of not providing both variants. For RefCell I believe we plan on adding some is_borrowed or related methods to query the state, and then an action can be taken.

Unfortunately the same logic cannot necessarily be applied to synchronization primitives because you're unable to perform an atomic "is this ready to go, ok let's go" operation. As a result, we must provide functionality as one method, and our conventions dictate that the method must return a Result (the more general of the two).

The Mutex and RefCell types are pretty different in their use cases despite their similarities. For example a RefCell was chosen to panic by default because it's believed that its usage is quite common, and .borrow().unwrap() would simply be untenable. When contrasted with Mutex, it's actually fairly rare to use a Mutex. Additionally, there's an intrinsic pressure to reduce the number of calls to lock() already (for less serialization), but this pressure does not really exist for RefCell. As a result, the usage of Mutex is actually somewhat uncommon (depending on your application of course).

Consequently, this PR chooses to have mutexes return results (and later have channels return results]) due to the relatively uncommon usage of the primitives, and the richness of information that the error type can convey.

If we were to ignore our convention in this respect, and still provide both variants, this also generates a very real naming problem. For example channels right now use foo and foo_opt as the names of the pair of methods, but this is seen as inadequate (_opt has nothing to do with options). The try_ prefix cannot be used because for synchronization primitives it has a different meaning than RefCell in that it generally means "this does not block." By sticking to our conventions we sidestep this question altogether and create what's actually a pretty uniform interface across all primitives.

We definitely do not want to create "needless ergonomic regressions" when stabilizing primitives, we try to always focus on enhancing functionality and having primitives adhere to existing conventions. This change is directly aligning with our error conventions, and it is clearly indicating the semantics of locks which may have been somewhat hidden before.

@reem
Copy link
Contributor

reem commented Dec 29, 2014

I'm sympathetic to the naming issue as well as the racy nature of "is ready" checks around synchronized structures - I just don't particularly like having operations where in the majority of cases the "right" thing to do is unwrap the result.

This also reduces the quality of the logged error from something about mutex's panicking to "unwrap on None", but that can be mostly mitigated by setting RUST_BACKTRACE.

@alexcrichton
Copy link
Member Author

Note that special care is taken here with the messages generated by .unwrap() on the return value, they should basically be exactly the same as before.

This commit performs a stabilization pass over the sync::{mutex, rwlock,
condvar} modules, marking the following items as stable:

* Mutex
* Mutex::new
* Mutex::lock
* Mutex::try_lock
* MutexGuard
* RWLock
* RWLock::new
* RWLock::read
* RWLock::try_read
* RWLock::write
* RWLock::try_write
* RWLockReadGuard
* RWLockWriteGuard
* Condvar
* Condvar::new
* Condvar::wait
* Condvar::notify_one
* Condvar::notify_all
* PoisonError
* TryLockError
* TryLockError::Poisoned
* TryLockError::WouldBlock
* LockResult
* TryLockResult

The following items remain unstable to explore future possibilities of unifying
the static/non-static variants of the types:

* StaticMutex
* StaticMutex::new
* StaticMutex::lock
* StaticMutex::try_lock
* StaticMutex::desroy
* StaticRWLock
* StaticRWLock::new
* StaticRWLock::read
* StaticRWLock::try_read
* StaticRWLock::write
* StaticRWLock::try_write
* StaticRWLock::destroy

The following items were removed in favor of `Guard<'static, ()>` instead.

* StaticMutexGuard
* StaticRWLockReadGuard
* StaticRWLockWriteGuard
@alexcrichton
Copy link
Member Author

re-r? @aturon the last commit? It places the #[stable] tag on most APIs touched here.

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Dec 30, 2014
All of the current std::sync primitives have poisoning enable which means that
when a task fails inside of a write-access lock then all future attempts to
acquire the lock will fail. This strategy ensures that stale data whose
invariants are possibly not upheld are never viewed by other tasks to help
propagate unexpected panics (bugs in a program) among tasks.

Currently there is no way to test whether a mutex or rwlock is poisoned. One
method would be to duplicate all the methods with a sister foo_catch function,
for example. This pattern is, however, against our [error guidelines][errors].
As a result, this commit exposes the fact that a task has failed internally
through the return value of a `Result`.

[errors]: https://github.com/rust-lang/rfcs/blob/master/text/0236-error-conventions.md#do-not-provide-both-result-and-fail-variants

All methods now return a `LockResult<T>` or a `TryLockResult<T>` which
communicates whether the lock was poisoned or not. In a `LockResult`, both the
`Ok` and `Err` variants contains the `MutexGuard<T>` that is being returned in
order to allow access to the data if poisoning is not desired. This also means
that the lock is *always* held upon returning from `.lock()`.

A new type, `PoisonError`, was added with one method `into_guard` which can
consume the assertion that a lock is poisoned to gain access to the underlying
data.

This is a breaking change because the signatures of these methods have changed,
often incompatible ways. One major difference is that the `wait` methods on a
condition variable now consume the guard and return it in as a `LockResult` to
indicate whether the lock was poisoned while waiting. Most code can be updated
by calling `.unwrap()` on the return value of `.lock()`.

[breaking-change]
@bors bors merged commit 35e63e3 into rust-lang:master Dec 30, 2014
blevin added a commit to blevin/kiss3d that referenced this pull request Jan 3, 2015
Main thing is unwrapping RWLock return values -- see:
    rust-lang/rust#19661
The other stuff is moving off deprecated iterator API to
avoid warnings.

TODO: we may need to pull kiss3d_recording out to a separate cargo
project, since it appears Cargo can only produce one lib crate
per project.
@alexcrichton alexcrichton deleted the mutex-result branch January 7, 2015 05:15
lnicola pushed a commit to lnicola/rust that referenced this pull request Apr 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants