Description
The Lin and STM Thread modes do not work as well as the Domain
modes.
Failures to trigger an issue (or unexpectedly triggering one for a change) adds noise to our CI results.
Some past history:
- The Lin Thread mode was added in Tests linearizability with threads #1
- Increase reproducability of Lin Thread tests #24 introduced the idea of using a statistical test to guide improvements
- Add thread for STM #88 adds a corresponding Thread mode for STM
[wip] improve bug finding with Thread #99 tried out some ideas for improvements- generating symbolic
Thread.yield
s - overriding the
Stdlib
with one that triggers moreThread.yield
s than usual
- generating symbolic
As part of the latter, I spent some more time banging my head against the Thread
issue.
In the end I used custom runtime_events
to record when Thread
s would spawn
to understand what we were observing.
This meant I could get a nice diagram like this:
Which confirmed to me that most of our Thread
tests indeed had two Threads
running concurrently (starting and ending roughly at the same time)
I then went one step further and added custom events for run
of one cmd
(a small and fast function). This is the result:
The little arrows mark simultaneous call and return of run
. There are two next to each other, because Thread.yield
triggers right after one (on Thread 0) and then the next happens (on Thread 1). This tells me that the Thread
s are also interleaving - it just happens too deterministically - in between each run
- but very very very rarely in the middle of it.
The above may be helpful to guide further improvement attempts.