-
Notifications
You must be signed in to change notification settings - Fork 291
block for lock during transcript.inplace
#6075
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
9046950 to
bf0caff
Compare
.github/workflows/ci.yaml
Outdated
|
|
||
| - name: verify stack ghci startup | ||
| if: runner.os == 'macOS' && steps.cache-ucm-binaries.outputs.cache-hit != 'true' | ||
| if: runner.arch == 'x64' && runner.os == 'Linux' && steps.cache-ucm-binaries.outputs.cache-hit != 'true' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this to a different PR?
.github/workflows/ci.yaml
Outdated
| with: | ||
| path: ${{ env.runtime_tests_codebase }} | ||
| key: runtime-tests-codebase-${{env.runtime_tests_causalhash}} | ||
| key: runtime-tests-codebase-${{ matrix.os }}-${{env.runtime_tests_causalhash}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure whether we need this
.github/workflows/ci.yaml
Outdated
| if: steps.cache-interpreter-test-results.outputs.cache-hit != 'true' | ||
| run: | | ||
| echo "passing=true" >> "${{env.interpreter_test_results}}" | ||
| - name: setup tmate session |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could leave this in for convenience?
|
|
||
| if [ -z "$1" ]; then | ||
| stack build | ||
| stack build --fast |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dolio any objections to leaving this in? if incurring a rebuild with optimizations disabled is a concern, you can also pass in your already-built unison to this script as $1
4e0c5c2 to
f34fe5f
Compare
transcript.inplace
the transcript.inplace runner seems to open the codebase for migrations, then opens it a second time to tun the actual transcript. there seems to be a race condition in which releasing the lock the first time doesn't actually block for the lock to be released; and in fact it doesn't seem to be released in time to acquire the lock the second time, and we were seeing crashes in CI as a result. there were a few other places where it looked like it might be fine and good (though not necessary) to also block for the lock, but i left them out in order to have a more surgical PR.
141f0d1 to
d2ed60f
Compare
Overview
The
transcript.inplacerunner seems to open the codebase for migrations, then opens it a second time to tun the actual transcript. There seems to be a race condition in which releasing the lock the first time doesn't actually block for the lock to be released; and in fact it doesn't seem to be released in time to acquire the lock the second time, and we were seeing crashes in CI as a result. cc @dolioNot sure why this locking issue just started manifesting recently.
What does this change accomplish and why?
Include "before and after" examples if appropriate. (You can copy/paste screenshots directly into this editor.)
List any Github issues that this PR closes, in closing-issues-using-keywords format.
n/a
Implementation approach and notes
Adds a new locking type which prints a message and blocks until it can acquire a lock, if it can't immediately acquire one. Then it uses that new locking type when trying to open the codebase the second time after running migrations before running the transcript.
Interesting/controversial decisions
There were a few other places where it looked like it might be fine and good (though not necessary) to also block for the lock, but I left them out in order to have a more surgical PR.
Test coverage
CI has started passing again
https://github.com/unisonweb/unison/actions/runs/20403439799/job/58629670481?pr=6075
vs
https://github.com/unisonweb/unison/actions/runs/20391436262/job/58601486802?pr=6075
not needed
Loose ends
Not exactly lose ends because they're not related to this issue, but see the remaining open conversations.
Also, like with any PR I'm reminded that CI is too slow, and could maybe use mOrE cAcHiNg
Final checklist
.cabalfiles, make sure thepackage.yamlfiles are up-to-date instead.