Data migrations worker rework #27014
Conversation
CI test results:
test results on build #69755
test results on build #69827
test results on build #69949
test results on build #70067
is this fixing a bug we can reference to justify the backport, or is there more context around the motivation for backporting this?
This reverts commit 48f2cce.
Force-pushed from 03bd809 to ee1c027
@dotnwat just the bug. It invokes UB, and we're lucky (or ignorant) that it didn't result in anything serious. Should I add a release note line about it?
@bashtanov can you add a more detailed commit message for the second commit in this PR? Please provide the motivation for the changes and describe the idea behind the new work tracking logic.
Force-pushed from 3acb6f0 to e272f5f
Can you expand on this, and on why a refactor is needed as opposed to backporting a fix for the UB and doing the refactoring upstream?
@dotnwat I've updated the main commit message and the PR description accordingly. I cannot think of a way to eliminate the UB without major changes in the logic. We do need to store up to two "work info" objects per NTP, and the rest of the change is about juggling them correctly.
I'm probably vastly simplifying things, but it sounds like it's just a matter of protecting a shared data structure?
Invalid memory access is not the only problem that needs to be fixed. Allowing concurrent work on the same NTP is also wrong, because the works can conflict or run in the wrong order. E.g. imagine a migration progressing up to some point and then being cancelled. Quite possibly the operations on the affected partitions will be opposite, e.g. block writes when preparing to migrate away and then unblock them to cancel the migration.
Well, it is not shared; we need to store both separately, as one is needed for the running task and the other for the requested one. We need them both physically on the shard, so we either need to alter … To protect from concurrent execution we would need a mutex. It would introduce an implicit queue of those waiting for it, while in reality we need simpler logic, as only the last one in the queue is needed. All in all, my attempts to make fewer changes resulted in logic that was still quite broken, or at least in much less confidence in its correctness.
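A minimal sketch of the "only the last waiter matters" idea (not the actual Redpanda code; `work_info` and `ntp_worker` are hypothetical names): instead of a mutex with an implicit queue of waiters, each NTP keeps at most one running work plus a single pending slot that a newer request simply overwrites.

```cpp
// Illustrative only: a per-NTP worker with one running work and a single
// pending slot. A newer request replaces whatever was pending, so at most
// one work waits and no mutex-style queue of waiters builds up.
#include <iostream>
#include <optional>
#include <string>
#include <utility>

struct work_info {
    std::string description;      // what the requested work should do
    bool abort_requested = false;  // stand-in for a per-work abort source
};

class ntp_worker {
public:
    // Backend entry point: request new work for this NTP.
    void request(work_info info) {
        if (_running) {
            // Ask the running work to stop; park the new request, replacing
            // any previously pending one (only the latest request matters).
            _running->abort_requested = true;
            _pending = std::move(info);
            return;
        }
        start(std::move(info));
    }

    // Called once the running work has actually finished or aborted.
    void on_finished() {
        _running.reset();
        if (_pending) {
            auto next = std::move(*_pending);
            _pending.reset();
            start(std::move(next));
        }
    }

private:
    void start(work_info info) {
        _running = std::move(info);
        std::cout << "running: " << _running->description << "\n";
    }

    std::optional<work_info> _running; // work currently executing
    std::optional<work_info> _pending; // at most one queued request
};

int main() {
    ntp_worker w;
    w.request({"stage A"});
    w.request({"stage B"}); // aborts the running work, parked
    w.request({"stage C"}); // replaces the parked request
    w.on_finished();        // only the latest request runs next
    w.on_finished();
}
```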
@bashtanov please capture our discussions in the commit message. A person reading the commit in the future should be able to understand the motivation for the change and the idea behind the new work tracking logic.
Previously the work tracking logic was flawed. When work was already running for an NTP and the backend requested another work (e.g. for a different stage or for a new migration), it was handled inappropriately:
1) The existing running work was only aborted by triggering its abort source, but was not waited on to actually stop. As a result, works on the same NTP could run concurrently, in particular in the wrong order. This might lead to incorrect results, e.g. topic writes unblocked then blocked instead of the other way round.
2) The work info, which is supplementary data a work uses, was overwritten by the new one. The old work, which was still running, might access deallocated or reused memory where the old work info was, thus triggering UB.
3) Per-work abort sources were not in use, only the main one was.
Reorganised logic:
1) Allow no more than one running work per NTP and no more than one queued. Semaphores and mutexes would not help here, as they maintain a potentially long queue of waiters; we allow only one waiting.
2) Store the running work's belongings separately from those of the requested one if they are different.
3) Use both the main and the individual abort sources.
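To make the reorganised tracking concrete, here is a hedged sketch under assumed names (`abort_source`, `work_info`, `ntp_work_tracker` are illustrative, the abort source is simplified, and this is not the actual implementation): the running work owns its info until it really stops, the latest backend request is parked separately, and the work honours both the main and its own abort source.

```cpp
// Illustrative sketch, not the actual Redpanda implementation; all names are
// assumptions. Key points mirrored from the commit message:
//  - the running work owns its work_info until it really stops, so a new
//    request cannot overwrite memory the running work still reads;
//  - at most one requested work is parked alongside the running one;
//  - the work observes both the main (worker-wide) and its own abort source.
#include <memory>
#include <optional>
#include <string>
#include <utility>

struct abort_source {
    bool requested = false;
    void request_abort() { requested = true; }
    bool abort_requested() const { return requested; }
};

struct work_info {
    std::string stage;     // e.g. "block writes" or "unblock writes"
    abort_source local_as; // per-work abort source
};

class ntp_work_tracker {
public:
    explicit ntp_work_tracker(abort_source& main_as) : _main_as(main_as) {}

    // Backend requests new work: abort the running one via its own abort
    // source and park the new info without touching the running work's info.
    void request(work_info info) {
        if (_running) {
            _running->local_as.request_abort();
            _requested = std::move(info); // does not overwrite *_running
            return;
        }
        start(std::move(info));
    }

    // Polled by the running work: stop on either the main or the local abort.
    bool should_abort() const {
        return _main_as.abort_requested()
               || (_running && _running->local_as.abort_requested());
    }

    // Called only after the running work has fully stopped; only then may the
    // parked request take its place.
    void on_stopped() {
        _running.reset();
        if (_requested) {
            auto next = std::move(*_requested);
            _requested.reset();
            start(std::move(next));
        }
    }

private:
    void start(work_info info) {
        // The unique_ptr keeps the info alive for the whole run; nothing else
        // frees or reuses it while the work executes.
        _running = std::make_unique<work_info>(std::move(info));
    }

    abort_source& _main_as;              // main, worker-wide abort source
    std::unique_ptr<work_info> _running; // info owned by the running work
    std::optional<work_info> _requested; // at most one parked request
};

int main() {
    abort_source main_as;
    ntp_work_tracker t(main_as);
    t.request({"block writes"});
    t.request({"unblock writes"}); // running work is asked to abort
    if (t.should_abort()) {
        t.on_stopped();            // parked request starts only now
    }
    main_as.request_abort();       // main abort stops everything
}
```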
confluentinc/librdkafka#4963 Failed to fetch committed offsets for 0 partition(s)
Force-pushed from e272f5f to 12a4420
/backport v25.2.x
/backport v25.1.x
/backport v24.3.x
/backport v24.2.x
Failed to create a backport PR to v24.3.x branch. I tried:
Failed to create a backport PR to v25.1.x branch. I tried:
Failed to create a backport PR to v24.2.x branch. I tried:
Failed to create a backport PR to v25.2.x branch. I tried:
Previously work tracking logic was flawed. When work was already running for
an NTP and the backend requested another work (e.g. for a different
stage or for a new migration), it was handled inappropriately:
- the existing running work was only aborted by triggering its abort source,
but not waited on to actually complete;
- the work info, the supplementary data a work uses, was overwritten
by the new one; the old work which was still running might access
deallocated or reused memory where the old work info was;
- per-work abort sources were not in use, only the main one was.
Reorganised logic:
- allow no more than one running work per NTP and no more than one queued;
- store the running work's belongings separately from the requested one's if they are
different;
- use both the main and the individual abort sources.
Backports Required
Release Notes