Skip to content

fix reload failures on active/incomplete orphans #6817

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: 8.4.x
Choose a base branch
from

Conversation

dwsutherland
Copy link
Member

@dwsutherland dwsutherland commented Jun 26, 2025

closes #6814
closes #6815

Note - A good way to see the bugs is to comment out a fix and run the test for each.

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • Tests are included (or explain why tests are not needed).
  • Changelog entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@dwsutherland dwsutherland added this to the 8.4.4 milestone Jun 26, 2025
@dwsutherland dwsutherland self-assigned this Jun 26, 2025
@dwsutherland dwsutherland added the bug Something is wrong :( label Jun 26, 2025
@@ -0,0 +1 @@
Fixes reload failures about orphaned tasks.
Copy link
Member

@wxtim wxtim Jul 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cryptic IMO

Suggested change
Fixes reload failures about orphaned tasks.
Fixes a bug where reload removed information about orphaned tasks from the data store.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even simpler:

Suggested change
Fixes reload failures about orphaned tasks.
Fixes a couple of rare scheduler crashes.

Copy link
Member

@hjoliver hjoliver Jul 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should at least hint at what triggers the bug. How about this?

Suggested change
Fixes reload failures about orphaned tasks.
Fixes two rare bugs associated with reloading the workflow configuration
after removing tasks or xtriggers from it.

start,
log_filter,
):
"""Reload should not fail about orphaned tasks.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Reload should not fail about orphaned tasks.
"""Reload should not remove orphaned task data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bug was not data being removed, it was the scheduler crashing

Comment on lines +160 to +161
- Removal of both xtrigger and associated active/incomplete task.
- Broadcast deltas generated after reload.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check that I have the issue mapping the right way around!

Suggested change
- Removal of both xtrigger and associated active/incomplete task.
- Broadcast deltas generated after reload.
- Removal of both xtrigger and associated active/incomplete task.
https://github.com/cylc/cylc-flow/issues/6814
- Broadcast deltas generated after reload.
https://github.com/cylc/cylc-flow/issues/6815

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's the other way round

Comment on lines +164 to +166
'scheduler': {
'allow implicit tasks': True
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not required: The fixture adds this as a default. 😄

Suggested change
'scheduler': {
'allow implicit tasks': True
},

Comment on lines +178 to +180
'scheduler': {
'allow implicit tasks': True
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'scheduler': {
'allow implicit tasks': True
},

@@ -0,0 +1 @@
Fixes reload failures about orphaned tasks.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even simpler:

Suggested change
Fixes reload failures about orphaned tasks.
Fixes a couple of rare scheduler crashes.

Comment on lines +160 to +161
- Removal of both xtrigger and associated active/incomplete task.
- Broadcast deltas generated after reload.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's the other way round

schd = scheduler(id_)
async with start(schd):
# spawn in bar
foo = schd.pool.get_tasks()[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
foo = schd.pool.get_tasks()[0]
foo = schd.pool._get_task_by_id('20010101T0000Z/foo')

Comment on lines +195 to +196
bar = schd.pool.get_tasks()[0]
assert bar.identity == '20010101T0000Z/bar'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bar = schd.pool.get_tasks()[0]
assert bar.identity == '20010101T0000Z/bar'
bar = schd.pool._get_task_by_id('20010101T0000Z/bar')

Comment on lines +193 to +194
schd.pool.task_events_mgr.process_message(
foo, '20010101T0000Z', 'succeeded')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 2nd arg is severity, not cycle point

Suggested change
schd.pool.task_events_mgr.process_message(
foo, '20010101T0000Z', 'succeeded')
schd.pool.task_events_mgr.process_message(foo, 'INFO', 'succeeded')

Comment on lines +198 to +199
schd.pool.task_events_mgr.process_message(
bar, '20010101T0000Z', 'failed')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
schd.pool.task_events_mgr.process_message(
bar, '20010101T0000Z', 'failed')
schd.pool.task_events_mgr.process_message(bar, 'INFO', 'failed')

):
"""Reload should not fail about orphaned tasks.

The following aspects of reload about orphans are tested:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can "reload about orphans" be rephrased more clearly? "reload after removing tasks from the workflow config" maybe?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is wrong :( small
Projects
None yet
Development

Successfully merging this pull request may close these issues.

reload fail on xtrigger removal causing depleted task pool set non-existent task while reloading causes crash
5 participants