Task Runner: Intermittent: Aggregator process is killed after restart without any error/exception log #1620

**Describe the bug**
This is an intermittent issue: after the aggregator process is restarted (by killing its process ID and starting it again), it terminates on its own, with no error or exception in the logs to indicate the reason.

The resiliency test that fails because of this runs in the PR and PQ pipelines, which are otherwise quite stable.

**To Reproduce**
Steps to reproduce the behavior:
1. Start the federation with `torch/mnist`, 2 collaborators, and 10+ rounds.
2. Confirm that the round number is increasing.
3. Restart the aggregator.
4. The aggregator silently disappears, while the collaborators keep running and trying to connect to it.
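Because the aggregator writes nothing to its own log before it disappears, the exit status of the restarted process may be the most useful clue. A process terminated by an uncatchable signal such as `SIGKILL` gets no chance to log anything. The following is a hypothetical Python helper, not part of OpenFL or the existing test harness. It mimics step 3 (kill, then relaunch) and then polls the new process to report whether it exited normally or was killed by a signal. It assumes a POSIX system, where `subprocess` reports a signal death as a negative return code. The command below is a placeholder, not the real aggregator command.

```python
import signal
import subprocess
import sys
import time


def describe_exit(returncode):
    """Translate a Popen return code into a human-readable reason."""
    if returncode is None:
        return "still running"
    if returncode < 0:
        # POSIX: a negative return code means the child was terminated by a signal.
        try:
            return f"killed by signal {signal.Signals(-returncode).name}"
        except ValueError:
            return f"killed by unknown signal {-returncode}"
    return f"exited with status {returncode}"


def restart(proc, cmd):
    """Kill a running process (as the resiliency test does) and launch a fresh one."""
    proc.kill()  # sends SIGKILL on POSIX
    proc.wait()  # reap the old process so it does not linger as a zombie
    return subprocess.Popen(cmd)


def watch(proc, duration, interval=0.5):
    """Poll a process for up to `duration` seconds and report how it ended, if it did."""
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            break
        time.sleep(interval)
    return describe_exit(proc.poll())


if __name__ == "__main__":
    # Hypothetical stand-in for the aggregator command; substitute the real launch command.
    cmd = [sys.executable, "-c", "import time; time.sleep(60)"]
    agg = subprocess.Popen(cmd)
    time.sleep(1)
    agg = restart(agg, cmd)
    print(watch(agg, duration=5))
    agg.kill()
    agg.wait()
```

If the harness launched the aggregator this way, a result such as `killed by signal SIGKILL` would suggest that something outside the process ended it, rather than an unhandled Python exception that should have been logged.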

Example failures:

1. When only the aggregator restarts: https://github.com/securefederatedai/openfl/actions/runs/14839141823/job/41657945065#step:4:205

    [aggregator.log](https://github.com/user-attachments/files/20204298/aggregator.log)

2. When the aggregator and all collaborators restart: https://github.com/securefederatedai/openfl/actions/runs/15014267823/job/42188592296#step:4:322

    [aggregator.log](https://github.com/user-attachments/files/20203592/aggregator.log): `Starting the Aggregator Service.` appears three times, indicating three start/restart events, but there is no error or exception.

**Expected behavior**
Regardless of how many times, or at what stage, a participant is restarted, it should come back up and rejoin the federation.

