
Pub tests leave zombie processes #75


Closed
DartBot opened this issue Jun 5, 2015 · 11 comments
Labels
type-bug Incorrect behavior (everything from a crash to more subtle misbehavior)

Comments

DartBot commented Jun 5, 2015

Issue by munificent
Originally opened as dart-lang/sdk#4742


Run:

./tools/test.py --checked --arch=all

The VM team has seen that this often leaves stale processes hanging around.

DartBot commented Jun 5, 2015

Comment by dgrove


We need to get this fixed asap.


Added this to the M1 milestone.
Removed Priority-Medium label.
Added Priority-High label.

DartBot commented Jun 5, 2015

Comment by munificent


This appears to be a dart:io issue. When a process is killed, it doesn't kill its child processes. Here's a test case to repro:

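// a.dart
#import('dart:io');
#import('dart:isolate');
void main() {
  // Start b.dart as a child process, then kill it after one second.
  var process = Process.start('dart', ['b.dart']);
  new Timer(1000, (_) => process.kill());
}

// b.dart
#import('dart:io');
// Start c.dart as a child of b.dart (a grandchild of a.dart).
void main() => Process.start('dart', ['c.dart']);

// c.dart
#import('dart:isolate');
void main() {
  // Print an increasing counter every 200 ms, forever.
  var i = 0;
  new Timer.repeating(200, (_) => print(i++));
}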

If you run a.dart, you'll be left with a zombie dart process that's running c.dart forever.


Removed Area-Pub label.
Added Area-IO, Triaged labels.

DartBot added the type-bug and Fixed labels on Jun 5, 2015

DartBot commented Jun 5, 2015

Comment by munificent


Set owner to @whesse.

DartBot commented Jun 5, 2015

Comment by madsager


cc @whesse.
cc @sgjesse.
Set owner to @madsager.
Added Accepted label.

DartBot commented Jun 5, 2015

Comment by madsager


Bob, this is standard Linux/Mac behavior, and there is not much we can easily do about it. The question is why you need to kill a parent process in the first place. To avoid leaving zombie processes behind, a process should wait for its child processes to finish. Do you have any idea why that is not happening in pub?


Set owner to @munificent.
Removed Area-IO label.
Added Area-Pub label.

DartBot commented Jun 5, 2015

Comment by madsager


I haven't been able to reproduce this on my machine. Are people aborting their test runs using Ctrl-C or similar? Are tests timing out when this happens?

Reproduction steps from the people seeing this would be great.

DartBot commented Jun 5, 2015

Comment by sgmitrovic


This happens on Mac OS X when running:

./tools/test.py --checked --arch=all

DartBot commented Jun 5, 2015

Comment by munificent


It's not pub doing this. It's test.dart. The flow is like this:

  1. test.dart spawns a process to run pub_test.dart.
  2. pub_test.dart spawns a process to invoke pub.
  3. test.dart decides the test has timed out and kills pub_test.dart.
  4. The pub process pub_test.dart spawned is left hanging around.

This can also probably happen if you just Ctrl-C test.dart too.
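The flow above can be sketched in present-day Dart. This is only an illustration of the mechanism, not test.dart's actual code: `runWithTimeout` is a hypothetical helper, and `pub_test.dart` stands in for whatever script the harness launches. The point it shows is that `Process.kill` signals the direct child only.

```dart
import 'dart:async';
import 'dart:io';

/// Runs [executable] with [arguments] and kills it if it has not exited
/// within [limit]. Returns the exit code, or null if the limit was hit.
///
/// Hypothetical helper for illustration; not taken from test.dart.
Future<int?> runWithTimeout(
    String executable, List<String> arguments, Duration limit) async {
  final process = await Process.start(executable, arguments);

  // Drain output so the child never blocks on a full pipe.
  unawaited(process.stdout.drain<void>());
  unawaited(process.stderr.drain<void>());

  try {
    return await process.exitCode.timeout(limit);
  } on TimeoutException {
    // This signals only the direct child (step 3 in the flow). Any process
    // that child started (step 2) is not signalled and may keep running.
    process.kill();
    return null;
  }
}

Future<void> main() async {
  // 'pub_test.dart' is a placeholder script name.
  final result = await runWithTimeout(
      'dart', ['pub_test.dart'], const Duration(seconds: 5));
  print(result == null ? 'timed out' : 'exit code $result');
}
```

If the launched script has itself started another process by the time the limit is reached, that second process is the one left behind.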

Of course, pub tests timing out is a problem, but fixing that is a different issue. We've split the suites up into smaller ones, which helps, but they are still generally more heavyweight than your typical "run a few lines of code then exit" test. I can do some tweaking to make sure they have more time.

DartBot commented Jun 5, 2015

Comment by madsager


Thanks for the context, Bob. As far as I can tell, the only thing we can do here is to make sure that these tests have enough time to run (or cut them down so they do).

Parent processes have to wait for their children. If they don't you will get zombies. That is the process model and I don't know of any way to change that. I'm more than happy to take input if any of you know how to reliably kill a process subtree by killing the root without creating zombies.
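One commonly used workaround, offered here as a sketch and not as something proposed in this thread, is to have each child notice that its parent has gone away instead of relying on the parent to kill it. The sketch assumes the parent started the child with a piped stdin (the default for `Process.start`) and that no other process holds the write end of that pipe; `watchForParentExit` is a hypothetical name.

```dart
import 'dart:async';
import 'dart:io';

/// Calls [onGone] once [input] reaches end-of-stream.
///
/// When [input] is this process's stdin pipe, end-of-stream arrives when the
/// parent closes its end, which the OS also does if the parent is killed.
StreamSubscription<List<int>> watchForParentExit(
    Stream<List<int>> input, void Function() onGone) {
  // Incoming data is ignored; only the end of the stream matters here.
  return input.listen((_) {}, onDone: onGone);
}

void main() {
  // Exit as soon as the parent's end of the stdin pipe is closed.
  watchForParentExit(stdin, () => exit(1));

  // Stand-in for the real long-running work.
  var i = 0;
  Timer.periodic(const Duration(milliseconds: 200), (_) => print(i++));
}
```

This does not kill a subtree from the root; it asks every process in the tree to cooperate, so it only helps for children whose code you control.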


cc @iposva-google.
cc @a-siva.

DartBot commented Jun 5, 2015

Comment by iposva-google


It seems that these zombie processes spawned by pub are waiting indefinitely on some information from the main process. This would be an issue even for command-line pub (not driven by the test harness) if the user runs out of patience and kills the main pub process. You cannot expect the user to then hunt down and kill the different subprocesses spawned by pub.

DartBot commented Jun 5, 2015

Comment by nex3


Pub now has a 30s timeout on all of its operations that are likely to cause it to persist forever. This should mean that no zombie process persists for more than a minute or so, even if the parent process is ungracefully shut down. If you continue to see persistent zombie processes, please let us know.
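Pub's actual change is not shown in this thread. As a rough sketch of the idea in present-day Dart, an operation that might never complete can be wrapped with `Future.timeout`; the `withTimeout` helper and its `description` parameter are hypothetical names used only for illustration.

```dart
import 'dart:async';

/// Completes with the result of [operation], or fails with a
/// [TimeoutException] if it has not completed within [limit].
///
/// Hypothetical helper; the 30 second default mirrors the figure quoted above.
Future<T> withTimeout<T>(
  Future<T> operation, {
  Duration limit = const Duration(seconds: 30),
  String description = 'operation',
}) {
  return operation.timeout(limit, onTimeout: () {
    throw TimeoutException('$description timed out', limit);
  });
}

Future<void> main() async {
  // A future that never completes, standing in for a stalled operation.
  final stalled = Completer<String>().future;
  try {
    await withTimeout(stalled,
        limit: const Duration(milliseconds: 100),
        description: 'fetching package');
  } on TimeoutException catch (e) {
    print('giving up: ${e.message}');
  }
}
```

Note that a timeout like this only stops the waiting side from blocking forever. It does not cancel the underlying work, so the process still has to exit or clean up after the exception is raised.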


Added Fixed label.
