Skip to content

Running pub global run concurrently #3165

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
sigurdm opened this issue Oct 1, 2021 · 21 comments · Fixed by #3285
Closed

Running pub global run concurrently #3165

sigurdm opened this issue Oct 1, 2021 · 21 comments · Fixed by #3285
Labels
type-bug Incorrect behavior (everything from a crash to more subtle misbehavior)

Comments

@sigurdm
Copy link
Contributor

sigurdm commented Oct 1, 2021

If the snapshot is missing we do deleteBinstubs before recreating them, this opens a window for a race condition.

@dcharkes
Copy link
Contributor

dcharkes commented Oct 1, 2021

Thanks!

cc @cskau-g

The PR that started using this: flutter/flutter#87231

@gspencergoog
Copy link
Contributor

gspencergoog commented Oct 1, 2021

I just tried a test: I ran a short-running command: dart pub global run snippets --help, 10000 times concurrently on a 72-core machine. It didn't have an incident, and completed all 10000 runs.

However, if I did this instead:

  1. remove ~/.pub-cache
  2. dart pub global activate snippets 0.2.3
  3. invoked the same 10000 runs of dart pub global run snippets --help concurrently

Then it fails, all of the executions crash, probably because on the first run after install, 72 cores are busy trying to build the package executable, and something isn't quite locked right.

(I used the process_runner app in https://pub.dev/packages/process_runner to do the concurrent runs, which makes a task queue that keeps 72 running at the same time)

@gspencergoog
Copy link
Contributor

Here's some of the stderr output:

Dart_LoadScriptFromKernel: The binary program does not contain 'main'.
Cannot delete file, path = '/usr/local/google/home/gspencer/.pub-cache/bin/snippets' (OS Error: No such file or directory, errno = 2)
Could not make "/usr/local/google/home/gspencer/.pub-cache/bin/snippets" executable (exit code 1):
chmod: cannot access '/usr/local/google/home/gspencer/.pub-cache/bin/snippets': No such file or directory
Can't load Kernel binary: Invalid kernel binary: Indicated size is invalid.
Dart_LoadScriptFromKernel: The binary program does not contain 'main'.

(I eliminated duplicate lines)

@dcharkes
Copy link
Contributor

dcharkes commented Oct 1, 2021

chmod: cannot access '/usr/local/google/home/gspencer/.pub-cache/bin/snippets': No such file or directory

That is exactly what we were seeing on the flutter analyze step on the HHH bot. Thanks @gspencergoog!

@ghost
Copy link

ghost commented Oct 4, 2021

chmod: cannot access '/usr/local/google/home/gspencer/.pub-cache/bin/snippets': No such file or directory

That is exactly what we were seeing on the flutter analyze step on the HHH bot. Thanks @gspencergoog!

In fact all of those errors you list, Greg, is what we've been seeing. Which is great because it ties it all together and points to the same underlying issue.

@sigurdm, I'm surprised by Greg's findings above as I thought for sure we'd established that activate should have created the binstub at that point, and any subsequent runs should find and run that without any further steps.
(In fact I even tested this myself by running activate and checking the cache..)
I guess there's some step we've missed?

@sigurdm
Copy link
Contributor Author

sigurdm commented Oct 4, 2021

I'm also a bit puzzled. @gspencergoog can you share the script/setup you used?

I tried something like:

export PUB_CACHE=`mktemp -d`

echo $PUB_CACHE

FILE=`mktemp`
/bin/cat <<EOF > $FILE
  pub global run snippets --help 2> /dev/null || echo failure!
  echo .
EOF

pub global activate snippets 0.2.3
seq 1 500 | xargs -n 1 -P 0 bash $FILE

But did not seem to provoke any failures.

@sigurdm
Copy link
Contributor Author

sigurdm commented Oct 4, 2021

It would be interesting to bisect this, and see if it has anything to do with the support for incremental compilation: #3074

@gspencergoog
Copy link
Contributor

gspencergoog commented Oct 4, 2021

This fails consistently for me after about 10-15 seconds (when the first failures start returning).

rm -rf ~/.pub-cache
rm -f /tmp/pub_race.log
pub global activate process_runner
pub global activate snippets

cat <<EOF > do_pub.sh
#!/bin/bash
pub global run snippets --help 2>> /tmp/pub_race.log
EOF
chmod +x do_pub.sh

yes "./do_pub.sh" | head -1000 | pub global run process_runner --report --source=-

I assume you have the same monster machine as I do, so it should fail for you too.

@gspencergoog
Copy link
Contributor

gspencergoog commented Oct 4, 2021

Oh, and just to point out that if you have run it once, and don't clean the pub cache, and just run:

yes "do_pub.sh" | head -1000 | pub global run process_runner --report --source=-

then all the runs succeed. It has something to do with the simultaneous building of the snapshots.

@sigurdm
Copy link
Contributor Author

sigurdm commented Oct 5, 2021

My computer seemingly is not beefy enough to reproduce (16 cores) :(

@dcharkes I heard rumors that you have a mean machine, could you try running the script above? (I had to change the last line to yes "./do_pub.sh" | head -1000 | pub global run process_runner --report --source=- to make it run).

@jakemac53 Do you think this could be related to the incremental compilation change? My intuition says that after pub global activate there should be no changes to the activated package, and thus no recompilation needed, but clearly something like that is happening here.

@dcharkes
Copy link
Contributor

dcharkes commented Oct 5, 2021

I do not get a reproduction on my beefy machines (nor MacBook) either.

Script used:

rm -rf ~/.pub-cache
rm -f /tmp/pub_race.log
dart pub global activate process_runner
dart pub global activate snippets

cat <<EOF > do_pub.sh
#!/bin/bash
dart pub global run snippets --help 2>> /tmp/pub_race.log
EOF
chmod +x do_pub.sh

yes "./do_pub.sh" | head -1000 | dart pub global run process_runner --report --source=-

Dart version used:

dart --version
Dart SDK version: 2.15.0-edge.091338d7a3359f60d0acff7b9972c839b90600ed (be) (Tue Sep 28 04:30:05 2021 +0000) on "linux_x64"

dart --version
Dart SDK version: 2.15.0-168.0.dev (dev) (Thu Sep 30 12:23:13 2021 -0700) on "linux_x64"

dart --version
Dart SDK version: 2.15.0-82.0.dev (dev) (Sat Sep 4 03:33:09 2021 -0700) on "macos_x64"

@dcharkes dcharkes changed the title Investigate thread safety of pub global run Running pub global run concurrently Oct 5, 2021
@gspencergoog
Copy link
Contributor

My computer seemingly is not beefy enough to reproduce (16 cores) :(

Sounds like a perfect excuse for an upgrade! :-)

I do not get a reproduction on my beefy machines (nor MacBook) either.

Huh, OK, I wonder what the difference is. I get a repro pretty much every time I run it. I'll try and bisect it today.

@gspencergoog
Copy link
Contributor

OK, I'm really confused now. I can't get it to reproduce now, despite having run it four or five times in a row yesterday and had it happen every time.

@gspencergoog
Copy link
Contributor

gspencergoog commented Oct 5, 2021

Could it have something to do with package extraction?

After rolling back to yesterday's Flutter, I just started to be able to reproduce the failure again, and it looks like this:

Installed executables extract_sample and snippets.
Warning: Executable "extract_sample" runs "bin/extract_sample.dart", which was not found in snippets.
Warning: Executable "snippets" runs "bin/snippets.dart", which was not found in snippets.
Activated snippets 0.2.5.
^Cbs:   4% done,   7/1000 completed, 27 in progress, 928 pending,  38 failed.    

And the errors were:

Could not make "/usr/local/google/home/gspencer/.pub-cache/bin/snippets" executable (exit code 1):
chmod: cannot access '/usr/local/google/home/gspencer/.pub-cache/bin/snippets': No such file or directory
Can't load Kernel binary: Invalid kernel binary: Indicated size is invalid.
Cannot open file, path = '/usr/local/google/home/gspencer/.pub-cache/bin/snippets' (OS Error: No such file or directory, errno = 2)
Can't load Kernel binary: Invalid kernel binary: Indicated size is invalid.
Cannot delete file, path = '/usr/local/google/home/gspencer/.pub-cache/bin/snippets' (OS Error: No such file or directory, errno = 2)
Cannot delete file, path = '/usr/local/google/home/gspencer/.pub-cache/bin/snippets' (OS Error: No such file or directory, errno = 2)
Cannot open file, path = '/usr/local/google/home/gspencer/.pub-cache/bin/snippets' (OS Error: No such file or directory, errno = 2)
Cannot open file, path = '/usr/local/google/home/gspencer/.pub-cache/bin/snippets' (OS Error: No such file or directory, errno = 2)
Could not make "/usr/local/google/home/gspencer/.pub-cache/bin/snippets" executable (exit code 1):
chmod: cannot access '/usr/local/google/home/gspencer/.pub-cache/bin/snippets': No such file or directory
Can't load Kernel binary: Invalid kernel binary: Indicated size is invalid.
Can't load Kernel binary: Invalid kernel binary: Indicated size is invalid.
Can't load Kernel binary: Invalid kernel binary: Indicated size is invalid.
Can't load Kernel binary: Invalid kernel binary: Indicated size is invalid.

(and so on)

The interesting thing to me are the warnings during the activation: they seem to indicate that a check for the executables is happening before the package is done being unpacked, and it doesn't do the normal "Building package executables" stage (presumably because it doesn't think they exist). I don't remember seeing that yesterday, but I suppose I could have missed it.

This is the dart version (from yesterday's Flutter) that I can reproduce this in:

Dart SDK version: 2.15.0-170.0.dev (dev) (Fri Oct 1 00:36:01 2021 -0700) on "linux_x64"

The corresponding Flutter hash is 1b73a35fba3d5a4dfbeae42107d9a7227bac1615 for the Dart version I was using, which corresponds to the Flutter Engine hash 74fdd30dc25e91d355ff18fdf52de27b7713f11a.

I was able to reproduce this multiple (at least 20) times with that version of Dart, and it reproduced every time. But not with a more current version on Flutter's HEAD from today (Dart SDK version: 2.15.0-177.0.dev (dev) (Sun Oct 3 13:45:39 2021 -0700) on "linux_x64", Flutter hash 98120d19d35c8f13721973b8cc42501805032829)

The script I was able to reproduce it with on yesterday's Dart was:

#!/bin/bash

DART="/usr/local/google/home/gspencer/code/flutter/bin/cache/dart-sdk/bin/dart"
TMPDIR=$(mktemp -d /tmp/pub_test.XXXXX)
echo "Tmpdir is $TMPDIR"

DO_PUB="$TMPDIR/do_pub.sh"
LOG_FILE="$TMPDIR/pub_race.log"
CMD_FILE="$TMPDIR/pub_cmd.txt"

rm -f "$LOG_FILE"
cat <<EOF > "$DO_PUB"
#!/bin/bash
"$DART" pub global run snippets --help 2>> "$LOG_FILE"
EOF
chmod +x "$DO_PUB"

rm -rf ~/.pub-cache
"$DART" pub global activate process_runner 4.1.2
"$DART" pub global activate snippets 0.2.5
yes "$DO_PUB" | head -1000 > "$CMD_FILE"
"$DART" pub global run process_runner --report --source="$CMD_FILE"

Interestingly, the following does NOT fail at all with that same revision:

#!/bin/bash

DART="/usr/local/google/home/gspencer/code/flutter/bin/cache/dart-sdk/bin/dart"
TMPDIR=$(mktemp -d /tmp/pub_test.XXXXX)
PUB_CACHE=$TMPDIR/pub-cache
export PUB_CACHE
echo "Pub cache is $PUB_CACHE"

DO_PUB="$TMPDIR/do_pub.sh"
LOG_FILE="$TMPDIR/pub_race.log"
CMD_FILE="$TMPDIR/pub_cmd.txt"
rm -f "$LOG_FILE"
cat <<EOF > "$DO_PUB"
#!/bin/bash
export PUB_CACHE
PUB_CACHE="$PUB_CACHE"
"$DART" pub global run snippets --help 2>> "$LOG_FILE"
EOF
chmod +x "$DO_PUB"

"$DART" pub global activate process_runner 4.1.2
"$DART" pub global activate snippets 0.2.5
yes "$DO_PUB" | head -1000 > "$CMD_FILE"
"$DART" pub global run process_runner --report --source="$CMD_FILE"

I checked to make sure there weren't any other dart processes running, and that my ~/.pub-cache stayed empty when running the second script (to make sure there wasn't something else populating it in the background somehow). I was suspecting multiple dart processes of accessing the ~/.pub-cache at the same time, but that doesn't appear to be the case.

@gspencergoog
Copy link
Contributor

So it seems like some change may have fixed it between 2.15.0-170.0.dev and 2.15.0-177.0.dev.

@gspencergoog
Copy link
Contributor

gspencergoog commented Oct 5, 2021

One difference between the outcome of the two scripts could possibly be because one is writing to a memory filesystem (/tmp is a "tmpfs" memdisk on my machine), and one is writing to a physical (SSD) filesystem. So it could just be that changing that means one wins the race and the other doesn't.

EDIT: I tested this, and the second script still works even if I change the TMPDIR to a non-/tmp directory, so that's not it.

Anecdotally, I did find that PUB_CACHE doesn't really like being a relative path: it had all kinds of "file not found" errors when I just removed the "/tmp" from the mktemp, I had to use TMPDIR=$(mktemp -d $PWD/pub_test.XXXXX). Here's a sample of the error when PUB_CACHE was set to pub_test.k1DRq/pub-cache without an absolute path:

Failed to build snippets:snippets:
pub_test.k1DRq/pub-cache/hosted/pub.dartlang.org/snippets-0.2.5/bin/snippets.dart:7:8: Error: Error when reading 'pub_test.k1DRq/pub-cache/global_packages/snippets/pub_test.k1DRq/pub-cache/hosted/pub.dartlang.org/args-2.3.0/lib/args.dart': No such file or directory
import 'package:args/args.dart';

@sigmundch
Copy link
Member

I'm curious if there are any updates, this is still indirectly causing flakes daily on the Flutter HHH bots.

@gspencergoog
Copy link
Contributor

I stopped collecting data because I was no longer able to reproduce the problem.

@gspencergoog
Copy link
Contributor

I just ran the scripts above again, and I still can't repro the problem with Dart version 2.16.0-85.0.dev on Linux on my 72-core machine.

@sigmundch
Copy link
Member

thanks for looking into this @sigurdm !

@salmankhilji
Copy link

I am getting a 100% repro of this bug if I attempt to dart pub global activate --source path . a custom version of the snippets tool. I am still running version 0.2.5 with no changes whatsoever.

After manually activating a custom version of dart, I then commented out the following lines in docs.sh:

    # Install and activate the snippets tool, which resides in the
    # assets-for-api-docs repo:
    # https://github.com/flutter/assets-for-api-docs/tree/master/packages/snippets
    # >>> If you update this version, also update it in dev/bots/analyze_sample_code.dart <<<
    # "$DART" pub global activate snippets 0.2.5

    # Run the snippets tool once to force building of the package executable,
    # since "dart pub global run" has issues with startup concurrency.
    # TODO(gspencergoog): Remove once pub issue is fixed, https://github.com/dart-lang/pub/issues/3165
    # "$DART" pub global run snippets --help

Then simply running dev/bots/docs.sh from $FLUTTER_ROOT takes about a minute or so to crash my machine with all 32GB of RAM taken up.

The weird thing is that no matter how many times I run the following command:

$ dart pub global run snippets --help
Building package executable... 
Built snippets:snippets.

It always insists on building the snippets binary. This may be the root cause of this bug.

$ flutter --version
Flutter 2.10.2 • channel stable • https://github.com/flutter/flutter.git
Framework • revision 097d3313d8 (3 days ago) • 2022-02-18 19:33:08 -0600
Engine • revision a83ed0e5e3
Tools • Dart 2.16.1 • DevTools 2.9.2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug Incorrect behavior (everything from a crash to more subtle misbehavior)
Projects
None yet
5 participants