Skip to content

Reliably find unused port to start extension backend http service on #1451

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

annagrin
Copy link
Contributor

We start extension backend http server with port 0, which creates flakes
in flutter web tests. Find an unused port instead.

Closes: #1450

We start extension backend http server with port 0, which creates flakes
in flutter web tests. Find an unused port instead.

Closes: dart-lang#1450
Copy link
Member

@natebosch natebosch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused - I expect binding on 0 to be more reliable, but more difficult for coordination. Picking a port ahead of time should increase the likelihood of conflict if anything, but we usually do it to make coordinating across uses of the port easier.

@grouma
Copy link
Member

grouma commented Nov 19, 2021

+1 to what @natebosch said. Can you provide more context?

@annagrin
Copy link
Contributor Author

+1 to what @natebosch said. Can you provide more context?

Afaik picking 0 is creating an ephemeral port by the VM, which for some reason introduces flakes in web tool tests, possibly due to races with another process. Given that the port number is assigned by the VM, I have much less control on picking one with this API, so retrying would be a shot in the dark. So changed it to picking a port that is free and retrying binding to it a few times to prevent races, the same way we do for the rest of the http servers (that removed other flakes due to races before so looks reliable).

The coordination does not change - in both cases, the port number is available on server.port.

@natebosch
Copy link
Member

for some reason introduces flakes in web tool tests, possibly due to races with another process

This is what I don't understand. Binding on 0 and allowing the OS to pick the port is what I understood to be the more reliable approach that is less likely to cause race conditions with other processes.

What step is it that fails when it does flake?

@annagrin
Copy link
Contributor Author

annagrin commented Nov 20, 2021

This is what I don't understand. Binding on 0 and allowing the OS to pick the port is what I understood to be the more reliable approach that is less likely to cause race conditions with other processes.

I don't know if the VM or OS has any way to resolve a race between two processes in any of those cases.
/cc: @a-siva @aam do you know if VM or OS is supposed to resolve the race between two processes using HttpMultiServer.bind(hostname, 0) ? From the trace below it looks like it just fails to bind in case the port it has chosen is already in use, is it intentional?

I could add retries to the previous solution to avoid races, but I prefer the current fix for several reasons:

  • we know it works on resolving previous races, so it gives us more information if the tests are still flaky after the change, helping us conclude that either I am wrong in my reasoning about the failure, or that the previous solution is in fact racy:)
  • iterations are too costly - it takes a full cycle of propagating the changes from dwds to google3 to flutter and then monitoring the bot for a couple of weeks until we know if the flake has been resolved.
  • it is used everywhere else in our code, so I prefer to use it for consistency.
  • I am not aware of any cons - do you know if it can be harmful in any way?

What step is it that fails when it does flake?

See the parent flutter issue for details (flutter/flutter#93256). The stack trace:

[  +92 ms] SocketException: Failed to create server socket (OS Error: The shared flag to bind() needs to be `true` if binding multiple times on the same (address, port) combination.), address = ::1, port = 42919
             #0      _NativeSocket.bind (dart:io-patch/socket_patch.dart:988:7)
             <asynchronous suspension>
             #1      HttpMultiServer._loopback (package:http_multi_server/http_multi_server.dart:183:22)
             <asynchronous suspension>
             #2      ExtensionBackend.start (package:dwds/src/servers/extension_backend.dart:57:18)
             <asynchronous suspension>
             #3      Dwds.start (package:dwds/dwds.dart:138:26)
             <asynchronous suspension>
             #4      WebAssetServer.start (package:flutter_tools/src/isolated/devfs_web.dart:274:23)
             <asynchronous suspension>
             #5      WebDevFS.create (package:flutter_tools/src/isolated/devfs_web.dart:727:22)
             <asynchronous suspension>
             #6      ResidentWebRunner.run.<anonymous closure> (package:flutter_tools/src/isolated/resident_web_runner.dart:285:25)
             <asynchronous suspension>
             #7      asyncGuard.<anonymous closure> (package:flutter_tools/src/base/async_guard.dart:111:24)
             <asynchronous suspension>

@annagrin annagrin requested a review from a-siva November 20, 2021 02:27
@aam
Copy link

aam commented Nov 29, 2021

do you know if VM or OS is supposed to resolve the race between two processes using HttpMultiServer.bind(hostname, 0) ?

HttpServer.bind delegates to OS to choose the port. My understanding is that it is done atomically, so no two processes can get the same port. We use it(setting port to 0, querying for it after server is set up) extensively in dart vm test suite.

@annagrin
Copy link
Contributor Author

HttpServer.bind delegates to OS to choose the port. My understanding is that it is done atomically, so no two processes can get the same port. We use it(setting port to 0, querying for it after server is set up) extensively in dart vm test suite.

Should I file the failure as a VM bug, is there something we are doing incorrectly in setting the http server?

@aam
Copy link

aam commented Nov 29, 2021

HttpServer.bind delegates to OS to choose the port. My understanding is that it is done atomically, so no two processes can get the same port. We use it(setting port to 0, querying for it after server is set up) extensively in dart vm test suite.

Should I file the failure as a VM bug, is there something we are doing incorrectly in setting the http server?

I'm not familiar with http_multi_server package, perhaps that should be looked at first in regards how it uses http core HttpServer bind methods. Is it possible for example to add some tracing around the calls http_multi_server makes to HttpServer to see how it uses HttpServer?

@a-siva
Copy link
Contributor

a-siva commented Nov 29, 2021

I don't know if the VM or OS has any way to resolve a race between two processes in any of those cases. /cc: @a-siva @aam do you know if VM or OS is supposed to resolve the race between two processes using HttpMultiServer.bind(hostname, 0) ? From the trace below it looks like it just fails to bind in case the port it has chosen is already in use, is it intentional?

if port number '0' is used it should bind to a ephemeral port, it is possible to get an EADDRINUSE error even with a ephemeral port specification, see doc "The port number was specified as zero in the socket address structure, but, upon attempting to bind to an ephemeral port, it was determined that all port numbers in the ephemeral port range are currently in use", it is highly unlikely that you are running into a situation where we are running out port numbers.

I looked at the dart:io implementation in https://github.com/dart-lang/sdk/blob/master/runtime/bin/socket.cc#L103, we seem to maintain a cache of address/ports in use and if a port already in use is provided we error out, port number 0 is also being stored in the cache. This is clearly doesn't look right.

@a-siva
Copy link
Contributor

a-siva commented Nov 29, 2021

Should I file the failure as a VM bug, is there something we are doing incorrectly in setting the http server?

Yes, please do file a bug with the stack trace you have attached above.

@a-siva
Copy link
Contributor

a-siva commented Nov 29, 2021

I looked at the dart:io implementation in https://github.com/dart-lang/sdk/blob/master/runtime/bin/socket.cc#L103, we seem to maintain a cache of address/ports in use and if a port already in use is provided we error out, port number 0 is also being stored in the cache. This is clearly doesn't look right.

Sorry ignore that, port number 0 is not being cached, but the error message indicates port 42919 is being used and not 0 which probably means 42919 is being reused

@annagrin
Copy link
Contributor Author

Filed dart-lang/sdk#47806. Will check this in in for now to unblock the flutter tests, might want to revert the fix later.

@annagrin annagrin merged commit 8033d67 into dart-lang:master Nov 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Retry starting extension backend's http server
5 participants