Skip to content

Make sure err isn't nil when returning failure #2861

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 8, 2024

Conversation

jandubois
Copy link
Member

Fixes #2859

@jandubois
Copy link
Member Author

A separate issue is if the pseudoLoopbackForwarder should accept connections from [::1] in addition to 127.0.0.1.

@jandubois
Copy link
Member Author

@AkihiroSuda This fixes the crash, but doesn't actually forward to 127.0.0.1:

$ curl http://127.0.0.1:80
curl: (7) Failed to connect to 127.0.0.1 port 80 after 0 ms: Couldn't connect to server

@jandubois
Copy link
Member Author

jandubois commented Nov 7, 2024

@AkihiroSuda This fixes the crash, but doesn't actually forward to 127.0.0.1:

Actually, it only stops accepting connections from 127.0.0.1 after it has rejected a connection from [::1] once.

This also reminds of something @Nino-K told me: when the guest agent sends a duplicate port to add, then the host agent removes the existing port forwarding. Will see if I can find this in his WIP PR.

@jandubois jandubois force-pushed the not-localhost branch 2 times, most recently from ed5c6da to e539fd3 Compare November 7, 2024 05:58
@jandubois jandubois marked this pull request as ready for review November 7, 2024 05:59
@AkihiroSuda
Copy link
Member

Thanks, can we have a test?

diff --git a/hack/test-templates.sh b/hack/test-templates.sh
index 57ff37a4..0d689312 100755
--- a/hack/test-templates.sh
+++ b/hack/test-templates.sh
@@ -307,9 +307,11 @@ if [[ -n ${CHECKS["port-forwards"]} ]]; then
                        fi
                        limactl shell "$NAME" $sudo $CONTAINER_ENGINE info
                        limactl shell "$NAME" $sudo $CONTAINER_ENGINE pull --quiet ${nginx_image}
-                       limactl shell "$NAME" $sudo $CONTAINER_ENGINE run -d --name nginx -p 8888:80 ${nginx_image}
-
-                       timeout 3m bash -euxc "until curl -f --retry 30 --retry-connrefused http://${hostip}:8888; do sleep 3; done"
+                       for hostport in 8888 80; do
+                               limactl shell "$NAME" $sudo $CONTAINER_ENGINE run -d --name nginx -p ${hostrport}:80 ${nginx_image}
+                               timeout 3m bash -euxc "until curl -f --retry 30 --retry-connrefused http://${hostip}:${hostport}; do sleep 3; done"
+                               limactl shell "$NAME" $sudo $CONTAINER_ENGINE rm -f nginx
+                       done
                fi
        fi
        set +x

@jandubois
Copy link
Member Author

This PR now fixes the crash and also avoids closing the forwarder the first time a non-local address tries to connect to it.

$ curl 127.0.0.1
<!DOCTYPE html>


$ curl localhost
curl: (56) Recv failure: Connection reset by peer

$ curl 127.0.0.1
<!DOCTYPE html>

I guess we could use a custom error type instead of matching on the string of the error message to make it a bit more robust.

I also think we should accept connections from [::1], but that should be a separate PR.

@jandubois
Copy link
Member Author

jandubois commented Nov 7, 2024

Fixes #2859

I don't actually know if this PR fixes the hanging problem; I only know it fixes the crash @AkihiroSuda has shown in the same issue.

And it makes the forwarding listeners a bit more robust in general against accidental removal.

I don't plan to make any more changes tonight; @rfay maybe you can test this PR (or wait until it has been merged into master so you also get my other fix in #2860)? Just so we know if this fixes all known issues, or if there is still something else.

@AkihiroSuda
Copy link
Member

CI is failing

@jandubois
Copy link
Member Author

CI is failing

Yes, because we cannot bind to ${hostip}:80 I think. I'll see if I can still fix it tonight.

@jandubois
Copy link
Member Author

It is still failing in CI, but works for me locally. I'm out of time now and will look into this tomorrow. Unless somebody beats me to it...

@jandubois jandubois marked this pull request as draft November 7, 2024 16:52
@jandubois
Copy link
Member Author

Converting back to draft since we'll switch the default back to SSH in #2864.

And I just tested curl localhost with the SSH forwarder, so it needs to work with gRPC too.

@jandubois jandubois force-pushed the not-localhost branch 4 times, most recently from 8a8eefb to a744670 Compare November 8, 2024 00:33
@jandubois
Copy link
Member Author

Yes, because we cannot bind to ${hostip}:80 I think. I'll see if I can still fix it tonight.

I've restricted the test binding to port 80 to only macOS, which is also the only platform where the lower port makes a difference due to the pseudoloopback forwarders.

However, since we have reverted to the SSH forwarder, this test will not currently test the gRPC implementation.

Should we run at least one of the VZ tests (default or fedora) with LIMA_SSH_PORT_FORWARDER=false?

@jandubois jandubois force-pushed the not-localhost branch 2 times, most recently from 8a08a76 to af3b7d6 Compare November 8, 2024 00:59
@jandubois jandubois mentioned this pull request Nov 8, 2024
@jandubois jandubois marked this pull request as ready for review November 8, 2024 01:36
@jandubois jandubois requested a review from AkihiroSuda November 8, 2024 01:36
@AkihiroSuda AkihiroSuda added this to the v1.0.1 milestone Nov 8, 2024
@AkihiroSuda
Copy link
Member

Fixes #2859

Does this PR fix the hang too? Or just fixes the panic?

Copy link
Member

@AkihiroSuda AkihiroSuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@jandubois
Copy link
Member Author

Does this PR fix the hang too?

I have not been able to reproduce the hangs. Do you have any repro steps?

Or just fixes the panic?

It fixes the panic, adds support for IPv6, and makes sure the forwarder isn't removed when a connection is attempted from a rejected (non-loopback) address.

@AkihiroSuda
Copy link
Member

I have not been able to reproduce the hangs. Do you have any repro steps?

Just running docker.yaml on the CI seems enough.
And I don’t know other way to reproduce the hang.

@jandubois
Copy link
Member Author

Just running docker.yaml on the CI seems enough.

It seems almost all the PRs merged for v1.0.1 have been running docker.yaml without hanging, otherwise they wouldn't be green: https://github.com/lima-vm/lima/pulls?q=is%3Apr+is%3Aclosed+milestone%3Av1.0.1

Anyways, I can't remember seeing the docker test hang in this PR, but also not in any of the others I did after v1.0.0.

@AkihiroSuda
Copy link
Member

I think I have restarted several failing jobs, so they are marked green

@jandubois jandubois merged commit 0e93110 into lima-vm:master Nov 8, 2024
29 checks passed
@jandubois jandubois deleted the not-localhost branch November 8, 2024 02:25
@jandubois
Copy link
Member Author

I think I have restarted several failing jobs, so they are marked green

Ok, let's see if this is going to stop now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

gRPC port forwarder seems to occasionally hang (with template://docker); also panics
2 participants