
/etc/resolv.conf is not mounted with the correct permissions when the host has a umask 0077 #3704

Closed
apostasie opened this issue Nov 30, 2024 · 11 comments · Fixed by #3708
Labels
area/network bug Something isn't working expert


@apostasie
Contributor

Description

This is quite confounding.

sudo nerdctl run --rm -ti debian sh -c -- "apt-get update"

will fail with:

Err:1 http://deb.debian.org/debian bookworm InRelease
  Temporary failure resolving 'deb.debian.org'
Err:2 http://deb.debian.org/debian bookworm-updates InRelease
  Temporary failure resolving 'deb.debian.org'
Err:3 http://deb.debian.org/debian-security bookworm-security InRelease
  Temporary failure resolving 'deb.debian.org'

(Same with Ubuntu.)

BUT

sudo nerdctl run --rm -ti alpine sh -c -- "apk update; apk add curl"

Works just fine.

Furthermore:

  • using --net host works just fine, so this definitely has to do with the bridge network
  • using a different nameserver (sudo nerdctl run --dns 1.1.1.1 --rm -ti debian bash) does NOT fix the problem
  • using Docker with the same images on the same host works just fine

Since this works with Alpine (which uses musl rather than glibc), my intuition is to blame glibc.

@AkihiroSuda does this problem sound familiar in any way?

Any pointers on how to debug this?

Steps to reproduce the issue

No response

Describe the results you received and expected

na

What version of nerdctl are you using?

Host is:

apo@amaterasu:~/post $ uname -a
Linux amaterasu 6.6.51+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.51-1+rpt3 (2024-10-08) aarch64 GNU/Linux
apo@amaterasu:~/post $ cat /etc/issue
Debian GNU/Linux 12 \n \l
apo@amaterasu:~/post $ sudo nerdctl version
WARN[0000] unable to determine buildctl version: exec: "buildctl": executable file not found in $PATH
Client:
 Version:	v2.0.0
 OS/Arch:	linux/arm64
 Git commit:	ef588dafa080e3dbc9c061ff3802affb66aef291
 buildctl:
  Version:

Server:
 containerd:
  Version:	1.7.23
  GitCommit:	57f17b0a6295a39009d861b89e3b3b87b005ca27
 runc:
  Version:	1.1.14
  GitCommit:	v1.1.14-0-g2c9f560
apo@amaterasu:~/post $ sudo nerdctl info
Client:
 Namespace:	default
 Debug Mode:	false

Server:
 Server Version: 1.7.23
 Storage Driver: overlayfs
 Logging Driver: json-file
  Cgroup Driver:  : systemd
  Cgroup Version: : 2
 Plugins:
  Log:     fluentd journald json-file none syslog
  Storage: native overlayfs
 Security Options:
  seccomp
   Profile:	builtin
  cgroupns
 Kernel Version:   6.6.51+rpt-rpi-v8
 Operating System: Debian GNU/Linux 12 (bookworm)
 OSType:           linux
 Architecture:     aarch64
 CPUs:             4
 Total Memory:     3.703GiB
 Name:             amaterasu
 ID:               928b00e8-0257-43b5-be2c-5016e071c1f0

WARNING: No memory limit support
WARNING: No swap limit support
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
apo@amaterasu:~/post $ /opt/cni/bin/bridge --version
CNI bridge plugin v1.5.1
CNI protocol versions supported: 0.1.0, 0.2.0, 0.3.0, 0.3.1, 0.4.0, 1.0.0

Are you using a variant of nerdctl? (e.g., Rancher Desktop)

None

Host information

No response

@apostasie apostasie added the kind/unconfirmed-bug-claim Unconfirmed bug claim label Nov 30, 2024
@apostasie
Contributor Author

Tried a bunch of CNI plugin versions (from 1.4 to 1.6); they all exhibit the same issue.

@apostasie
Contributor Author

apostasie commented Nov 30, 2024

Same results with containerd 1.7 and containerd 2.0.

Same with nerdctl 1.7 and nerdctl 2.0.

It is not a regression.

Also tested on different hosts - same problem.

Either I hosed something networking-related on these boxes, or there has always been an issue with the bridge network on ARM (which seems unlikely)?

@AkihiroSuda
Member

Is this reproducible with the ARM instance of GHA?

@apostasie
Copy link
Contributor Author

> Is this reproducible with the ARM instance of GHA?

I have not tried yet.
Note that things work fine for me inside Lima on an M1 Mac, so there are definitely cases where arm64 is fine.

Will try on GHA though.

@apostasie
Contributor Author

Maybe it is related to subnetting for these arm64 boxes.

It is still mind-blowing that it works for Alpine and not Debian, but it would not be the first time glibc did something really weird with DNS resolution.

@apostasie
Contributor Author

It works fine on CI.

This has to do with something specific to networking on these boxes.

@apostasie
Contributor Author

This is nuts.

Networking actually works just fine inside the debian container (dig and curl are happy), just NOT for apt-get, which fails to resolve regardless of which DNS server is used in /etc/resolv.conf.

And again, things work just fine with Docker for the same images on the same machine, which is even more baffling.

Possible culprits would be IPv6 (nerdctl seems to enable IPv6 on the interface by default, while Docker does not), or something in CNI handling UDP packets (in the specific hardware context of these boxes) in a way that apt (or glibc) does not like.

I give up on this. I have no time to deep-dive into debugging apt/glibc (or CNI, for that matter).

If anyone else hits this and has ideas on how to debug this further, tag me.

@apostasie
Contributor Author

Looks like I just can't let this go (though cannot continue further tonight).

Here is where things are:

Standard resolution works just fine: I tested Go programs using net.LookupIP, built with both the netcgo and netgo resolvers.
Clearly the issue arises when apt does something fancier than that.

The message in apt comes from: https://salsa.debian.org/apt-team/apt/-/blob/main/methods/connect.cc?ref_type=heads#L407-430

But then at that point, it is already "too late": with docker, the domain getting resolved is debian.map.fastlydns.net - while with containerd / nerdctl, it is still deb.debian.org.

That is likely because the earlier SRV lookup failed in Connect:
https://salsa.debian.org/apt-team/apt/-/blob/main/methods/connect.cc?ref_type=heads#L478

Which points to https://salsa.debian.org/apt-team/apt/-/blob/main/apt-pkg/contrib/srvrec.cc?ref_type=heads#L36 as the problem area.

So my intuition is that there is a problem with SRV lookups (getservbyport_r and friends).
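For context, getservbyport_r maps a port to a service name using /etc/services, and that name goes into the SRV query name. A rough Go sketch of those two steps, assuming the standard `name port/proto aliases # comment` line format; `serviceByPort` and `srvQueryName` are hypothetical helpers, and the real glibc lookup also consults NSS:

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// serviceByPort is a hypothetical, simplified stand-in for getservbyport_r:
// it scans /etc/services-formatted text for a "name port/proto" entry.
func serviceByPort(services string, port int, proto string) (string, bool) {
	want := strconv.Itoa(port) + "/" + proto
	sc := bufio.NewScanner(strings.NewReader(services))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i] // strip trailing comments
		}
		fields := strings.Fields(line)
		if len(fields) >= 2 && fields[1] == want {
			return fields[0], true
		}
	}
	return "", false
}

// srvQueryName builds the owner name an SRV query is sent for.
func srvQueryName(service, proto, host string) string {
	return fmt.Sprintf("_%s._%s.%s", service, proto, host)
}

func main() {
	const sample = "http\t\t80/tcp\t\twww\t# WorldWideWeb HTTP\n"
	if svc, ok := serviceByPort(sample, 80, "tcp"); ok {
		fmt.Println(srvQueryName(svc, "tcp", "deb.debian.org"))
	}
}
```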

Either:

  • this is some glibc version/compilation mismatch (between host / CNI / containerd / container)
    OR
  • CNI is manipulating UDP/DNS responses in a way that glibc does not like

@apostasie
Contributor Author

apostasie commented Dec 2, 2024

OMFG

ls -lA /etc/resolv.conf
-rw------- 1 root root 56 Dec  2 03:55 /etc/resolv.conf

^^^^^^

🤦‍♂️🤦‍♂️🤦‍♂️🤦‍♂️🤦‍♂️🤦‍♂️🤦‍♂️🤦‍♂️

We obviously have a permission problem mounting /etc/resolv.conf with aggressive umasks on the host...
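The arithmetic behind the `-rw-------` above: the mode passed to open(2) has every bit that is set in the umask cleared. A small Go sketch, assuming the file is requested with a conventional 0644 (the exact mode nerdctl passes is not stated in this thread); `effectiveMode` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"os"
)

// effectiveMode mirrors how the kernel combines the mode requested by
// open(2) with the process umask: bits set in the umask are cleared.
func effectiveMode(requested, umask os.FileMode) os.FileMode {
	return requested &^ umask
}

func main() {
	// 0644 is assumed as the requested mode; 0022 is a common default
	// umask, 0077 is the aggressive umask from this issue.
	fmt.Println(effectiveMode(0o644, 0o022)) // -rw-r--r--
	fmt.Println(effectiveMode(0o644, 0o077)) // -rw-------
}
```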

@apostasie apostasie changed the title Network issue with bridge, ONLY with (glibc?) debian images /etc/resolv.conf is not mounted with the correct permissions when the host has a umask 0077 Dec 2, 2024
@apostasie
Contributor Author

It likely affects ONLY apt-get because apt drops root privileges: its download methods run as the unprivileged _apt user, which cannot read a root-owned 0600 file.
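Why root (dig, curl) succeeds while the unprivileged apt process fails can be illustrated with the classic Unix owner/group/other check. This Go sketch is a simplified model: `canRead` is hypothetical, uid/gid 1000 is an arbitrary stand-in for a sandbox user, and supplementary groups, ACLs and capabilities are ignored:

```go
package main

import (
	"fmt"
	"os"
)

// canRead models the classic permission check: owner bits apply if the
// caller owns the file, else group bits if the caller's group matches,
// else the "other" bits. Supplementary groups, ACLs and capabilities such
// as CAP_DAC_OVERRIDE are deliberately ignored.
func canRead(mode os.FileMode, fileUID, fileGID, uid, gid int) bool {
	perm := mode.Perm()
	switch {
	case uid == fileUID:
		return perm&0o400 != 0
	case gid == fileGID:
		return perm&0o040 != 0
	default:
		return perm&0o004 != 0
	}
}

func main() {
	const root = 0
	const unprivileged = 1000  // hypothetical uid/gid for a sandbox user
	mode := os.FileMode(0o600) // -rw------- root root, as seen in the container
	fmt.Println(canRead(mode, root, root, root, root))                 // true
	fmt.Println(canRead(mode, root, root, unprivileged, unprivileged)) // false
}
```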

@apostasie
Contributor Author

This also affects /etc/hosts, and possibly other files.

@AkihiroSuda AkihiroSuda added bug Something isn't working and removed kind/unconfirmed-bug-claim Unconfirmed bug claim labels Dec 2, 2024
apostasie added a commit to apostasie/nerdctl that referenced this issue Dec 2, 2024
WriteFile sets permissions before umask is applied.
For people using aggressive umasks (0077), /etc/resolv.conf will end up unreadable for non-root processes.

See containerd#3704

Signed-off-by: apostasie <[email protected]>
apostasie added a commit to apostasie/nerdctl that referenced this issue Dec 2, 2024
WriteFile uses syscall.Open, so permissions are modified by umask, if set.
For people using aggressive umasks (0077), /etc/resolv.conf will end up unreadable for non-root processes.

See containerd#3704

Signed-off-by: apostasie <[email protected]>
2 participants