Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
I have an application that uses the close_range syscall and runs inside a container. When I run the container and the application makes that syscall, it fails with "Operation not permitted" (EPERM).
At first I thought this was a problem with the application, but after some investigation I suspect it is a podman issue related to how podman handles seccomp profiles.
Steps to reproduce the issue:
walk.c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/close_range.h>
#include <linux/limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <string.h>
#include <unistd.h>
#include <dirent.h>

/* Show the contents of the symbolic links in /proc/self/fd */
static void
show_fds(void)
{
    DIR *dirp = opendir("/proc/self/fd");
    if (dirp == NULL) {
        perror("opendir");
        exit(EXIT_FAILURE);
    }

    for (;;) {
        struct dirent *dp = readdir(dirp);
        if (dp == NULL)
            break;

        if (dp->d_type == DT_LNK) {
            char path[PATH_MAX], target[PATH_MAX];
            snprintf(path, sizeof(path), "/proc/self/fd/%s",
                     dp->d_name);

            /* readlink() does not NUL-terminate, so print with a length */
            ssize_t len = readlink(path, target, sizeof(target));
            if (len == -1) {
                perror("readlink");
                continue;
            }
            printf("%s ==> %.*s\n", path, (int) len, target);
        }
    }

    closedir(dirp);
}

int
main(int argc, char *argv[])
{
    /* Open every file named on the command line so there is something to close */
    for (int j = 1; j < argc; j++) {
        int fd = open(argv[j], O_RDONLY);
        if (fd == -1) {
            perror(argv[j]);
            exit(EXIT_FAILURE);
        }
        printf("%s opened as FD %d\n", argv[j], fd);
    }

    show_fds();

    printf("========= About to call close_range() =======\n");

    /* Close every descriptor from 3 upwards */
    if (syscall(__NR_close_range, 3, ~0U, 0) == -1) {
        perror("close_range");
        exit(EXIT_FAILURE);
    }

    show_fds();
    exit(EXIT_SUCCESS);
}
- Save the program above as /tmp/walk.c on your host machine.
- Build an image using buildah:
buildah bud --no-cache --platform linux/amd64 -f - /tmp <<'EOF'
FROM alpine:edge
RUN apk update && apk add --upgrade build-base libc-dev linux-headers
COPY walk.c /app/walk.c
RUN gcc -o /app/walk /app/walk.c
ENTRYPOINT ["/app/walk"]
EOF
- Run the resulting image with podman (replace 7bd46f9814bb with the ID of the built image):
podman run --rm -it 7bd46f9814bb /app/walk.c
Describe the results you received:
The result will look something like:
/app/walk.c opened as FD 3
/proc/self/fd/0 ==> /dev/pts/0
/proc/self/fd/1 ==> /dev/pts/0
/proc/self/fd/2 ==> /dev/pts/0
/proc/self/fd/3 ==> /app/walk.c
/proc/self/fd/4 ==> /proc/1/fd
========= About to call close_range() =======
close_range: Operation not permitted
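To make the failure easier to interpret, the errno value can be mapped to a likely cause. The following is a minimal sketch, not part of the original reproducer: the helper names try_close_range and classify_close_range_errno are hypothetical, the fallback syscall number 436 is an assumption that holds for architectures using the unified syscall table (such as x86-64), and the explanations are hints rather than a definitive diagnosis.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: 436 is the close_range number on architectures that use the
 * unified syscall table (e.g. x86-64). Only needed with older kernel headers. */
#ifndef __NR_close_range
#define __NR_close_range 436
#endif

/* Hypothetical helper: call close_range(first, last, 0).
 * Returns 0 on success, or the errno value on failure. */
int
try_close_range(unsigned int first, unsigned int last)
{
    if (syscall(__NR_close_range, first, last, 0) == -1)
        return errno;
    return 0;
}

/* Hypothetical helper: map an errno value to a likely explanation. */
const char *
classify_close_range_errno(int err)
{
    switch (err) {
    case 0:
        return "success";
    case EINVAL:
        return "invalid flags, or first is greater than last";
    case ENOSYS:
        return "close_range is not available (kernel older than 5.9?)";
    case EPERM:
        /* EPERM is not among the errors documented for close_range(2) */
        return "not a documented close_range error; possibly denied by a seccomp filter";
    default:
        return "other error; see strerror()";
    }
}
```

In walk.c, the perror("close_range") call could be followed by printing classify_close_range_errno(errno) to get the hint alongside the system error message.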
Describe the results you expected:
Now repeat the same process on your host Linux machine (assuming you are running at least kernel version 5.9).
The program should run successfully with an output similar to:
/tmp/walk.c opened as FD 3
/proc/self/fd/0 ==> /dev/pts/1
/proc/self/fd/1 ==> /dev/pts/1
/proc/self/fd/2 ==> /dev/pts/1
/proc/self/fd/3 ==> /tmp/walk.c
/proc/self/fd/4 ==> /proc/547032/fd
========= About to call close_range() =======
/proc/self/fd/0 ==> /dev/pts/1
/proc/self/fd/1 ==> /dev/pts/1
/proc/self/fd/2 ==> /dev/pts/1
/proc/self/fd/3 ==> /proc/547032/fd
This is what I expected to see inside the container as well.
Additional information you deem important (e.g. issue happens only occasionally):
If you run the image with the option --security-opt seccomp=unconfined, everything works fine.
Does that mean podman is simply blocking the close_range syscall? Where does podman's default seccomp.json file live? I was under the impression that podman uses the default profile from docker, which whitelists the close_range syscall.
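One way to check whether a seccomp filter is active for the process is to read the Seccomp field in /proc/self/status, where 0 means disabled, 1 means strict mode, and 2 means filter mode. The following is a minimal sketch under that assumption; the function names parse_seccomp_mode, read_seccomp_mode, and seccomp_mode_name are hypothetical and not taken from podman or the reproducer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: find the "Seccomp:" field in /proc/<pid>/status text.
 * Returns the mode (0, 1 or 2), or -1 if the field is not present. */
int
parse_seccomp_mode(const char *status_text)
{
    const char *p = status_text;

    while (p != NULL && *p != '\0') {
        /* Match "Seccomp:" exactly so "Seccomp_filters:" is skipped */
        if (strncmp(p, "Seccomp:", 8) == 0)
            return atoi(p + 8);

        /* Advance to the start of the next line, if any */
        p = strchr(p, '\n');
        if (p != NULL)
            p++;
    }
    return -1;
}

/* Hypothetical helper: read the seccomp mode of the calling process.
 * Returns -1 if /proc/self/status cannot be read or lacks the field. */
int
read_seccomp_mode(void)
{
    FILE *fp = fopen("/proc/self/status", "r");
    if (fp == NULL)
        return -1;

    char line[256];
    int mode = -1;
    while (fgets(line, sizeof(line), fp) != NULL) {
        mode = parse_seccomp_mode(line);
        if (mode != -1)
            break;
    }

    fclose(fp);
    return mode;
}

/* Hypothetical helper: human-readable name for a seccomp mode value. */
const char *
seccomp_mode_name(int mode)
{
    switch (mode) {
    case 0:  return "disabled";
    case 1:  return "strict";
    case 2:  return "filter";
    default: return "unknown";
    }
}
```

If these helpers were called from walk.c, the expectation would be "filter" under the default profile and "disabled" with --security-opt seccomp=unconfined, which would support the theory that the profile, not the kernel, rejects the call.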
Output of podman version:
Version: 3.1.2
API Version: 3.1.2
Go Version: go1.16.3
Git Commit: 51b8ddbc22cf5b10dd76dd9243924aa66ad7db39
Built: Wed Apr 21 15:34:03 2021
OS/Arch: linux/amd64
Output of podman info --debug:
host:
  arch: amd64
  buildahVersion: 1.20.1
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: /usr/bin/conmon is owned by conmon 1:2.0.27-1
    path: /usr/bin/conmon
    version: 'conmon version 2.0.27, commit: 65fad4bfcb250df0435ea668017e643e7f462155'
  cpus: 12
  distribution:
    distribution: arcolinux
    version: unknown
  eventLogger: journald
  hostname: ArcoB
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 10000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 10000
      size: 65536
  kernel: 5.11.16-arch1-1
  linkmode: dynamic
  memFree: 18868236288
  memTotal: 41711120384
  ociRuntime:
    name: crun
    package: /usr/bin/crun is owned by crun 0.19.1-1
    path: /usr/bin/crun
    version: |-
      crun version 0.19.1
      commit: 1535fedf0b83fb898d449f9680000f729ba719f5
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    selinuxEnabled: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: /usr/bin/slirp4netns is owned by slirp4netns 1.1.9-1
    version: |-
      slirp4netns version 1.1.9
      commit: 4e37ea557562e0d7a64dc636eff156f64927335e
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.1
  swapFree: 32211202048
  swapTotal: 32211202048
  uptime: 6h 42m 15.48s (Approximately 0.25 days)
registries:
  search:
  - docker.io
  - ghcr.io
store:
  configFile: /home/chigozirim/.config/containers/storage.conf
  containerStore:
    number: 2
    paused: 0
    running: 1
    stopped: 1
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: /usr/bin/fuse-overlayfs is owned by fuse-overlayfs 1.5.0-1
      Version: |-
        fusermount3 version: 3.10.3
        fuse-overlayfs: version 1.5
        FUSE library version 3.10.3
        using FUSE kernel interface version 7.31
  graphRoot: /home/chigozirim/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 5
  runRoot: /run/user/1000/containers
  volumePath: /home/chigozirim/.local/share/containers/storage/volumes
version:
  APIVersion: 3.1.2
  Built: 1619040843
  BuiltTime: Wed Apr 21 15:34:03 2021
  GitCommit: 51b8ddbc22cf5b10dd76dd9243924aa66ad7db39
  GoVersion: go1.16.3
  OsArch: linux/amd64
  Version: 3.1.2
Package info (e.g. output of rpm -q podman or apt list podman):
Name : podman
Version : 3.1.2-1
Description : Tool and library for running OCI-based containers in
pods
URL : https://github.com/containers/libpod
Licenses : Apache
Repository : community
Installed Size : 76.0 MB
Depends On : cni-plugins conmon containers-common device-mapper
iptables libseccomp runc slirp4netns libsystemd
fuse-overlayfs libgpgme.so=11-64
Optional Dependencies : podman-docker: for Docker-compatible CLI [Installed]
btrfs-progs: support btrfs backend devices [Installed]
catatonit: --init flag support [Installed]
crun: support for unified cgroupsv2 [Installed]
Make Dependencies : btrfs-progs go go-md2man git gpgme systemd
Packager : Morten Linderud <[email protected]>
Build Date : 2021-04-21
Install Date : 2021-04-21
Install Reason : Explicitly installed
Signatures : Yes
Backup files : /etc/cni/net.d/87-podman-bridge.conflist
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?
Yes
Additional environment details (AWS, VirtualBox, physical, etc.):