Permission denied when container process executes close_range syscall #10337

Closed
@smac89

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description
I have an application that uses the close_range syscall and runs inside a container. When I run the container and the application makes that syscall, the call fails with "Operation not permitted" (EPERM).

At first I thought this was a problem with the application, but after some investigation I suspect it is a podman issue related to how it handles seccomp profiles.
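One way to narrow this down is to probe the syscall with arguments that cannot close anything and look only at the resulting errno. The sketch below is not part of walk.c; `probe_close_range` is a hypothetical helper name. It relies on the kernel rejecting a range whose first descriptor is greater than its last with EINVAL, so EINVAL suggests the call reached the kernel, EPERM suggests something (such as a seccomp filter) denied it first, and ENOSYS suggests the kernel lacks the syscall or a filter is configured to report it that way.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of walk.c): returns the errno produced by a
 * close_range() call that cannot close any descriptor.
 */
static int
probe_close_range(void)
{
#ifdef __NR_close_range
    /* first (1) > last (0) is an invalid range, so nothing is closed. */
    if (syscall(__NR_close_range, 1U, 0U, 0U) == 0)
        return 0;       /* not expected for an invalid range */
    return errno;       /* EINVAL, EPERM or ENOSYS are the likely values */
#else
    return ENOSYS;      /* kernel headers predate close_range */
#endif
}
```

Calling `probe_close_range()` once on the host and once inside the container, and printing `strerror()` of the result, should show whether the two environments differ.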

Steps to reproduce the issue:

walk.c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/close_range.h>
#include <linux/limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <string.h>
#include <unistd.h>
#include <dirent.h>

/* Show the contents of the symbolic links in /proc/self/fd */

static void
show_fds(void)
{
   DIR *dirp = opendir("/proc/self/fd");
   if (dirp  == NULL) {
       perror("opendir");
       exit(EXIT_FAILURE);
   }

   for (;;) {
       struct dirent *dp = readdir(dirp);
       if (dp == NULL)
           break;

       if (dp->d_type == DT_LNK) {
           char path[PATH_MAX], target[PATH_MAX];
           snprintf(path, sizeof(path), "/proc/self/fd/%s",
                    dp->d_name);

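           ssize_t len = readlink(path, target, sizeof(target));
           if (len == -1) {
               /* Without this check a negative length reaches printf */
               perror(path);
               continue;
           }
           printf("%s ==> %.*s\n", path, (int) len, target);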
       }
   }

   closedir(dirp);
}

int
main(int argc, char *argv[])
{
   for (int j = 1; j < argc; j++) {
       int fd = open(argv[j], O_RDONLY);
       if (fd == -1) {
           perror(argv[j]);
           exit(EXIT_FAILURE);
       }
       printf("%s opened as FD %d\n", argv[j], fd);
   }

   show_fds();

   printf("========= About to call close_range() =======\n");

   if (syscall(__NR_close_range, 3, ~0U, 0) == -1) {
       perror("close_range");
       exit(EXIT_FAILURE);
   }

   show_fds();
   exit(EXIT_SUCCESS);
}
  1. Save the above source as /tmp/walk.c on your host machine

  2. Build the image using buildah:

buildah bud --no-cache --platform linux/amd64 -f - /tmp <<'EOF'
FROM alpine:edge
RUN apk update && apk add --upgrade build-base libc-dev linux-headers
COPY walk.c /app/walk.c
RUN gcc -o /app/walk /app/walk.c
ENTRYPOINT ["/app/walk"]
EOF
  3. Run the resulting image with podman (replace 7bd46f9814bb with the ID of the built image):
podman run --rm -it 7bd46f9814bb /app/walk.c

Describe the results you received:

The result will look something like:

/app/walk.c opened as FD 3
/proc/self/fd/0 ==> /dev/pts/0
/proc/self/fd/1 ==> /dev/pts/0
/proc/self/fd/2 ==> /dev/pts/0
/proc/self/fd/3 ==> /app/walk.c
/proc/self/fd/4 ==> /proc/1/fd
========= About to call close_range() =======
close_range: Operation not permitted

Describe the results you expected:

Now repeat the same process on your host Linux machine (assuming you are running at least kernel version 5.9).

The program should run successfully with an output similar to:

/tmp/walk.c opened as FD 3
/proc/self/fd/0 ==> /dev/pts/1
/proc/self/fd/1 ==> /dev/pts/1
/proc/self/fd/2 ==> /dev/pts/1
/proc/self/fd/3 ==> /tmp/walk.c
/proc/self/fd/4 ==> /proc/547032/fd
========= About to call close_range() =======
/proc/self/fd/0 ==> /dev/pts/1
/proc/self/fd/1 ==> /dev/pts/1
/proc/self/fd/2 ==> /dev/pts/1
/proc/self/fd/3 ==> /proc/547032/fd

This is what I expected to see inside the container.

Additional information you deem important (e.g. issue happens only occasionally):

If you run the image with the option --security-opt seccomp=unconfined, everything works fine.
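To confirm that the two runs really differ in seccomp confinement, the process can read its own seccomp mode from /proc/self/status. The following is a minimal sketch, assuming a mounted /proc; `seccomp_mode` is a hypothetical helper name. The `Seccomp:` field is documented as 0 (disabled), 1 (strict) or 2 (filter).

```c
#include <stdio.h>

/*
 * Hypothetical helper: returns the value of the "Seccomp:" field from
 * /proc/self/status (0 = disabled, 1 = strict, 2 = filter), or -1 if the
 * file or the field cannot be read.
 */
static int
seccomp_mode(void)
{
    FILE *fp = fopen("/proc/self/status", "r");
    if (fp == NULL)
        return -1;

    char line[256];
    int mode = -1;
    while (fgets(line, sizeof(line), fp) != NULL) {
        /* Matches "Seccomp:\t<n>" but not "Seccomp_filters:". */
        if (sscanf(line, "Seccomp: %d", &mode) == 1)
            break;
    }

    fclose(fp);
    return mode;
}
```

If the assumption holds, a container started with the default profile would be expected to report filter mode, and one started with --security-opt seccomp=unconfined would be expected to report 0.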

Does that mean podman is simply blocking the close_range syscall? Where does podman's default seccomp.json file live? I was under the impression that podman uses Docker's default profile, which allows the close_range syscall.

Output of podman version:

Version:      3.1.2
API Version:  3.1.2
Go Version:   go1.16.3
Git Commit:   51b8ddbc22cf5b10dd76dd9243924aa66ad7db39
Built:        Wed Apr 21 15:34:03 2021
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.20.1
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: /usr/bin/conmon is owned by conmon 1:2.0.27-1
    path: /usr/bin/conmon
    version: 'conmon version 2.0.27, commit: 65fad4bfcb250df0435ea668017e643e7f462155'
  cpus: 12
  distribution:
    distribution: arcolinux
    version: unknown
  eventLogger: journald
  hostname: ArcoB
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 10000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 10000
      size: 65536
  kernel: 5.11.16-arch1-1
  linkmode: dynamic
  memFree: 18868236288
  memTotal: 41711120384
  ociRuntime:
    name: crun
    package: /usr/bin/crun is owned by crun 0.19.1-1
    path: /usr/bin/crun
    version: |-
      crun version 0.19.1
      commit: 1535fedf0b83fb898d449f9680000f729ba719f5
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    selinuxEnabled: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: /usr/bin/slirp4netns is owned by slirp4netns 1.1.9-1
    version: |-
      slirp4netns version 1.1.9
      commit: 4e37ea557562e0d7a64dc636eff156f64927335e
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.1
  swapFree: 32211202048
  swapTotal: 32211202048
  uptime: 6h 42m 15.48s (Approximately 0.25 days)
registries:
  search:
  - docker.io
  - ghcr.io
store:
  configFile: /home/chigozirim/.config/containers/storage.conf
  containerStore:
    number: 2
    paused: 0
    running: 1
    stopped: 1
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: /usr/bin/fuse-overlayfs is owned by fuse-overlayfs 1.5.0-1
      Version: |-
        fusermount3 version: 3.10.3
        fuse-overlayfs: version 1.5
        FUSE library version 3.10.3
        using FUSE kernel interface version 7.31
  graphRoot: /home/chigozirim/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 5
  runRoot: /run/user/1000/containers
  volumePath: /home/chigozirim/.local/share/containers/storage/volumes
version:
  APIVersion: 3.1.2
  Built: 1619040843
  BuiltTime: Wed Apr 21 15:34:03 2021
  GitCommit: 51b8ddbc22cf5b10dd76dd9243924aa66ad7db39
  GoVersion: go1.16.3
  OsArch: linux/amd64
  Version: 3.1.2

Package info (e.g. output of rpm -q podman or apt list podman):

Name                  : podman
Version               : 3.1.2-1
Description           : Tool and library for running OCI-based containers in
                        pods
URL                   : https://github.com/containers/libpod
Licenses              : Apache
Repository            : community
Installed Size        : 76.0 MB
Depends On            : cni-plugins conmon containers-common device-mapper
                        iptables libseccomp runc slirp4netns libsystemd
                        fuse-overlayfs libgpgme.so=11-64
Optional Dependencies : podman-docker: for Docker-compatible CLI [Installed]
                        btrfs-progs: support btrfs backend devices [Installed]
                        catatonit: --init flag support [Installed]
                        crun: support for unified cgroupsv2 [Installed]
Make Dependencies     : btrfs-progs go go-md2man git gpgme systemd
Packager              : Morten Linderud <[email protected]>
Build Date            : 2021-04-21
Install Date          : 2021-04-21
Install Reason        : Explicitly installed
Signatures            : Yes
Backup files          : /etc/cni/net.d/87-podman-bridge.conflist

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

Labels: kind/bug, locked - please file new issue/PR, stale-issue