Skip to content

Sparodic Hang from nscd. #402693

@kevincox

Description

@kevincox

I am still diagnosing this but I have noticed it rarely when opening a new terminal. The terminal hangs for a while before the shell loads. This seems to happen most often when I don't have a working internet connection but due to the sporadic nature I haven't been able to confirm this.

I have managed to track down the source to nscd. This is a snippet from strace -r zsh -i -c exit:

     0.000089 getppid()                 = 7445
     0.000048 getpid()                  = 7447
     0.000047 getuid()                  = 1000
     0.000064 socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 11
     0.000082 connect(11, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = 0
     0.000230 sendto(11, "\2\0\0\0\v\0\0\0\7\0\0\0passwd\0", 19, MSG_NOSIGNAL, NULL, 0) = 19
     0.000246 poll([{fd=11, events=POLLIN|POLLERR|POLLHUP}], 1, 5000) = 0 (Timeout)
     5.004649 close(11)                 = 0
     0.000249 socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 11
     0.000113 connect(11, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = 0
     0.000149 sendto(11, "\2\0\0\0\1\0\0\0\5\0\0\0001000\0", 17, MSG_NOSIGNAL, NULL, 0) = 17
     0.000121 poll([{fd=11, events=POLLIN|POLLERR|POLLHUP}], 1, 5000) = 1 ([{fd=11, revents=POLLIN|POLLERR|POLLHUP}])
     1.373307 read(11, 0x7ffd8b2dc4d0, 36) = -1 ECONNRESET (Connection reset by peer)
     0.000456 close(11)                 = 0
     0.000302 newfstatat(AT_FDCWD, "/etc/nsswitch.conf", {st_mode=S_IFREG|0444, st_size=256, ...}, 0) = 0
     0.000323 newfstatat(AT_FDCWD, "/", {st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0
     0.000068 openat(AT_FDCWD, "/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = 11
     0.000143 fstat(11, {st_mode=S_IFREG|0444, st_size=256, ...}) = 0
     0.000124 read(11, "passwd:    files systemd\ngroup: "..., 4096) = 256
     0.000131 read(11, "", 4096)        = 0
     0.000074 fstat(11, {st_mode=S_IFREG|0444, st_size=256, ...}) = 0
     0.000058 close(11)                 = 0
     0.000106 openat(AT_FDCWD, "/etc/passwd", O_RDONLY|O_CLOEXEC) = 11
     0.000078 fstat(11, {st_mode=S_IFREG|0644, st_size=11018, ...}) = 0
     0.000051 lseek(11, 0, SEEK_SET)    = 0
     0.000055 read(11, "root:x:0:0:System administrator:"..., 4096) = 4096
     0.000130 close(11)                 = 0

You can see that a request 11 (\v) which seems to correspond to GETFDPW takes 5s before the client gives up waiting.

I appear to be using nsncd which NixOS defaults to. I notice that this command intentionally does not respond:

https://github.com/twosigma/nsncd/blob/81615a8473d8bf3e422a03407e9fcf42712094ed/src/handlers.rs#L720-L727

It claims "Luckily clients fall back to explicit queries if no FDs are sent over." which is technically true, but it seems that this is after some timeout. I haven't read the code very carefully though, so I don't understand the exact behaviour that nsncd is performing.

So quite possibly this is an upstream issue, but the fact that this affects the default NixOS config and may be a configuration issue I have decided to open it here first.

glibc (or whatever is in zsh) then follows up with a GETPWBYUID request. But even this takes 1.4s before ECONNRESET for an unclear reason.


For reference here is a "normal" zsh startup.

     0.000069 getppid()                 = 8164
     0.000037 getpid()                  = 8166
     0.000037 getuid()                  = 1000
     0.000052 socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 11
     0.000061 connect(11, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = 0
     0.000076 sendto(11, "\2\0\0\0\v\0\0\0\7\0\0\0passwd\0", 19, MSG_NOSIGNAL, NULL, 0) = 19
     0.000067 poll([{fd=11, events=POLLIN|POLLERR|POLLHUP}], 1, 5000) = 1 ([{fd=11, revents=POLLIN|POLLHUP}])
     0.000087 recvmsg(11, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="", iov_len=7}, {iov_base="", iov_len=8}], msg_iovlen=2, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 0
     0.000064 close(11)                 = 0
     0.000045 socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 11
     0.000042 connect(11, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = 0
     0.000049 sendto(11, "\2\0\0\0\1\0\0\0\5\0\0\0001000\0", 17, MSG_NOSIGNAL, NULL, 0) = 17
     0.000050 poll([{fd=11, events=POLLIN|POLLERR|POLLHUP}], 1, 5000) = 1 ([{fd=11, revents=POLLIN|POLLHUP}])
     0.000519 read(11, "\2\0\0\0\1\0\0\0\t\0\0\0\2\0\0\0\350\3\0\0d\0\0\0\n\0\0\0\17\0\0\0"..., 36) = 36
     0.000262 read(11, "kevincox\0x\0Kevin Cox\0/home/kevin"..., 67) = 67
     0.000256 close(11)                 = 0

Interestingly there is a response there, so IDK how this "no response" in the handler works.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions