core: Run sysusers after doing passwd/group layering dance #5403
cgwalters merged 4 commits into coreos:main
For now, we'll just treat sysusers entries from RPM packages like we do scriptlets that `useradd`/`groupadd`; that is, we want them to happen at compose time and go into altfiles in case those same sysusers own content in the commit. All we need to do to make that happen is to run `systemd-sysusers` _after_ we do the `/etc/passwd` <--> `/usr/lib/passwd` switcheroo, so that the new entries go into what will become `/usr/lib/passwd`. And all we need to do for that is to move the sysusers execution down a bit.
Fixes: coreos#5365
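A minimal shell sketch of why the ordering matters, assuming a simplified rootfs where the "switcheroo" is a plain rename between `etc/` and `usr/lib/`. rpm-ostree's real implementation is C++ and handles more cases than this; the helper names (`swap_db`, `unswap_db`, `layer_with_sysusers`) and the `SYSUSERS_CMD` hook are hypothetical, and only `systemd-sysusers --root=` is a real interface.

```sh
#!/bin/sh
# Hypothetical sketch of the ordering fix; not rpm-ostree's real code.
# Assumes ROOT contains usr/lib/{passwd,group} (the nss-altfiles copies
# that ship in the commit) and etc/{passwd,group} (the local copies).

# Hook for applying sysusers.d entries; override it to test without systemd.
SYSUSERS_CMD=${SYSUSERS_CMD:-run_sysusers}

run_sysusers() {
    # systemd-sysusers operates on an alternate root via --root=.
    systemd-sysusers --root="$1"
}

# Move the local /etc copies aside and put the /usr/lib copies in their
# place, so anything that writes /etc/{passwd,group} now edits the
# databases destined for /usr/lib.
swap_db() {
    for f in passwd group; do
        mv "$1/etc/$f" "$1/etc/$f.local" || return 1
        mv "$1/usr/lib/$f" "$1/etc/$f" || return 1
    done
}

# Reverse the swap: the (possibly extended) files go back to /usr/lib
# and the saved local copies are restored to /etc.
unswap_db() {
    for f in passwd group; do
        mv "$1/etc/$f" "$1/usr/lib/$f" || return 1
        mv "$1/etc/$f.local" "$1/etc/$f" || return 1
    done
}

layer_with_sysusers() {
    swap_db "$1" || return 1
    # The fix: run sysusers only *after* the swap, so new users and groups
    # land in what becomes /usr/lib/{passwd,group}.
    rc=0
    "$SYSUSERS_CMD" "$1" || rc=$?
    unswap_db "$1" || return 1
    return "$rc"
}
```

Running sysusers before `swap_db` instead would append the new entries to the local `/etc` copies, which is the behavior the PR moves away from.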
I noticed while hacking on coreos#5365 that during the rpmdb writing, librpm was actually re-executing systemd-sysusers *from the host context*, which is not at all what we want. Apparently, `RPMTRANS_FLAG_JUSTDB` doesn't imply this, and we need to also explicitly pass `RPMTRANS_FLAG_NOSYSUSERS`. That flag doesn't exist in el9, so add a compile-time conditional for it. This fixes the issue for new systems, but people who have upgraded to f42 and overlaid packages with sysusers entries will have new entries in their `/etc/passwd` and `/etc/group` files because of this. And this can cause problems if the UIDs chosen were different, because the `/etc` entries take precedence over nss-altfiles even though owned content will match nss-altfiles. In practice, since coreos#5365 breaks exactly those use cases where the sysusers entries own content, I think we don't have to worry about that subcase. But for sysusers entries that _don't_ own content, the transaction would go through, so there could still be UID conflicts there. I guess we'll need to figure out whether to try to fix this somehow or just issue a PSA about it.
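To check whether a system has the conflict described above (a name in `/etc/passwd` whose UID differs from the nss-altfiles copy in `/usr/lib/passwd`, with the `/etc` entry winning), the two files can be compared by name. A minimal sketch; `find_id_conflicts` is a hypothetical helper that assumes the standard colon-separated layout, so it also works on `/etc/group` vs `/usr/lib/group`, since the numeric ID is the third field in both.

```sh
# Print "NAME ETC_ID ALTFILES_ID" for every name that appears in both files
# with a different numeric ID (field 3: UID in passwd, GID in group).
find_id_conflicts() {
    etc_db=${1:-/etc/passwd}
    lib_db=${2:-/usr/lib/passwd}
    awk -F: '
        # First file (the /usr/lib copy): remember each name -> ID.
        FILENAME == ARGV[1] { altfiles[$1] = $3; next }
        # Second file (the /etc copy): report names whose ID differs.
        ($1 in altfiles) && altfiles[$1] != $3 { print $1, $3, altfiles[$1] }
    ' "$lib_db" "$etc_db"
}
```

For example, `find_id_conflicts /etc/passwd /usr/lib/passwd` and `find_id_conflicts /etc/group /usr/lib/group`; empty output means no name is defined with two different IDs.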
Note that the second commit's message here is really important. Hmm, it should probably be its own PR to draw more attention to it. Anyway, this is missing tests. We don't have any tests that exercise sysusers right now, but it wouldn't be hard to add some. I did verify this unbreaks
Here's a test case I wrote earlier
Eeek!
Hmmm... but isn't this at least a partial argument that we should take the path of always mutating the live Or at least, if we choose not to do that, I think we may need to at least scrape the current
Nice. Mind pushing that here?
Yeah, I thought about that. Not sure... I'm not convinced the "alignment with RPM" is sufficiently beneficial if we can only do it for the non-owning subset and we still have to carry machinery around for the owning case and continue to support the legacy
So let's break this down into the owning case and the non-owning case. I think we can rule out the former, because that's exactly #5365, which is currently broken. That leaves the non-owning case (which, to be clear, could still own unmanaged content in
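The owning vs. non-owning split can be tested mechanically for a given sysusers entry: it "owns content" if any file in a tree carries its numeric UID or GID. A rough sketch, assuming you already know the IDs the entry resolved to; `owns_content` is a hypothetical helper, and `-xdev` keeps the search on a single filesystem, so check each tree (for example `/usr` and `/var`) separately.

```sh
# Succeed if any file under ROOT (same filesystem only) is owned by UID or
# GID, i.e. the sysusers entry "owns content" there; fail otherwise.
# Usage: owns_content ROOT UID [GID]   (GID defaults to UID)
owns_content() {
    root=$1
    uid=$2
    gid=${3:-$2}
    # Stop at the first match; errors (e.g. unreadable dirs) are ignored.
    match=$(find "$root" -xdev \( -uid "$uid" -o -gid "$gid" \) -print 2>/dev/null | head -n 1)
    [ -n "$match" ]
}
```

For example, `owns_content /usr 990 && echo owning || echo non-owning`.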
Signed-off-by: Colin Walters <[email protected]>
The container test is failing with
This is summarized in bootc-dev/bootc#1179. I think what we're hitting here is actually quite generic to the case of dropping out sysusers entries from an image build? Though I'm a bit confused as to how this PR is triggering this now.
Just hit this issue upgrading from F41 Kinoite; hoping for a quick resolution!
OK, pushed another commit here which hopefully fixes that.
When doing the passwd/group layering dance, we need to take care of also moving the canonical shadow files from the base layer out of the way the same way we do passwd and group. All those files are coherent with each other so it doesn't make sense to move some and not others. Otherwise, when we move the `/usr/lib` version of passwd and group into `/etc` as part of the dance and we try to run systemd-sysusers, it can be confused by the fact that the shadow files seemingly already have some entries while passwd and group do not. In the end those `/etc` files don't really matter because they're almost guaranteed to be dirtied on local systems. So the canonical copy in the image will never actually be used.
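The incoherent state this commit describes (shadow files holding entries that the corresponding passwd or group file lacks) can be spotted by comparing names across each pair. A minimal sketch; `orphaned_shadow_entries` is a hypothetical helper, and reading the real `/etc/shadow` or `/etc/gshadow` requires root.

```sh
# Print names present in a shadow-style file (shadow or gshadow) that have
# no corresponding entry in the matching database (passwd or group).
# Usage: orphaned_shadow_entries SHADOW_FILE DB_FILE
orphaned_shadow_entries() {
    shadow_file=$1
    db_file=$2
    awk -F: '
        # First file: collect every known user/group name.
        FILENAME == ARGV[1] { known[$1] = 1; next }
        # Second file: report shadow entries with no matching name.
        !($1 in known) { print $1 }
    ' "$db_file" "$shadow_file"
}
```

For example, as root: `orphaned_shadow_entries /etc/shadow /etc/passwd` and `orphaned_shadow_entries /etc/gshadow /etc/group`.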
@jlebon: The following tests failed:
What will the upgrade path look like? People who have this kind of issue are not able to install or update packages at the moment.
In the short term, having this merged allows folks to pull from our automatic builds from git main to test out the fix as it stands now.
The most likely thing here is:
Thanks! Let's make sure that it will be clearly documented :)
@Fale @cgwalters What would be the best place to document it?
@maxgio92 IMO, since this workaround should only be needed temporarily, I would suggest proposing a Common Issue on discussion.fedoraproject.org: https://discussion.fedoraproject.org/t/about-the-proposed-common-issues-category/69491 Alternatively, you could submit docs to the Silverblue docs repo, perhaps under the Troubleshooting section: https://github.com/fedora-silverblue/silverblue-docs
@cgwalters @Fale Nothing at all explains how to upgrade to this rpm-ostree from an already-installed rpm-ostree-based system, nor is there a document anywhere that says how to get back to a working system. It's quite useless to have a fix that many people want without a known document that tells people how to recover to a working state.
@outbackdingo I will leave here the steps I followed to upgrade from a Fedora IoT 42 install with a deployment from early June, suffering from this bug with rpm-ostree and groups added by layered packages:
Until the new rpm-ostree package hits the repositories, the above step should be repeated every time you need to install a new deployment.
rpm-ostree override replace rpm-ostree-2025.8.63.ga07e7661-1.fc43.x86_64.rpm rpm-ostree-libs-2025.8.63.ga07e7661-1.fc43.x86_64.rpm
fedora:fedora/42/x86_64/cosmic-atomic
rpm -qa | grep rpm-ostree
and it still fails.
@Procsiab And note: your method now says `error: Deployment is already in unlocked state: transient` and `Nothing to do.`, so it seems this fix does not in fact resolve the issue of missing users when trying to install openvpn.
@outbackdingo Try to start from a deployment that is not in an "override" state: I could deploy successfully after being stuck with the same issue on the nut package because of the missing-group bug.
Interestingly, after a third reboot, openvpn did in fact install... I still show overlays, however.
In my case the package causing the issue is qemu, and to install this merged code I used the steps below:
sudo rpm-ostree uninstall qemu qemu-kvm
sudo curl -o /etc/yum.repos.d/CoreOS-continuous.repo https://copr.fedorainfracloud.org/coprs/g/CoreOS/continuous/repo/fedora-42/group_CoreOS-continuous-fedora-42.repo
sudo rpm-ostree override replace --experimental --freeze --from repo='copr:copr.fedorainfracloud.org:group_CoreOS:continuous' rpm-ostree rpm-ostree-libs
sudo rpm-ostree install qemu qemu-kvm
My plan is to keep those overrides in place until the default base image contains the fixed version of rpm-ostree, and then run:
sudo rpm-ostree override reset -a
Is this a sound plan? I'm asking because my internet gateway runs as a virtual machine on Silverblue 41, and the 6 steps above will not work (step 1 will cut my internet connection).
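For deciding when it is safe to drop the overrides with `rpm-ostree override reset -a`, a version comparison helps. A sketch, assuming the fix is present from rpm-ostree 2025.9 onward (the version the workaround script later in this thread installs; verify against the release notes) and GNU `sort -V` ordering; `version_at_least` is a hypothetical helper. The version to compare is the one shipped in the new base image, not the overridden package, which `rpm -q` would report while the override is active.

```sh
# Succeed if version HAVE >= version WANT, using GNU sort -V ordering.
# Usage: version_at_least HAVE WANT
version_at_least() {
    have=$1
    want=$2
    # The smaller of the two sorts first; HAVE is new enough if WANT does.
    lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}
```

For example, `version_at_least 2025.10 2025.9 && echo "base image is new enough"`; under this ordering a git snapshot such as `2025.8.63.ga07e7661` still compares as older than `2025.9`.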
I had to perform the steps I described in my previous comment (#5403 (comment)) again to deploy
By using this release, I could successfully complete the deployment and the finalize-staged step.
Here's a workaround script that installs the latest stable version of rpm-ostree and rpm-ostree-libs in a usroverlay so that
#!/bin/sh
# # fix_rpmostree.sh
# Install latest rpm-ostree in order to workaround [sysusers] package layering issues
#
# ## References:
# - https://github.com/coreos/rpm-ostree/pull/5403#issuecomment-3046706179
# - https://github.com/coreos/rpm-ostree/pull/5403#issuecomment-3139129186
# - https://packages.fedoraproject.org/pkgs/rpm-ostree/rpm-ostree-libs/
# - https://packages.fedoraproject.org/pkgs/rpm-ostree/rpm-ostree/
# - https://bodhi.fedoraproject.org/updates/?packages=rpm-ostree
# - https://dl.fedoraproject.org/pub/fedora/linux/updates/42/Everything/x86_64/Packages/r/rpm-ostree-2025.9-1.fc42.x86_64.rpm
# - https://dl.fedoraproject.org/pub/fedora/linux/updates/42/Everything/x86_64/Packages/r/rpm-ostree-libs-2025.9-1.fc42.x86_64.rpm
set -x
VERSION="${VERSION:-"2025.9-1"}"
#VERSION="${VERSION:-"2025.10-1"}"
RELEASE="${RELEASE:-"fc42"}"
ARCH="${ARCH:-"x86_64"}"
PACKAGE_SUFFIX="${VERSION}.${RELEASE}.${ARCH}.rpm"
RPMOSTREELIBS_RPM="rpm-ostree-libs-${PACKAGE_SUFFIX}"
RPMOSTREELIBS_RPM_URL="https://dl.fedoraproject.org/pub/fedora/linux/updates/42/Everything/x86_64/Packages/r/${RPMOSTREELIBS_RPM}"
RPMOSTREE_RPM="rpm-ostree-${PACKAGE_SUFFIX}"
RPMOSTREE_RPM_URL="https://dl.fedoraproject.org/pub/fedora/linux/updates/42/Everything/x86_64/Packages/r/${RPMOSTREE_RPM}"
DNF="dnf"
DNF="/usr/bin/python -m dnf.cli.main"
sudo rpm-ostree usroverlay
sudo ${DNF} install -y "${RPMOSTREELIBS_RPM_URL}" "${RPMOSTREE_RPM_URL}"
sudo rpm-ostree status
sudo rpm-ostree status -b -J '$.deployments[*].version'
pending_base_version=$(sudo rpm-ostree status -b -J '$..pending-base-version' | tail -n+2 | head -n1 | sed 's/ "\(.*\)"/\1/')
sudo rpm-ostree deploy "${pending_base_version}"
sudo ostree admin finalize-staged -v
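The `tail | head | sed` pipeline in the script depends on the exact indentation of rpm-ostree's JSONPath output, and `rpm-ostree deploy` is invoked even if no pending version was found. A slightly more defensive sketch, assuming the output is a JSON array of strings (pretty-printed or on one line); `first_json_string` and `deploy_pending` are hypothetical helpers.

```sh
# Print the first string element of a JSON array of strings read from
# stdin, with quotes and surrounding whitespace removed; print nothing
# for an empty array. Assumes values contain no embedded double quotes
# (true for version strings).
first_json_string() {
    sed -n 's/^[[:space:]]*\[\{0,1\}[[:space:]]*"\([^"]*\)".*$/\1/p' | head -n 1
}

# Refuse to call "rpm-ostree deploy" with an empty version string.
deploy_pending() {
    version=$1
    if [ -z "$version" ]; then
        echo "no pending base version found; not deploying" >&2
        return 1
    fi
    sudo rpm-ostree deploy "$version"
}
```

In the script, this would replace the extraction and deploy lines with `pending_base_version=$(sudo rpm-ostree status -b -J '$..pending-base-version' | first_json_string)` followed by `deploy_pending "$pending_base_version"`.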
Does