Impact
This attack is primarily a more sophisticated version of CVE-2019-19921, which was a flaw which allowed an attacker to trick runc into writing the LSM process labels for a container process into a dummy tmpfs file and thus not apply the correct LSM labels to the container process. The mitigation runc applied for CVE-2019-19921 was fairly limited and effectively only caused runc to verify that when runc writes LSM labels that those labels are actual procfs files.
Rather than using a fake tmpfs file for /proc/self/attr/<label>, an attacker could instead (through various means) make /proc/self/attr/<label> reference a real procfs file, but one that would still be a no-op (such as /proc/self/sched). This would have the same effect but would clear the "is a procfs file" check. Runc is aware that this kind of attack would be possible (even going so far as to discuss this publicly as "future work" at conferences), and runc is working on a far more comprehensive mitigation of this attack, but this security issue was disclosed before runc could complete this work.
In all known versions of runc, an attacker can trick runc into misdirecting writes to /proc to other procfs files through the use of a racing container with shared mounts (runc has also verified this attack is possible to exploit using a standard Dockerfile with docker buildx build as that also permits triggering parallel execution of containers with custom shared mounts configured). This redirect could be through symbolic links in a tmpfs or theoretically other methods such as regular bind-mounts.
Note that while /proc/self/attr/<label> was the example used above (which is LSM-specific), this issue affect all writes to /proc in runc and thus also affects sysctls (written to /proc/sys/...) and some other APIs.
Additional Impacts
While investigating this issue, runc discovered that another risk with these redirected writes is that they could be redirected to dangerous files such as /proc/sysrq-trigger rather than just no-op files like /proc/self/sched. For instance, the default AppArmor profile name in Docker is docker-default, which when written to /proc/sysrq-trigger would cause the host system to crash.
When this was discovered, runc conducted an audit of other write operations within runc and found several possible areas where runc could be used as a semi-arbitrary write gadget when combined with the above race attacks. The most concerning attack scenario was the configuration of sysctls. Because the contents of the sysctl are free-form text, an attacker could use a misdirected write to write to /proc/sys/kernel/core_pattern and break out of the container (as described in CVE-2025-31133, kernel upcalls are not namespaced and so coredump helpers will run with complete root privileges on the host). Even if the attacker cannot configure custom sysctls, a valid sysctl string (when redirected to /proc/sysrq-trigger) can easily cause the machine to hang.
Note that the fact that this attack allows you to disable LSM labels makes it a very useful attack to combine with CVE-2025-31133 (as one of the only mitigations available to most users for that issue is AppArmor, and this attack would let you bypass that). However, the misdirected write issue above means that you could also achieve most of the same goals without needing to chain together attacks.
Patches
This advisory is being published as part of a set of three advisories:
The patches fixing this issue have accordingly been combined into a single patchset. The following patches from that patchset resolve the issues in this advisory:
- db19bbed5348 ("internal/sys: add VerifyInode helper")
- 6fc191449109 ("internal: move utils.MkdirAllInRoot to internal/pathrs")
- ff94f9991bd3 ("*: switch to safer securejoin.Reopen")
- 44a0fcf685db ("go.mod: update to github.com/cyphar/[email protected]")
- 77889b56db93 ("internal: add wrappers for securejoin.Proc*")
- fdcc9d3cad2f ("apparmor: use safe procfs API for labels")
- ff6fe1324663 ("utils: use safe procfs for /proc/self/fd loop code")
- b3dd1bc562ed ("utils: remove unneeded EnsureProcHandle")
- 77d217c7c377 ("init: write sysctls using safe procfs API")
- 435cc81be6b7 ("init: use securejoin for /proc/self/setgroups")
- d61fd29d854b ("libct/system: use securejoin for /proc/$pid/stat")
- 4b37cd93f86e ("libct: align param type for mountCgroupV1/V2 functions")
- d40b3439a961 ("rootfs: switch to fd-based handling of mountpoint targets")
- ed6b1693b8b3 ("selinux: use safe procfs API for labels")
-
Please note that this patch includes a private patch for github.com/opencontainers/selinux that could not be made public through a public pull request (as it would necessarily disclose this embargoed security issue).
The patch includes a complete copy of the forked code and a replace directive (as well as go mod vendor applied), which should still work with downstream build systems. If you cannot apply this patch, you can safely drop it -- some of the other patches in this series should block these kinds of racing mount attacks entirely.
See opencontainers/selinux#237 for the upstream patch.
- 3f925525b44d ("rootfs: re-allow dangling symlinks in mount targets")
- a41366e74080 ("openat2: improve resilience on busy systems")
runc 1.2.8, 1.3.3, and 1.4.0-rc.3 have been released and all contain fixes for these issues. As per runc's new release model, runc 1.1.x and earlier are no longer supported and thus have not been patched.
Mitigations
-
Do not run untrusted container images from unknown or unverified sources.
-
For the basic no-op attack, this attack allows a container process to run with the same LSM labels as runc. For most AppArmor deployments this means it will be unconfined, and for SELinux it will likely be container_runtime_t. Runc has not conducted in-depth testing of the impact on SELinux -- it is possible that it provides some reasonable protection but it seems likely that an attacker could cause harm to systems even with such an SELinux setup.
-
For the more involved redirect and write gadget attacks, unfortunately most LSM profiles (including the standard container-selinux profiles) provide the container runtime access to sysctl files (including /proc/sysrq-trigger) and so LSMs likely do not provide much protection against these attacks.
-
Using rootless containers provides some protection against these kinds of bugs (privileged writes in runc being redirected) -- by having runc itself be an unprivileged process, in general you would expect the impact scope of a runc bug to be less severe as it would only have the privileges afforded to the host user which spawned runc. For this particular bug, the privilege escalation caused by the inadvertent write issue is entirely mitigated with rootless containers because the unprivileged user that the runc process is executing as cannot write to the aforementioned procfs files (even intentionally).
Other Runtimes
As this vulnerability boils down to a fairly easy-to-make logic bug, runc has provided information to other OCI (crun, youki) and non-OCI (LXC) container runtimes about this vulnerability.
Based on discussions with other runtimes, it seems that crun and youki may have similar security issues and will release a co-ordinated security release along with runc. LXC appears to use the host's /proc for all procfs operations, and so is likely not vulnerable to this issue (this is a trade-off -- runc uses the container's procfs to avoid CVE-2016-9962-style attacks).
Credits
Thanks to Li Fubang (@lifubang from acmcoder.com, CIIC) and Tõnis Tiigi (@tonistiigi from Docker) for both independently discovering this vulnerability, as well as Aleksa Sarai (@cyphar from SUSE) for the original research into this class of security issues and solutions.
Additional thanks go to Tõnis Tiigi for finding some very useful exploit templates for these kinds of race attacks using docker buildx build.
References
Impact
This attack is primarily a more sophisticated version of CVE-2019-19921, which was a flaw which allowed an attacker to trick runc into writing the LSM process labels for a container process into a dummy
tmpfsfile and thus not apply the correct LSM labels to the container process. The mitigation runc applied for CVE-2019-19921 was fairly limited and effectively only caused runc to verify that when runc writes LSM labels that those labels are actual procfs files.Rather than using a fake
tmpfsfile for/proc/self/attr/<label>, an attacker could instead (through various means) make/proc/self/attr/<label>reference a realprocfsfile, but one that would still be a no-op (such as/proc/self/sched). This would have the same effect but would clear the "is a procfs file" check. Runc is aware that this kind of attack would be possible (even going so far as to discuss this publicly as "future work" at conferences), and runc is working on a far more comprehensive mitigation of this attack, but this security issue was disclosed before runc could complete this work.In all known versions of runc, an attacker can trick runc into misdirecting writes to
/procto other procfs files through the use of a racing container with shared mounts (runc has also verified this attack is possible to exploit using a standard Dockerfile withdocker buildx buildas that also permits triggering parallel execution of containers with custom shared mounts configured). This redirect could be through symbolic links in atmpfsor theoretically other methods such as regular bind-mounts.Note that while
/proc/self/attr/<label>was the example used above (which is LSM-specific), this issue affect all writes to/procin runc and thus also affects sysctls (written to/proc/sys/...) and some other APIs.Additional Impacts
While investigating this issue, runc discovered that another risk with these redirected writes is that they could be redirected to dangerous files such as
/proc/sysrq-triggerrather than just no-op files like/proc/self/sched. For instance, the default AppArmor profile name in Docker isdocker-default, which when written to/proc/sysrq-triggerwould cause the host system to crash.When this was discovered, runc conducted an audit of other write operations within runc and found several possible areas where runc could be used as a semi-arbitrary write gadget when combined with the above race attacks. The most concerning attack scenario was the configuration of sysctls. Because the contents of the sysctl are free-form text, an attacker could use a misdirected write to write to
/proc/sys/kernel/core_patternand break out of the container (as described in CVE-2025-31133, kernel upcalls are not namespaced and so coredump helpers will run with complete root privileges on the host). Even if the attacker cannot configure custom sysctls, a valid sysctl string (when redirected to/proc/sysrq-trigger) can easily cause the machine to hang.Note that the fact that this attack allows you to disable LSM labels makes it a very useful attack to combine with CVE-2025-31133 (as one of the only mitigations available to most users for that issue is AppArmor, and this attack would let you bypass that). However, the misdirected write issue above means that you could also achieve most of the same goals without needing to chain together attacks.
Patches
This advisory is being published as part of a set of three advisories:
The patches fixing this issue have accordingly been combined into a single patchset. The following patches from that patchset resolve the issues in this advisory:
Please note that this patch includes a private patch for
github.com/opencontainers/selinuxthat could not be made public through a public pull request (as it would necessarily disclose this embargoed security issue).The patch includes a complete copy of the forked code and a
replacedirective (as well asgo mod vendorapplied), which should still work with downstream build systems. If you cannot apply this patch, you can safely drop it -- some of the other patches in this series should block these kinds of racing mount attacks entirely.See opencontainers/selinux#237 for the upstream patch.
runc 1.2.8, 1.3.3, and 1.4.0-rc.3 have been released and all contain fixes for these issues. As per runc's new release model, runc 1.1.x and earlier are no longer supported and thus have not been patched.
Mitigations
Do not run untrusted container images from unknown or unverified sources.
For the basic no-op attack, this attack allows a container process to run with the same LSM labels as
runc. For most AppArmor deployments this means it will beunconfined, and for SELinux it will likely becontainer_runtime_t. Runc has not conducted in-depth testing of the impact on SELinux -- it is possible that it provides some reasonable protection but it seems likely that an attacker could cause harm to systems even with such an SELinux setup.For the more involved redirect and write gadget attacks, unfortunately most LSM profiles (including the standard container-selinux profiles) provide the container runtime access to sysctl files (including
/proc/sysrq-trigger) and so LSMs likely do not provide much protection against these attacks.Using rootless containers provides some protection against these kinds of bugs (privileged writes in runc being redirected) -- by having runc itself be an unprivileged process, in general you would expect the impact scope of a runc bug to be less severe as it would only have the privileges afforded to the host user which spawned runc. For this particular bug, the privilege escalation caused by the inadvertent write issue is entirely mitigated with rootless containers because the unprivileged user that the
runcprocess is executing as cannot write to the aforementioned procfs files (even intentionally).Other Runtimes
As this vulnerability boils down to a fairly easy-to-make logic bug, runc has provided information to other OCI (crun, youki) and non-OCI (LXC) container runtimes about this vulnerability.
Based on discussions with other runtimes, it seems that crun and youki may have similar security issues and will release a co-ordinated security release along with runc. LXC appears to use the host's
/procfor all procfs operations, and so is likely not vulnerable to this issue (this is a trade-off -- runc uses the container's procfs to avoid CVE-2016-9962-style attacks).Credits
Thanks to Li Fubang (@lifubang from acmcoder.com, CIIC) and Tõnis Tiigi (@tonistiigi from Docker) for both independently discovering this vulnerability, as well as Aleksa Sarai (@cyphar from SUSE) for the original research into this class of security issues and solutions.
Additional thanks go to Tõnis Tiigi for finding some very useful exploit templates for these kinds of race attacks using
docker buildx build.References