CVE-2026-46223 Overview
CVE-2026-46223 is a Linux kernel vulnerability in the control groups (cgroup) subsystem. The flaw causes a self-deadlock during rmdir(2) operations when the caller is also the reaper of zombie processes pinning a PID namespace teardown. The cgroup_drain_dying() wait blocks in TASK_UNINTERRUPTIBLE state while waiting for tasks to release their cset links, but those tasks cannot be reaped because the reaper itself is stuck in rmdir. The fix defers the css percpu_ref kill on rmdir until the cgroup is fully depopulated, allowing ->css_offline() to run asynchronously after all tasks have left.
Critical Impact
Local processes that perform rmdir on cgroups during PID namespace teardown can trigger an A-A deadlock, hanging system management processes such as systemd running as host PID 1.
Affected Products
- Linux kernel (mainline)
- Stable kernel branches containing commits d245698d727a, a72f73c4dd9b, 1b164b876c36, 4c56a8ac6869, and 13e786b64bd3
- Distribution kernels that ship the v7.0-era cgroup rmdir rework chain, including backports of it
Discovery Timeline
- 2026-05-28 - CVE-2026-46223 published to NVD
- 2026-05-28 - Last updated in NVD database
Technical Details for CVE-2026-46223
Vulnerability Analysis
The Linux kernel cgroup subsystem must guarantee that a controller's ->css_offline() callback does not run while tasks still perform kernel-side work in the cgroup. A chain of five commits, beginning with d245698d727a, attempted to enforce this invariant by moving task cset unlink from do_exit() to finish_task_switch(). As a result, exiting tasks past exit_signals() linger on cset->tasks until their final context switch.
Subsequent commits patched the user-visible divergence: hiding exiting tasks from cgroup.procs, sleeping in rmdir(2) until dying tasks drained, fixing the wait predicate, and exposing nr_dying_subsys_* synchronously. The cgroup_drain_dying() wait introduced in 1b164b876c36 is the proximate bug.
Root Cause
The deadlock occurs when the process invoking rmdir is also the reaper for zombie tasks that pin a PID namespace teardown. The reaper sleeps in TASK_UNINTERRUPTIBLE inside cgroup_drain_dying(), waiting for those tasks to be released. They cannot be released because their reaper is the very process blocked in rmdir. No internal lock ordering violation exists; the wait itself is the defect.
Attack Vector
A local process, privileged or unprivileged, running inside a container or PID namespace can trigger the hang. The reproducer combines a PID namespace teardown with a zombie reaper and runs under vng (virtme-ng). The original commit chain also left a pre-existing race in cgroup_apply_control_disable() unaddressed: kill_css() runs synchronously while tasks past exit_signals() may still be linked to the cgroup, so ->css_offline() can fire before the drain completes.
No public exploit code has been published. Reproduction relies on the in-tree deterministic reproducers and a boot parameter that widens the post-exit_signals() window. See kernel Git commits 33fa2e6 and 93618ed for the upstream patches.
Detection Methods for CVE-2026-46223
Indicators of Compromise
- Processes blocked in TASK_UNINTERRUPTIBLE state with stacks inside cgroup_drain_dying() or cgroup_rmdir
- systemd or other host PID 1 reapers hung while reaping orphaned PIDs from a collapsing PID namespace
- nr_dying_subsys_* counters in cgroup.stat that keep growing without corresponding cleanup progress
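The nr_dying_subsys_* indicator can be checked with a small shell sketch. It assumes cgroup v2 mounted at /sys/fs/cgroup and a kernel that reports nr_dying_subsys_<controller> in each cgroup's cgroup.stat file; the helper name dying_subsys_report is hypothetical. Nonzero values also occur during normal cgroup destruction, so compare repeated runs rather than acting on a single reading.

```shell
# Sketch: list cgroups whose cgroup.stat reports nonzero nr_dying_subsys_* counters.
# Assumes cgroup v2 at /sys/fs/cgroup (override with the first argument) and a
# kernel that exposes nr_dying_subsys_<controller> in cgroup.stat.
# dying_subsys_report is a hypothetical helper name, not a kernel or distro tool.
dying_subsys_report() {
    root=${1:-/sys/fs/cgroup}
    # Visit every cgroup.stat file under the root in a stable order; skip unreadable paths.
    find "$root" -name cgroup.stat -type f 2>/dev/null | sort | while IFS= read -r stat; do
        # Print "<cgroup dir> <key> <value>" for each nonzero dying-subsystem counter.
        awk -v dir="${stat%/cgroup.stat}" \
            '$1 ~ /^nr_dying_subsys_/ && $2 > 0 { print dir, $1, $2 }' "$stat"
    done
}
```

For example, after sourcing the file, running dying_subsys_report /sys/fs/cgroup/system.slice narrows the scan to one subtree.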
Detection Strategies
- Audit kernel stack traces via /proc/<pid>/stack for processes wedged in cgroup_drain_dying or percpu_ref_kill_and_confirm
- Monitor dmesg for hung_task warnings naming container runtime or init processes during namespace teardown
- Correlate cgroup rmdir syscall latency spikes with container shutdown events in observability pipelines
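The stack-trace audit can be scripted. The sketch below walks procfs, keeps tasks whose state is D (TASK_UNINTERRUPTIBLE), and greps their kernel stacks for the cgroup teardown functions named above. It assumes root privileges, because /proc/<pid>/stack is readable only with CAP_SYS_ADMIN, and it checks thread-group leaders only. The helper name find_cgroup_hangs is hypothetical, and the procfs root is a parameter so the logic can be exercised against a fake directory tree.

```shell
# Sketch: report D-state tasks whose kernel stack mentions cgroup teardown functions.
# Assumes procfs at /proc (override with the first argument) and root privileges.
# Only thread-group leaders (/proc/<pid>) are checked, not /proc/<pid>/task/<tid>.
# find_cgroup_hangs is a hypothetical helper name.
find_cgroup_hangs() {
    proc=${1:-/proc}
    pattern='cgroup_drain_dying|cgroup_rmdir|percpu_ref_kill_and_confirm'
    for dir in "$proc"/[0-9]*; do
        [ -r "$dir/stat" ] || continue
        # The state follows "pid (comm) "; strip up to the last ") " because
        # comm itself may contain spaces or parentheses.
        state=$(sed 's/^.*) //' "$dir/stat" | cut -d' ' -f1)
        [ "$state" = D ] || continue
        if grep -Eq "$pattern" "$dir/stack" 2>/dev/null; then
            comm=$(cat "$dir/comm" 2>/dev/null)
            echo "${dir##*/} $comm"
        fi
    done
}
```

Each output line is "<pid> <comm>"; running find_cgroup_hangs with no argument scans the live /proc.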
Monitoring Recommendations
- Track long-running rmdir syscalls against cgroup mount points using eBPF or perf tracing
- Alert on hung_task_timeout_secs triggers tied to kernel threads handling cgroup destruction
- Forward kernel log telemetry to a centralized data lake for cross-host correlation of namespace teardown anomalies
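Hung-task warnings can be extracted from kernel log text with a short filter before forwarding or alerting. The sketch assumes the kernel's hung-task line format, "INFO: task <comm>:<pid> blocked for more than <N> seconds.", and prints one "<pid> <seconds> <comm>" record per warning so that host PID 1 is easy to single out. The helper name hung_task_summary is hypothetical.

```shell
# Sketch: turn hung-task warnings from kernel log text on stdin into
# "<pid> <seconds> <comm>" records. comm values containing ':' are split at the
# last ':' because the greedy first group backtracks only as far as needed.
# hung_task_summary is a hypothetical helper name.
hung_task_summary() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds\..*/\2 \3 \1/p'
}
```

For example, dmesg | hung_task_summary | awk '$1 == 1' shows only warnings about host PID 1; journalctl -k can replace dmesg on systemd hosts.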
How to Mitigate CVE-2026-46223
Immediate Actions Required
- Apply the upstream kernel patches referenced in commits 33fa2e6b1507a0a377a151a8826438bedad1d0b0 and 93618edf753838a727dbff63c7c291dee22d656b
- Track stable backports from your distribution vendor and schedule reboots on affected hosts
- Inventory container hosts that frequently create and destroy PID namespaces, prioritizing them for patching
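For hosts that run self-built kernels, a sketch like the following checks whether the upstream fix commits are ancestors of HEAD in a local kernel git tree. The helper name has_commits is hypothetical. Stable and vendor backports are cherry-picked under new commit IDs, so a "missing" result does not prove the tree is unpatched; for distribution kernels, rely on the vendor's advisory instead.

```shell
# Sketch: report whether each given commit is an ancestor of HEAD in a git tree.
# Usage: has_commits <repo> <commit>...  Returns 1 if any commit is missing.
# Backports carry different commit IDs, so "missing" is not proof of exposure.
# has_commits is a hypothetical helper name.
has_commits() {
    repo=$1
    shift
    status=0
    for c in "$@"; do
        # merge-base --is-ancestor exits 0 only if $c is reachable from HEAD;
        # an unknown commit ID also yields a nonzero exit and counts as missing.
        if git -C "$repo" merge-base --is-ancestor "$c" HEAD 2>/dev/null; then
            echo "present $c"
        else
            echo "missing $c"
            status=1
        fi
    done
    return $status
}
```

For example: has_commits ~/src/linux 33fa2e6b1507a0a377a151a8826438bedad1d0b0 93618edf753838a727dbff63c7c291dee22d656b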
Patch Information
The fix defers the css percpu_ref kill on rmdir until the cgroup is fully depopulated. The ->css_offline() chain, driven by css_killed_work_fn() via percpu_ref_kill_and_confirm(), now starts only after all tasks have left the cgroup. The user-visible rmdir returns as soon as cgroup.procs and related files are empty. The v2 patch pins the cgroup across the deferred destroy work using explicit cgroup_get() and cgroup_put() around queue_work() and the work function. See kernel Git commits 33fa2e6 and 93618ed.
Workarounds
- Avoid having host PID 1 act as the reaper for orphan PIDs reparented during PID namespace teardown where feasible
- Stagger container teardown so that rmdir callers are not simultaneously reaping zombies pinning the same namespace
- If issues persist after patching, the fallback is to revert the full commit chain (d245698d727a through 13e786b64bd3) and reapply the rework in a development branch
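Commands for triaging a suspected host (the last two require root):
```shell
# Show the running kernel version; uname -r does not list included commits,
# so compare it against your distribution's advisory for this CVE
uname -r
# Inspect host PID 1 (normally systemd) for cgroup_drain_dying in its kernel stack;
# /proc/1/stack avoids pidof returning several PIDs when user systemd instances run
cat /proc/1/stack
# Lower the hung-task warning threshold to 60 seconds for faster identification
sysctl -w kernel.hung_task_timeout_secs=60
```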
Disclaimer: This content was generated using AI. While we strive for accuracy, please verify critical information with official sources.


