[youki] Fixing bounding capabilities leak when `config.json` omits the capability set

27 May 2026

issue: #3434:failed to drop capabilities in youki exec

fix: #3554:drop bounding caps by default if unset

TL;DR

youki's run and exec paths handled a missing `.process.capabilities.bounding` inconsistently.
youki run left the bounding capability set untouched, while youki exec correctly dropped all bounding capabilities.
The fix is to treat an unset bounding set as an empty set, so every bounding capability is dropped.

Comparison table for clarity

| command | reaction to unset bounding | result |
| --- | --- | --- |
| `! runc run` | drop all bounding caps | failure (👍) |
| `! youki exec` | drop all bounding caps | failure (👍) |
| `! youki run` | don't touch bounding caps | success (should not! 🚨) |

What are capabilities sets?

Physically, they are five 64-bit bitmasks, and each thread (not each process) has its own copy, so each thread carries 320 bits of capability data. See linux:/include/linux/cred.h#L126-L130:

kernel_cap_t	cap_inheritable; /* caps our children can inherit */
kernel_cap_t	cap_permitted;	/* caps we're permitted */
kernel_cap_t	cap_effective;	/* caps we can actually use */
kernel_cap_t	cap_bset;	/* capability bounding set */
kernel_cap_t	cap_ambient;	/* Ambient capability set */
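To make the "five 64-bit bitmasks" picture concrete, here is a minimal userspace sketch in Rust. The `CapSets` struct and the helper functions are hypothetical names chosen for illustration; they are not kernel or youki APIs. The capability numbers are the ones defined in linux/capability.h.

```rust
/// Illustrative userspace mirror of the five per-thread capability sets.
/// Assumption: each set fits in a single 64-bit mask, as described above.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct CapSets {
    pub inheritable: u64,
    pub permitted: u64,
    pub effective: u64,
    pub bounding: u64,
    pub ambient: u64,
}

// Capability numbers as defined in linux/capability.h.
pub const CAP_KILL: u32 = 5;
pub const CAP_SYS_ADMIN: u32 = 21;

/// Bitmask with only the bit for capability number `cap` set.
pub const fn cap_mask(cap: u32) -> u64 {
    1u64 << cap
}

/// Returns true if the bitmask `set` contains capability `cap`.
pub const fn has_cap(set: u64, cap: u32) -> bool {
    set & cap_mask(cap) != 0
}

/// Total number of bits carried by the five sets.
pub const fn total_bits() -> usize {
    5 * u64::BITS as usize
}
```

With this model, a thread whose effective set is `cap_mask(CAP_KILL)` passes `has_cap(effective, CAP_KILL)` but not `has_cap(effective, CAP_SYS_ADMIN)`.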

Conceptually, they were introduced to Linux to give more granular control over privileges that historically were tightly coupled to the root user (uid=0). Thanks to capabilities, we can, for example, give a container's root process the privilege to kill other processes within the same container (CAP_KILL) without also giving it the right to mount the host's file system and escape the container environment (CAP_SYS_ADMIN). This allows tools like supervisord github:supervisor/supervisor to orchestrate processes inside containers while keeping the host system safe.

Logically, they are sets of tokens that a thread (a task) can use to prove to the kernel that it is allowed to perform a privileged operation. On top of that, there are rules describing how these sets affect each other and how processes pass capabilities to their children.

What is the bounding capabilities set?

First, here is the formula for the permitted set after an execve() call: `(thread_inheritable & file_inheritable) | (file_permitted & thread_bounding_set) | thread_ambient`.

The bounding set acts as the upper hard limit for what capabilities a process can gain through the file_permitted & thread_bounding_set term during execve(). Once a capability is removed from the bounding set, no subsequent execve() can reintroduce it via file permitted capabilities. The one exception is the inheritable path: if the capability was already in the thread's inheritable set before the bounding set was reduced, it can still enter permitted through thread_inheritable & file_inheritable during execve().

The ambient term thread_ambient cannot introduce a capability the thread did not already hold either: Linux enforces the invariant that a cap can be in the ambient set only if it is in both the permitted and inheritable sets, and lowering a cap in either of those sets automatically clears it from the ambient set.
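The execve() formula can be played with as plain bit arithmetic. The sketch below is a simplified model, not kernel code: `ThreadCaps`, `FileCaps` and `permitted_after_exec` are hypothetical names, and the model ignores details such as the kernel recomputing the ambient set when a privileged file is executed.

```rust
/// Thread capability sets that take part in the execve() formula.
#[derive(Debug, Clone, Copy, Default)]
pub struct ThreadCaps {
    pub inheritable: u64,
    pub bounding: u64,
    pub ambient: u64,
}

/// File capability sets attached to the executed binary.
#[derive(Debug, Clone, Copy, Default)]
pub struct FileCaps {
    pub inheritable: u64,
    pub permitted: u64,
}

/// Permitted set after execve(), following the formula in the text:
/// (thread_inheritable & file_inheritable)
///   | (file_permitted & thread_bounding_set)
///   | thread_ambient
pub fn permitted_after_exec(t: ThreadCaps, f: FileCaps) -> u64 {
    (t.inheritable & f.inheritable) | (f.permitted & t.bounding) | t.ambient
}
```

In this model, a file with a capability in its permitted set grants it only while the thread's bounding set still contains that bit. With an empty bounding set, the capability can still arrive through the inheritable term, provided both the thread and the file carry it in their inheritable sets.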

What was the bug in `youki`?

The bug is easy to spot in the code at youki:/crates/libcontainer/src/capabilities.rs#L133-L141:

/// Drop any extra granted capabilities, and reset to defaults which are in oci specification
pub fn drop_privileges<S: Syscall + ?Sized>(
    cs: &LinuxCapabilities,
    syscall: &S,
) -> Result<(), SyscallError> {
    // 👀 When bounding is unset `youki` skipped it.
    if let Some(bounding) = cs.bounding() {
        tracing::debug!("dropping bounding capabilities to {:?}", bounding);
        syscall.set_capability(CapSet::Bounding, &to_set(bounding))?;
    }
    // ... (remainder of the function not shown in this excerpt)

The fix is straightforward. Here is the updated revision:

/// Drop any extra granted capabilities, and reset to defaults which are in oci specification
pub fn drop_privileges<S: Syscall + ?Sized>(
    cs: &LinuxCapabilities,
    syscall: &S,
) -> Result<(), SyscallError> {
    // An unset bounding set now defaults to the empty set, so everything is dropped.
    let empty_caps = Default::default();
    let bounding = cs.bounding().as_ref().unwrap_or(&empty_caps);
    tracing::debug!("dropping bounding capabilities to {:?}", bounding);
    syscall.set_capability(CapSet::Bounding, &to_set(bounding))?;
    // ... (remainder of the function not shown in this excerpt)
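The behavioural difference between the two revisions can be isolated in a small standalone sketch. Everything here is a hypothetical stand-in: `Cap`, `CapsConfig` and the two functions only model the "skip when unset" versus "default to empty" decision and do not use youki's real types or syscalls.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a capability name; not youki's real type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Cap {
    Kill,
    SysAdmin,
}

/// Hypothetical config: `None` models a `config.json` that omits `bounding`.
pub struct CapsConfig {
    pub bounding: Option<HashSet<Cap>>,
}

/// Old behaviour: `None` means "do not touch the bounding set at all".
pub fn bounding_before_fix(cfg: &CapsConfig) -> Option<HashSet<Cap>> {
    cfg.bounding.clone()
}

/// New behaviour: an unset bounding set is treated as the empty set,
/// so the caller always applies a set and drops everything not listed.
pub fn bounding_after_fix(cfg: &CapsConfig) -> HashSet<Cap> {
    let empty = HashSet::new();
    cfg.bounding.as_ref().unwrap_or(&empty).clone()
}
```

For a config with `bounding: None`, `bounding_before_fix` returns `None` (nothing would be applied), while `bounding_after_fix` returns an empty set (all bounding capabilities would be dropped). A config that does list capabilities yields the same set under both versions.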

What about `runc`?

runc does mostly the same thing, except for the ordering of user setup and capability manipulation. See runc:/libcontainer/init_linux.go#L340-L366:

// drop capabilities in bounding set before changing user
if err := w.ApplyBoundingSet(); err != nil {
	return fmt.Errorf("unable to apply bounding set: %w", err)
}
// preserve existing capabilities while we change users
if err := system.SetKeepCaps(); err != nil {
	return fmt.Errorf("unable to set keep caps: %w", err)
}
if err := setupUser(config); err != nil {
	return fmt.Errorf("unable to setup user: %w", err)
}
// Change working directory AFTER the user has been set up, if we haven't done it yet.
if doChdir {
	if err := unix.Chdir(config.Cwd); err != nil {
		return fmt.Errorf("chdir to cwd (%q) set in config.json failed: %w", config.Cwd, err)
	}
}
// Make sure our final working directory is inside the container.
if err := verifyCwd(); err != nil {
	return err
}
if err := system.ClearKeepCaps(); err != nil {
	return fmt.Errorf("unable to clear keep caps: %w", err)
}
if err := w.ApplyCaps(); err != nil {
	return fmt.Errorf("unable to apply caps: %w", err)
}
