Kernel synchronization in XNU: locks, atomics, and lock-free patterns
Spinlocks, mutexes, reader-writer locks, and lock-class groups — the synchronization primitives XNU offers, when each is appropriate, and how the per-CPU caches stay fast under contention.
Every nontrivial kernel data structure needs concurrency control. XNU offers a small, opinionated set of synchronization primitives — fewer than Linux, more than a microkernel — and each has a specific niche. This article walks them in order of cost and use case, ending with the lock-class system that catches priority inversions and deadlocks.
The four primitives
The XNU primitives, from cheapest to most expensive:
- Atomic operations (OSCompareAndSwap, atomic_store_explicit, etc.): single-instruction read-modify-write, no lock at all.
- Spinlocks (lck_spin_t): bare-metal busy-wait. Cheap if uncontended; brutal under contention.
- Mutexes (lck_mtx_t): block-and-wait when contended. The default for most kernel data.
- Reader-writer locks (lck_rw_t): many readers OR one writer. Useful when reads dominate.
- apple-oss-distributions/xnu, osfmk/kern/locks.h: the lock type declarations (lck_spin_t, lck_mtx_t, lck_rw_t) plus the group infrastructure.
- apple-oss-distributions/xnu, osfmk/kern/locks.c: the actual implementation, covering lock/unlock fast paths, contended slow paths, and debug instrumentation.
Atomic operations — when no lock is needed
For simple counters, flags, and pointer swaps, you don't need a lock. The CPU's atomic instructions (compare-and-swap, fetch-and-add, exclusive load/store on ARM) are enough.
XNU exposes these via:
- os/atomic.h: modern C11-style atomic_* operations, preferred for new code.
- libkern/OSAtomic.h: older XNU-specific names (OSCompareAndSwap, OSAddAtomic), still everywhere in legacy code.
A reference count, for example:
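os_atomic_inc(&obj->refcount, relaxed); // take a reference; no ordering needed
if (os_atomic_dec(&obj->refcount, release) == 0) { // drop a reference; release publishes this thread's writes
    os_atomic_thread_fence(acquire); // pair with the other threads' release decrements before freeing
    free(obj);
}
The release ordering on each decrement publishes that thread's stores to obj. The acquire fence on the thread that sees refcount reach 0 pairs with those releases, so every earlier store to obj happens-before the free.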
Lock-free programming with atomics is hard to get right — the memory ordering specifiers (relaxed, acquire, release, seq_cst) need careful thinking. The kernel uses them in hot paths (zone allocator's per-CPU caches, scheduler runqueues), but most kernel code reaches for a lck_mtx first.
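To make the compare-and-swap retry pattern concrete, here is a minimal userspace sketch in portable C11 (stdatomic.h), not XNU's os_atomic API. The struct object type and object_try_retain function are hypothetical names introduced for illustration. The function takes a reference only if the count is still nonzero, which a plain increment cannot express.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical object with a C11 atomic reference count. */
struct object {
    _Atomic uint32_t refcount;
};

/*
 * Take a reference only if the object is still alive (refcount > 0).
 * Returns false if the count has already reached zero, meaning some
 * other thread is in the middle of freeing the object.
 */
static bool object_try_retain(struct object *obj)
{
    uint32_t old = atomic_load_explicit(&obj->refcount, memory_order_relaxed);
    while (old != 0) {
        /* On failure, `old` is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak_explicit(&obj->refcount, &old, old + 1,
                memory_order_relaxed, memory_order_relaxed)) {
            return true;
        }
    }
    return false;
}
```

Relaxed ordering is assumed to be enough here because the caller must already have found the object through some other synchronized path; a design that publishes the object through the count itself would need stronger ordering.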
Spinlocks — for very short critical sections
A lck_spin_t is a busy-wait lock. A thread trying to acquire spins on an atomic flag until it can claim the lock. While spinning:
- The CPU is not available for other work.
- The holder of the lock had better be running on another CPU; if it's been preempted, the spinner is wasting time on a thread that's not making progress.
For these reasons, spinlocks are appropriate only when:
- The critical section is very short (a few dozen instructions).
- The holder won't block, sleep, or take any other lock that might block.
- You're at a priority level where you can't sleep anyway (interrupt context, scheduler internals).
The kernel's top-half interrupt handlers (see the interrupt handling article) often need spinlocks because they run with interrupts disabled and can't sleep. The scheduler itself uses spinlocks to protect runqueue manipulation.
lck_spin_lock(&proc_list_spin);
// short, no-sleep critical section
lck_spin_unlock(&proc_list_spin);
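The busy-wait behaviour can be sketched in userspace with a C11 atomic_flag. This is a toy test-and-set lock, not how lck_spin_t is implemented; toy_spin_t, the worker function, and run_spin_demo are hypothetical names, and pthreads stands in for kernel threads.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical test-and-set spinlock; not XNU's lck_spin_t. */
typedef struct { atomic_flag held; } toy_spin_t;

static void toy_spin_lock(toy_spin_t *l)
{
    /* Acquire ordering keeps the critical section from moving above the lock. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        /* busy-wait: this CPU does nothing useful until the holder releases */
    }
}

static void toy_spin_unlock(toy_spin_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static toy_spin_t counter_lock = { ATOMIC_FLAG_INIT };
static long counter;

static void *spin_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        toy_spin_lock(&counter_lock);
        counter++;                      /* very short, never blocks */
        toy_spin_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs two workers to completion and returns the final counter value. */
static long run_spin_demo(void)
{
    pthread_t a, b;
    counter = 0;
    pthread_create(&a, NULL, spin_worker, NULL);
    pthread_create(&b, NULL, spin_worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

In userspace the holder can be preempted at any time, which is exactly the failure mode described above; a kernel spinlock avoids it by keeping the holder on its CPU for the duration of the critical section.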
Mutexes — the default
lck_mtx_t is the workhorse. Threads that can't acquire it block and sleep; they're woken when the holder releases. Unlike with a spinlock, the CPU the waiter was running on is free to do other work in the meantime.
lck_mtx_lock(&my_data_lock);
// arbitrarily long critical section, may sleep, may take other locks
lck_mtx_unlock(&my_data_lock);
Most kernel data structures protect themselves with a lck_mtx_t. The proc table, the vnode list, IOKit objects, file descriptors — all guarded by mutexes.
XNU mutexes have a few interesting properties:
- Adaptive spinning: a short spin attempt before blocking, on the theory that the holder might release very soon and avoiding a context switch wins.
- Priority inheritance: a low-priority thread holding a mutex contended by a high-priority thread gets boosted to the higher priority for the duration. Prevents priority inversion.
- Lock ordering tracking (debug builds): the kernel records the order locks are acquired and panics if it ever sees a reverse-order acquisition that could deadlock.
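The adaptive-spinning idea can be approximated in userspace with a bounded trylock loop in front of a blocking acquire. This is a rough sketch under stated assumptions: adaptive_lock and SPIN_ATTEMPTS are hypothetical, the budget of 100 is arbitrary, and a real kernel implementation would typically also consider whether the holder is currently running on another CPU before deciding to spin.

```c
#include <pthread.h>

#define SPIN_ATTEMPTS 100   /* arbitrary illustrative budget */

/*
 * Try to grab the mutex a bounded number of times without sleeping, then
 * fall back to a blocking acquire if the holder has not released by then.
 * Returns 1 if the lock was obtained while spinning, 0 if we had to block.
 */
static int adaptive_lock(pthread_mutex_t *m)
{
    for (int i = 0; i < SPIN_ATTEMPTS; i++) {
        if (pthread_mutex_trylock(m) == 0) {
            return 1;   /* got it without a context switch */
        }
    }
    pthread_mutex_lock(m);  /* give up spinning and sleep until released */
    return 0;
}
```

The trade-off is the spin budget: too small and short hold times still pay for a context switch, too large and the mutex degrades into a spinlock.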
Reader-writer locks — when reads dominate
lck_rw_t allows either:
- Many threads holding the lock as readers simultaneously, or
- One thread holding it as a writer, with no readers.
Useful when a data structure is read often and written rarely. The vnode name cache, the IORegistry tree, the network routing table — all read-heavy and protected by rwlocks.
lck_rw_lock_shared(&cache_rw);
// read-only access; multiple readers can be here at once
lck_rw_unlock_shared(&cache_rw);
lck_rw_lock_exclusive(&cache_rw);
// exclusive write access; readers and other writers blocked
lck_rw_unlock_exclusive(&cache_rw);
The cost of an rwlock is higher than a mutex (more state to track, more complex contention handling). For data that is not read-dominated, a mutex is cheaper.
Lock groups — XNU's lock classification system
Every lock in XNU belongs to a lock group (lck_grp_t). The group is created once at module init:
lck_grp_t *my_grp = lck_grp_alloc_init("MySubsystem", LCK_GRP_ATTR_NULL);
lck_mtx_init(&my_mtx, my_grp, LCK_ATTR_NULL);
The group serves several purposes:
- Per-group statistics: contention counts, hold times. Visible via lockstat and kernel telemetry.
- Deadlock detection in debug builds: lock acquisition order is recorded per-group; cross-group ordering violations can trigger a panic.
- Reset / cleanup: tearing down a subsystem can destroy all its locks via the group.
lockstat(1) is the diagnostic tool — when you're investigating "why is this kernel slow", looking at top contended lock groups is often the answer.
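To show how a shared group can aggregate statistics across many locks, here is a toy userspace model. Everything in it (struct toy_grp, struct toy_mtx, and the toy_mtx_* functions) is hypothetical and makes no claim about how lck_grp_t stores or updates its counters; it only illustrates counting contention per group rather than per lock.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a lock group: a name plus shared counters. */
struct toy_grp {
    const char *name;
    atomic_ulong acquisitions;
    atomic_ulong contended;     /* acquisitions that found the lock held */
};

/* A mutex that remembers which group it reports to. */
struct toy_mtx {
    pthread_mutex_t mtx;
    struct toy_grp *grp;
};

static void toy_mtx_init(struct toy_mtx *m, struct toy_grp *g)
{
    pthread_mutex_init(&m->mtx, NULL);
    m->grp = g;
}

static void toy_mtx_lock(struct toy_mtx *m)
{
    if (pthread_mutex_trylock(&m->mtx) != 0) {
        /* Fast path failed: record the contention, then block. */
        atomic_fetch_add_explicit(&m->grp->contended, 1, memory_order_relaxed);
        pthread_mutex_lock(&m->mtx);
    }
    atomic_fetch_add_explicit(&m->grp->acquisitions, 1, memory_order_relaxed);
}

static void toy_mtx_unlock(struct toy_mtx *m)
{
    pthread_mutex_unlock(&m->mtx);
}
```

Because every lock in the subsystem points at the same group, a single pair of counters answers the question "is this subsystem's locking a bottleneck" without inspecting each lock individually.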
IRQ-safe variants
Spinlocks come in two flavors:
- lck_spin_lock: regular. Caller responsible for ensuring no IRQ context.
- lck_spin_lock_grp_irq_set: disables interrupts while held. Required if both an interrupt handler and a thread context can take the lock; without the IRQ disable, a thread holding the lock that's interrupted by a handler trying to acquire the same lock would deadlock on the same CPU.
This concern doesn't apply to mutexes — they can't be taken in interrupt context anyway, so there's no risk of an interrupt taking a mutex the current thread holds.
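The same-CPU deadlock has a userspace analogue: a signal handler that takes a spinlock the interrupted thread already holds. The sketch below uses POSIX signal masking as a stand-in for disabling interrupts. It is an analogy only, and the names ev_lock, on_sigusr1, and record_event_irq_safe are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_flag ev_lock = ATOMIC_FLAG_INIT;
static volatile sig_atomic_t events;

static void ev_spin_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&ev_lock, memory_order_acquire)) { }
}

static void ev_spin_unlock(void)
{
    atomic_flag_clear_explicit(&ev_lock, memory_order_release);
}

/* Plays the role of the interrupt handler: takes the same lock. */
static void on_sigusr1(int signo)
{
    (void)signo;
    ev_spin_lock();
    events++;
    ev_spin_unlock();
}

/*
 * Thread-context path. SIGUSR1 is blocked while the lock is held, so the
 * handler cannot run on this thread and spin forever on a lock we own.
 * Returns the event count after the deferred handler has run.
 */
static int record_event_irq_safe(void)
{
    struct sigaction sa = {0};
    sigset_t block, saved;

    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &block, &saved);  /* analogue of disabling IRQs */

    ev_spin_lock();
    raise(SIGUSR1);     /* arrives while we hold the lock; stays pending */
    events++;
    ev_spin_unlock();

    pthread_sigmask(SIG_SETMASK, &saved, NULL);  /* pending handler runs here */
    return (int)events;
}
```

Removing the SIG_BLOCK call would let the handler run inside the critical section and spin on ev_lock forever, which mirrors the interrupt-handler deadlock described above.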
The cost ladder
Rough cost on a modern Apple Silicon Mac:
| Primitive | Uncontended | Contended (high load) |
|---|---|---|
| Atomic op | ~1-5 cycles | depends on cache line |
| Spinlock | ~10-20 cycles | spins burning CPU |
| Mutex | ~30-50 cycles | block + reschedule |
| RWLock | ~50-100 cycles | many readers OK |
The right choice depends on how often you contend and how long the critical section is. The kernel has thousands of locks; the choice matters at scale.
Lock-free patterns in the wild
A few places where XNU avoids locks via clever data structure design:
- Per-CPU zone caches — each CPU has its own free-list; allocations rarely cross CPUs, so the hot path is lock-free. See the kernel allocators article.
- RCU-style patterns — readers see a consistent view via load-acquire on a pointer; writers swap the pointer with release ordering, deferring deallocation of the old object until no reader can be using it.
- Wait-free queues — for high-frequency producer/consumer paths like the scheduler's runqueue and the IPC message queues.
These patterns are hard to write correctly. XNU uses them strategically — most code is mutex-protected, but the few percent that's hot is lock-free.
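The RCU-style pointer swap can be sketched with C11 atomics. This is a minimal illustration, not XNU code: struct route_cfg, current_cfg, reader_mtu, and publish_cfg are hypothetical names, and the part that makes such schemes hard (knowing when the old object can safely be freed) is deliberately left to the caller.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical snapshot that is never modified after being published. */
struct route_cfg {
    int mtu;
    int hop_limit;
};

static _Atomic(struct route_cfg *) current_cfg;

/* Reader: one load-acquire, then plain reads of a fully built snapshot. */
static int reader_mtu(void)
{
    struct route_cfg *c = atomic_load_explicit(&current_cfg, memory_order_acquire);
    return c ? c->mtu : -1;     /* -1 means nothing has been published yet */
}

/*
 * Writer: `fresh` must be fully initialized before this call. The exchange
 * publishes it with release semantics and returns the previous snapshot.
 * The caller must not free or reuse the old snapshot until no reader can
 * still be using it; that grace-period mechanism is omitted here.
 */
static struct route_cfg *publish_cfg(struct route_cfg *fresh)
{
    return atomic_exchange_explicit(&current_cfg, fresh, memory_order_acq_rel);
}
```

Readers never take a lock and never see a half-written snapshot, because all fields are written before the release exchange and read only after the matching acquire load.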
What surprises newcomers
- Spinlocks are for very specific cases. Default to lck_mtx_t; reach for spin only when you can't block (interrupt context, scheduler internals).
- Priority inheritance is built into mutexes. A low-QoS thread holding a mutex needed by a high-QoS one gets boosted. This is how QoS overrides propagate through lock chains too.
- Lock groups are mandatory. You can't create a lock without naming a group; the group is what gives lockstat visibility.
- Reverse-order lock acquisition panics in debug builds. XNU tracks order per group; an "always take A before B" rule, violated, will panic with a clear message in development kernels.
What to read next
- apple-oss-distributions/xnu, osfmk/kern/lck_attr.c: lock attribute objects, including debug flags, contention counting, and priority inheritance options.
- apple-oss-distributions/xnu, osfmk/kern/sched_prim.c: the scheduler, a heavy user of spinlocks on the runqueue and per-processor structures.
See also the context switch walkthrough: many of the locks taken on XNU's hottest paths live in the scheduler.