
Kernel memory allocators in XNU: zalloc, kalloc, slab

The kernel's own malloc — a hierarchy of zone allocators, the kalloc heap, and slab caches for specific types. Different from user-side VM, and just as important.

Figure: XNU kernel allocator stack. The kernel allocator hierarchy: raw VM pages at the bottom, the zone allocator carving pages into per-type slabs, kalloc as the variable-size facade, and IOMalloc and direct zone clients on top.

  • Kernel clients: IPC subsystem (ipc_port_zone, ipc_kmsg_zone), VFS / vnodes (vnode_zone), VM internals (vm_object_zone, vm_page_zone), IOKit drivers (IOMalloc / IOMallocContiguous).
  • Facades: kalloc (variable-size allocations, power-of-two buckets); IOMalloc / IOMallocContiguous (driver-facing wrappers, DMA-suitable contiguity).
  • Zone allocator (zalloc): per-type slab caches. Each zone holds elements of one fixed size; allocation pops the free-list head; per-CPU caches eliminate lock contention on the hot path. Example zones: ipc_port_zone (88 B), vnodes (272 B), kalloc.128 (128 B), kalloc.512 (512 B), threads (1024 B).
  • Page-level VM: raw VM pages via kmem_alloc and kernel_memory_allocate (osfmk/vm/vm_kern.c).

The VM articles so far have covered how XNU manages memory for userspace — vm_map, pmap, the compressor, jetsam. The kernel needs to allocate its own data structures too: proc records, vm_objects, ipc_kmsgs, every single object you've encountered in this series. That's a separate allocator tier.

This article walks XNU's kernel-side allocator stack: the zone allocator at the bottom, kalloc as the general-purpose heap on top, and the per-type slab caches that make hot-path allocation cheap.

The page is the unit, but no one allocates pages directly

The lowest VM primitive is the page (4 KB or 16 KB depending on architecture). The kernel rarely allocates raw pages for itself — page granularity is too coarse for most kernel data structures (a vnode is a few hundred bytes; an ipc_kmsg even smaller).
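To make the granularity mismatch concrete, here is a small arithmetic sketch. It assumes the 272-byte vnode element size shown in the sample zprint output in this article; carve_page is a hypothetical helper written for illustration, not an XNU function.

```c
#include <assert.h>
#include <stddef.h>

/* Result of carving one page into fixed-size elements. */
struct carve_result {
    size_t elements;   /* whole elements that fit in the page */
    size_t leftover;   /* trailing bytes too small to hold an element */
};

/* Hypothetical helper (not part of XNU): split a page of page_size
 * bytes into as many elem_size-byte elements as will fit. */
static struct carve_result carve_page(size_t page_size, size_t elem_size)
{
    assert(elem_size > 0 && elem_size <= page_size);
    struct carve_result r;
    r.elements = page_size / elem_size;
    r.leftover = page_size % elem_size;
    return r;
}
```

With 272-byte elements, a 4 KB page holds 15 of them with 16 bytes left over, and a 16 KB page holds 60 with 64 bytes left over. Handing a whole page to each object would waste well over 90% of it.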

So XNU stacks two more layers on top:

raw pages (vm_page)
       ↓ wholesale-to-retail
zone allocator (zalloc) — per-type slab caches
       ↓ general-purpose facade
kalloc — variable-size allocations, backed by zones

Drivers, the IPC subsystem, the scheduler — they all allocate from these layers. Tools like zprint(1) let you see every zone live.

Zone allocator (zalloc) — the per-type slab

A zone is a slab cache for objects of a single size or type. Each zone:

  • Owns a list of pages it has allocated from the VM.
  • Carves those pages into fixed-size elements (the type's size).
  • Maintains a free-list of available elements.
  • Allocates / frees in O(1) by pushing/popping the free-list head.

Source: apple-oss-distributions/xnu, osfmk/kern/zalloc.c. The zone allocator, XNU's foundational kernel allocator.
Source: apple-oss-distributions/xnu, osfmk/kern/zalloc.h. zone_t and the zalloc / zfree / zone_create APIs.

When a subsystem needs to allocate many objects of the same type, it creates a dedicated zone at boot:

ipc_object_zones[IOT_PORT] = zone_create("ipc ports", sizeof(struct ipc_port), …);

Then it uses zalloc(zone) / zfree(zone, ptr) for every allocation. The hot path is a free-list pop: a handful of instructions.
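The free-list mechanics can be sketched in userspace C. This is a minimal illustration, not XNU's implementation: the toy_ names are hypothetical, malloc stands in for the VM page allocator, and the sketch has no locking and never returns chunks to the system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A free element stores the link to the next free element in its own
 * first bytes, so the free-list costs no extra memory. */
struct toy_free_elem {
    struct toy_free_elem *next;
};

/* Hypothetical userspace stand-in for a zone; not XNU's zone structure. */
struct toy_zone {
    const char *name;
    size_t elem_size;                 /* fixed size of every element */
    size_t chunk_size;                /* bytes requested per refill */
    struct toy_free_elem *free_list;  /* head of the free-list */
    size_t cur_count;                 /* elements currently handed out */
};

static void toy_zone_init(struct toy_zone *z, const char *name,
                          size_t elem_size, size_t chunk_size)
{
    /* Round up so each element can hold the link and stays pointer-aligned. */
    size_t align = sizeof(struct toy_free_elem);
    if (elem_size < align)
        elem_size = align;
    elem_size = (elem_size + align - 1) / align * align;
    assert(chunk_size >= elem_size);
    z->name = name;
    z->elem_size = elem_size;
    z->chunk_size = chunk_size;
    z->free_list = NULL;
    z->cur_count = 0;
}

/* Slow path: get one chunk (malloc stands in for a VM page) and carve
 * it into elements, pushing each onto the free-list. */
static int toy_zone_refill(struct toy_zone *z)
{
    char *chunk = malloc(z->chunk_size);
    if (chunk == NULL)
        return -1;
    size_t n = z->chunk_size / z->elem_size;
    for (size_t i = 0; i < n; i++) {
        struct toy_free_elem *e =
            (struct toy_free_elem *)(chunk + i * z->elem_size);
        e->next = z->free_list;
        z->free_list = e;
    }
    return 0;
}

/* Fast path: pop the free-list head. */
static void *toy_zalloc(struct toy_zone *z)
{
    if (z->free_list == NULL && toy_zone_refill(z) != 0)
        return NULL;
    struct toy_free_elem *e = z->free_list;
    z->free_list = e->next;
    z->cur_count++;
    return e;
}

/* Free: push the element back; no size argument is needed. */
static void toy_zfree(struct toy_zone *z, void *p)
{
    struct toy_free_elem *e = p;
    e->next = z->free_list;
    z->free_list = e;
    z->cur_count--;
}
```

A caller would run toy_zone_init(&z, "toy ports", 88, 4096) once, then pair every toy_zalloc(&z) with a toy_zfree(&z, p). Because the list is LIFO, the most recently freed element is the next one handed out. That is good for cache warmth, and it is also why real allocators add quarantine modes for catching use-after-free bugs.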

Why a separate zone per type? A few reasons:

  • Predictable size — every element in the zone is exactly N bytes, no fragmentation.
  • Locality — elements of the same type cluster on the same pages; cache behavior is better.
  • Quarantine and use-after-free detection — XNU's zone allocator has a "Guard" mode that doesn't immediately reuse freed elements, helping catch UAF bugs.
  • Per-zone statistics: zprint shows you exactly how much memory each subsystem is using.

zprint output looks like:

zone name    elt_size  cur_size  max_size  cur_count  ...
ipc ports          88    524288         ∞       5957
vnodes            272   3211264         ∞      11800
threads          1024   4194304         ∞       4096

This is the most useful kernel-memory observability tool on the system. If a kernel data structure is leaking, you'll see its zone's cur_count climbing.

kalloc — the general-purpose heap

Not every allocation has a dedicated zone. For one-off allocations of arbitrary size — kernel strings, temporary buffers, the kernel's equivalent of void *p = malloc(n) — XNU provides kalloc:

Source: apple-oss-distributions/xnu, osfmk/kern/kalloc.c. kalloc, the general-purpose kernel allocator. Backed by a pool of zones at power-of-two sizes.

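kalloc(n) rounds n up to the next size class (roughly powers of two, with some intermediate sizes) and allocates from that class's underlying zone. There's a zone for 16-byte allocations, one for 32-byte, one for 64-byte, and so on up to a largest bucket of a few kilobytes to a few tens of kilobytes, depending on platform and XNU release. Beyond that, kalloc_large allocates whole pages from the VM.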

char *buf = kalloc(128);  // rounds up to 128-byte bucket
kfree(buf, 128);          // caller has to remember the size!

Note the awkward kfree(ptr, size): the caller passes the original allocation size so that kfree knows which bucket to return the memory to. libc's free(ptr) recovers the size from allocator metadata; the kernel's kfree instead makes the caller do the bookkeeping, which keeps per-allocation overhead low.
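A rough sketch of the size-class lookup shows why the size is enough to find the bucket on free. It assumes pure power-of-two classes starting at 16 bytes and an 8 KB cutoff chosen only for illustration; the real kalloc size table differs, and the toy_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MIN_BUCKET 16u     /* smallest size class in this sketch */
#define TOY_MAX_BUCKET 8192u   /* assumed cutoff, not XNU's actual limit */

/* Round n up to a power-of-two size class. Returns 0 if the request is
 * too large for any bucket (the caller would fall back to whole pages). */
static size_t toy_bucket_size(size_t n)
{
    if (n > TOY_MAX_BUCKET)
        return 0;
    size_t size = TOY_MIN_BUCKET;
    while (size < n)
        size <<= 1;
    return size;
}

/* Index of the bucket serving n bytes: 0 for 16, 1 for 32, 2 for 64,
 * and so on. Returns -1 if n exceeds the largest bucket. */
static int toy_bucket_index(size_t n)
{
    size_t size = toy_bucket_size(n);
    if (size == 0)
        return -1;
    assert(size >= TOY_MIN_BUCKET);
    int idx = 0;
    for (size_t s = TOY_MIN_BUCKET; s < size; s <<= 1)
        idx++;
    return idx;
}
```

Allocation and free run the same computation: a 100-byte request and a later free with size 100 both map to the 128-byte bucket, so no per-allocation header is needed. The cost is that a caller who passes the wrong size on free returns the memory to the wrong bucket.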

Why two layers?

You might ask: if kalloc is just a facade over zones, why isn't everything kalloc?

The answer is correctness and performance:

  • Type-safety with named zones: when an ipc_port allocation comes out of ipc_object_zones[IOT_PORT], the kernel knows everything in that zone is an ipc_port. UAF or type confusion is detectable.
  • Per-type telemetry: when you ship a kernel with thousands of zones, you can answer "where is memory going?" precisely. zprint shows exactly which subsystem is bloating.
  • No size argument on free: zone allocation knows its element size; zfree doesn't need the caller to track it.

So: subsystems that allocate many instances of a single type use a dedicated zone. Subsystems that allocate one-off buffers use kalloc.

Wired vs unwired

Zone and kalloc allocations are wired: they live in physical RAM and are never paged out. The kernel's core data structures can't be pageable because the pageout path itself needs to allocate kernel memory; a circular dependency there would deadlock the system. (Pageable kernel memory does exist, for example via IOMallocPageable, but it is the exception.)

This is why kernel memory growth shows up as "wired memory" in Activity Monitor. A kernel leak doesn't compress, doesn't swap, doesn't trigger jetsam (well — eventually it does indirectly, because no memory is left for userspace).

The IOMalloc family

IOKit drivers use a slightly different facade — IOMalloc(n) / IOFree(ptr, n) — which under the hood calls kalloc for small sizes and allocates pages directly for large sizes. The IO* wrappers also exist for contiguous physical memory, DMA-suitable memory, etc.

Source: apple-oss-distributions/xnu, iokit/Kernel/IOLib.cpp. IOMalloc / IOFree / IOMallocContiguous, the driver-facing memory APIs.

A DMA buffer needs to be contiguous physical memory (the device's DMA engine sees physical addresses, not virtual). IOMallocContiguous allocates such a buffer; you can't get this from kalloc, which makes no contiguity guarantees.

A leak walked

Suppose a driver leaks IOKit allocations. The path to diagnose:

  1. vm_stat shows wired memory climbing.
  2. zprint shows a specific zone's cur_count growing without bound.
  3. The zone name identifies the subsystem (e.g., IOMemoryDescriptor).
  4. The source code of that subsystem reveals which IOMalloc isn't paired with an IOFree.
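Step 2 amounts to comparing two snapshots of per-zone counts taken some time apart. The sketch below assumes the zprint rows have already been parsed into (name, cur_count) pairs; the struct and function names are hypothetical, and parsing zprint's text output is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of a zone snapshot: only the fields this sketch needs. */
struct zone_sample {
    const char   *name;
    unsigned long cur_count;
};

/* Look up a zone by name in a snapshot; returns NULL if it is absent. */
static const struct zone_sample *find_zone(const struct zone_sample *snap,
                                           size_t n, const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(snap[i].name, name) == 0)
            return &snap[i];
    }
    return NULL;
}

/* Return the name of the zone whose cur_count grew the most between the
 * two snapshots, or NULL if no zone grew. A zone missing from the first
 * snapshot is treated as having started at zero. */
static const char *fastest_growing_zone(const struct zone_sample *before,
                                        size_t nb,
                                        const struct zone_sample *after,
                                        size_t na)
{
    const char *worst = NULL;
    unsigned long worst_growth = 0;
    for (size_t i = 0; i < na; i++) {
        const struct zone_sample *old = find_zone(before, nb, after[i].name);
        unsigned long base = old ? old->cur_count : 0;
        if (after[i].cur_count > base &&
            after[i].cur_count - base > worst_growth) {
            worst_growth = after[i].cur_count - base;
            worst = after[i].name;
        }
    }
    return worst;
}
```

A single comparison only shows growth, not a leak. A zone that keeps topping this list across many snapshots while the workload is steady is the one worth investigating.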

Userspace leaks live in some app's heap, where tools like leaks and malloc_history apply. Kernel leaks are easier to attribute to a subsystem but harder to attribute to a call site: Apple's internal kernel has additional KASAN-style instrumentation for this; the shipping kernel doesn't.

Lock-free fast paths

The modern XNU zone allocator has per-CPU caches: each CPU keeps a small free-list of elements it can pop without taking a global lock. Only when the per-CPU cache empties does the slow path take the zone's lock, refill the local cache, and retry.

The result: high-frequency allocations (small ipc_kmsgs during a Mach storm, mbufs during heavy network I/O) are essentially lock-free in the common case.
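The fast-path/slow-path split can be sketched as follows. This is a single-threaded illustration under stated assumptions: the CPU index is passed in explicitly, the zone lock is represented only by a counter (lock_acquisitions) so the effect is observable, and every toy_ name and constant is hypothetical rather than taken from XNU.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NCPU      4    /* assumed CPU count for the sketch */
#define TOY_MAG_SIZE  8    /* elements one per-CPU cache can hold */
#define TOY_POOL_SIZE 64   /* total elements in the shared zone */

struct toy_cpu_cache {
    void  *elems[TOY_MAG_SIZE];
    size_t count;
};

struct toy_cached_zone {
    char   storage[TOY_POOL_SIZE][32];  /* backing elements, 32 bytes each */
    void  *shared[TOY_POOL_SIZE];       /* shared free stack (lock-protected) */
    size_t shared_count;
    size_t lock_acquisitions;           /* stands in for taking the zone lock */
    struct toy_cpu_cache cpu[TOY_NCPU];
};

static void toy_cached_zone_init(struct toy_cached_zone *z)
{
    z->shared_count = 0;
    z->lock_acquisitions = 0;
    for (size_t i = 0; i < TOY_POOL_SIZE; i++)
        z->shared[z->shared_count++] = z->storage[i];
    for (size_t c = 0; c < TOY_NCPU; c++)
        z->cpu[c].count = 0;
}

/* Slow path: "take the lock" once and move up to a full cache's worth
 * of elements from the shared stack into this CPU's cache. */
static void toy_refill(struct toy_cached_zone *z, struct toy_cpu_cache *cc)
{
    z->lock_acquisitions++;
    while (cc->count < TOY_MAG_SIZE && z->shared_count > 0)
        cc->elems[cc->count++] = z->shared[--z->shared_count];
}

static void *toy_cached_alloc(struct toy_cached_zone *z, size_t cpu)
{
    assert(cpu < TOY_NCPU);
    struct toy_cpu_cache *cc = &z->cpu[cpu];
    if (cc->count == 0)
        toy_refill(z, cc);            /* slow path */
    if (cc->count == 0)
        return NULL;                  /* zone exhausted */
    return cc->elems[--cc->count];    /* fast path: no lock */
}

static void toy_cached_free(struct toy_cached_zone *z, size_t cpu, void *p)
{
    assert(cpu < TOY_NCPU);
    struct toy_cpu_cache *cc = &z->cpu[cpu];
    if (cc->count == TOY_MAG_SIZE) {
        /* Slow path: local cache is full, return to the shared stack. */
        z->lock_acquisitions++;
        z->shared[z->shared_count++] = p;
        return;
    }
    cc->elems[cc->count++] = p;       /* fast path: no lock */
}
```

In this sketch, eight allocations on one CPU cost a single lock acquisition, and freeing and reallocating those same eight elements costs none. A real implementation also has to pin the thread to its CPU (for example by disabling preemption) while it touches the per-CPU cache, which the sketch sidesteps by being single-threaded.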

Source: apple-oss-distributions/xnu, osfmk/kern/zcache.c. Per-CPU zone caches, the fast path for hot allocators.

What surprises newcomers

  • Most kernel allocations are slab-style, not heap-style. The "kernel heap" is mostly a thin facade over a tier of slab caches.
  • kfree requires the size. This is intentional — saves per-allocation bookkeeping.
  • Almost all kernel memory is wired. The kernel can't page out its core state; that's why kernel leaks are catastrophic.
  • zprint is your friend. If something on your Mac is bloating wired memory, zprint will tell you which subsystem.

Source: apple-oss-distributions/xnu, osfmk/kern/zone_internal.h. The internal zone structure: head/tail pointers, locks, statistics counters.
Source: apple-oss-distributions/xnu, osfmk/vm/vm_kern.c. kmem_alloc, the very-low-level allocator that gives raw page ranges to zones and kalloc.

Then re-read the virtual memory overview: once you've seen the kernel-side allocators, the user-side VM machinery reads as the other half of the same system, the half that is exposed to userspace.
