
IOSurface in depth: zero-copy between every coprocessor

The IOKit object that lets CPU, GPU, ANE, ISP, and media engines share the same physical pages. The unifying primitive behind unified memory's performance story.


Nearly every macOS framework that moves large buffers between subsystems uses IOSurface under the hood. Metal renders into it. AVFoundation captures video frames into it. Core ML reads tensors from it. XPC services pass it across processes. That unification, with every coprocessor able to map the same pages, is what makes unified memory's performance story possible.

This article goes deeper than the IOSurface glossary entry: the kernel implementation, the mapping mechanics, and the operations that cross process boundaries.

What IOSurface actually is

An IOSurface is, fundamentally, a chunk of memory that can be mapped multiple ways:

  • As CPU virtual memory (read/write via IOSurfaceGetBaseAddress, bracketed by IOSurfaceLock and IOSurfaceUnlock).
  • As GPU memory referenced by Metal MTLBuffer / MTLTexture resources.
  • As Neural Engine tensor memory referenced by Core ML operations.
  • As ISP / media engine buffer memory for video pipelines.

It's allocated from the unified DRAM pool: the same physical pages, multiple mappings. Each mapping has its own page-table entries (pmap entries on the CPU side, IOMMU (DART) entries for each device that maps it); the pages themselves are shared.

Source: apple-oss-distributions/xnu, iokit/Kernel/IOMemoryDescriptor.cpp (IOMemoryDescriptor, the kernel-side memory abstraction IOSurface builds on).

The data structure

In kernel terms, an IOSurface is:

  • An IOMemoryDescriptor describing the physical pages.
  • A set of properties (width, height, pixel format, bytes-per-row, plane info for multi-plane formats like YUV).
  • A reference count.
  • A lock state: clients lock the surface before access, and writers get exclusive access.
  • A handle id — globally unique across the system, used for sharing.

Properties matter because the surface knows enough about its memory to let consumers pick the right access pattern. A 4K BGRA framebuffer has different optimal access characteristics than a 1080p YUV420 video frame.

Cross-process sharing

A handle to an IOSurface can be sent via Mach IPC. The receiver process gets the same physical pages mapped into its own address space (at a different virtual address). Both processes see the same data.

This is how:

  • WindowServer receives every app's framebuffer — apps render into IOSurfaces and send the handles via XPC.
  • AVFoundation capture sessions deliver frames to your app — the capture daemon writes into an IOSurface, sends the handle.
  • CIImage can be backed by an IOSurface that lives in another process's GPU memory.
  • CGImage instances often wrap IOSurface backing for zero-copy across framework boundaries.

The reference counting works across processes: the kernel tracks every task that holds a reference, and the surface is destroyed when the last reference drops. If a process crashes, the kernel releases that task's references, so nothing leaks. The surface stays alive for as long as other processes still hold it.

Lock domains

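IOSurface has a fine-grained locking model. On the CPU side, access is bracketed by IOSurfaceLock and IOSurfaceUnlock, and kIOSurfaceLockReadOnly signals that the data won't be modified. Conceptually, a surface can be locked for: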

  • CPU read — multiple CPU readers OK, no writers.
  • CPU write — exclusive.
  • GPU read — multiple GPU readers OK.
  • GPU write — exclusive.
  • Combinations — cross-domain locks for cases like "GPU has written, CPU about to read."

The locks are where cache maintenance happens. When the GPU finishes writing and the CPU wants to read, the cache flush or invalidate that makes the GPU's writes visible to the CPU runs automatically at the lock transition.

On Apple Silicon's unified memory with cache coherency between CPU and GPU, many of these transitions are free. On older Macs with discrete GPUs, the same code worked but cost real DMA bandwidth.

Plane support

For pixel formats like YUV that have multiple planes (one for luminance, one or two for chrominance), IOSurface natively supports the plane structure:

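#include <IOSurface/IOSurfaceRef.h>
IOSurfaceLock(surface, kIOSurfaceLockReadOnly, NULL);          // CPU access must be bracketed by lock/unlock
size_t planes = IOSurfaceGetPlaneCount(surface);                // e.g. 2 for bi-planar YUV420
void *y    = IOSurfaceGetBaseAddressOfPlane(surface, 0);        // Y (luma) plane
void *cbcr = IOSurfaceGetBaseAddressOfPlane(surface, 1);        // interleaved CbCr plane
size_t yRowBytes = IOSurfaceGetBytesPerRowOfPlane(surface, 0);  // stride of the Y plane
IOSurfaceUnlock(surface, kIOSurfaceLockReadOnly, NULL);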

Each plane has its own base address, stride, and pixel format. A bi-planar YUV 4:2:2 surface knows that the Y plane is full-resolution 8-bit and that the chroma plane is half-width and full-height, with 16-bit interleaved CbCr pairs. In 4:2:0, the chroma height is halved as well. Codecs and ISP outputs use this structure directly, without a separate allocation for each plane.

The IOSurface root and IOKit integration

IOSurfaceRoot is the kernel-side service that owns the global allocation and lookup table. It's a singleton IOService registered at boot.

When userspace calls IOSurfaceCreate:

  1. The IOSurface framework calls into the kernel through the IOSurfaceRoot user client.
  2. The root allocates pages from the VM (via IOMallocPageable typically).
  3. It creates the IOSurface kernel object and registers it in per-process and global lookup tables.
  4. It returns the handle to userspace.

Source: apple-oss-distributions/xnu, iokit/Kernel/IOService.cpp (the IOService base class; IOSurfaceRoot is an IOService).

For sending across processes:

  1. Process A wraps the surface in an XPC object (IOSurfaceCreateXPCObject) and places it in an XPC dictionary.
  2. The XPC serializer recognizes the IOSurface type and sends not just the handle ID but a port right that the receiver can use to look up the surface.
  3. Process B receives the XPC message, extracts the IOSurface (IOSurfaceLookupFromXPCObject), and gets its own mapping of the same pages.

This is much cheaper than a normal data copy. A 4K BGRA frame (3840 × 2160 × 4 bytes) is about 33 MB. A memcpy of that takes roughly 3 ms, and the limit is memory bandwidth, not storage speed. Sending an IOSurface handle takes roughly 10 µs.

What surprises newcomers

  • An IOSurface isn't special memory — it's just well-tracked DRAM with multiple mappings.
  • The "zero-copy" claim is literal. The sender writes, the receiver reads, the same physical bytes are involved.
  • IOSurface handles are globally unique — you can look up a surface from any process if you have its handle id and appropriate entitlements.
  • In-use IOSurface pages don't go through the regular VM compressor or swap, because IOKit wiring keeps them resident. Heavy IOSurface users therefore show high "wired memory" in Activity Monitor.

Inspecting IOSurface use

  • iosurface_list_all (private SPI in some debug builds) — list every IOSurface in the system.
  • vmmap <pid> — IOSurface mappings appear with type IOSurface.
  • lldb -p <pid>, then memory read <addr> with an address taken from vmmap, to read the bytes of a known IOSurface.
  • Activity Monitor → Memory — IOSurface-heavy apps (video editors, render-intensive Mac apps) show high wired memory.
Source: apple-oss-distributions/xnu, iokit/Kernel/IOMemoryDescriptor.cpp (the kernel memory abstraction layer all IOSurface mappings build on).

See also the Apple GPU and Metal article for the GPU's perspective on IOSurface. The AOP and coprocessors article explains why every coprocessor's driver uses IOSurface for buffer interchange.
