
libdispatch internals: how GCD actually dispatches your blocks

Dispatch queues, the workqueue thread pool, voucher adoption, async vs sync — what happens between your dispatch_async call and your block running on a worker thread.

[Figure: libdispatch / GCD architecture. User-created queues (main queue, custom serial and concurrent queues, the Swift cooperative pool) route to the global queues, one per QoS class, which route to the kernel-managed workqueue thread pool (bsd/kern/pthread_shims.c · libpthread/src/qos.c). The kernel creates and destroys worker threads per QoS bucket based on demand and system load. A side panel, "Voucher propagation", notes that every dispatch_async captures the calling thread's voucher and that the worker adopts it while running the block.]

Grand Central Dispatch (GCD) is the userland concurrency framework everyone on macOS/iOS uses, whether they know it or not. Every dispatch_async, every URLSession.dataTask completion handler, every Swift Task, every Objective-C API that delivers its callback on a queue: they all route through libdispatch.

This article walks through what's actually happening underneath: the queue layers, the worker thread pool, voucher adoption, and the kernel-side machinery that keeps the CPUs busy when a block stalls its worker thread.

Queues all the way down

A dispatch queue is the basic unit of serialization. Submit blocks to a serial queue and they run in submission order, one at a time:

dispatch_queue_t q = dispatch_queue_create("com.acme.work", DISPATCH_QUEUE_SERIAL);
dispatch_async(q, ^{ /* runs eventually, after previous blocks */ });

Concurrent queues let many blocks run in parallel:

dispatch_queue_t q = dispatch_queue_create("com.acme.parallel", DISPATCH_QUEUE_CONCURRENT);
dispatch_async(q, ^{ /* may run in parallel with others on the same queue */ });

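Every process has a set of global concurrent queues, one per QoS class, obtained with dispatch_get_global_queue. Four of them also have a legacy priority name:

  • DISPATCH_QUEUE_PRIORITY_HIGH (USER_INITIATED).
  • DISPATCH_QUEUE_PRIORITY_DEFAULT (DEFAULT).
  • DISPATCH_QUEUE_PRIORITY_LOW (UTILITY).
  • DISPATCH_QUEUE_PRIORITY_BACKGROUND (BACKGROUND).

USER_INTERACTIVE has no legacy priority constant; ask for it with QOS_CLASS_USER_INTERACTIVE.

User-created queues target one of the global queues. Your serial queue might target the default global queue: submitted blocks run on the target's worker threads, but your queue's own state guarantees that only one of them runs at a time. That state is an atomic state machine rather than a mutex.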

Source: apple-oss-distributions/libdispatch, src/queue.c: dispatch_queue_create and the queue lifecycle.

The workqueue — kernel-side thread pool

The actual threads that run your blocks come from a kernel-managed pool called the workqueue (sometimes spelled pthread_workqueue). There is one pool per process, with buckets keyed by QoS.

When libdispatch needs to run a block:

  1. It tries to acquire a free worker thread from the right QoS bucket.
  2. If none is free and the pool has room, the kernel spawns a new worker thread.
  3. The new (or reused) thread starts running blocks from the queue.

Source: apple-oss-distributions/xnu, bsd/kern/pthread_shims.c: the kernel side of the workqueue (thread allocation, QoS bucketing, ASR (Apple Sensor Rate?) integration).
Source: apple-oss-distributions/libpthread, src/qos.c: the userspace side that requests workqueue threads from the kernel.

The kernel-managed pool is shared across libdispatch and direct pthread_workqueue users; the kernel manages thread counts dynamically based on system load and CPU count.

A key property: workqueue threads are kernel-managed. Userspace can't create or destroy them directly; libdispatch can only ask for more, and the kernel decides when to create, reuse, or retire them.

Why dispatch_sync from a worker is dangerous

dispatch_sync(target, block) blocks the calling thread until block has run on target. That is safe only as long as nothing that target must wait for is itself waiting on the caller.

The classic mistake is a cycle: a UI thread submits work to a serial queue with dispatch_sync, and the block running on that queue needs a result from the UI thread, which is parked inside dispatch_sync. Neither side can make progress. The same thing happens between two worker threads whose serial queues dispatch_sync onto each other.

libdispatch has heuristics to detect some of these: DISPATCH_DEBUG=1 in the environment surfaces warnings about cycles. Swift async/await avoids the problem at the language level, because await suspends the task instead of blocking the thread.

Voucher adoption

Every dispatch_async implicitly captures the current voucher (Mach voucher) — the QoS, resource accounting bucket, and other context the calling thread is currently using. When the block runs on a worker, the worker adopts the captured voucher.

This is how QoS propagates through async work:

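  1. UI thread at USER_INTERACTIVE QoS calls dispatch_async(work_queue, ^{ ... }), where work_queue has no QoS of its own.
  2. The async captures the current voucher and the caller's QoS. USER_INTERACTIVE is downgraded to USER_INITIATED on capture, so follow-up work never competes with the UI thread itself.
  3. A worker thread runs the block. While running, the worker has the captured voucher adopted and runs at USER_INITIATED.
  4. When the block returns, the worker releases the voucher and reverts to its default state.

A queue that was given an explicit QoS keeps it by default: a block sent to the BACKGROUND global queue runs at BACKGROUND unless it was created with DISPATCH_BLOCK_ENFORCE_QOS_CLASS.

The result: a user-interactive operation's async followups inherit the right priority. The system stays responsive even when work chains across queues.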

Source: apple-oss-distributions/libdispatch, src/voucher.c: voucher capture, adoption, and release for dispatch blocks.

Dispatch sources — kqueue-backed event handlers

A dispatch source is a libdispatch object that fires a handler when a system event occurs:

  • DISPATCH_SOURCE_TYPE_READ / WRITE — file descriptor ready for I/O.
  • DISPATCH_SOURCE_TYPE_VNODE — vnode changes (file rename, write, delete).
  • DISPATCH_SOURCE_TYPE_PROC — process exit/fork.
  • DISPATCH_SOURCE_TYPE_TIMER — periodic or one-shot timer.
  • DISPATCH_SOURCE_TYPE_MACH_RECV — Mach port has a message.
  • DISPATCH_SOURCE_TYPE_DATA_ADD / OR — user-triggered coalescing.

Under the hood, every dispatch source that watches a kernel event is a kqueue registration. libdispatch maintains a per-process kqueue, registers the source's event filter, and runs the source's handler block on its target queue when the kqueue fires. The DATA_ADD / OR sources are the exception: they are triggered from userspace with dispatch_source_merge_data and need no kernel event.

This is why macOS / iOS frameworks that need to react to I/O, such as URLSession and FileHandle observers, can build on dispatch sources rather than spawning threads to block in read().

Concurrent work and the wide-vs-narrow tradeoff

Submitting 10,000 blocks to a concurrent queue does NOT spawn 10,000 threads. The workqueue throttles concurrency based on:

  • CPU count (typically a worker per logical CPU plus a small overhead).
  • QoS — higher QoS gets more workers earlier.
  • Memory pressure — under pressure, the pool shrinks.

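dispatch_apply (a parallel for-loop) is the idiomatic way to fan a known number of iterations out across the pool, but it does not raise the limit: the pool still caps concurrency based on hardware availability. At the other extreme, submitting 10K blocks to a serial queue runs them strictly one at a time, never on more than one worker at once.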

Swift concurrency on top

Swift's async/await and Task are built on top of libdispatch's queue infrastructure. A Task runs on a "cooperative pool" which is itself a workqueue bucket; await suspends without blocking a worker (it's a continuation point); actor-isolated calls dispatch onto the actor's serial executor.

The end result: a Swift await doesn't block a worker thread. It frees the worker to run something else, then resumes on (potentially) a different worker when the awaited operation completes. The worker count stays bounded regardless of how many tasks are in flight.

This is the foundation of Swift Concurrency's scalability claim — "millions of tasks, bounded threads."

Inspecting dispatch behavior

DISPATCH_DEBUG=1 /path/to/your-app                # warn on suspicious patterns
DISPATCH_VOUCHER_DEBUG=1 /path/to/your-app        # trace voucher capture/adoption

In Instruments → System Trace, "Dispatch Activity" shows every block enqueue/dequeue. Useful for diagnosing "why is my UI laggy" — often the answer is "a serial queue is overloaded and the next block is blocked behind a slow earlier one."

What surprises newcomers

  • A serial queue runs blocks on whichever worker thread is free. Same queue, potentially different threads across blocks. (dispatch_get_specific can identify "the" queue if you need that.)
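  • dispatch_sync from a block onto the serial queue it is already running on can never complete. Current libdispatch detects this case and crashes with "BUG IN CLIENT OF LIBDISPATCH: dispatch_sync called on queue already owned by current thread" instead of hanging.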
  • Voucher adoption is invisible unless you're looking for it. It's what makes the system "feel responsive" without anyone having to manually pass QoS hints through every function call.
  • Dispatch sources are kqueue events. The same kernel event filter that powers low-level network apps powers the high-level NSURLSession.

Source: apple-oss-distributions/libdispatch, src/source.c: dispatch source implementation (kqueue integration, coalescing, handler dispatch).
Source: apple-oss-distributions/libdispatch, src/queue_internal.h: the queue data structures, the atomic state machines that drive enqueue/dequeue.

See also the kernel synchronization article: libdispatch uses many of the same primitives the kernel uses internally.

Related

Inside dyld — Loader data structures, the modern chained-fixups format, PrebuiltLoaders in the shared cache, and what dlopen actually does at runtime.
Spinlocks, mutexes, reader-writer locks, and lock-class groups — the synchronization primitives XNU offers, when each is appropriate, and how the per-CPU caches stay fast under contention.
From double-click to first window: LaunchServices, launchd, posix_spawn, AMFI, dyld, the shared cache, sandbox profile installation, the runloop. Six subsystems in three seconds.