
The XNU scheduler: bands, QoS, and how Mach decides who runs

Real-time, fixed-priority, timeshare, idle — four scheduling classes, 128 priorities, and a QoS layer on top. Here's how XNU picks a thread to put on a core.

[Figure: XNU scheduling classes and QoS mapping. XNU's 128-priority range is partitioned into four scheduling classes: real-time at the top, then kernel, then user-mode timeshare, then idle. The QoS classes used by libdispatch and Swift Concurrency map onto bands within the timeshare range. Priority axis: 0 = idle, 127 = real-time.]

Every Mach thread has a priority. XNU's scheduler keeps a run queue per priority band, picks the highest-priority runnable thread, and gives it a quantum on a core. That's the one-line version. The interesting parts are the four classes that share those 128 priorities, the QoS API that's layered on top, and how the kernel keeps a foreground UI thread from being starved by a background backup job.
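To make the one-line version concrete, here is a minimal sketch of a priority-indexed run queue in C. It is not XNU's code: the names toy_thread, toy_runq, runq_enqueue, and runq_dequeue_highest are hypothetical, and the layout (one FIFO per priority plus a bitmap of non-empty levels) is an assumption chosen because it is a common way to find the highest runnable priority quickly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NPRI 128 /* priorities 0..127, matching the range in the article */

/* Hypothetical thread record; a real kernel thread structure is far larger. */
struct toy_thread {
    int pri;                 /* 0 = lowest, 127 = highest */
    struct toy_thread *next; /* FIFO link within one priority level */
};

/* One FIFO per priority level plus a bitmap marking non-empty levels. */
struct toy_runq {
    uint64_t bitmap[NPRI / 64];
    struct toy_thread *head[NPRI];
    struct toy_thread *tail[NPRI];
};

/* Append a runnable thread to the tail of its priority level. */
static void runq_enqueue(struct toy_runq *rq, struct toy_thread *t)
{
    assert(t->pri >= 0 && t->pri < NPRI);
    t->next = NULL;
    if (rq->tail[t->pri] != NULL)
        rq->tail[t->pri]->next = t;
    else
        rq->head[t->pri] = t;
    rq->tail[t->pri] = t;
    rq->bitmap[t->pri / 64] |= (uint64_t)1 << (t->pri % 64);
}

/* Remove and return the highest-priority runnable thread, or NULL if empty. */
static struct toy_thread *runq_dequeue_highest(struct toy_runq *rq)
{
    for (int word = NPRI / 64 - 1; word >= 0; word--) {
        if (rq->bitmap[word] == 0)
            continue;
        int bit = 63; /* scan down to the highest set bit in this word */
        while (((rq->bitmap[word] >> bit) & 1) == 0)
            bit--;
        int pri = word * 64 + bit;
        struct toy_thread *t = rq->head[pri];
        rq->head[pri] = t->next;
        if (rq->head[pri] == NULL) { /* level drained: clear its bit */
            rq->tail[pri] = NULL;
            rq->bitmap[word] &= ~((uint64_t)1 << bit);
        }
        t->next = NULL;
        return t;
    }
    return NULL;
}
```

Threads at the same priority come out in the order they were enqueued, so equal-priority threads take turns while anything at a higher level always goes first.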

The four classes

Threads belong to exactly one scheduling class. The class determines how the scheduler treats the thread's stated priority and how it ages over time.

Source: apple-oss-distributions/xnu, osfmk/kern/sched_prim.c. The scheduler entry points: pick the next thread, run it, preempt.
  • Real-time (THREAD_TIME_CONSTRAINT_POLICY). The thread declares "I need N ns of CPU every M ns." If the kernel can honor it, it does — over the timeshare classes. Used by Core Audio, the window server's render thread, and a small number of low-latency userspace daemons.
  • Fixed-priority (THREAD_EXTENDED_POLICY with timeshare set to FALSE). A priority that doesn't age. The thread stays at the level it asked for until it's reset.
  • Timeshare (THREAD_STANDARD_POLICY, or THREAD_EXTENDED_POLICY with timeshare set to TRUE). The default. Priorities decay under CPU use and recover when idle: the classic Unix "be fair" scheduler.
  • Idle (THREAD_IDLE_POLICY). Only runs when nothing else is runnable. Background indexing, cache warm-ups, deferred maintenance.
Source: apple-oss-distributions/xnu, osfmk/kern/thread_policy.c. thread_policy_set: the kernel-side implementation behind every thread_policy_set call from userspace.
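As a rough illustration of the real-time class, the sketch below asks for a time-constraint policy on the calling thread. thread_policy_set, THREAD_TIME_CONSTRAINT_POLICY, mach_timebase_info, and pthread_mach_thread_np are the macOS calls; the helper names ns_to_ticks and request_realtime are hypothetical, and the non-Apple branch exists only so the file compiles elsewhere. The "N ns every M ns" from the list corresponds to the computation and period fields, and the policy also takes a constraint (the deadline within each period).

```c
#include <assert.h>
#include <stdint.h>

#ifdef __APPLE__
#include <mach/mach.h>
#include <mach/mach_time.h>
#include <mach/thread_policy.h>
#include <pthread.h>
#endif

/* Convert nanoseconds to Mach absolute-time ticks. The timebase satisfies
 * ns = ticks * numer / denom, so ticks = ns * denom / numer. The policy
 * fields are 32-bit, hence the narrowing cast. */
static uint32_t ns_to_ticks(uint64_t ns, uint32_t numer, uint32_t denom)
{
    assert(numer != 0);
    return (uint32_t)(ns * denom / numer);
}

/* Ask for computation_ns of CPU in every period_ns, finishing within
 * constraint_ns of the start of each period. Returns 0 on success and
 * -1 on failure or on non-Apple platforms. */
static int request_realtime(uint64_t period_ns, uint64_t computation_ns,
                            uint64_t constraint_ns)
{
#ifdef __APPLE__
    mach_timebase_info_data_t tb;
    if (mach_timebase_info(&tb) != KERN_SUCCESS)
        return -1;

    thread_time_constraint_policy_data_t policy;
    policy.period = ns_to_ticks(period_ns, tb.numer, tb.denom);
    policy.computation = ns_to_ticks(computation_ns, tb.numer, tb.denom);
    policy.constraint = ns_to_ticks(constraint_ns, tb.numer, tb.denom);
    policy.preemptible = TRUE;

    kern_return_t kr = thread_policy_set(
        pthread_mach_thread_np(pthread_self()),
        THREAD_TIME_CONSTRAINT_POLICY,
        (thread_policy_t)&policy,
        THREAD_TIME_CONSTRAINT_POLICY_COUNT);
    return kr == KERN_SUCCESS ? 0 : -1;
#else
    (void)period_ns;
    (void)computation_ns;
    (void)constraint_ns;
    return -1; /* the Mach thread policy API is not available here */
#endif
}
```

A caller with a 10 ms period that needs 2 ms of CPU within the first 5 ms would call request_realtime(10000000, 2000000, 5000000) and check the return value, since the kernel is free to refuse a request it cannot honor.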

The 128-priority range is partitioned: real-time gets the top band, then kernel threads, then user-mode timeshare, then idle at the bottom. A real-time thread is, by construction, always picked over any timeshare thread that's eligible to run.

Where QoS fits in

If you've called dispatch_async or used a Task in Swift Concurrency, you've selected a QoS class without thinking about it. QoS is a userspace-facing abstraction that maps onto Mach's scheduling classes:

QoS class          Mach mapping (approx.)
USER_INTERACTIVE   Timeshare, near the top of the band
USER_INITIATED     Timeshare, high
DEFAULT            Timeshare, mid
UTILITY            Timeshare, low
BACKGROUND         Idle band
MAINTENANCE        Idle band, deeper
Source: apple-oss-distributions/libdispatch, src/queue.c. GCD's queue-priority → QoS → thread_policy mapping.

QoS isn't only about the CPU. The kernel propagates it into I/O priority (the disk scheduler honors it), into memory pressure (background QoS makes a thread the first to surrender pages under jetsam), and into IPC (when a high-QoS sender waits on a reply from a lower-QoS receiver, the kernel applies a QoS override to the receiver so the sender isn't stuck behind a starved thread).

This last bit — QoS override — is the magic that makes the system stay responsive. If a user-interactive thread blocks on a Mach port waiting for a reply from a daemon at UTILITY QoS, the kernel temporarily boosts the daemon to USER_INTERACTIVE for as long as the reply is owed. Without this, every IPC round-trip would risk a priority inversion.
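The boost-while-a-reply-is-owed idea can be modeled in a few lines of C. This is a toy model, not the turnstile implementation: the types and functions (toy_thread, effective_qos, block_on, reply_to) are hypothetical, each thread is assumed to have at most one waiter, and wait chains are assumed to be acyclic.

```c
#include <assert.h>
#include <stddef.h>

/* Ordered so that a larger value means a more urgent QoS class. */
enum toy_qos {
    TOY_QOS_BACKGROUND,
    TOY_QOS_UTILITY,
    TOY_QOS_DEFAULT,
    TOY_QOS_USER_INITIATED,
    TOY_QOS_USER_INTERACTIVE
};

struct toy_thread {
    enum toy_qos requested;        /* what the thread asked for */
    enum toy_qos override;         /* boost pushed onto it by a waiter */
    struct toy_thread *waiting_on; /* thread that owes us a reply, or NULL */
};

/* A thread runs at the higher of its own QoS and any override. */
static enum toy_qos effective_qos(const struct toy_thread *t)
{
    return t->override > t->requested ? t->override : t->requested;
}

/* waiter blocks on owner: push the waiter's effective QoS down the chain
 * of threads that must make progress before the waiter can continue. */
static void block_on(struct toy_thread *waiter, struct toy_thread *owner)
{
    waiter->waiting_on = owner;
    enum toy_qos push = effective_qos(waiter);
    for (struct toy_thread *t = owner; t != NULL; t = t->waiting_on) {
        if (effective_qos(t) >= push)
            break; /* already at least this urgent, and so is its chain */
        t->override = push;
    }
}

/* The thread that waiter was blocked on has replied: drop its boost.
 * Simplification: one waiter per owner, so nothing needs recomputing. */
static void reply_to(struct toy_thread *waiter)
{
    assert(waiter->waiting_on != NULL);
    waiter->waiting_on->override = TOY_QOS_BACKGROUND;
    waiter->waiting_on = NULL;
}
```

With A at USER_INTERACTIVE waiting on B at UTILITY, and B waiting on C at BACKGROUND, both B and C report USER_INTERACTIVE from effective_qos until their replies are delivered, after which each falls back to its requested class.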

Source: apple-oss-distributions/xnu, osfmk/kern/turnstile.c. Turnstiles: the kernel data structure that tracks QoS promotion through IPC chains.

What "picking a thread" actually means

When a core needs to make a scheduling decision (quantum expired, IPI received, returning from an interrupt), it calls thread_select. The function walks the priority bands top-down, looking for a runnable thread. A few details that matter:

  1. Per-core run queues, but with a global one for affinity-free threads. The scheduler prefers to keep a thread on the core it last ran on (L1/L2 cache warmth) but will steal across cores when imbalance exceeds a threshold.
  2. Recommendation bits. Apple Silicon CPUs have performance cores (P-cores) and efficiency cores (E-cores). The scheduler tags threads with a recommendation — "you should run on P", "you should run on E", "you can run anywhere" — derived from QoS + recent behavior. P-cores prefer high-QoS threads; E-cores prefer background work.
  3. Coalitions. Threads from the same coalition (effectively, the same logical workload — a foreground app and its helpers) share scheduling state, so killing one doesn't unbalance the rest.
Source: apple-oss-distributions/xnu, osfmk/kern/coalition.c. Coalitions: group threads from the same logical app or daemon for joint scheduling decisions.
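The recommendation idea from item 2 can be sketched as a small core-selection function. Everything here is hypothetical (the enum names, pick_core, and the first-idle-match policy); XNU's real placement logic weighs far more state. The sketch only shows the shape of the rule: honor the recommendation when a matching core is idle, and otherwise take any idle core instead of leaving it unused.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_core_type { TOY_CORE_E, TOY_CORE_P };

enum toy_recommendation { TOY_REC_E, TOY_REC_P, TOY_REC_ANY };

/* Does a core of this type satisfy the thread's recommendation? */
static bool rec_matches(enum toy_recommendation rec, enum toy_core_type type)
{
    return rec == TOY_REC_ANY ||
           (rec == TOY_REC_P && type == TOY_CORE_P) ||
           (rec == TOY_REC_E && type == TOY_CORE_E);
}

/* Return the index of the core to run on, or -1 if every core is busy.
 * Prefers an idle core that matches the recommendation; falls back to
 * the first idle core of the other type. */
static int pick_core(enum toy_recommendation rec,
                     const enum toy_core_type *types,
                     const bool *idle, int ncores)
{
    assert(ncores >= 0);
    int fallback = -1;
    for (int i = 0; i < ncores; i++) {
        if (!idle[i])
            continue;
        if (rec_matches(rec, types[i]))
            return i;
        if (fallback < 0)
            fallback = i; /* remember the first non-matching idle core */
    }
    return fallback;
}
```

On a hypothetical four-core layout of two E-cores followed by two P-cores, a thread recommended for E lands on a P-core only when both E-cores are busy and a P-core is idle.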

How aging works

A timeshare thread doesn't keep its declared priority indefinitely. The scheduler tracks recent CPU usage per thread and decays the effective priority of threads that have been running. The decay is exponential — a few quanta on-CPU costs the thread a few priority points; idling earns them back.
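A small model shows the shape of this aging. The constants below (8 usage units per quantum, a 5/8 decay per tick, one priority point per 4 usage units) are illustrative assumptions, not XNU's tuned values, and the names toy_ts_thread, charge_quantum, decay_tick, and effective_pri are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical timeshare thread: a declared priority plus decaying usage. */
struct toy_ts_thread {
    int base_pri;       /* the priority the thread asked for */
    uint32_t cpu_usage; /* recent CPU use, in arbitrary units */
};

/* Charge one quantum of on-CPU time to the thread. */
static void charge_quantum(struct toy_ts_thread *t)
{
    t->cpu_usage += 8;
}

/* Periodic decay: recent usage shrinks geometrically (here by 5/8),
 * so old CPU use is forgotten exponentially fast. */
static void decay_tick(struct toy_ts_thread *t)
{
    t->cpu_usage = t->cpu_usage * 5 / 8;
}

/* Effective priority: declared priority minus a usage penalty, floored at 0. */
static int effective_pri(const struct toy_ts_thread *t)
{
    assert(t->base_pri >= 0);
    int pri = t->base_pri - (int)(t->cpu_usage / 4);
    return pri < 0 ? 0 : pri;
}
```

In this model a thread at base priority 31 that runs three quanta drops to 25, climbs to 28 after one idle tick, and is back at 31 after four idle ticks.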

This is a large part of why your Mac stays usable when you launch a heavy build. The shell prompt has been mostly idle, so its timeshare priority is high. The compiler suddenly burns CPU, and its priority drops within milliseconds. By the time you Cmd-Tab away, the WindowServer's interactive thread will be picked over the compiler's worker pool.

Source: apple-oss-distributions/xnu, osfmk/kern/sched_dualq.c. The dual-queue scheduler: XNU's current implementation of timeshare aging.

Tools to see this on a live system

  • sample <pid> — stack samples a process for a few seconds. Reports which threads ran and how long.
  • spindump — system-wide hung-process analyzer. Useful when "system slowed to a crawl" — it'll show the QoS of every blocked thread.
  • Instruments → System Trace — the actual scheduler events (preemption, wake, QoS change) with sub-microsecond resolution.
  • thread_policy_get from a debugger — read back what class/priority a thread is in.

What surprises newcomers

  • The QoS API isn't an advisory hint. Setting QoS actually changes the Mach scheduling class. A BACKGROUND thread is in the idle band and will be preempted by any runnable timeshare work.
  • A higher priority doesn't mean "more CPU", it means "more of the CPU's attention when there's contention". A lone BACKGROUND thread on an otherwise-idle machine gets the same throughput as a USER_INTERACTIVE one.
  • QoS overrides propagate through IPC chains. If A waits on B, B waits on C, and A is USER_INTERACTIVE, all three are temporarily at that QoS until the chain unwinds.
  • The scheduler can override your recommendation. If you ask for E-core only and the system is idle on P-cores, you may end up on P anyway — the scheduler is allowed to break the recommendation when the alternative is leaving a core idle.

Walk a real preemption from interrupt to context switch:

Source: apple-oss-distributions/xnu, osfmk/kern/sched_clutch.c. The newer clutch scheduler: bucket-based, used on Apple Silicon for better P/E-core handling.
Source: apple-oss-distributions/xnu, osfmk/kern/processor.c. Per-core processor structure; where the run queue actually lives.

And read the Mach ports article again, this time noticing the QoS field in every message header. The scheduler is downstream of IPC.
