The XNU scheduler: bands, QoS, and how Mach decides who runs
Real-time, fixed-priority, timeshare, idle — four scheduling classes, 128 priorities, and a QoS layer on top. Here's how XNU picks a thread to put on a core.
Every Mach thread has a priority. XNU's scheduler keeps a run queue per priority band, picks the highest-priority runnable thread, and gives it a quantum on a core. That's the one-line version. The interesting parts are the four classes that share those 128 priorities, the QoS API that's layered on top, and how the kernel keeps a foreground UI thread from being starved by a background backup job.
The four classes
Threads belong to exactly one scheduling class. The class determines how the scheduler treats the thread's stated priority and how it ages over time.
Source: apple-oss-distributions/xnu, osfmk/kern/sched_prim.c. The scheduler entry points: pick the next thread, run it, preempt.
- Real-time (THREAD_TIME_CONSTRAINT_POLICY). The thread declares "I need N ns of CPU every M ns." If the kernel can honor the request, it does, ahead of the timeshare classes. Used by Core Audio, the window server's render thread, and a small number of low-latency userspace daemons.
- Fixed-priority (THREAD_EXTENDED_POLICY with timeshare set to false). A priority that doesn't age. The thread stays at the level it asked for until it's reset.
- Timeshare (THREAD_EXTENDED_POLICY with timeshare set to true). The default. Priorities decay under CPU use and recover when idle: the classic Unix "be fair" scheduler.
- Idle (THREAD_IDLE_POLICY). Only runs when nothing else is runnable. Background indexing, cache warm-ups, deferred maintenance.
The 128-priority range is partitioned: real-time gets the top band, then kernel threads, then user-mode timeshare, then idle at the bottom. A real-time thread is, by construction, always picked over any timeshare thread that's eligible to run.
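To make the partition concrete, here is a minimal C sketch that names the band a priority falls into. The boundary values are assumptions based on the constants in osfmk/kern/sched.h (real-time from 96, kernel threads from 80, ordinary user threads up to 63, with a reserved range in between that the four-band summary glosses over). Check them against the XNU version you're reading. The names band_of and the BAND_ enum are hypothetical.

```c
#include <assert.h>

/* Assumed band boundaries; verify against osfmk/kern/sched.h. */
enum {
    PRI_MIN          = 0,    /* idle level */
    PRI_MAX_USER     = 63,   /* top of the ordinary user range */
    PRI_MIN_KERNEL   = 80,   /* bottom of the kernel-thread band */
    PRI_MIN_REALTIME = 96,   /* bottom of the real-time band */
    PRI_MAX          = 127
};

/* Ordered so that a numerically larger band always outranks a smaller one. */
enum band { BAND_IDLE, BAND_USER, BAND_RESERVED, BAND_KERNEL, BAND_REALTIME };

/* Hypothetical helper: classify a priority in 0..127 into its band. */
enum band band_of(int pri)
{
    assert(pri >= PRI_MIN && pri <= PRI_MAX);
    if (pri >= PRI_MIN_REALTIME) return BAND_REALTIME;
    if (pri >= PRI_MIN_KERNEL)   return BAND_KERNEL;
    if (pri > PRI_MAX_USER)      return BAND_RESERVED;
    if (pri > PRI_MIN)           return BAND_USER;
    return BAND_IDLE;
}
```

Because the bands are contiguous and ordered, "real-time always beats timeshare" falls out of a plain integer comparison: any priority that band_of places in BAND_REALTIME is numerically above every priority in BAND_USER.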
Where QoS fits in
If you've called dispatch_async or used a Task in Swift Concurrency, you've selected a QoS class without thinking about it. QoS is a userspace-facing abstraction that maps onto Mach's scheduling classes:
| QoS class | Mach mapping (approx.) |
|---|---|
| USER_INTERACTIVE | Timeshare, near the top of the band |
| USER_INITIATED | Timeshare, high |
| DEFAULT | Timeshare, mid |
| UTILITY | Timeshare, low |
| BACKGROUND | Idle band |
| MAINTENANCE | Idle band, deeper |
QoS isn't only about the CPU. The kernel propagates it into I/O priority (the disk scheduler honors it), into memory pressure (background QoS makes a thread the first to surrender pages under jetsam), and into IPC (a message from a high-QoS sender to a lower-QoS receiver causes a QoS override on the receiver, so the sender isn't stalled waiting on it).
This last bit — QoS override — is the magic that makes the system stay responsive. If a user-interactive thread blocks on a Mach port waiting for a reply from a daemon at UTILITY QoS, the kernel temporarily boosts the daemon to USER_INTERACTIVE for as long as the reply is owed. Without this, every IPC round-trip would risk a priority inversion.
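The boost-and-release cycle can be modeled in a few lines of C. This is a toy sketch, not the kernel's data structures: the enum ordering, the toy_thread struct, and the begin_wait / end_wait helpers are all hypothetical, and it tracks only one override per thread where the kernel has to handle many waiters.

```c
#include <assert.h>

/* Hypothetical ordering for this sketch: a larger value is more urgent. */
enum qos {
    QOS_MAINTENANCE, QOS_BACKGROUND, QOS_UTILITY,
    QOS_DEFAULT, QOS_USER_INITIATED, QOS_USER_INTERACTIVE
};

struct toy_thread {
    enum qos requested;     /* what the thread asked for */
    enum qos override;      /* boost owed to a blocked waiter */
    int      has_override;  /* nonzero while a reply is owed */
};

/* The QoS the scheduler would act on: the higher of request and override. */
enum qos effective_qos(const struct toy_thread *t)
{
    assert(t);
    if (t->has_override && t->override > t->requested)
        return t->override;
    return t->requested;
}

/* The waiter blocks on a reply from the server: boost the server. */
void begin_wait(const struct toy_thread *waiter, struct toy_thread *server)
{
    enum qos q = effective_qos(waiter);
    if (!server->has_override || q > server->override) {
        server->override = q;
        server->has_override = 1;
    }
}

/* The reply has been sent: the boost is no longer owed. */
void end_wait(struct toy_thread *server)
{
    assert(server);
    server->has_override = 0;
}
```

With a USER_INTERACTIVE waiter and a UTILITY daemon, effective_qos on the daemon reports USER_INTERACTIVE between begin_wait and end_wait, and UTILITY before and after.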
What "picking a thread" actually means
When a core wakes up — quantum expired, IPI received, returning from an interrupt — it calls thread_select. The function walks the priority bands top-down, looking for a runnable thread. A few details that matter:
- Per-core run queues, but with a global one for affinity-free threads. The scheduler prefers to keep a thread on the core it last ran on (L1/L2 cache warmth) but will steal across cores when imbalance exceeds a threshold.
- Recommendation bits. Apple Silicon CPUs have performance cores (P-cores) and efficiency cores (E-cores). The scheduler tags threads with a recommendation — "you should run on P", "you should run on E", "you can run anywhere" — derived from QoS + recent behavior. P-cores prefer high-QoS threads; E-cores prefer background work.
- Coalitions. Threads from the same coalition (effectively, the same logical workload — a foreground app and its helpers) share scheduling state, so killing one doesn't unbalance the rest.
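The top-down walk itself is simple enough to sketch. The C below is a simplified model under stated assumptions: one FIFO per priority level and a linear scan from the top. The toy_runq and toy_thread types and both function names are hypothetical, and the sketch ignores per-core queues, stealing, recommendations, and coalitions. A production run queue would also avoid the linear scan, for example by tracking which levels are occupied.

```c
#include <assert.h>
#include <stddef.h>

#define NRQS 128   /* one queue per priority level, 0..127 */

struct toy_thread {
    int pri;                   /* priority level, 0..NRQS-1 */
    struct toy_thread *next;   /* FIFO link within one level */
};

struct toy_runq {
    struct toy_thread *head[NRQS];
    struct toy_thread *tail[NRQS];
};

/* Append a runnable thread to the FIFO for its priority level. */
void runq_enqueue(struct toy_runq *rq, struct toy_thread *t)
{
    assert(t->pri >= 0 && t->pri < NRQS);
    t->next = NULL;
    if (rq->tail[t->pri] != NULL)
        rq->tail[t->pri]->next = t;
    else
        rq->head[t->pri] = t;
    rq->tail[t->pri] = t;
}

/* Walk the levels top-down and dequeue the first runnable thread found. */
struct toy_thread *runq_pick(struct toy_runq *rq)
{
    for (int pri = NRQS - 1; pri >= 0; pri--) {
        struct toy_thread *t = rq->head[pri];
        if (t == NULL)
            continue;
        rq->head[pri] = t->next;
        if (rq->head[pri] == NULL)
            rq->tail[pri] = NULL;
        t->next = NULL;
        return t;
    }
    return NULL;   /* nothing runnable: the core would go idle */
}
```

Threads at the same level come out in the order they were enqueued, and a thread at a higher level always comes out before any thread at a lower one.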
How aging works
A timeshare thread doesn't keep its declared priority indefinitely. The scheduler tracks recent CPU usage per thread and decays the effective priority of threads that have been running. The decay is exponential — a few quanta on-CPU costs the thread a few priority points; idling earns them back.
This is a large part of why your Mac stays responsive when you launch a heavy build. The shell prompt has been mostly idle, so its timeshare priority is high. The compiler suddenly burns CPU, and its priority drops within milliseconds. By the time you Cmd-Tab away, the WindowServer's interactive thread will be picked over the compiler's worker pool.
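A toy model shows the shape of that decay. In the C sketch below, each tick ages the accumulated usage by a constant factor and then adds the CPU time just consumed, which is what makes the decay exponential. The 5/8 aging factor, the divide-by-16 penalty scale, and all names are illustrative assumptions, not XNU's actual constants or code.

```c
#include <assert.h>

struct toy_thread {
    int      base_pri;   /* the priority the thread asked for */
    unsigned usage;      /* decayed measure of recent CPU use */
};

/* One scheduler tick: age old usage by 5/8, then add this tick's CPU time. */
void tick(struct toy_thread *t, unsigned cpu_units)
{
    assert(t);
    t->usage = (t->usage * 5) / 8 + cpu_units;
}

/* Effective priority: base minus a usage penalty, clamped at zero. */
int effective_pri(const struct toy_thread *t)
{
    int penalty = (int)(t->usage / 16);   /* hypothetical scale factor */
    int pri = t->base_pri - penalty;
    return pri < 0 ? 0 : pri;
}
```

Starting from base priority 31, two busy ticks of 64 units each pull the effective priority down to 25. Four idle ticks (cpu_units of 0) bring it back to 31, because the usage term shrinks geometrically once nothing new is added.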
Source: apple-oss-distributions/xnu, osfmk/kern/sched_dualq.c. The dual-queue scheduler, XNU's current implementation of timeshare aging.
Tools to see this on a live system
- sample <pid>: stack-samples a process for a few seconds. Reports which threads ran and for how long.
- spindump: system-wide hung-process analyzer. Useful when the system has slowed to a crawl; it shows the QoS of every blocked thread.
- Instruments → System Trace: the actual scheduler events (preemption, wake, QoS change) with sub-microsecond resolution.
- thread_policy_get from a debugger: reads back which class and priority a thread is in.
What surprises newcomers
- The QoS API isn't an advisory hint. Setting QoS actually changes the Mach scheduling class. A BACKGROUND thread is in the idle band and will be preempted by any timeshare work.
- A higher priority doesn't mean "more CPU"; it means "more of the CPU's attention when there's contention". A lone BACKGROUND thread on an otherwise-idle machine gets the same throughput as a USER_INTERACTIVE one.
- QoS overrides propagate through IPC chains. If A waits on B, B waits on C, and A is USER_INTERACTIVE, all three are temporarily at that QoS until the chain unwinds.
- The scheduler can override your recommendation. If you ask for E-core only and the system is idle on P-cores, you may end up on P anyway: the scheduler is allowed to break the recommendation when the alternative is leaving a core idle.
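The chain case (A waits on B, B waits on C) can be sketched as a walk along "waits on" links, raising each thread to the highest QoS seen so far. This C model is an assumption-laden illustration: the toy_thread struct, the waits_on pointer, the MAX_HOPS cycle guard, and the function names are all hypothetical, and it leaves out the unwinding step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ordering for this sketch: a larger value is more urgent. */
enum qos {
    QOS_MAINTENANCE, QOS_BACKGROUND, QOS_UTILITY,
    QOS_DEFAULT, QOS_USER_INITIATED, QOS_USER_INTERACTIVE
};

#define MAX_HOPS 64   /* hypothetical guard against a cyclic wait chain */

struct toy_thread {
    enum qos requested;            /* what the thread asked for */
    enum qos boosted;              /* highest QoS pushed onto it by waiters */
    struct toy_thread *waits_on;   /* NULL if the thread is not blocked */
};

/* The QoS the scheduler would act on: the higher of request and boost. */
enum qos effective_qos(const struct toy_thread *t)
{
    assert(t);
    return t->boosted > t->requested ? t->boosted : t->requested;
}

/* Push the waiter's effective QoS down every link of its wait chain. */
void propagate(struct toy_thread *waiter)
{
    enum qos q = effective_qos(waiter);
    int hops = 0;
    for (struct toy_thread *t = waiter->waits_on; t != NULL; t = t->waits_on) {
        if (++hops > MAX_HOPS)
            break;                 /* give up rather than loop forever */
        if (q > t->boosted)
            t->boosted = q;
        q = effective_qos(t);      /* a more urgent link raises the rest */
    }
}
```

With A at USER_INTERACTIVE, B at UTILITY, and C at BACKGROUND, a single propagate call on A leaves all three reporting USER_INTERACTIVE from effective_qos while their requested values stay untouched.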
What to read next
Walk a real preemption from interrupt to context switch:
- apple-oss-distributions/xnu, osfmk/kern/sched_clutch.c. The newer clutch scheduler: bucket-based, used on Apple Silicon for better P/E-core handling.
- apple-oss-distributions/xnu, osfmk/kern/processor.c. Per-core processor structure; where the run queue actually lives.
And read the Mach ports article again, this time noticing the QoS field in every message header. The scheduler is downstream of IPC.