How fork(), exec(), and posix_spawn work on XNU
The Unix-est of Unix calls, implemented on a Mach kernel. Why fork is awkward on macOS, what exec actually replaces, and why posix_spawn is now the preferred way to start a process.
Every Unix book teaches process creation as fork() + exec(). On Linux that's a fine mental model — both are thin, well-loved syscalls. On macOS it's almost true: fork and exec exist and work, but they're the awkward way. The modern, preferred way is posix_spawn, and the reason traces straight back to Mach.
What fork() actually has to do
POSIX fork() duplicates the calling process. The parent and child both return from fork; the child has a fresh PID and inherits a copy of the parent's address space, open file descriptors, signal handlers, working directory, and credentials.
On a pure-BSD system the implementation is simple: clone the proc struct, copy the file descriptor table, set up COW for the VM, return twice. On XNU there's an extra layer — every BSD proc is married to a Mach task, and the Mach side has to be duplicated too:
- Create a new Mach task with the parent's VM map cloned (COW).
- Create a new Mach thread inside that task to be the child's first thread.
- Allocate a new BSD proc and wire it to the new task.
- Duplicate the file descriptor table.
- Copy signal disposition state, credentials, the working directory, the umask, the controlling tty.
- Wake the child's Mach thread; both return from fork.
The thread duplication is the gnarly bit. Mach threads aren't trivially clonable; the child needs its own thread state set up so it returns from fork in userspace with the right register values. The BSD-side code reaches into the Mach side to make this happen.
Why fork is awkward on macOS
A multi-threaded program that forks is in immediate trouble. POSIX says only the forking thread is replicated in the child; the other threads simply do not exist there. Any mutex one of those threads held at the moment of the fork is still marked locked in the child's copy of memory, with no thread left to unlock it, so the surviving thread deadlocks as soon as it tries to acquire one.
The standard POSIX mitigation is pthread_atfork, which lets a library register handlers that reset its locks across a fork. It only helps if every library in the process cooperates, and on macOS the big ones do not:
- libdispatch (GCD) is not fork-safe. The worker threads vanish; the queue state is corrupted; any post-fork call into GCD likely deadlocks.
- Foundation / Cocoa are not fork-safe. Most reach for some shared queue or Mach service connection during init.
- Core Foundation is not fork-safe.
So in practice, a fork on macOS is only safe if you immediately exec — the slate gets wiped — or if you call only async-signal-safe syscalls between fork and exec, which means no Objective-C, no GCD, no NSLog. Most Mac apps that need to spawn helpers can't even use fork safely.
This is why posix_spawn is the preferred path on macOS, not fork+exec.
exec — replacing the address space without changing the process
execve() (and the libc wrappers like execl, execv) replaces the current process's address space with a fresh image:
- Open and validate the new executable. Check Mach-O header, validate code signature, check entitlements.
- Tear down the current task's VM map.
- Build a new VM map: map the executable's segments, set up the stack with argv, envp, and the extra apple[] string vector (XNU's rough counterpart to the ELF auxv).
- Reset caught signals to their default disposition (ignored signals stay ignored), close FDs marked FD_CLOEXEC, and drop capabilities that are defined not to survive exec. The umask is left alone; it carries over into the new image.
- Hand control to dyld. The kernel doesn't run the user binary directly; it always starts in dyld, the dynamic linker, which maps libraries, runs initializers, then jumps to main.
Source: apple-oss-distributions/xnu, bsd/kern/kern_exec.c (exec_*, exec_mach_imgact): every exec on macOS goes through here.
Source: apple-oss-distributions/xnu, bsd/kern/mach_loader.c: the Mach-O loader; parses LC_SEGMENT, LC_CODE_SIGNATURE, LC_MAIN, sets up the new VM map.
The proc survives exec — same PID, same parent, same FDs (minus CLOEXEC), same credentials. The Mach task survives too, but its VM map is fully replaced. The original threads are terminated and a new main thread is created for the new image.
The fact that the new process always starts in dyld (not in the user binary's entry point) is why DYLD_* environment variables work: they arrive in envp like any other environment variable, and dyld reads them on its way to running user code.
posix_spawn — the modern preferred path
posix_spawn is fork+exec collapsed into a single syscall, with a structured way to express the "what should be different in the child" intent up front. Rather than:
```c
pid_t pid = fork();
if (pid == 0) {
    // child: do various setup
    dup2(pipe_fd, 1);
    chdir("/some/path");
    execve(prog, argv, envp);
    _exit(127);  // only reached if execve failed
}
```
you build a posix_spawnattr_t + posix_spawn_file_actions_t describing the changes, then call:
```c
posix_spawn(&pid, prog, &file_actions, &attrs, argv, envp);
```
The kernel does everything atomically in the parent process's context. No half-broken state in the child. No GCD danger. No need to pretend the parent's locks aren't there.
Source: apple-oss-distributions/xnu, bsd/kern/kern_exec.c. Search for posix_spawn in this file; it lives right next to exec.
On Apple platforms, posix_spawn is the path that:
- launchd uses for every service it starts.
- NSTask / Process (Swift) uses under the hood.
- xcrun, xcodebuild, swiftc use to spawn compiler invocations.
- Shell builtins like bash's $( … ) resolve to posix_spawn-equivalents through vfork+exec on Apple's libc.
Crucially, posix_spawn also has flags Apple adds beyond POSIX:
- POSIX_SPAWN_SETEXEC: replace the current process (act like a plain exec).
- POSIX_SPAWN_SETSID: start in a new session.
- POSIX_SPAWN_START_SUSPENDED: child starts paused; you can attach a debugger.
- Various _POSIX_SPAWN_OSX_* flags for sandboxing, jetsam priority, QoS.
These are how launchd configures a service exactly the way it wants it, in one syscall, with no race window.
What persists, what doesn't — quick reference
| Resource | survives fork | survives exec | survives posix_spawn |
|---|---|---|---|
| PID | ❌ | ✅ | ❌ |
| Parent PID | ✅ | ✅ | ✅ |
| File descriptors | ✅ | ✅ (except those marked CLOEXEC) | configurable |
| Mach ports | ❌¹ | ❌ | ❌ |
| VM mappings | ✅ (COW) | ❌ | ❌ |
| Threads (other than caller) | ❌ | ❌ | ❌ |
| Signal handlers | ✅ | reset to default | reset |
| Working directory | ✅ | ✅ | configurable |
| Credentials | ✅ | ✅ | ✅ |
¹ Mach ports are a per-task resource. The child task has its own fresh ipc_space after fork; only ports the parent explicitly handed over via Mach IPC survive — and that's a different mechanism.
What surprises newcomers
- fork is strongly discouraged for app use on Apple platforms. The fork(2) man page itself warns that framework and library APIs should be assumed unsafe in the child until it execs. Use posix_spawn or NSTask.
- The kernel never executes a user binary directly. Every exec starts in dyld. Even /usr/bin/true.
- The child of fork doesn't inherit the parent's threads but does inherit the memory those threads were operating on, including mid-flight locks. This is the source of nearly every fork bug.
- exec on a script doesn't run the script directly. The kernel reads the shebang line and execs the interpreter, then the interpreter opens the script and reads it.
What to read next
For the loader:
- apple-oss-distributions/xnu, bsd/kern/mach_loader.c: Mach-O loading; segments, code-signing validation, dyld handoff.
- apple-oss-distributions/dyld, dyld/main.cpp: dyld's main, where every process actually starts running its first line of user code.
And re-read the signals article once more — signal disposition is one of the trickier things fork and exec each handle differently.