dyld in depth: chained fixups, prebuilt loaders, and dlopen internals
Inside dyld — Loader data structures, the modern chained-fixups format, PrebuiltLoaders in the shared cache, and what dlopen actually does at runtime.
Every Mach-O executable on macOS (every app, every command-line tool, every helper process) starts execution in dyld, not in its own main. dyld maps its dependencies, resolves symbols, runs initializers, then jumps to your code. This article walks through the data structures, the chained-fixup format that replaced classic relocations, the prebuilt-loader optimization that lets cached dylibs skip nearly all load-time work, and what dlopen actually does at runtime.
The Loader object
dyld's core abstraction is the Loader — one per loaded image (main executable, framework, dylib). Each Loader holds:
- A pointer to the Mach-O header in memory.
- The image's load address (with ASLR slide applied).
- A list of dependencies (other Loaders this one needs).
- A list of segments (mapped ranges with permissions).
- Initialization state (have static initializers run yet?).
There are two concrete kinds:
- PrebuiltLoader: for images in the dyld shared cache. It is pre-computed at cache build time, so loading is essentially "find the prebuilt loader, use its precomputed dependency graph and fixup data."
- JustInTimeLoader: for images NOT in the cache (your main binary, third-party frameworks). It is constructed fresh at load time: dyld parses load commands, resolves dependencies, and computes fixups.
The cache path is dramatically faster — for a typical app, almost all dependencies are PrebuiltLoaders.
The boot sequence inside dyld
When the kernel hands off to dyld (see the boot article and the app launch flow):
- dyld's own bootstrap: dyld is itself a Mach-O. Its first job is to relocate itself (resolve its own fixups) using a tiny bootstrap stub.
- Map the main executable: the kernel has already mapped it, but dyld parses its load commands to learn about its dependencies.
- Resolve the dependency closure: for each LC_LOAD_DYLIB, look up the dylib in the cache; if found, instantiate a PrebuiltLoader. If not, mmap the file and instantiate a JustInTimeLoader, recursively pulling in its dependencies.
- Apply fixups to images outside the cache (PrebuiltLoaders' fixups are mostly done; JustInTimeLoaders need work).
- Run static initializers in dependency order: every C++ static constructor, every __attribute__((constructor)) function, and (through libobjc) every Objective-C +load method. Objective-C +initialize and Swift global variables are lazy; they initialize on first use, not here.
- Jump to the executable's main (in an AppKit app, main in turn calls NSApplicationMain).
The total cost varies enormously: a "hello world" CLI tool might do all of this in 10 ms; a Mac app with many third-party Swift frameworks (each outside the cache) can take 100+ ms.
Chained fixups — the modern relocation format
Classic Mach-O used two separate data structures for runtime patching:
- Rebase info — pointers in the binary that need to be adjusted by the ASLR slide.
- Bind info — pointers that need to be patched with the address of an external symbol.
Both were variable-length opcode streams that dyld interpreted at load time. For a large binary, that could mean walking megabytes of opcodes.
Chained fixups — introduced in 2020 — replace both with a single mechanism:
- The pointer at the fixup location itself encodes the fixup metadata. Each 64-bit slot holds its own fixup (a rebase target or an import ordinal, plus a bit saying which) and a small next field giving the distance to the following fixup in the chain. A next of 0 ends the chain.
- The binary stores only the starts of the chains: typically one starting offset per page, or a marker meaning the page has no fixups.
- dyld walks each chain and applies the fixups in O(fixups), rather than interpreting an opcode stream.
For a typical binary, chained fixups are 5-10x faster to apply than the old opcode-stream format. They also let the shared cache builder pre-link every cached dylib once, encoding the cross-dylib references as chained fixups that need only the cache's base address to be valid. That address is fixed for a given boot, because every process maps the cache at the same place.
dlopen internals
dlopen(path, mode) at runtime:
- dyld checks if the dylib is already loaded — if yes, increment its load count and return.
- If path is a system framework, check the shared cache; if found, use its PrebuiltLoader (instant).
- Otherwise, map the dylib file with mmap and create a JustInTimeLoader.
- Recursively load the dylib's own dependencies.
- Apply fixups for the new dylib.
- Run its static initializers.
- Return a handle.
The mode flags affect symbol resolution:
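- RTLD_NOW: resolve all symbols immediately, during the dlopen call.
- RTLD_LAZY: resolve each external function reference the first time it is called (historically the faster choice for initial load). One of RTLD_NOW or RTLD_LAZY must be specified.
- RTLD_GLOBAL: the library's symbols become available to subsequent dlopens; RTLD_LOCAL keeps them scoped. On macOS, if neither is given, the default is RTLD_GLOBAL.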
dlclose decrements the load count; when zero, dyld unloads the dylib if it can (calls finalizers, unmaps memory).
Mac apps that use plugin architectures (Final Cut Pro plugins, Photoshop plugins, etc.) rely heavily on dlopen. The per-dylib cost is the JustInTimeLoader work — for system frameworks already in the cache, dlopen is essentially free.
The interposing mechanism
dyld supports a deliberate function-replacement mechanism called interposing. A dylib declares a table of (replacement_fn, target_fn) pairs; at load time, dyld rewrites every reference to target_fn in subsequently-loaded images to point at replacement_fn instead.
Debugging and profiling tools use this to wrap malloc/free/calloc with logging or checking versions, typically loading the interposing dylib with DYLD_INSERT_LIBRARIES.
Interposing only affects images loaded after the interposer. The shared cache is generally pre-resolved and can't be interposed; this is one reason DYLD_INSERT_LIBRARIES doesn't work against system frameworks anymore on hardened-runtime apps.
DYLD_* environment variables
A long history of debug/customization knobs:
- DYLD_INSERT_LIBRARIES: preload an extra dylib (the old interposing knob).
- DYLD_FRAMEWORK_PATH / DYLD_LIBRARY_PATH: override search paths.
- DYLD_PRINT_BINDINGS / DYLD_PRINT_INITIALIZERS / DYLD_PRINT_STATISTICS: debug-print at load time.
- DYLD_SHARED_REGION: bypass the shared cache (developer-only).
Modern macOS strips these from the environment when launching hardened-runtime apps (most apps now). Library validation (see AMFI) also refuses arbitrary inserted dylibs.
Inspecting a binary's loader
dyld_info -linked_dylibs /usr/bin/git # what does it depend on?
dyld_info -fixup_chains /usr/bin/git # chained-fixup format and per-page chain starts
DYLD_PRINT_STATISTICS=1 ./some-app # print launch timing breakdown (older dyld only)
DYLD_PRINT_INITIALIZERS=1 ./some-app # see every initializer run
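DYLD_PRINT_STATISTICS output was the most useful tool for app startup performance work. It breaks launch time into "dylib loading", "rebase/binding", "ObjC setup", "initializer time", and "slowest initializers". The dyld4 rewrite that shipped with macOS 12 no longer prints it; on current systems, the Instruments App Launch template covers the same ground.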
What surprises newcomers
- dyld is itself a Mach-O — it bootstraps itself, then loads everything else.
- Static initializers can do anything, including reading files and making network calls. They're a frequent source of startup slowness. The usual advice is to keep them trivial and defer real work to first use; Swift helps here by initializing global variables lazily rather than at load time.
- The shared cache is what makes app startup fast; without it, dyld would have to parse and link hundreds of dylibs per launch.
- Chained fixups are the reason modern binaries load fast even with thousands of cross-dylib references.
What to read next
- apple-oss-distributions/dyld, dyld/PrebuiltLoader.cpp: the prebuilt loader path, fast load for cached dylibs.
- apple-oss-distributions/dyld, dyld/JustInTimeLoader.cpp: the on-demand loader path for non-cached dylibs.
And the dyld shared cache article for the cache's role in everything dyld does.