AMX (Apple Matrix Extensions)

AMX (Apple Matrix Extensions) is a matrix-multiply coprocessor present on Apple Silicon SoCs. Apple has never publicly documented its instruction set. Everything we know about it comes from reverse engineering and from disassembling Accelerate.framework.

What's known:

AMX is per-cluster, not per-core. A single coprocessor serves all cores in a cluster.
It has dedicated registers: 8 X registers + 8 Y registers (each 512 bits) for inputs.
It has a Z accumulator for outputs: 64 rows of 64 bytes (4096 bytes in total), which forms a 32×32 grid when each cell is 32 bits wide.
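Instructions are issued inline in the CPU's instruction stream, using encodings in otherwise-reserved A64 opcode space rather than documented mnemonics. Each instruction is a single 32-bit word that carries an opcode and a 5-bit operand field; the operand is typically the number of a general-purpose register whose 64-bit value parameterizes the operation.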
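To make the register sizes above concrete, here is a minimal Python model of how a matrix unit of this shape can build a product out of outer products. It is a sketch under stated assumptions, not a description of Apple's hardware: it assumes the basic operation is "multiply every lane of one X register by every lane of one Y register and add the grid into Z", which is how public reverse-engineering notes describe AMX's matrix mode. All function names are hypothetical.

```python
# Minimal model of an outer-product accumulate into a Z-style grid.
# Assumptions (not Apple documentation): Z is updated as
# z[j][i] += x[i] * y[j]. All names here are hypothetical.

REG_BYTES = 64  # one X or Y register: 512 bits


def lanes(elem_bytes):
    """Number of elements of the given width that fit in one X or Y register."""
    return REG_BYTES // elem_bytes


def outer_product_accumulate(z, x, y):
    """Add the outer product of x and y into z in place and return z."""
    for j, yj in enumerate(y):
        for i, xi in enumerate(x):
            z[j][i] += xi * yj
    return z


def matmul_by_outer_products(a_cols, b_rows):
    """Compute C = A @ B as a sum of outer products, one per shared index k.

    a_cols[k] is column k of A (plays the role of a Y register).
    b_rows[k] is row k of B (plays the role of an X register).
    """
    rows, cols = len(a_cols[0]), len(b_rows[0])
    z = [[0.0] * cols for _ in range(rows)]
    for y, x in zip(a_cols, b_rows):
        outer_product_accumulate(z, x, y)
    return z


# A = [[1, 2], [3, 4]] stored by columns, B = [[5, 6], [7, 8]] stored by rows.
c = matmul_by_outer_products([[1.0, 3.0], [2.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]])
```

With 512-bit registers, `lanes(4)` gives 16 lanes for 32-bit elements and `lanes(2)` gives 32 lanes for 16-bit elements, so a single outer product of two 16-bit registers produces a 32×32 grid of results.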

You don't program AMX directly. You call Accelerate (specifically BNNS and vDSP) or use Core ML; Accelerate dispatches the right AMX sequence under the hood.
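As an illustration of going through Accelerate rather than touching AMX, the sketch below calls vDSP's `vDSP_mmul` from Python via `ctypes`. Several things here are assumptions: the framework path is the conventional macOS location, the helper names are hypothetical, and whether a given call actually runs on AMX is Accelerate's decision and is not observable from this code. On non-macOS systems the sketch falls back to a plain Python loop so it stays runnable.

```python
import ctypes
import sys

# Conventional location of the framework on macOS (assumption).
ACCELERATE_PATH = "/System/Library/Frameworks/Accelerate.framework/Accelerate"


def load_accelerate():
    """Return a handle to Accelerate on macOS, or None elsewhere."""
    if sys.platform != "darwin":
        return None
    try:
        return ctypes.CDLL(ACCELERATE_PATH)
    except OSError:
        return None


def matmul_f32(a, b, m, n, p):
    """Multiply row-major a (m x p) by b (p x n); return a flat list of m*n values."""
    lib = load_accelerate()
    if lib is None:
        # Portable fallback: naive triple loop, no Accelerate involved.
        return [
            sum(a[i * p + k] * b[k * n + j] for k in range(p))
            for i in range(m)
            for j in range(n)
        ]

    float_p = ctypes.POINTER(ctypes.c_float)
    # void vDSP_mmul(const float *A, vDSP_Stride IA, const float *B, vDSP_Stride IB,
    #                float *C, vDSP_Stride IC, vDSP_Length M, vDSP_Length N, vDSP_Length P)
    lib.vDSP_mmul.restype = None
    lib.vDSP_mmul.argtypes = [
        float_p, ctypes.c_long,
        float_p, ctypes.c_long,
        float_p, ctypes.c_long,
        ctypes.c_ulong, ctypes.c_ulong, ctypes.c_ulong,
    ]

    a_buf = (ctypes.c_float * (m * p))(*a)
    b_buf = (ctypes.c_float * (p * n))(*b)
    c_buf = (ctypes.c_float * (m * n))()
    lib.vDSP_mmul(a_buf, 1, b_buf, 1, c_buf, 1, m, n, p)
    return list(c_buf)


product = matmul_f32([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], 2, 2, 2)
```

The caller only describes the matrices; nothing in the code names a coprocessor, which is the point of the abstraction.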

XNU acknowledges AMX in the places where the kernel has to save and restore AMX state across context switches.

Source: apple-oss-distributions/xnu, osfmk/arm64/cpu_data.h (per-CPU data, including the AMX state saved on context switch).

Apple is gradually replacing AMX with SME (Scalable Matrix Extension), the Arm-standard equivalent, on newer silicon, starting with the M4 generation.