metal-toolchain - metal compiler toolchain overview
The Metal toolchain consists of a set of programs targeting Apple GPUs. The goal of this document is to provide an overview of the toolchain behavior. Refer to the documentation of individual programs for more specific information.
Metal supports two compilation modes: split-compilation and traditional.
In the split-compilation mode, the toolchain targets the AIR virtual target. Final translation to the actual GPU binary code is performed at runtime. In the more traditional mode, the toolchain directly emits binary code compatible with the selected GPU target.
The architecture of the AIR virtual target is air64, which has several subarchitectures. Each subarchitecture is associated with a platform version.
The currently supported AIR architectures, together with their native platform versions, are:
Native GPU targets are in the <vendor>gpu_<arch> form, where <vendor> can be apple, amd, or intel; <arch> identifies the actual GPU architecture.
Known Apple GPU architectures are:
Known AMD GPU architectures are:
Known Intel GPU architectures are:
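As an illustration of the <vendor>gpu_<arch> naming form described above, the following sketch splits a native GPU architecture name into its two components. The split_gpu_arch function is a hypothetical helper written for this example; it is not part of the Metal toolchain and only inspects the string it is given.

```shell
# split_gpu_arch NAME
# Hypothetical helper (not a toolchain command): splits a name of the form
# <vendor>gpu_<arch> and prints "<vendor> <arch>".
split_gpu_arch() {
    name=$1
    case $name in
        applegpu_?*|amdgpu_?*|intelgpu_?*)
            vendor=${name%%gpu_*}   # text before the first "gpu_"
            arch=${name#*gpu_}      # text after the first "gpu_"
            printf '%s %s\n' "$vendor" "$arch"
            ;;
        *)
            # AIR architectures such as air64_v23 end up here.
            printf 'not a native GPU architecture: %s\n' "$name" >&2
            return 1
            ;;
    esac
}

split_gpu_arch applegpu_g13s   # prints "apple g13s"
```

The function checks only the shape of the name; it does not verify that the architecture is one the toolchain actually supports.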
Having multiple architectures makes it possible to store multiple binaries inside the same universal binary, each targeting a different version of the same platform.
The AIR toolchain is able to target the following platforms:
Starting with air64_v23, all platforms are compatible with each other. For instance, you can link an air64_v23-apple-iphoneos14 object and an air64_v23-apple-macos11 object together.
The two main inputs of the AIR toolchain are Metal source files and Metal scripts. The canonical extension of Metal source files is .metal. The canonical extension of Metal scripts is .mtlp-json.
Metal scripts are consumed by tools emitting GPU binary code. Depending on the code being emitted, a Metal script might be required or not. For instance, a Metal script is required to emit a pipeline, but it is not required when emitting a dynamic library.
The AIR toolchain emits MetalLibs and MachOs. The former stores AIR binaries. The latter stores GPU binaries.
The AIR toolchain also emits universal binaries, which can contain both MetalLib and MachO slices at the same time.
The AIR toolchain provides two main compiler drivers: metal and metal-tt.
The primary goal of metal is to translate a set of source files into MetalLibs, MachOs, or universal binaries.
What is actually emitted depends on the selected target architectures. If more than one architecture is selected, a universal binary is emitted. Otherwise, if the target architecture is AIR, a MetalLib is emitted. If the target architecture is a GPU architecture, a MachO is emitted.
$ metal -arch air64_v23 foo.metal -o foo.metallib
Emits a MetalLib.
$ metal -arch applegpu_g13s foo.metal -N foo.mtlp-json -o foo.metallib
Emits a MachO.
$ metal -arch air64_v23 -arch applegpu_g13s foo.metal -N foo.mtlp-json -o foo.metallib
Emits a universal binary, with one MetalLib slice and one MachO slice.
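The selection rule illustrated by the three commands above can be sketched as a small shell function. output_kind is a hypothetical helper written for this overview; it does not invoke the toolchain, and it assumes that any architecture whose name does not start with air64 is a native GPU architecture.

```shell
# output_kind ARCH...
# Hypothetical helper: prints the kind of file the metal driver is described
# as emitting for the given list of -arch values.
output_kind() {
    if [ "$#" -eq 0 ]; then
        echo "usage: output_kind ARCH..." >&2
        return 2
    fi
    if [ "$#" -gt 1 ]; then
        # More than one architecture selected.
        echo "universal binary"
        return 0
    fi
    case $1 in
        air64*) echo "MetalLib" ;;   # AIR architecture
        *)      echo "MachO" ;;      # assumed to be a native GPU architecture
    esac
}

output_kind air64_v23                  # prints "MetalLib"
output_kind applegpu_g13s              # prints "MachO"
output_kind air64_v23 applegpu_g13s    # prints "universal binary"
```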
The most efficient way to use the metal driver is to compile each source file independently, and then link the resulting objects in a separate step:
$ metal -arch air64_v23 -c foo.metal -o foo.air
$ metal -arch air64_v23 -c bar.metal -o bar.air
$ metal -arch air64_v23 foo.air bar.air -o foobar.metallib
Since the emission of GPU binaries starts from MetalLibs, a GPU architecture only needs to be specified at the link step:
$ metal -arch air64_v23 -c foo.metal -o foo.air
$ metal -arch air64_v23 -c bar.metal -o bar.air
$ metal -arch applegpu_g13s foo.air bar.air -N foobar.mtlp-json -o foobar.metallib
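For projects with more than a couple of source files, the compile-then-link sequence above can be generated rather than typed by hand. The following is a minimal sketch: print_build_commands is a hypothetical helper that only prints the commands, using the same flags as the examples above, so it can be inspected on a machine without the toolchain. It assumes file names without spaces and compiles every source for air64_v23.

```shell
# print_build_commands LINK_ARCH SCRIPT OUTPUT SOURCE.metal...
# Hypothetical helper: prints one compile command per source file, followed
# by a single link command. Nothing is executed.
print_build_commands() {
    link_arch=$1 script=$2 output=$3
    shift 3
    objects=
    for src in "$@"; do
        obj=${src%.metal}.air    # foo.metal -> foo.air
        echo "metal -arch air64_v23 -c $src -o $obj"
        objects="$objects $obj"
    done
    # The GPU architecture and the Metal script appear only at the link step.
    echo "metal -arch $link_arch$objects -N $script -o $output"
}

print_build_commands applegpu_g13s foobar.mtlp-json foobar.metallib foo.metal bar.metal
```

Piping the output to sh would run the commands; keeping generation and execution separate makes it easy to review what will be run first.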
The metal driver must be told which architectures to target, which can be challenging when a large number of GPU architectures have to be targeted. The metal-tt driver solves this problem by automatically targeting all the GPU architectures supported by the toolchain:
$ metal -arch air64_v23 foo.metal -o foo.metallib-air64_v23
$ metal-tt foo.metallib-air64_v23 foo.mtlp-json -o foo.metallib
The produced foo.metallib contains one slice for each supported GPU architecture, plus the air64_v23 slice produced by metal.
A target is composed of a target architecture and a target platform.
Generally speaking, the target used by a compiler driver can be explicitly spelled out in the compiler driver command line. If the target is only partially spelled out -- e.g. the command line only specifies the target architecture -- the remaining components of the target are deduced by the compiler driver.
The deduction process is specific to each compiler driver, but it generally splits deduction into two steps: selection of an architecture, followed by selection of a platform.
The default architecture is air64.
The platform is selected starting from the system root. If the system root points to a Darwin SDK, the target platform is set to that of the SDK.
For instance, assuming iPhoneOS16.0.sdk contains a valid iPhoneOS SDK, the target selected by the following command:
$ metal -isysroot iPhoneOS16.0.sdk foo.metal -o foo.metallib
Would be air64-apple-iphoneos16.0.
The system root can also be set using the SDKROOT environment variable. On Darwin, development tools are usually invoked using xcrun, which automatically sets SDKROOT to the selected SDK. Thus this command:
$ xcrun -sdk iphoneos metal foo.metal -o foo.metallib
Will target air64-apple-iphoneosX.Y, where X.Y is the iPhoneOS SDK target platform found by xcrun.
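The way a target name combines an architecture and a platform can be sketched as follows. compose_target is a hypothetical helper written for this overview, not a toolchain command. It assumes the <arch>-apple-<platform> spelling seen in the examples in this document and falls back to air64 when no architecture is given. It does not model how the driver reads the platform from the SDK; the platform string is passed in directly.

```shell
# compose_target ARCH PLATFORM
# Hypothetical helper: prints "<arch>-apple-<platform>". An empty ARCH falls
# back to air64, the default architecture.
compose_target() {
    arch=${1:-air64}
    platform=$2
    if [ -z "$platform" ]; then
        echo "usage: compose_target ARCH PLATFORM" >&2
        return 2
    fi
    printf '%s-apple-%s\n' "$arch" "$platform"
}

compose_target "" iphoneos16.0        # prints "air64-apple-iphoneos16.0"
compose_target air64_v23 macos11      # prints "air64_v23-apple-macos11"
```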
The metal-arch tool prints information about the architectures of the GPUs available on the current platform.
The metal-config tool prints information about the GPU architectures that can be targeted by the current toolchain.
To report bugs, please visit <https://developer.apple.com/bug-reporting/>.
metal(1), metal-arch(1), metal-config(1), metal-pipelines-script(5), metal-tt(1), xcrun(1)
Metal Shading Language Specification: <https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf>
2014-2023, The Metal Team
August 2, 2023    32023