📅 2014-Mar-03 ⬩ ✍️ Ashwin Nanjappa ⬩ 🏷️ cuda, nvcc, ptx, sass ⬩ 📚 Archive
To compile CUDA code, you need to tell NVCC what architecture you want to target and what kind of code you want to generate. Specifying this to NVCC can be confusing because there are up to three options that can be used: -arch, -code and -gencode.
It does not have to be this complicated, but for historical and practical reasons it is. The best way to understand these options is to recall the two-level hierarchy of the CUDA architecture. First is the high-level PTX architecture, which acts as a virtual machine. Next are the actual low-level GPU architectures, each designed to support the features available in a particular PTX architecture.
This is conceptually how a CUDA compiler works too. For the sake of these compiler options, we can break down the compilation phase into two stages. First, the compiler looks at high-level features that are supported by a particular virtual PTX architecture and creates high-level code for that architecture. Next, the compiler translates the PTX code to the SASS code that is optimized for a particular low-level GPU architecture.
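The two stages can also be driven by hand, which makes the split easier to see. The following is a minimal sketch, not a description of what NVCC does internally: it assumes a hypothetical source file named kernel.cu, uses the -ptx option of nvcc to stop after the first stage, and uses the ptxas assembler from the CUDA toolkit for the second stage. The architecture numbers follow the era of this article; recent toolkits no longer accept them, so substitute values your toolkit supports.

```shell
#!/bin/sh
# Hypothetical source file name; substitute your own.
src=kernel.cu
base=${src%.cu}

# Stage 1: CUDA C++ -> PTX for the virtual architecture compute_35.
stage1="nvcc -arch=compute_35 -ptx $src -o $base.ptx"
# Stage 2: PTX -> SASS (a cubin file) for the real architecture sm_35.
stage2="ptxas -arch=sm_35 $base.ptx -o $base.cubin"

# Print the commands instead of running them, so they can be inspected first.
echo "$stage1"
echo "$stage2"
```

Pasting the two printed commands into a terminal with the CUDA toolkit on the PATH should leave a kernel.ptx and a kernel.cubin next to the source file.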
The -arch option tells the compiler what PTX architecture to aim for in the first stage of compilation. PTX architectures are specified in the format compute_xy, where xy is the version number of a particular architecture.
For example:
-arch=compute_20
This makes the compiler produce PTX code that the CUDA driver will JIT-compile at runtime to SASS code, so it will work on sm_20 and all later GPU architectures.
You can also specify a low-level GPU architecture to this option:
-arch=sm_21
This makes the compiler produce SASS code for sm_21 GPUs. The compiler picks the PTX architecture that is suitable for sm_21 in the first stage, even though you have not indicated it, and embeds that PTX code alongside the SASS code (see the list at the end).
Note that only one PTX architecture can be specified with the -arch option.
The -code option tells the compiler what SASS architecture to aim for in the second stage of compilation. NVCC expects it alongside an -arch option that names a compatible PTX architecture; used on its own it is rejected (see the list at the end), so the fragments below show only the -code part.
For example:
-code=sm_21
This makes the compiler produce SASS code that will work correctly only on sm_21 GPUs.
You can specify many SASS architectures, but they should all belong to the same class of PTX architecture.
For example:
-code=sm_20,sm_21
The -arch and -code options can be combined to be more specific (and also more confusing). Let us see some examples.
Produces SASS code for sm_20:
-arch=compute_20 -code=sm_20
Produces PTX code for compute_20 and SASS code for both sm_20 and sm_21 GPUs:
-arch=compute_20 -code=compute_20,sm_20,sm_21
What if you want to produce PTX or SASS code for many PTX architectures? That is what the -gencode option is for:
-gencode arch=compute_20,code=sm_20
This usage is equivalent to the earlier -arch=compute_20 -code=sm_20 example. The real use, however, is to generate code for several PTX architectures in one invocation, like this:
-gencode arch=compute_20,code=compute_20 -gencode arch=compute_35,code=sm_35
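When the list of targets grows, writing each -gencode clause by hand gets tedious. Below is a minimal sketch of a shell helper that expands a list of version numbers into clauses. The function name gencode_flags is hypothetical, and the policy it follows (SASS for every listed version, plus PTX only for the last one so that newer GPUs can still JIT-compile) is a common convention assumed here, not something this article prescribes. It also assumes that every version has a compute_xy of the same number and that the versions are given in ascending order.

```shell
#!/bin/sh
# gencode_flags VERSION...   (hypothetical helper)
# Prints one -gencode clause with SASS for each version, then one clause
# with PTX for the last version given.
gencode_flags() {
    flags=""
    last=""
    for cc in "$@"; do
        flags="$flags -gencode arch=compute_$cc,code=sm_$cc"
        last=$cc
    done
    if [ -n "$last" ]; then
        flags="$flags -gencode arch=compute_$last,code=compute_$last"
    fi
    # Strip the leading space before printing.
    echo "${flags# }"
}

gencode_flags 20 35
# prints: -gencode arch=compute_20,code=sm_20 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_35,code=compute_35
```

The output can be spliced into a build command, for example nvcc $(gencode_flags 20 35) -o app app.cu, where app.cu is a placeholder file name.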
For the full list of virtual architectures supported by NVCC, see the Virtual Architecture Feature List.
For the full list of real architectures supported by NVCC, see the GPU Feature List.
A handy list of how to and how not to specify the above compiler options:
# Generate only PTX
-arch compute_35
-arch compute_35 -code compute_35
-gencode arch=compute_35,code=compute_35
# Generate only SASS
-arch compute_35 -code sm_35
-gencode arch=compute_35,code=sm_35
# Generate both PTX and SASS
-arch sm_35
-gencode arch=compute_35,code=\"sm_35,compute_35\"
# !!!WRONG!!! specifications
# NVCC complains that you need to specify -arch with a virtual code architecture like compute_35
-code compute_35
-code sm_35
-arch sm_35 -code compute_35
-arch sm_35 -code sm_35
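The rules behind this list can be sketched as a small shell function that reports what a given -arch and -code pair would embed. This only mirrors the list above: the function name describe_spec and its messages are hypothetical, it is not how NVCC is implemented, and the exact behaviour of NVCC may differ between toolkit versions.

```shell
#!/bin/sh
# describe_spec ARCH [CODE]   (hypothetical helper)
# Prints PTX, SASS or PTX+SASS for a valid combination, or an error.
describe_spec() {
    arch=${1:-}
    code=${2:-}
    case $arch in
        compute_*)
            # With no -code, the code target defaults to the -arch value.
            [ -n "$code" ] || code=$arch
            ;;
        sm_*)
            if [ -n "$code" ]; then
                echo "error: -arch must be a compute_xy value when -code is given"
                return 1
            fi
            # Shorthand form: SASS for sm_xy plus the matching PTX.
            echo "PTX+SASS"
            return 0
            ;;
        *)
            echo "error: -arch with a compute_xy or sm_xy value is required"
            return 1
            ;;
    esac

    has_ptx=no
    has_sass=no
    # Walk the comma-separated -code list.
    old_ifs=$IFS
    IFS=,
    for target in $code; do
        case $target in
            compute_*) has_ptx=yes ;;
            sm_*) has_sass=yes ;;
        esac
    done
    IFS=$old_ifs

    case $has_ptx,$has_sass in
        yes,yes) echo "PTX+SASS" ;;
        yes,no) echo "PTX" ;;
        no,yes) echo "SASS" ;;
        *) echo "error: no recognised -code target"; return 1 ;;
    esac
}

describe_spec compute_35                    # prints PTX
describe_spec compute_35 sm_35              # prints SASS
describe_spec compute_35 sm_35,compute_35   # prints PTX+SASS
describe_spec sm_35                         # prints PTX+SASS
describe_spec "" sm_35 || true              # prints an error message
```

To confirm what a real binary contains, the cuobjdump tool from the CUDA toolkit can list the embedded PTX and SASS images with its --list-ptx and --list-elf options.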