Code Yarns ‍👨‍💻

How to specify architecture to compile CUDA code

📅 2014-Mar-03 ⬩ ✍️ Ashwin Nanjappa ⬩ 🏷️ cuda, nvcc, ptx, sass

To compile CUDA code, you need to tell NVCC what architecture you want to target and what kind of code you want to generate. Specifying this to NVCC can be confusing because there are up to three options that can be used: -arch, -code and -gencode.

It does not have to be this complicated, but for historical and practical reasons it is. The best way to understand these options is to recall the two-level hierarchy of CUDA architectures. First is the high-level PTX architecture, which acts as a virtual machine. Next are the actual low-level GPU architectures, each designed to work with the features available in a particular PTX architecture.

This is conceptually how a CUDA compiler works too. For the sake of these compiler options, we can break down compilation into two stages. First, the compiler looks at the high-level features supported by a particular virtual PTX architecture and creates PTX code for that architecture. Next, the compiler translates the PTX code to SASS code that is optimized for a particular low-level GPU architecture.
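The two stages can also be run by hand, which makes the split easier to see. The following is only a sketch: it assumes the CUDA toolkit (nvcc and ptxas) is on the PATH, uses a made-up kernel file named scale.cu, and defaults to compute_35/sm_35, which newer toolkits no longer support, so set SM_VER to match your GPU.

```shell
#!/bin/sh
# Hypothetical kernel used only for illustration.
cat > scale.cu <<'EOF'
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}
EOF

# Compute capability without the dot. This default is an assumption.
SM_VER="${SM_VER:-35}"

if command -v nvcc >/dev/null 2>&1 && command -v ptxas >/dev/null 2>&1; then
    # Stage 1: CUDA C++ -> PTX for the virtual architecture (scale.ptx).
    # Stage 2: PTX -> SASS for the real architecture (scale.cubin).
    nvcc -arch="compute_$SM_VER" -ptx scale.cu -o scale.ptx \
        && ptxas -arch="sm_$SM_VER" scale.ptx -o scale.cubin \
        || echo "compilation failed; try a different SM_VER" >&2
else
    echo "nvcc/ptxas not found; skipping compilation" >&2
fi
```

scale.ptx is human-readable text, so opening it is a quick way to see what the first stage produced before the second stage turns it into machine code.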

-arch

This compiler option is used to tell the compiler what PTX architecture to aim for in the first stage of compilation. PTX architectures are specified in the format of compute_xy, where xy is the version number of a particular architecture.

For example:

-arch=compute_20

This makes the compiler produce PTX code that the CUDA driver will JIT-compile at runtime to SASS code that will work on sm_20 and all later GPU architectures. (There is no compute_21 virtual architecture; both sm_20 and sm_21 GPUs sit under compute_20.)

You can also specify a low-level GPU architecture to this option:

-arch=sm_21

This makes the compiler produce SASS code that will work correctly only on sm_21 GPUs. The compiler picks the PTX architecture that is suitable for sm_21 in the first stage, even though you have not indicated that to it. As the cheatsheet at the end shows, PTX for that virtual architecture is embedded alongside the SASS in this case.
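As the cheatsheet at the end of this post notes, giving -arch an sm_XY value ends up embedding both SASS and PTX. The NVCC documentation describes this as a shorthand for an explicit -arch/-code pair. The small helper below spells that expansion out. It is a sketch: expand_arch is a hypothetical name, and it assumes a compute_XY virtual architecture with the same number as the sm_XY value exists.

```shell
#!/bin/sh
# expand_arch: print the explicit -arch/-code form of a single -arch value.
# Assumption: for sm_XY, a virtual architecture compute_XY with the same
# number exists. This helper is illustrative and not part of NVCC.
expand_arch() {
    case "$1" in
        sm_*)
            ver="${1#sm_}"
            echo "-arch=compute_$ver -code=sm_$ver,compute_$ver"
            ;;
        compute_*)
            # A virtual architecture on its own yields PTX only.
            echo "-arch=$1 -code=$1"
            ;;
        *)
            echo "unknown architecture: $1" >&2
            return 1
            ;;
    esac
}

expand_arch sm_35        # prints: -arch=compute_35 -code=sm_35,compute_35
expand_arch compute_35   # prints: -arch=compute_35 -code=compute_35
```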

Note that only one PTX architecture can be specified using the -arch option.

-code

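This compiler option is used to tell the compiler what SASS architecture to aim for in the second stage of compilation. The PTX architecture used in the first stage must be suitable for the SASS architecture you specify. As the cheatsheet at the end shows, NVCC expects it to be given explicitly with -arch, so the examples in this section show only the -code part of the command line.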

For example:

-code=sm_21

This makes the compiler produce SASS code that will work correctly only on sm_21 GPUs.

You can specify many SASS architectures, but they must all be compatible with the same PTX architecture.

For example:

-code=sm_20,sm_21

-arch -code

These two options can be combined to be more specific (and also confusing). Let us see some examples.

Produces SASS code for sm_20:

-arch=compute_20 -code=sm_20

Produces PTX code for compute_20 and SASS code for both sm_20 and sm_21 GPUs:

-arch=compute_20 -code=compute_20,sm_20,sm_21

-gencode

What if you want to produce PTX or SASS code for many PTX architectures? That is what this option is useful for:

-gencode arch=compute_20,code=sm_20

This usage is equivalent to the combined -arch and -code form shown earlier. The real use of -gencode is to repeat it, once for each PTX architecture you want to target, like this:

-gencode arch=compute_20,code=compute_20 -gencode arch=compute_35,code=sm_35
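Writing out a long list of -gencode options by hand gets tedious, so build scripts often generate it. The helper below is a minimal sketch of that idea: gencode_flags is a hypothetical name, the version numbers are only examples, and adding PTX for the highest version is one common convention rather than something NVCC requires.

```shell
#!/bin/sh
# gencode_flags: build -gencode options for a list of compute capabilities
# (given without the dot, lowest to highest). Emits SASS for each one, plus
# PTX for the last one so that newer GPUs can still JIT-compile the code.
gencode_flags() {
    flags=""
    last=""
    for ver in "$@"; do
        flags="$flags -gencode arch=compute_$ver,code=sm_$ver"
        last="$ver"
    done
    if [ -n "$last" ]; then
        flags="$flags -gencode arch=compute_$last,code=compute_$last"
    fi
    # Strip the leading space before printing.
    echo "${flags# }"
}

gencode_flags 20 35
# prints: -gencode arch=compute_20,code=sm_20 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_35,code=compute_35
```

The output can be spliced into a compile command with command substitution, for example nvcc $(gencode_flags 20 35) -o app app.cu, where app.cu stands in for your own source file.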

Compute_XY

For the full list of virtual architectures supported by NVCC, see the Virtual Architecture Feature List.

SM_XY

For the full list of real architectures supported by NVCC, see the GPU Feature List.

Cheatsheet

A handy list of how to and how not to specify the above compiler options:

# Generate only PTX
-arch compute_35
-arch compute_35 -code compute_35
-gencode arch=compute_35,code=compute_35

# Generate only SASS
-arch compute_35 -code sm_35
-gencode arch=compute_35,code=sm_35

# Generate both PTX and SASS
-arch sm_35
-gencode arch=compute_35,code=\"sm_35,compute_35\"

# !!!WRONG!!! specifications
# NVCC complains that you need to specify -arch with a virtual code architecture like compute_35
-code compute_35
-code sm_35
-arch sm_35 -code compute_35
-arch sm_35 -code sm_35
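
The rule behind the wrong specifications above can be written as a small check. This is only a sketch that mirrors the cheatsheet: check_arch_code is a hypothetical helper, it looks at name prefixes only, and NVCC itself remains the authority on what it accepts.

```shell
#!/bin/sh
# check_arch_code ARCH [CODE]
# Returns 0 if the combination follows the cheatsheet's rule, 1 otherwise.
# Pass an empty string for ARCH to model -code used on its own.
check_arch_code() {
    arch="$1"
    code="$2"
    if [ -z "$code" ]; then
        # -arch alone is fine with either a virtual or a real architecture.
        case "$arch" in
            compute_*|sm_*) return 0 ;;
            *) return 1 ;;
        esac
    fi
    # With -code present, -arch must name a virtual architecture.
    case "$arch" in
        compute_*) return 0 ;;
        *) return 1 ;;
    esac
}

check_arch_code compute_35 sm_35 && echo "ok: -arch compute_35 -code sm_35"
check_arch_code sm_35 sm_35      || echo "rejected: -arch sm_35 -code sm_35"
check_arch_code "" sm_35         || echo "rejected: -code sm_35"
```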