Code Yarns ‍👨‍💻

How to specify architecture to compile CUDA code

📅 2014-Mar-03 ⬩ ✍️ Ashwin Nanjappa

To compile CUDA code, you need to tell the compiler which architecture to compile for. This gets confusing because three options can be used for this: -arch, -code and -gencode.

It does not have to be this complicated, but for historical and practical reasons it is. The best way to understand these options is to recall the two-level hierarchy of CUDA architectures. First is the high-level PTX architecture, which acts as a virtual machine. Next is a class of low-level GPU architectures, each designed to work with the features available in a particular PTX architecture.

This is conceptually how a CUDA compiler works too. For the sake of these compiler options, we can break down the compilation phase into two stages. First, the compiler looks at high-level features that are supported by a particular virtual PTX architecture and creates high-level code for that architecture. Next, the compiler translates the PTX code to the SASS code that is optimized for a particular low-level GPU architecture.

-arch

This compiler option is used to tell the compiler what PTX architecture to aim for in the first stage of compilation. PTX architectures are specified in the format of compute_xy, where xy is the version number of a particular architecture.

For example:

-arch=compute_35

This makes the compiler produce PTX code, with no SASS code embedded. The CUDA driver JIT-compiles this PTX at runtime to SASS code for the GPU it finds, which can be sm_35 or any later GPU architecture.
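PTX is forward compatible: the driver can JIT-compile PTX for compute_xy on a GPU whose compute capability is x.y or higher, but not on an older one. A minimal Python sketch of that rule follows. The helper names capability and can_jit are made up for this illustration and are not part of any CUDA tool.

```python
def capability(name):
    """Parse a name like 'compute_35' or 'sm_35' into a (major, minor) tuple."""
    digits = name.split("_", 1)[1]
    # The last digit is the minor version, everything before it is the major.
    return int(digits[:-1]), int(digits[-1])


def can_jit(ptx_arch, gpu):
    """True if PTX built for ptx_arch can be JIT-compiled for the given GPU."""
    return capability(gpu) >= capability(ptx_arch)


print(can_jit("compute_35", "sm_35"))  # True
print(can_jit("compute_35", "sm_30"))  # False: the GPU is older than the PTX
```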

You can also specify a low-level GPU architecture to this option:

-arch=sm_35

This is shorthand for -arch=compute_35 -code=sm_35,compute_35. The compiler picks the PTX architecture that suits sm_35 (compute_35) for the first stage, even though you have not named it. It then embeds both SASS code for sm_35 GPUs and compute_35 PTX code that the driver can JIT-compile for later GPUs.

Note that only one PTX architecture can be specified using the -arch option.
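According to the NVCC manual, -arch=sm_xy is shorthand for an explicit -arch and -code pair that names the matching PTX architecture and embeds both SASS and PTX. The Python sketch below spells out that expansion for the architectures listed in this post. The table and the function name expand_arch_shorthand are assumptions made for illustration; nvcc does this internally and exposes no such function.

```python
# Real (sm_xy) to virtual (compute_xy) architecture, for the values in this post.
# Note that sm_21 has no compute_21; it belongs to compute_20.
VIRTUAL_ARCH_OF = {
    "sm_10": "compute_10",
    "sm_11": "compute_11",
    "sm_12": "compute_12",
    "sm_13": "compute_13",
    "sm_20": "compute_20",
    "sm_21": "compute_20",
    "sm_30": "compute_30",
    "sm_35": "compute_35",
}


def expand_arch_shorthand(arch):
    """Return explicit -arch/-code flags equivalent to a lone -arch=<arch>."""
    if arch.startswith("compute_"):
        # A virtual architecture alone: -code defaults to the same value (PTX only).
        return ["-arch=" + arch, "-code=" + arch]
    virtual = VIRTUAL_ARCH_OF[arch]
    # A real architecture alone: embed its SASS plus PTX for its virtual architecture.
    return ["-arch=" + virtual, "-code=" + arch + "," + virtual]


print(" ".join(expand_arch_shorthand("sm_35")))
# -arch=compute_35 -code=sm_35,compute_35
```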

-code

This compiler option is used to tell the compiler what SASS architecture to aim for in the second stage of compilation. It will pick the PTX architecture that is suitable for the SASS architecture you specified.

For example:

-code=sm_21

This makes the compiler produce SASS code that will work correctly only on sm_21 GPUs.

You can specify many SASS architectures, but they should all belong to the same class of PTX architecture.

For example:

-code=sm_20,sm_21

-arch -code

These two options can be combined to be more specific (and also confusing). Let us see some examples.

-arch=compute_20 -code=sm_20

Produces SASS code for sm_20 only. No PTX code is embedded, so the driver has nothing to JIT-compile for other GPUs.

-arch=compute_20 -code=compute_20,sm_20,sm_21

Produces PTX code for compute_20 and SASS code for both sm_20 and sm_21 GPUs.
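To read a -code list, it helps to sort its entries by prefix: compute_xy entries ask for embedded PTX, sm_xy entries ask for embedded SASS. Here is a small Python sketch of that reading. The function name embedded_code is hypothetical and only mirrors the description above.

```python
def embedded_code(code_option):
    """Split a -code value into the PTX and SASS targets it asks nvcc to embed."""
    ptx, sass = [], []
    for target in code_option.split(","):
        if target.startswith("compute_"):
            ptx.append(target)   # virtual architecture: PTX is embedded
        elif target.startswith("sm_"):
            sass.append(target)  # real architecture: SASS is embedded
        else:
            raise ValueError("unrecognized target: " + target)
    return {"ptx": ptx, "sass": sass}


print(embedded_code("compute_20,sm_20,sm_21"))
# {'ptx': ['compute_20'], 'sass': ['sm_20', 'sm_21']}
```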

-gencode

What if you want to produce PTX or SASS code for many PTX architectures? That is what this option is useful for.

-gencode arch=compute_20,code=sm_20

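This usage is equivalent to the -arch=compute_20 -code=sm_20 example seen earlier. But the real use of -gencode is to generate code for many PTX architectures in one build, by repeating the option like this: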

-gencode arch=compute_20,code=compute_20 -gencode arch=compute_35,code=sm_35
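The example above embeds PTX for compute_20 and SASS for sm_35 in the same binary. If a build script has to assemble such flags, one possible approach is sketched below in Python. The function name gencode_flags and the file names app.cu and app are placeholders invented for this sketch.

```python
def gencode_flags(targets):
    """Build one -gencode option per (arch, code) pair."""
    flags = []
    for arch, code in targets:
        flags += ["-gencode", "arch={},code={}".format(arch, code)]
    return flags


# The same two targets as in the example above.
targets = [("compute_20", "compute_20"), ("compute_35", "sm_35")]

# app.cu and app are hypothetical file names.
command = ["nvcc"] + gencode_flags(targets) + ["-o", "app", "app.cu"]
print(" ".join(command))
# nvcc -gencode arch=compute_20,code=compute_20 -gencode arch=compute_35,code=sm_35 -o app app.cu
```

The command is only printed here. Passing the list to subprocess.run would invoke nvcc, assuming it is installed and on the PATH.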

Compute_XY

The compute_xy values that can be specified, as of CUDA 5.5, are:

compute_10
compute_11
compute_12
compute_13
compute_20
compute_30
compute_35

SM_XY

The sm_xy values that can be specified, as of CUDA 5.5, are:

sm_10
sm_11
sm_12
sm_13
sm_20
sm_21
sm_30
sm_35

Reference: NVCC Manual, CUDA 5.5