Architecture Identification Macro in CUDA

The __CUDA_ARCH__ macro identifies the CUDA architecture version that the NVCC compiler is compiling device code for. It is defined only while device code is being compiled and is undefined during host compilation. When defined, it holds a 3-digit value of the form xy0, which corresponds to architecture version x.y. For example, the value is 200 when compiling for sm_20, 130 when compiling for sm_13, and so on.
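As a rough illustration of the xy0 encoding, the sketch below splits such a value back into its major and minor parts. The names HOST_DEVICE, archMajor and archMinor are made up for this example and are not part of CUDA. The HOST_DEVICE guard relies on __CUDACC__, which NVCC defines when compiling CUDA source files, so the same snippet should also build with a plain C++ compiler.

```cpp
// HOST_DEVICE is a hypothetical helper macro for this sketch: it expands to
// the CUDA qualifiers under NVCC and to nothing under a plain C++ compiler.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Major version x from a value of the form xy0, for example 200 -> 2.
HOST_DEVICE inline int archMajor(int archValue)
{
    return archValue / 100;
}

// Minor version y from a value of the form xy0, for example 130 -> 3.
HOST_DEVICE inline int archMinor(int archValue)
{
    return (archValue / 10) % 10;
}
```

Inside device code, archMajor(__CUDA_ARCH__) and archMinor(__CUDA_ARCH__) would then give the two parts of the architecture being compiled for.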

This macro is useful for compiling different code for the host and the device:

__host__ __device__ void foobar()
{
#ifdef __CUDA_ARCH__
    // Device specific code
#else
    // Host specific code
#endif
}

This macro is also useful for writing device code that is specific to particular architectures:

__global__ void fooKernel()
{
#if __CUDA_ARCH__ >= 200
    // Code that uses advanced features
    // introduced in sm_20 and later
#else
    // Fallback code for older architectures
#endif
}
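The two uses can be combined in a single __host__ __device__ function. One detail to watch: in an #if expression an undefined macro evaluates to 0, so a bare "#if __CUDA_ARCH__ >= 200" sends the host compilation pass into the fallback branch. The sketch below checks whether the macro is defined first. The names HOST_DEVICE, CodePath and selectCodePath are hypothetical and exist only for this example.

```cpp
// HOST_DEVICE is a hypothetical helper macro for this sketch: it expands to
// the CUDA qualifiers under NVCC and to nothing under a plain C++ compiler.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical labels for the three code paths.
enum CodePath
{
    PATH_HOST = 0,            // host compilation pass
    PATH_DEVICE_LEGACY = 1,   // device pass, older than sm_20
    PATH_DEVICE_SM20_PLUS = 2 // device pass, sm_20 or later
};

HOST_DEVICE inline CodePath selectCodePath()
{
#if !defined(__CUDA_ARCH__)
    // Host pass: the macro is not defined at all.
    return PATH_HOST;
#elif __CUDA_ARCH__ >= 200
    // Device pass for sm_20 and later.
    return PATH_DEVICE_SM20_PLUS;
#else
    // Device pass for older architectures.
    return PATH_DEVICE_LEGACY;
#endif
}
```

Called from host code, selectCodePath() returns PATH_HOST. Called from a kernel, it returns one of the two device values, depending on the architecture that the device code was compiled for.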

Reference: Section on Virtual Architecture Identification Macro in the NVCC Manual.

Tried with: CUDA 5.5 and Ubuntu 12.04 LTS
