Pip install error with PyCUDA

Problem

  • I tried to install PyCUDA using pip:
$ sudo pip install pycuda
  • The installation tried to compile a few C++ files and failed on the very first file with this error:
In file included from src/cpp/cuda.cpp:1:0:
src/cpp/cuda.hpp:14:18: fatal error: cuda.h: No such file or directory
#include <cuda.h>
                ^
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

Investigation

  • This error was strange because I had set CUDA_ROOT and added the bin path of the CUDA installation to the PATH environment variable. So the installer should have found cuda.h, which I could see was present in $CUDA_ROOT/include.

  • To see what was happening, I tried the same command with verbosity:

$ sudo pip -vvv install pycuda
  • Now I could see that it was failing to find nvcc.

  • On downloading the source code of PyCUDA and checking setup.py, I saw that it looks for nvcc to figure out CUDA_ROOT and CUDA_INC_DIR.

  • The reason nvcc was not visible was that the CUDA bin path was added to PATH only for my user, and this PATH is not carried over when a command is run under sudo, as described here. (A quick check for this is shown below.) The solution was to make the CUDA bin path visible to sudo.
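
A quick way to see this, assuming nvcc is already in your own PATH, is to compare what your shell and sudo can find:

$ which nvcc         # found via the user's PATH
$ sudo which nvcc    # usually prints nothing, since sudo uses its own PATH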

Solution

To make $CUDA_ROOT/bin available in PATH for sudo, we can follow the steps described here. For example, on my system with CUDA 7.0:

  • Created a new file /etc/profile.d/cuda.sh and added this line:
export PATH=/usr/local/cuda-7.0/bin:$PATH
  • Opened a root login shell, which sources the new profile script, and ran the pip installation:
$ sudo su -
$ pip install pycuda

This worked and PyCUDA was installed successfully! 🙂
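
As an aside, depending on how sudo is configured on your system, explicitly passing your own PATH through to the root command may also work, without the profile script:

$ sudo env "PATH=$PATH" pip install pycuda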

Tried with: PyCUDA 2015.1.2, CUDA 7.0 and Ubuntu 14.04

target_include_directories does not work with CUDA

Problem

target_include_directories is a useful CMake command to specify the include directories for building a particular target, as described here. However, the FindCUDA module of CMake, which handles CUDA compilation, seems to completely ignore this command: the include directories specified for the target are not passed to nvcc during CUDA compilation.

This will most commonly result in errors of the form: someheader.h: No such file or directory.

Solution

This is a well-known limitation of the CUDA module of CMake, as documented here. There currently seems to be no plan to support target_include_directories for CUDA compilation.

The only workaround is to switch to include_directories, which adds these directories for all the targets in the CMakeLists.txt file, as sketched below.
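
A minimal sketch of this workaround, with made-up target, source and directory names, might look like this in CMakeLists.txt:

find_package(CUDA REQUIRED)
# What we would like to write, but FindCUDA does not pass it to nvcc:
#   target_include_directories(myapp PRIVATE ${PROJECT_SOURCE_DIR}/include)
# Fall back to the directory-wide command, which nvcc does see:
include_directories(${PROJECT_SOURCE_DIR}/include)
cuda_add_executable(myapp main.cu)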

Tried with: CMake 2.8.12.2, CUDA 6.5 and Ubuntu 14.04

How to specify architecture to compile CUDA code

To compile CUDA code, you need to indicate what architecture you want to compile for. And this gets quite confusing because there are three options that can be used: -arch, -code and -gencode.

It does not have to be this complicated, but due to historical and practical reasons it just is. The best way to understand these options is to recall the two-level hierarchy of CUDA architectures. First is the high-level PTX architecture, which acts as a virtual machine. Next is a class of low-level GPU architectures, each designed to implement the features available in a particular PTX architecture.

This is conceptually how the CUDA compiler works too. For the sake of these compiler options, we can break compilation down into two stages. First, the compiler targets the features supported by a particular virtual PTX architecture and generates PTX code for that architecture. Next, it translates the PTX code to SASS code that is optimized for a particular low-level GPU architecture.
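
If you want to see these two stages for yourself, the commands below (file names assumed) dump the intermediate PTX and the final SASS of a kernel:

$ nvcc -ptx kernel.cu -o kernel.ptx                  # stage 1: code for the virtual PTX architecture
$ nvcc -cubin -arch=sm_35 kernel.cu -o kernel.cubin  # stage 2: code for a real GPU architecture
$ cuobjdump --dump-sass kernel.cubin                 # inspect the generated SASS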

-arch

This compiler option is used to tell the compiler what PTX architecture to aim for in the first stage of compilation. PTX architectures are specified in the format of compute_xy, where xy is the version number of a particular architecture.

For example:

-arch=compute_35

This makes the compiler produce PTX code that the CUDA driver will JIT-compile at runtime to SASS code for the sm_35 class of GPU architectures.
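
For example, a full compile command using only this option might look like this (file name assumed):

$ nvcc -arch=compute_35 main.cu -o app    # only PTX is embedded; it is JIT-compiled on the GPU at runtime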

You can also specify a low-level GPU architecture to this option:

-arch=sm_35

This makes the compiler produce SASS code that will work correctly only on sm_35 GPUs. The compiler will pick the PTX architecture that is suitable for sm_35 in the first stage, even though you have not indicated it explicitly.

Note that only one PTX architecture can be specified using the -arch option.

-code

This compiler option is used to tell the compiler what SASS architecture to aim for in the second stage of compilation. It will pick the PTX architecture that is suitable for the SASS architecture you specified.

For example:

-code=sm_21

This makes the compiler produce SASS code that will work correctly only on sm_21 GPUs.

You can specify many SASS architectures, but they should all belong to the same class of PTX architecture.

For example:

-code=sm_20,sm_21

-arch -code

These two options can be combined to be more specific (and also confusing). Let us see some examples.

-arch=compute_20 -code=sm_20

Produces SASS code for sm_20.

-arch=compute_20 -code=compute_20,sm_20,sm_21

Produces PTX code for compute_20 and SASS code for both sm_20 and sm_21 GPUs.
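
To verify what actually ended up inside the compiled binary, you can dump it with cuobjdump (file names assumed):

$ nvcc -arch=compute_20 -code=compute_20,sm_20,sm_21 main.cu -o app
$ cuobjdump --dump-ptx app     # shows the embedded compute_20 PTX
$ cuobjdump --dump-sass app    # shows the embedded sm_20 and sm_21 SASS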

-gencode

What if you want to produce PTX or SASS code for many PTX architectures? That is what this option is useful for.

-gencode arch=compute_20,code=sm_20

This usage is similar to the earlier examples. But the real use is to produce code for many PTX architectures, like this:

-gencode arch=compute_20,code=compute_20 -gencode arch=compute_35,code=sm_35
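
For example, a single fat binary that runs natively on sm_20 and sm_35 GPUs, and also carries compute_35 PTX so that newer GPUs can JIT-compile it, could be built like this (file name assumed):

$ nvcc -gencode arch=compute_20,code=sm_20 \
       -gencode arch=compute_35,code=sm_35 \
       -gencode arch=compute_35,code=compute_35 \
       main.cu -o app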

Compute_XY

The different compute_xy values that can be specified are:

compute_10
compute_11
compute_12
compute_13
compute_20
compute_30
compute_35

SM_XY

The different sm_xy values that can be specified are:

sm_10
sm_11
sm_12
sm_13
sm_20
sm_21
sm_30
sm_35

Reference: NVCC Manual, CUDA 5.5

nullptr is undefined error with nvcc

Problem

I had a project with a mix of C++ and CUDA source files, and I was using the NVCC compiler to compile both types of files. That worked well until I started using some modern C++ features in the C++ files. To be able to compile them, I passed the -std=c++11 flag to the host compiler using the nvcc option -Xcompiler. Then I got compilation errors of this form:

/usr/lib/gcc/x86_64-linux-gnu/4.8/include/stddef.h(432): error: identifier "nullptr" is undefined

This can also occur if you are using NVIDIA Nsight and specify this option for the host compiler there.

Solution

NVCC seems to apply the options passed to -Xcompiler to both the C++ (host) and CUDA (device) compilation during the pre-processing stage. The error went away once I used g++ specifically to compile the C++ files and passed the option there. I compiled the CUDA files using nvcc, without passing it the option, and linked the resulting object files as usual. A sketch of this split is shown below.
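
A minimal sketch of the split compilation and linking, with assumed file names:

$ g++ -std=c++11 -c main.cpp -o main.o   # host-only C++ file, modern C++ features allowed
$ nvcc -c kernel.cu -o kernel.o          # CUDA file compiled without the C++11 flag
$ nvcc main.o kernel.o -o app            # link the object files as usual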

Tried with: CUDA 5.5, NVIDIA Nsight 5.5.0, GCC 4.8 and Ubuntu 12.04 LTS

How to specify host compiler for nvcc

Host compiler specified in Nsight

nvcc is the compiler driver used to compile both .cu and .cpp files. It uses whichever cl.exe (on Windows) or gcc (on Linux) executable it finds as the compiler for host code. Sometimes you may want a different host compiler, or a different version of it, to be used to compile the host code.

To do that, specify the full path of the compiler executable to the --compiler-bindir or the -ccbin option of nvcc.
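
For example, to make nvcc use a specific GCC from the command line (the compiler path here is just an illustration):

$ nvcc -ccbin /usr/bin/g++-4.6 -c kernel.cu -o kernel.o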

If you are using Nsight, go to Project > Properties > Build > Settings > Tool Settings > NVCC Compiler > Build Stages. Specify the path of the compiler in the field Compiler Path.

Tried with: Nsight 5.5 and Ubuntu 12.04