Code Yarns ‍👨‍💻

How to do error checking in CUDA

📅 2011-Mar-02 ⬩ ✍️ Ashwin Nanjappa

Error checks in CUDA code help catch errors close to their source. Errors in CUDA code come from two places:

  1. Errors from CUDA API calls. For example, a call to cudaMalloc() might fail.

  2. Errors from CUDA kernel calls. For example, a kernel might perform an invalid memory access.

Almost all CUDA runtime API calls return a cudaError_t value (a typedef of the cudaError enum), so these calls are easy to check:

if ( cudaSuccess != cudaMalloc( &fooPtr, fooSize ) )
    printf( "Error!\n" );
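
A bare "Error!" does not say what went wrong. As a minimal sketch, the returned code can be passed to cudaGetErrorString() to get a readable message. The helper name tryDeviceAlloc is hypothetical and only used here for illustration:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): allocates device memory and,
// on failure, prints the reason reported by the runtime.
bool tryDeviceAlloc( void **devPtr, size_t numBytes )
{
    cudaError_t err = cudaMalloc( devPtr, numBytes );
    if ( cudaSuccess != err )
    {
        fprintf( stderr, "cudaMalloc of %llu bytes failed: %s\n",
                 (unsigned long long) numBytes, cudaGetErrorString( err ) );
        return false;
    }

    return true;
}
```

Calling tryDeviceAlloc( &ptr, 1024 ) should normally succeed on a working device, while an absurdly large request such as (size_t) -1 bytes is expected to fail and print the runtime's message.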

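CUDA kernel launches do not return a value. An error from a kernel launch can be checked immediately afterwards by calling cudaGetLastError(). Note that kernels run asynchronously, so an error that occurs during execution, such as an invalid memory access, surfaces only at a later synchronizing call like cudaDeviceSynchronize(). The basic check looks like this: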

fooKernel<<< x, y >>>(); // Kernel call
if ( cudaSuccess != cudaGetLastError() )
    printf( "Error!\n" );
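
To make the difference between launch errors and execution errors concrete, here is a small sketch. The kernel and the function name launchAndCheck are hypothetical, chosen only for this illustration. It checks cudaGetLastError() right after the launch and then synchronizes to pick up anything that went wrong while the kernel was running:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel used only for illustration:
// each thread writes its own index.
__global__ void fooKernel( int *out )
{
    out[ threadIdx.x ] = threadIdx.x;
}

// Launches fooKernel with one block of threadsPerBlock threads
// and returns the first error seen, or cudaSuccess.
cudaError_t launchAndCheck( int *devPtr, int threadsPerBlock )
{
    fooKernel<<< 1, threadsPerBlock >>>( devPtr );

    // Launch errors (for example, an invalid launch configuration)
    // are reported right away.
    cudaError_t err = cudaGetLastError();
    if ( cudaSuccess != err )
        return err;

    // Errors raised while the kernel runs show up only after
    // the host waits for the device.
    return cudaDeviceSynchronize();
}
```

With a device buffer of at least 32 ints, launchAndCheck( devPtr, 32 ) should return cudaSuccess. A block size far above the hardware limit, say 1 << 20 threads, is expected to be rejected at launch, so the error comes back from cudaGetLastError() without the kernel ever running.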

These two types of checks can be elegantly wrapped up in two simple error-checking functions like this:

#include <cstdio>   // fprintf
#include <cstdlib>  // exit
#include <cuda_runtime.h>

// Define this to turn on error checking
#define CUDA_ERROR_CHECK

#define CudaSafeCall( err ) __cudaSafeCall( err, __FILE__, __LINE__ )
#define CudaCheckError()    __cudaCheckError( __FILE__, __LINE__ )

inline void __cudaSafeCall( cudaError err, const char *file, const int line )
{
#ifdef CUDA_ERROR_CHECK
    if ( cudaSuccess != err )
    {
        fprintf( stderr, "cudaSafeCall() failed at %s:%i : %s\n",
                 file, line, cudaGetErrorString( err ) );
        exit( -1 );
    }
#endif

    return;
}

inline void __cudaCheckError( const char *file, const int line )
{
#ifdef CUDA_ERROR_CHECK
    cudaError err = cudaGetLastError();
    if ( cudaSuccess != err )
    {
        fprintf( stderr, "cudaCheckError() failed at %s:%i : %s\n",
                 file, line, cudaGetErrorString( err ) );
        exit( -1 );
    }

    // More careful checking: also catches errors raised while the kernel runs.
    // This synchronizes the device and affects performance. Comment it out if needed.
    err = cudaDeviceSynchronize();
    if( cudaSuccess != err )
    {
        fprintf( stderr, "cudaCheckError() with sync failed at %s:%i : %s\n",
                 file, line, cudaGetErrorString( err ) );
        exit( -1 );
    }
#endif

    return;
}

Using these error checking functions is easy:

CudaSafeCall( cudaMalloc( &fooPtr, fooSize ) );

fooKernel<<< x, y >>>(); // Kernel call
CudaCheckError();
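
To see the whole pattern in one place, here is a minimal end-to-end sketch. It is not the code above: so that it compiles on its own, it uses a compact single-macro stand-in named CHECK_CUDA, and the kernel addOneKernel and the function addOneAndSum are likewise hypothetical names made up for this example. The different macro name also sidesteps identifiers with a double underscore, which C++ reserves for the implementation.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Compact stand-in for the two helpers, assumed only for this sketch.
// Prints the failing location and exits on any CUDA error.
#define CHECK_CUDA( call )                                              \
    do {                                                                \
        cudaError_t err_ = ( call );                                    \
        if ( cudaSuccess != err_ )                                      \
        {                                                               \
            fprintf( stderr, "CUDA error at %s:%i : %s\n",              \
                     __FILE__, __LINE__, cudaGetErrorString( err_ ) );  \
            exit( EXIT_FAILURE );                                       \
        }                                                               \
    } while ( 0 )

// Hypothetical kernel: adds 1 to every element.
__global__ void addOneKernel( int *data, int n )
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if ( i < n )
        data[ i ] += 1;
}

// Runs the kernel on n zero-initialized ints and returns the sum
// of the results, checking every CUDA call along the way.
int addOneAndSum( int n )
{
    int *devData = NULL;
    CHECK_CUDA( cudaMalloc( (void **) &devData, n * sizeof( int ) ) );
    CHECK_CUDA( cudaMemset( devData, 0, n * sizeof( int ) ) );

    const int threads = 256;
    const int blocks  = ( n + threads - 1 ) / threads;
    addOneKernel<<< blocks, threads >>>( devData, n );
    CHECK_CUDA( cudaGetLastError() );       // launch errors
    CHECK_CUDA( cudaDeviceSynchronize() );  // errors during execution

    std::vector<int> hostData( n );
    CHECK_CUDA( cudaMemcpy( hostData.data(), devData, n * sizeof( int ),
                            cudaMemcpyDeviceToHost ) );
    CHECK_CUDA( cudaFree( devData ) );

    int sum = 0;
    for ( int i = 0; i < n; ++i )
        sum += hostData[ i ];

    return sum;
}
```

Since every element starts at 0 and is incremented once, addOneAndSum( 1000 ) should return 1000 on a working device. The do/while wrapper is a common trick to make the macro behave like a single statement, for instance inside an if without braces.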

These functions are derived from similar functions that used to be available in cutil.h in old CUDA SDKs.

Notice that the helpers are inline functions, so when CUDA_ERROR_CHECK is not defined their bodies are empty and the compiler can typically optimize them away. The wrapped API call itself still executes; only the check disappears. These utility functions can catch errors close to their source only if they are used everywhere. So, use them to wrap all CUDA API calls and call CudaCheckError() after every kernel launch 😊

Tried with: CUDA 5.5