Code Yarns ‍👨‍💻

CUDA: Structures as Kernel Parameters

📅 2011-Mar-04 ⬩ ✍️ Ashwin Nanjappa ⬩ 🏷️ cuda, structures

A common problem for beginners in CUDA programming is that the number of parameters passed to a kernel grows very quickly. For example, consider a kernel that takes 2 arrays as input and writes its result to a third array:

// Kernel
__global__ void fooKernel
(
const int* inArr0,
const int* inArr1,
int* outArr, // Output array is written by the kernel, so it cannot be const
int inArr0Len,
int inArr1Len,
int outArrLen
)
{ /* ... */ }

// Kernel call
fooKernel<<< x, y >>>( inArr0, inArr1, outArr, inArr0Len, inArr1Len, outArrLen );

To handle this situation gracefully, related parameters can be grouped together into a structure.

Typically, there are 2 kinds of parameters passed to the kernel:

  1. Data passed by value, like integers or floats.

  2. Pointers to device memory.

Most of the time, several of these parameters are related to each other. For example, an array in device memory and its length:

const int arrLen = 256;
int* dArray      = NULL;
cudaMalloc( &dArray, arrLen * sizeof( *dArray ) );

// 2 related parameters to kernel
fooKernel<<< x, y >>>( dArray, arrLen );

cudaFree( dArray );

In such cases, related parameters can be elegantly combined into a struct:

struct FooInfo
{
    int* arr;
    int  len;
};

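The structure itself is passed to the kernel by value, so only the pointer and the length are copied, not the array contents. Pointers to device memory inside the structure still need to be allocated and freed individually. Even so, using structures reduces the number of kernel parameters and eases the work of the programmer: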

FooInfo fooInfo;
fooInfo.len = 256;
cudaMalloc( &( fooInfo.arr ), fooInfo.len * sizeof( *fooInfo.arr ) );

// Related parameters passed to kernel as a single struct
fooKernel<<< x, y >>>( fooInfo );

cudaFree( fooInfo.arr );
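
The snippets above show only the host side. As a rough end-to-end sketch, here is how the three-array kernel from the beginning of this post might look once each array and its length are grouped into a struct. The names IntArray, addKernel, makeDeviceArray and addOnDevice are made up for this illustration and are not part of CUDA. Error checking of the CUDA calls is omitted to keep it short, and the code assumes it is compiled with nvcc and run on a machine with a CUDA-capable GPU:

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical struct for this sketch: a device pointer and its length
struct IntArray
{
    int* arr; // Pointer to device memory
    int  len; // Number of elements
};

// Each struct is passed by value: the pointer and length are copied
// to the device as kernel arguments, the array contents are not
__global__ void addKernel( IntArray in0, IntArray in1, IntArray out )
{
    const int idx = blockIdx.x * blockDim.x + threadIdx.x;

    if ( idx < out.len && idx < in0.len && idx < in1.len )
        out.arr[ idx ] = in0.arr[ idx ] + in1.arr[ idx ];
}

// Hypothetical helper: allocate a device array and fill it from host data
IntArray makeDeviceArray( const std::vector<int>& host )
{
    IntArray d;
    d.len = static_cast<int>( host.size() );
    d.arr = NULL;
    cudaMalloc( &d.arr, d.len * sizeof( *d.arr ) );
    cudaMemcpy( d.arr, host.data(), d.len * sizeof( *d.arr ), cudaMemcpyHostToDevice );
    return d;
}

// Hypothetical host wrapper: element-wise sum of two equal-length arrays
std::vector<int> addOnDevice( const std::vector<int>& a, const std::vector<int>& b )
{
    assert( a.size() == b.size() );

    IntArray in0 = makeDeviceArray( a );
    IntArray in1 = makeDeviceArray( b );
    IntArray out = makeDeviceArray( std::vector<int>( a.size(), 0 ) );

    const int threads = 256;
    const int blocks  = ( out.len + threads - 1 ) / threads;

    // 3 struct parameters instead of 6 separate ones
    if ( blocks > 0 )
        addKernel<<< blocks, threads >>>( in0, in1, out );

    // cudaMemcpy on the default stream waits for the kernel to finish
    std::vector<int> result( out.len );
    cudaMemcpy( result.data(), out.arr, out.len * sizeof( *out.arr ), cudaMemcpyDeviceToHost );

    // Device pointers inside the structs are still freed individually
    cudaFree( in0.arr );
    cudaFree( in1.arr );
    cudaFree( out.arr );

    return result;
}
```

Calling addOnDevice with two host vectors of the same length returns their element-wise sum computed on the GPU. Note that the total size of kernel arguments is limited, so this approach suits small structs that hold pointers and scalars rather than large embedded arrays.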

Tried with: CUDA 3.2