Code Yarns ‍👨‍💻
Tech BlogPersonal Blog

How to pass Thrust device vector to CUDA kernel

📅 2011-Apr-09 ⬩ ✍️ Ashwin Nanjappa ⬩ 📚 Archive

Thrust makes it convenient to handle data with its device_vector. But, things get messy when the device_vector needs to be passed to your own kernel. Thrust data types are not understood by a CUDA kernel and need to be converted back to its underlying pointer. This is done using thrust::raw_pointer_cast:

#include <thrust/device_vector.h>

thrust::device_vector<int> iVec;

int* iArray = thrust::raw_pointer_cast(&iVec[0]);

fooKernel<<<x, y>>>(iArray);

Inside a kernel, I typically need not only a pointer to the array, but also the length of the array. Thus, I find it is useful to convert device_vector to a structure that holds both a pointer to the array and its length. (Very much like a vector itself.) Creating such a structure and its conversion function is easy thanks to templates:

// Template structure to pass to kernel
template <typename T>
struct KernelArray
{
    T*  _array;
    int _size;
};

// Function to convert device_vector to structure
template <typename T>
KernelArray<T> convertToKernel(thrust::device_vector<T>& dVec)
{
    KernelArray<T> kArray;
    kArray._array = thrust::raw_pointer_cast(&dVec[0]);
    kArray._size  = (int) dVec.size();

    return kArray;
}

Passing device_vector to kernels and accessing its array inside the kernel is easy thanks to this infrastructure:

thrust::device_vector<int> iVec;

fooKernel<<<x, y>>>(convertToKernel(iVec)); // Explicit conversion from iVec to KernelArray<int>

__global__ fooKernel(KernelArray<int> inArray)
{
    for (int i = 0; i < inArray._size; ++i)
        something = inArray._array[i];
    // ...
    return;
}

You can take it a notch higher and make the conversion from device_vector to KernelArray to be implicit. This can be done by adding a constructor to KernelArray that takes one input parameter of type device_vector. (See Stephen's comment below the post.)

With such a constructor, you can now pass a device_vector seamlessly to the kernel:

thrust::device_vector<int> iVec;

fooKernel<<<x, y>>>(iVec); // Implicit conversion from iVec to KernelArray<int>

__global__ fooKernel(KernelArray<int> inArray)
{
    // ...
}

Tried with: Thrust 1.3 and CUDA 3.2