How to pass Thrust device vector to CUDA kernel

Thrust makes it convenient to handle data with its device_vector. But, things get messy when the device_vector needs to be passed to your own kernel. Thrust data types are not understood by a CUDA kernel and need to be converted back to its underlying pointer. This is done using thrust::raw_pointer_cast:

#include <thrust/device_vector.h>

thrust::device_vector<int> iVec;

int* iArray = thrust::raw_pointer_cast(&iVec[0]);

fooKernel<<<x, y>>>(iArray);

Inside a kernel, I typically need not only a pointer to the array, but also the length of the array. Thus, I find it is useful to convert device_vector to a structure that holds both a pointer to the array and its length. (Very much like a vector itself.) Creating such a structure and its conversion function is easy thanks to templates:

// Template structure to pass to kernel
template <typename T>
struct KernelArray
{
    T*  _array;
    int _size;
};

// Function to convert device_vector to structure
template <typename T>
KernelArray<T> convertToKernel(thrust::device_vector<T>& dVec)
{
    KernelArray<T> kArray;
    kArray._array = thrust::raw_pointer_cast(&dVec[0]);
    kArray._size  = (int) dVec.size();

    return kArray;
}

Passing device_vector to kernels and accessing its array inside the kernel is easy thanks to this infrastructure:

thrust::device_vector<int> iVec;

fooKernel<<<x, y>>>(convertToKernel(iVec)); // Explicit conversion from iVec to KernelArray<int>

__global__ fooKernel(KernelArray<int> inArray)
{
    for (int i = 0; i < inArray._size; ++i)
        something = inArray._array[i];
    // ...
    return;
}

You can take it a notch higher and make the conversion from device_vector to KernelArray to be implicit. This can be done by adding a constructor to KernelArray that takes one input parameter of type device_vector. (See Stephen’s comment below the post.)

With such a constructor, you can now pass a device_vector seamlessly to the kernel:

thrust::device_vector<int> iVec;

fooKernel<<<x, y>>>(iVec); // Implicit conversion from iVec to KernelArray<int>

__global__ fooKernel(KernelArray<int> inArray)
{
    // ...
}

Tried with: Thrust 1.3 and CUDA 3.2

Advertisements

19 thoughts on “How to pass Thrust device vector to CUDA kernel”

  1. Hi, i used this tutorial to read a device_vector from kernel function, but i had this error:

    error: no suitable user-defined conversion from “thrust::device_vector<float, thrust::device_malloc_allocator>” to “KernelArray” exists

    Could you tell me the cause of the error? Thanks

    Like

      1. thanks for your reply, i understand my error. I found two errors:

        i changed

        tArray._array = thrust::raw_pointer_cast( &dVec[0] );
        tArray._size = ( int ) dVec.size();

        to

        kArray._array = thrust::raw_pointer_cast( &dVec[0] );
        kArray._size = ( int ) dVec.size();

        In this way the code compiles. I tryed to sum all values in the vector without success, this is my code:

        __global__ void sumOfVector( KernelArray inArray , float *ds_GPU){

        float sum = 0.0;

        for ( int i = 0; i < inArray._size; i++ ){

        sum = sum + inArray._array[ i ];
        }

        *ds_GPU = sum;

        }

        Sorry for my requests, but you'd be a great help

        Like

  2. is there any way i can add into device_vector in device function without manually calling resize.

    __global__ void kernel(thrust::pair* p)
    {
    for(int i=10;i<20;i++)
    {
    thrust::pair pr(i,i*i);
    p[i].first = pr.first;
    p[i].second = pr.second;
    }
    }

    int main()
    {
    cudaPrintfInit();

    thrust::device_vector< thrust::pair > *d_vec = new thrust::device_vector< thrust::pair >[10];

    for(int i=0;i<10;i++)
    {
    thrust::pair p(i,i*i);
    d_vec[0].push_back(p);
    }

    d_vec[0].resize(20); //It does not automatically resize.

    thrust::pair* p = thrust::raw_pointer_cast(&d_vec[0][0]);
    kernel<<>>(p);
    thrust::device_vector<thrust::pair > s_vec = d_vec[0];

    for(int i=0;i<d_vec[0].size();i++)
    {
    thrust::pair p = s_vec[i];
    cout<<p.second<<endl;
    }
    cudaPrintfDisplay(stdout, true);
    cudaPrintfEnd();
    }

    Like

      1. i want to insert into device_vector inside the kernel. Can you please tell me how can do that when I have array of device_vector of type thrust::pair.

        Like

        1. Gaurav: If your vector already has space allocated before the kernel call to hold the new insertions, then it is just a matter of writing to the correct vector index from inside the kernel. But, if you want to increase the size of the vector from inside the kernel, that is not possible.

          Like

            1. GM: You cannot increase the size of a vector in the kernel. This expansion cannot be done in-place because there might be other data lying next to the vector in memory space. If you want to find space somewhere else in memory and copy this vector over there, that is too expensive and uncoordinated to be done in each thread of the kernel.

              Like

  3. Hi Ashwin,
    This post helps a lot in understanding how to pass device_vector to user kernel. But what is baffling me at the moment is this –
    When I write to a particular location in the vector using the arithmetic inside a loop where is the loop iterator, I can read back from it correctly only until the next write. What I’m thinking is the next write at corrupts the previous write. This lead me to extend your Template to include (int) iVec.begin(). But it seems I cannot pass that in the same way as passing iVec.size(). The error it gives is –

    error: no suitable conversion function from “thrust::detail::normal_iterator<thrust::device_ptr>” to “double” exists
    detected during instantiation of “KernelArray convertToKernel(thrust::device_vector<T, thrust::device_malloc_allocator> &) [with T=double]”

    Please help.

    Like

    1. Deba: If I’m not wrong, begin() and end() are just pointers to the beginning and some offset location in the memory allocated for the vector. So, you cannot assign it to a double, but you should be able to assign it to a double pointer.

      Like

  4. awesome. years later and this is still an excellent solution, doing exactly what I need it to do!!!! THANK YOU ASHWIN 🙂

    if anybody was having trouble with the implicit conversion, you just need to do

    // Template structure to pass to kernel
    template < typename T >
    struct KernelArray
    {
    T* _array;
    int _size;

    // constructor allows for implicit conversion
    KernelArray(thrust::device_vector<T>& dVec) {
        _array = thrust::raw_pointer_cast( &dVec[0] );
        _size  = ( int ) dVec.size();
    }
    

    };

    @Ashwin, in your original convertToKernel, you had a small typo

    tArray._array = thrust::raw_pointer_cast( &dVec[0] );
    tArray._size = ( int ) dVec.size();

    those need to be kArray xD

    Thanks again! This was well done, and I probably never would have thought of that!

    Like

      1. No problem! Almost 6 months later I found myself in a similar situation. I knew I had done something like this before, but couldn’t remember when / where / how. Trolled the internet for a bit, and ended up back here finding the exact same solution being the most ideal 😀

        Thanks again!

        Like

  5. Hey! This is a very useful code, Thanks.
    One question though. how would you manage a 2d Thrust vector with this scenario??

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s