Code Yarns ‍👨‍💻

CUDPP: Compacting a Structure Array

📅 2011-Mar-09 ⬩ ✍️ Ashwin Nanjappa

Problem

The cudppCompact() function from the CUDPP library can be used to compact device arrays of simple types like integer or float. However, this function cannot be used directly to compact a device array of structures.

Solution

One solution for this problem is to create an index array, compact the index array and use the compacted index array to manually compact the device array of structures.

Here is a simple example for illustration. Assume we are dealing with data of type Foo:

struct Foo
{
    // Composed of something
};

// Arrays are assumed to be allocated and filled
Foo* devFooArray;             // Device array with content to be compacted
int Num;                      // Size of array
unsigned int* devValidArray;  // Device array with 1/0 flags already set

Create an integer array of the same length on the device and set each element to its own index using a simple kernel:

__global__ void setIndexKernel( int* indexArr, int num )
{
    const int threadNum         = gridDim.x * blockDim.x;
    const int curThreadIndex    = ( blockIdx.x * blockDim.x ) + threadIdx.x;

    for ( int index = curThreadIndex; index < num; index += threadNum )
        indexArr[ index ] = index;

    return;
}

int* devIndexArray; // Device index array
cudaMalloc( ( void** ) &devIndexArray, Num * sizeof( int ) );
setIndexKernel<<< BlocksPerGrid, ThreadsPerBlock >>>( devIndexArray, Num );
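
The kernel above uses a strided loop, so it fills the whole array for any launch configuration, even when there are fewer threads than elements. As a rough illustration, the following host-side C++ sketch emulates that loop on the CPU. The function emulateSetIndex() is hypothetical and not part of CUDA or CUDPP; it only mirrors the index arithmetic of setIndexKernel.

```cpp
#include <cassert>
#include <vector>

// Host-side emulation of setIndexKernel (illustrative only): every
// (block, thread) pair runs the same strided loop a CUDA thread would run.
std::vector<int> emulateSetIndex( int blocksPerGrid, int threadsPerBlock, int num )
{
    assert( blocksPerGrid > 0 && threadsPerBlock > 0 && num >= 0 );

    std::vector<int> indexArr( num, -1 ); // -1 marks "not written yet"
    const int threadNum = blocksPerGrid * threadsPerBlock;

    for ( int block = 0; block < blocksPerGrid; ++block )
    {
        for ( int thread = 0; thread < threadsPerBlock; ++thread )
        {
            const int curThreadIndex = ( block * threadsPerBlock ) + thread;

            // Same loop as in the kernel: start at the thread's own index
            // and jump ahead by the total number of threads.
            for ( int index = curThreadIndex; index < num; index += threadNum )
                indexArr[ index ] = index;
        }
    }

    return indexArr;
}
```

With 2 blocks of 4 threads and 21 elements, for example, thread 0 writes indices 0, 8 and 16, and every element ends up holding its own index.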

Now pass this array as input to cudppCompact(), together with a plan (compactPlan) created for the CUDPP_COMPACT algorithm, to obtain a compacted index array:

// devCompactNum is a size_t* in device memory; the number of valid elements is written there
cudppCompact( compactPlan, devCompactIndexArray, devCompactNum, devIndexArray, devValidArray, Num );

Allocate a Foo array of the compacted size and copy the surviving elements into it using another simple kernel:

__global__ void compactKernel
(
    const Foo*  srcFooArray,
    Foo*        destFooArray,
    const int*  compactIndexArray,
    int         compactNum
)
{
    const int threadNum         = gridDim.x * blockDim.x;
    const int curThreadIndex    = ( blockIdx.x * blockDim.x ) + threadIdx.x;

    for ( int index = curThreadIndex; index < compactNum; index += threadNum )
    {
        const int fromIndex   = compactIndexArray[ index ];
        destFooArray[ index ] = srcFooArray[ fromIndex ];
    }

    return;
}

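// devCompactNum points to device memory, so copy the count to the host first
size_t compactNum = 0;
cudaMemcpy( &compactNum, devCompactNum, sizeof( size_t ), cudaMemcpyDeviceToHost );

Foo* devCompactFooArray; // Allocated to hold compactNum elements
cudaMalloc( ( void** ) &devCompactFooArray, compactNum * sizeof( Foo ) );
compactKernel<<< BlocksPerGrid, ThreadsPerBlock >>>( devFooArray, devCompactFooArray, devCompactIndexArray, ( int ) compactNum );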
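
To check the device result, it can help to mirror the same three steps on the host. The following is a minimal C++ sketch under stated assumptions: the fields of Foo and the helper names compactIndices() and compactFoo() are hypothetical and chosen only for illustration, and the code uses a plain sequential loop, not CUDPP.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Foo; the fields are illustrative only.
struct Foo
{
    int   id;
    float value;
};

// Steps 1 and 2: walk the indices 0..n-1 and keep those whose flag is set.
std::vector<int> compactIndices( const std::vector<unsigned int>& valid )
{
    std::vector<int> compactIndex;

    for ( std::size_t i = 0; i < valid.size(); ++i )
        if ( valid[ i ] )
            compactIndex.push_back( static_cast<int>( i ) );

    return compactIndex;
}

// Step 3: gather the structures through the compacted index array.
std::vector<Foo> compactFoo( const std::vector<Foo>& src, const std::vector<unsigned int>& valid )
{
    assert( src.size() == valid.size() );

    const std::vector<int> compactIndex = compactIndices( valid );

    std::vector<Foo> dest;
    dest.reserve( compactIndex.size() );

    for ( int fromIndex : compactIndex )
        dest.push_back( src[ fromIndex ] );

    return dest;
}
```

Copying devCompactFooArray back to the host and comparing it element by element with the output of compactFoo() should show the same elements in the same order, since the compaction preserves the relative order of the valid elements.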

Tried with: CUDPP 1.1.1 and CUDA 3.2