📅 2011-Mar-09 ⬩ ✍️ Ashwin Nanjappa ⬩ 🏷️ cuda, cudpp ⬩ 📚 Archive
Problem
The cudppCompact() function from the CUDPP library can be used to compact device arrays of simple types like integer or float. However, this function cannot be used directly to compact a device array of structures.
Solution
One solution for this problem is to create an index array, compact the index array and use the compacted index array to manually compact the device array of structures.
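Before moving to the GPU, the three steps above can be sketched on the CPU. This is only a sequential reference under stated assumptions, not the CUDPP-based code: the fields of Foo and the helper names makeIndexArray, compactIndexArray and gatherFoo are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the structure; the real Foo can hold any copyable data
struct Foo
{
    int   id;
    float weight;
};

// Step 1: build the index array 0, 1, ..., n-1
std::vector<int> makeIndexArray( std::size_t n )
{
    std::vector<int> indexArr( n );

    for ( std::size_t i = 0; i < n; ++i )
        indexArr[ i ] = static_cast<int>( i );

    return indexArr;
}

// Step 2: keep only the indices whose flag is non-zero
// (on the GPU this is the job handed to the compact primitive)
std::vector<int> compactIndexArray( const std::vector<int>&          indexArr,
                                    const std::vector<unsigned int>& valid )
{
    assert( indexArr.size() == valid.size() );

    std::vector<int> compactIndexArr;

    for ( std::size_t i = 0; i < indexArr.size(); ++i )
        if ( valid[ i ] != 0 )
            compactIndexArr.push_back( indexArr[ i ] );

    return compactIndexArr;
}

// Step 3: gather the structures through the compacted index array
std::vector<Foo> gatherFoo( const std::vector<Foo>& src,
                            const std::vector<int>& compactIndexArr )
{
    std::vector<Foo> dest( compactIndexArr.size() );

    for ( std::size_t i = 0; i < compactIndexArr.size(); ++i )
        dest[ i ] = src[ compactIndexArr[ i ] ];

    return dest;
}
```

The point of the indirection is that only the small integer array goes through the compaction primitive; the structures themselves are copied exactly once, in the final gather.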
Here is a simple example for illustration. Assume we are dealing with data of type Foo:
struct Foo
{
    // Composed of something
};
// Arrays are assumed to be allocated and filled
Foo*          devFooArray;   // Device array with content to be compacted
int           Num;           // Size of array
unsigned int* devValidArray; // Device array with 1/0 flags already set
Create an integer index array on the device and set each element to its own index using a simple kernel:
__global__ void setIndexKernel( int* indexArr, int num )
{
    const int threadNum      = gridDim.x * blockDim.x;
    const int curThreadIndex = ( blockIdx.x * blockDim.x ) + threadIdx.x;

    // Each thread strides over the array by the total number of threads
    for ( int index = curThreadIndex; index < num; index += threadNum )
        indexArr[ index ] = index;

    return;
}
int* devIndexArray; // Device index array
cudaMalloc( ( void** ) &devIndexArray, Num * sizeof( int ) );
setIndexKernel<<< BlocksPerGrid, ThreadsPerBlock >>>( devIndexArray, Num );
Now, use this array as input to cudppCompact() and obtain a compacted index array from it:
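// devCompactNum is a size_t* pointing to device memory; it receives the number of valid elements
cudppCompact( compactPlan, devCompactIndexArray, devCompactNum, devIndexArray, devValidArray, Num );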
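To make the result of this call concrete, here is a sequential model of what a compact operation computes. It is a sketch only, not CUDPP's implementation, and compactReference is a hypothetical name: an exclusive prefix sum over the flags gives every surviving element its output slot, and a scatter then writes it there. This scan-and-scatter formulation is the usual way compaction is expressed for parallel hardware, but treat its connection to CUDPP's internals as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU model of a compact operation; not part of CUDPP.
// Returns the elements of 'in' whose flag is non-zero, in their original order.
std::vector<int> compactReference( const std::vector<int>&          in,
                                   const std::vector<unsigned int>& isValid )
{
    assert( in.size() == isValid.size() );

    // Exclusive prefix sum of the flags:
    // slot[ i ] = number of valid elements that come before position i
    std::vector<std::size_t> slot( in.size() );
    std::size_t              numValid = 0;

    for ( std::size_t i = 0; i < in.size(); ++i )
    {
        slot[ i ]  = numValid;
        numValid  += ( isValid[ i ] != 0 ) ? 1 : 0;
    }

    // Scatter: every valid element is written to its own slot
    std::vector<int> out( numValid );

    for ( std::size_t i = 0; i < in.size(); ++i )
        if ( isValid[ i ] != 0 )
            out[ slot[ i ] ] = in[ i ];

    return out;
}
```

When the input is the index array, the output is exactly the list of positions of the valid structures, and its length is the count that the compact call reports.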
Allocate a Foo array of the compacted size and fill it from the original array using another simple kernel:
__global__ void compactKernel
(
const Foo* srcFooArray,
Foo*       destFooArray,
const int* compactIndexArray,
int        compactNum
)
{
    const int threadNum      = gridDim.x * blockDim.x;
    const int curThreadIndex = ( blockIdx.x * blockDim.x ) + threadIdx.x;

    for ( int index = curThreadIndex; index < compactNum; index += threadNum )
    {
        // Gather: copy the structure that the compacted index points to
        const int fromIndex   = compactIndexArray[ index ];
        destFooArray[ index ] = srcFooArray[ fromIndex ];
    }

    return;
}
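// cudppCompact() writes the number of valid elements to device memory, so copy it to the host first
size_t compactNum = 0;
cudaMemcpy( &compactNum, devCompactNum, sizeof( size_t ), cudaMemcpyDeviceToHost );

Foo* devCompactFooArray; // Allocated to hold compactNum elements
cudaMalloc( ( void** ) &devCompactFooArray, compactNum * sizeof( Foo ) );
compactKernel<<< BlocksPerGrid, ThreadsPerBlock >>>( devFooArray, devCompactFooArray, devCompactIndexArray, ( int ) compactNum );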
Tried with: CUDPP 1.1.1 and CUDA 3.2