📅 2011-Apr-28 ⬩ ✍️ Ashwin Nanjappa ⬩ 🏷️ cuda, timer ⬩ 📚 Archive
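To measure the performance of CUDA code, it is better to use the events provided by the CUDA Runtime API than CPU-based timers like clock or high-resolution timers. Kernel launches are asynchronous with respect to the host, so a CPU timer wrapped around a launch mostly measures the launch overhead unless the host explicitly synchronizes with the device. Events, in contrast, are recorded in the CUDA stream alongside the kernel itself.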
As a simple illustration, suppose we want to measure the execution time of a kernel named fooKernel. First, we create CUDA events to mark the beginning and end of the kernel execution:
cudaEvent_t beginEvent;
cudaEvent_t endEvent;
cudaEventCreate( &beginEvent );
cudaEventCreate( &endEvent );
We then record the begin and end times of the kernel execution:
cudaEventRecord( beginEvent, 0 );
fooKernel<<< x, y >>>( z, w );
cudaEventRecord( endEvent, 0 );
Finally, we wait until the end event has actually been recorded in the CUDA stream. After that, we compute the time elapsed between the begin and end events, which cudaEventElapsedTime reports in milliseconds, to obtain the kernel execution time:
cudaEventSynchronize( endEvent );
float timeValue;
cudaEventElapsedTime( &timeValue, beginEvent, endEvent );
cout << "Time: " << timeValue << endl;
The begin and end events can be reused any number of times in the program. When we do not need them anymore, they should be destroyed:
cudaEventDestroy( beginEvent );
cudaEventDestroy( endEvent );
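Putting the steps together, a complete program might look like the sketch below. The body of fooKernel, its parameters, the launch configuration, the CUDA_CHECK macro and the timeFooKernel helper are all assumptions made for illustration; only the event calls follow the steps described above.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for fooKernel: scales each array element by a factor.
__global__ void fooKernel( float* data, int n, float factor )
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if ( i < n )
        data[i] *= factor;
}

// Illustrative helper (not part of CUDA): abort if a runtime call fails.
#define CUDA_CHECK( call )                                             \
    do {                                                               \
        cudaError_t err = ( call );                                    \
        if ( err != cudaSuccess ) {                                    \
            fprintf( stderr, "CUDA error: %s (%s:%d)\n",               \
                     cudaGetErrorString( err ), __FILE__, __LINE__ );  \
            exit( EXIT_FAILURE );                                      \
        }                                                              \
    } while ( 0 )

// Times a single launch of fooKernel over n elements; returns milliseconds.
float timeFooKernel( int n )
{
    float* devData = NULL;
    CUDA_CHECK( cudaMalloc( (void**) &devData, n * sizeof( float ) ) );
    CUDA_CHECK( cudaMemset( devData, 0, n * sizeof( float ) ) );

    // Create the begin and end events.
    cudaEvent_t beginEvent;
    cudaEvent_t endEvent;
    CUDA_CHECK( cudaEventCreate( &beginEvent ) );
    CUDA_CHECK( cudaEventCreate( &endEvent ) );

    const int threads = 256;
    const int blocks  = ( n + threads - 1 ) / threads;

    // Record the events around the launch, all in the default stream (0).
    CUDA_CHECK( cudaEventRecord( beginEvent, 0 ) );
    fooKernel<<< blocks, threads >>>( devData, n, 2.0f );
    CUDA_CHECK( cudaEventRecord( endEvent, 0 ) );
    CUDA_CHECK( cudaGetLastError() );  // catches an invalid launch configuration

    // Wait for the end event, then read the elapsed time.
    CUDA_CHECK( cudaEventSynchronize( endEvent ) );
    float timeValue = 0.0f;
    CUDA_CHECK( cudaEventElapsedTime( &timeValue, beginEvent, endEvent ) );

    // Release the events and the device buffer.
    CUDA_CHECK( cudaEventDestroy( beginEvent ) );
    CUDA_CHECK( cudaEventDestroy( endEvent ) );
    CUDA_CHECK( cudaFree( devData ) );

    return timeValue;
}

int main()
{
    printf( "Time: %f ms\n", timeFooKernel( 1 << 20 ) );
    return 0;
}
```

This should compile with nvcc as a single .cu file. The first launch in a process usually includes one-time initialization cost, so it can be worth timing a second launch if the first number looks unexpectedly large.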
A CudaTimer class that implements this usage can be seen here.
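For readers who want a rough idea of what such a wrapper could look like, here is a minimal sketch. It is not the class referred to above: the name EventTimer, its start and stop methods, and the busyKernel used to exercise it are all hypothetical, and error checking is omitted to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical wrapper: creates the two events once and destroys them
// when the object goes out of scope.
class EventTimer
{
public:
    EventTimer()
    {
        cudaEventCreate( &beginEvent_ );
        cudaEventCreate( &endEvent_ );
    }

    ~EventTimer()
    {
        cudaEventDestroy( beginEvent_ );
        cudaEventDestroy( endEvent_ );
    }

    // Record the begin event in the given stream (default stream if omitted).
    void start( cudaStream_t stream = 0 )
    {
        cudaEventRecord( beginEvent_, stream );
    }

    // Record the end event, wait for it, and return elapsed milliseconds.
    float stop( cudaStream_t stream = 0 )
    {
        cudaEventRecord( endEvent_, stream );
        cudaEventSynchronize( endEvent_ );
        float ms = 0.0f;
        cudaEventElapsedTime( &ms, beginEvent_, endEvent_ );
        return ms;
    }

private:
    // Non-copyable: two copies would destroy the same events twice.
    EventTimer( const EventTimer& );
    EventTimer& operator=( const EventTimer& );

    cudaEvent_t beginEvent_;
    cudaEvent_t endEvent_;
};

// Hypothetical kernel used only to give the timer something to measure.
__global__ void busyKernel( float* data, int n )
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if ( i < n )
        data[i] = data[i] * 2.0f + 1.0f;
}

int main()
{
    const int n = 1 << 20;
    float* devData = NULL;
    cudaMalloc( (void**) &devData, n * sizeof( float ) );
    cudaMemset( devData, 0, n * sizeof( float ) );

    EventTimer timer;

    // The same pair of events is reused for every measurement.
    for ( int run = 0; run < 3; ++run )
    {
        timer.start();
        busyKernel<<< ( n + 255 ) / 256, 256 >>>( devData, n );
        printf( "Run %d: %f ms\n", run, timer.stop() );
    }

    cudaFree( devData );
    return 0;
}
```

Tying event creation and destruction to the object's lifetime means a forgotten cudaEventDestroy cannot leak events, and the loop shows the same pair of events being reused across measurements.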
Tried with: CUDA 3.2