The Compute Visual Profiler 4.0.17 that ships with CUDA 4.0 has a new Analysis feature. This is enabled on any session and is always active when using the Summary Table. I found this analysis to be too slow for my kernels since it hogs the CPU, making it difficult to interact with the profiler. And there seems to be no way to turn off this Analysis feature in this version of the profiler! 😐
As a workaround, use the Compute Visual Profiler 3.2.0 from CUDA 3.2. I found that it works fine on CUDA executables compiled with CUDA 4.0. Copy over the computeprof directory found in %CUDA_PATH% from your older CUDA installation or from another computer. Use the computeprof.exe found in its bin directory. No additional DLLs are required.
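If you want to script the copy step, here is a minimal Python sketch. It assumes only the layout described above (a computeprof directory under the old CUDA path, with computeprof.exe in its bin subdirectory). The function name copy_computeprof and its parameters are my own, not part of any CUDA tool.

```python
import shutil
from pathlib import Path


def copy_computeprof(old_cuda_path, dest_dir):
    """Copy the computeprof directory out of an older CUDA install.

    old_cuda_path: root of the older CUDA installation (hypothetical
                   argument; pass whatever %CUDA_PATH% was for CUDA 3.2).
    dest_dir:      directory that will receive the copy.
    Returns the path to computeprof.exe inside the copy.
    """
    src = Path(old_cuda_path) / "computeprof"
    if not src.is_dir():
        raise FileNotFoundError(f"no computeprof directory under {old_cuda_path}")

    dst = Path(dest_dir) / "computeprof"
    # copytree raises FileExistsError if dst already exists,
    # so an earlier copy is never silently overwritten.
    shutil.copytree(src, dst)

    exe = dst / "bin" / "computeprof.exe"
    if not exe.is_file():
        raise FileNotFoundError(f"{exe} not found in the copied directory")
    return exe
```

Call it with the old installation root and a destination of your choice, then launch the returned computeprof.exe. Both paths depend on your machine, so none are hard-coded here.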
Tried with: Compute Visual Profiler 4.0.17 and CUDA 4.0