If you work with CUDA programs, you will use the Visual Profiler regularly. Another useful tool is the command-line profiler, nvprof. It does not have as many features as the Visual Profiler, but it is very quick and easy to use.
Pass the name of your program to nvprof:
$ nvprof ./my-cuda-program
nvprof runs the program and prints a summary similar to the default output of the Visual Profiler. It lists the kernels in decreasing order of total execution time. For each kernel, the columns show the percentage of execution time, the total time, the number of calls, and the average, minimum and maximum time of a single call.
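To make the relationship between those columns concrete, here is a minimal Python sketch that derives the same kind of per-kernel summary from a list of individual call timings. This is not how nvprof is implemented, and it does not parse nvprof output. The function name `summarize`, the kernel names and the timings are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def summarize(calls):
    """Build a per-kernel summary from (kernel_name, duration_ms) pairs.

    Each pair stands for one kernel launch. The result is a list of rows
    sorted by total time, largest first, like the summary described above.
    """
    per_kernel = defaultdict(list)
    for name, duration in calls:
        per_kernel[name].append(duration)

    # Time spent in all kernels together, used for the percentage column.
    grand_total = sum(sum(durations) for durations in per_kernel.values())

    rows = []
    for name, durations in per_kernel.items():
        total = sum(durations)
        rows.append({
            "name": name,
            "percent": 100.0 * total / grand_total if grand_total else 0.0,
            "total": total,
            "calls": len(durations),
            "avg": total / len(durations),
            "min": min(durations),
            "max": max(durations),
        })

    # Decreasing order of total execution time.
    rows.sort(key=lambda row: row["total"], reverse=True)
    return rows


# Hypothetical timings in milliseconds, not measured on real hardware.
sample_calls = [
    ("scale_kernel", 2.0),
    ("add_kernel", 1.0),
    ("scale_kernel", 4.0),
    ("add_kernel", 3.0),
    ("scale_kernel", 6.0),
]

for row in summarize(sample_calls):
    print(row["name"], row["percent"], row["total"], row["calls"],
          row["avg"], row["min"], row["max"])
```

With these made-up numbers, `scale_kernel` comes first because its three calls add up to more time than the two calls of `add_kernel`. A kernel with a small average time can still top the list if it is called often enough.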
Tried with: NVProf 5.5, CUDA 5.5 and Ubuntu 12.04