nvprof in nvidia-docker permissions warning

Problem

I was running a CUDA application inside an nvidia-docker container. When I ran the application under nvprof to profile it, I got a permissions warning and no profile information was generated:

==616== Warning: The user does not have permission to profile on the target device. See the following link for instructions to enable permissions and get more information: https://developer.nvidia.com/NVSOLN1000
==616== Warning: Some profiling data are not recorded. Make sure cudaProfilerStop() or cuProfilerStop() is called before application exit to flush profile data.

For another application, the output looked like this:

==643== NVPROF is profiling process 643, command: foobar
==643== Warning: The user does not have permission to profile on the target device. See the following link for instructions to enable permissions and get more information: https://developer.nvidia.com/NVSOLN1000
==643== Profiling application: foobar
==643== Profiling result:
No kernels were profiled.
No API activities were profiled.

Solution

The warning message includes a link, but the documentation it points to does not address this Docker problem. The solution turned out to be adding the --privileged option to my nvidia-docker command invocation.
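
A minimal sketch of such an invocation follows. The image name my-cuda-image and the program path ./my-cuda-program are placeholders for illustration, not names from my setup, and the --rm option is only there to clean up the container afterwards. The script just assembles and prints the command so it can be checked before it is run.

```shell
#!/bin/sh
# Placeholder names (assumptions): substitute your own image and binary.
IMAGE="my-cuda-image"
PROGRAM="./my-cuda-program"

# --privileged is the option that made the nvprof permissions warning go away.
# --rm removes the container once the profiled program exits.
CMD="nvidia-docker run --privileged --rm $IMAGE nvprof $PROGRAM"

# Print the assembled command for inspection.
echo "$CMD"
```

Once the printed command looks right, paste it into the shell to run it. Keep in mind that --privileged grants the container broad access to the host, so it is best reserved for profiling sessions.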

NVProf

Output of nvprof

If you work with CUDA programs, you will use the Visual Profiler regularly. Another useful tool is the command-line profiler, nvprof. It does not have as many features as the Visual Profiler, but it is quick and easy to use.

Pass the name of your program to nvprof:

$ nvprof ./my-cuda-program

nvprof runs the program and prints a summary of results similar to the default output in Visual Profiler. It lists the kernels sorted in decreasing order of total execution time. For every kernel, the columns show the percentage of execution time, the total time, the number of calls, and the average, minimum and maximum time of a single call.
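
If you want to keep the summary instead of reading it off the terminal, a small wrapper like the sketch below may help. It assumes your nvprof version supports the --log-file option, which I have not verified on the versions listed at the end of this post. The program path and the log file name are placeholders.

```shell
#!/bin/sh
# Placeholder names (assumptions): substitute your own binary and log file.
PROGRAM="./my-cuda-program"
LOG="nvprof-summary.log"

if command -v nvprof >/dev/null 2>&1; then
    # Send nvprof's own output (the summary table) to the log file.
    nvprof --log-file "$LOG" "$PROGRAM"
else
    # nvprof ships with the CUDA toolkit; it may not be on PATH.
    echo "nvprof not found on PATH"
fi
```

The output of the profiled program itself still goes to the terminal; only the text printed by nvprof ends up in the log file.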

Tried with: NVProf 5.5, CUDA 5.5 and Ubuntu 12.04