nvprof in nvidia-docker permissions warning

Problem

I was running a CUDA application inside an nvidia-docker container. To profile it, I ran the application under nvprof, but it produced a permissions warning and no profile information was generated:

==616== Warning: The user does not have permission to profile on the target device. See the following link for instructions to enable permissions and get more information: https://developer.nvidia.com/NVSOLN1000
==616== Warning: Some profiling data are not recorded. Make sure cudaProfilerStop() or cuProfilerStop() is called before application exit to flush profile data.
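
For context, the profiling run was along these lines. This is only a sketch: the image and application binary here are placeholders, and nvprof has to be available inside the image (for example, from a CUDA devel image):

$ nvidia-docker run --rm nvidia/cuda nvprof ./my_cuda_app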

For another application, the error looked like this:

==643== NVPROF is profiling process 643, command: foobar
==643== Warning: The user does not have permission to profile on the target device. See the following link for instructions to enable permissions and get more information: https://developer.nvidia.com/NVSOLN1000
==643== Profiling application: foobar
==643== Profiling result:
No kernels were profiled.
No API activities were profiled.

Solution

The warning message includes a link, but that documentation does not cover this Docker scenario. The solution turned out to be adding the --privileged option to my nvidia-docker invocation.
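
As a sketch (the image and command are placeholders again; the relevant part is the --privileged flag, which gives the container the access it needs to profile the GPU):

$ nvidia-docker run --rm --privileged nvidia/cuda nvprof ./my_cuda_app

With the flag added, the permissions warning went away and profiling data was recorded.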


NVIDIA Docker no such file error

Problem

NVIDIA Docker makes it easy to use Docker containers across machines with differing NVIDIA graphics drivers. After installing it, I ran a sample NVIDIA Docker command and got this error:

$ nvidia-docker run --rm nvidia/cuda nvidia-smi
docker: Error response from daemon: Post http://%2Frun%2Fdocker%2Fplugins%2Fnvidia-docker.sock/VolumeDriver.Mount: dial unix /run/docker/plugins/nvidia-docker.sock: connect: no such file or direct

Investigating the log files showed this:

$ cat /tmp/nvidia-docker.log 
nvidia-docker-plugin | 2017/10/11 10:10:07 Loading NVIDIA unified memory
nvidia-docker-plugin | 2017/10/11 10:10:07 Loading NVIDIA management library
nvidia-docker-plugin | 2017/10/11 10:10:07 Discovering GPU devices
nvidia-docker-plugin | 2017/10/11 10:10:13 Provisioning volumes at /var/lib/nvidia-docker/volumes
nvidia-docker-plugin | 2017/10/11 10:10:13 Serving plugin API at /run/docker/plugins
nvidia-docker-plugin | 2017/10/11 10:10:13 Serving remote API at localhost:3476
nvidia-docker-plugin | 2017/10/11 10:10:13 Error: listen tcp 127.0.0.1:3476: bind: address already in use

However, port 3476 turned out not to be held by any process (a quick check is sketched below). So what was the problem?
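
One way to check, assuming lsof or netstat is installed (exact flags vary by distribution):

$ sudo lsof -i :3476
$ sudo netstat -tlnp | grep 3476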

Solution

I gave up and restarted Docker, and everything worked fine after that (haha!):

$ sudo service docker restart
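
To confirm the fix, checking that the plugin socket is back and re-running the sample command should suffice (sketch only, output omitted):

$ ls -l /run/docker/plugins/nvidia-docker.sock
$ nvidia-docker run --rm nvidia/cuda nvidia-smi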

Tried with: NVIDIA Docker 1.x and Docker 1.11