NVCC argument redefinition error

Problem

I was trying to build, on a second computer, a CUDA project that had built without problems on the first. On the second computer, nvcc failed with an argument redefinition error:

nvcc fatal   : redefinition of argument 'std'

Solution

At first, I thought that the compilation error was coming from the source code. Actually, the error refers to a command-line argument passed to nvcc: it is saying that a std argument has been provided more than once.

This project was using CMake, so the actual compilation commands invoked by make are hidden. I used the VERBOSE trick, described here, to view the exact commands issued to the C++ and CUDA compilers. Not surprisingly, I found that the -std=c++11 argument was being passed to nvcc twice. But why?
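For reference, with CMake-generated Makefiles the full compiler command lines can be printed by running make like this:

$ make VERBOSE=1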

I checked the CMakeLists.txt file and found that the C++11 flag was indeed being set twice: once in CMAKE_CXX_FLAGS and once in CUDA_NVCC_FLAGS. How and why had this worked fine on the first computer? It turns out that the older CMake on the first computer did not pass the C++ flag on to NVCC, so the flag had to be explicitly repeated in the CUDA compiler flags.

The second computer was using a newer version of Ubuntu, with a newer version of CMake. This CMake was smart enough to pass the C++ flag on to the CUDA NVCC compiler too. But since a flag of the same name was also set in the NVCC flags, nvcc saw it twice and reported the argument redefinition error. Removing the flag from the CUDA flags made the problem go away.
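To illustrate, the problematic setup looked roughly like this (a simplified sketch, not the project's exact CMakeLists.txt):

# Before: the flag is set in both places, so a newer CMake ends up
# passing -std=c++11 to nvcc twice
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")
list(APPEND CUDA_NVCC_FLAGS "-std=c++11")

# After: keep the flag only in the C++ flags and let CMake propagate it to nvcc
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")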

urlopen got multiple values error

Problem

I tried to run a Python script given by a friend. It ended with this error:

  File "/usr/lib/python2.7/dist-packages/urllib3/request.py", line 79, in request
    **urlopen_kw)
  File "/usr/lib/python2.7/dist-packages/urllib3/request.py", line 142, in request_encode_body
    **urlopen_kw)
TypeError: urlopen() got multiple values for keyword argument 'body'

Solution

The script used a Python package, which in turn used urllib3. This strange error had nothing to do with my code, but with the urllib3 package. The version installed by Ubuntu was quite old: 1.7.1. Updating to a more recent version of the package fixes the error: either upgrade it system-wide using pip with sudo, or upgrade it inside your virtualenv.
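For example, something like this (the -U flag is needed so that pip actually upgrades over the already installed old version; my_venv is just a placeholder name):

$ sudo pip install -U urllib3

Or, inside a virtualenv:

$ source my_venv/bin/activate
$ pip install -U urllib3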

add-apt-repository command not found

Problem

I was trying to add a PPA repository on a server and got this error:

$ sudo add-apt-repository ppa:blah/foobar
sudo: add-apt-repository: command not found

Solution

The add-apt-repository command comes from the software-properties-common package. Installing that package solved the problem:

$ sudo apt install software-properties-common

Tried with: Ubuntu 16.04

NVIDIA Docker no such file error

Problem

NVIDIA Docker makes it easy to use Docker containers across machines with differing NVIDIA graphics drivers. After installing it, I ran a sample NVIDIA Docker command and got this error:

$ nvidia-docker run --rm nvidia/cuda nvidia-smi
docker: Error response from daemon: Post http://%2Frun%2Fdocker%2Fplugins%2Fnvidia-docker.sock/VolumeDriver.Mount: dial unix /run/docker/plugins/nvidia-docker.sock: connect: no such file or directory

Investigating the log files showed this:

$ cat /tmp/nvidia-docker.log 
nvidia-docker-plugin | 2017/10/11 10:10:07 Loading NVIDIA unified memory
nvidia-docker-plugin | 2017/10/11 10:10:07 Loading NVIDIA management library
nvidia-docker-plugin | 2017/10/11 10:10:07 Discovering GPU devices
nvidia-docker-plugin | 2017/10/11 10:10:13 Provisioning volumes at /var/lib/nvidia-docker/volumes
nvidia-docker-plugin | 2017/10/11 10:10:13 Serving plugin API at /run/docker/plugins
nvidia-docker-plugin | 2017/10/11 10:10:13 Serving remote API at localhost:3476
nvidia-docker-plugin | 2017/10/11 10:10:13 Error: listen tcp 127.0.0.1:3476: bind: address already in use

Checking port 3476 showed that it was not owned by any process. So what was the problem?
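The check was something along these lines; any tool that lists listening sockets will do:

$ sudo lsof -i :3476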

Solution

I gave up and restarted Docker and everything worked fine after that (haha!):

$ sudo service docker restart

Tried with: NVIDIA Docker 1.x and Docker 1.11

Docker no pull access error

Problem

A Docker pull from Docker Hub that worked fine for all users failed for just one user. When she tried to pull, it threw this error:

repository some_org/some_image not found: does not exist or no pull access
The push refers to a repository [docker.io/some_org/some_image]

Docker Hub authentication and other details seemed to be in order.

Solution

It turned out this user had run some docker commands before setting up Docker Hub authentication. Those early commands had created a ~/.docker directory, and its contents conflicted with the Docker Hub credentials that were added later. Removing this directory fixed the issue.
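Roughly, the fix was this (moving the directory aside is safer than deleting it, and docker login then recreates it with the right credentials):

$ mv ~/.docker ~/.docker.bak
$ docker login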

Tried with: Docker 1.11 and Ubuntu 14.04

Undefined reference to Boost copy_file

Problem

C++ code that uses Boost Filesystem was compiling fine with Boost 1.49. However, when I switched to Boost 1.55, the same code gave this error:

foobar.cpp.o: In function `do_something()':
foobar.cpp:(.text+0xb78): undefined reference to `boost::filesystem::detail::copy_file(boost::filesystem::path const&, boost::filesystem::path const&, boost::filesystem::copy_option, boost::system::error_code*)'

This was surprising since the declaration of copy_file was present in filesystem.hpp and libboost_filesystem.so was being linked during compilation.

Solution

Turns out this is a known bug with Boost Filesystem as described here. Apparently, this happens only if the C++ code is compiled with the -std=c++11 option. I was indeed using that option.

The current fix for this bug is to temporarily disable C++11 scoped enums when the Boost Filesystem header is included, like this:

#define BOOST_NO_CXX11_SCOPED_ENUMS
#include <boost/filesystem.hpp>
#undef BOOST_NO_CXX11_SCOPED_ENUMS

Tried with: Boost 1.55, GCC 4.8.4 and Ubuntu 14.04

NFS error access denied by server while mounting

Problem

I wanted to share a directory that was on my local computer to a remote computer named medusa using NFS. I did the setup of the NFS server on my local computer and NFS client on the remote computer as described here. However, when I mounted the NFS share on the remote computer, I got this error:

$ sudo mount /mnt/my_nfs_share
mount.nfs: access denied by server while mounting my-local-computer:/path/i/shared

Solution

This turned out to be a tricky one! The error message is quite misleading, because the problem had nothing to do with access permissions. It turned out that the remote computer medusa had two Ethernet cards and thus two IP addresses. One of the IP addresses was mapped to the hostname medusa, but the NFS client was connecting from the second IP address.

The solution was to add the second IP address 192.168.0.100 as an entry in the /etc/exports file:

/path/i/shared medusa(rw,sync,no_subtree_check)
/path/i/shared 192.168.0.100(rw,sync,no_subtree_check)
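After editing /etc/exports, the exports can be re-read on the server with a command like this (restarting the NFS server also works):

$ sudo exportfs -ra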

I repeated the rest of the steps as described in this post. Everything then worked fine and I was able to mount the NFS share on the remote computer.

Reference: https://unix.stackexchange.com/questions/106122/

Tried with: Ubuntu 14.04

dlopen: cannot load any more object with static TLS

Problem

I had a Python script that used Caffe2. It worked fine on one computer. On another computer with the same setup, it failed at the import caffe2.python line with this error:

WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
WARNING:root:Debug message: dlopen: cannot load any more object with static TLS
CRITICAL:root:Cannot load caffe2.python. Error: dlopen: cannot load any more object with static TLS

The GPU support warning above is a red herring, because this Caffe2 build did have GPU support. The real error is the dlopen failure.

Solution

The only hit from Googling that gave a clue was this. As suggested there, I moved the import caffe2.python line to the top of the script, above all other imports. The error disappeared.
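In other words, the top of the script ended up looking roughly like this (the imports after caffe2 are just placeholders for whatever else the script needs):

# Import caffe2 first, so that its shared libraries are loaded
# before other modules use up the available static TLS slots.
import caffe2.python

# All other imports come after it.
import numpy
import scipy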

Tried with: Ubuntu 14.04

Could not load OGRE dynamic library

Problem

I had a binary compiled on Ubuntu 14.04. When I tried to run it on Ubuntu 16.04, I got this error:

OGRE EXCEPTION(7:InternalErrorException): Could not load dynamic library /usr/lib/x86_64-linux-gnu/OGRE-1.8.0/RenderSystem_GL

Solution

From the error, I could see that it was looking for an OGRE library file. I installed the OGRE library available on Ubuntu 16.04:

$ sudo apt install libogre-1.9-dev

The error still persisted, because Ubuntu 16.04 only has OGRE 1.9, while this binary was looking for OGRE 1.8 library files.

I then tried creating a symbolic link, named after the OGRE 1.8 directory, that points to the existing OGRE 1.9 directory:

$ cd /usr/lib/x86_64-linux-gnu/
$ ln -s OGRE-1.9.0 OGRE-1.8.0

This worked! The executable ran without problems. This saved me from having to build OGRE 1.8 from source on this computer.