Stub library warning on libnvidia-ml.so

Problem

I tried to run a program compiled with CUDA 9.0 inside a Docker container and got this error:

WARNING:

You should always run with libnvidia-ml.so that is installed with your
NVIDIA Display Driver. By default it's installed in /usr/lib and /usr/lib64.
libnvidia-ml.so in GDK package is a stub library that is attached only for
build purposes (e.g. machine that you build your application doesn't have
to have Display Driver installed).

Solution

Let us first try to understand the error and where it is coming from. The program compiled with CUDA 9.0 has been linked to libnvidia-ml.so. This is the shared library file of the NVIDIA Management Library (NVML). During execution, libnvidia-ml.so is throwing this error. Why?

From the error message, we get an indication that there are two libnvidia-ml.so files. One is a stub that is used during compilation and linking. I guess it just provides the necessary function symbols and signatures. But that library cannot be used to execute the compiled executable. If we do try to execute with that stub shared library file, it will throw this warning.

So, there is a second libnvidia-ml.so, the real shared library file. It turns out that the management library is provided by the NVIDIA display driver. So, every version of the display driver will have its own libnvidia-ml.so file. I had NVIDIA display driver 384.66 on my machine and I found libnvidia-ml.so under /usr/lib/nvidia-384. The stub library file allows you to compile on machines where the NVIDIA display driver is not installed. In our case, for some reason, the loader is picking up the stub instead of the real library file during execution.

By using the chrpath tool, described here, I found that the compiled binary did indeed have the stub library directory in its path: /usr/local/cuda/lib64/stubs. That directory did have a libnvidia-ml.so. Running the strings tool on that shared library confirmed that it was the origin of the above message:

$ strings libnvidia-ml.so | grep "You should always run with"

Since the binary has an RPATH, described here, with the stubs path, the stub library was getting picked up in preference to the actual libnvidia-ml.so provided by the display driver. The solution I came up with for this problem was to add a command to the docker run invocation that deletes the stubs directory:

$ rm -rf  /usr/local/cuda/lib64/stubs

That way, the stubs directory was still available outside Docker for compilation. It would just appear deleted inside the Docker container, thus forcing the loader to pick up the real libnvidia-ml.so during execution.
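To spot this situation quickly, you can look for a stubs directory among the entries of the embedded path. This is only a minimal sketch: the function name has_stubs_entry is hypothetical, and it inspects a colon-separated string that you pass in yourself (for example, the list printed by chrpath).

```shell
# Hypothetical helper: succeed if any entry of a colon-separated
# path list is a directory named "stubs".
has_stubs_entry() {
    # Wrap the list in colons so every entry is delimited on both sides.
    case ":$1:" in
        *"/stubs:"*|*"/stubs/:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, has_stubs_entry "/usr/local/cuda/lib64/stubs" succeeds, while has_stubs_entry "/usr/lib:/usr/lib64" fails.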

How to change RPATH or RUNPATH of executable

RPATH or RUNPATH is a colon-separated list of directories embedded in an executable. This list of directories plays an important role in determining shared library file locations when the executable is loaded for running. This process is described in this post. Note that RPATH has the highest priority in the shared library search, compared to RUNPATH. We can change the RPATH or RUNPATH of a binary file by using the chrpath tool.

  • Installing this tool is easy:
$ sudo apt install chrpath
  • To view if the binary has RPATH or RUNPATH and to list its colon-separated list of directories:
$ chrpath ./some_binary
  • To remove RPATH or RUNPATH from the binary:
$ chrpath -d ./some_binary
  • To convert RPATH of a binary to a RUNPATH:
$ chrpath -c ./some_binary

Note that you cannot convert a RUNPATH back to RPATH.

  • To replace RPATH or RUNPATH paths with a different set of paths:
$ chrpath -r /home/joe:/home/foobar/lib64 ./some_binary

Note that the string holding the new set of paths must be no longer than the string that was stored earlier in the binary.
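The length restriction can be checked before calling chrpath -r. The snippet below is a small sketch with a hypothetical function name, rpath_fits. It assumes plain ASCII paths, so that the shell's character count matches the number of bytes stored in the binary.

```shell
# Hypothetical helper: succeed if the new path list is no longer
# than the old one, as required when replacing paths in place.
rpath_fits() {
    old=$1
    new=$2
    # ${#var} is the length of the string in characters.
    [ "${#new}" -le "${#old}" ]
}
```

For example, rpath_fits "/usr/local/cuda/lib64/stubs" "/usr/lib" succeeds, because the replacement is shorter than the original.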

Tried with: chrpath 0.14 and Ubuntu 16.04

How shared library locations are found at runtime

You have successfully compiled an executable that is linked with one or more external shared libraries. You can view the shared libraries that the executable is dependent on by using the ldd tool. When you actually run the executable, the dynamic linker-loader ld-linux looks for each dependent shared library in the following locations, in order:

  • Using RPATH, if it exists, that is hard-coded in the executable. This is a colon-separated list of directories from where the shared libraries were linked into the executable by the linker during the linking stage of compilation. If this exists, you can view it using this command: readelf -d ./your_binary | grep RPATH
  • Using LD_LIBRARY_PATH, if it is set. This is a colon-separated list of directories set as an environment variable by the user.
  • Using RUNPATH, if it exists, that is hard-coded in the executable. This is a colon-separated list of directories, just like RPATH. If this exists, you can view it using this command: readelf -d ./your_binary | grep RUNPATH
  • Checks the /etc/ld.so.cache. This cache is populated by running the ldconfig program. This program is usually run when libraries are installed. You can view the shared libraries in the cache using this command: ldconfig -p
  • Check in /lib
  • Check in /usr/lib

See it in action

You can actually witness the loader searching directories to find the location of each shared library. To see this in action, try this command:

$ LD_DEBUG=libs ldd ./some_executable

In the output of this command, you will see that:

  • Each shared library listed in the executable is picked up in order.
  • For each shared library, the locations listed above (RPATH, LD_LIBRARY_PATH, RUNPATH, cache, /lib and /usr/lib) are tried in order.
  • For each directory listed in the above colon-separated list, the shared library filename is appended and tried to see if the file path exists.
  • The first file path that exists is noted as the location of the shared library.
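The directory walk described above can be imitated in a few lines of shell. This is a simplified sketch, not what ld-linux actually runs: the function name find_lib is hypothetical, it only tests whether the file exists, it skips empty entries, and it does not consult the cache. Pass the colon-separated lists in the order you want them searched.

```shell
# Hypothetical helper: print the first existing "<dir>/<libname>" found
# while walking the given colon-separated lists in the order passed.
# The body runs in a subshell so that IFS and globbing changes do not leak.
find_lib() (
    libname=$1
    shift
    IFS=:      # split each list on colons
    set -f     # do not glob-expand directory entries
    for list in "$@"; do
        for dir in $list; do
            if [ -n "$dir" ] && [ -e "$dir/$libname" ]; then
                printf '%s\n' "$dir/$libname"
                exit 0
            fi
        done
    done
    exit 1
)
```

A call such as find_lib libFoobar.so.1 "$LD_LIBRARY_PATH" "/lib:/usr/lib" prints the first match, or fails if no directory contains the file.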

How to set regex breakpoint for shared library in GDB

Problem

Assume you are running a program under GDB and it is linked to shared library files. Not all the shared libraries are loaded at the beginning when you start the program with GDB. They are loaded when needed. So, how to set a regex breakpoint in a source file that belongs to one of the shared library files?

Solution

  • Setting a normal breakpoint will work if the shared library having that file has already been loaded. You can check the currently loaded shared libraries using the command: info shared

  • If you set a breakpoint for a file which belongs to a shared library that is not yet loaded, GDB will warn you that the breakpoint will only be set once the library is loaded. This is kinda okay.

  • However, if you try to set regex breakpoints (rbreak), that will fail silently if the shared library is not yet loaded. So how do you know the earliest point at which you can set such breakpoints?

  • I find it useful to configure GDB to stop whenever a shared library is loaded. This can be done by setting this option: set stop-on-solib-events 1

  • Now GDB stops every time at the point where one or more shared libraries need to be loaded. If I realize that the shared library I am interested in has now loaded, I run the regex breakpoint command at that point to set the breakpoints. Voila!
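The option can also be placed in a GDB command file so that it does not have to be typed in every session. The following is a sketch only: the function name write_gdb_solib_cmds and the file name used in the note below are hypothetical, and the command file merely enables the option, starts the program and lists the libraries loaded at the first stop.

```shell
# Hypothetical helper: write a small GDB command file that makes GDB
# stop on shared library events and show what is loaded at the first stop.
write_gdb_solib_cmds() {
    cat > "$1" <<'EOF'
set stop-on-solib-events 1
run
info shared
EOF
}
```

After write_gdb_solib_cmds solib.gdb, start the session with gdb -x solib.gdb ./some_binary and issue the rbreak command once the library of interest appears in the list.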

Tried with: GDB 7.11.1 and Ubuntu 14.04

How to view hierarchy of shared library dependencies using lddtree

Tree of library dependencies shown by lddtree

ldd is one of the key tools that should be familiar to all programmers. It shows the list of all shared libraries that an executable binary depends on.

A binary, say an ELF file, lists only the immediate shared libraries it depends on in its header. Each of these shared libraries could further depend on other libraries and so on. It would be very useful if you could see the hierarchical tree of these dependencies between the shared libraries. Thankfully, there is a tool named lddtree that does exactly that.

  • It can be installed easily:
$ sudo apt install pax-utils
  • Usage is straightforward:
$ lddtree foo
  • By default, the tool does not show duplicated dependencies. That is, the dependency between a library A and B is shown only once and skipped after that, though it may occur many times. To view all the dependencies, including duplicates:
$ lddtree -a foo

Beware that this can result in a very big tree! This is because dependencies like libc, the ld-linux loader and such will appear for almost all shared libraries.

Note: The Dependency Walker tool does something similar on Windows and can be used as described here.

Tried with: PaX-Utils 0.2.3 and Ubuntu 14.04

Skype error on libGL shared library

Problem

I installed Skype from here using the package for Ubuntu 12.04 Multiarch. When I ran Skype from the Dash, nothing happened. When I ran Skype from the shell, I found that it quit with this error:

$ skype
skype: error while loading shared libraries: libGL.so.1: cannot open shared object file: No such file or directory

Solution

  • Let us check the shared library dependencies of this Skype executable. This actually uses the dynamic linker and loader to compare what is required by the executable with what shared libraries are available in the cache:
$ ldd /usr/bin/skype | grep libGL
libGL.so.1 => not found
libGL.so.1 => not found

So yes, the shared library file is not found.

  • I first checked if the libGL.so.1 was available in the shared library cache:
$ ldconfig -p | grep libGL.so.1
libGL.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1

So, a 64-bit shared library of the required name was present in the cache!

  • I next checked the Skype program itself:
$ file /usr/bin/skype
/usr/bin/skype: ELF 32-bit LSB  shared object, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, stripped

This was a 32-bit executable. Might it require a 32-bit shared library and could this be causing the problem?

  • The 32-bit GL library files and directories updated in cache are controlled by the configuration in /etc/ld.so.conf.d/i386-linux-gnu_GL.conf. This in turn can be easily switched between libraries provided by different providers using update-alternatives:
$ sudo update-alternatives --config i386-linux-gnu_gl_conf

I found that the GL library files provided by /usr/lib/nvidia-352/alt_ld.so.conf, that is, by my NVIDIA drivers, were currently being used. I picked /usr/lib/i386-linux-gnu/mesa/ld.so.conf, which is provided by MESA. This only changes the symbolic link for /etc/ld.so.conf.d/i386-linux-gnu_GL.conf from the NVIDIA conf file to the MESA conf file.

  • Next we update the cache so that the NVIDIA library files are removed and the MESA library files are symbolically linked to as the default GL library files:
$ sudo ldconfig
  • Let us check what the cache holds now:
$ ldconfig -p | grep libGL.so.1
libGL.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1
libGL.so.1 (libc6) => /usr/lib/i386-linux-gnu/mesa/libGL.so.1

We can see that we now have an additional entry, which has no architecture specified, but it is for 32-bit by default.

  • Finally, let us check if the shared dependencies of the Skype executable are met now:
$ ldd /usr/bin/skype | grep libGL
libGL.so.1 => /usr/lib/i386-linux-gnu/mesa/libGL.so.1 (0xf16b5000)

Yes, they are! I ran Skype after this and it worked fine! 😄
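The 32-bit versus 64-bit check done with the file tool above can also be made by reading the ELF header directly. This is a minimal sketch with a hypothetical function name, elf_bits. It looks only at the four magic bytes and the class byte at offset 4 (1 means 32-bit, 2 means 64-bit) and ignores everything else in the header.

```shell
# Hypothetical helper: print 32 or 64 for an ELF file, based on the
# class byte at offset 4 of the header; fail for non-ELF files.
elf_bits() {
    # The first four bytes of an ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "7f454c46" ] || return 1
    # Read the single byte at offset 4 as an unsigned decimal number.
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case "$class" in
        1) echo 32 ;;
        2) echo 64 ;;
        *) return 1 ;;
    esac
}
```

Running elf_bits on a binary and on a candidate library shows at a glance whether the two could be loaded together.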

Tried with: Skype 4.3 (multiarch) and Ubuntu 14.04

How to fix shared object file error

Problem

One of the most common errors a programmer faces is when an executable is run and it fails to find a required shared library. The error is usually of this form:

hello-world-program: error while loading shared libraries: libFoobar.so.1: cannot open shared object file: No such file or directory

However, you might know that the shared library file libFoobar.so.1 actually exists, say in a directory named /opt/foobar/lib. But, for some reason the ld-linux dynamic loader-linker is not looking in this directory.

Solution

It is important to know which are the locations that the loader searches for a given shared library file. This is described in this post.

After reading the above post, you can see that the error is caused because the shared library file is not found in the locations listed in that post. Of all those locations, the two places which are easy to modify are at the user-level (LD_LIBRARY_PATH) and at the system-level (shared library cache).

Add to LD_LIBRARY_PATH

Set this in the shell for temporary use or add it to the shell initialization file for permanent effect:

$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/foobar/lib

If your shared library is located in the same directory as the executable, then you can add . to LD_LIBRARY_PATH.
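One detail worth knowing: if LD_LIBRARY_PATH is empty or unset, the export line above produces a value that starts with a colon, and the loader treats such an empty entry as the current directory. The sketch below avoids that. The function name append_path is hypothetical.

```shell
# Hypothetical helper: print a colon-separated list with one more directory
# appended, without creating an empty entry when the list is empty.
append_path() {
    # ${1:+$1:} expands to "<list>:" only if the list is non-empty.
    printf '%s\n' "${1:+$1:}$2"
}
```

It can be used as: export LD_LIBRARY_PATH=$(append_path "$LD_LIBRARY_PATH" /opt/foobar/lib)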

Update system cache

  • Open /etc/ld.so.conf with sudo and add a new line with the library directory. In this case, we add /opt/foobar/lib.

  • Rerun ldconfig to rebuild the cache:

$ sudo ldconfig
  • Check if the shared library cache now includes the shared libraries from the new directory:
$ ldconfig -p

Your program should now execute without any errors 🙂
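The first step, adding the directory to the configuration file, can be scripted so that repeated runs do not add duplicate lines. This is a sketch under assumptions: the function name add_lib_dir is hypothetical, and it must be run with enough privileges to write to the file you pass in.

```shell
# Hypothetical helper: append a directory to a loader configuration file
# unless that exact line is already present.
add_lib_dir() {
    conf=$1
    dir=$2
    # -x matches whole lines, -F treats the directory as a fixed string.
    if ! grep -qxF -- "$dir" "$conf" 2>/dev/null; then
        printf '%s\n' "$dir" >> "$conf"
    fi
}
```

After calling it as root, for example add_lib_dir /etc/ld.so.conf /opt/foobar/lib, rerun ldconfig as in the steps above.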

Tried with: Ubuntu 14.04

How to add library in Eclipse CDT

The source code of a C/C++ project may need to be linked with an external shared library file. When compiling from the commandline, this is typically linked using the compiler option -l. For example, to link with a library file named libfoobar.so, which is placed in one of the standard library paths, you use the option -lfoobar. As you can see, the lib prefix and the file extension .so do not need to be specified.

Similarly, to add a library file to be linked in Eclipse CDT:

  1. Right-click on the project name in Project Explorer, choose Properties > C/C++ Build > Settings > Tool Settings

  2. Go to Cross G++ Linker > Libraries > Libraries.

  3. Click the + button and add the name of the library file, omitting the lib prefix and the file extension. For example, to add the library file libfoobar.so, just add foobar. To add multiple library files, add them separately like this.

Tried with: Eclipse 3.7.2, Eclipse CDT 8.0.2 and Ubuntu 12.04 LTS

ELF Library Viewer

Dependency Walker is a popular tool on Windows to view the dependent modules of an EXE or DLL file. ELF Library Viewer is a similar program for Linux. It shows the tree of shared libraries that a program or shared library is dependent on.

The source code of ELF Library Viewer can be downloaded here. It can be compiled easily by typing cmake followed by make.

Tried with: ELF Library Viewer 0.9 and Ubuntu 12.04 LTS

How to view modules loaded by a Python program

The Python program has a useful parameter: -v. When Python is launched with this parameter to run a Python program, it prints out all the Python modules and dynamic libraries that are loaded during execution. This can be useful to see exactly which Python files and shared library files are loaded and in which order. This parameter also prints out the modules as they are cleaned up at the end of program execution.

$ python -v foo.py

Tried with: Python 2.7.3 and Ubuntu 12.04 LTS