xdiskusage

The du command is a useful tool to check the size of directories and files on a filesystem or inside a directory. However, its output is not the most intuitive, especially if you want to drill down into child directories to find what is occupying space. There are ncurses console tools like ncdu and GUI tools like Baobab that help with this. Another good alternative I have discovered is xdiskusage.

xdiskusage is a GUI application that parses the output of du to show the disk space occupied by directories and files. Written using the FLTK toolkit, it is possibly the lightest and fastest disk usage tool I have used. That makes it great for use over SSH sessions too. It uses a treemap visualization, so it is much easier to investigate disk usage than by reading du output.

  • Installing it is easy:
$ sudo apt install xdiskusage
  • Invoke it without any arguments if you want to be presented with a list of all the filesystems on your computer. Double-click on any of them to view a treemap visualization of its contents.

  • To look under a particular directory, pass it as the first argument to the program.
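For example, to examine a particular user's home directory (the path here is illustrative):
$ xdiskusage /home/joe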

  • In the treemap visualization, you can keep probing further by double-clicking on any rectangle.

  • Right-click in the GUI to get a menu of all the actions possible using the mouse. The menu also shows the keyboard bindings available for browsing with the keyboard.

Tried with: xdiskusage 1.48 and Ubuntu 16.04

Stub library warning on libnvidia-ml.so

Problem

I tried to run a program compiled with CUDA 9.0 inside a Docker container and got this error:

WARNING:

You should always run with libnvidia-ml.so that is installed with your
NVIDIA Display Driver. By default it's installed in /usr/lib and /usr/lib64.
libnvidia-ml.so in GDK package is a stub library that is attached only for
build purposes (e.g. machine that you build your application doesn't have
to have Display Driver installed).

Solution

Let us first try to understand the error and where it is coming from. The program compiled with CUDA 9.0 has been linked to libnvidia-ml.so. This is the shared library file of the NVIDIA Management Library (NVML). During execution, libnvidia-ml.so is throwing this error. Why?

From the error message, we can infer that there are two libnvidia-ml.so files. One is a stub that is used during compilation and linking; I am guessing it just provides the necessary function symbols and signatures. But that stub cannot be used when actually running the compiled executable. If we do execute with the stub shared library file loaded, it throws this warning.

So, there is a second libnvidia-ml.so, the real shared library file. It turns out that the management library is provided by the NVIDIA display driver. So, every version of the display driver has its own libnvidia-ml.so file. I had NVIDIA display driver 384.66 on my machine and I found libnvidia-ml.so under /usr/lib/nvidia-384. The stub library file allows you to compile on machines where the NVIDIA display driver is not installed. In our case, for some reason, the loader was picking up the stub instead of the real library file during execution.

By using the chrpath tool, described here, I found that the compiled binary did indeed have the stub library directory in its RPATH: /usr/local/cuda/lib64/stubs. That directory did have a libnvidia-ml.so. Using the strings tool on that shared library confirmed that it was the origin of the above message:

$ strings libnvidia-ml.so | grep "You should always run with"

Since the binary has an RPATH, described here, containing the stubs path, the stub library was getting picked up in preference to the actual libnvidia-ml.so, which was present in the driver directory mentioned earlier. The solution I came up with for this problem was to add a command to the docker run invocation to delete the stubs directory:

$ rm -rf  /usr/local/cuda/lib64/stubs

That way, the stubs directory was still available outside Docker for compilation. It would just appear deleted inside the Docker container, thus forcing the loader to pick up the real libnvidia-ml.so during execution.
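For illustration, the invocation could look something like this sketch; the image name and program here are hypothetical:

$ docker run my-cuda-image bash -c "rm -rf /usr/local/cuda/lib64/stubs && ./my_program"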

How to find the init system used by Linux

init is the first process that is started by a Unix operating system. It has PID 1, handles the creation of all other processes and daemon processes required by the OS, and acts as the ancestor of all processes. There are many init systems that have been used in Linux distributions over the years. Some of the popular ones are the classic System V init, Upstart (Ubuntu used to use this) and systemd (currently popular across Linux distros).

So, you are sitting at a Linux computer: do you know which init system it is using? There is no single straightforward method to find this out.

Here are a few methods, one of which should work:

  • Check the init version, it should report the init system name too:
$ init --version
init (upstart 1.12.1)
Copyright (C) 2006-2014 Canonical Ltd., 2011 Scott James Remnant

This was on an Ubuntu 14.04 system. You can see that it uses Upstart.

  • Sometimes /sbin/init is a symbolic link to the actual init:
$ stat /sbin/init
  File: '/sbin/init' -> '/lib/systemd/systemd'

This was on an Ubuntu 16.04 system. You can see that it uses systemd.
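
  • Another quick check that works on most systems is to print the name of PID 1 directly:
$ ps -p 1 -o comm=
systemd

The output shown here is from a systemd system. On an Upstart or System V system this just reports init, in which case the methods above help disambiguate.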

How the tmp directory is cleaned

The /tmp directory in Linux is where temporary files and directories are created by applications. If you did not do anything special during installation, it will be a directory on the root partition and hence belongs to the same filesystem as the root filesystem. If you specifically created a partition for /tmp, then it might be of the tmpfs filesystem type.

If the root partition is running out of space, you might want to check how big the tmp directory is:

$ cd /tmp
$ du --summarize .
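
To get the total in human-readable units instead of blocks:

$ du --summarize --human-readable /tmp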

On my Ubuntu, I find that the tmp directory is cleaned up only at the time of system startup or reboot. The init scripts in /etc/rcS.d check the TMPTIME value set in /etc/default/rcS. This value determines which files and directories in the tmp directory are eligible for deletion.

You can change this TMPTIME value to your preference:

  • If TMPTIME=0, then everything inside the tmp directory is deleted at reboot.
  • If TMPTIME=1, then only files and directories older than a day are deleted and so on.
  • If TMPTIME=-1, then nothing is deleted.
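
For example, to have only files and directories older than a week removed at reboot, set this in /etc/default/rcS:

TMPTIME=7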

Note that with this mechanism the tmp directory is still cleaned up only when you reboot the system. If you want a periodic cleanup of the tmp directory, then use the tmpreaper tool as described here.

Tried with: Ubuntu 14.04

How to install and use tmpreaper

The files and directories in the tmp directory are cleaned only during a reboot. This can be problematic if you have applications that write a lot to the tmp directory and you do not want to reboot your Linux system. A good solution for periodic cleanup of the tmp directory is the tmpreaper tool.

  • Installing it is easy:
$ sudo apt install tmpreaper
  • On installation, tmpreaper adds a cron job that runs once a day. This can be seen in /etc/cron.daily/tmpreaper. It calls the tmpreaper program with the options you set in /etc/tmpreaper.conf

  • The tmpreaper tool will work on any directory passed to it, including the root directory. Since deleting the contents of the root directory is catastrophic, tmpreaper shows a warning every time it runs. You can disable this warning by setting SHOWWARNING=false in /etc/tmpreaper.conf

  • When tmpreaper is run once a day by cron, it uses the TMPTIME value set for the tmp directory init scripts as described here to decide which files and directories to delete. For example, if TMPTIME=1, then tmpreaper will delete everything in tmp directory that is older than a day.

  • If you want to apply tmpreaper on directories other than /tmp, then add them to the TMPREAPER_DIRS value in /etc/tmpreaper.conf

  • You can call the tmpreaper program directly at the shell to reap your own directories. For example, to reap all files and directories older than 2 days from the tmp directory:

$ tmpreaper 2d /tmp

Note that you might need to run that command as sudo to be able to delete files created by other users.

  • To be sure that you are not deleting important files, use the --test option to do a dry run:
$ tmpreaper --test 2d /tmp
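  • If there are files that must never be reaped, the --protect option takes a shell pattern of paths to skip. A sketch, with an illustrative pattern:
$ tmpreaper --test --protect '/tmp/*.lock' 2d /tmp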

Tried with: tmpreaper 1.6.13 and Ubuntu 16.04

How to view GPU topology

The NVIDIA System Management Interface tool is the easiest way to explore the GPU topology on your system. This tool is available as nvidia-smi and is installed as part of the NVIDIA display driver. GPU topology describes how one or more GPUs in the system are connected to each other, to the CPU and to other devices in the system. Knowing the topology is important to understand how data is copied between GPUs, or between a GPU and the CPU or another device.

  • To view the available commands related to GPU topology:
$ nvidia-smi topo -h
  • To view the connection matrix between the GPUs and the CPUs they are close to (CPU affinities):
$ nvidia-smi topo -m
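
On a hypothetical machine with two GPUs hanging off the same PCIe host bridge, the matrix might look something like this (the actual interconnect labels and CPU affinities vary by system; PHB indicates a path through a PCIe host bridge):

        GPU0    GPU1    CPU Affinity
GPU0     X      PHB     0-11
GPU1    PHB     X       0-11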

Some examples of GPU topologies can be seen here.

How to change RPATH or RUNPATH of executable

RPATH or RUNPATH is a colon-separated list of directories embedded in an executable. This list of directories plays an important role when shared library locations are determined at the time the executable is loaded for running. This process is described in this post. Note that RPATH has a higher priority in the shared library search than RUNPATH. We can change the RPATH or RUNPATH of a binary file by using the chrpath tool.

  • Installing this tool is easy:
$ sudo apt install chrpath
  • To view if the binary has RPATH or RUNPATH and to list its colon-separated list of directories:
$ chrpath ./some_binary
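If the binary has one, chrpath prints its type and value; the path shown here is illustrative:
./some_binary: RUNPATH=/usr/local/cuda/lib64/stubs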
  • To remove RPATH or RUNPATH from the binary:
$ chrpath -d ./some_binary
  • To convert RPATH of a binary to a RUNPATH:
$ chrpath -c ./some_binary

Note that you cannot convert a RUNPATH back to RPATH.

  • To replace RPATH or RUNPATH paths with a different set of paths:
$ chrpath -r /home/joe:/home/foobar/lib64 ./some_binary

Note that the string of the new set of paths must be shorter than or equal in length to the string that was stored earlier in the binary.

Tried with: chrpath 0.14 and Ubuntu 16.04

How shared library locations are found at runtime

You have successfully compiled an executable that is linked with one or more external shared libraries. You can view the shared libraries that the executable is dependent on by using the ldd tool. When you actually run the executable, the dynamic linker-loader ld-linux looks for each dependent shared library in the following locations, in order:

  • Using RPATH, if it exists, that is hard-coded in the executable. This is a colon-separated list of directories from where the shared libraries were linked into the executable by the linker during the linking stage of compilation. If this exists, you can view it using this command: readelf -d ./your_binary | grep RPATH
  • Using LD_LIBRARY_PATH, if it is set. This is a colon-separated list of directories set as an environment variable by the user.
  • Using RUNPATH, if it exists, that is hard-coded in the executable. This is a colon-separated list of directories, just like RPATH. If this exists, you can view it using this command: readelf -d ./your_binary | grep RUNPATH
  • Checking /etc/ld.so.cache. This cache is populated by running the ldconfig program. This program is usually run when libraries are installed. You can view the shared libraries in the cache using this command: ldconfig -p
  • Checking in /lib
  • Checking in /usr/lib

See it in action

You can actually witness the loader searching directories to find the location of each shared library. To see this in action, try this command:

$ LD_DEBUG=libs ldd ./some_executable

In the output of this command, you will see that:

  • Each shared library listed in the executable is picked up in order.
  • For each shared library, the locations listed above (RPATH, LD_LIBRARY_PATH, RUNPATH, cache, /lib and /usr/lib) are tried in order.
  • For each directory in the relevant colon-separated list, the shared library filename is appended to it and the resulting file path is checked for existence.
  • The first such file path that exists is noted as the location of the shared library.
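
For example, the search for one library might show up like this in the output; the library name, paths and PID here are illustrative:

     12345:     find library=libnvidia-ml.so.1 [0]; searching
     12345:      search path=/usr/local/cuda/lib64/stubs          (RPATH from file ./some_executable)
     12345:       trying file=/usr/local/cuda/lib64/stubs/libnvidia-ml.so.1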

How to use strace

strace is a tool that shows you the system calls and signals called by a program. It is a very useful tool, especially to check what files and libraries are opened, read or written by a program.

  • Installing strace is easy:
$ sudo apt install strace
  • To view the system calls made by the execution of a program named foobar:
$ strace ./foobar

You will see that strace prints out every system call made by the program, with its input arguments and its return value. However, since this verbose listing is printed to the console, you will find it difficult to view the actual output of the program or to interact with it.
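
For example, a single line of this output typically looks like this; the exact path, flags and return value differ per call:

open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3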

  • Usually, strace is asked to write its output to a log file:
$ strace -o strace.log ./foobar
  • Anything but the simplest of programs will usually fork child processes. By default, strace only traces the parent process launched initially. To request that it trace all child processes too, use the -f option:
$ strace -f -o strace.log ./foobar
  • To trace only a few specific system calls, say open and close:
$ strace -e trace=open -o strace.log ./foobar
$ strace -e trace=open,close -o strace.log ./foobar
  • To trace only system calls from a specific category, say those calls that take filename as argument:
$ strace -e trace=file -o strace.log ./foobar

The other categories include process, network, signal, ipc, desc and memory. See the strace manpage for more details on these categories.

  • To trace only specific signals:
$ strace -e signal=sigkill,sigint -o strace.log ./foobar

The full list of signals can be seen in man 7 signal.

  • A very useful option is to trace calls that access a particular path. This can be done using the -P option:
$ strace -P /home/joe/somefile -o strace.log ./foobar

Note that strace is clever enough to also show all calls related to the file descriptor opened from that particular path.

  • By default, the input argument structures to the calls are abbreviated. To view the full structures, use the verbose option:
$ strace -v -o strace.log ./foobar
  • By default, all strings that are read or written are displayed, but only their first 32 characters. To view more of each string, specify how many characters you want to see using the -s option:
$ strace -s 100 -o strace.log ./foobar
  • Another formatting option that I find useful is to align the output values of all calls to a particular column, say 100:
$ strace -a 100 -o strace.log ./foobar
  • Looking at file descriptors in strace output can be confusing. To ask strace to show the path associated with each file descriptor whenever it prints a file descriptor, use the -y option:
$ strace -y -o strace.log ./foobar

Tried with: strace 4.11 and Ubuntu 16.04

Signify plugin for Vim

If you use version control systems like Git or Mercurial and you use Vim to edit your source files, then you might find the Signify plugin very useful. When you edit a version controlled file, Signify shows which lines are changed using signs (or markers) in the gutter on the left side.

  • Signify can be installed using your favorite plugin manager from: https://github.com/mhinz/vim-signify

  • By default, Signify works with a large number of version control systems. If you only use one or a few of those, then you can speed up Signify a bit by telling it to check only those VCSes. For example, I only use Git and Mercurial, so I add this line to my vimrc:

let g:signify_vcs_list = [ 'git', 'hg' ]
  • You can also override the default marker characters it uses for its signs. For example, to change its delete character:
let g:signify_sign_delete = "-"
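
Other sign characters can be overridden in the same way; for example (these option names are from the Signify documentation):

let g:signify_sign_add = '+'
let g:signify_sign_change = '!'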

Tried with: Vim 7.4 and Ubuntu 16.04