How to read YAML file in Python with ordered keys

It is very easy to read a YAML file in Python as a combination of dicts and lists using PyYAML. However, the YAML format does not require the keys of a mapping to be ordered, so PyYAML does not read the keys of a dict in the order they appear in the file. In addition, the Python dict itself does not maintain any order of its keys. However, in certain situations it might be necessary to read the keys of a YAML mapping in the order they appear in the file. This can be done by using the yamlordereddictloader package.

  • Installing this Python package is easy using pip:
$ sudo pip install yamlordereddictloader
  • Read YAML files by providing the loader from this package to PyYAML:
import yaml
import yamlordereddictloader

with open("foobar.yaml") as f:
    yaml_data = yaml.load(f, Loader=yamlordereddictloader.Loader)

This returns the data in the YAML file as a combination of lists and OrderedDict objects (instead of dict). So, almost all of the rest of your code should work the same as before after this change.
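
For example, assuming an illustrative foobar.yaml with this content:

first: 1
second: 2
third: 3

the keys of the returned mapping come back in file order:

print(list(yaml_data.keys()))
# With the ordered loader this prints: ['first', 'second', 'third']
# With the default loader, the key order is not guaranteed.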

Tried with: yamlordereddictloader 0.4 and Ubuntu 16.04

How to monitor all traffic on a port

There are many times when you run a server or service that listens on a certain network port of a host. It can be useful to monitor the traffic on that port, for example to check whether clients are able to connect to it.

tcpdump can be used to watch all the traffic to a certain port on your current computer:

$ sudo tcpdump -i any port 9988
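
If you also want to see the packet payloads and not just the headers, tcpdump's -A option prints them in ASCII, which is handy for plain-text protocols; the port number below is the same example value as above:

$ sudo tcpdump -i any -A port 9988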

pstree cheatsheet

The ps command in Linux has been so horribly mauled by having to support Unix, BSD and GNU options, along with innumerable eccentric backward-compatibility requirements, that I find it almost unusable. Instead, I like to use the pstree command for almost all my process exploration tasks. pstree shows the processes on a system as a tree.

  • Installing it is easy:
$ sudo apt install psmisc
  • By default, pstree shows the processes of all users as a tree, where each link connects the parent process on the left to the child process on the right:
$ pstree
  • To only view the processes of a particular user joe:
$ pstree joe
  • To view the command-line arguments of each process:
$ pstree -a
  • To view the PID of each process:
$ pstree -p
  • To view a process with PID 900 along with all its descendants:
$ pstree 900
  • To view both the ancestors and descendants of a process with PID 900:
$ pstree -s 900
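
These options can be combined. For example, to show a process together with its ancestors and descendants, their PIDs and command-line arguments (the PID here is just an example):

$ pstree -aps 900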

Tried with: pstree 22.21 and Ubuntu 16.04

urlopen got multiple values error

Problem

I tried to run a Python script given by a friend. It ended with this error:

  File "/usr/lib/python2.7/dist-packages/urllib3/request.py", line 79, in request
    **urlopen_kw)
  File "/usr/lib/python2.7/dist-packages/urllib3/request.py", line 142, in request_encode_body
    **urlopen_kw)
TypeError: urlopen() got multiple values for keyword argument 'body'

Solution

The script used a Python package, which in turn used urllib3. This strange error has nothing to do with my code, but with the urllib3 package. The urllib3 package installed by Ubuntu was pretty old: 1.7.1. Updating to a more recent version of the package fixes the error. Either upgrade it system-wide using sudo pip install -U urllib3, or upgrade it inside your virtualenv, as shown below.
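
For reference, the upgrade commands would look like this (pick the one that matches where the package should live):

$ sudo pip install -U urllib3     # system-wide
$ pip install -U urllib3          # inside an activated virtualenv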

xdiskusage

The du command is a useful tool to check the size of directories and files on a filesystem or inside a directory. However, its output is not the most intuitive, especially if you want to probe further into child directories trying to find what is occupying space. There are ncurses console tools like ncdu and GUI tools like Baobab that help with this. Another good alternative I have discovered is xdiskusage.

xdiskusage is a GUI application that parses the output of du to show the disk space occupied by directories and files. Written using the FLTK toolkit, it is super-light and is possibly the lightest and fastest disk usage tool I have used. That makes it great for use over SSH sessions too. It uses a treemap visualization, so it is much easier to investigate disk usage than by looking at du output.

  • Installing it is easy:
$ sudo apt install xdiskusage
  • Invoke it without any arguments to be presented with a list of all the filesystems on your computer. Double-click on any of them to view a treemap visualization of its contents.

  • To look under a particular directory, pass it as the first argument to the program (see the example after this list).

  • In the treemap visualization, you can keep probing further by double-clicking on any rectangle.

  • Right-click in the GUI to get a menu of all the actions possible using the mouse. The menu also shows the keyboard bindings available for browsing with the keyboard.
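
For example, to inspect a particular directory, pass its path (here just a placeholder) as the argument; when working over SSH, remember that the GUI needs X forwarding (ssh -X):

$ xdiskusage /data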

Tried with: xdiskusage 1.48 and Ubuntu 16.04

Stub library warning on libnvidia-ml.so

Problem

I tried to run a program compiled with CUDA 9.0 inside a Docker container and got this error:

WARNING:

You should always run with libnvidia-ml.so that is installed with your
NVIDIA Display Driver. By default it's installed in /usr/lib and /usr/lib64.
libnvidia-ml.so in GDK package is a stub library that is attached only for
build purposes (e.g. machine that you build your application doesn't have
to have Display Driver installed).

Solution

Let us first try to understand the error and where it is coming from. The program compiled with CUDA 9.0 has been linked to libnvidia-ml.so. This is the shared library file of the NVIDIA Management Library (NVML). During execution, libnvidia-ml.so is throwing this error. Why?

From the error message, we get an indication that there are two libnvidia-ml.so files. One is a stub that is used during compilation and linking. I guess it just provides the necessary function symbols and signatures. But that library cannot be used to execute the compiled executable. If we do try to execute with that stub shared library file, it will throw this warning.

So, there is a second libnvidia-ml.so, the real shared library file. It turns out that the management library is provided by the NVIDIA display driver. So, every version of display driver will have its own libnvidia-ml.so file. I had NVIDIA display driver 384.66 on my machine and I found libnvidia-ml.so under /usr/lib/nvidia-384. The stub library file allows you to compile on machines where the NVIDIA display driver is not installed. In our case, for some reason, the loader is picking up the stub instead of the real library file during execution.

Using the chrpath tool, described here, I found that the compiled binary did indeed have the stub library directory in its RPATH: /usr/local/cuda/lib64/stubs. That directory did have a libnvidia-ml.so. Running the strings tool on that shared library confirmed that it was the origin of the above message:

$ strings libnvidia-ml.so | grep "You should always run with"
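
For reference, the RPATH of a binary can be listed with chrpath like this (the binary name here is a placeholder):

$ chrpath -l ./my_program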

Since the binary has an RPATH, described here, with the stubs path, the stub library was getting picked up with high preference over the actual libnvidia-ml.so, which was present in /usr/lib/nvidia-384. The solution I came up with for this problem was to add a command to the docker run invocation to delete the stubs directory:

$ rm -rf  /usr/local/cuda/lib64/stubs

That way, the stubs directory was still available outside Docker for compilation. It just appeared deleted inside the Docker container, thus forcing the loader to pick up the real libnvidia-ml.so during execution.
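
As a rough sketch of what that looked like (the image and program names here are placeholders, not the actual ones I used):

$ docker run --rm my-cuda-image bash -c "rm -rf /usr/local/cuda/lib64/stubs && ./my_program"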

How to find the init system used by Linux

init is the first process that is started by a Unix operating system. It has PID 1 and handles the creation of all other processes and daemons required by the OS. It acts as the ancestor of all processes. Many init systems have been used in Linux distributions over the years. Some of the popular ones are the classic System V init, upstart (which Ubuntu used to use) and systemd (currently the most popular across Linux distros).

So, when you are sitting at a Linux computer, how do you know which init system it is using? There is no single straightforward method to find this out.

Here are a few methods, one of which should work:

  • Check the init version; it should report the init system name too:
$ init --version
init (upstart 1.12.1)
Copyright (C) 2006-2014 Canonical Ltd., 2011 Scott James Remnant

This was on a Ubuntu 14.04 system. You can see that it uses upstart.

  • Sometimes /sbin/init is a symbolic link to the actual init program:
$ stat /sbin/init
  File: '/sbin/init' -> '/lib/systemd/systemd'

This was on a Ubuntu 16.04 system. You can see that it uses systemd.
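
  • Another quick check that works on most systems is to print the name of the process running as PID 1:

$ ps -p 1 -o comm=

On a systemd system this prints systemd; on an upstart system it prints init.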

How tmp directory is cleaned

The /tmp directory in Linux is where temporary files and directories are created by applications. If you did not do anything special during installation, it will be a directory on the root partition and hence on the same filesystem as the root filesystem. If you specifically set up /tmp as a separate mount, it might instead be of the tmpfs filesystem type.

If the root partition is running out of space, you might want to check how big the tmp directory is:

$ cd /tmp
$ du --summarize .

On my Ubuntu systems, I find that the tmp directory is cleaned up only at the time of system startup or reboot. The init scripts in /etc/rcS.d check the TMPTIME value set in /etc/default/rcS. This value indicates which files and directories in the tmp directory are up for deletion.

You can change this TMPTIME value to your preference:

  • If TMPTIME=0, then everything inside the tmp directory is deleted at reboot.
  • If TMPTIME=1, then only files and directories older than a day are deleted and so on.
  • If TMPTIME=-1, then nothing is deleted.
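
The setting itself is just a shell variable assignment in /etc/default/rcS, something along these lines:

TMPTIME=7    # delete files and directories in /tmp older than 7 days at boot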

You will notice that the tmp directory is cleaned up only when you reboot the system. If you want a periodic cleanup of the tmp directory, then use the tmpreaper tool as described here.

Tried with: Ubuntu 14.04

How to install and use tmpreaper

The files and directories in the tmp directory are cleaned only during a reboot. This can be problematic if you have applications that write a lot to the tmp directory and you do not want to reboot your Linux system. A good solution for periodic cleanup of the tmp directory is the tmpreaper tool.

  • Installing it is easy:
$ sudo apt install tmpreaper
  • On installation, tmpreaper adds a cron job that runs once a day. This can be seen in /etc/cron.daily/tmpreaper. It calls the tmpreaper program with the options you set in /etc/tmpreaper.conf.

  • The tmpreaper tool will work on any directory passed to it, including the root directory. Since deleting the root directory is catastrophic, tmpreaper shows a warning all the time. You can disable this warning by setting SHOWWARNING=false in /etc/tmpreaper.conf.

  • When tmpreaper is run once a day by cron, it uses the TMPTIME value set for the tmp directory init scripts, as described here, to decide which files and directories to delete. For example, if TMPTIME=1, then tmpreaper will delete everything in the tmp directory that is older than a day.

  • If you want to apply tmpreaper on directories other than /tmp, then add them to the TMPREAPER_DIRS value in /etc/tmpreaper.conf.

  • You can call the tmpreaper program directly at the shell to reap your own directories. For example, to reap all files and directories older than 2 days from the tmp directory:

$ tmpreaper 2d /tmp

Note that you might need to run that command as sudo to be able to delete files created by other users.

  • To be sure that you are not deleting important files, use the --test option to do a dry run first:
$ tmpreaper --test 2d /tmp
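
  • If there are files that should never be reaped, tmpreaper also provides a --protect option that takes a shell pattern (the pattern below is just an example):

$ tmpreaper --test --protect '*.pid' 2d /tmp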

Tried with: tmpreaper 1.6.13 and Ubuntu 16.04

How to view GPU topology

The NVIDIA System Management Interface tool is the easiest way to explore the GPU topology on your system. This tool is available as nvidia-smi and is installed as part of the NVIDIA display driver. GPU topology describes how the GPUs in the system are connected to each other and to the CPU and other devices in the system. Knowing the topology is important for understanding how data is copied between GPUs or between a GPU and the CPU or another device.

  • To view the available commands related to GPU topology:
$ nvidia-smi topo -h
  • To view the connection matrix between the GPUs and the CPUs they are close to (CPU affinities):
$ nvidia-smi topo -m

Some examples of GPU topologies can be seen here.