5 million views

I am happy to note that this blog, CodeYarns.com, has passed another milestone today: 5 million views! 😊

The last one million views arrived in the 10 months since Sep 2016. I have not been writing new blog posts as much as I would like to: there have been only about 60 new posts since the last million milestone. Despite this, the monthly visit counts have been slowly but steadily increasing. I will try to keep a more regular writing schedule in what is left of this year. Let us see how long it takes to cross the 6 million mark! 😊

PS: If you have been following my Twitter account, it is now @codeyarns 😈


A dissection of C++ virtual functions

Virtual functions are a key C++ feature that enables runtime polymorphism. This post is my attempt at understanding how they are implemented and executed at runtime. The compiler used is GCC 5.4.0 on Ubuntu 16.04.

Here is a simple program that uses virtual functions that we will use as an example:

To aid us in understanding what this code is compiled into, we request GCC to add debugging information (using option -g) when we compile it:

$ g++ -g virtual_function_example.cpp
$ ./a.out
In B

Almost all C++ compilers implement virtual functions by using virtual tables, more commonly called vtables. A vtable is a table of function addresses, one for each virtual function in the class. One virtual table is created for each class that has virtual functions.

We can see the existence of the methods and virtual tables of each class and their addresses by examining the binary:

$ readelf --symbols a.out | c++filt | grep -E "vtable|A::|B::"

    86: 0000000000400936    11 FUNC    WEAK   DEFAULT   14 A::do_something()
    81: 0000000000400942    30 FUNC    WEAK   DEFAULT   14 A::do_something2()
    87: 0000000000400960    11 FUNC    WEAK   DEFAULT   14 B::do_something()
    84: 000000000040096c    30 FUNC    WEAK   DEFAULT   14 B::do_something2()
    60: 000000000040098a    23 FUNC    WEAK   DEFAULT   14 A::A()
    69: 00000000004009a2    39 FUNC    WEAK   DEFAULT   14 B::B()
    92: 0000000000400a68    32 OBJECT  WEAK   DEFAULT   16 vtable for B
    63: 0000000000400a88    32 OBJECT  WEAK   DEFAULT   16 vtable for A

Here we use the readelf program to extract the symbols from the binary. The symbols are in a mangled form that is difficult for humans to decipher, so we pipe the output through a demangler (c++filt).


We can check which sections of virtual memory the class methods and virtual tables will be loaded into by examining the sections of the binary:

$ readelf --sections a.out
There are 37 section headers, starting at offset 0x6b78:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [14] .text             PROGBITS        00000000004007a0 0007a0 0002a2 00  AX  0   0 16
  [16] .rodata           PROGBITS        0000000000400a50 000a50 00008b 00   A  0   0  8

Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

We can cross-check the addresses of the class methods and virtual tables against the starting addresses and sizes of the sections. For example, the vtable for B at 0x400a68 falls within .rodata, which spans 0x400a50 to 0x400adb, while A::do_something() at 0x400936 falls within .text, which spans 0x4007a0 to 0x400a42. So the class methods will be loaded into the .text section and the virtual tables into the .rodata section. The flags of these sections indicate that only the .text section is executable, as it should be.


Finally, let us examine how the virtual tables are used at runtime to determine which method to execute. To do this, we disassemble the instructions in the binary:

$ objdump --disassemble --demangle --source a.out

int main()
  400896:       55                      push   %rbp
  400897:       48 89 e5                mov    %rsp,%rbp
  40089a:       53                      push   %rbx
  40089b:       48 83 ec 18             sub    $0x18,%rsp
    A* a = new B();
  40089f:       bf 08 00 00 00          mov    $0x8,%edi
  4008a4:       e8 d7 fe ff ff          callq  400780 <operator new(unsigned long)@plt>
  4008a9:       48 89 c3                mov    %rax,%rbx
  4008ac:       48 c7 03 00 00 00 00    movq   $0x0,(%rbx)
  4008b3:       48 89 df                mov    %rbx,%rdi
  4008b6:       e8 e7 00 00 00          callq  4009a2 <B::B()>
  4008bb:       48 89 5d e8             mov    %rbx,-0x18(%rbp)
  4008bf:       48 8b 45 e8             mov    -0x18(%rbp),%rax
  4008c3:       48 8b 00                mov    (%rax),%rax
  4008c6:       48 83 c0 08             add    $0x8,%rax
  4008ca:       48 8b 00                mov    (%rax),%rax
  4008cd:       48 8b 55 e8             mov    -0x18(%rbp),%rdx
  4008d1:       48 89 d7                mov    %rdx,%rdi
  4008d4:       ff d0                   callq  *%rax

  4008d6:       b8 00 00 00 00          mov    $0x0,%eax
    return 0;
  4008db:       48 83 c4 18             add    $0x18,%rsp
  4008df:       5b                      pop    %rbx
  4008e0:       5d                      pop    %rbp
  4008e1:       c3                      retq

Only the disassembly of the main function from the objdump output is shown above. In the command, we request objdump to --disassemble the binary code to assembly, --demangle the symbol names to human-readable form, and annotate the disassembly with the original C++ --source statements.

Examining the disassembled code reveals the runtime mystery. Every object of a class that has virtual methods stores a pointer to its class's virtual table. On a 64-bit computer, this means that objects of such classes need 8 extra bytes. This pointer is placed at the beginning of the memory layout of the object, before all other members.

When you call a virtual method in C++ code, the compiler generates instructions that do the following:

  • Read the pointer stored at the beginning of the object. The object lives on the heap or stack, depending on how it was created, and this pointer holds the address of its class virtual table.
  • Follow that pointer to the start of the class virtual table. This is a location in the .rodata section of the process virtual memory, as we noted earlier.
  • Depending on which virtual method is needed, index to that entry in the virtual table. This entry holds the address of that virtual method.
  • Finally, jump to the address of the virtual method and start executing its instructions. This code lies in the .text section of the process virtual memory.


dlopen: cannot load any more object with static TLS


I had a Python script that used Caffe2. It worked fine on one computer. On another computer with the same setup, it would fail at the import caffe2.python line with this error:

WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
WARNING:root:Debug message: dlopen: cannot load any more object with static TLS
CRITICAL:root:Cannot load caffe2.python. Error: dlopen: cannot load any more object with static TLS

The GPU support warning above is a red herring, because this Caffe2 Python was built with GPU support. The real error is the dlopen failure.


The only solution from Googling that gave a clue was this. As suggested there, I placed the import caffe2.python line at the top, above all other imports. The error disappeared.

Tried with: Ubuntu 14.04

How to build and install Bluez

Bluez is the default Bluetooth protocol stack on Linux. It is usually already installed on your Linux distribution. If not, building and installing it from source is not too difficult:

  • Download the latest stable source release of Bluez from here. Unzip the compressed file you downloaded.

  • Install the headers and libraries required for Bluez compilation:

$ sudo apt install libdbus-1-dev libudev-dev libical-dev libreadline-dev

If you do not install libdbus-1-dev, you will later get this strange error:

configure: error: D-Bus >= 1.6 is required
  • Next, generate the Makefile by running configure:
$ ./configure

This gave an error about systemd, which is not present on the relatively old Ubuntu I was on:

checking systemd system unit dir... configure: error: systemd system unit directory is required

So, I ran configure with systemd disabled:

$ ./configure --disable-systemd
  • After that, build and install as usual:
$ make
$ sudo make install

Tried with: Bluez 5.45 and Ubuntu 14.04

How to build Caffe2 from source

I followed these steps to build and use Caffe2 from source:

  • If you have a GPU, install CUDA and cuDNN as described here.

  • Install the package prerequisites:

$ sudo apt install build-essential cmake git libgoogle-glog-dev libprotobuf-dev protobuf-compiler python-dev python-pip libgflags2 libgtest-dev libiomp-dev libleveldb-dev liblmdb-dev libopencv-dev libopenmpi-dev libsnappy-dev openmpi-bin openmpi-doc python-pydot
  • Install the required Python packages using PIP:
$ sudo pip install numpy protobuf flask graphviz hypothesis jupyter matplotlib pydot python-nvd3 pyyaml requests scikit-image scipy setuptools tornado
  • Clone Caffe2 source code:
$ git clone git@github.com:caffe2/caffe2.git
  • Caffe2 is under rapid development, so I find that the master branch may sometimes not compile. It is better to check the available release tags and check out the latest release:
$ git tag
$ git checkout -b v_0_7_0 v0.7.0
  • The install guide suggests running make to build. Note that this in turn creates a build directory, runs CMake from there and later runs make in a subshell. The child make running from inside a Makefile does not get the MAKEFLAGS of the parent make, so you cannot build in parallel by using --jobs or -j settings. And believe me, building Caffe2 without parallel make takes extremely long! So, I prefer running CMake and make myself:
$ mkdir build
$ cd build
$ cmake ..
$ make -j $(nproc)
  • After Caffe2 is built, you need to install the Caffe2 headers, libraries and Python files to a different location. If you do not configure anything, CMake will try to install Caffe2 to /usr/local, which requires superuser privileges. I prefer installing Caffe2 to a local directory, say /home/joe/caffe2_deploy. To do this:
$ cmake -DCMAKE_INSTALL_PREFIX:PATH=/home/joe/caffe2_deploy ..
$ make install
  • Add the library path to LD_LIBRARY_PATH:
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/joe/caffe2_deploy/lib
  • Add the install directory to PYTHONPATH to be able to use Caffe2 from Python:
$ export PYTHONPATH=$PYTHONPATH:/home/joe/caffe2_deploy
  • Use Caffe2 and enjoy:
$ python -m caffe2.python.operator_test.relu_op_test

Tried with: Ubuntu 14.04

Strange Bluetooth headset problem


My Jabra Move Wireless Bluetooth headset connects without any problem with Kubuntu 16.04. When I try to play any video or audio in any player or even Youtube in a browser, their play button itself does not work! If I disconnect the Bluetooth headset, everything starts working correctly.

Looking up the error logs in /var/log/syslog shows this error:

[pulseaudio] bluez5-util.c: Transport TryAcquire() failed for transport /org/bluez/hci0/dev_00_18_09_24_DD_95/fd3 (Operation Not Authorized)

This only happens in the high-fidelity A2DP mode. If I switch to the terrible sounding lower fidelity mode, everything starts working again. But who would want to listen in low fidelity mode?


Turns out this is a well-known bug that falls at the intersection of bluez (the Bluetooth module) and PulseAudio, as reported here. The only solution seems to be to download this script and run it whenever you see this problem. That is what I did and my headset is back to working again!

How to install cuDNN

cuDNN provides primitives for deep learning networks that have been accelerated for GPUs by NVIDIA.

  • To download cuDNN, head over to the cuDNN page here. cuDNN is not directly available for download: NVIDIA requires you to create a login. After that, it presents cuDNN downloads in different formats (.tgz or .deb).

  • I prefer to install from the .tgz since it gives more control. Unzip the file; it will create a cuda directory that has the required include and lib directories.

  • I like to rename this directory and keep it at /usr/local:

$ mv cuda cudnn
$ mv cudnn /usr/local
  • Remember to add the path to the cuDNN libraries to your LD_LIBRARY_PATH. In my case, that is /usr/local/cudnn/lib64

  • For CMake in Caffe to automatically find cuDNN while building, export an environment variable named CUDNN_DIR pointing to the directory. For me, that directory would be /usr/local/cudnn
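Concretely, with the /usr/local/cudnn location chosen above, the two environment settings would amount to the lines below (add them to your shell startup file to make them permanent):

```shell
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cudnn/lib64
export CUDNN_DIR=/usr/local/cudnn
```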

That is it! Caffe should be able to find and link with cuDNN now.

Tried with: cuDNN 6.0 and Ubuntu 16.04

How to install CUDA for NVIDIA GTX 1050 (Notebook)

Installing NVIDIA graphics drivers on Linux has never been easy for me! I bought a notebook with NVIDIA GTX 1050 GPU recently and installed Kubuntu 16.04. I had to wait for more than a month for NVIDIA to release drivers that supported the notebook 1050 variant.

  • Once the driver was released, I downloaded the .run file directly from NVIDIA’s website here. I ran the installation:
$ sudo sh NVIDIA-Linux-x86_64-381.22.run

When I rebooted, I got a black screen! Not surprising with NVIDIA and Linux! I had to uninstall it to get back to work:

$ sudo sh NVIDIA-Linux-x86_64-381.22.run --uninstall
  • After another month, I found that the latest NVIDIA driver supporting the notebook 1050 was available from Ubuntu. So, I tried installing that:
$ sudo apt install nvidia-381

Reboot and I got a new error message in a GUI dialog box:

The system is running in low-graphics mode
Your screen, graphics card, and input device settings could not be detected correctly.
You will need to configure these yourself.

I had to uninstall it to get back to work:

$ sudo apt purge nvidia-381
  • It finally dawned on me that what I really wanted was to be able to run CUDA programs on the GPU. I did not really care about X or games being able to use the GPU. So, I went back to the .run driver and installed it without OpenGL:
$ sudo sh NVIDIA-Linux-x86_64-381.22.run --no-opengl-files

After rebooting, I found that I still had a desktop. That was a big relief! I proceeded to download and install CUDA:

$ sudo sh cuda_8.0.61_375.26_linux.run

I took care to not install the graphics driver that comes along with the CUDA installer. That is it! I was able to compile and run the CUDA samples. Running ./deviceQuery from the samples showed the GTX 1050 and that is all I wanted! 🙂

The HWE stack for Ubuntu LTS

Many Ubuntu LTS users have a problem. They might want to stick with their LTS installation for the full 5 years of support due to the stability of its applications, libraries and software. But they might also wish for newer versions of the Linux kernel and X, so that they can take advantage of newer hardware or fixes for their existing hardware.

Ubuntu provides the Hardware Enablement Stack (HWE), which pulls in more recent versions of the Linux kernel and X and provides them to Ubuntu LTS users. The Linux kernel and X that came with the original LTS release are called the General Availability (GA) stack. The latest available HWE stack is also installed by default if you install an updated point release of an LTS release. For example, installing Ubuntu 16.04 gives you the GA stack, while installing a more recent point release like Ubuntu 16.04.3 gives you a HWE stack.

I faced a problem recently, where my Intel Wireless-AC 3165 hardware was not detected by the Linux kernel in Ubuntu 16.04, as described here. To install the HWE stack on my Ubuntu 16.04, I used this command:

$ sudo apt-get install --install-recommends linux-generic-hwe-16.04 xserver-xorg-hwe-16.04

After switching to the Linux 4.8.x kernel this installed, my wireless hardware worked fine. I was saved by the HWE stack!
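To confirm which stack you ended up on after the reboot, you can query the running kernel version; on my system this reported a 4.8-series kernel instead of the 4.4-series GA kernel of Ubuntu 16.04:

```shell
uname -r
```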

For more details on HWE, see the LTS Enablement Stack page.

How to build OGRE in Ubuntu

Ubuntu ships with a particular version of OGRE that can be installed easily. For example, on Ubuntu 16.04:

$ sudo apt install libogre-1.9-dev

However, if you need to use an older or newer version of OGRE, then you might need to build it yourself. This is not as hard on Linux as it is on Windows.

  • Install the essential packages needed for building OGRE:
$ sudo apt install build-essential automake libtool libfreetype6-dev libfreeimage-dev libzzip-dev libxrandr-dev libxaw7-dev freeglut3-dev libgl1-mesa-dev libglu1-mesa-dev libpoco-dev libtbb-dev doxygen libcppunit-dev
  • Download the OGRE version you want as a ZIP file from its Mercurial repository on Bitbucket here.

  • Unzip the file and build OGRE:

$ mkdir build
$ cd build
$ cmake ..
$ make
  • To install OGRE to /usr/local/lib/OGRE:
$ sudo make install

Tried with: OGRE 1.8 and Ubuntu 16.04