I am happy to note that this blog CodeYarns.com has passed another milestone today: 5 million views! 😊
The last one million views arrived in the 10 months since Sep 2016. I have not been writing new blog posts as much as I would like to; there have been only about 60 new posts since the last million milestone. Despite this, the monthly visit counts have been slowly but steadily increasing. I will try to keep a more regular writing schedule in what is left of this year. Let us see how long it takes to cross the 6 million mark! 😊
PS: If you have been following my Twitter account, it is now @codeyarns 😈
Virtual functions are a key C++ feature that enables runtime polymorphism. This post is my attempt at understanding how they are implemented and executed at runtime. The compiler used is GCC 5.4.0 on Ubuntu 16.04.
Here is a simple program using virtual functions that we will use as an example:
To aid us in understanding what this code is compiled into, we ask GCC to add debugging information (using the -g option) when we compile it:
$ g++ -g virtual_function_example.cpp
Almost all C++ compilers implement virtual functions using virtual tables, more commonly called vtables. A vtable is a table of function addresses, one for each virtual function in the class. One virtual table is created for each class that has virtual functions.
We can see the methods and virtual tables of each class and their addresses by examining the symbols in the binary:
$ readelf --symbols a.out | c++filt
Here we use the readelf program to extract the symbols from the binary. The symbols are in mangled form, which is difficult for humans to decipher, so we pipe the output through the c++filt demangler.
Here is the output I got on my computer:
We can check which sections of virtual memory the class methods and virtual tables will be loaded into by examining the sections of the binary:
$ readelf --sections a.out
There are 37 section headers, starting at offset 0x6b78:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
 .text PROGBITS 00000000004007a0 0007a0 0002a2 00 AX 0 0 16
 .rodata PROGBITS 0000000000400a50 000a50 00008b 00 A 0 0 8
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
We can cross-reference the addresses of the class methods and virtual tables with the starting addresses and sizes of the sections. We see that the class methods will be loaded into the .text section and the virtual tables into the .rodata section. The flags of these sections indicate that only the .text section is executable, as it should be.
Finally, let us examine how the virtual tables are used at runtime to determine which method to execute. To do this, we disassemble the instructions in the binary:
$ objdump --demangle --disassemble --source a.out
From the output of objdump, only the disassembly of the main function is shown above. In the above command, we have requested objdump to --disassemble the binary code to assembly code, to --demangle the symbol names to human-readable form and to annotate the disassembly with the original C++ --source statements.
By examining the disassembled code, the runtime mystery is revealed. Note that every object of a class that has virtual methods stores a pointer to its class virtual table. On a 64-bit computer, this means that objects of such classes need an extra 8 bytes of space. This pointer is placed at the beginning of the memory layout of the object, before all other members of the object.
When you call a virtual method in C++ code, the compiler generates these instructions:
Jump to the beginning of the object. This is a location on the heap or stack, depending on how the object was created. This is where a pointer to its class virtual table is stored.
Jump to the start of the class virtual table. This is a location in the .rodata section of the process virtual memory, as we noted earlier.
Depending on which virtual method is needed, jump to that entry in the virtual table. This entry has the address of that virtual method.
Finally, jump to the address of the virtual method and start executing its instructions. This is in the .text section of the process virtual memory.
I had a Python script that used Caffe2. It worked fine on one computer. On another computer with the same setup, it would fail at the import caffe2.python line with this error:
WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
WARNING:root:Debug message: dlopen: cannot load any more object with static TLS
CRITICAL:root:Cannot load caffe2.python. Error: dlopen: cannot load any more object with static TLS
As I mentioned above, the GPU support warning is a red herring because this Caffe2 Python build did have GPU support. The real error is the dlopen failure.
The only solution from my Googling that gave a clue was this. As suggested there, I moved the import caffe2.python line to the top, above all other imports. The error disappeared.
Caffe2 is under rapid development, so I find that the master branch may sometimes not compile. It is better to check the available release tags and check out the latest release:
$ git tag
$ git checkout -b v_0_7_0 v0.7.0
The install guide suggests running make to build. Note that this in turn creates a build directory, runs CMake from there and later runs make in a subshell. The child make run from inside a Makefile does not get the MAKEFLAGS of the parent make, so you cannot build in parallel by using the --jobs or -j settings. And believe me, building Caffe2 without a parallel make takes an extremely long time! So, I prefer running CMake and make myself:
$ mkdir build
$ cd build
$ cmake ..
$ make --jobs=$(nproc)
After Caffe2 is built, you need to install the Caffe2 headers, libraries and Python files to a different location. If you do not configure anything, CMake will try to install Caffe2 to /usr/local, which requires superuser privileges. I prefer installing Caffe2 to a local directory, say /home/joe/caffe2_deploy. To do this:
$ cmake -DCMAKE_INSTALL_PREFIX:PATH=/home/joe/caffe2_deploy ..
$ make install
My Jabra Move Wireless Bluetooth headset connects without any problem to Kubuntu 16.04. But when I try to play any video or audio in any player, or even YouTube in a browser, the play button itself does not work! If I disconnect the Bluetooth headset, everything starts working correctly.
Looking up the error logs in /var/log/syslog shows this error:
[pulseaudio] bluez5-util.c: Transport TryAcquire() failed for transport /org/bluez/hci0/dev_00_18_09_24_DD_95/fd3 (Operation Not Authorized)
This only happens in the high-fidelity A2DP mode. If I switch to the terrible-sounding lower fidelity mode, everything starts working again. But who would want to listen in low fidelity mode?
It turns out this is a well-known bug that falls at the intersection of bluez (the Bluetooth module) and PulseAudio, as reported here. The only solution seems to be to download this script and run it whenever you see this problem. That is what I did and my headset is back to working again!
cuDNN provides primitives for deep learning networks that have been accelerated for GPUs by NVIDIA.
To download cuDNN, head over to the cuDNN page here. cuDNN is not directly available for download: NVIDIA requires you to create a login. After that, it presents cuDNN downloads in different formats (.tgz or .deb).
I prefer to install from the .tgz since it gives more control. Unpacking the file creates a cuda directory which has the required include and lib directories.
I like to rename this directory and keep it at /usr/local:
$ mv cuda cudnn
$ mv cudnn /usr/local
Remember to add the path of the cuDNN libraries to your LD_LIBRARY_PATH. In my case, that is /usr/local/cudnn/lib64:
$ export LD_LIBRARY_PATH=/usr/local/cudnn/lib64:$LD_LIBRARY_PATH
For CMake in Caffe to automatically find cuDNN while building, export an environment variable named CUDNN_DIR pointing to the directory. For me, that directory is /usr/local/cudnn:
$ export CUDNN_DIR=/usr/local/cudnn
That is it! Caffe should be able to find and link with cuDNN now.
Installing NVIDIA graphics drivers on Linux has never been easy for me! I recently bought a notebook with an NVIDIA GTX 1050 GPU and installed Kubuntu 16.04. I had to wait for more than a month for NVIDIA to release drivers that supported the notebook 1050 variant.
Once the driver was released, I downloaded the .run file directly from NVIDIA’s website here. I ran the installation:
$ sudo sh NVIDIA-Linux-x86_64-381.22.run
When I rebooted, I got a black screen! Not surprising with NVIDIA and Linux! I had to uninstall it to get back to work:
$ sudo sh NVIDIA-Linux-x86_64-381.22.run --uninstall
After another month, I found that the latest NVIDIA driver supporting the notebook 1050 was available from Ubuntu. So, I tried installing that:
$ sudo apt install nvidia-381
Reboot and I got a new error message in a GUI dialog box:
The system is running in low-graphics mode
Your screen, graphics card, and input device settings could not be detected correctly.
You will need to configure these yourself.
I had to uninstall it to get back to work:
$ sudo apt purge nvidia-381
It finally dawned on me that what I really wanted was to be able to run CUDA programs on the GPU. I did not really care about X or games being able to use the GPU. So, I went back to the .run driver and installed it without OpenGL:
$ sudo sh NVIDIA-Linux-x86_64-381.22.run --no-opengl-files
After rebooting, I found that I still had a desktop. That was a big relief! I proceeded to download and install CUDA:
$ sudo sh cuda_8.0.61_375.26_linux.run
I took care to not install the graphics driver that comes along with the CUDA installer. That is it! I was able to compile and run the CUDA samples. Running ./deviceQuery from the samples showed the GTX 1050 and that is all I wanted! 🙂
Many Ubuntu LTS users have a dilemma. They might want to stick with their LTS installation for the full 5 years of support because of the stability of its applications and libraries. But they might also wish for newer versions of the Linux kernel and X, so that they can take advantage of newer hardware or fixes for their existing hardware.
Ubuntu provides the Hardware Enablement Stack (HWE), which pulls in more recent versions of the Linux kernel and X and provides them to Ubuntu LTS users. The Linux kernel and X that came with the original LTS release are called the General Availability (GA) stack. The latest available HWE stack is also installed by default if you install an updated point release of an LTS release. For example, installing Ubuntu 16.04 gives you the GA stack, while installing a more recent point release such as Ubuntu 16.04.3 gives you a HWE stack.
I faced a problem recently, where my Intel Wireless-AC 3165 hardware was not detected by the Linux kernel in Ubuntu 16.04, as described here. To install the HWE stack on my Ubuntu 16.04, I used this command: