Command-line options for Google Test program

Google Test can be used to write unit tests for C++ programs. Some of the useful command-line options and flags available to such a program are:

  • --gtest_list_tests: Lists the names of all the tests available.

  • --gtest_filter=pattern: Run only the tests whose names match the regex pattern. For example: `–gtest_filter=”ConvolutionLayerTest.*”

  • --gtest_shuffle: Run unit tests in random order.

Advertisements

Invalid MEX-file error in Matlab

Problem

Running a Matlab script threw this error:

Invalid MEX-file '/home/joe/do_work.mexa64':/usr/local/matlabR2014b/bin/glnxa64/../../sys/os/glnxa64/libstdc++.so.6: version 'GLIBCXX_3.4.19' not found (required by /home/joe/do_work.mexa64).

This error is surprising since do_work.mexa64 was built using Matlab using the same system and same environment settings.

Solution

There is a difference in the glib version between the compiled and running environment. When the do_work.mexa64 was built, the libstdc++.so.6 from /usr/lib/x86_64-linux-gnu/ was used since it was first in LD_LIBRARY_PATH. However, when the script is run by Matlab, it picks up its own libstdc++.so.6 file as explained here!

A simple solution to this problem is to remove or rename the Matlab version of libstdc++.so.6, so that it is not picked up.

Tried with: Ubuntu 14.04

How to implement trie in C++

Trie, also known as prefix tree, is an elegant and efficient data structure to use as a symbol table for keys that are strings. In a trie, the number of accesses required to search or insert a string is linear in the string length. In fact, it is just one more than the number of characters in the string.

Sedgewick’s Algorithms textbook has a particularly elegant trie implementation in Java. I created a similar C++ version just for fun here:

Uniform initialization in C++

Initializing in C++98 used to be a pain:

  • There were multiple ways to initialize: using = or () or {}.

  • Some types of initialization of user types was either difficult or not possible to support.

  • You could not initialize variable of a builtin type to its default value (like in Java). C or C++ only default-initialize in global scope, not in non-global scope.

  • No way to initialize a container (like std::vector or your own) with a series of non-default values.

C++11 introduced an uniform initialization syntax using {} that makes writing and reading C++ a joy! In my opinion, this is a feature that will make everyone’s code easier to write and read. Though it does have a few rough edges (since C++ has to support all previous syntaxes), it almost always works in a way that is most intuitive to the viewer.

It is called uniform initialization because it is applicable in every place where you initialize anything. That includes: variables, constants, objects, in the initializer list of a constructor and even in the member declaration of a struct or class. Everywhere!

I am sharing a code sample below that enumerates the various ways to apply this syntax:

Tried with: GCC 5.1 and Ubuntu 14.04

How to print structure of C++ unordered container

The unordered containers in C++ are implemented as hash tables with separate chaining. The standard does not specify this implementation explicitly. But, by examining the requirements of the container, this is and will be the implementation in any C++ standard library.

Given this implementation, it would be an instructive exercise to be able to print out the actual structure of the hash table, with its buckets and linked list of keys. Thankfully, this is possible since the standard provides a bucket_count method and a begin method that takes a bucket index as input.

Here is simple code that prints buckets of unordered_set and its output on my computer:

Tried with: GCC 5.1 and Ubuntu 14.04

How to find included header files using showIncludes

One of the irritating problems in practical C or C++ programming is finding out which are all the header files included when compiling a source file. This is true both when using the standard library or any other large library of header files.

Starting from a given source file, the header files it includes is a n-ary tree that can be many levels deep. When there is a compilation error due to something in one of these header files, the compiler only points to the file and line number. The programmer needs to figure out the path of header files from the source file being compiled to the final erroneous header file.

One solution is to index all the files automatically (Visual Studio or Eclipse) or manually (ctags) or use a documentation tool (Doxygen). You will still need to sift through their output and they might ignore compile-time macros that are defined or undefined during compilation.

A better solution is the /showIncludes compiler option in Visual C++ which greatly aids this investigation. Turn it on and it will print all the header files that are being processed while compiling a source file. It prints them out with indenting, so the nesting of header files is easy to see.

This option can also be turned on from Project Properties. Go to C++ -> Advanced.

Tried with: Visual Studio 2015 and Windows 7 x64

Pre-defined Compiler Macros for OS

C or C++ code that needs to be ported across multiple operating systems will inevitably end up using pre-processor directives. These directives will look for the pre-defined macros defined by the compilers or OSes on which the code needs to be compiled or run.

  • A comprehensive and updated list of the compiler macros for operating systems is here.

  • For Windows: _WIN32

  • For Linux: __linux__
  • For UNIX: __unix__

Prime number buckets in hash table implementations

Prime numbers are very important in the implementation of hash tables. They might be used in computing a hash for a given key using a hash function. Also, they seem to be commonly used as the size of the hash table, i.e., the number of buckets in the table.

Consider, a hash table with separate chaining. When there is a collision, i.e., multiple keys map to the same bucket, they are maintained in a list at that bucket. The load factor of the hash table is the average number of keys per bucket in the table. When the load factor increases beyond a certain threshold, a new hash table with larger number of buckets is chosen, the existing keys are rehashed and inserted anew into the new hash table.

A key part of a hash table implementation is: what number of buckets to choose when increasing or decreasing the size of a hash table? In containers like vectors, this is easy: in most implementations it grows by 2x the size. But, for hash tables, I found that implementations seem to prefer prime numbers. So, how do hash table implementations pick this prime number?

Hash table in C++ STL of GCC

The containers based on hash tables in C++ STL are unordered_set, unordered_multiset, unordered_map and unordered_multimap. All of them are based on the same underlying hash table implementation. Computing an optimal prime number for a given load factor and number of keys is not easy.

Not surprisingly, the GCC 5.1 implementation of STL has a pre-computed lookup table of prime numbers. I found it in /usr/include/c++/5/ext/pb_ds/detail/resize_policy/hash_prime_size_policy_imp.hpp. Here is the array of prime numbers it uses:

static const std::size_t g_a_sizes[num_distinct_sizes_64_bit] =
{
  /* 0     */              5ul,
  /* 1     */              11ul, 
  /* 2     */              23ul, 
  /* 3     */              47ul, 
  /* 4     */              97ul, 
  /* 5     */              199ul, 
  /* 6     */              409ul, 
  /* 7     */              823ul, 
  /* 8     */              1741ul, 
  /* 9     */              3469ul, 
  /* 10    */              6949ul, 
  /* 11    */              14033ul, 
  /* 12    */              28411ul, 
  /* 13    */              57557ul, 
  /* 14    */              116731ul, 
  /* 15    */              236897ul,
  /* 16    */              480881ul, 
  /* 17    */              976369ul,
  /* 18    */              1982627ul, 
  /* 19    */              4026031ul,
  /* 20    */              8175383ul, 
  /* 21    */              16601593ul, 
  /* 22    */              33712729ul,
  /* 23    */              68460391ul, 
  /* 24    */              139022417ul, 
  /* 25    */              282312799ul, 
  /* 26    */              573292817ul, 
  /* 27    */              1164186217ul,
  /* 28    */              2364114217ul, 
  /* 29    */              4294967291ul,
  /* 30    */ (std::size_t)8589934583ull,
  /* 31    */ (std::size_t)17179869143ull,
  /* 32    */ (std::size_t)34359738337ull,
  /* 33    */ (std::size_t)68719476731ull,
  /* 34    */ (std::size_t)137438953447ull,
  /* 35    */ (std::size_t)274877906899ull,
  /* 36    */ (std::size_t)549755813881ull,
  /* 37    */ (std::size_t)1099511627689ull,
  /* 38    */ (std::size_t)2199023255531ull,
  /* 39    */ (std::size_t)4398046511093ull,
  /* 40    */ (std::size_t)8796093022151ull,
  /* 41    */ (std::size_t)17592186044399ull,
  /* 42    */ (std::size_t)35184372088777ull,
  /* 43    */ (std::size_t)70368744177643ull,
  /* 44    */ (std::size_t)140737488355213ull,
  /* 45    */ (std::size_t)281474976710597ull,
  /* 46    */ (std::size_t)562949953421231ull, 
  /* 47    */ (std::size_t)1125899906842597ull,
  /* 48    */ (std::size_t)2251799813685119ull, 
  /* 49    */ (std::size_t)4503599627370449ull,
  /* 50    */ (std::size_t)9007199254740881ull, 
  /* 51    */ (std::size_t)18014398509481951ull,
  /* 52    */ (std::size_t)36028797018963913ull, 
  /* 53    */ (std::size_t)72057594037927931ull,
  /* 54    */ (std::size_t)144115188075855859ull,
  /* 55    */ (std::size_t)288230376151711717ull,
  /* 56    */ (std::size_t)576460752303423433ull,
  /* 57    */ (std::size_t)1152921504606846883ull,
  /* 58    */ (std::size_t)2305843009213693951ull,
  /* 59    */ (std::size_t)4611686018427387847ull,
  /* 60    */ (std::size_t)9223372036854775783ull,
  /* 61    */ (std::size_t)18446744073709551557ull,
};

You can create a simple C++ program that inserts keys into an unordered_set and then check the number of buckets using the bucket_count method. You will find that it will be one of the above listed prime numbers.

Hash table in .Net (C#)

Now that the .Net source code is available, I also checked out its System.Collections.HashTable implementation. It too seems to be using a lookup table of prime numbers for the table size. Here is the list from its source code:

// Table of prime numbers to use as hash table sizes. 
// A typical resize algorithm would pick the smallest prime number in this array
// that is larger than twice the previous capacity. 
// Suppose our Hashtable currently has capacity x and enough elements are added 
// such that a resize needs to occur. Resizing first computes 2x then finds the 
// first prime in the table greater than 2x, i.e. if primes are ordered 
// p_1, p_2, ..., p_i, ..., it finds p_n such that p_n-1 < 2x < p_n. 
// Doubling is important for preserving the asymptotic complexity of the 
// hashtable operations such as add.  Having a prime guarantees that double 
// hashing does not lead to infinite loops.  IE, your hash function will be 
// h1(key) + i*h2(key), 0 <= i < size.  h2 and the size must be relatively prime.
public static readonly int[] primes = {
    3, 7, 11, 17, 23, 29, 37, 47, 59, 71, 89, 107, 131, 163, 197, 239, 293, 353, 431, 521, 631, 761, 919,
    1103, 1327, 1597, 1931, 2333, 2801, 3371, 4049, 4861, 5839, 7013, 8419, 10103, 12143, 14591,
    17519, 21023, 25229, 30293, 36353, 43627, 52361, 62851, 75431, 90523, 108631, 130363, 156437,
    187751, 225307, 270371, 324449, 389357, 467237, 560689, 672827, 807403, 968897, 1162687, 1395263,
    1674319, 2009191, 2411033, 2893249, 3471899, 4166287, 4999559, 5999471, 7199369};

The prime numbers in .Net are different from that in GCC STL. I’m guessing they are tuned for the .Net load factors, languages and virtual machine.

A conclusion we can draw from these observations is that though hash tables easily beat balanced binary search tree (BST) in lookup performance, they are not any easier to implement. Especially a hash table that is built for general purpose applications and data sizes.

How to warn on shadow declaration

Shadow declarations are a major source of misunderstanding and bugs in C or C++ code. C and C++ allow shadow declarations, but this is one of those features that is better not used in production code that needs to be understood and maintained by multiple people.

  • Use the -Wshadow compiler option to be warned about shadow declarations.

  • Note that this option is not part of -Wall or -Wextra, so you will need to explicitly add this to your build.

  • You might find that header files you included from external libraries might throw this warning. (I have faced this with the OpenNI library.) In such a case, ignore this warning only for that particular header file inclusion by using diagnostic pragma, as described here .

Tried with: GCC 5.1 and Ubuntu 14.04

How to change compiler options used by indexer in Eclipse CDT

Eclipse CDT understands the C++ code in a window by running it through an indexer. This indexer is nothing but an invocation of the GCC C++ compiler with certain compilation options. Sometimes, you might want to change the compiler options used by this indexer.

For example, I recently found that C++11 containers and classes (like future) were not resolved by the indexer and were underlined with red squiggles. This is because the compiler options used by the indexer does not have -std=c++11.

To change the compiler options of the indexer:

  1. Open Preferences and go to C/C++ -> Build -> Settings.
  2. Click the Discovery tab and choose CDT GCC Built-in Compiler Settings.
  3. Modify the command string shown below it as you wish. For example, here I added -std=c++11.
  4. Eclipse CDT will automatically re-index your C++ files after this is saved. However, I found that this did not remove the unresolved items.
  5. I manually re-indexed by right-clicking the project and choosing Index -> Rebuild. This worked!

Tried with: Eclipse 4.5 (Mars) and Ubuntu 14.04