Asynchronous threads in C++

Asynchronous threading is now extremely simple in C++. It is straightforward to spawn threads to perform some work and to collect their results back later. This is an important step forward for C++, given the increasing importance of multithreading and the hairy platform-specific threading mess C++ had in the past.

Threading is easy:

  • Include the future header.

  • Create an asynchronous thread by calling std::async, passing it the function to run and its arguments.

  • On Linux, the threads may end up executing sequentially by default! Yeah, I know! This is because the default launch policy allows the implementation to defer execution. To force them to execute in parallel, pass std::launch::async as the first parameter.

  • If passing a class method, pass a pointer to the method and a pointer to the class object as the first two arguments (see the sketch after this list).

  • The std::future returned by std::async makes it easy to control the thread. Its template parameter is the result type returned by your function.

  • Call get on the future to wait for the thread to finish and get back its result.

  • Whether the threads run immediately or later, the order in which they run and other such concerns cannot be relied on.
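
For the class-method case, a minimal sketch might look like this (the Worker class and its compute method are made up purely for illustration):

    #include <future>
    #include <iostream>

    // Hypothetical class used only for illustration
    class Worker
    {
    public:
        int compute(int x) const
        {
            return x * x;
        }
    };

    int main()
    {
        Worker worker;

        // First the pointer to the method, then the pointer to the object,
        // then the arguments of the method itself
        std::future<int> f = std::async(std::launch::async, &Worker::compute, &worker, 7);

        std::cout << f.get() << std::endl;
        return 0;
    }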

For example, to spawn off 10 threads to do some work and gather back their results:
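A minimal sketch of how that might look; the doWork function and its workload here are placeholders:

    #include <future>
    #include <iostream>
    #include <vector>

    // Placeholder work function used only for illustration
    int doWork(int id)
    {
        int sum = 0;
        for (int i = 0; i < 1000000; ++i)
            sum += i % (id + 1);
        return sum;
    }

    int main()
    {
        const int threadCount = 10;
        std::vector<std::future<int>> futures;

        // Spawn off the threads, forcing parallel execution with std::launch::async
        for (int i = 0; i < threadCount; ++i)
            futures.push_back(std::async(std::launch::async, doWork, i));

        // Gather back the results; get blocks until that thread is done
        for (int i = 0; i < threadCount; ++i)
            std::cout << "Thread " << i << " returned " << futures[i].get() << std::endl;

        return 0;
    }

With GCC you may need to compile with -std=c++11 -pthread for std::async to work.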

How many threads should you spawn off to maximize usage of the CPU and memory resources on your computer? In other words, how big should your thread pool be? The only way to answer this is by measuring the performance for different pool sizes on your particular application. The right answer depends on how compute-bound or memory-bound the computation you are doing inside each thread is.
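
One rough way to do such a measurement is to time a batch of threads for each candidate pool size. The sketch below assumes a placeholder doWork function and arbitrarily chosen candidate sizes:

    #include <chrono>
    #include <future>
    #include <initializer_list>
    #include <iostream>
    #include <vector>

    // Placeholder work function, same idea as in the earlier sketch
    int doWork(int id)
    {
        int sum = 0;
        for (int i = 0; i < 10000000; ++i)
            sum += i % (id + 1);
        return sum;
    }

    int main()
    {
        // Arbitrary candidate pool sizes to compare
        for (int poolSize : {1, 2, 4, 8, 16, 32, 64, 128})
        {
            const auto start = std::chrono::steady_clock::now();

            std::vector<std::future<int>> futures;
            for (int i = 0; i < poolSize; ++i)
                futures.push_back(std::async(std::launch::async, doWork, i));
            for (auto& f : futures)
                f.get();

            const auto elapsedMs = std::chrono::duration_cast<std::chrono::milliseconds>(
                std::chrono::steady_clock::now() - start).count();

            // Average wall-clock time per thread for this pool size
            std::cout << poolSize << " threads: "
                      << elapsedMs / static_cast<double>(poolSize)
                      << " ms per thread" << std::endl;
        }

        return 0;
    }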

For example, a particular function that I was trying to parallelize was doing a lot of computation and also accessing quite a few memory locations. I tried different thread pool sizes and compared how long the computation took on average per thread. Here are the results:

[Graph: Size of thread pool versus thread performance on GCC 4.9.2, Linux 3.13.0-45 and Intel i7-4790]

So, in my case it looks like a pool of 32 threads gives close to the optimum performance. Something seems to happen at 64 threads, beyond which the performance drops. My guess is that this is the point where the overhead of the Linux kernel managing that many threads starts to hurt the per-thread computation performance.

Tried with: GCC 4.9.2, Linux 3.13.0-45, Ubuntu 14.04 and Intel i7-4790
