Code Yarns ‍👨‍💻

Asynchronous threads in C++

📅 2015-Apr-24 ⬩ ✍️ Ashwin Nanjappa ⬩ 🏷️ async, cpp, thread ⬩ 📚 Archive

Asynchronous threading in C++ is now extremely simple: it is straightforward to spawn threads to perform some work and to collect their results later. This is an important step forward for C++, given the increasing importance of multithreading and the hairy platform-specific threading mess the language had in the past.

For example, to spawn off 10 threads to do some work and gather back their results:

#include <future>
#include <iostream>
#include <vector>

float DoWork(int idx)
{
    // Do some hard computation using idx
    // and internal read-only data structures,
    // then return the float result
    return static_cast<float>(idx); // Placeholder result
}

void DoAsync()
{
    // Futures to control async threads and collect their results
    std::vector<std::future<float>> fut_vec;

    // Create 10 async threads
    for (int i = 0; i < 10; ++i)
        fut_vec.push_back(std::async(DoWork, i));

    // Collect results from the 10 async threads
    float result = 0;
    for (int i = 0; i < 10; ++i)
        result += fut_vec[i].get();
    std::cout << "Result: " << result << std::endl;
}

// Note 1: Pass the std::launch::async launch policy to guarantee
// that each task gets its own thread. With the default policy the
// implementation may defer execution, so the tasks can end up
// running sequentially on the calling thread when get() is called
// (GCC's libstdc++ on Linux has behaved this way):
// fut_vec.push_back(
//     std::async(std::launch::async, DoWork, i));
//
// Note 2: To call a method of a class, pass a pointer to the
// object as the first argument after the member function pointer:
//
// struct DoWorkClass
// {
//     float DoWork(int idx) { return static_cast<float>(idx); }
//
//     void DoAsync()
//     {
//         // ...
//         fut_vec.push_back(
//             std::async(std::launch::async, &DoWorkClass::DoWork, this, i));
//     }
// };

How many threads should you spawn to maximize usage of the CPU and memory resources on your computer? In other words, how big should your thread pool be? The only way to answer this is to measure the performance for different pool sizes on your particular application and decide. The right answer depends on how compute-bound or memory-bound the work inside each thread is.

For example, a particular function I was trying to parallelize both did a lot of computation and accessed many locations in memory. I tried different thread pool sizes and compared the average computation time per thread. Here are the results:

![Size of thread pool versus thread performance on GCC 4.9.2, Linux 3.13.0-45 and Intel i7-4790](https://codeyarns.files.wordpress.com/2015/04/20150417_cpp_thread_performance.png)

Size of thread pool versus thread performance on GCC 4.9.2, Linux 3.13.0-45 and Intel i7-4790

So, in my case a pool of 32 threads gives close to the optimum performance. Something seems to happen at 64 threads, beyond which performance drops. My guess is that this is the point where the Linux kernel's overhead of managing that many threads starts to hurt per-thread computation performance.

Tried with: GCC 4.9.2, Linux 3.13.0-45, Ubuntu 14.04 and Intel i7-4790