Asynchronous threading is now simple in C++: since C++11, the standard library makes it straightforward to spawn threads to perform some work and collect their results later. This is an important step forward for C++, given the increasing importance of multithreading and the hairy, platform-specific threading mess C++ had in the past.
Threading is easy:
- Include the `<future>` header.
- Create an asynchronous thread by calling `std::async`, passing the function to run and its parameters.
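- On Linux, threads will execute sequentially by default! Yeah, I know! The default launch policy allows the implementation to defer each task until its result is requested. To force them to execute in parallel, pass `std::launch::async` as the first parameter.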
- To run a class method, pass a pointer to the method, followed by a pointer to the class object as its first parameter.
- The `std::future` structure makes it easy to control the thread. Its template parameter is the same as the result type returned by your thread.
- Call `get` on the `future` to wait for the thread to finish and return its result.
- Whether the threads run immediately or later, the order in which they run, and other such concerns cannot be relied on.
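The points above can be put together in a small sketch. The names `add`, `Greeter`, `async_add` and `async_greet` are made up for illustration and are not from any library; only `std::async`, `std::launch::async` and `std::future` are standard.

```cpp
#include <future>
#include <string>

// Hypothetical task: a plain function that returns a value.
int add(int a, int b) { return a + b; }

// Hypothetical class whose member function is run as a task.
struct Greeter {
    std::string prefix;
    std::string greet(const std::string& name) const { return prefix + name; }
};

// Runs a free function as a task and waits for its result.
int async_add(int a, int b) {
    // std::launch::async asks for a new thread rather than deferred execution.
    std::future<int> result = std::async(std::launch::async, add, a, b);
    // get() blocks until the task finishes, then returns its value.
    return result.get();
}

// Runs a member function as a task: the member pointer comes first,
// then a pointer to the object, then the remaining arguments.
std::string async_greet(const Greeter& g, const std::string& name) {
    std::future<std::string> result =
        std::async(std::launch::async, &Greeter::greet, &g, name);
    return result.get();
}
```

Calling `async_add(2, 3)` from `main` should give 5. With GCC, this kind of code is typically built with `-std=c++11 -pthread`.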
For example, to spawn off 10 threads to do some work and gather back their results:
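A minimal sketch of this pattern follows. The worker `do_work` is a hypothetical stand-in that just squares its argument, and `run_tasks` is an illustrative helper name; swap in your own computation.

```cpp
#include <future>
#include <vector>

// Hypothetical worker: stands in for the real per-thread computation.
int do_work(int id) { return id * id; }

// Spawns `count` tasks and gathers their results in spawn order.
std::vector<int> run_tasks(int count) {
    std::vector<std::future<int>> futures;
    futures.reserve(count);
    for (int i = 0; i < count; ++i) {
        // Each call starts a task on its own thread and hands back a future.
        futures.push_back(std::async(std::launch::async, do_work, i));
    }

    std::vector<int> results;
    results.reserve(count);
    for (auto& f : futures) {
        // get() waits for that particular task and returns its result.
        results.push_back(f.get());
    }
    return results;
}
```

Calling `run_tasks(10)` spawns 10 tasks. The results come back in the order the futures were created, even though the tasks themselves may finish in any order.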
How many threads should you spawn to maximize usage of the CPU and memory resources on your computer? In other words, how big should your thread pool be? The only way to answer this is to measure the performance of your particular application at different pool sizes and then decide. The right answer will depend on how compute-bound or memory-bound the work done inside each thread is.
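One way to take such measurements is sketched below. It is only an assumption about how to set this up, not the exact harness used for the numbers in this post: `busy_work` is a hypothetical compute-bound task and `time_pool` is an illustrative helper that returns the wall-clock time for a given number of concurrent tasks.

```cpp
#include <chrono>
#include <cstdint>
#include <future>
#include <vector>

// Hypothetical compute-bound task; replace with the real work.
std::uint64_t busy_work(std::uint64_t iterations) {
    std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        acc += i * i;
    }
    return acc;
}

// Runs `threads` copies of the task concurrently and returns the
// elapsed wall-clock time in milliseconds.
double time_pool(int threads, std::uint64_t iterations) {
    auto start = std::chrono::steady_clock::now();

    std::vector<std::future<std::uint64_t>> futures;
    futures.reserve(threads);
    for (int i = 0; i < threads; ++i) {
        futures.push_back(std::async(std::launch::async, busy_work, iterations));
    }
    for (auto& f : futures) {
        f.get();  // wait for every task before stopping the clock
    }

    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Calling `time_pool` in a loop over pool sizes such as 1, 2, 4, 8 and so on, and dividing each result by the number of tasks, gives a rough average time per task to compare across sizes.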
For example, a particular function that I was trying to parallelize was both doing a lot of compute and accessing quite a few data locations in memory. I tried different thread pool sizes and compared how long the computation took on average per thread. Here are the results:
So, in my case, it looks like a pool of 32 threads gives close to optimum performance. Something seems to happen at 64 threads, from which point on the performance drops. My guess is that this is the point where the Linux kernel's overhead of managing that many threads starts to hurt the computation inside the threads.
Tried with: GCC 4.9.2, Linux 3.13.0-45, Ubuntu 14.04 and Intel i7-4790