Code Yarns ‍👨‍💻

How to execute a function in multiple threads in Python

📅 2014-Dec-31 ⬩ ✍️ Ashwin Nanjappa

A common opportunity for performance optimization is a section of code where a function executes on multiple data items independently. This kind of parallelization is easy to express using multithreading in Python. Alex Martelli provides a MultiThread class in his book Python Cookbook (2nd Edition); the version below is adapted from it:

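# Class to execute a function in parallel across multiple data items
# Adapted from code in Sec 9.5 of the book Python Cookbook (2nd Edition)
# Written for Python 2: uses the Queue module, iterator .next and the print statement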

import threading
import time
import Queue

class MultiThread(object):

    def __init__(self, function, argsVector, commonArgs, maxThreads=5, queue_results=False):
        self._function = function
        self._lock = threading.Lock( )
        self._nextArgs = iter(argsVector).next
        self._commonArgs = commonArgs
        self._threadPool = [ threading.Thread(target=self._doSome) for i in range(maxThreads) ]
        if queue_results:
            self._queue = Queue.Queue()
        else:
            self._queue = None

    def _doSome(self):
        while True:
            self._lock.acquire( )
            try:
                try:
                    args = self._nextArgs( )
                except StopIteration:
                    break
            finally:
                self._lock.release( )
            result = self._function(args, self._commonArgs)
            if self._queue is not None:
                self._queue.put((args, result))

    def get(self, *a, **kw):
        if self._queue is not None:
            return self._queue.get(*a, **kw)
        else:
            raise ValueError, 'Not queueing results'

    def start(self):
        for thread in self._threadPool:
            time.sleep(0) # necessary to give other threads a chance to run
            thread.start( )

    def join(self, timeout=None):
        for thread in self._threadPool:
            thread.join(timeout)

if __name__=="__main__":

    import random

    def recite_n_times_table(n, _):
        for i in range(2, 11):
            print "%d * %d = %d" % (n, i, n * i)
            time.sleep(0.3 + 0.3*random.random( ))
        return

    argVector = range(2, 4)
    mt = MultiThread(recite_n_times_table, argVector, None)
    mt.start( )
    mt.join( )
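
For readers on Python 3, where the `Queue` module, the iterator `.next` method and the `print` statement used above no longer exist, the same pattern can be sketched with the standard library's `concurrent.futures.ThreadPoolExecutor`. This is a minimal sketch, not the Cookbook's code: the helper name `run_in_threads` and the sample function `square_plus` are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_threads(function, args_vector, common_args, max_threads=5):
    """Call function(args, common_args) for every item, using a thread pool.

    Returns a list of (args, result) pairs in the order of args_vector.
    """
    items = list(args_vector)
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # map() hands items to idle worker threads and yields results in input order
        results = list(pool.map(lambda args: function(args, common_args), items))
    return list(zip(items, results))


def square_plus(n, offset):
    # Hypothetical work function: any callable taking (args, common_args) works
    return n * n + offset


if __name__ == "__main__":
    print(run_in_threads(square_plus, range(2, 6), 10))
```

Unlike the class above, this sketch always collects the results and returns them in input order, so no separate queue or lock is needed.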

I was disappointed after using this on my data. Not only was there no speedup, even with 8 threads, but I sometimes even saw a regression from the single-threaded speed! The reason is that the standard Python interpreter (CPython) cannot run Python code in multiple threads at the same time. It uses an internal Global Interpreter Lock (GIL) to protect its internal data structures, so only one thread executes Python bytecode at any moment. So, multithreading is mainly worth using when it makes programming easier or when threads spend most of their time waiting: for example, in GUI programs or in networking programs.
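
The effect is easy to reproduce. The sketch below, written for Python 3 and assuming a standard CPython build with the GIL, times a CPU-bound function run serially and then across 4 threads. The function `count_down` and the workload size are arbitrary choices for illustration. On such a build the threaded run is typically no faster than the serial one, though exact timings vary by machine.

```python
import threading
import time


def count_down(n):
    # Pure-Python CPU-bound loop: only one thread at a time can execute it
    while n > 0:
        n -= 1
    return n


def time_serial(n, repeats):
    # Run the function 'repeats' times, one after another
    start = time.perf_counter()
    for _ in range(repeats):
        count_down(n)
    return time.perf_counter() - start


def time_threaded(n, repeats):
    # Run the function once in each of 'repeats' threads
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(repeats)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n, repeats = 2000000, 4
    print("serial:   %.2f s" % time_serial(n, repeats))
    print("threaded: %.2f s" % time_threaded(n, repeats))
```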

Tried with: Python 2.7.6 and Ubuntu 14.04