A common performance optimization is to find sections of code where a function is applied independently to multiple data items. At first glance, this looks like an easy win for multithreading in Python. Alex Martelli provides a MultiThread class in his book Python Cookbook (2nd Edition) that does exactly this:
```python
# Class to execute a function in parallel across multiple data
# Adapted from code in Sec 9.5 of book Python Cookbook (2 Ed)

import threading
import time
import Queue

class MultiThread(object):

    def __init__(self, function, argsVector, commonArgs,
                 maxThreads=5, queue_results=False):
        self._function = function
        self._lock = threading.Lock()
        self._nextArgs = iter(argsVector).next
        self._commonArgs = commonArgs
        self._threadPool = [threading.Thread(target=self._doSome)
                            for i in range(maxThreads)]
        if queue_results:
            self._queue = Queue.Queue()
        else:
            self._queue = None

    def _doSome(self):
        while True:
            self._lock.acquire()
            try:
                try:
                    args = self._nextArgs()
                except StopIteration:
                    break
            finally:
                self._lock.release()
            result = self._function(args, self._commonArgs)
            if self._queue is not None:
                self._queue.put((args, result))

    def get(self, *a, **kw):
        if self._queue is not None:
            return self._queue.get(*a, **kw)
        else:
            raise ValueError('Not queueing results')

    def start(self):
        for thread in self._threadPool:
            time.sleep(0)  # necessary to give other threads a chance to run
            thread.start()

    def join(self, timeout=None):
        for thread in self._threadPool:
            thread.join(timeout)

if __name__ == "__main__":
    import random

    def recite_n_times_table(n, _):
        for i in range(2, 11):
            print "%d * %d = %d" % (n, i, n * i)
            time.sleep(0.3 + 0.3 * random.random())
        return

    argVector = range(2, 4)
    mt = MultiThread(recite_n_times_table, argVector, None)
    mt.start()
    mt.join()
```
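To see how much threading actually helps, you can time a CPU-bound function run serially versus spread across threads. Here is a quick sketch of my own (not from the book; `busy` is a made-up workload) that runs the same work four times each way:

```python
import threading
import time

def busy(n, out, idx):
    # Pure-Python arithmetic: the interpreter holds the GIL
    # for the entire loop, so only one thread makes progress at a time.
    total = 0
    for i in range(n):
        total += i * i
    out[idx] = total

N = 200000
results = [None] * 4

# Run the workload four times, one after another.
start = time.time()
for idx in range(4):
    busy(N, results, idx)
serial = time.time() - start

# Run the same four workloads in four threads.
start = time.time()
threads = [threading.Thread(target=busy, args=(N, results, idx))
           for idx in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.time() - start

print("serial: %.2fs, threaded: %.2fs" % (serial, threaded))
```

Comparing the two printed times shows what to expect from threading on this kind of work.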
I was disappointed after using this on my data: there was no speedup even with 8 threads, and sometimes the multithreaded version was actually slower than the single-threaded one! The reason is that the CPython interpreter cannot run Python bytecode on multiple cores in parallel. Its Global Interpreter Lock (GIL) lets only one thread execute bytecode at a time, which protects the interpreter's internal data structures from corruption. So multithreading in Python is mainly worthwhile when it makes programming easier or when threads spend most of their time blocked, as in GUI programs or networking programs.
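If the work really is CPU-bound and you want a genuine parallel speedup, the usual alternative is the standard multiprocessing module: its Pool offers a similar map-a-function-over-data interface, but uses separate processes, each with its own interpreter and its own GIL. A minimal sketch (`sum_of_squares` and `parallel_map` are made-up names for illustration, not part of Martelli's class):

```python
import multiprocessing

def sum_of_squares(n):
    # Hypothetical CPU-bound workload: a pure-Python loop like this
    # holds the GIL, so threads cannot run it in parallel, but
    # separate processes can.
    return sum(i * i for i in range(n))

def parallel_map(func, items, workers=4):
    # Spread the calls across worker processes; results come back
    # in the same order as the inputs.
    pool = multiprocessing.Pool(processes=workers)
    try:
        return pool.map(func, items)
    finally:
        pool.close()
        pool.join()

if __name__ == "__main__":
    print(parallel_map(sum_of_squares, [10, 100, 1000]))
    # prints [285, 328350, 332833500]
```

Note that the function passed to Pool.map must be picklable (a top-level def), and the pool creation should sit under an `if __name__ == "__main__"` guard so that worker processes do not re-execute it on platforms that spawn rather than fork.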
Tried with: Python 2.7.6 and Ubuntu 14.04