📅 2014-Dec-31 ⬩ ✍️ Ashwin Nanjappa ⬩ 🏷️ multithreading, python ⬩ 📚 Archive
One of the most common opportunities for performance optimization is a section of code where a function executes on multiple data items independently. This type of parallelization is easy to express using multithreading in Python. Alex Martelli provides a MultiThread class in his book Python Cookbook (2 Ed) that does this:
# Class to execute a function in parallel across multiple data
# Adapted from code in Sec 9.5 of book Python Cookbook (2 Ed)

import threading
import time
import Queue

class MultiThread(object):

    def __init__(self, function, argsVector, commonArgs, maxThreads=5, queue_results=False):
        self._function = function
        self._lock = threading.Lock()
        self._nextArgs = iter(argsVector).next
        self._commonArgs = commonArgs
        self._threadPool = [threading.Thread(target=self._doSome) for i in range(maxThreads)]
        if queue_results:
            self._queue = Queue.Queue()
        else:
            self._queue = None

    def _doSome(self):
        while True:
            self._lock.acquire()
            try:
                try:
                    args = self._nextArgs()
                except StopIteration:
                    break
            finally:
                self._lock.release()
            result = self._function(args, self._commonArgs)
            if self._queue is not None:
                self._queue.put((args, result))

    def get(self, *a, **kw):
        if self._queue is not None:
            return self._queue.get(*a, **kw)
        else:
            raise ValueError, 'Not queueing results'

    def start(self):
        for thread in self._threadPool:
            time.sleep(0)  # necessary to give other threads a chance to run
            thread.start()

    def join(self, timeout=None):
        for thread in self._threadPool:
            thread.join(timeout)

if __name__ == "__main__":
    import random

    def recite_n_times_table(n, _):
        for i in range(2, 11):
            print "%d * %d = %d" % (n, i, n * i)
            time.sleep(0.3 + 0.3 * random.random())
        return

    argVector = range(2, 4)
    mt = MultiThread(recite_n_times_table, argVector, None)
    mt.start()
    mt.join()
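The core pattern in this class is a set of worker threads that pull the next argument from one shared iterator under a lock and push results into a queue. Here is a minimal sketch of that pattern written so that it should run on both Python 2.7 and Python 3. The name `run_in_threads` and its parameters are my own illustration, not part of the book's code, and it drops the `commonArgs` parameter for brevity.

```python
import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

def run_in_threads(function, args_vector, max_threads=5):
    """Apply function to each item, sharing one iterator among worker threads."""
    args_iter = iter(args_vector)
    lock = threading.Lock()
    results = queue.Queue()

    def worker():
        while True:
            # Only one thread at a time may advance the shared iterator.
            with lock:
                try:
                    args = next(args_iter)
                except StopIteration:
                    return
            # The lock is released here, so the calls themselves can interleave.
            results.put((args, function(args)))

    threads = [threading.Thread(target=worker) for _ in range(max_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # All workers have finished, so the queue can be drained without blocking.
    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected

if __name__ == "__main__":
    # Results arrive in completion order, so sort them before printing.
    print(sorted(run_in_threads(lambda n: n * n, range(2, 6))))
```

Results come back as `(args, result)` pairs in whatever order the threads finish, which is why the usage line sorts them.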
I was disappointed after using this on my data. Not only was there no speedup, even with 8 threads, but you might sometimes even see a regression from the single-threaded speed! The reason is that the standard Python interpreter (CPython) cannot execute Python bytecode in more than one thread at a time. It uses an internal Global Interpreter Lock (GIL) to protect its data structures from corruption, so threads running CPU-bound Python code simply take turns. So, the only times you might want to use multithreading are when it makes programming easier or when the threads spend most of their time waiting: for GUI programs or for networking programs.
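To see the effect of the GIL for yourself, here is a small sketch that times the same work done serially and in threads. It is not the code I ran on my data: `cpu_task`, `wait_task` and the two timing helpers are hypothetical names made up for this illustration, and `time.sleep` is only a stand-in for blocking network or disk I/O. The actual numbers will depend on your machine and interpreter.

```python
import threading
import time

def cpu_task(n=2000000):
    # Pure-Python loop: the thread holds the GIL while it counts.
    while n > 0:
        n -= 1

def wait_task(seconds=0.2):
    # Stand-in for blocking I/O: time.sleep releases the GIL while waiting.
    time.sleep(seconds)

def time_serial(task, repeats):
    # Run the task 'repeats' times, one after another.
    start = time.time()
    for _ in range(repeats):
        task()
    return time.time() - start

def time_threaded(task, repeats):
    # Run the task 'repeats' times, each call in its own thread.
    threads = [threading.Thread(target=task) for _ in range(repeats)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start

if __name__ == "__main__":
    for task in (cpu_task, wait_task):
        print("%-9s serial: %.2f s  threaded: %.2f s" % (
            task.__name__, time_serial(task, 4), time_threaded(task, 4)))
```

On a standard CPython build I would expect the two `cpu_task` timings to be roughly equal, or the threaded one slightly worse, while the threaded `wait_task` run should take about as long as a single wait.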
Tried with: Python 2.7.6 and Ubuntu 14.04