Python Parallel Computing (in 60 Seconds or less)

By Dan Bader

If your Python programs are slower than you’d like, you can often speed them up by parallelizing them. In this short primer you’ll learn the basics of parallel processing in Python 2 and 3.

Basically, parallel computing allows you to carry out many calculations at the same time, thus reducing the amount of time it takes to run your program to completion.

I know, this sounds fairly vague and complicated somehow…but bear with me for the next 50 seconds or so.

Here’s an end-to-end example of parallel computing in Python 2/3, using only tools built into the Python standard library.

Ready? Go!

First, we need to do some setup work. We’ll import the collections and multiprocessing modules so we can define the data structure we’ll work with and use Python’s parallel computing facilities:

import collections
import multiprocessing

Second, we’ll use collections.namedtuple to define a new (immutable) data type we can use to represent our data set, a collection of scientists:

Scientist = collections.namedtuple('Scientist', [
    'name',
    'born',
])

scientists = (
    Scientist(name='Ada Lovelace', born=1815),
    Scientist(name='Emmy Noether', born=1882),
    Scientist(name='Marie Curie', born=1867),
    Scientist(name='Tu Youyou', born=1930),
    Scientist(name='Ada Yonath', born=1939),
    Scientist(name='Vera Rubin', born=1928),
    Scientist(name='Sally Ride', born=1951),
)
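
To make the "immutable" part concrete, here is a minimal sketch of how a namedtuple instance behaves. It uses the same Scientist type as above; the `corrected` and `mutated` names are only for illustration.

```python
import collections

# Same shape as the Scientist type defined above.
Scientist = collections.namedtuple('Scientist', ['name', 'born'])

ada = Scientist(name='Ada Lovelace', born=1815)

# Fields are readable by name or by position.
print(ada.name)  # Ada Lovelace
print(ada[1])    # 1815

# Assigning to a field raises AttributeError because instances are immutable.
try:
    ada.born = 1816
    mutated = True
except AttributeError:
    mutated = False

# _replace() returns a new object and leaves the original untouched.
corrected = ada._replace(born=1816)
print(mutated, ada.born, corrected.born)  # False 1815 1816
```

Immutable items are a good fit for parallel processing: each worker receives its own copy of an item, so there is no shared state that could be modified by accident.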

Third, we’ll write a “data processing function” that accepts a scientist object and returns a dictionary containing the scientist’s name and their calculated age (relative to 2017, the year hard-coded in the function):

def process_item(item):
    return {
        'name': item.name,
        'age': 2017 - item.born
    }

The process_item() function just represents a simple data transformation to keep this example short and sweet—but you could easily swap it out with a more complex computation.
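
Before parallelizing anything, it can help to run the same transformation sequentially with the built-in map(). This gives you a baseline to compare the parallel results against. A minimal sketch, assuming a shortened two-item data set and the same 2017 reference year:

```python
import collections

Scientist = collections.namedtuple('Scientist', ['name', 'born'])

# Shortened data set, for illustration only.
scientists = (
    Scientist(name='Ada Lovelace', born=1815),
    Scientist(name='Marie Curie', born=1867),
)

def process_item(item):
    # Age relative to 2017, the reference year used in this article.
    return {'name': item.name, 'age': 2017 - item.born}

# Built-in map() applies the function to one item after another
# inside a single process.
sequential_result = tuple(map(process_item, scientists))
print(sequential_result)
```

Because process_item() depends only on its argument and has no side effects, the sequential and parallel versions should produce the same results.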

(20 seconds remaining)

Fourth, and this is where the real parallelization magic happens, we’ll set up a multiprocessing pool that allows us to spread our calculations across all available CPU cores.

Then we call the pool’s map() method to apply our process_item() function to all scientist objects, in parallel batches:

pool = multiprocessing.Pool()
result = pool.map(process_item, scientists)

Note how batching and distributing the work across multiple CPU cores, performing the work, and collecting the results are all handled by the multiprocessing pool. How great is that?
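
Here is a slightly more defensive variant of the pool setup, as a sketch rather than the one true way to do it. On platforms where worker processes are started with the "spawn" method (Windows, and macOS on recent Python versions), the pool has to be created under an `if __name__ == '__main__':` guard. It is also good practice to close and join the pool when you are done. The `run_in_pool` helper is hypothetical, and the `processes` and `chunksize` values are arbitrary choices for illustration.

```python
import collections
import multiprocessing

Scientist = collections.namedtuple('Scientist', ['name', 'born'])

scientists = (
    Scientist(name='Ada Lovelace', born=1815),
    Scientist(name='Emmy Noether', born=1882),
    Scientist(name='Marie Curie', born=1867),
    Scientist(name='Tu Youyou', born=1930),
)

def process_item(item):
    return {'name': item.name, 'age': 2017 - item.born}

def run_in_pool(items, processes=2, chunksize=2):
    # Hypothetical helper, not part of the original example.
    # processes: number of worker processes (Pool() uses the CPU count
    # when this is omitted).
    # chunksize: number of items bundled into each task sent to a worker.
    pool = multiprocessing.Pool(processes=processes)
    try:
        return pool.map(process_item, items, chunksize=chunksize)
    finally:
        pool.close()  # no more work will be submitted
        pool.join()   # wait for the worker processes to exit

if __name__ == '__main__':
    # Only the main process runs this block. Workers that re-import
    # the module skip it.
    print(run_in_pool(scientists))
```

Larger chunks mean fewer round trips between the main process and the workers, which tends to matter more when each individual item is cheap to process.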

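One caveat is that the function you pass to map() must be picklable, as must the items you pass in and the results it returns. That is, it must be possible to serialize them using Python’s built-in pickle module; otherwise the map() call will fail. In practice this means defining the function at module level rather than as a lambda or a nested function.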
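
If you are unsure whether something can be sent to a worker process, you can try pickling it yourself first. The `is_picklable` helper below is a hypothetical name used only for this sketch. It catches several exception types because the exact error raised for unpicklable objects varies between Python versions.

```python
import pickle

def process_item(item):
    # Defined at module level, so pickle can refer to it by name.
    return item * 2

def is_picklable(obj):
    # Hypothetical helper: True if pickle can serialize obj.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

# A module-level function is pickled as a reference to its name.
# This prints True when the file is run as a script.
print(is_picklable(process_item))

# A lambda has no importable name, so pickling it fails.
print(is_picklable(lambda item: item * 2))  # False
```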

Fifth, we’re all done here with about 5 seconds remaining.

Let’s print the results of our data transformation to the console so we can make sure the program did what it was supposed to:

print(tuple(result))

That’s the end of our little program. And here’s what you should expect to see printed out on your console:

({'name': 'Ada Lovelace', 'age': 202},
 {'name': 'Emmy Noether', 'age': 135},
 {'name': 'Marie Curie', 'age': 150},
 {'name': 'Tu Youyou', 'age': 87},
 {'name': 'Ada Yonath', 'age': 78},
 {'name': 'Vera Rubin', 'age': 89},
 {'name': 'Sally Ride', 'age': 66})

Isn’t Python just lovely?

Now, obviously I took some shortcuts here and picked an example that made parallelization seem effortless.

But I stand by the lessons learned here:

If you know how to structure and represent your data, parallelization is convenient and feels completely natural. Any Pythonista should pick up the basics of functional programming for this reason.
Python is a joy to work with and eminently suitable for these kinds of programming tasks.

Additional Learning Resources

We only scratched the surface here with this quick primer on parallel processing using Python. If you’d like to dig deeper into this subject, check out my “Functional Programming in Python” tutorial series.

Full Example Source Code

Here’s the complete source code for this example if you’d like to use it as a basis for your own experiments.

Please note that you might encounter some issues running this multiprocessing example from inside a Jupyter notebook. The best way to get around that is to save this code in a standalone .py file and run it from the command line using the Python interpreter.

"""
Python Parallel Processing (in 60 seconds or less)
https://dbader.org/blog/python-parallel-computing-in-60-seconds
"""
import collections
import multiprocessing

Scientist = collections.namedtuple('Scientist', [
    'name',
    'born',
])

scientists = (
    Scientist(name='Ada Lovelace', born=1815),
    Scientist(name='Emmy Noether', born=1882),
    Scientist(name='Marie Curie', born=1867),
    Scientist(name='Tu Youyou', born=1930),
    Scientist(name='Ada Yonath', born=1939),
    Scientist(name='Vera Rubin', born=1928),
    Scientist(name='Sally Ride', born=1951),
)

def process_item(item):
    return {
        'name': item.name,
        'age': 2017 - item.born
    }

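if __name__ == '__main__':
    # Create the pool only in the main process. Worker processes started
    # with the "spawn" method re-import this module and skip this block.
    pool = multiprocessing.Pool()
    result = pool.map(process_item, scientists)
    pool.close()
    pool.join()

    print(tuple(result))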



This article was filed under: functional-programming, optimization, parallel-computing, and python.
