dbader.org - Python

Interfacing Python and C: The CFFI Module

Tue, 27 Feb 2018 00:00:00 GMT

How to use the CFFI module for interfacing Python with native libraries as an alternative to the “ctypes” approach.

In previous tutorials, we covered the basics of ctypes and some advanced ctypes usage. This tutorial will cover the CFFI module. CFFI is a richer environment than ctypes, allowing several different options for how you want to interface with a native library.

In this tutorial we will be covering:

‘Out-of-line’ vs ‘in-line’ interfaces
Building and running CFFI-based scripts on Linux
Creating simple Python classes to mirror C structures
Passing structures by reference
Working around some CFFI limitations

As with previous tutorials, let’s start by taking a look at the simple C library we will be using and how to build it, and then jump into loading a C library and calling functions in it.

The C Library Code

All of the code to build and test the examples discussed here (as well as the Markdown for this article) is committed to my GitHub repository.

The library consists of two data structures: Point and Line. A Point is a pair of (x,y) coordinates while a Line has a start and end Point. There are also a handful of functions which modify each of these types.

Let’s take a closer look at the Point structure and its associated functions.

/* Point.h */
/* Simple structure for ctypes example */
typedef struct {
    int x;
    int y;
} Point;

/* Point.c */
/* display a Point value */
void show_point(Point point) {
    printf("Point in C      is (%d, %d)\n", point.x, point.y);
}

/* Increment a Point which was passed by value */
void move_point(Point point) {
    show_point(point);
    point.x++;
    point.y++;
    show_point(point);
}

/* Increment a Point which was passed by reference */
void move_point_by_ref(Point *point) {
    show_point(*point);
    point->x++;
    point->y++;
    show_point(*point);
}

/* Return by value */
Point get_default_point(void) {
    static int x_counter = 0;
    static int y_counter = 100;
    x_counter++;
    y_counter--;
    return get_point(x_counter, y_counter);
}

Point get_point(int x, int y) {
    Point point = { x, y };
    printf("Returning Point    (%d, %d)\n", point.x, point.y);
    return point;
}

I won’t go into each of these functions in detail as they are fairly simple. The only interesting bit is the difference between move_point and move_point_by_ref. We’ll talk a bit later about pass-by-value and pass-by-reference semantics.

We’ll also be using a Line structure, which is composed of two Points:

/* Line.h */
typedef struct {
    Point start;
    Point end;
} Line;

/* Line.c */
void show_line(Line line) {
    printf("Line in C      is (%d, %d)->(%d, %d)\n", line.start.x, line.start.y,
            line.end.x, line.end.y);
}

void move_line_by_ref(Line *line) {
    show_line(*line);
    move_point_by_ref(&line->start);
    move_point_by_ref(&line->end);
    show_line(*line);
}

Line get_line(void) {
    Line l = { get_default_point(), get_default_point() };
    return l;
}

The Point structure and its associated functions will allow us to show how to set up and build this example and how to deal with memory references in CFFI. The Line structure will allow us to work with nested structures and the complications that arise from that.

The Makefile in the repo is set up to completely build and run the demo from scratch:

all: point line

clean:
    rm -f *.o *.so *.html _point.c _line.c Line.h.preprocessed

libpoint.so: Point.o
    gcc -shared $^ -o $@

libline.so: Point.o Line.o
    gcc -shared $^ -o $@

%.o: %.c
    gcc -c -Wall -Werror -fpic $^

point: export LD_LIBRARY_PATH = $(shell pwd)
point: libpoint.so
    ./build_point.py
    ./testPoint.py

line: export LD_LIBRARY_PATH = $(shell pwd)
line: libline.so
    # hack to get around cffi not supporting #include directives
    gcc -E Line.h > Line.h.preprocessed
    ./build_line.py
    ./testLine.py

doc:
    pandoc ctypes2.md > ctypes2.html
    firefox ctypes2.html

To build and run the demo you only need to run the following command in your shell:

$ make

‘Out-of-line’ vs ‘in-line’ interfaces

Before we dive into what the Python code looks like, let’s step back and discuss what CFFI does and some of the options you have using it. CFFI is a Python module which will read C function prototypes and automatically generate some of the marshalling to and from these C functions. I’m going to quote the CFFI docs, as they describe the options much better than I could:

CFFI can be used in one of four modes: ‘ABI’ versus ‘API’ level, each with ‘in-line’ or ‘out-of-line’ preparation (or compilation).

The ABI mode accesses libraries at the binary level, whereas the faster API mode accesses them with a C compiler. This is described in detail below.

In the in-line mode, everything is set up every time you import your Python code. In the out-of-line mode, you have a separate step of preparation (and possibly C compilation) that produces a module which your main program can then import.

In this tutorial we’ll be writing an API level, out-of-line system. This means we will have to talk about some system requirements before we dive into the Python code.

Building and running CFFI-based scripts on Linux

The examples in this tutorial have been worked through on Linux Mint 18.3. They should work on most Linux systems. Windows and Mac users will need to solve similar problems, but with obviously different solutions.

To start, your system will need to have:

a C compiler (this is fairly standard on Linux distros)
make (again, this is fairly standard)
Python (the examples here were tested on 3.5.2)
CFFI module (pip install cffi)

Now, if we look at the section of the Makefile that builds and runs the tests for the Point class, we see:

point: export LD_LIBRARY_PATH = $(shell pwd)
point: libpoint.so
    ./build_point.py
    ./testPoint.py

There’s a lot going on here. The LD_LIBRARY_PATH is needed because the CFFI module is going to be loading a library we have built in the local directory. Linux will not, by default, search the current directory for shared libraries so we need to tell it to do so.

Next, we’re making point dependent on libpoint.so, which causes make to go build that library.

Once the library is built, we need to do our ‘out-of-line’ processing to build the C code to interface to our library. We’ll dive into that code in a minute.

Finally, we run our Python script which actually talks to the library and does the real work (in our case, runs tests).

Building the C Interface

As we just saw, ‘out-of-line’ processing is done to allow CFFI to use the header file from C to build an interface module.

That code looks like this:

import os

import cffi

ffi = cffi.FFI()

with open(os.path.join(os.path.dirname(__file__), "Point.h")) as f:
    ffi.cdef(f.read())

ffi.set_source("_point",
    '#include "Point.h"',
    libraries=["point"],
    library_dirs=[os.path.dirname(__file__),],
)

ffi.compile()

This code reads in the header file and passes it to a cffi.FFI object to parse. (NOTE: FFI stands for “foreign function interface”; the FFI class is the main entry point of the cffi package.)

Once the FFI has the header information, we then set the source information. The first parameter to the set_source function is the name of the extension module you want it to generate; the generated .c file takes the same name. Next is the custom C source you want to insert. In our case, this custom code is simply including the Point.h file from the library we are talking to. Finally you need to tell it some information about which libraries you want it to link against.

After we’ve read in and processed the headers and set up the source file, we tell CFFI to call the compiler and build the interface module. On my system, this step produces three files:

_point.c
_point.o
_point.cpython-35m-x86_64-linux-gnu.so

The _point.c file is over 700 lines long and, like most generated code, can be difficult to read. The .o file is the output from the compiler and the .so file is the interface module we want.

Now that we’ve got the interface module, we can go ahead and write some Python to talk to our C library!

Creating simple Python classes to mirror C structures

We can build a simple Python class to wrap around the C struct we use in this library. Like our ctypes tutorials, this is fairly simple as CFFI does the data marshalling for us. To use the generated code we must first import the module that CFFI generated for us:

import _point

Then we define our class, the __init__ method of which simply calls the C library to get us a point object:

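class Point():
    def __init__(self, x=None, y=None):
        # Compare against None so that a coordinate of 0 is not
        # mistaken for "no value given".
        if x is not None and y is not None:
            self.p = _point.lib.get_point(x, y)
        else:
            self.p = _point.lib.get_default_point()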

You can see that the CFFI library allows us to access the functions in the C library directly and allows us to store the struct Point that is returned. If you add a print(self.p) line to the end of the init function, you’ll see that it stores this in a named cdata object:

<cdata 'Point' owning 8 bytes>

However, that cdata 'Point' still has the x and y data members, so you can get and set those values quite easily, as you can see in the repr function for our class:

def __repr__(self):
    return '({0}, {1})'.format(self.p.x, self.p.y)

We can quite easily wrap the show_point and move_point functions from our library in methods on our class as well:

def show_point(self):
    _point.lib.show_point(self.p)

def move_point(self):
    _point.lib.move_point(self.p)

Passing structures by reference

When we pass values by reference in the move_point_by_ref function, we need to do a little extra work to help CFFI create an object so it can take the address of it and pass that. This requires a little code, but not much. The prototype for the C function we’re trying to call is:

void move_point_by_ref(Point *point);

To call that, we need to call the ffi.new() function with two parameters. The first is a string indicating the type of the object to be created. This type has to match a “known” type in that FFI instance. In our case, it knows about the Point type because of the call to ffi.cdef() we did during our out-of-line processing. The second parameter to ffi.new() is an initial value for the object. In this case we want the created object to start with our self.p Point.

def move_point_by_ref(self):
    ppoint = _point.ffi.new("Point*", self.p)
    _point.lib.move_point_by_ref(ppoint)
    # ppoint is a 'Point *'; ppoint[0] is the Point struct it points to
    self.p = ppoint[0]

We end by simply copying the new value from the Point* back to our self.p cdata member.

The memory created by ffi.new() will be garbage collected for us unless we need to do something special with it (see the ffi.gc() function if you need that).

Working around some CFFI limitations

We also have a Line struct, which holds two Points. This struct, while quite simple, shows a limitation in CFFI that’s worth discussing. In the out-of-line processing script for the Point library, build_point.py, we simply read the Point.h header file directly and handed that to ffi.cdef(). This model breaks down when we get to the build_line.py script due to a limitation of CFFI. CFFI, for some quite good reasons I won’t go into here, does not allow most preprocessor directives (i.e. ‘lines starting with #’). This prevents us from passing it Line.h directly as the very first line is:

#include "Point.h"

There are a couple of common solutions that I saw while researching this tutorial. One is to custom write the C header information, possibly directly into the build_line.py file. Another, which I think respects the DRY principle, is to use the C preprocessor to generate the file we read in. This shows up in the Makefile as:

line: libline.so
    # Hack to get around cffi not supporting #include directives
    gcc -E Line.h > Line.h.preprocessed
    ./build_line.py
    ./testLine.py

The gcc line runs the preprocessor on Line.h and we store the output in Line.h.preprocessed. In the build_line.py script, instead of reading from Line.h we read Line.h.preprocessed and pass that to the ffi.cdef() function.

Note: This trick will not always work. There are many cases where compiler-specific extensions are used in the standard headers (like “stdio.h”), which will cause cffi to fail.
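For small, local headers, a rough pure-Python alternative to calling the C preprocessor is to inline the quoted #include lines yourself and drop every other directive. The sketch below is only an illustration of that idea: flatten_header and the headers dictionary are hypothetical names, not part of CFFI or of the repository, and the dictionary stands in for reading the files from disk. Because it throws away every other line starting with #, it also loses any #define constants, so it is only suitable for headers as simple as the ones in this tutorial.

```python
import re

# Matches lines such as: #include "Point.h"
_LOCAL_INCLUDE = re.compile(r'^\s*#\s*include\s+"([^"]+)"\s*$')


def flatten_header(name, headers, _seen=None):
    """Return the text of headers[name] with quoted includes inlined.

    `headers` maps file names to their text (a stand-in for reading
    files from disk). Every other line starting with '#' is dropped.
    """
    seen = set() if _seen is None else _seen
    if name in seen:          # include each header only once
        return ""
    seen.add(name)

    out = []
    for line in headers[name].splitlines():
        match = _LOCAL_INCLUDE.match(line)
        if match:
            out.append(flatten_header(match.group(1), headers, seen))
        elif line.lstrip().startswith("#"):
            continue          # drop #ifndef, #define, #endif, ...
        else:
            out.append(line)
    return "\n".join(part for part in out if part)


headers = {
    "Point.h": "typedef struct {\n    int x;\n    int y;\n} Point;\n",
    "Line.h": ('#include "Point.h"\n'
               "typedef struct {\n    Point start;\n    Point end;\n} Line;\n"),
}

flat = flatten_header("Line.h", headers)
print(flat)
```

The resulting text contains the Point definition followed by the Line definition with no directives left, which is the shape of input that ffi.cdef() expects.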

The rest of the Line example follows the concepts we learned in the Point code above.

Conclusion

In this tutorial we covered some of the basics about the CFFI module and how to use it to interface native C libraries. I found several resources out there while researching. The python-cffi-example is a full on code example of using CFFI. It creates custom function prototypes rather than calling the preprocessor as we did in the last section.

If you’re interested in passing pointers through the CFFI interface, you should start by reading this section of the documentation carefully. I found it quite worthwhile.

If you’re dying to read more about why C preprocessor directives are not supported, I’d recommend starting with this thread. The description there covers the issue in some detail.

And, finally, if you’d like to see and play with the code I wrote while working on this, please visit my GitHub repository. This tutorial is in the ‘cffi’ directory.

[ Improve Your Python with Dan's 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Write More Pythonic Code by Applying the Things You Already Know

Thu, 25 Jan 2018 00:00:00 GMT

There’s a mistake I frequently make when I learn new things about Python… Here’s how you can avoid this pitfall and learn something about Python’s “enumerate()” function at the same time.

When I learn a new trick for my “Python coding toolbox” I often sense some benefit right away.

It’s like I know this thing is useful for something—

And yet I’m sitting here banging my head against the table trying to find a practical application for it.

How do you take a new function or module you heard about, and turn it into a sprinkling of Pythonic fairy dust that gets you an “ooh, nice!” comment in your next code review?

The other day I got this question from newsletter reader Paul, in response to a piece I wrote about Python’s enumerate() function:

Yesterday I needed to write a dictionary that reversed the enumeration (so, {'Bob': 0}, etc). I used the length of the list plus zip in a dictionary comprehension.

Is there a more Pythonic way to do this?

To give you some more context, this is what Paul wants to do:

input = ['Duration', 'F0', 'F1', 'F2', 'F3']
output = {'Duration': 0, 'F0': 1, 'F1': 2, 'F2': 3, 'F3': 4}

The goal is to create a dictionary that maps each item in the input list to the item’s index in that very list. This dictionary can then be used to look up indices using items as keys.

Here’s how he implemented this transformation:

>>> {f:i for f, i in zip(input, range(len(input)))}
{'Duration': 0, 'F0': 1, 'F1': 2, 'F2': 3, 'F3': 4}

So far so good—but as Paul suspects, we can clean this up some more.

This is exactly the kind of situation I find myself in all the time. Paul knows intuitively that there’s a way to make his code more Pythonic with the enumerate() built-in…

But how should he put it in practice?

My first thought was that we could shorten this code a bit by avoiding the dict comprehension:

>>> dict(zip(input, range(len(input))))
{'Duration': 0, 'F0': 1, 'F1': 2, 'F2': 3, 'F3': 4}

That’s slightly cleaner (because it has less visual noise), but just like Paul I’m still not very fond of that range(len(...)) construct.

Let’s try playing with enumerate():

>>> list(enumerate(input))
[(0, 'Duration'), (1, 'F0'), (2, 'F1'), (3, 'F2'), (4, 'F3')]

Okay, so I can use enumerate to pair each input key with its index in the list. Let’s turn that into a dictionary:

>>> dict(enumerate(input))
{0: 'Duration', 1: 'F0', 2: 'F1', 3: 'F2', 4: 'F3'}

We’re so close! This is basically what we want, but “in the wrong direction.” Instead of mapping keys to indices it’s mapping the index to the key.

How can we reverse it?

Let’s bring back the dict comprehension:

>>> {f: i for i, f in enumerate(input)}
{'Duration': 0, 'F0': 1, 'F1': 2, 'F2': 3, 'F3': 4}

And there you go, that’s it! It’s a beauty!
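Here is a short usage sketch of the resulting dictionary. The variable names (fields, index_of, repeated) are made up for illustration. It also shows one thing worth knowing before relying on this pattern: if the list contains repeated items, later positions overwrite earlier ones.

```python
fields = ['Duration', 'F0', 'F1', 'F2', 'F3']

# Map each item to its position in the list.
index_of = {name: i for i, name in enumerate(fields)}
print(index_of['F2'])    # 3

# With repeated items, later positions overwrite earlier ones.
repeated = ['a', 'b', 'a']
last_index = {name: i for i, name in enumerate(repeated)}
print(last_index)        # {'a': 2, 'b': 1}

# One way to keep the first position instead: only set a key once.
first_index = {}
for i, name in enumerate(repeated):
    first_index.setdefault(name, i)
print(first_index)       # {'a': 0, 'b': 1}
```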

Now, what’s the takeaway from all of this?

With this sort of thing, it often pays off to go with your gut.

You see, Paul was right all along. There really was a way this code could be cleaned up by using enumerate(). It was just a little unclear how it would work specifically.

So, when you find yourself in the same situation, keep digging!

Python is an excellent programming language to do this sort of hands-on experimentation with: When I sat down to reply to Paul’s email, the first thing I did was to fire up a Python interpreter session for some explorative code golf.

You can’t really do this with a compiled language like C++. And it’s one of Python’s great features that you should use to your advantage.

That “ooh, nice!” code review comment is waiting for you.

Working With File I/O in Python

Tue, 16 Jan 2018 00:00:00 GMT

Learn the basics of working with files in Python. How to read from files, how to write data to them, what file seeks are, and why files should be closed.

In this tutorial you’ll learn how to work with files using Python.

Reading and writing to files in any programming language is an important feature. Without it, all variables and information are stored on volatile memory that is lost when the computer is shut down or the program ends. When you save data to a permanent file, you can retrieve it at a later date without worry.

Here’s what we’ll cover:

The difference between binary and text files
Where to find Python’s built-in file I/O functions and tools
How to open and close files in Python
The various ways to read data from a file in Python
How to write data to a file object in Python
File seeks in Python and moving the read/write pointer
Editing an existing text file with Python

Let’s get started!

Binary vs Text Files in Python

There are two separate types of files that Python handles: binary and text files. Knowing the difference between the two is important because of how they are handled.

Most files that you use during your normal computer use are actually binary files, not text. That’s right, that Microsoft Word .doc file is actually a binary file, even if it just has text in it. Other examples of binary files include:

Image files including .jpg, .png, .bmp, .gif, etc.
Database files including .mdb, .frm, and .sqlite
Documents including .doc, .xls, .pdf, and others.

That’s because these files all have requirements for special handling and require a specific type of software to open them. For example, you need Excel to open an .xls file, and a database program to open a .sqlite file.

A text file, on the other hand, has no application-specific format and can be opened by a standard text editor without any special handling. Still, every text file must adhere to a set of rules:

Text files have to be readable as is. They can (and often do) contain a lot of special encoding, especially in HTML or other markup languages, but you’ll still be able to tell what it says.
Data in a text file is organized by lines. In most cases, each line is a distinct element, whether it’s a line of instruction or a command.

Additionally, text files all have an unseen character at the end of each line which lets the text editor know that there should be a new line. When interacting with these files through programming, you can take advantage of that character. In Python, it is denoted by “\n”.

Where to Find Python’s File I/O Tools

When working in Python, you don’t have to worry about importing any specific external libraries to work with files. Python comes with “batteries included” and the file I/O tools and utilities are a built-in part of the core language.

In other languages like C++, to work with files you have to enable the file I/O tools by including the correct header file, for example #include <fstream>. And if you are coding in Java, you need the import java.io.* statement.

With Python, this isn’t necessary—

Instead, Python has a built-in set of functions that handle everything you need to read and write to files. We’ll now take a closer look at them.

Opening a File in Python

The first function that you need to know is open(). In both Python 2 and Python 3, this command will return a file object as specified in the parameters. The basic function usage for open() is the following:

file_object = open(filename, mode)

In this instance, filename is the name of the file that you want to interact with, with the file extension included. That is, if you have a text file that is workData.txt, your filename is not just "workData". It’s "workData.txt".

You can also specify the exact path that the file is located at, such as “C:\ThisFolder\workData.txt”, if you’re using Windows.

Remember, however, that a single backslash in a string indicates to Python the beginning of an escape sequence (such as \n for a newline). So there’s a problem here, because these two meanings will conflict…

Thankfully, Python has two ways to deal with this. The first is to use double backslashes like so: "C:\\ThisFolder\\workData.txt". The second is to use forward slashes: "C:/ThisFolder/workData.txt".
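To make the conflict concrete, here is a small sketch. Besides the two options above, it also uses a raw string literal and pathlib.PureWindowsPath, which are two further standard Python features for the same job. The path itself is just the example path from the text.

```python
from pathlib import PureWindowsPath

# "\n" inside a normal string is a newline, so this string has
# 5 characters (C, :, newline, e, w), not the 6 typed in the source.
print(len("C:\new"))                      # 5

escaped = "C:\\ThisFolder\\workData.txt"  # doubled backslashes
raw = r"C:\ThisFolder\workData.txt"       # raw string literal
print(escaped == raw)                     # True

# PureWindowsPath only manipulates the text of a path, so this runs
# on any operating system.
path = PureWindowsPath("C:/ThisFolder/workData.txt")
print(path.name)                          # workData.txt
print(str(path) == raw)                   # True
```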

The mode in the open function tells Python what you want to do with the file. There are multiple modes that you can specify when dealing with text files.

'w' – Write Mode: This mode is used when the file needs to be altered and information changed or added. Keep in mind that this erases the existing file to create a new one. File pointer is placed at the beginning of the file.
'r' – Read Mode: This mode is used when the information in the file is only meant to be read and not changed in any way. File pointer is placed at the beginning of the file.
'a' – Append Mode: This mode adds information to the end of the file automatically. File pointer is placed at the end of the file.
'r+' – Read/Write Mode: This is used when you will be making changes to the file and reading information from it. The file pointer is placed at the beginning of the file.
'a+' – Append and Read Mode: A file is opened to allow data to be added to the end of the file and lets your program read information as well. File pointer is placed at the end of the file.

When you are using binary files, you will use the same mode specifiers. However, you add a b to the end. So a write mode specifier for a binary file is 'wb'. The others are 'rb', 'ab', 'r+b', and 'a+b' respectively.

In Python 3, there is one new mode that was added:

'x' – Exclusive Creation Mode: This mode is used exclusively to create a file. If a file of the same name already exists, the function call will fail.

Let’s go through an example of how to open a file and set the access mode.

When using the open() function, you’d typically assign its result to a variable. Given a file named workData.txt, the proper code to open the file for reading and writing would be the following:

data_file = open("workData.txt", "r+")

This creates an object called data_file that we can then manipulate using Python’s file object methods.

We used the 'r+' access mode in this code example which tells Python that we want to open the file for reading and writing. This gives us a lot of flexibility, but often you might want to restrict your program to just reading or just writing to a file and this is where the other modes come in handy.
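The differences between the modes are easiest to see side by side. The following sketch works in a temporary directory so that it does not touch any real files; the file name modes_demo.txt is made up for the example. It uses the with statement to make sure each file is closed before the next step runs.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

    with open(path, "w") as f:      # 'w' creates the file
        f.write("first\n")
    with open(path, "a") as f:      # 'a' adds to the end
        f.write("second\n")
    with open(path, "r") as f:
        after_append = f.read()
    print(repr(after_append))       # 'first\nsecond\n'

    with open(path, "w") as f:      # 'w' again erases the old contents
        f.write("replaced\n")
    with open(path, "r") as f:
        after_write = f.read()
    print(repr(after_write))        # 'replaced\n'

    # 'x' refuses to open a file that already exists.
    try:
        open(path, "x")
    except FileExistsError:
        exclusive_failed = True
    else:
        exclusive_failed = False
    print(exclusive_failed)         # True
```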

Closing a File in Python

Knowing how to close a file is important when you’re reading and writing.

It frees up system resources that your program is using for I/O purposes. When writing a program that has space or memory constraints, this lets you manage your resources effectively.

Also, closing a file ensures that any pending data is written out to the underlying storage system, for example, your local disk drive. By explicitly closing the file you ensure that any buffered data held in memory is flushed out and written to the file.

The function to close a file in Python is simply fileobject.close(). Using the data_file file object that we created in the previous example, the command to close it would be:

data_file.close()

After you close a file, you can’t access it any longer until you reopen it at a later date. Attempting to read from or write to a closed file object will throw a ValueError exception:

>>> f = open("/tmp/myfile.txt", "w")
>>> f.close()
>>> f.read()
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    f.read()
ValueError: I/O operation on closed file.

In Python, the best practice for opening and closing files uses the with keyword. This keyword closes the file automatically after the nested code block completes:

with open("workData.txt", "r+") as workData:
    # File object is now open.
    # Do stuff with the file:
    workData.read()

# File object is now closed.
# Do other things...

If you use neither the with keyword nor the fileobject.close() function, then Python will automatically close and destroy the file object through the built-in garbage collector. However, depending on your code, this garbage collection can happen at any time.

So it’s recommended to use the with keyword in order to control when the file will be closed—namely after the inner code block finishes executing.
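You can watch this happen through the closed attribute of the file object. The sketch below uses a temporary directory so it is safe to run anywhere, and it also shows the try/finally form that the with statement roughly corresponds to. The simulated RuntimeError is only there to demonstrate that the file is closed even when the block fails.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "workData.txt")

    with open(path, "w") as work_data:
        work_data.write("This data is on line 1\n")
        open_inside = work_data.closed      # False: still inside the block
    closed_after = work_data.closed         # True: the block has finished
    print(open_inside, closed_after)        # False True

    # Roughly the same thing written by hand with try/finally.
    work_data = open(path, "r")
    try:
        first_line = work_data.readline()
    finally:
        work_data.close()
    print(work_data.closed)                 # True

    # The file is closed even if the block raises an exception.
    try:
        with open(path, "r") as work_data:
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass
    closed_after_error = work_data.closed
    print(closed_after_error)               # True
```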

Working With Python File Objects

Once you’ve successfully opened a file, you can use built-in methods to deal with the new file object. You can read data from it, or write new data to it. There are also other operations like moving the “read/write pointer”, which determines where in the file data is read from and where it is written to. We’ll take a look at that a little later in the tutorial.

Next up you’ll learn how to read data from a file you’ve opened:

Reading Data From a File in Python

Reading a file’s contents uses the fileobject.read(size) method. By default, this method will read the entire file and return it as either a string (in text mode) or a bytes object (in binary mode).

You have to be careful when using the default size, however. If the file you’re reading is larger than your available memory, you won’t be able to access the entire file all at once. In a case like this, you need to use the size parameter to break it up into chunks your memory can handle.

The size parameter tells the read method how many characters (in text mode) or bytes (in binary mode) to read from the file and return. So let’s assume that our “workData.txt” file has the following text in it:

This data is on line 1
This data is on line 2
This data is on line 3

Then if you wrote the following program in Python 3:

with open("workData.txt", "r+") as work_data:
    print("This is the file name: ", work_data.name)
    line = work_data.read()
    print(line)

You’ll get this output:

This is the file name:  workData.txt
This data is on line 1
This data is on line 2
This data is on line 3

On the other hand, if you tweak the third line to say:

line = work_data.read(6)

You’ll get the following output:

This is the file name:  workData.txt
This d

As you can see, the read operation only read the data in the file up to position 6, which is what we passed to the read() call above. That way you can limit how much data is read from a file in one go.

If you read from the same file object again, it will continue reading data where you left off. That way you can process a large file in several smaller “chunks.”
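A minimal sketch of that chunked approach is shown below. The helper name read_in_chunks and the chunk size of 10 are arbitrary choices for the example, and io.StringIO stands in for a real text file so the snippet runs without touching the disk.

```python
import io


def read_in_chunks(file_object, chunk_size=10):
    """Yield successive pieces of at most chunk_size characters."""
    while True:
        chunk = file_object.read(chunk_size)
        if not chunk:       # read() returns an empty string at end of file
            break
        yield chunk


# io.StringIO stands in for a text file opened with open(..., "r").
sample = io.StringIO("This data is on line 1\nThis data is on line 2\n")
chunks = list(read_in_chunks(sample, chunk_size=10))
print(repr(chunks[0]))      # 'This data '
print(len(chunks))          # 5
```

Each call to read(chunk_size) picks up where the previous one stopped, so only one chunk needs to be held in memory at a time.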

Reading Text Files Line-by-Line With `readline()`

You can also parse data in a file by reading it line by line. This can let you scan an entire file line by line, advancing only when you want to, or let you see a specific line.

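The fileobject.readline(size) method returns the next line of the file, starting from the current position of the file pointer, so the first call on a freshly opened file returns the first line. The optional integer size parameter limits how many characters are read from that line; it does not select a particular line.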

For example:

with open("workData.txt", "r+") as work_data:
     print("This is the file name: ", work_data.name)
     line_data = work_data.readline()
     print(line_data)

This would return the output of:

This is the file name:  workData.txt
This data is on line 1

You can call readline() repeatedly to read additional lines of text from the file.

A similar method is the fileobject.readlines() call (notice the plural), which returns a list of all lines in the file. If you did a call of:

print(work_data.readlines())

You would get the following output:

['This data is on line 1\n', 'This data is on line 2\n', 'This data is on line 3\n']

As you can see, this reads the whole file into memory and splits it up into several lines, each of which keeps its trailing newline character. This is mostly useful for text files. A binary file is just a blob of data; it doesn’t really have a concept of what a single line is.
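The following sketch shows how readline(), its size argument, and readlines() interact with the file pointer. It uses io.StringIO as a stand-in for the opened “workData.txt” file so that it can be run on its own.

```python
import io

text = ("This data is on line 1\n"
        "This data is on line 2\n"
        "This data is on line 3\n")
work_data = io.StringIO(text)   # stand-in for open("workData.txt", "r")

part = work_data.readline(6)    # at most 6 characters of the current line
rest = work_data.readline()     # the remainder of that same line
second = work_data.readline()   # the next full line
print(repr(part))               # 'This d'
print(repr(rest))               # 'ata is on line 1\n'
print(repr(second))             # 'This data is on line 2\n'

# readlines() returns whatever is left, newline characters included.
remaining = work_data.readlines()
print(remaining)                # ['This data is on line 3\n']

# Strip the newlines if you only want the text of each line.
stripped = [line.rstrip("\n") for line in remaining]
print(stripped)                 # ['This data is on line 3']
```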

Processing an Entire Text File Line-By-Line

The easiest way to process an entire text file line-by-line in Python is by using a simple loop:

with open("workData.txt", "r+") as work_data:
    for line in work_data:
        # Each line already ends with "\n", so suppress print()'s own newline
        print(line, end="")

This has the following output:

This data is on line 1
This data is on line 2
This data is on line 3

This approach is very memory-efficient, because we’ll be reading and processing each line individually. This means our program never needs to read the whole file into memory at once. Thus, iterating over the file object is a comfortable and efficient way to process a big text file in smaller chunks.

Writing to a File With Python Using `write()`

Files wouldn’t be any good if you couldn’t write data to them. So let’s discuss that.

Remember that when you open a file in a write or append mode, Python will create the file if one doesn’t already exist. When creating a file for the first time, you should use either the a+ or w+ mode.

Often it’s preferable to use the a+ mode because the data will default to be added to the end of the file. Using w+ will clear out any existing data in the file and give you a “blank slate” to start from.

The default method of writing to a file in Python is using fileobject.write(data). For example, you could add a new line to our “workData.txt” file by using the following code:

work_data.write("This data is on line 4\n")

The \n acts as the new line indicator, moving subsequent writes to the next line.

If you want to write something that isn’t a string to a text file, such as a series of numbers, you have to convert or “cast” them to strings, using conversion code.

For example, if you wanted to add the integers 1234, 5678, 9012 to the work_data file, you’d do the following. First, you cast your non-strings as a string, then you write that string to your file object:

values = [1234, 5678, 9012]

with open("workData.txt", "a+") as work_data:
    for value in values:
        str_value = str(value)
        work_data.write(str_value)
        work_data.write("\n")
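Two other standard ways to do the same conversion are sketched below: print() with its file argument, which converts each value with str() and adds the newline itself, and writelines(), which writes a sequence of strings without adding anything. The sketch uses io.StringIO as a stand-in for the opened file so it runs without creating workData.txt.

```python
import io

values = [1234, 5678, 9012]
work_data = io.StringIO()   # stand-in for open("workData.txt", "a+")

# write() only accepts strings in text mode; an int is rejected.
try:
    work_data.write(values[0])
except TypeError:
    write_rejected_int = True
else:
    write_rejected_int = False
print(write_rejected_int)   # True

# print() converts each value with str() and appends "\n" for us.
for value in values:
    print(value, file=work_data)

# writelines() adds no newlines, so we supply them ourselves.
work_data.writelines(str(value) + "\n" for value in values)

written = work_data.getvalue()
print(repr(written))        # '1234\n5678\n9012\n1234\n5678\n9012\n'
```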

File Seeks: Moving the Read/Write Pointer

Remember that when you write using the a+ mode, your file pointer is always going to be at the end of the file. So taking the above code where we’ve written the three numbers, if you use the fileobject.read() method, you’re not going to get anything in return. That’s because that method looks after the pointer to find additional text, and there is none.

What you need to do then, is move the pointer back to the beginning of the file. The easiest way to do this is to use the fileobject.seek(offset, from_what) method. In this method, you put the pointer at a specific spot.

The offset is the number of characters from the from_what parameter. The from_what parameter has three possible values:

0 – indicates the beginning of the file
1 – indicates the current pointer position
2 – indicates the end of the file

When you’re working with text files (those that have been opened without a b in the mode), you can only use the default 0, or a seek(0, 2), which will take you to the end of the file.

So by using work_data.seek(3, 0) on our “workData.txt” file, you will place the pointer at the 4th character (remember that Python starts counts at 0). If you use the line print loop, you would then get an output of:

s data is on line 1
This data is on line 2
This data is on line 3

If you want to check the current position of the pointer, you can use the fileobject.tell() method, which returns an integer for where the pointer is in the current file. If we want to find how long our current work_data file is, we can use the following code:

with open("workData.txt", "a+") as work_data:
    print(work_data.tell())

This will give a return value of 69, which is the size of the file.

Editing an Existing Text File with Python

There will come a time when you need to edit an existing file rather than just append data to it. You can’t just use w+ mode to do it. Remember that mode w will completely overwrite the file, so even with using fileobject.seek(), you won’t be able to do it. And a+ will always insert any data at the end of the file.

The easiest way to do it involves pulling the entire file out and creating a list or array data type with it. Once the list is created, you can use the list.insert(i, x) method to insert your new data. Once the new list is created, you can then join it back together and write it back to your file.

Remember that for list.insert(i, x), i is an integer index into the list. The data of x is then placed before the item that currently sits at index i.

For example, using our “workData.txt” file, let’s say we needed to insert the text line, “This goes between line 1 and 2” in between the first and second lines. The code to do it is:

# Open the file as read-only
with open("workData.txt", "r") as work_data:
    work_data_contents = work_data.readlines()

work_data_contents.insert(1, "This goes between line 1 and 2\n")

# Re-open in write-only format to overwrite old file
with open("workData.txt", "w") as work_data:
    # write() expects a single string, so join the list of lines first
    work_data.write("".join(work_data_contents))

Once this code runs, if you do the following:

with open("workData.txt", "r") as work_data:
    for line in work_data:
        print(line, end="")  # each line already ends with a newline

You’ll get an output of:

This data is on line 1
This goes between line 1 and 2
This data is on line 2
This data is on line 3

This demonstrated how to edit an existing text file in Python, inserting a new line of text at exactly the place you wanted.
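The read, insert, and rewrite steps can also be wrapped in a small helper. The function name insert_line and the temporary file below are assumptions made for illustration, not part of the original example.

```python
import os
import tempfile


def insert_line(path, index, text):
    """Insert `text` as a new line before the line at `index` (0-based)."""
    with open(path, "r") as source:
        lines = source.readlines()

    # Make sure the inserted text ends with a newline like its neighbours.
    lines.insert(index, text if text.endswith("\n") else text + "\n")

    # Opening with "w" truncates the file, then every line is written back.
    with open(path, "w") as target:
        target.writelines(lines)


# Hypothetical scratch file standing in for workData.txt.
path = os.path.join(tempfile.mkdtemp(), "workData.txt")
with open(path, "w") as work_data:
    work_data.write("This data is on line 1\nThis data is on line 2\n")

insert_line(path, 1, "This goes between line 1 and 2")

with open(path) as work_data:
    result = work_data.read().splitlines()

print(result)
```

Because the whole file is held in memory, this approach is best suited to files of modest size.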

Python File I/O – Additional Resources

In this tutorial you learned the basics of file handling in Python. Here’s the range of topics we covered:

The difference between binary and text files
Where to find Python’s built-in file I/O functions and tools
How to open and close files in Python
The various ways to read data from a file in Python
How to write data to a file object in Python
File seeks in Python and moving the read/write pointer
Editing an existing text file with Python

But really, we’ve only scratched the surface here. As with anything programming-related, there’s lots more to learn…

So I wanted to give you a few additional resources you can use to deepen your Python file-handling skills:


How to Reverse a String in Python

Tue, 09 Jan 2018 00:00:00 GMT

How to Reverse a String in Python

An overview of the three main ways to reverse a Python string: “slicing”, reverse iteration, and the classic in-place reversal algorithm. Also includes performance benchmarks.

What’s the best way to reverse a string in Python? Granted, string reversal isn’t used all that often in day-to-day programming, but it’s a popular interview question:

# You have this:
'TURBO'

# And you want that:
'OBRUT'

One variation of this question is to write a function that checks whether a given string is a palindrome, that is, whether or not it reads the same forwards and backwards:

def is_palindrome(string):
    reversed_string = # ???
    return string == reversed_string

>>> is_palindrome('TACOCAT')
True
>>> is_palindrome('TURBO')
False

Clearly, we’ll need to figure out how to reverse a string to implement this is_palindrome function in Python…so how do you do it?

Python’s str string objects have no built-in .reverse() method like you might expect if you’re coming to Python from a different language, like Java or C#—the following approach will fail:

>>> 'TURBO'.reverse()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'reverse'

In this tutorial you’ll learn the three major ways to reverse strings in Python:

Option 1: Reversing a Python String With the “[::-1]” Slicing Trick

Strings follow the sequence protocol in Python. And all sequences support an interesting feature called slicing. You can view slicing as an extension of the square-brackets indexing syntax.

It includes a special case where slicing a sequence with “[::-1]” produces a reversed copy. Because Python strings are sequences this is a quick and easy way to get a reversed copy of a string:

>>> 'TURBO'[::-1]
'OBRUT'

Of course, you can wrap this (slightly unwieldy) slicing expression into a named function to make it more obvious what the code does:

def reverse_string1(s):
    """Return a reversed copy of `s`"""
    return s[::-1]

>>> reverse_string1('TURBO')
'OBRUT'

So, how do you like this solution?

It’s short and sweet—but, in my mind, the biggest downside to reversing a string with the slicing syntax is that it uses an advanced Python feature that some developers would say is “arcane.”

I don’t blame them—list slicing can be difficult to understand the first time you encounter its quirky and terse syntax.

When I’m reading Python code that makes use of slicing I often have to slow down and concentrate to “mentally parse” the statement, to make sure I understand what’s going on.

My biggest gripe here is that the “[::-1]” slicing syntax doesn’t communicate clearly enough that it creates a reversed copy of the original string.

For this reason I feel like using Python’s slicing feature to reverse a string is a decent solution, but it can be difficult to read for the uninitiated.
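With slicing in hand, the is_palindrome function from the introduction can be completed. This is one possible way to fill in the placeholder, not the only one:

```python
def is_palindrome(string):
    """Return True if `string` reads the same forwards and backwards."""
    reversed_string = string[::-1]  # reversed copy via slicing
    return string == reversed_string


print(is_palindrome('TACOCAT'))  # True
print(is_palindrome('TURBO'))    # False
```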

[ You can learn more about slicing in this article. ]

Moving on…

Option 2: Reversing a Python String Using reversed() and str.join()

Reversing a string using reverse iteration with the reversed() built-in is another option. You get a reverse iterator you can use to cycle through the elements in the string in reverse order:

>>> for elem in reversed('TURBO'):
...     print(elem)
O
B
R
U
T

Using reversed() does not modify the original string (which wouldn’t work anyway, as strings are immutable in Python). What happens is that you get a “view” into the existing string you can use to look at all the elements in reverse order.

This is a powerful technique that takes advantage of Python’s iterator protocol.

So far, all you saw was how to iterate over the characters of a string in reversed order. But how can you use this technique to create a reversed copy of a Python string with the reversed() function?

Here’s how:

>>> ''.join(reversed('TURBO'))
'OBRUT'

This code snippet used the .join() method to merge all of the characters resulting from the reversed iteration into a new string. Pretty neat, eh?

Of course, you can once again extract this code into a separate function to create a proper “reverse string” function in Python. Here’s how:

def reverse_string2(s):
    """Return a reversed copy of `s`"""
    return "".join(reversed(s))

>>> reverse_string2('TURBO')
'OBRUT'

I really like this reverse iterator approach for reversing strings in Python.

It communicates clearly what is going on, and even someone new to the language would intuitively understand that I’m creating a reversed copy of the original string.

And while understanding how iterators work at a deeper level is helpful, it’s not absolutely necessary to use this technique.
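If you are curious about the iterator side of this, here is a small sketch of how the object returned by reversed() behaves. The variable names are only for illustration.

```python
letters = reversed('TURBO')

# The iterator hands out characters lazily, last one first.
first = next(letters)        # 'O'
rest = ''.join(letters)      # consumes what is left: 'BRUT'
leftover = ''.join(letters)  # the iterator is exhausted now: ''

# reversed() accepts any sequence, not only strings.
numbers = list(reversed([1, 2, 3]))

print(first, rest, repr(leftover), numbers)
```

The takeaway is that a reverse iterator can only be consumed once, which is why reverse_string2 creates a fresh one on every call.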

One more approach you should check out:

Option 3: The “Classic” In-Place String Reversal Algorithm Ported to Python

This is the “classic” textbook in-place string reversal algorithm, ported to Python. Because Python strings are immutable, you first need to convert the input string into a mutable list of characters, so you can perform the in-place character swap:

def reverse_string3(s):
    """Return a reversed copy of `s`"""
    chars = list(s)
    for i in range(len(s) // 2):
        tmp = chars[i]
        chars[i] = chars[len(s) - i - 1]
        chars[len(s) - i - 1] = tmp
    return ''.join(chars)

>>> reverse_string3('TURBO')
'OBRUT'

As you can tell, this solution is quite unpythonic and not very idiomatic at all. It doesn’t play to Python’s strengths and it’s basically a straight port of a C algorithm.

And if that wasn’t enough—it’s also the slowest solution, as you’ll see in the next section where I’ll do some benchmarking on these three implementations.

Performance Comparison

After implementing the string reversal approaches I showed you in this tutorial I became curious about what their relative performance would be.

So I set out to do some benchmarking:

>>> import timeit
>>> s = 'abcdefghijklmnopqrstuvwxyz' * 10

>>> timeit.repeat(lambda: reverse_string1(s))
[0.6848115339962533, 0.7366074129968183, 0.7358982900041156]

>>> timeit.repeat(lambda: reverse_string2(s))
[5.514941683999496, 5.339547180992668, 5.319950777004124]

>>> timeit.repeat(lambda: reverse_string3(s))
[48.74324739299482, 48.637329410004895, 49.223478018000606]

Well, that’s interesting…Here are the results in table form:

Algorithm	Execution Time	Slowdown
Slicing	0.72s	1x
reversed + join	5.39s	7.5x
Classic / In-Place	48.87s	67.9x

As you can see, there’s a massive performance gap between these three implementations.

Slicing is the fastest approach, reversed() is roughly 7.5x slower than slicing, and the “classic” in-place algorithm is a whopping 68x slower in this benchmark!

Now, the in-place swap could definitely be optimized (leave a comment below with your improved solution if you want)—but this performance comparison still gives us a decent idea of which string reversal operation is the fastest in Python.
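As one possible starting point for such an optimization, the swap can be written with two indices and tuple unpacking instead of a temporary variable and repeated len() calls. The name reverse_string4 is hypothetical, and this variant was not part of the benchmark above, so no speed claim is attached to it.

```python
def reverse_string4(s):
    """Return a reversed copy of `s` using a two-index swap."""
    chars = list(s)
    left, right = 0, len(chars) - 1
    while left < right:
        # Swap the outermost pair, then move both indices inwards.
        chars[left], chars[right] = chars[right], chars[left]
        left += 1
        right -= 1
    return ''.join(chars)


print(reverse_string4('TURBO'))
```

You could time it with timeit.repeat in the same way as the other three functions to see where it lands on your machine.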

Summary: Reversing Strings in Python

String reversal is a standard operation in programming (and in coding interviews.) In this tutorial you learned about three different approaches for reversing a string in Python.

Let’s do a quick recap on each before I’ll give you my recommendation on which option I personally prefer:

Option 1: List Slicing Trick

You can use Python’s slicing syntax to create a reversed copy of a string. This works well, however it is slightly arcane and therefore not very Pythonic, in my opinion.

>>> 'TURBO'[::-1]
'OBRUT'

Creates a reversed copy of the string
This is the fastest way to reverse a string in Python

Option 2: reversed() and str.join()

The built-in reversed() function allows you to create a reverse iterator for a Python string (or any sequence object.) This is a flexible and clean solution that relies on some advanced Python features—but it remains readable due to the clear naming of the reversed() function.

>>> ''.join(reversed('TURBO'))
'OBRUT'

reversed() returns an iterator that iterates over the characters in the string in reverse order
This character stream needs to be combined into a string again with the str.join() function
This is slower than slicing, but arguably more readable

Option 3: “Roll Your Own” String Reversal

Taking the standard “in-place” character swap algorithm and porting it to Python works, but it offers inferior performance and readability compared to the other options.

def reverse_string(s):
    """Return a reversed copy of `s`"""
    chars = list(s)
    for i in range(len(s) // 2):
        tmp = chars[i]
        chars[i] = chars[len(s) - i - 1]
        chars[len(s) - i - 1] = tmp
    return ''.join(chars)

>>> reverse_string('TURBO')
'OBRUT'

This is vastly slower than slicing or reverse iteration (depending on the implementation)
Rolling your own string reversal algorithm is not recommended, unless you’re in a coding interview situation

If you’re wondering what the best way is to reverse a string in Python my answer is: “It depends.” Personally, I like the reversed() approach because it’s “self-documenting” and reasonably fast.

However, there’s an argument to be made that the slicing approach, which was roughly 7.5 times faster in my benchmark, should be used for reasons of performance…

Depending on your use case, this might well be true. And yet, this is also the perfect time for me to whip out Donald Knuth’s immortal quote:

“Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered.

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

Yet we should not pass up our opportunities in that critical 3%.”

— Donald Knuth (Source)

For this reason, I wouldn’t start worrying about string reversal performance in your programs—unless it’s an integral part of what your software does. If you’re reversing millions of strings in a tight loop, by all means, do optimize for speed.

But for the average Python application it won’t make a perceptible difference. So I would go with the most readable (and therefore maintainable) approach.

In my book this is option 2: reversed() + join().

If you’d like to dig deeper into the subject, be sure to watch my YouTube tutorial on list reversal in Python. Also, you’re welcome to leave a comment below to let me know about your favorite string reversal techniques. Happy Pythoning!


Mastering Click: Writing Advanced Python Command-Line Apps

Tue, 02 Jan 2018 00:00:00 GMT

Mastering Click: Writing Advanced Python Command-Line Apps

How to improve your existing Click Python CLIs with advanced features like sub-commands, user input, parameter types, contexts, and more.

Welcome to the second Click tutorial on how to improve your command-line tools and Python scripts. I’ll show you some more advanced features that help you when things are getting a bit more complex and feature-rich in your scripts.

You might wonder why I suggest using Click over argparse or optparse. I don’t think they are bad tools, they both have their place and being part of the standard library gives them a great advantage. However, I do think that Click is much more intuitive and requires less boilerplate code to write clean and easy-to-use command-line clients.

I go into more detail about that in the first tutorial and give you a comprehensive introduction to Click as well. I recommend you take a look at it if this is the first time you’ve heard the name “Click”, so you know the basics. I’ll wait here for you.

Now that we are all starting from a similar knowledge level, let’s grab a cup of tea, glass of water or whatever it is that makes you a happy coder and learner ✨. And then we’ll dive into discovering:

how you can read parameter values from environment variables,
we’ll then separate functionality into multiple sub-commands
and get the user to provide some input data on the command-line.
We’ll learn what parameter types are and how you can use them
and we’ll look at contexts in Click to share data between commands.

Sounds great? Let’s get right to it then.

Building on our existing Python command-line app

We’ll continue building on top of the example that I introduced in the previous tutorial. Together, we built a simple command-line tool that interacted with the OpenWeatherMap API.

It would print the current weather for a location provided as an argument. Here’s an example:

$ python cli.py --api-key <your-api-key> London
The weather in London right now: light intensity drizzle.

You can see the full source code on Github. As a little reminder, here’s what our final command-line tool looked like:

@click.command()
@click.argument('location')
@click.option(
    '--api-key', '-a',
    help='your API key for the OpenWeatherMap API',
)
def main(location, api_key):
    """
    A little weather tool that shows you the current weather in a LOCATION of
    your choice. Provide the city name and optionally a two-digit country code.
    Here are two examples:
    1. London,UK
    2. Canmore
    You need a valid API key from OpenWeatherMap for the tool to work. You can
    sign up for a free account at https://openweathermap.org/appid.
    """
    weather = current_weather(location, api_key)
    print(f"The weather in {location} right now: {weather}.")


if __name__ == "__main__":
    main()

In this tutorial, we’ll extend the existing tool by adding functionality to store data in a configuration file. You’ll also learn multiple ways to validate user input in your Python command-line apps.

Storing the API key in an environment variable

In the example, we have to specify the API key every time we are calling the command-line tool to access the underlying Web API. That can be pretty annoying. Let’s consider a few options that we have to improve how our tool handles this.

One of the first things that comes to mind is storing the API key in an environment variable in a 12-factor style.

$ export API_KEY="your-api-key"

We can then extract the API key from that variable in Python using os.getenv. Try it out yourself:

>>> import os
>>> api_key = os.getenv("API_KEY")
>>> print(api_key)
your-api-key

This works totally fine but it means that we have to manually integrate it with the Click parameter that we already have. Luckily, Click already allows us to provide parameter values as environment variables. We can use envvar in our parameter declaration:

@click.option(
    '--api-key', '-a',
    envvar="API_KEY",
)

That’s all! Click will now use the value of the --api-key option if it is given and fall back to the API key stored in an environment variable called API_KEY otherwise. And since examples speak louder than words, here’s how you’d use the command with an environment variable:

$ export API_KEY="<your-api-key>"
$ python cli.py London
The weather in London right now: light intensity drizzle.

But you can still use the --api-key option with an API key as well:

$ python cli.py --api-key <your-api-key> London
The weather in London right now: light intensity drizzle.

You’re probably wondering about what happens when you have the environment variable defined and also add the option when running the weather tool. The answer is simple: the option beats the environment variable.

We have now simplified running our weather command by adding just a single line of code.
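If you’d like to verify the precedence yourself without touching your shell environment, Click ships a test helper, click.testing.CliRunner, that can invoke a command with a custom environment. The show_key command below is a hypothetical stand-in for the weather tool that only echoes the key it received.

```python
import click
from click.testing import CliRunner


@click.command()
@click.option("--api-key", "-a", envvar="API_KEY")
def show_key(api_key):
    """Hypothetical command that only echoes the key it received."""
    click.echo(f"key={api_key}")


runner = CliRunner()

# Only the environment variable is set, so Click falls back to it.
from_env = runner.invoke(show_key, [], env={"API_KEY": "from-env"})

# Both are set: the explicit command-line option takes precedence.
from_option = runner.invoke(
    show_key, ["--api-key", "from-option"], env={"API_KEY": "from-env"}
)

print(from_env.output.strip())
print(from_option.output.strip())
```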

Separating functionality into sub-commands

I am sure you agree that we can do better. If you’ve worked with a command-line tool like docker or heroku, you are familiar with how they manage a large set of functionality and handle user authentication.

Let’s take a look at the Heroku Toolbelt. It provides a --help option for more details:

$ heroku --help
Usage: heroku COMMAND

Help topics, type heroku help TOPIC for more details:

 access          manage user access to apps
 addons          tools and services for developing, extending, and operating your app
 apps            manage apps
 auth            heroku authentication
 authorizations  OAuth authorizations
 ... # there's more but we don't care for now

They use a mandatory argument as a new command (also called sub-command) that provides a specific functionality. For example heroku login will authenticate you and store a token in a configuration file if the login is successful.

Wouldn’t it be nice if we could do the same for our weather command? Well, we can! And you’ll see how easy it is as well.

We can use Click’s Commands and Groups to implement our own version of this. And trust me, it sounds more complicated than it actually is.

Let’s start with looking at our weather command and defining the command that we’d like to have. We’ll move the existing functionality into a command and name it current (for the current weather). We’d now run it like this:

$ python cli.py current London
The weather in London right now: light intensity drizzle.

So how can we do this? We start by creating a new entry point for our weather command and registering it as a group:

@click.group()
def main():
    pass

We have now turned our main function into a command group object that we can use to register new commands “below” it. What that means is, that we change our @click.command decorator to @main.command when wrapping our weather function. We’ll also have to rename the function from main to the name we want to give our command. What we end up with is this:

@main.command()
@click.argument('location')
@click.option(
    '--api-key', '-a',
    help='your API key for the OpenWeatherMap API',
)
def current(location, api_key):
    ...

And I’m sure you’ve already guessed it, this means we now run our command like this:

$ python cli.py current London
The weather in London right now: light intensity drizzle.

Storing the API key in a configuration file using another sub-command

The change we made above obviously doesn’t make sense on its own. What we wanted to add is a way to store an API key in a configuration file, using a separate command. I suggest we call it config and make it ask the user to enter their API key:

$ python cli.py config
Please enter your API key []: your-api-key

We’ll then store the key in a config file that we’ll put into the user’s home directory: e.g. $HOME/.weather.cfg for UNIX-based systems.

$ cat ~/.weather.cfg
your-api-key

We start with adding a new function to our Python module with the same name as our command and register it with our main command group:

@main.command()
def config():
    """
    Store configuration values in a file.
    """
    print("I handle the configuration.")

You can now run that new command and it will print the statement above.

$ python cli.py config
I handle the configuration.

Boom, we’ve now extended our weather tool with two separate commands:

$ python cli.py --help
<NEED CORRECT OUTPUT>

Asking the user for command-line input

We created a new command but it doesn’t do anything yet. What we need is the API key from the user, so we can store it in our config file. Let’s start using the --api-key option on our config command and write it to the configuration file.

@main.command()
@click.option(
    '--api-key', '-a',
    help='your API key for the OpenWeatherMap API',
)
def config(api_key):
    """
    Store configuration values in a file.
    """
    config_file = os.path.expanduser('~/.weather.cfg')

    with open(config_file, 'w') as cfg:
        cfg.write(api_key)

We are now storing the API key provided by the user in our config file. But how can we ask the user for their API key like I showed you above? By using the aptly named click.prompt.

@main.command()
@click.option(
    '--api-key', '-a',
    help='your API key for the OpenWeatherMap API',
)
def config(api_key):
    """
    Store configuration values in a file.
    """
    config_file = os.path.expanduser('~/.weather.cfg')

    api_key = click.prompt(
        "Please enter your API key",
        default=api_key
    )

    with open(config_file, 'w') as cfg:
        cfg.write(api_key)

Isn’t it amazing how simple that was? This is all we need to have our config command print out the question asking the user for their API key and receiving it as the value of api_key when the user hits [Enter].

We also continue to allow the --api-key option and use it as the default value for the prompt which means the user can simply hit [Enter] to confirm it:

$ python cli.py config --api-key your-api-key
Please enter your API key [your-api-key]:

That’s a lot of new functionality but the code required is minimal. I’m sure you agree that this is awesome!
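The other half of the story is reading the key back when the weather command needs it. The tutorial does not show that step here, so the helper below, read_api_key, is a hypothetical sketch using only the standard library; the temporary directory stands in for the user’s home directory.

```python
import os
import tempfile


def read_api_key(config_file):
    """Return the API key stored in `config_file`, or None if it is missing."""
    path = os.path.expanduser(config_file)
    if not os.path.exists(path):
        return None
    with open(path) as cfg:
        # strip() drops a trailing newline an editor may have added.
        return cfg.read().strip()


# Hypothetical scratch location instead of the real ~/.weather.cfg.
config_file = os.path.join(tempfile.mkdtemp(), ".weather.cfg")

missing = read_api_key(config_file)   # nothing stored yet

with open(config_file, "w") as cfg:
    cfg.write("your-api-key")

stored = read_api_key(config_file)    # key written above

print(missing, stored)
```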

Introducing Click’s parameter types

Until now, we’ve basically ignored what kind of input we receive from a user. By default, Click assumes a string and doesn’t really care about anything beyond that. That makes it simple but also means we can get a lot of 🚮.

You probably guessed it, Click also has a solution for that. Actually there are multiple ways of handling input but we’ll be looking at Parameter Types for now.

The name gives a pretty good clue at what it does: it allows us to define the type of our parameters. The most obvious ones are the builtin Python types such as str, int, float but Click also provides additional types: Path, File and more. The complete list is available in the section on Parameter Types.

Ensuring that an input value is of a specific type is as easy as you can make it. You simply pass the parameter type you’re expecting to the decorator as type argument when defining your parameter. Something like this:

@click.option('--api-key', '-a', type=str)
@click.option('--config-file', '-c', type=click.Path())

Looking at our API key, we expect a string of 32 hexadecimal characters. Take a moment to look at this Wikipedia article if that doesn’t mean anything to you or believe me when I say it means each character is a number between 0 and 9 or a letter between a and f.

There’s a parameter type for that, you ask? No, there is not. We’ll have to build our own. And like everything else, it’ll be super easy (I feel like a broken record by now 😇).

Building a custom parameter type to validate user input

What do we need to implement our own parameter type? We have to do two things: (1) we define a new Python class derived from click.ParamType and (2) implement its convert method. Classes and inheritance might be a new thing for you, so make sure you understand the benefits of using classes and are familiar with Object-Oriented Programming.

Back to implementing our own parameter type. Let’s call it ApiKey and start with the basic boilerplate:

class ApiKey(click.ParamType):

    def convert(self, value, param, ctx):
        return value

The only thing that should need some more explanation is the list of arguments expected by the convert method. Why are there three of them (in addition to self) and where do they come from?

When we use our ApiKey as the type for our parameter, Click will call the convert method on it and pass the user’s input as the value argument. param will contain the parameter that we declared using the click.option or click.argument decorators. And finally, ctx refers to the context of the command which is something that we’ll be talking about later in this tutorial.

The last thing to note is the return value. Click expects us to either return the cleaned and validated value for the parameter or raise an exception if the value is not valid. If we raise an exception, Click will automatically abort and tell the user that their value is not of the correct type. Sweet, right?

That’s been a lot of talk and no code, so let’s stop here, take a deep breath and look at the implementation.

import re

class ApiKey(click.ParamType):
    name = 'api-key'

    def convert(self, value, param, ctx):
        found = re.fullmatch(r'[0-9a-f]{32}', value)

        if not found:
            self.fail(
                f'{value} is not a 32-character hexadecimal string',
                param,
                ctx,
            )

        return value

You can see that we’re only interested in the value of our parameter. We use a regular expression to check for a string of 32 hexadecimal characters. I won’t go into details on regular expressions here but Al Sweigart does in this PyCon video.

Applying re.fullmatch returns a match object only if the entire value matches the pattern, and None otherwise. (re.match would only anchor at the start of the string, so a longer value that merely begins with 32 hexadecimal characters would slip through.) We check the result and either return the unchanged value or call the fail() method provided by Click to explain why the value is incorrect.
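The validation rule itself can be tried out without Click. The sketch below uses a hypothetical helper, is_valid_api_key, and also shows why the choice between re.match and re.fullmatch matters for a “32 characters exactly” check.

```python
import re

HEX_KEY = re.compile(r'[0-9a-f]{32}')


def is_valid_api_key(value):
    """Return True only if `value` is exactly 32 lowercase hex characters."""
    return HEX_KEY.fullmatch(value) is not None


good = 'a' * 32
too_long = 'a' * 33

# match() only anchors at the start, so the 33-character value still matches.
prefix_only = HEX_KEY.match(too_long) is not None

print(is_valid_api_key(good))      # True
print(is_valid_api_key(too_long))  # False
print(prefix_only)                 # True
```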

Almost done. All we have to do now is plug this new parameter type into our existing config command.

@main.command()
@click.option(
    '--api-key', '-a',
    type=ApiKey(),
    help='your API key for the OpenWeatherMap API',
)
def config(api_key):
    ...

And we are done! A user will now get an error if their API key is in the wrong format and we can put an end to those sleepless nights 🤣.

$ python cli.py config --api-key invalid
Usage: cli.py [OPTIONS] COMMAND [ARGS]...

Error: Invalid value for "--api-key" / "-a": invalid is not a 32-character hexadecimal string

I’ve thrown a lot of information at you. I have one more thing that I’d like to show you before we end this tutorial. But if you need a quick break, go get yourself a delicious beverage, hot or cold, and continue reading when you feel refreshed. I’ll go get myself a ☕️ and be right back…

Using the Click context to pass parameters between commands

Alright, welcome back 😉. You probably thought about the command we created, our new API key option and wondered if this means we actually have to define the option on both of our commands, config and current. And your assumption would be correct. Before your eyes pop out and you shout at me “Hell no! I like my code DRY!”, there’s a better way to do this. And if DRY doesn’t mean anything to you, check out this Wikipedia article on the “Don’t Repeat Yourself” principle.

How can we avoid defining the same option on both commands? We use a feature called the “Context”. Click executes every command within a context that carries the definition of the command as well as the input provided by the user. And it comes with a placeholder object called obj, that we can use to pass arbitrary data around between commands.

First let’s look at our group and how we can get access to the context of our main entrypoint:

@click.group()
@click.pass_context
def main(ctx):
    ctx.obj = {}

What we are doing here is telling Click that we want access to the context of the command (or group) and Click will pass it to our function as the first argument, I called it ctx. In the function itself, we can now set the obj attribute on the context to an empty dictionary that we can then fill with data. obj can also be an instance of a custom class that we implement but let’s keep it simple. You can imagine how flexible this is. The only thing you can’t do, is assign your data to anything but ctx.obj.

Now that we have access to the context, we can move our option --api-key to the main function and then store the API key in the context:

@click.group()
@click.option(
    '--api-key', '-a',
    type=ApiKey(),
    help='your API key for the OpenWeatherMap API',
)
@click.pass_context
def main(ctx, api_key):
    ctx.obj = {
        'api_key': api_key,
    }

I should mention that it doesn’t matter where you put the click.pass_context decorator, the context will always be the first argument. And with the API key stored in the context, we can now get access to it in both of our commands by adding the pass_context decorator as well:

@main.command()
@click.pass_context
def config(ctx):
    api_key = ctx.obj['api_key']
    ...

The only thing this changes for the user is that the --api-key option has to come before the config or current commands. Why? Because the option is now associated with the main entry point and not with the sub-commands:

$ python cli.py --api-key your-api-key current Canmore
The weather in Canmore right now: overcast clouds.

I think that’s a small price to pay for keeping our code DRY. And even if you disagree with me, you still learned how the Click context can be used for sharing data between commands; that’s all I wanted anyways 😇.
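To make the data flow concrete, here is a stripped-down sketch of the group and its two commands that can be run through click.testing.CliRunner. The command bodies only echo what they find in ctx.obj; they are placeholders rather than the real weather logic, and the ApiKey type is left out to keep the example short.

```python
import click
from click.testing import CliRunner


@click.group()
@click.option("--api-key", "-a")
@click.pass_context
def main(ctx, api_key):
    # Store the group-level option where sub-commands can find it.
    ctx.obj = {"api_key": api_key}


@main.command()
@click.argument("location")
@click.pass_context
def current(ctx, location):
    click.echo(f"{location} via {ctx.obj['api_key']}")


@main.command()
@click.pass_context
def config(ctx):
    click.echo(f"storing {ctx.obj['api_key']}")


runner = CliRunner()

# The option comes before the sub-command, so the group receives it.
ok_current = runner.invoke(main, ["--api-key", "abc", "current", "Canmore"])
ok_config = runner.invoke(main, ["--api-key", "abc", "config"])

# The option comes after the sub-command, which does not define it.
wrong_order = runner.invoke(main, ["current", "--api-key", "abc", "Canmore"])

print(ok_current.output.strip())
print(ok_config.output.strip())
print(wrong_order.exit_code)
```

The last invocation exits with a usage error because current itself has no --api-key option any more.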

Advanced Python CLIs with Click — Summary

Wow, we worked through a lot of topics. You should have an even better knowledge of Click and its features now. Specifically, we looked at:

How to read parameter values from environment variables.
How you can separate functionality into separate commands.
How to ask the user for input on the command-line.
What parameter types are in Click and how you can use them for input validation.
How Click contexts can help you share data between commands.

I am tempted to call you a Master of Click 🏆 with all of the knowledge you have now. At this point, there should be little that you don’t know how to do. So start playing around with what you learned and improve your own command-line tools. Then come back for another tutorial on testing and packaging of Click commands.

Full code example

import re
import os
import click
import requests

SAMPLE_API_KEY = 'b1b15e88fa797225412429c1c50c122a1'


class ApiKey(click.ParamType):
    name = 'api-key'

    def convert(self, value, param, ctx):
        found = re.match(r'[0-9a-f]{32}', value)

        if not found:
            self.fail(
                f'{value} is not a 32-character hexadecimal string',
                param,
                ctx,
            )

        return value


def current_weather(location, api_key=SAMPLE_API_KEY):
    url = 'https://api.openweathermap.org/data/2.5/weather'

    query_params = {
        'q': location,
        'appid': api_key,
    }

    response = requests.get(url, params=query_params)

    return response.json()['weather'][0]['description']


@click.group()
@click.option(
    '--api-key', '-a',
    type=ApiKey(),
    help='your API key for the OpenWeatherMap API',
)
@click.option(
    '--config-file', '-c',
    type=click.Path(),
    default='~/.weather.cfg',
)
@click.pass_context
def main(ctx, api_key, config_file):
    """
    A little weather tool that shows you the current weather in a LOCATION of
    your choice. Provide the city name and optionally a two-digit country code.
    Here are two examples:
    1. London,UK
    2. Canmore
    You need a valid API key from OpenWeatherMap for the tool to work. You can
    sign up for a free account at https://openweathermap.org/appid.
    """
    filename = os.path.expanduser(config_file)

    if not api_key and os.path.exists(filename):
        with open(filename) as cfg:
            api_key = cfg.read()

    ctx.obj = {
        'api_key': api_key,
        'config_file': filename,
    }


@main.command()
@click.pass_context
def config(ctx):
    """
    Store configuration values in a file, e.g. the API key for OpenWeatherMap.
    """
    config_file = ctx.obj['config_file']

    api_key = click.prompt(
        "Please enter your API key",
        default=ctx.obj.get('api_key', '')
    )

    with open(config_file, 'w') as cfg:
        cfg.write(api_key)


@main.command()
@click.argument('location')
@click.pass_context
def current(ctx, location):
    """
    Show the current weather for a location using OpenWeatherMap data.
    """
    api_key = ctx.obj['api_key']

    weather = current_weather(location, api_key)
    print(f"The weather in {location} right now: {weather}.")


if __name__ == "__main__":
    main()
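
One small caveat about ApiKey.convert in the listing above: re.match only anchors the pattern at the start of the string, so any value that begins with 32 hexadecimal characters passes, even if more characters follow. If you want the check to line up exactly with the error message, re.fullmatch is one option. Here is a minimal sketch using only the standard library. The helper name is_strict_api_key and the key values are made up for illustration:

```python
import re

HEX_32 = r'[0-9a-f]{32}'


def is_strict_api_key(value):
    """Hypothetical helper: True only for exactly 32 lowercase hex characters."""
    return re.fullmatch(HEX_32, value) is not None


good = 'a' * 32             # made-up key with the right length
too_long = 'a' * 32 + 'zz'  # same prefix followed by extra characters

# re.match only checks the beginning of the string, so both values pass
loose_results = [bool(re.match(HEX_32, value)) for value in (good, too_long)]

# re.fullmatch requires the whole string to match the pattern
strict_results = [is_strict_api_key(value) for value in (good, too_long)]
```

Whether you want the stricter behaviour depends on the keys your API hands out, so treat this as an option rather than a required fix.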

Working with Random Numbers in Python

Tue, 26 Dec 2017 00:00:00 GMT

An overview for working with randomness in Python, using only functionality built into the standard library and CPython itself.

Generating Random Floats Between 0.0 and 1.0

The random.random() function returns a random float in the interval [0.0, 1.0). This means the returned random number will always be smaller than the right-hand endpoint (1.0). This is also known as a semi-open range:

>>> import random
>>> random.random()
0.11981376476232541
>>> random.random()
0.757859420322092
>>> random.random()
0.7384012347073081

Generating Random Ints Between `x` and `y`

This is how you can generate a random integer between two endpoints in Python with the random.randint() function. This spans the full [x, y] interval and may include both endpoints:

>>> import random
>>> random.randint(1, 10)
10
>>> random.randint(1, 10)
3
>>> random.randint(1, 10)
7

With the random.randrange() function you can exclude the right-hand side of the interval, meaning the generated number always lies within [x, y) and it will always be smaller than the right endpoint:

>>> import random
>>> random.randrange(1, 10)
5
>>> random.randrange(1, 10)
3
>>> random.randrange(1, 10)
4
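
If you’d like to convince yourself of the difference between the two functions, you can draw many numbers from a tiny range and collect the distinct values. This is just a quick sketch. The seed value and the number of draws are arbitrary choices of mine:

```python
import random

random.seed(42)  # arbitrary fixed seed so repeated runs behave the same

# Collect the distinct values each function produces for the endpoints 1 and 3
randint_values = {random.randint(1, 3) for _ in range(1000)}
randrange_values = {random.randrange(1, 3) for _ in range(1000)}

print(sorted(randint_values))    # randint can return the right endpoint
print(sorted(randrange_values))  # randrange never returns the right endpoint
```

With that many draws you should see 3 show up for randint() but never for randrange().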

Generating Random Floats Between `x` and `y`

If you need to generate random float numbers that lie within a specific [x, y] interval you can use the random.uniform function:

>>> import random
>>> random.uniform(1, 10)
7.850184644194309
>>> random.uniform(1, 10)
4.00388600011348
>>> random.uniform(1, 10)
6.888959882650279

Picking a Random Element From a List

To pick a random element from a non-empty sequence (like a list or a tuple) you can use Python’s random.choice function:

>>> import random
>>> items = ['one', 'two', 'three', 'four', 'five']
>>> random.choice(items)
'five'
>>> random.choice(items)
'one'
>>> random.choice(items)
'four'

This works for any non-empty sequence. If the sequence is empty, random.choice raises an IndexError exception.

Randomizing a List of Elements

You can randomize a sequence in place using the random.shuffle function. This will modify the sequence object and randomize the order of elements:

>>> import random
>>> items = ['one', 'two', 'three', 'four', 'five']
>>> random.shuffle(items)
>>> items
['four', 'one', 'five', 'three', 'two']

If you’d rather not mutate the original you’ll need to make a copy first and then shuffle the copy. You can create copies of Python objects with the copy module.
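
Here is a short sketch of two ways to get a shuffled copy while leaving the original list untouched. The variable names are mine. The second option relies on random.sample returning a new list, which works as a shuffle when you ask for as many elements as the list contains:

```python
import copy
import random

items = ['one', 'two', 'three', 'four', 'five']

# Option 1: make a shallow copy, then shuffle the copy in place
shuffled = copy.copy(items)
random.shuffle(shuffled)

# Option 2: sample every element, which returns a new list in random order
also_shuffled = random.sample(items, k=len(items))

print(items)          # the original order is unchanged
print(shuffled)
print(also_shuffled)
```

A shallow copy is enough here because shuffling only rearranges the list; it doesn’t modify the elements themselves.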

Picking `n` Random Samples From a List of Elements

To pick a random sample of n unique elements from a sequence, use the random.sample function. It performs random sampling without replacement:

>>> import random
>>> items = ['one', 'two', 'three', 'four', 'five']
>>> random.sample(items, 3)
['one', 'five', 'two']
>>> random.sample(items, 3)
['five', 'four', 'two']
>>> random.sample(items, 3)
['three', 'two', 'five']

Generating Cryptographically Secure Random Numbers

If you need cryptographically secure random numbers for security purposes, use random.SystemRandom which uses a cryptographically secure pseudo-random number generator.

Instances of the SystemRandom class provide most of the random number generator operations available as functions on the random module. Here’s an example:

>>> import random
>>> rand_gen = random.SystemRandom()

>>> rand_gen.random()
0.6112441459034399

>>> rand_gen.randint(1, 10)
2

>>> rand_gen.randrange(1, 10)
5

>>> rand_gen.uniform(1, 10)
8.42357365980016

>>> rand_gen.choice('abcdefghijklmn')
'j'

>>> items = ['one', 'two', 'three', 'four', 'five']
>>> rand_gen.shuffle(items)
>>> items
['two', 'four', 'three', 'one', 'five']

>>> rand_gen.sample('abcdefghijklmn', 3)
['g', 'e', 'c']

Be aware that SystemRandom isn’t guaranteed to be available on all systems that run Python (although it typically will be.)

Python 3.6+ – The secrets Module:

If you’re working on Python 3 and your goal is to generate cryptographically secure random numbers, then be sure to check out the secrets module. This module is available in the Python 3.6 (and above) standard library. It makes generating secure tokens a breeze.

Here are a few examples:

>>> import secrets

# Generate secure tokens:
>>> secrets.token_bytes(16)
b'\xc4\xf4\xac\x9e\x07\xb2\xdc\x07\x87\xc8 \xdf\x17\x85^{'
>>> secrets.token_hex(16)
'a20f016e133a2517414e0faf3ce4328f'
>>> secrets.token_urlsafe(16)
'eEFup5t7vIsoehe6GZyM8Q'

# Picking a random element from a sequence:
>>> secrets.choice('abcdefghij')
'h'

# Securely compare two strings for equality
# (Reduces the risk of timing attacks):
>>> secrets.compare_digest('abcdefghij', '123456789')
False
>>> secrets.compare_digest('123456789', '123456789')
True

You can learn more about the secrets module in the Python 3 docs.
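
As one more illustration, secrets.choice can be combined with the string module to build a random password. This is only a sketch: the function name make_password, the alphabet, and the default length are assumptions of mine, so adjust them to whatever your password policy requires:

```python
import secrets
import string

# Assumed alphabet: ASCII letters and digits
alphabet = string.ascii_letters + string.digits


def make_password(length=12):
    """Hypothetical helper: build a password from securely chosen characters."""
    return ''.join(secrets.choice(alphabet) for _ in range(length))


password = make_password()
print(password)
```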

How to Send an Email With Python

Tue, 19 Dec 2017 00:00:00 GMT

Learn how to send emails using Python code via the built-in “smtplib” module from the standard library.

In most applications, you need to communicate with your users using electronic methods. Email is used to send password resets, confirmation of orders, and verification of user accounts. Whatever your reason, the process of emailing is always the same no matter what language you use.

In this tutorial you’ll learn how to send emails using Python.

Understanding Email Basics

Before we get into the code, you should understand basic email workflow. When you send an email either from a web-based application or local software running on your computer, your client application packages the message and sends it using an SMTP (Simple Mail Transfer Protocol) server.

You need this server to send email regardless of whether it comes from an email client like Outlook or Thunderbird, or from a Python program. When you open a hosting account with any company, they will give you SMTP credentials to send email using their servers.

There are numerous open, free-to-use SMTP servers, but these are often used by spammers and blocked by most incoming mail servers. It’s best to use a password-protected SMTP server, because your mail is then more likely to reach the recipient instead of getting filtered and dumped into the recipient’s spam folder.

An SMTP server isn’t always an external server on the host. In some cases, you will send email from the same machine running your Python code. You would then use “localhost” as your SMTP server. To find out the right configurations for this Python email example, please review your email provider’s documentation. I’ll use Gmail for this example.

When you want to send email to a recipient, first you need to gather the email’s parameters. This can be either from input entered by the user or hardcoded in your code.

A typical email requires the following parameters:

Recipient email address
Sender email address
Message Subject
Message Body
Attachments (if any, not required)
SMTP server address
SMTP port (usually 25, but could also be 2525 or 587 as alternatives)

A note about the sender address: You can use any email address that you want, but some incoming servers (e.g. Gmail) detect fake sender addresses and may drop your email into the spam folder for security purposes.

So it’s better to use a “real” email address that actually exists. You can then set it up as a “Do Not Reply” sender to alert users not to reply to the message rather than use a fake email sender address. In some cases, the SMTP server will reject the message and the recipient won’t get the email message at all.

Sending Email in Python With the `smtplib` Module

The first step is to import Python’s built-in smtplib library. This library takes care of most of the code in its own methods and properties, so you don’t need much code to send an email at all.

Type the following code at the beginning of your file:

import smtplib

With this library imported, we can set up email parameters. We know that at least recipient, sender, subject and body are needed, so let’s set up those variables:

import smtplib

sender = "sss@yourdomain.com"
recipient = "rrr@gmail.com"
subject = "Test email from Python"
text = "Hello from Python"

Easy enough. But now we need to send the email using an SMTP server. In this example, we’ll use Gmail since it’s free and open to anyone with a Google account. Just keep in mind that if you’re hosting a website or web-based application, your host will have an SMTP server associated with your hosting account and you’ll need to adjust the SMTP server address and credentials for this example to work.

🔐 Enabling SMTP access in Gmail

To allow your Python app to log into the Gmail servers using your account in order to send emails, you need to allow it in your account settings. Go to this link while signed into your account and flip this feature on.

If you forget to turn on access to less secure applications, you will receive an SMTPAuthenticationError exception.

Gmail’s SMTP server is “smtp.gmail.com”. It accepts SSL connections on port 465, which is what the SMTP_SSL example below uses, and STARTTLS connections on port 587. The username is your email address, and the password is your email password. Let’s add another variable to hold the password since we already have the username in the “sender” variable:

import smtplib

sender = "sss@yourdomain.com"
recipient = "rrr@gmail.com"
password = "thepassword" # Your SMTP password for Gmail
subject = "Test email from Python"
text = "Hello from Python"

Notice that the text variable only contains one sentence. If you need multi-line support, you can use the \n character to add line-feeds:

text = "Hello from Python\nThis is line 2\nAnd line 3"

With our basic email parameters set up, we can now use the smtplib library to send the email. You can communicate with the SMTP server in plain text or encrypted using SSL.

Because privacy is an important issue, we’ll use the SMTP_SSL class to ensure communication between your Python program and the SMTP server is encrypted.

Please note that this is true only for the first “hop” in the chain—email is a distributed system and any email you send likely travels through many independent email servers that can access the full unencrypted contents of your email. There are also no guarantees that emails are encrypted in transit from one email server to the next, so email can’t be deemed a secure medium.

It’s always a good idea to use SMTP_SSL here because it will ensure we’re not leaking your SMTP credentials when connecting to the email server:

import smtplib

sender = "sss@yourdomain.com"
recipient = "rrr@gmail.com"
password = "xxxxxx" # Your SMTP password for Gmail
subject = "Test email from Python"
text = "Hello from Python"

smtp_server = smtplib.SMTP_SSL("smtp.gmail.com", 465)
smtp_server.login(sender, password)
message = "Subject: {}\n\n{}".format(subject, text)
smtp_server.sendmail(sender, recipient, message)
smtp_server.close()

Let’s review what happens in the above code snippet.

First, the SMTP_SSL constructor opens an SSL-encrypted connection to the server. Then, the login() method verifies your username and password. If they’re incorrect, you’ll receive an authentication error:

smtplib.SMTPAuthenticationError: (535, b'5.7.8 Username and Password not accepted. Learn more at\n5.7.8  https://support.google.com/mail/?p=BadCredentials o22 sm62348871wrb.40 - gsmtp')

Also, with Gmail, if the wrong username and password is used, you receive an alert on your account that a failed login attempt was made. If you use Gmail to practice emailing with Python, try to avoid too many incorrect login attempts or Google will lock the account for security purposes. It’s always better to use a throwaway account when practicing.

Next, the sendmail() method tells the SMTP server to deliver the actual email payload. You’ll notice this method doesn’t accept separate arguments for the message subject and body. Instead, the email subject is denoted by a Subject: prefix in the message payload. So we’ll need to prepare the message body first by formatting the subject and text variables, and then pass the result to sendmail().

This will hand over the message to the SMTP server and deliver it to the recipient. If there’s an issue with your SMTP username and password or the login() call failed, you might encounter an SMTPSenderRefused exception:

smtplib.SMTPSenderRefused: (530, b'5.5.1 Authentication Required. Learn more at\n5.5.1  https://support.google.com/mail/?p=WantAuthError o22sm62348871wrb.40 - gsmtp', 'sss@yourdomain.com')

If all goes well and you run the code above with your own email accounts, the email message will be delivered to the recipient address.

You can send more than one email that way simply by repeatedly calling the sendmail() method. Once you’re done sending you should close the SMTP connection by calling the close() method on the SMTP_SSL object.

That’s all it takes to send an email in Python.

Just remember to limit the number of emails that you send at once, or you could run into spam filters. Gmail rate limits the number of messages that you can send at once, so you might want to place a delay between sending messages, for example with Python’s time.sleep() function.
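
A minimal sketch of such a delay could look like the following. The helper name send_all and the delay value are assumptions of mine and don’t reflect Gmail’s actual limits. To keep the example runnable without a mail server, the send argument is a stand-in for a real sender such as a function that wraps smtp_server.sendmail():

```python
import time


def send_all(send, messages, delay_seconds=1.0):
    """Call send(message) for each message, pausing between consecutive sends."""
    for index, message in enumerate(messages):
        if index > 0:
            time.sleep(delay_seconds)  # wait before every send except the first
        send(message)


# Stand-in sender that only records the messages instead of emailing them
outbox = []
send_all(outbox.append, ['first', 'second', 'third'], delay_seconds=0.01)
print(outbox)
```

Passing the sender in as a function keeps the throttling logic separate from the SMTP details, which also makes it easy to try out without sending real email.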

Additional Resources

Python’s smtplib module documentation (Python 2, Python 3)
Simple Mail Transfer Protocol (SMTP) on Wikipedia
Python’s email module documentation (Python 2, Python 3): The email module included with Python’s standard library helps you format and parse email messages. Instead of assembling the message payload manually using string formatting you can use the functions in the email module and make your code more robust and readable.
Instead of directly connecting to an SMTP server and sending your emails that way, you can sign up with an email service provider that offers its own Python SDK or generic web API for sending email. Two services I can recommend are SendGrid and MailJet.
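
To give you an idea of what the email module approach looks like in Python 3, here is a sketch that builds the same test message with email.message.EmailMessage instead of string formatting. The helper names build_message and send_message are hypothetical, and the addresses and password are the placeholders used earlier in this tutorial. The actual sending call is commented out so the snippet runs without network access:

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, text):
    """Assemble a plain-text email with the standard library's email package."""
    message = EmailMessage()
    message['From'] = sender
    message['To'] = recipient
    message['Subject'] = subject
    message.set_content(text)
    return message


def send_message(message, password, host='smtp.gmail.com', port=465):
    """Hypothetical helper: log in and send; the with block ends the session."""
    with smtplib.SMTP_SSL(host, port) as smtp_server:
        smtp_server.login(str(message['From']), password)
        smtp_server.send_message(message)


message = build_message(
    'sss@yourdomain.com',
    'rrr@gmail.com',
    'Test email from Python',
    'Hello from Python',
)
print(message)

# send_message(message, 'xxxxxx')  # uncomment and use real credentials to send
```

Using the SMTP_SSL object in a with statement means you don’t have to remember to close the connection yourself.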

Python Tricks: The Book Is Now Available on Kindle

Wed, 13 Dec 2017 00:00:00 GMT

Get the Kindle version of “Python Tricks: A Buffet of Awesome Python Features” and enjoy a smooth reading experience across all of your devices.

My couch + a good book on my Kindle == bliss.

I can’t help it, it’s my happy place. So whenever I can I spend a few happy hours stretched out on the couch, reading and gulping down coffee.

Apparently I’m not the only Pythonista who enjoys this—

When the paperback version of Python Tricks came out a few weeks ago, I got several emails that basically said:

“That’s cool… Can’t wait for the Kindle version.”

Okay, I get it. Printed books are nice and all but reading on your Kindle is just so…damn convenient.

Especially programming books! You get full-text search, you can highlight text passages and review them when you’re back at your computer.

You can even copy and paste code examples…

And, you can continue reading the same book on your phone wherever and whenever you want: on the bus, on the train, waiting at the doctor’s office.

(Plus ebooks are “tree cruelty-free.”)

Now, if you’ve been waiting for the Kindle version of Python Tricks: The Book to finally come to Amazon, you’re in luck:

After tons of back and forth to resolve any remaining formatting issues, Python Tricks is now available on Amazon Kindle 😄

~~~ BEGIN RANT ~~~

By the way, I find that too many programming ebooks have sloppy typography—and it really triggers me.

So for Python Tricks I sweated away in my man cave for days, tweaking and testing, to make the Kindle version a truly enjoyable read.

It even has proper Python syntax highlighting, weee!

~~~ END RANT ~~~

Anyway, here’s the deal:

With Python Tricks: The Book you’ll discover Python’s best practices and the power of beautiful & Pythonic code with simple examples and a step-by-step narrative.

Right now the book has 114 reviews on Amazon.com (96% are 5-star ratings):

The paperback became an Amazon bestseller and ranked #1 in several programming-related categories—it even hit the top 500 list for *all* books available on Amazon:

If you’re serious about improving your Python skills, you’ll find a ton of value in this book. But don’t just take my word for it—

Mariatta Wijaya (who is a CPython Core Developer and also wrote the foreword for the book) said this about the book:

“I wished I had access to a book like this when I started learning Python many years ago.”

Where to Get the Kindle Version

You can currently get the Kindle version at an introductory launch discount. In a few days the price will be higher.

So be sure to act quickly and grab your copy before Amazon adjusts the price:

>> Click here to get the Kindle version of Python Tricks on Amazon

Thank You

The feedback I received from the dbader.org community and from PythonistaCafe members was priceless—and I’m extremely happy with how the final book turned out.

So once again I want to send out a big ❤️ Thank You ❤️ to everyone who sent me feedback, offered their criticism, or called me out on a typo!

I’d like to especially thank Michael Howitz, Johnathan Willitts, Julian Orbach, Johnny Giorgis, Bob White, Daniel Meyer, Michael Stueben, Bob Belderbos, Smital Desai, Andreas Kreisig, David Perkins, Jay Prakash Singh, and Ben Felder for their ongoing support and excellent feedback.

Also, huge thanks to my friend Mariatta Wijaya for the amazing foreword she contributed to the book. I couldn’t have hoped for a better introduction for my readers 🙂

Learn More About the Kindle Version

Check out the video embedded below to learn more about the Kindle version of Python Tricks and how it compares to the paperback:

On top of that, you might be interested in these two podcast episodes where I made guest appearances to discuss some actionable tips you can use to write better Python code, and chatted about the making of the book:

And finally, here are the Amazon links again:

Happy Pythoning!

Python Multi-line Comments: Your Two Best Options

Tue, 12 Dec 2017 00:00:00 GMT

Does Python support multi-line comments the way other languages do? What are your options for writing comment blocks in Python if you need them?

Most programming languages have syntax for block comments that span multiple lines of text, like C or Java:

/*
This is a block comment.
It spans multiple lines.
Nice, eh?
*/
int answer = 42;

How do you write the same style of multiline comment in Python? The short answer is: you can’t—at least not in exactly the same way.

Python uses different conventions and syntax for block comments that span multiple lines. In this article you’ll see some options for creating multiline comments in Python that actually work.

Option 1: Consecutive Single-line Comments

Your first option for commenting out multiple lines of code in Python is to simply use a # single-line comment on every line:

# This is a "block comment" in Python, made
# out of several single-line comments.
# Pretty great, eh?
answer = 42

In my experience, most Python projects follow this style and Python’s PEP 8 style guide also favors repeated single-line comments. So this is what I’d recommend that you use most of the time. This is also the only way to write “real” comment blocks in Python that are ignored by the parser.

If it bothers you that Python doesn’t support proper multiline comments because you think it takes more effort to comment out multiple lines of code, here’s a handy tip for you:

Most code editors have a shortcut for block commenting. In my Sublime Text development setup I simply select a couple of lines using shift and the cursor keys (or the mouse) and then I hit cmd + / to comment them out all at once.

This even works in reverse, that is, I can select a block of single-line comments and when I hit the cmd + / keyboard shortcut the whole block gets uncommented again.

Other editors can do this too—Atom, VS Code, and even Notepad++ all have built-in shortcuts for block commenting in Python. Managing your Python comments manually is a chore, and this editor feature can save you hours of your time.

Option 2: Using Multi-line Strings as Comments

Another option for writing “proper” multi-line comments in Python is to use multi-line strings with the """ syntax in creative ways. Here’s an example:

"""
This is a "block comment" in Python, made
out of a multi-line string constant.
This actually works quite well!
"""
answer = 42

As you can see, you can use triple-quoted strings to create something that resembles a multiline comment in Python. You just need to make sure you indent the first """ correctly, otherwise you’ll get a SyntaxError. For example, if you’d like to define a block comment inside a function with this technique you have to do it like this:

def add_stuff(a, b):
    result = a + b
    """
    Now we return the result, wee!
    Hurray! I'm so excited I can't contain
    my joy to just one or two lines!
    """
    return result

Just keep in mind that this technique doesn’t create “true” comments. This simply inserts a text constant that doesn’t do anything. It’s the same as inserting a regular single-line string somewhere in your code and never accessing it.

However, such an orphaned string constant won’t show up in the bytecode, effectively turning it into a multi-line comment. Here’s proof that the unused string won’t appear in the CPython bytecode disassembly:

>>> import dis
>>> dis.dis(add_stuff)
  2    0 LOAD_FAST      0 (a)
       2 LOAD_FAST      1 (b)
       4 BINARY_ADD
       6 STORE_FAST     2 (result)
  8    8 LOAD_FAST      2 (result)
      10 RETURN_VALUE

However, be careful where you place these “comments” in the code. If the string follows right after a function signature, a class definition, or at the start of a module, it turns into a docstring which has a different meaning altogether in Python:

def add_stuff(a, b):
    """
    This is now a function docstring associated
    with the function object and accessible as
    run-time metadata.
    """
    result = a + b
    return result

Docstrings (“documentation strings”) let you associate human-readable documentation with Python modules, functions, classes, and methods. They’re different from source code comments:

A comment is removed by the parser, whereas a docstring ends up in the bytecode and is associated with the documented object. It can even be accessed programmatically at runtime.
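
You can check the difference yourself at runtime. The sketch below uses two made-up functions: one where the string is the first statement and one where it isn’t. The co_consts check at the end is specific to CPython, so treat it as an illustration rather than a language guarantee:

```python
def with_docstring(a, b):
    """This string is the first statement, so it becomes the docstring."""
    return a + b


def with_orphan_string(a, b):
    result = a + b
    """This string is not the first statement, so it is just discarded."""
    return result


# The docstring is available as run-time metadata on the function object
docstring = with_docstring.__doc__

# The orphaned string never becomes a docstring
orphan_doc = with_orphan_string.__doc__

# Constants stored with the compiled function body (a CPython detail)
orphan_consts = with_orphan_string.__code__.co_consts

print(docstring)
print(orphan_doc)
print(orphan_consts)
```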

Like I said earlier, the only way to get “true” multi-line comments in Python that get ignored by the parser is to use multiple # single-line comments.

I’ll admit I was slightly surprised to find this “fake” block commenting style was endorsed by Guido van Rossum, the creator of Python:

“Python tip: You can use multi-line strings as multi-line comments. Unless used as docstrings, they generate no code! :-)” (Source)

But there you have it—in some cases using triple-quoted strings to make a comment block might be the right choice. Personally I’ll try to avoid them in production-ready code, but I occasionally use them when I’m working on a source file to jot down notes or to make little ad-hoc to-do lists.

Multi-line Comments in Python – Key Takeaways

Unlike other programming languages, Python doesn’t support multi-line comment blocks out of the box.
The recommended way to comment out multiple lines of code in Python is to use consecutive # single-line comments. This is the only way to get “true” source code comments that are removed by the Python parser.
You may consider using triple-quote """ strings to create something akin to multi-line comments in Python, but this isn’t a perfect technique and your “comments” may turn into accidental docstrings.

Writing Python Command-Line Tools With Click

Wed, 06 Dec 2017 00:00:00 GMT

An in-depth tutorial on writing Python command-line (CLI) apps using the Click library for argument parsing and more.

Python is often referred to as a glue code language because it’s extremely flexible and integrates well with existing programs. This means that a large portion of Python code is written as scripts and command-line interfaces (CLI).

Building these command-line interfaces and tools is extremely powerful because it makes it possible to automate almost anything. As a result, CLIs can become quite complex over time—

It usually starts with a very simple script that runs a bit of Python code to do one specific thing. For example, access a web API and print the output to the console:

# print_user_agent.py
import requests

json = requests.get('http://httpbin.org/user-agent').json()
print(json['user-agent'])

You can simply run this using python print_user_agent.py and it will print out the name of the user agent used to make the API call.

As I said, a very simple script 😉

But what are your options when such a Python command-line script grows and becomes more complex?

That’s what we’ll be looking at throughout this tutorial. You’ll learn about the basics of building a CLI in Python and how click makes it a much better experience.

We’ll use that knowledge and go step-by-step from a simple script to a CLI with command-line arguments, options and useful usage instructions. All of this using the power of a framework called click.

At the end of this tutorial, you’ll know:

Why click is a better alternative to argparse and optparse
How to create a simple CLI with it
How to add mandatory command-line arguments to your scripts
How to parse command-line flags and options; and
How you can make your command-line apps more user friendly by adding help (usage) text

And you’ll see how to achieve all of that with a minimal amount of boilerplate, too.

By the way, all of the code examples in this tutorial use Python 3.6. They might not work with earlier versions of Python, but if you run into any trouble leave a comment below and we’ll get it sorted out together.

Let’s get started!

Why should you write Python command-line scripts and tools?

The code snippet above is just an example and not very useful in real life. The scripts that I’ve written throughout my career as a Python developer are a lot more complex. They usually help build, test and deploy applications and make the process repeatable.

You might have your own experiences and know that this can be a large part of our daily work: Some scripts remain within the project they are built for. Others become useful to other teams or projects. They might even be extended with additional features.

In these cases, it becomes important to make the scripts more flexible and configurable using command-line parameters. It makes it possible to provide server names, credentials or any other piece of information to the script.

This is where Python modules like optparse and argparse come in and make your life a lot easier. But before we take a closer look at those, let’s get our terminology straight.

Basics of a command-line interface

A command-line interface (CLI) starts with the name of the executable. You type its name in the console and you access the main entry point of the script, such as pip.

Depending on the complexity of the CLI, you usually have parameters that you can pass to the script which can either be:

An argument, which is a mandatory parameter that’s passed to the script. If you don’t provide it, the CLI will return an error. For example, click is the argument in this command: pip install click.
Or it can be an option, which is an optional (🤯) parameter combining a name and a value portion such as --cache-dir ./my-cache. You tell the CLI that the value ./my-cache should be used as the cache directory.
One special option is the flag, which enables or disables a certain behaviour. The most common is probably --help. You only specify the name and the CLI interprets the value internally.

With more complex CLIs such as pip or the Heroku Toolbelt, you’ll get access to a collection of features that are all grouped under the main entry point. They are usually referred to as commands or sub-commands.

You’ve probably already used a CLI when you installed a Python package using pip install <PACKAGE NAME>. The command install tells the CLI that you’d like to access the feature to install a package and gives you access to parameters that are specific to this feature.

Command-line frameworks available in the Python 3.x standard library

Adding commands and parameters to your scripts is extremely powerful, but parsing the command line isn’t as straightforward as you would think. Instead of starting to write your own parser, you should use one of Python’s many packages that have solved this problem already.

The two most well-known packages are optparse and argparse. They are part of the Python standard library following the “batteries included” principle.

They mostly provide the same functionality and work very similarly. The biggest difference is that optparse has been deprecated since Python 3.2 and argparse is considered the standard for implementing CLIs in Python.

You can find more details on both of them in the Python documentation but to give you an idea what an argparse script looks like, here’s an example:

import argparse

parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('integers', metavar='N', type=int, nargs='+',
                    help='an integer for the accumulator')
parser.add_argument('--sum', dest='accumulate', action='store_const',
                    const=sum, default=max,
                    help='sum the integers (default: find the max)')

args = parser.parse_args()
print(args.accumulate(args.integers))
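If you want to see what the example does without saving it to a file, one option is to hand parse_args an explicit list instead of letting it read the real command line. The sketch below rebuilds the same parser and tries it with and without the --sum flag; the input values are arbitrary.

```python
import argparse

parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('integers', metavar='N', type=int, nargs='+',
                    help='an integer for the accumulator')
parser.add_argument('--sum', dest='accumulate', action='store_const',
                    const=sum, default=max,
                    help='sum the integers (default: find the max)')

# With the flag, args.accumulate is the built-in sum().
with_flag = parser.parse_args(['1', '2', '3', '--sum'])
total = with_flag.accumulate(with_flag.integers)

# Without the flag, args.accumulate falls back to the built-in max().
without_flag = parser.parse_args(['1', '2', '3'])
largest = without_flag.accumulate(without_flag.integers)

print(total, largest)  # 6 3
```

The flag stores a function rather than a value, which is part of why the example can be hard to read at first glance.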

`click` vs `argparse`: A better alternative?

You’re probably looking at the code example above, thinking “what do any of these things mean?” And that’s exactly one of the problems I have with argparse: it’s unintuitive and hard to read.

That’s why I fell in love with click.

Click is solving the same problem as optparse and argparse but uses a slightly different approach. It uses the concept of decorators. This requires commands to be functions that can be wrapped using decorators.

Dan wrote a great introduction to decorators if this is the first time you hear the term or would like a quick refresher.

The author of click, Armin Ronacher, describes in a lot of detail why he wrote the framework. You can read the section “Why Click?” in the documentation and I encourage you to take a look.

The main reason why I use click is that you can easily build a feature-rich CLI with a small amount of code. And the code is easy to read even when your CLI grows and becomes more complex.

Building a simple Python command-line interface with `click`

I’ve talked enough about CLIs and frameworks. Let’s take a look at what it means to build a simple CLI with click. Similar to the first example in this tutorial, we can create a simple click-based CLI that prints to the console. It doesn’t take much effort:

# cli.py
import click

@click.command()
def main():
    print("I'm a beautiful CLI ✨")

if __name__ == "__main__":
    main()

First of all, let’s not worry about the last two lines for now. This is just Python’s (slightly unintuitive) way to run the main function when the file is executed as a script.

As you can see, all we have to do is create a function and add the @click.command() decorator to it. This turns it into a click command which is the main entry point for our script. You can now run it on the command-line and you’ll see something like this:

$ python cli.py
I'm a beautiful CLI ✨

The beauty of click is that we get some additional features for free. We didn’t implement any help functionality, but if you add the --help option you’ll see a basic help page printed to the command-line:

$ python cli.py --help
Usage: cli.py [OPTIONS]

Options:
  --help  Show this message and exit.
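You can also exercise the command from within Python. The sketch below uses CliRunner from click.testing, click's helper for invoking commands in tests, and assumes click is installed. The command is a copy of the one above with the emoji left out of the message.

```python
import click
from click.testing import CliRunner


@click.command()
def main():
    print("I'm a beautiful CLI")


runner = CliRunner()

# Invoke the command without any parameters, like `python cli.py`.
result = runner.invoke(main, [])

# Invoke it with the auto-generated help option, like `python cli.py --help`.
help_result = runner.invoke(main, ["--help"])

print(result.exit_code, result.output)
print(help_result.output)
```

Both invocations finish with exit code 0, and the second one returns the help page without main ever being written to handle it.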

A more realistic Python CLI example with `click`

Now that you know how click makes it easy to build a simple CLI, we are going to take a look at a slightly more realistic example. We’ll be building a program that allows us to interact with a Web API. Everyone uses them these days and they give us access to some cool data.

The API that we’ll look at for the rest of this tutorial is the OpenWeatherMap API. It provides the current weather as well as a five day forecast for a specific location. We’ll start with their sample API returning the current weather for a location.

I like to experiment with an API before I start writing code to understand better how it works. One tool that I think you should know about is HTTPie which we can use to call the sample API and see the result that it returns. You can even try their online terminal to run it without installation.

Let’s look at what happens when we call the API with London as the location:

$ http --body GET http://samples.openweathermap.org/data/2.5/weather \
  q==London \
  appid==b1b15e88fa797225412429c1c50c122a1
{
    "base": "stations",
    "clouds": {
        "all": 90
    },
    "cod": 200,
    "coord": {
        "lat": 51.51,
        "lon": -0.13
    },
    "dt": 1485789600,
    "id": 2643743,
    "main": {
        "humidity": 81,
        "pressure": 1012,
        "temp": 280.32,
        "temp_max": 281.15,
        "temp_min": 279.15
    },
    "name": "London",
    "sys": {
        "country": "GB",
        "id": 5091,
        "message": 0.0103,
        "sunrise": 1485762037,
        "sunset": 1485794875,
        "type": 1
    },
    "visibility": 10000,
    "weather": [
        {
            "description": "light intensity drizzle",
            "icon": "09d",
            "id": 300,
            "main": "Drizzle"
        }
    ],
    "wind": {
        "deg": 80,
        "speed": 4.1
    }
}

In case you’re looking at the screen with a face like this 😱 because the above example contains an API key, don’t worry that’s the sample API key they provide.

The more important observation from the above example is that we send two query parameters (denoted by == when using HTTPie) to get the current weather:

q is our location name; and
appid is our API key.

This allows us to create a simple implementation using Python and the Requests library (we’ll ignore error handling and failed requests for the purpose of simplicity.)

import requests

SAMPLE_API_KEY = 'b1b15e88fa797225412429c1c50c122a1'

def current_weather(location, api_key=SAMPLE_API_KEY):
    url = 'http://samples.openweathermap.org/data/2.5/weather'

    query_params = {
        'q': location,
        'appid': api_key,
    }

    response = requests.get(url, params=query_params)

    return response.json()['weather'][0]['description']

This function makes a simple request to the weather API using the two query parameters. It takes a mandatory argument location which is assumed to be a string. We can also provide an API key by passing api_key in the function call. It is optional and uses the sample key as a default.
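To make the role of the query parameters concrete, here is a small sketch that builds the URL such a request would target. It uses urllib.parse from the standard library rather than Requests, purely so that it runs without a network connection; it is an illustration, not what Requests does internally.

```python
from urllib.parse import urlencode

base_url = 'http://samples.openweathermap.org/data/2.5/weather'

query_params = {
    'q': 'London',
    'appid': 'b1b15e88fa797225412429c1c50c122a1',
}

# Encode the dictionary as a query string and attach it to the endpoint.
full_url = f'{base_url}?{urlencode(query_params)}'
print(full_url)
```

The result is the same address you would get by writing ?q=London&appid=... by hand, which is what the == pairs in the HTTPie call expand to.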

And here’s our current weather for London from the Python REPL:

>>> current_weather('London')
'light intensity drizzle'  # not surprising 😉

⏰ Sidebar: Making your `click` command executable

You may be wondering how to make your Python script executable so that you can call it from the command-line as $ weather London instead of having to call the python interpreter manually each time:

# Nice:
$ python cli.py London

# Even better:
$ weather London

Check out this tutorial on how to turn your Python scripts into “real” command-line commands you can run from the system terminal.

Parsing a mandatory parameter with `click`

The simple current_weather function allows us to build our CLI with a custom location provided by the user. I would like it to work similar to this:

$ python cli.py London
The weather in London right now: light intensity drizzle.

You probably guessed it already: the location in this call is what I introduced as an argument earlier. That’s because it is a mandatory parameter for our weather CLI.

How do we implement that in Click? It’s pretty straightforward: we use a decorator called argument. Who would’ve thought?

Let’s take the simple example from earlier and modify it slightly by defining the argument location.

@click.command()
@click.argument('location')
def main(location):
    weather = current_weather(location)
    print(f"The weather in {location} right now: {weather}.")

You can see that all we have to do is add an additional decorator to our main function and give it a name. Click uses that name as the argument name passed into the wrapped function.

In our case, the value for the command-line argument location will be passed to the main function as the argument location. Makes sense, right?

You can also use dashes (-) in your names such as api-key, which Click will turn into snake case for the argument name in the function, e.g. main(api_key).

The implementation of main simply uses our current_weather function to get the weather for the location provided by the caller of our CLI. And then we use a simple print statement to output the weather information 🤩

Done!

And if that print statement looks weird to you, that’s because it is a shiny new way of formatting strings in Python 3.6+ called f-string formatting. You should check out the 4 major ways to format strings to learn more.

Parsing optional parameters with `click`

You’ve probably figured out a tiny flaw with the sample API that we’ve used above, you’re a smart 🍪

Yes, it’s a static endpoint always returning the weather for London from January 2017. So let’s use the actual API with a real API key. You can sign up for a free account to follow along.

The first thing we’ll need to change is the URL endpoint for the current weather. We can do that by replacing the url in the current_weather function with the endpoint in the OpenWeatherMap documentation:

def current_weather(location, api_key=SAMPLE_API_KEY):
    url = 'https://api.openweathermap.org/data/2.5/weather'

    # everything else stays the same
    ...

The change we just made will now break our CLI because the default API key is not valid for the real API. The API will return a 401 UNAUTHORIZED HTTP status code. Don’t believe me? Here’s the proof:

$ http GET https://api.openweathermap.org/data/2.5/weather q==London appid==b1b15e88fa797225412429c1c50c122a1
HTTP/1.1 401 Unauthorized
{
    "cod": 401,
    "message": "Invalid API key. Please see http://openweathermap.org/faq#error401 for more info."
}
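Here is a sketch of why this breaks current_weather: the error body has no weather key, so the lookup at the end of the function fails. The two JSON strings are trimmed copies of the responses shown in this tutorial, and describe is a hypothetical helper, not part of the tutorial's code.

```python
import json

# Trimmed copies of the success and error bodies shown in this tutorial.
ok_body = '{"cod": 200, "weather": [{"description": "light intensity drizzle"}]}'
error_body = '{"cod": 401, "message": "Invalid API key."}'

# The unguarded lookup used in current_weather raises on the error body.
try:
    json.loads(error_body)['weather'][0]['description']
    lookup_failed = False
except KeyError:
    lookup_failed = True


def describe(body):
    """Hypothetical helper: return the description or a readable error."""
    payload = json.loads(body)
    if 'weather' not in payload:
        return f"API error {payload.get('cod')}: {payload.get('message')}"
    return payload['weather'][0]['description']


print(lookup_failed)
print(describe(ok_body))
print(describe(error_body))
```

The tutorial deliberately skips error handling, so treat this only as one possible way to surface the problem more gracefully.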

So let’s add a new parameter to our CLI that allows us to specify the API key. But first, we have to decide if this should be an argument or an option. I say we make it an option because adding a named parameter like --api-key makes it more explicit and self-documenting.

Here’s how I think the user should run it:

$ python cli.py --api-key <your-api-key> London
The weather in London right now: light intensity drizzle.

That’s nice and easy. So let’s see how we can add it to our existing click command.

@click.command()
@click.argument('location')
@click.option('--api-key', '-a')
def main(location, api_key):
    weather = current_weather(location, api_key)
    print(f"The weather in {location} right now: {weather}.")

Once again, we are adding a decorator to our main function. This time, we use the very intuitively named @click.option and add in the name for our option including the leading double dashes (--). As you can see, we can also provide a shortcut option with a single dash (-) to save the user some typing.

I mentioned before that click creates the argument passed to the main function from the long version of the name. In case of an option, it strips the leading dashes and turns them into snake case. --api-key becomes api_key.
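The naming rule can be imitated in a few lines of plain Python. The function below is a simplified, hypothetical stand-in written for this explanation; it is not click's own code and ignores any further cases click may handle.

```python
def derive_parameter_name(*declarations):
    """Simplified imitation of the naming rule described above."""

    def dash_count(declaration):
        # Number of leading dashes: 2 for "--api-key", 1 for "-a".
        return len(declaration) - len(declaration.lstrip('-'))

    # Prefer the long form, i.e. the declaration with the most dashes.
    long_form = max(declarations, key=dash_count)

    # Strip the leading dashes and turn the rest into snake case.
    return long_form.lstrip('-').replace('-', '_')


print(derive_parameter_name('--api-key', '-a'))  # api_key
print(derive_parameter_name('location'))         # location
```

Under this assumption the short form -a never influences the function argument name, which matches the main(location, api_key) signature above.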

The last thing we have to do to make this work is to pass the API key through to our current_weather function. Boom 👊🏼

We’ve made it possible for our CLI user to use their own key and check out any location:

$ python cli.py --api-key <your-api-key> Canmore
The weather in Canmore right now: broken clouds.

And looking out my window, I can confirm that’s true 😇
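If you would like to try the argument and the option without an API key, the sketch below swaps current_weather for an offline stub and drives the command with CliRunner from click.testing. The stub, the key abc123, and the returned text are placeholders invented for this example.

```python
import click
from click.testing import CliRunner


def current_weather(location, api_key):
    # Offline stub standing in for the real API call.
    return f"stubbed weather (key={api_key})"


@click.command()
@click.argument('location')
@click.option('--api-key', '-a')
def main(location, api_key):
    weather = current_weather(location, api_key)
    print(f"The weather in {location} right now: {weather}.")


runner = CliRunner()

# The long and the short form of the option are interchangeable.
long_form = runner.invoke(main, ['--api-key', 'abc123', 'Canmore'])
short_form = runner.invoke(main, ['-a', 'abc123', 'Canmore'])

# Leaving out the mandatory argument is a usage error.
missing = runner.invoke(main, [])

print(long_form.output)
print(missing.exit_code)
```

The last invocation never reaches main: click rejects the call and reports the missing LOCATION argument on its own.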

Adding auto-generated usage instructions to your Python command-line tool

You can pat yourself on the back, you’ve built a great little CLI with a minimal amount of boilerplate code. But before you take a break and enjoy a beverage of your choice, let’s make sure a new user can learn how to run our little CLI…by adding some documentation (don’t run, it’ll be super easy.)

First let’s check and see what the --help flag will display after all the changes that we’ve made. As you can see, it’s not bad for no effort at all:

$ python cli.py --help
Usage: cli.py [OPTIONS] LOCATION

Options:
  -a, --api-key TEXT
  --help              Show this message and exit.

The first thing that we want to fix is the missing description for our API key option. All we have to do is provide a help text to the @click.option decorator:

@click.command()
@click.argument('location')
@click.option(
    '--api-key', '-a',
    help='your API key for the OpenWeatherMap API',
)
def main(location, api_key):
    ...

The second and final change we’ll make is adding documentation for the overall click command. And the easiest and most Pythonic way is adding a docstring to our main function. Yes, we should do that anyways, so this isn’t even extra work:

...
def main(location, api_key):
    """
    A little weather tool that shows you the current weather in a LOCATION of
    your choice. Provide the city name and optionally a two-digit country code.
    Here are two examples:

    1. London,UK

    2. Canmore

    You need a valid API key from OpenWeatherMap for the tool to work. You can
    sign up for a free account at https://openweathermap.org/appid.
    """
    ...

Putting it all together, we get some really nice output for our weather tool.

$ python cli.py --help
Usage: cli.py [OPTIONS] LOCATION

  A little weather tool that shows you the current weather in a LOCATION of
  your choice. Provide the city name and optionally a two-digit country
  code. Here are two examples:

  1. London,UK

  2. Canmore

  You need a valid API key from OpenWeatherMap for the tool to work. You can
  sign up for a free account at https://openweathermap.org/appid.

Options:
  -a, --api-key TEXT  your API key for the OpenWeatherMap API
  --help              Show this message and exit.

I hope at this point you feel like I felt when I first discovered click: 🤯

Python CLIs with `click`: Summary & Recap

Alright, we’ve covered a ton of ground in this tutorial. Now it’s time for you to feel proud of yourself. Here’s what you’ve learned:

Why click is a better alternative to argparse and optparse
How to create a simple CLI with it
How to add mandatory command-line arguments to your scripts
How to parse command-line flags and options; and
How you can make your command-line apps more user friendly by adding help (usage) text

And all of that with a minimal amount of boilerplate! The full code example below illustrates that. Feel free to use it for your own experiments 😎

import click
import requests

SAMPLE_API_KEY = 'b1b15e88fa797225412429c1c50c122a1'


def current_weather(location, api_key=SAMPLE_API_KEY):
    url = 'https://api.openweathermap.org/data/2.5/weather'

    query_params = {
        'q': location,
        'appid': api_key,
    }

    response = requests.get(url, params=query_params)

    return response.json()['weather'][0]['description']


@click.command()
@click.argument('location')
@click.option(
    '--api-key', '-a',
    help='your API key for the OpenWeatherMap API',
)
def main(location, api_key):
    """
    A little weather tool that shows you the current weather in a LOCATION of
    your choice. Provide the city name and optionally a two-digit country code.
    Here are two examples:
    1. London,UK
    2. Canmore
    You need a valid API key from OpenWeatherMap for the tool to work. You can
    sign up for a free account at https://openweathermap.org/appid.
    """
    weather = current_weather(location, api_key)
    print(f"The weather in {location} right now: {weather}.")


if __name__ == "__main__":
    main()

If this has inspired you, you should check out the official click documentation for more features. You can also check out my introduction talk to click at PyCon US 2016. Or keep an eye out for my follow up tutorial where you’ll learn how to add some more advanced features to our weather CLI.

Happy CLI coding!

Python’s enumerate() Function Demystified

Tue, 05 Dec 2017 00:00:00 GMT

Python’s enumerate() Function Demystified

How and why you should use the built-in enumerate function in Python to write cleaner and more Pythonic loops.

Python’s enumerate function is a mythical beast—it’s hard to summarize its purpose and usefulness in a single sentence.

And yet, it’s a super useful feature that many beginners and even intermediate Pythonistas are blissfully unaware of. Basically, enumerate() allows you to loop over a collection of items while keeping track of the current item’s index in a counter variable.

Let’s take a look at an example:

names = ['Bob', 'Alice', 'Guido']
for index, value in enumerate(names):
    print(f'{index}: {value}')

This leads to the following output:

0: Bob
1: Alice
2: Guido

As you can see, this iterated over the names list and generated an index for each element by increasing a counter variable starting at zero.

[ If you’re wondering about the f'...' string syntax I used in the above example, this is a new string formatting technique available in Python 3.6 and above. ]

Make Your Loops More Pythonic With `enumerate()`

Now why is keeping a running index with the enumerate function useful?

I noticed that new Python developers coming from a C or Java background sometimes use the following range(len(...)) antipattern to keep a running index while iterating over a list with a for-loop:

# HARMFUL: Don't do this
for i in range(len(my_items)):
    print(i, my_items[i])

By using the enumerate function skillfully, like I showed you in the “names” example above, you can make this looping construct much more “Pythonic” and idiomatic.

There’s usually no need to generate element indexes manually in Python—you simply leave all of this work to the enumerate function. And as a result your code will be easier to read and less vulnerable to typos.
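One more practical difference: range(len(...)) only works for objects that have a length, while enumerate accepts any iterable. The sketch below uses a small generator, incoming_lines, as a made-up stand-in for something like lines streamed from a file.

```python
def incoming_lines():
    # Stand-in for any iterable without a length.
    yield 'first'
    yield 'second'
    yield 'third'


# range(len(...)) is not possible here: generators have no len().
try:
    len(incoming_lines())
    has_length = True
except TypeError:
    has_length = False

# enumerate() does not care and simply counts as it goes.
numbered = list(enumerate(incoming_lines()))

print(has_length)
print(numbered)
```

So besides being easier to read, the enumerate version keeps working when the data no longer comes from a list.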

Changing the Starting Index

Another useful feature is the ability to choose the starting index for the enumeration. The enumerate() function accepts an optional argument which allows you to set the initial value for its counter variable:

names = ['Bob', 'Alice', 'Guido']
for index, value in enumerate(names, 1):
    print(f'{index}: {value}')

In the above example I changed the function call to enumerate(names, 1) and the extra 1 argument now starts the index at one instead of zero:

1: Bob
2: Alice
3: Guido

Voilà, this is how you switch from zero-based indexing to starting with index 1 (or any other int, for that matter) using Python’s enumerate() function.

How `enumerate()` Works Behind The Scenes

You might be wondering how the enumerate function works behind the scenes. Part of its magic lies in the fact that enumerate is implemented as a Python iterator. This means that element indexes are generated lazily (one by one, just-in-time), which keeps memory use low and keeps this construct so fast.

Let’s play with some more code to demonstrate what I mean:

>>> names = ['Bob', 'Alice', 'Guido']
>>> enumerate(names)
<enumerate object at 0x1057f4120>

In the above code snippet I set up the same enumeration you’ve already seen in the previous examples. But instead of immediately looping over the result of the enumerate call I’m just displaying the returned object on the Python console.

As you can see, it’s an “enumerate object.” This is the actual iterator. And like I said, it generates its output elements lazily and one by one when they’re requested.
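To get a feel for this lazy behavior, here is a rough pure-Python equivalent written as a generator. The name my_enumerate is made up for this sketch; the real enumerate is a built-in and not implemented this way, but it behaves the same from the caller's point of view.

```python
def my_enumerate(iterable, start=0):
    """Rough pure-Python equivalent of enumerate(), for illustration only."""
    index = start
    for element in iterable:
        # Nothing is computed until the caller asks for the next pair.
        yield index, element
        index += 1


names = ['Bob', 'Alice', 'Guido']
numbered = my_enumerate(names)

# Pull the pairs out one at a time with next().
first = next(numbered)
second = next(numbered)

# Whatever has not been requested yet is still waiting in the iterator.
remaining = list(numbered)

print(first, second, remaining)
```

Each call to next() produces exactly one pair, so at no point does the full list of pairs have to exist in memory.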

In order to retrieve those “on demand” elements so we can inspect them, I’m going to call the built-in list() function on the iterator:

>>> list(enumerate(names))
[(0, 'Bob'), (1, 'Alice'), (2, 'Guido')]

For each element in the input list (names) the iterator returned by enumerate() produces a tuple of the form (index, element). In your typical for-in loop you’ll use this to your advantage by leveraging Python’s data structure unpacking feature:

for index, element in enumerate(iterable):
    # ...
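Unpacking also works one level deeper. In the following sketch each element is itself a tuple, so the loop target unpacks the index and the inner pair in one go. The names and ages are sample data invented for this example.

```python
people = [('Bob', 42), ('Alice', 37), ('Guido', 61)]

lines = []
# The parentheses unpack each (name, age) element next to the index.
for index, (name, age) in enumerate(people, 1):
    lines.append(f'{index}: {name} is {age}')

print('\n'.join(lines))
```

Without the parentheses around name and age, Python would try to unpack each pair into three names and raise a ValueError.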

The `enumerate` Function in Python – Key Takeaways

enumerate is a built-in function of Python. You use it to loop over an iterable with an automatic running index generated by a counter variable.
The counter starts at 0 by default, but you can set it to any integer.
enumerate was added to Python starting at version 2.3 with the implementation of PEP 279.
Python’s enumerate function helps you write more Pythonic and idiomatic looping constructs that avoid the use of clunky and error-prone manual indexing.
To use enumerate to its fullest potential, be sure to study up on Python’s iterators and data structure unpacking features.

Using Python for Mobile Development: Kivy vs BeeWare

Tue, 28 Nov 2017 00:00:00 GMT

Using Python for Mobile Development: Kivy vs BeeWare

Mobile application development on Android and iOS wasn’t Python’s strong suit in the past—but things might be changing…

What about using Python for mobile app development? Historically, Python didn’t have a strong story when it came to writing mobile GUI applications.

In fact, Android and iOS development was pretty much out of the question with pure Python alone. That’s a shame—

Thankfully, there have been a number of developments in recent years that vastly improved the outlook on using Python for writing mobile apps.

In this article we’ll take a look at a few modern options for mobile application development with Python. There are two frameworks I’d like to call out specifically: Kivy and the BeeWare project.

Kivy – Cross-platform Python GUIs

Kivy is an open-source Python library for developing cross-platform GUI applications. It allows you to write pure-Python graphical applications that run on the main desktop platforms (Windows, Linux, and macOS) and on iOS & Android.

Now, every time I hear about a new GUI toolkit I always want to know how “native” it feels. I believe that great GUI applications should play to the strengths of the platform they run on.

For example, when I’m using my iPhone I want consistency across the apps I use. It feels jarring to use an app that was designed with user interface patterns from another platform.

Kivy comes with a custom-built UI toolkit that provides its own versions of buttons, text labels, text entry forms, and so on. This means these widgets are not rendered using the native platform UI controls. This has pros and cons:

On the one hand this guarantees consistency and portability of your app from one platform to another. But on the other hand it also means that your Android app won’t really look and feel like an Android app…

Depending on the kind of app you have in mind, this might not be a problem at all, however. For most games, for example, the “nativeness” of the UI isn’t very important. The same is true for certain kinds of niche apps like graphical MIDI controllers for making music. But for other types of apps this has a huge impact on usability.

So, if you can work with a non-native UI toolkit in your apps then Kivy is a great choice. It allows you to write mobile applications using your Python programming skills without having to learn another platform-specific language like Apple’s Swift.

You can learn more about Kivy at https://kivy.org

The BeeWare Project – Native Python Mobile Apps

The second Python GUI and mobile development framework I want to tell you about is called the “BeeWare” project. It offers you a set of tools and an abstraction layer you can use to write native-looking mobile and desktop applications using Python.

The key difference between Kivy and BeeWare is that BeeWare programs use the native UI toolkit of the platform they run on, whereas Kivy apps use a custom UI toolkit that uses the same controls across all platforms.

With BeeWare, the UI controls your app uses will be the buttons, check boxes, and form elements provided by the underlying operating system. This means you can build apps that look and feel 100% native to each specific mobile (and desktop) platform.

Sounds great, right?

The only downside is that the BeeWare project is still relatively new and currently under heavy development led by Pythonista Russell Keith-Magee. As with any framework that hasn’t had a chance yet to mature for years, this means more work for you as a developer due to (potentially frequent) API changes, bugs, and lack of features.

Nevertheless, I’d encourage you to read up on BeeWare, it’s a really exciting project. You can learn more about it here: https://pybee.org/project/using/

Pythonic Mobile App Development – Conclusion

Now, which way should you look if you want to build a mobile app with Python? Both Kivy and BeeWare are worth considering. And as far as maturity goes, Kivy seems to be the more mature platform right now.

For the use cases that I’m personally the most interested in—making native-looking mobile and desktop apps with Python—I think that BeeWare will eventually gain the upper hand, due to the “native UI controls” advantage.

But, to be honest with you, if you’re thinking about writing a great mobile app today it might not make much sense to build it with Python… If you want the best result and use state of the art platform-specific features, your best bet will be getting comfortable with Java (Android) and Swift (iOS).

However, I believe this can and will change in the future. Python’s future in the mobile dev space is looking brighter by the minute. And with Python’s rising popularity there’s a great argument to be made for using it for mobile app development.

Personally, I’d love to have the ability to write cross-platform mobile apps with Python, simply because Python is such an enjoyable language to work with.

I’m truly excited to see what the possibilities will be in a year from now. So, if you’re looking for a cool open-source project to contribute to, please consider Kivy and the BeeWare project.

You’ll help create a better future for all of us 🙂

Happy (mobile) Pythoning!

Memoization in Python: How to Cache Function Results

Tue, 21 Nov 2017 00:00:00 GMT

Memoization in Python: How to Cache Function Results

Speed up your Python programs with a powerful, yet convenient, caching technique called “memoization.”

In this article, I’m going to introduce you to a convenient way to speed up your Python code called memoization (also sometimes spelled memoisation):

Memoization is a specific type of caching that is used as a software optimization technique.

A cache stores the results of an operation for later use. For example, your web browser will most likely use a cache to load this tutorial web page faster if you visit it again in the future.

So, when I talk about memoization and Python, I am talking about remembering or caching a function’s output based on its inputs. Memoization finds its root word in “memorandum”, which means “to be remembered.”

Memoization allows you to optimize a Python function by caching its output based on the parameters you supply to it. Once you memoize a function, it will only compute its output once for each set of parameters you call it with. For every call after the first, the result is quickly retrieved from a cache.

In this tutorial, you’ll see how and when to wield this simple but powerful concept with Python, so you can use it to optimize your own programs and make them run much faster in some cases.

Why and When Should You Use Memoization in Your Python Programs?

The answer is expensive code:

When I am analyzing code, I look at it in terms of how long it takes to run and how much memory it uses. If I’m looking at code that takes a long time to run or uses a lot of memory, I call the code expensive.

It’s expensive code because it costs a lot of resources, space and time, to run. When you run expensive code, it takes resources away from other programs on your machine.

If you want to speed up the parts in your Python application that are expensive, memoization can be a great technique to use. Let’s take a deeper look at memoization before we get our hands dirty and implement it ourselves!

All code examples I use in this tutorial were written in Python 3, but of course the general technique and patterns demonstrated here apply just as well to Python 2.

The Memoization Algorithm Explained

The basic memoization algorithm looks as follows:

Set up a cache data structure for function results
Every time the function is called, do one of the following:
- Return the cached result, if any; or
- Call the function to compute the missing result, and then update the cache before returning the result to the caller

Given enough cache storage this virtually guarantees that function results for a specific set of function arguments will only be computed once.

As soon as we have a cached result we won’t have to re-run the memoized function for the same set of inputs. Instead, we can just fetch the cached result and return it right away.
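Before turning this into something reusable, here is the algorithm spelled out by hand for a single function. The names slow_square and cached_square are made up for this sketch, and the calls list only exists to record how often the real computation runs.

```python
cache = {}   # step 1: the cache data structure
calls = []   # bookkeeping for this demo only


def slow_square(n):
    calls.append(n)  # remember every real computation
    return n * n


def cached_square(n):
    # Step 2a: return the cached result, if any.
    if n in cache:
        return cache[n]
    # Step 2b: compute the missing result and update the cache.
    result = slow_square(n)
    cache[n] = result
    return result


results = [cached_square(4), cached_square(4), cached_square(5)]

print(results)  # [16, 16, 25]
print(calls)    # [4, 5]
```

Three calls went in, but the underlying function only ran twice because the second request for 4 was answered from the cache.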

Let’s Write a Memoization Decorator From Scratch

Next, I’m going to implement the above memoization algorithm as a Python decorator, which is a convenient way to implement generic function wrappers in Python:

A decorator is a function that takes another function as an input and has a function as its output.

This allows us to implement our memoization algorithm in a generic and reusable way. Sounds a little confusing? No worries, we’ll take this step-by-step and it will all become clearer when you see some real code.

Here’s the memoize() decorator that implements the above caching algorithm:

def memoize(func):
    cache = dict()

    def memoized_func(*args):
        if args in cache:
            return cache[args]
        result = func(*args)
        cache[args] = result
        return result

    return memoized_func

This decorator takes a function and returns a wrapped version of the same function that implements the caching logic (memoized_func).

I’m using a Python dictionary as a cache here. In Python, using a key to look-up a value in a dictionary is quick. This makes dict a good choice as the data structure for the function result cache.

Whenever the decorated function gets called, we check if the parameters are already in the cache. If they are, then the cached result is returned. So, instead of re-computing the result, we quickly return it from the cache.

Bam, memoization!

If the result isn’t in the cache, we must update the cache so we can save some time in the future. Therefore, we first compute the missing result, store it in the cache, and then return it to the caller.
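A quick way to convince yourself that the cache is doing its job is to count the real calls. The sketch below repeats the memoize() decorator from above and applies it to slow_add, a made-up function that logs every real invocation. It also shows one limitation of this simple version: the arguments become dictionary keys, so they have to be hashable.

```python
def memoize(func):
    cache = dict()

    def memoized_func(*args):
        if args in cache:
            return cache[args]
        result = func(*args)
        cache[args] = result
        return result

    return memoized_func


calls = []


@memoize
def slow_add(a, b):
    calls.append((a, b))  # log each real computation
    return a + b


# (1, 2) is computed once; (2, 1) is a different key and computed separately.
results = [slow_add(1, 2), slow_add(1, 2), slow_add(2, 1)]

# Unhashable arguments such as lists cannot be used as dictionary keys.
try:
    slow_add([1], [2])
    unhashable_rejected = False
except TypeError:
    unhashable_rejected = True

print(results, calls, unhashable_rejected)
```

The cache key is the tuple of positional arguments, which is why argument order matters and why keyword arguments are not supported by this version.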

[ As I mentioned, decorators are an important concept to master for any intermediate or advanced Python developer. Check out my Python decorators tutorial for a step-by-step introduction if you’d like to know more. ]

Let’s test our memoization decorator out on a recursive Fibonacci sequence function. First, I’ll define a Python function that calculates the n-th Fibonacci number:

def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)

This fibonacci function will serve as an example of an “expensive” computation. Calculating the n-th Fibonacci number this way has O(2^n) time complexity—it takes exponential time to complete.

This makes it quite an expensive function indeed.
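One way to get a feel for that growth is to count how often the function body runs. The sketch below adds a hypothetical call_count global to the function for illustration only:

```python
call_count = 0  # hypothetical counter, for illustration only


def fibonacci(n):
    global call_count
    call_count += 1
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)


result = fibonacci(20)
print(result, call_count)
```

Computing just the 20th Fibonacci number (6765) already takes 21,891 function calls, and every further step of n multiplies that by roughly 1.6.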

Next up, I’m going to do some benchmarking in order to get a feel for how computationally expensive this function is. Python’s built-in timeit module lets me measure the execution time in seconds of an arbitrary Python statement.

Here’s how I’ll measure the execution time of the fibonacci function I just defined using Python’s built-in timeit module:

>>> import timeit
>>> timeit.timeit('fibonacci(35)', globals=globals(), number=1)
5.1729652720096055

As you can see, on my machine, it takes about five seconds to compute the 35th number in the Fibonacci sequence. That’s a pretty slow and expensive operation right there.

⏰ Sidebar: `timeit.timeit` Arguments

Python’s built-in timeit module lets me measure the execution time in seconds of an arbitrary Python statement. Here’s a quick note on the arguments I’m passing to timeit.timeit in the above example:

Because I’m running this benchmark in a Python interpreter (REPL) session I need to set up the environment for this benchmark run by setting globals to the current set of global variables retrieved with the globals() built-in.
By default timeit() will repeat the benchmark several times to make the measured execution time more accurate. But because a single fibonacci(35) call already takes a few seconds to execute I’m limiting the number of executions to one with the number argument. For this experiment I’m interested in ballpark timing figures and millisecond accuracy isn’t needed.
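If you run the benchmark from a script rather than a REPL session, one alternative (a sketch, not what the example above does) is to hand timeit.timeit a zero-argument callable instead of a string. That way no globals argument is needed. I’m using a smaller input here so the sketch finishes quickly:

```python
import timeit


def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)


# A callable statement is run `number` times; the total time is returned
elapsed = timeit.timeit(lambda: fibonacci(20), number=1)
print(f"{elapsed:.6f} seconds")
```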

Let’s see if we can speed it up by leveraging the function result caching provided by our memoization decorator:

>>> memoized_fibonacci = memoize(fibonacci)
>>> timeit.timeit('memoized_fibonacci(35)', globals=globals(), number=1)
4.941958484007046

The memoized function still takes about five seconds to return on the first run. So far, so underwhelming…

We get a similar execution time because the first time I ran the memoized function the result cache was cold: we started out with an empty cache, which means there were no pre-computed results that could help speed up this function call.

Let’s run our benchmark a second time:

>>> timeit.timeit('memoized_fibonacci(35)', globals=globals(), number=1)
1.9930012058466673e-06

Now we’re talking!

Notice the e-06 suffix at the end of that floating point number? The second run of memoized_fibonacci took only about 2 microseconds to complete. That’s 0.0000019930012058466673 seconds—quite a nice speedup indeed!

Instead of recursively calculating the 35th Fibonacci number our memoize decorator simply fetched the cached result and returned it immediately, and this is what led to the incredible speedup in the second benchmarking run.

Inspecting the Function Results Cache

To really drive home how memoization works “behind the scenes” I want to show you the contents of the function result cache used in the previous example:

>>> memoized_fibonacci.__closure__[0].cell_contents
{(35,): 9227465}

To inspect the cache I reached “inside” the memoized_fibonacci function using its __closure__ attribute. The cache dict is the first variable captured by the closure, so it is stored in cell 0. I wouldn’t recommend that you use this technique in production code, but here it makes for a nice little debugging trick 🙂

As you can see, the cache dictionary maps the argument tuples for each memoized_fibonacci function call that happened so far to the function result (the n-th Fibonacci number.)

So, for example, (35,) is the argument tuple for the memoized_fibonacci(35) function call and it’s associated with 9227465 which is the 35th Fibonacci number:

>>> fibonacci(35)
9227465

Let’s do another little experiment to demonstrate how the function result cache works. I’ll call memoized_fibonacci a few more times to populate the cache and then we’ll inspect its contents again:

>>> memoized_fibonacci(1)
1
>>> memoized_fibonacci(2)
1
>>> memoized_fibonacci(3)
2
>>> memoized_fibonacci(4)
3
>>> memoized_fibonacci(5)
5

>>> memoized_fibonacci.__closure__[0].cell_contents
{(35,): 9227465, (1,): 1, (2,): 1, (3,): 2, (4,): 3, (5,): 5}

As you can see, the cache dictionary now also contains cached results for several other inputs to the memoized_fibonacci function. This allows us to retrieve these results quickly from the cache instead of slowly re-computing them from scratch.

A quick word of warning on the naive caching implementation in our memoize decorator: In this example the cache size is unbounded, which means the cache can grow at will. This is usually not a good idea because it can lead to memory exhaustion bugs in your programs.

With any kind of caching that you use in your programs, it makes sense to put a limit on the amount of data that’s kept in the cache at the same time. This is typically achieved either by having a hard limit on the cache size or by defining an expiration policy that evicts old items from the cache at some point.
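As a rough sketch of the “hard limit” approach, here is a hypothetical bounded_memoize decorator that evicts the least recently used entry once the cache is full. The name, the maxsize parameter, and the exposed cache attribute are my own assumptions for illustration:

```python
from collections import OrderedDict


def bounded_memoize(maxsize=128):
    """Hypothetical variant of memoize() with a hard size limit."""
    def decorator(func):
        cache = OrderedDict()

        def memoized_func(*args):
            if args in cache:
                cache.move_to_end(args)  # mark as most recently used
                return cache[args]
            result = func(*args)
            cache[args] = result
            if len(cache) > maxsize:
                cache.popitem(last=False)  # evict the least recently used entry
            return result

        memoized_func.cache = cache  # exposed only so we can inspect it
        return memoized_func
    return decorator


@bounded_memoize(maxsize=2)
def double(n):
    return n * 2


double(1)
double(2)
double(3)  # pushes the entry for (1,) out of the cache
print(list(double.cache))
```

After the third call the cache only holds the entries for (2,) and (3,), so its size never exceeds the limit.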

Please keep in mind that the memoize function we wrote earlier is a simplified implementation for demonstration purposes. In the next section in this tutorial you’ll see how to use a “production-ready” implementation of the memoization algorithm in your Python programs.

Python Memoization with `functools.lru_cache`

Now that you’ve seen how to implement a memoization function yourself, I’ll show you how you can achieve the same result using Python’s functools.lru_cache decorator for added convenience.

One of the things I love the most about Python is that the simplicity and beauty of its syntax go hand in hand with the beauty and simplicity of its philosophy. Python is “batteries included”, which means that Python is bundled with loads of commonly used libraries and modules which are only an import statement away!

I find functools.lru_cache to be a great example of this philosophy. The lru_cache decorator is Python’s easy-to-use memoization implementation from the standard library. Once you recognize when to use lru_cache, you can quickly speed up your application with just a few lines of code.

Let’s revisit our Fibonacci sequence example. This time I’ll show you how to add memoization using the functools.lru_cache decorator:

import functools

@functools.lru_cache(maxsize=128)
def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)

Note the maxsize argument I’m passing to lru_cache to limit the number of items stored in the cache at the same time.

Once again I’m using the timeit module to run a simple benchmark so I can get a sense of the performance impact of this optimization:

>>> import timeit
>>> timeit.timeit('fibonacci(35)', globals=globals(), number=1)
3.056201967410743e-05
>>> timeit.timeit('fibonacci(35)', globals=globals(), number=1)
1.554988557472825e-06

You may be wondering why we’re getting the result of the first run so much faster this time around. Shouldn’t the cache be “cold” on the first run as well?

The difference is that, in this example, I applied the @lru_cache decorator at function definition time. This means that recursive calls to fibonacci() are also looked up in the cache this time around.

By decorating the fibonacci() function with the @lru_cache decorator I basically turned it into a dynamic programming solution, where each subproblem is solved just once by storing the subproblem solutions and looking them up from the cache the next time.

This is just a side-effect in this case—but I’m sure you can begin to see the beauty and the power of using a memoization decorator and how helpful a tool it can be to implement other dynamic programming algorithms as well.
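To check that this effect comes from where the decorator is applied rather than from lru_cache itself, here is a sketch that applies the hand-written memoize decorator from earlier with the @ syntax. The computed list is a hypothetical helper that records which subproblems were actually calculated:

```python
def memoize(func):
    cache = dict()

    def memoized_func(*args):
        if args in cache:
            return cache[args]
        result = func(*args)
        cache[args] = result
        return result

    return memoized_func


computed = []  # hypothetical helper: one entry per real computation


@memoize
def fibonacci(n):
    computed.append(n)
    if n == 0:
        return 0
    elif n == 1:
        return 1
    # The name fibonacci now refers to the memoized wrapper,
    # so these recursive calls go through the cache as well
    return fibonacci(n - 1) + fibonacci(n - 2)


result = fibonacci(35)
print(result, len(computed))
```

Each of the 36 subproblems (0 through 35) is computed exactly once, even on the very first call.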

Why You Should Prefer `functools.lru_cache`

In general, Python’s memoization implementation provided by functools.lru_cache is much more comprehensive than our ad hoc memoize function, as you can see in the CPython source code.

For example, it provides a handy feature that allows you to retrieve caching statistics with the cache_info method:

>>> fibonacci.cache_info()
CacheInfo(hits=34, misses=36, maxsize=128, currsize=36)

Again, as you can see in the CacheInfo output, Python’s lru_cache() memoized the recursive calls to fibonacci(). When we look at the cache information for the memoized function, you’ll recognize why it is faster than our version on the first run—the cache was hit 34 times.

As I hinted at earlier, functools.lru_cache also allows you to limit the number of cached results with the maxsize parameter. By setting maxsize=None you can force the cache to be unbounded, which I would usually recommend against.

There’s also a typed boolean parameter you can set to True in order to tell the cache that function arguments of different types should be cached separately. For example, fibonacci(35) and fibonacci(35.0) would be treated as distinct calls with distinct results.
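Here is a small sketch of the typed parameter in action. The square function is a hypothetical stand-in for any function you might cache:

```python
import functools


@functools.lru_cache(maxsize=128, typed=True)
def square(n):
    return n * n


square(3)    # cached under the int argument
square(3.0)  # cached separately because the argument is a float
info = square.cache_info()
print(info)
```

With typed=True both calls are recorded as cache misses and end up as two separate cache entries.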

Another useful feature is the ability to reset the result cache at any time with the cache_clear method:

>>> fibonacci.cache_clear()
>>> fibonacci.cache_info()
CacheInfo(hits=0, misses=0, maxsize=128, currsize=0)
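Putting cache_info and cache_clear together in a standalone script might look like the sketch below. Note that it calls fibonacci(35) only once before inspecting the statistics, so it reports one cache hit fewer than the REPL session above, where the benchmark had been run twice:

```python
import functools


@functools.lru_cache(maxsize=128)
def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)


fibonacci(35)
after_first = fibonacci.cache_info()  # statistics after a single call

fibonacci.cache_clear()               # drop all cached results and counters
after_clear = fibonacci.cache_info()

print(after_first)
print(after_clear)
```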

If you want to learn more about the intricacies of using the lru_cache decorator I recommend that you consult the Python standard library documentation.

In summary, you should never need to roll your own memoizing function. Python’s built-in lru_cache() is readily available, more comprehensive, and battle-tested.

Caching Caveats – What Can Be Memoized?

Ideally, you will want to memoize functions that are deterministic.

def deterministic_adder(x, y):
    return x + y

Here deterministic_adder() is a deterministic function because it will always return the same result for the same pair of parameters. For example, if you pass 2 and 3 into the function, it will always return 5.

Compare this behavior with the following nondeterministic function:

from datetime import datetime

def nondeterministic_adder(x, y):
    # Check to see if today is Monday (weekday 0)
    if datetime.now().weekday() == 0:
        return x + y + x
    return x + y

This function is nondeterministic because its output for a given input depends on the day of the week. If you memoize it and first call it on a Monday, the cache will keep returning that stale Monday result on every other day of the week.

Generally I find that any function that updates a record or returns information that changes over time is a poor choice to memoize.
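The day-of-week example is hard to reproduce on demand, so here is a sketch of the same problem using a hypothetical exchange_rate global that changes between calls. The names and values are made up for illustration:

```python
import functools

exchange_rate = 1.10  # hypothetical value that changes over time


@functools.lru_cache(maxsize=128)
def convert(amount):
    # Depends on state outside its parameters, so it is not deterministic
    return round(amount * exchange_rate, 2)


before = convert(100)
exchange_rate = 1.25   # the outside world changes...
after = convert(100)   # ...but the cache still returns the old answer
print(before, after)
```

Both calls return 110.0 even though the rate has changed, because the cache only looks at the function arguments.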

Or, as Phil Karlton puts it:

There are only two hard things in Computer Science: cache invalidation and naming things.

— Phil Karlton

🙂

Memoization in Python: Quick Summary

In this Python tutorial you saw how memoization allows you to optimize a function by caching its output based on the parameters you supply to it.

Once you memoize a function, it will only compute its output once for each set of parameters you call it with. Every call after the first will be quickly retrieved from a cache.

You saw how to write your own memoization decorator from scratch, and why you probably want to use Python’s built-in, battle-tested lru_cache() implementation in your production code:

Memoization is a software optimization technique that stores and returns the result of a function call based on its parameters.
If your code meets certain criteria, memoization can be a great method to speed up your application.
You can import a comprehensive memoization function, lru_cache(), from Python’s standard library in the functools module.

[ Improve Your Python with Dan's 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Finding Python Projects to Grow Your Programming Skills

Tue, 14 Nov 2017 00:00:00 GMT

Where to find inspiration for Python projects that will help you develop real-world coding skills and lay the foundation of your programming portfolio.

I got this question from a newsletter reader who’s an entry-level Pythonista:

What’s the best way of moving from a basic understanding of Python to working on real projects? And what Python projects should I build? I have no idea which ones would help me grow.

It’s easy to get hung up on this question and to be stuck in “overthinking mode”—

What if you pick the wrong project to work on? What if you’re working on the wrong skills? What if you’d make progress faster by working on something else?

… and so on, and so forth. I’ve been there myself, jumping from one shiny thing to the next looking for a “quick fix” to boost my coding skills. And trust me, constantly doubting your decisions is the quickest way to destroy your forward momentum.

So what should you be doing instead? The trick here is to temporarily ignore all advice that says “re-inventing the wheel” is bad.

It’s true, “re-inventing the wheel disease” is bad for the productivity of experienced developers.

But, it’s actually a godsend for beginning developers who need to get some experience under their belt. So, hear me out: If you’re working on improving your coding skills, you should be re-inventing wheels *a lot*.

Really, go nuts!

Try to re-invent and re-write everything from scratch. Write little GUI calculators, try to write your own text editor, write a “file copy” command-line tool…

Write backup/archiving tools! Write arcade games: Tetris, Snake, Tic-Tac-Toe.

Re-invent it all and copy, copy, copy the user facing designs! You’re not doing this to steal someone’s business or app idea—but to understand how small real-world projects work behind the scenes.

The smaller in scope the project, the better. You want to focus on copying small “commodity” software that’s around you every day:

How many standard UNIX command-line tools like cp, cat, and ls can you write from scratch in an afternoon? And feel free to cut corners—maybe your “cp” command can only copy files and not directories…That’s fine!

Just get something out the door. I promise you’re going to learn something. And even if you fail at first, this approach constantly creates new questions you can then set out to answer.

These questions will be your “learning compass” and give you directions on what to focus on next.

So, can you do one of these little projects a day and keep up the pace for a week, a month? There’s no doubt in my mind that your Python skills will massively improve if you re-implement one of these small tools a day from scratch.

In summary: Action, Action, Action!

Leave a comment below and let me know which tool or app you’re going to “re-invent” with Python 🙂

Interfacing Python and C: Advanced “ctypes” Features

Tue, 07 Nov 2017 00:00:00 GMT

Learn advanced patterns for interfacing Python with native libraries, like dealing with C structs from Python and pass-by-value versus pass-by-reference semantics.

The built-in ctypes module is a powerful feature in Python, allowing you to use existing libraries in other languages by writing simple wrappers in Python itself.

In the first part of this tutorial, we covered the basics of ctypes. In part two we will dig a little deeper, covering:

Creating simple Python classes to mirror C structures
Dealing with C pointers in Python: Pass-by-value vs Pass-by-reference
Expanding our C structure wrappers to hide complexity from Python code
Interacting with nested C structures from Python

Again, let’s start by taking a look at the simple C library we will be using and how to build it, and then jump into loading a C library and calling functions in it.

Interfacing Python and C: The C Library Testbed

As with the previous tutorial, all of the code to build and test the examples discussed here (as well as the Markdown for this article) is committed to my GitHub repository.

The library consists of two data structures: Point and Line. A Point is a pair of (x, y) coordinates while a Line has a start and end point. There are also a handful of functions which modify each of these types.

Let’s take a closer look at the Point structure and the functions surrounding it. Here’s the corresponding C code split into a Point.h header file and a Point.c implementation:

/* Point.h */
/* Simple structure for ctypes example */
typedef struct {
    int x;
    int y;
} Point;

/* Point.c */
/* Display a Point value */
void show_point(Point point) {
    printf("Point in C      is (%d, %d)\n", point.x, point.y);
}

/* Increment a Point which was passed by value */
void move_point(Point point) {
    show_point(point);
    point.x++;
    point.y++;
    show_point(point);
}

/* Increment a Point which was passed by reference */
void move_point_by_ref(Point *point) {
    show_point(*point);
    point->x++;
    point->y++;
    show_point(*point);
}

/* Return by value */
Point get_point(void) {
    static int counter = 0;
    Point point;
    /* Separate statements keep the order of the two increments well defined */
    point.x = counter++;
    point.y = counter++;
    printf("Returning Point    (%d, %d)\n", point.x, point.y);
    return point;
}

I won’t go into each of these functions in detail as they are fairly straightforward. The most interesting bit here is the difference between move_point and move_point_by_ref. We’ll talk a bit later about this when we discuss pass-by-value and pass-by-reference semantics.

We’ll also be using a Line structure, which is composed of two Points:

/* Line.h */
/* Compound C structure for our ctypes example */
typedef struct {
    Point start;
    Point end;
} Line;

/* Line.c */
void show_line(Line line) {
    printf("Line in C      is (%d, %d)->(%d, %d)\n",
           line.start.x, line.start.y,
           line.end.x, line.end.y);
}

void move_line_by_ref(Line *line) {
    show_line(*line);
    move_point_by_ref(&line->start);
    move_point_by_ref(&line->end);
    show_line(*line);
}

Line get_line(void) {
    Line l;
    /* Separate statements guarantee that start is fetched before end */
    l.start = get_point();
    l.end = get_point();
    return l;
}

The Point structure and its associated functions will allow us to show how to wrap structures and deal with memory references in ctypes. The Line structure will allow us to work with nested structures and the complications that arise from that.

The Makefile in the repo is set up to completely build and run the demo from scratch:

all: point wrappedPoint line

clean:
    rm *.o *.so

libpoint.so: Point.o
    gcc -shared $^ -o $@

libline.so: Point.o Line.o
    gcc -shared $^ -o $@

%.o: %.c
    gcc -c -Wall -Werror -fpic $^

point: libpoint.so
    ./testPoint.py

wrappedPoint: libpoint.so
    ./testWrappedPoint.py

line: libline.so
    ./testLine.py

doc:
    pandoc ctypes2.md > ctypes2.html
    firefox ctypes2.html

To build and run the demo you only need to run the following command in your shell:

$ make

Creating Simple Python Classes to Mirror C Structures

Now that we’ve seen the C code we’ll be using, we can start in on Python and ctypes. We’ll start with a quick wrapper function that will simplify the rest of our code, then we’ll look at how to wrap C structures. Finally, we’ll discuss dealing with C pointers from Python and the differences between pass-by-value and pass-by-reference.

Wrapping `ctypes` Functions

Before we get into the depths of this tutorial, I’ll show you a utility function we’ll be using throughout. This Python function is called wrap_function. It takes the object returned from ctypes.CDLL and the name of a function (as a string). It returns a Python object which holds the function and the specified restype and argtypes:

def wrap_function(lib, funcname, restype, argtypes):
    """Simplify wrapping ctypes functions"""
    # Look up the C function by name on the loaded library
    func = getattr(lib, funcname)
    func.restype = restype
    func.argtypes = argtypes
    return func

These are concepts covered in my previous ctypes tutorial, so if this doesn’t make sense, it might be worth reviewing part one again.
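If you’d like to try wrap_function before building libpoint.so, the sketch below points it at ctypes.pythonapi, which exposes the C API of the running interpreter. This assumes you’re on CPython; Py_GetVersion is a real C API function, but using it here is my own choice for illustration and not part of the demo repository. The lookup is written with the getattr() built-in.

```python
import ctypes
import sys


def wrap_function(lib, funcname, restype, argtypes):
    """Simplify wrapping ctypes functions"""
    func = getattr(lib, funcname)
    func.restype = restype
    func.argtypes = argtypes
    return func


# Assumption: CPython, where ctypes.pythonapi gives access to the
# interpreter's own C API without loading a separate shared library.
get_version = wrap_function(ctypes.pythonapi, 'Py_GetVersion',
                            ctypes.c_char_p, [])
version = get_version().decode()
print(version)
```

The wrapped function returns a C string (const char *), which ctypes converts to a Python bytes object thanks to the c_char_p restype.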

Mirroring C Structures with Python Classes

Creating Python classes which mirror C structs requires little code, but does have a little magic behind the scenes:

class Point(ctypes.Structure):
    _fields_ = [('x', ctypes.c_int), ('y', ctypes.c_int)]

    def __repr__(self):
        return '({0}, {1})'.format(self.x, self.y)

As you can see above, we make use of the _fields_ attribute of the class. Please note the single underscore: this is not a “dunder” attribute. This attribute is a list of tuples and allows ctypes to map attributes from Python back to the underlying C structure.

Let’s look at how it’s used:

>>> libc = ctypes.CDLL('./libpoint.so')
>>> show_point = wrap_function(libc, 'show_point', None, [Point])
>>> p = Point(1, 2)
>>> show_point(p)
Point in C      is (1, 2)

Notice that we can access the x and y attributes of the Point class in Python in the __repr__ function. We can also pass the Point directly to the show_point function in the C library. Ctypes uses the _fields_ map to manage the conversions automatically for you. Care should be taken with using the _fields_ attribute, however. We’ll look at this in a little more detail in the nested structures section below.
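The Python side of this can be explored without the C library at all. Here is a small sketch that only uses ctypes itself; the q variable and the size check are additions of mine for illustration:

```python
import ctypes


class Point(ctypes.Structure):
    _fields_ = [('x', ctypes.c_int), ('y', ctypes.c_int)]

    def __repr__(self):
        return '({0}, {1})'.format(self.x, self.y)


p = Point(1, 2)  # positional arguments follow the order in _fields_
q = Point(y=7)   # fields that are not specified start out as zero
print(p, q)

# The Python class occupies exactly as much memory as the two C ints
same_size = ctypes.sizeof(Point) == 2 * ctypes.sizeof(ctypes.c_int)
print(same_size)
```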

Pass-by-value vs Pass-by-reference (pointers)

In Python we get used to referring to things as either mutable or immutable. This controls what happens when you modify an object you’ve passed to a function. For example, number objects are immutable. When you call myfunc in the code below, the value of y does not get modified. The program prints the value 9:

def myfunc(x):
    x = x + 2

y = 9
myfunc(y)
print("this is y", y)

Contrarily, list objects are mutable. In a similar function:

def mylistfunc(x):
    x.append("more data")

z = list()
mylistfunc(z)
print("this is z", z)

As you can see, the list, z, that is passed in to the function is modified and the output is this is z ['more data']

When interfacing with C, we need to take this concept a step further. When we pass a parameter to a function, C always “passes by value”. What this means is that, unless you pass in a pointer to an object, the original object is never changed. Applying this to ctypes, we need to be aware of which values are being passed as pointers and thus need the ctypes.POINTER(Point) type applied to them.

In the example below, we have two versions of the function to move a point: move_point, which passes by value, and move_point_by_ref which passes by reference.

# --- Pass by value ---
print("Pass by value")
move_point = wrap_function(libc, 'move_point', None, [Point])
a = Point(5, 6)
print("Point in Python is", a)
move_point(a)
print("Point in Python is", a)
print()

# --- Pass by reference ---
print("Pass by reference")
move_point_by_ref = wrap_function(libc, 'move_point_by_ref', None,
                                  [ctypes.POINTER(Point)])
a = Point(5, 6)
print("Point in Python is", a)
move_point_by_ref(a)
print("Point in Python is", a)
print()

The output from these two code sections looks like this:

Pass by value
Point in Python is (5, 6)
Point in C      is (5, 6)
Point in C      is (6, 7)
Point in Python is (5, 6)

Pass by reference
Point in Python is (5, 6)
Point in C      is (5, 6)
Point in C      is (6, 7)
Point in Python is (6, 7)

As you can see, when we call move_point, the C code can change the value of the Point, but that change is not reflected in the Python object. When we call move_point_by_ref, however, the change is visible in the Python object. This is because we passed the address of the memory which holds that value and the C code took special care (via using the -> accessor) to modify that memory.
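You can mimic both behaviors from pure Python, without the C library. In the sketch below, which is my own illustration rather than code from the demo repository, the “by value” case works on a copy of the data and the “by reference” case works through ctypes.pointer:

```python
import ctypes


class Point(ctypes.Structure):
    _fields_ = [('x', ctypes.c_int), ('y', ctypes.c_int)]

    def __repr__(self):
        return '({0}, {1})'.format(self.x, self.y)


original = Point(5, 6)

# "Pass by value": the changes land on an independent copy of the data
copy = Point(original.x, original.y)
copy.x += 1
copy.y += 1
after_copy = repr(original)

# "Pass by reference": the pointer refers to the original's memory
ptr = ctypes.pointer(original)
ptr.contents.x += 1
ptr.contents.y += 1
after_pointer = repr(original)

print(after_copy, after_pointer)
```

The original is still (5, 6) after the copy was modified, and only becomes (6, 7) once it is changed through the pointer.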

When working in cross-language interfaces, memory access and memory management are important aspects to keep in mind.

Accessing C Structs from Python – An OOP Wrapper

We saw above that providing a simple wrapper to a C structure is quite easy using ctypes. We can also expand this wrapper to make it behave like a “proper” Python class instead of a C struct using object-oriented programming principles.

Here’s an example:

class Point(ctypes.Structure):
    _fields_ = [('x', ctypes.c_int), ('y', ctypes.c_int)]

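    def __init__(self, lib, x=None, y=None):
        if x is not None:
            self.x = x
            self.y = y
        else:
            get_point = wrap_function(lib, 'get_point', Point, None)
            # Rebinding self would not update this instance, so copy the fields
            point = get_point()
            self.x = point.x
            self.y = point.y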

        self.show_point_func = wrap_function(lib, 'show_point', None, [Point])
        self.move_point_func = wrap_function(lib, 'move_point', None, [Point])
        self.move_point_ref_func = wrap_function(lib, 'move_point_by_ref', None,
                                                 [ctypes.POINTER(Point)])

    def __repr__(self):
        return '({0}, {1})'.format(self.x, self.y)

    def show_point(self):
        self.show_point_func(self)

    def move_point(self):
        self.move_point_func(self)

    def move_point_by_ref(self):
        self.move_point_ref_func(self)

You’ll see the _fields_ and __repr__ attributes are the same as we had in our simple wrapper, but now we’ve added a constructor and wrapping functions for each method we’ll use.

The interesting code is all in the constructor. The initial part initializes the x and y fields. You can see that we have two ways to achieve this. If the user passed in values, we assign those directly to the fields. If the defaults were used, we call the get_point function in the library and initialize the fields from the Point it returns.

Once we’ve initialized the fields in our Point class, we then wrap the functions into attributes of our class to allow them to be accessed in a more object oriented manner.

In the testWrappedPoint module, we run the same tests we ran with our simple Point class, but instead of passing the Point object to a function, as in move_point_by_ref(a), we call the method on the object itself: a.move_point_by_ref().

Accessing Nested C Structures From Python

Finally, we’re going to look at how to use nested structures in ctypes. The obvious next step in our example is to extend a Point to a Line:

class Line(ctypes.Structure):
    _fields_ = [('start', testPoint.Point), ('end', testPoint.Point)]

    def __init__(self, lib):
        get_line = wrap_function(lib, 'get_line', Line, None)
        line = get_line()
        self.start = line.start
        self.end = line.end
        self.show_line_func = wrap_function(lib, 'show_line', None, [Line])
        self.move_line_func = wrap_function(lib, 'move_line_by_ref', None,
                                            [ctypes.POINTER(Line)])

    def __repr__(self):
        return '{0}->{1}'.format(self.start, self.end)

    def show_line(self):
        self.show_line_func(self)

    def moveLine(self):
        self.move_line_func(self)

Most of this class should look fairly familiar if you’ve been following along. The one interesting difference is how we initialize the fields in the constructor. In the Point class we initialized the object from the single value returned by get_point(). For Line we instead copy the start and end fields out of the returned structure one at a time. The reason is that the entries in the _fields_ list are not basic ctypes types, but rather a subclass of one of them (ctypes.Structure). Assigning these directly tends to mess up how the value is stored so that the Python attributes you add to the class are inaccessible.

The basic rule I’ve found in wrapping structures like this is to only add the Python class attributes at the top level and leave the inner structures (i.e. Point) with the simple _fields_ attribute.
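To get a feel for how nested fields behave, here is a sketch that again only uses ctypes and needs no C library. It uses plain Point and Line classes with nothing but _fields_, which is an assumption of mine for the sake of a short example:

```python
import ctypes


class Point(ctypes.Structure):
    _fields_ = [('x', ctypes.c_int), ('y', ctypes.c_int)]


class Line(ctypes.Structure):
    _fields_ = [('start', Point), ('end', Point)]


line = Line(Point(1, 2), Point(3, 4))

# Reading a nested field gives a view onto the Line's own memory,
# so this assignment changes the Line itself
line.start.x = 10

# Assigning a Point copies its data into the Line
p = Point(7, 8)
line.end = p
p.x = 99  # changes p only, not the copy stored in line.end

print(line.start.x, line.end.x, line.end.y)
```

The nested Point values live inside the memory of the Line, which is why the inner structures are copied in and out rather than shared with separate Python objects.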

Advanced ctypes Features – Conclusion

In this tutorial we covered some more advanced topics in using the ctypes module to interface Python with external C libraries. I found several resources out there while researching:

The ctypesgen project has tools which will auto generate Python wrapping modules for C header files. I spent some time playing with this and it looks quite good.
The idea for the wrap_function function was lifted shamelessly from some ctypes tips here.

In the first part of this tutorial, we covered the basics of ctypes, so be sure to check there if you’re looking for a ctypes primer. And, finally, if you’d like to see and play with the code I wrote while working on this, please visit my GitHub repository. This tutorial is in the tutorial2 directory.

Python Tricks: The Book Launch FAQ

Wed, 01 Nov 2017 00:00:00 GMT

I’m getting a ton of emails, Tweets, and YouTube comments from people with questions about my new book. Here’s a quick Q&A to keep you informed and my carpal tunnel happy.

Alright, I’m getting a ton (and I mean a ton) of emails, Tweets, and YouTube comments from people with questions about my new book.

(In case you haven’t heard about it, it’s called Python Tricks: A Buffet of Awesome Python Features, it’s ranked #1 on Amazon in Python programming books and several other categories, and you can get it on a 33% launch discount if you buy before Thursday night.)

👉 Click here to get your print copy on Amazon

Some of these questions are about the availability of the book on Amazon. This is the first time I’m launching a product on Amazon and apparently some things are not as straightforward as I thought.

So to make sure you get your questions answered (and to keep my carpal tunnel happy) I put together the following Q&A:

1. Can I get a sample chapter and the table of contents for the book?

Yep, you can check out the full table of contents and read a sample of the full book’s contents on the Amazon page for the book.

Just click on the book cover where it says “Look Inside” and you’ll find it. (If the Amazon feature doesn’t work please try the PDF sample.)

2. How can I get the launch price? The 33% discount link seems to not provide any discount

Amazon doesn’t allow authors to control discounts for paperback books, so basically what I did to work around this was set the price lower this week for the launch.

Tomorrow I’ll manually raise it to the regular price ($29.99) to give everyone a chance to get a copy at the launch price.

Pricing on Amazon is quite strange, by the way. This morning I saw they had actually dropped the price below the launch price I entered. But hey, that’s your chance to save another dollar today ;-)

3. Your Amazon link throws a 404 error! / I live in $COUNTRY and Amazon says the book isn’t available for purchase 🙁

This is another Amazon quirk that I discovered… I’m publishing the book through Amazon’s own printing service and they handle the global distribution. My understanding was that this would include all country-specific stores worldwide and from the get go.

But once again it turns out things are a little more complicated than that…

Right now, the book is available on the following country-specific stores:

Unfortunately, the paperback version is not yet available on: Amazon.in, Amazon.cn, Amazon.co.jp, Amazon.com.br, Amazon.com.mx, Amazon.com.au, and Amazon.nl.

Amazon says it can take up to 8 weeks for the paperback to be available on the remaining “extended distribution” stores.

I know a lot of you have emailed me from India and Brazil asking about the book and I would’ve loved to get you the paperback right now.

The next best thing I can offer you at the moment is getting the digital version from my website. I’ll let you know if there’s any news on worldwide availability of the paperback!

4. Does the paperback version include the bonus video content?

Yes it does! There’s a special URL in the paperback version where you can get access to the bonus material.

These bonuses include two hours of tutorial videos that go hand-in-hand with select chapters in the book and help reinforce the key points.

5. What about a Kindle version?

Right now only the paperback is available on Amazon. I’m working on getting the Kindle version out on Amazon soon, guaranteeing you easy synchronization between all of your reading devices.

When the Kindle version comes out on Amazon I’ll set up a “Kindle Match” deal so you can upgrade your paperback purchase to paperback + Kindle. Will let you know when that’s ready!

If you want a digital copy (PDF, ePub, Mobi) right now you can purchase it on my website.

6. Is the book worth looking at for beginners?

Great question, some thoughts on that: Python Tricks isn’t a step-by-step Python tutorial. It’s not an entry-level Python book. If you’re in the beginning stages of learning Python, this book alone won’t transform you into a professional Python developer.

Reading it will still be beneficial to you, but you need to make sure you’re working with some other resources to build up your foundational Python skills.

Here’s how you can judge for yourself:

Open up the Amazon page and click the book cover (“Look Inside”). Skim through a few of the sample pages and see if you can make sense of it or if you feel like you’re “getting thrown into the deep end” :-)

I wrote this book to take you beyond the basics of the language and towards the point where you’re comfortable writing “developer style” Python.

I spent a lot of time trying to make my explanations logical and easy to follow, but in the end you’ll have to decide for yourself if my book works for you.

Edit: Well, or just listen to my friend Ram here—

@dbader_org book is definitely for beginners also. https://t.co/L0RpyQtk5U
— Ram Meena (@rammeena) November 1, 2017

😃

That’s it for the questions.

The launch price ends tomorrow.

At midnight the price for the paperback on Amazon goes up to $29.99.

I have no plans to ever offer the paperback at this price again, so if you’re on the fence now is the time to act.

Jump on this deal before it’s over:

👉 Get your print copy on Amazon

[ Improve Your Python with Dan's 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Announcing Python Tricks: The Book!

Tue, 31 Oct 2017 00:00:00 GMT

Announcing Python Tricks: The Book!

It’s finally here! I’m super excited to announce the release of my book “Python Tricks: A Buffet of Awesome Python Features”

It’s been a crazy and awesome couple of weeks to get this 300-page book ready for release. I’ve spent almost a year writing and editing this book and I still can’t believe I finally got it published on Amazon.

It all started with an email I received from a newsletter member and the following statement that somehow got stuck in my mind:

“I don’t even feel like I’ve scratched the surface of what I can do with Python”

I wrote this book for anyone who feels the same way about Python:

If you’re wondering which lesser known parts in Python you should know about, you’ll get a roadmap with this book. Discover cool (yet practical!) Python tricks and blow your coworkers’ minds in your next code review.
If you’ve got experience with legacy versions of Python, the book will get you up to speed with modern patterns and features introduced in Python 3 and backported to Python 2.
If you’ve worked with other programming languages and you want to get up to speed with Python, you’ll pick up the idioms and practical tips you need to become a confident and effective Pythonista.
If you want to make Python your own and learn how to write clean and Pythonic code, you’ll discover best practices and little-known tricks to round out your knowledge.

With Python Tricks: The Book you’ll discover Python’s best practices and the power of beautiful & Pythonic code with simple examples and a step-by-step narrative.

You’ll get one step closer to mastering Python, so you can write beautiful and idiomatic code that comes to you naturally.

Learning the ins and outs of Python is difficult—and with this book you’ll be able to focus on the practical skills that matter. Discover “hidden gold” in Python’s standard library and start writing clean and Pythonic code today.

Update – 2017-11-01

Check this out:

Python Tricks: A Buffet of Awesome Python Features is the #1 Amazon bestseller right now in Programming Languages…

It’s also the #1 bestseller in Python Programming:

The #1 bestseller in Web Programming,

And the #1 bestselling new release in Computer Programming.

That’s absolutely insane! My mind == officially blown! If you haven’t bought your copy of Python Tricks yet, now is the time. The paperback version is still available at a 33% launch discount, before the price goes up to $29.99:

👉 Get your print copy on Amazon

Launch Price

The book is currently on a 33% off launch discount for the book + bonus videos package. In a few days the price will go up to $29.99. So be sure to act quickly and grab your copy before then:

👉 Click here to order a print copy on Amazon

– or –

👉 Click here to get a digital copy from my online store

Thank You

The feedback I received from the dbader.org community and from PythonistaCafe members was priceless—and I’m extremely happy with how the final book turned out.

A big ❤️ Thank You ❤️ to everyone who sent me feedback, offered their criticism, or called me out on a typo!

Also, huge thanks to my friend Mariatta Wijaya for the amazing foreword she contributed to the book. I couldn’t have hoped for a better introduction for my readers 🙂

Can you help me?

Writing this book was a true labor of love for me and now I want to get it into the hands of as many Pythonistas as possible.

I can’t do this without your help. With Python training now being my primary income, it would mean the world to me if you could share the book with a friend, co-worker, or your favorite Slack room:

Tweet, Facebook share, or send a friend to dbader.org/pytricks-book.

Thank you and enjoy the book!

Python Parallel Computing (in 60 Seconds or less)

Tue, 24 Oct 2017 00:00:00 GMT

Python Parallel Computing (in 60 Seconds or less)

If your Python programs are slower than you’d like, you can often speed them up by parallelizing them. In this short primer you’ll learn the basics of parallel processing in Python 2 and 3.

Basically, parallel computing allows you to carry out many calculations at the same time, thus reducing the amount of time it takes to run your program to completion.

I know, this sounds fairly vague and complicated somehow…but bear with me for the next 50 seconds or so.

Here’s an end-to-end example of parallel computing in Python 2/3, using only tools built into the Python standard library—

Ready? Go!

First, we need to do some setup work. We’ll import the collections and multiprocessing modules so we can define the data structure we’ll work with and use Python’s parallel computing facilities:

import collections
import multiprocessing

Second, we’ll use collections.namedtuple to define a new (immutable) data type we can use to represent our data set, a collection of scientists:

Scientist = collections.namedtuple('Scientist', [
    'name',
    'born',
])

scientists = (
    Scientist(name='Ada Lovelace', born=1815),
    Scientist(name='Emmy Noether', born=1882),
    Scientist(name='Marie Curie', born=1867),
    Scientist(name='Tu Youyou', born=1930),
    Scientist(name='Ada Yonath', born=1939),
    Scientist(name='Vera Rubin', born=1928),
    Scientist(name='Sally Ride', born=1951),
)

Third, we’ll write a “data processing function” that accepts a scientist object and returns a dictionary containing the scientist’s name and their calculated age:

def process_item(item):
    return {
        'name': item.name,
        'age': 2017 - item.born
    }

The process_item() function just represents a simple data transformation to keep this example short and sweet—but you could easily swap it out with a more complex computation.
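Before parallelizing anything, it can help to check the transformation on its own. Here is a minimal sketch that runs process_item() sequentially with the built-in map(); the two-item sample tuple is just an assumption for illustration:

```python
import collections

Scientist = collections.namedtuple('Scientist', ['name', 'born'])

# A shortened sample data set (illustrative only)
sample = (
    Scientist(name='Ada Lovelace', born=1815),
    Scientist(name='Marie Curie', born=1867),
)

def process_item(item):
    return {'name': item.name, 'age': 2017 - item.born}

# The built-in map() applies the function one item at a time, in order
sequential_result = tuple(map(process_item, sample))
print(sequential_result)
```

A pool’s map() method takes the same two arguments, which is why switching from the sequential version to the parallel one requires so little code.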

(20 seconds remaining)

Fourth, and this is where the real parallelization magic happens, we’ll set up a multiprocessing pool that allows us to spread our calculations across all available CPU cores.

Then we call the pool’s map() method to apply our process_item() function to all scientist objects, in parallel batches:

pool = multiprocessing.Pool()
result = pool.map(process_item, scientists)

Note how batching and distributing the work across multiple CPU cores, performing the work, and collecting the results are all handled by the multiprocessing pool. How great is that?

The only caveat is that the function you pass to map() must be picklable. That is, it must be possible to serialize the function using Python’s built-in pickle module, otherwise the map() call will fail.
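If you’re unsure whether a function can be pickled, you can try it directly. This is a rough sketch; is_picklable() is a hypothetical helper name, not something from the standard library:

```python
import pickle

def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True

# Functions are pickled by name, so they must be importable by that name
print(is_picklable(abs))          # a built-in function
print(is_picklable(lambda x: x))  # a lambda has no importable name
```

This is why functions defined at the top level of a module work well with map(), while lambdas generally don’t.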

Fifth, we’re all done here with about 5 seconds remaining—

Let’s print the results of our data transformation to the console so we can make sure the program did what it was supposed to:

print(tuple(result))

That’s the end of our little program. And here’s what you should expect to see printed out on your console:

({'name': 'Ada Lovelace', 'age': 202},
 {'name': 'Emmy Noether', 'age': 135},
 {'name': 'Marie Curie', 'age': 150},
 {'name': 'Tu Youyou', 'age': 87},
 {'name': 'Ada Yonath', 'age': 78},
 {'name': 'Vera Rubin', 'age': 89},
 {'name': 'Sally Ride', 'age': 66})

Isn’t Python just lovely?

Now, obviously I took some shortcuts here and picked an example that made parallelization seem effortless—

But, I stand by the lessons learned here:

If you know how to structure and represent your data, parallelization is convenient and feels completely natural. Any Pythonista should pick up the basics of functional programming for this reason.
Python is a joy to work with and eminently suitable for these kinds of programming tasks.

Additional Learning Resources

We only scratched the surface here with this quick primer on parallel processing using Python. If you’d like to dig deeper into this subject, then check out the following two videos in my “Functional Programming in Python” tutorial series:

Full Example Source Code

Here’s the complete source code for this example if you’d like to use it as a basis for your own experiments.

Please note that you might encounter some issues running this multiprocessing example from inside a Jupyter notebook. The best way to get around that is to save this code in a standalone .py file and to run it from the command-line using the Python interpreter.

"""
Python Parallel Processing (in 60 seconds or less)
https://dbader.org/blog/python-parallel-computing-in-60-seconds
"""
import collections
import multiprocessing

Scientist = collections.namedtuple('Scientist', [
    'name',
    'born',
])

scientists = (
    Scientist(name='Ada Lovelace', born=1815),
    Scientist(name='Emmy Noether', born=1882),
    Scientist(name='Marie Curie', born=1867),
    Scientist(name='Tu Youyou', born=1930),
    Scientist(name='Ada Yonath', born=1939),
    Scientist(name='Vera Rubin', born=1928),
    Scientist(name='Sally Ride', born=1951),
)

def process_item(item):
    return {
        'name': item.name,
        'age': 2017 - item.born
    }

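if __name__ == '__main__':
    # The guard keeps worker processes from re-running this block when
    # they import the module (needed on platforms that spawn processes).
    pool = multiprocessing.Pool()
    result = pool.map(process_item, scientists)
    pool.close()
    pool.join()

    print(tuple(result))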

Writing a Domain Specific Language (DSL) in Python

Tue, 17 Oct 2017 00:00:00 GMT

Writing a Domain Specific Language (DSL) in Python

Learn how to create your own Domain Specific Language with Python from scratch with this step-by-step tutorial.

A Domain Specific Language, or DSL for short, is a language that’s specialized to a particular application domain. In other words, it’s a programming language that’s used for a more specific application or use case than a general-purpose language like Python.

For example, regular expressions are a DSL. Another widely-used DSL is SQL. As you can see, DSLs run the gamut from the complex, like regular expressions, to the simple and very niche variety we’re going to create in this tutorial.

To give you an idea of how simple they can be, let’s take a sneak peek at what our DSL written in Python will look like:

# This is a comment
module1 add 1 2
module2 sub 12 7
module1 print_results

With the DSL you’ll create in this tutorial you’ll be able to call Python functions and pass arguments to them using a syntax that resembles assembly language.

Blank lines and comment lines that start with “#” are ignored, just like in Python. Any other line starts with the module name, then the function name followed by its arguments, separated by spaces.

As you’ll see in the course of this tutorial, even a simple language like this can offer a lot of flexibility and make your Python applications “scriptable.”

What You’ll Learn in This Tutorial

Writing a Domain Specific Language (DSL) may sound difficult—like something that’s really hard and should only be done by advanced programmers. Perhaps you haven’t heard of a DSL before. Or you’re not sure what one is.

If so, then this tutorial is for you. This isn’t a subject reserved for advanced programmers. A DSL doesn’t have to be complex or involve studying parser theory and abstract syntax trees.

We’re going to write a simple, generic DSL in Python that uses other Python source files to do some work. It’s simple and generic for a reason. I want to show you how easy it is to use Python to write a DSL that you can adapt for your own use in your projects.

Even if you don’t have a direct use for a DSL today, you may pick up some new ideas or bits of the language that you haven’t seen before. We’ll look at:

dynamically importing Python modules at runtime
using getattr() to access an object’s attributes
using variable-length function arguments and keyword arguments
converting strings to other data types

Defining Your Own Programming Language

Our DSL is a language that’s used to run Python code to perform some work. The work that’s done is completely arbitrary. It can be whatever you decide is appropriate to expose to the user that helps them accomplish their work. Also, the users of our DSL aren’t necessarily Python programmers. They just know that they have work to get done via our DSL.

It’s up to the user to decide what they need to accomplish and therefore write in the DSL source file. All the user knows is they have been provided a library of functionality, or commands, that they can run using the DSL.

For writing our DSL, we’ll start with the simplest implementation possible and incrementally add functionality. Each version of the source files you’ll see for Python and our DSL will have the same version suffix added to it.

So our first implementation will have the source files “dsl1.py”, “src1.dsl” and “module1.py”. The second version with additional functionality will end with “2” and so on.

In summary, we’ll end up with the following naming scheme for our files:

“src1.dsl” is the DSL source file that users write. This is not Python code but contains code written in our custom DSL.
“dsl1.py” is the Python source file that contains the implementation of our domain specific language.
“module1.py” contains the Python code that users will call and execute indirectly via our DSL.

If you ever get stuck, you can find the full source code for this tutorial on GitHub.

DSL Version 1: Getting Started

Let’s make this more concrete by deciding what the first version of our DSL will be able to do. What’s the simplest version we could make?

Since the users need to be able to run our Python code, they need to be able to specify the module name, function name and any arguments the function might accept. So the first version of our DSL will look like this:

# src1.dsl
module1 add 1 2

Blank lines and comment lines that start with “#” are ignored, just like in Python. Any other line starts with the module name, then the function name followed by its arguments, separated by spaces.

Python makes this easy by simply reading the DSL source file line by line and using string methods. Let’s do that:

#!/usr/bin/env python3
# dsl1.py
import sys

# The source file is the 1st argument to the script
if len(sys.argv) != 2:
    print('usage: %s <src.dsl>' % sys.argv[0])
    sys.exit(1)

with open(sys.argv[1], 'r') as file:
    for line in file:
        line = line.strip()
        if not line or line[0] == '#':
            continue
        parts = line.split()
        print(parts)

Running “dsl1.py” from the command-line will lead to the following result:

$ dsl1.py src1.dsl
['module1', 'add', '1', '2']

If you’re using macOS or Linux, remember to make “dsl1.py” executable if it’s not already. This will allow you to run your application as a command-line command.

You can do this from your shell by running chmod +x dsl1.py. For Windows, it should work with a default Python installation. If you run into errors, check the Python FAQ.

With just a few lines of code, we were able to get a list of tokens from a line in our source file. These token values, in the list “parts”, represent the module name, function name and function arguments. Now that we have these values, we can call the function in our module with its arguments.
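The str.split() approach assumes that no argument contains spaces. If you ever need quoted arguments, one possible variation is shlex.split() from the standard library. This is a sketch of an alternative, not what “dsl1.py” does, and tokenize() is a hypothetical helper name:

```python
import shlex

def tokenize(line):
    """Split one DSL line into tokens; return [] for blanks and comments."""
    # comments=True makes shlex drop everything after an unquoted '#'
    return shlex.split(line, comments=True)

print(tokenize('module1 add 1 2'))
print(tokenize('# This is a comment'))
print(tokenize('module1 add "hello world" 2'))
```

The trade-off is that shlex follows shell-like quoting rules, so a stray quote character in an argument raises a ValueError instead of being passed through.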

Importing a Python Module at Runtime

But this brings up a new challenge. How do we import a module in Python if we don’t know the module name ahead of time? Typically, when we’re writing code, we know the module name we want to import and just enter import module1.

But with our DSL, we have the module name as the first item in a list as a string value. How do we use this?

The answer is that we can use importlib from the standard library to dynamically import the module at runtime. So let’s dynamically import our module next by adding the following line at the top of “dsl1.py” right under import sys:

import importlib

Before the with block you’ll want to add another line to tell Python where to import modules from:

sys.path.insert(0, '/Users/nathan/code/dsl/modules')

The sys.path.insert() line is necessary so Python knows where to find the directory that contains the modules that make up our library. Adjust this path as needed for your application so it references the directory where Python modules are saved.
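If you’d rather not hardcode an absolute path, one option is to derive the directory from the script’s own location. The sketch below assumes a “modules” directory sitting next to the script; modules_dir_for() is a hypothetical helper name:

```python
import os
import sys

def modules_dir_for(script_path):
    """Return the 'modules' directory that sits next to script_path."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.join(script_dir, 'modules')

# sys.argv[0] is the path the script was started with
modules_dir = modules_dir_for(sys.argv[0])
sys.path.insert(0, modules_dir)
```

This keeps the script working when the project directory is moved or checked out on another machine.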

Then, at the end of the file, insert the following lines of code:

mod = importlib.import_module(parts[0])
print(mod)

After making these changes, “dsl1.py” will look as follows:

#!/usr/bin/env python3
# dsl1.py -- Updated
import sys
import importlib

# The source file is the 1st argument to the script
if len(sys.argv) != 2:
    print('usage: %s <src.dsl>' % sys.argv[0])
    sys.exit(1)

sys.path.insert(0, '/Users/nathan/code/dsl/modules')

with open(sys.argv[1], 'r') as file:
    for line in file:
        line = line.strip()
        if not line or line[0] == '#':
            continue
        parts = line.split()
        print(parts)

        mod = importlib.import_module(parts[0])
        print(mod)

Now if we run “dsl1.py” from the command-line again, it will lead to the following result and printout:

$ dsl1.py src1.dsl
['module1', 'add', '1', '2']
<module 'module1' from '/Users/nathan/code/dsl/modules/module1.py'>

Great–we just imported a Python module dynamically at runtime using the importlib module from the standard library.

Additional importlib Learning Resources

To learn more about importlib and how you can benefit from using it in your programs, check out the following resources:

See the Python docs for more information regarding importlib
And also Doug Hellmann’s PyMOTW article
For an alternative approach to using importlib, see runpy
Python Plugin System: Load Modules Dynamically With importlib (video tutorial)

Invoking Code

Now that we’ve imported the module dynamically and have a reference to the module stored in a variable called mod, we can invoke (call) the specified function with its arguments. At the end of “dsl1.py”, let’s add the following line of code:

getattr(mod, parts[1])(parts[2], parts[3])

This may look a little odd. What’s happening here?

We need to get a reference to the function object in the module in order to call it. We can do this by using getattr with the module reference. This is the same idea as using import_module to dynamically get a reference to the module.

Passing the module to getattr and the name of the function returns a reference to the module’s add function object. We then call the function by using parentheses and passing the arguments along, the last two items in the list.

Remember, everything in Python is an object. And objects have attributes. So it follows that we can use getattr to look up a module’s attributes dynamically at runtime. For more information, see getattr in the Python docs.
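To see import_module and getattr working together in isolation, here is a small sketch that uses standard library modules as stand-ins for our own library. The call_by_name() helper is a hypothetical name used only for this illustration:

```python
import importlib

def call_by_name(module_name, function_name, *args):
    """Import module_name at runtime and call function_name(*args)."""
    mod = importlib.import_module(module_name)
    func = getattr(mod, function_name)
    return func(*args)

# Standard library modules stand in for "module1" here
print(call_by_name('math', 'sqrt', 16.0))
print(call_by_name('os.path', 'basename', '/tmp/src1.dsl'))
```

Both the module and the function are selected by plain strings, which is exactly what the tokens from a DSL source line give us.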

Let’s look at “module1.py”:

# module1.py

def add(a, b):
    print(a + b)

If we run “dsl1.py src1.dsl” now, what will the output be? “3”? Let’s see:

$ dsl1.py src1.dsl
['module1', 'add', '1', '2']
<module 'module1' from '/Users/nathan/code/dsl/modules/module1.py'>
12

Wait, “12”? How did that happen? Shouldn’t the output be “3”?

This is easy to miss at first and may or may not be what you want. It depends on your application. Our arguments to the add function were strings. So Python dutifully concatenated them and printed the string “12”.

This brings us to a higher level question and something that’s more difficult. How should our DSL handle arguments of different types? What if a user needs to work with integers?

One option would be to have two add functions, e.g. add_str and add_int. add_int would convert the string parameters to integers:

print(int(a) + int(b))

Another option would be for the user to specify what types they’re working with and have that be an argument in the DSL:

module1 add int 1 2
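A library function for that second form might look roughly like the sketch below. The CONVERTERS table and this add() signature are assumptions made for illustration; they aren’t part of “module1.py”:

```python
# Hypothetical table of the type names the DSL accepts
CONVERTERS = {'int': int, 'float': float, 'str': str}

def add(type_name, a, b):
    """Convert both string arguments with the named type, then add them."""
    convert = CONVERTERS[type_name]  # raises KeyError for unknown names
    return convert(a) + convert(b)

print(add('int', '1', '2'))
print(add('str', '1', '2'))
```

With an explicit table, the user can only pick from conversions you have chosen to support.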

The decisions you make regarding your DSL’s syntax and how it functions depend on your application and what your users need to accomplish. What we’ve seen so far is, of course, a simple example, but the dynamic nature of Python is powerful.

In other words, Python’s built-in features can take you a long way without having to write a lot of custom code. We’ll explore this more next in version 2 of our DSL.

You can find the final version of “dsl1.py” here on GitHub.

DSL Version 2: Parsing Arguments

Let’s move on to version 2 and make things more general and flexible for our users. Instead of hardcoding the arguments, we’ll let them pass any number of arguments. Let’s look at the new DSL source file:

# src2.dsl
module2 add_str foo bar baz debug=1 trace=0
module2 add_num 1 2 3 type=int
module2 add_num 1 2 3.0 type=float

We’ll add a function that splits the DSL arguments into an “args” list and a “kwargs” dictionary that we can pass to our module functions:

def get_args(dsl_args):
    """return args, kwargs"""
    args = []
    kwargs = {}
    for dsl_arg in dsl_args:
        if '=' in dsl_arg:
            k, v = dsl_arg.split('=', 1)
            kwargs[k] = v
        else:
            args.append(dsl_arg)
    return args, kwargs

This get_args function we just wrote can be used as follows:

args, kwargs = get_args(parts[2:])
getattr(mod, parts[1])(*args, **kwargs)

After calling get_args, we’ll have an arguments list and a keyword arguments dictionary. All that’s left to do is change our module function signatures to accept *args and **kwargs and update our code to use the new values.
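To make the split concrete, here is get_args applied to the first line of “src2.dsl”. The function is repeated so the snippet runs on its own:

```python
def get_args(dsl_args):
    """return args, kwargs"""
    args = []
    kwargs = {}
    for dsl_arg in dsl_args:
        if '=' in dsl_arg:
            # Split on the first '=' only, so values may contain '='
            k, v = dsl_arg.split('=', 1)
            kwargs[k] = v
        else:
            args.append(dsl_arg)
    return args, kwargs

tokens = 'module2 add_str foo bar baz debug=1 trace=0'.split()
args, kwargs = get_args(tokens[2:])
print(args)
print(kwargs)
```

Note that the keyword values are still strings at this point (“1” and “0”), so any type conversion is left to the called function.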

From within our module’s function, *args is a tuple and **kwargs is a dictionary. Here’s the new generalized code for “module2.py” that uses these new values:

# module2.py

def add_str(*args, **kwargs):
    kwargs_list = ['%s=%s' % (k, kwargs[k]) for k in kwargs]
    print(''.join(args), ','.join(kwargs_list))

def add_num(*args, **kwargs):
    t = globals()['__builtins__'][kwargs['type']]
    print(sum(map(t, args)))

In add_str, kwargs_list is a list that’s created using a list comprehension. If you haven’t seen this before, a list comprehension creates a list using an expressive and convenient syntax.

We simply loop over the keys in the dictionary (for k in kwargs) and create a string representing each key/value pair in the dictionary. We then print the result of joining the list of arguments with an empty string and the result of joining the list of keyword arguments with “,”:

foobarbaz debug=1,trace=0

For more on list comprehensions, see this tutorial: “Comprehending Python’s Comprehensions”.

With add_num, we decided to give the user a little more power. Since they need to add numbers of specific types (int or float), we need to handle the string conversion somehow.

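We call globals() to get a dictionary of the module’s global variables. Its __builtins__ entry in turn gives us access to the classes and constructors for “int” and “float”. Note that this lookup relies on a CPython implementation detail: in an imported module such as “module2.py”, __builtins__ is a dictionary, whereas in a script that is run directly it is the builtins module and the subscript would fail.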

This allows the user to specify the type conversion for the string values passed in our DSL source file “src2.dsl”, e.g. “type=int”. The type conversion is done in one step for all arguments in the call to map and its output is fed to sum.
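On Python 3, a variation that doesn’t depend on how __builtins__ is exposed is to look the type up on the builtins module. The following is a sketch of that alternative rather than the code from “module2.py”; ALLOWED_TYPES and resolve_type() are hypothetical names, and this version returns the sum instead of printing it:

```python
import builtins

# Hypothetical restriction so the DSL can't reach names such as "eval"
ALLOWED_TYPES = ('int', 'float')

def resolve_type(type_name):
    """Map a type name from the DSL to the built-in class."""
    if type_name not in ALLOWED_TYPES:
        raise ValueError('unsupported type: %r' % type_name)
    return getattr(builtins, type_name)

def add_num(*args, **kwargs):
    t = resolve_type(kwargs['type'])
    return sum(map(t, args))

print(add_num('1', '2', '3', type='int'))
print(add_num('1', '2', '3.0', type='float'))
```

Restricting the accepted names is a design choice: without it, “type=” could name any built-in callable.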

The map() function takes a function and an iterable and calls the function for each item in the iterable, capturing its output. Think of it as a way of transforming a sequence of values into new values. If it’s not clear and it’s too much on one line, break it into two lines for clarity:

converted_types = map(t, args)  # t is class "int" or "float"
print(sum(converted_types))

For the DSL source lines:

module2 add_num 1 2 3 type=int
module2 add_num 1 2 3.0 type=float

We get the output:

6
6.0

Users can now pass any number of arguments to our functions. What I think is particularly helpful is the use of **kwargs, the keyword arguments dictionary.

Users can call our functions with keywords from the DSL, passing options, just like they’d do if they were Python programmers or running programs from the command line. Keywords are also a form of micro-documentation and serve as reminders for what’s possible. For best results, try to pick succinct and descriptive names for your keyword arguments.

Once again you can find the final version of “dsl2.py” on GitHub.

DSL Version 3: Adding Documentation

Let’s add one more feature to help our users and create version 3. They need some documentation. They need a way to discover the functionality provided by the library of modules.

We’ll add this feature by adding a new command line option in “dsl3.py” and checking the modules and their functions for docstrings. Python docstrings are string literals that appear as the first statement of a module, function, class or method definition. The convention is to use triple-quoted strings like this:

def function_name():
    """A helpful docstring."""
    # Function body

When users pass “help=module3” on the command line to “dsl3.py”, the get_help function is called with “module3”:

def get_help(module_name):
    mod = importlib.import_module(module_name)
    print(mod.__doc__ or '')
    for name in dir(mod):
        if not name.startswith('_'):
            attr = getattr(mod, name)
            print(attr.__name__)
            print(attr.__doc__ or '', '\n')

In get_help, the module is dynamically imported using import_module like we’ve done before. Next we check for the presence of a docstring value using the attribute name __doc__ on the module.

Then we need to check all functions in the module for a docstring. To do this we’ll use the built-in function “dir”. “dir” returns a list of all attribute names for an object. So we can simply loop over all the attribute names in the module, filter out any private or special names that begin with “_” and print the function’s name and docstring if it exists.
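If a module also contains non-function attributes, such as constants, a possible refinement is to let the inspect module pick out only the functions. This is a sketch under that assumption: list_functions() is a hypothetical helper, and the standard library’s json module stands in for “module3”:

```python
import importlib
import inspect

def list_functions(module_name):
    """Return (name, docstring) pairs for a module's public functions."""
    mod = importlib.import_module(module_name)
    entries = []
    for name, func in inspect.getmembers(mod, inspect.isfunction):
        if name.startswith('_'):
            continue
        # inspect.getdoc() cleans up indentation and returns None
        # when there is no docstring
        entries.append((name, inspect.getdoc(func) or ''))
    return entries

for name, doc in list_functions('json'):
    print(name)
    # Show only the first line of each docstring to keep the output short
    print(doc.splitlines()[0] if doc else '', '\n')
```

Keep in mind that inspect.isfunction also matches functions a module has imported from elsewhere, so you may want additional filtering in a real library.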

The final version of “dsl3.py” is also available on GitHub.

Writing a DSL With Python – Review & Recap

Let’s recap what we’ve done in this tutorial. We’ve created a simple DSL that lets our users easily get some work done by calling into a library of functions. Luckily for us, we know Python. So we can use it to implement our DSL and make things easy for us too.

DSLs are powerful tools that are fun to think about and work on. They’re another way we can be creative and solve problems that make it easier for our users to get work done. I hope this tutorial has given you some new ideas and things to think about that you can apply and use in your own code.

From the user’s perspective, they’re just running “commands.” From our perspective, we get to leverage Python’s dynamic nature and its features and, in turn, reap the rewards of having all of the power of Python and its ecosystem available to us. For example, we can easily make changes to a library module or extend the library with new modules to expose new functionality using the standard library or 3rd party packages.

In this tutorial we looked at a few techniques:

importlib.import_module(): dynamically import a module at runtime
getattr(): get an object’s attribute
variable-length function arguments and keyword arguments
converting a string to a different type

Using just these techniques is quite powerful. I encourage you to take some time to think about how you might extend the code and functionality I’ve shown here. It could be as simple as adding a few lines of code using some of the features built into Python or writing more custom code using classes.

Using importlib

I’d like to mention one more thing regarding the use of “importlib”. Another application and example of using dynamic imports with “importlib” is implementing a plugin system. Plugin systems are very popular and widely used in all types of software.

There’s a reason for this. Plugin systems are a method of allowing extensibility and flexibility in an otherwise static application. If you’re interested in deepening your knowledge, see Dan’s excellent tutorial “Python Plugin System: Load Modules Dynamically With importlib”.

Error Checking

In this tutorial I’ve omitted error checking on purpose. One reason is to keep additional code out of the examples for clarity. Another is so that users and the Python programmers who maintain the library modules can see a full stack trace when there are errors.

This may or may not be the right behavior for your application. Think about what makes the most sense for your users and handle errors appropriately, especially for common error cases.
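As one possible starting point, the two most likely user mistakes (a misspelled module or function name) could be reported with a short message. This is a sketch, not part of the tutorial’s code; run_command() is a hypothetical helper and the message wording is arbitrary:

```python
import importlib

def run_command(module_name, function_name, *args, **kwargs):
    """Run one DSL command; return an error message, or None on success."""
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return 'unknown module: %s' % module_name
    func = getattr(mod, function_name, None)
    if not callable(func):
        return 'unknown command: %s %s' % (module_name, function_name)
    # Errors raised inside the library function still propagate with a
    # full stack trace, which helps the people maintaining the modules
    func(*args, **kwargs)
    return None

print(run_command('no_such_module_xyz', 'add'))
print(run_command('math', 'no_such_function'))
```

Here only the lookup errors are translated into messages; everything else is deliberately left alone.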

Security Considerations

A cautionary note on security: please consider and be aware that the dynamic nature of importing and running code may have security implications depending on your application and environment. Be sure that only authorized users have access to your source and module directories. For example, unauthorized write access to the “modules” directory will allow users to run arbitrary code.
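One additional safeguard you might consider is checking the module name against an explicit list before importing it. The sketch below is only an illustration of that idea: ALLOWED_MODULES and import_allowed() are hypothetical names, and it doesn’t replace proper file permissions:

```python
import importlib

# Hypothetical list of the module names the DSL may load
ALLOWED_MODULES = frozenset(['module1', 'module2', 'module3'])

def import_allowed(module_name, allowed=ALLOWED_MODULES):
    """Import module_name only if it appears in the allowed set."""
    if module_name not in allowed:
        raise ValueError('module not allowed: %r' % module_name)
    return importlib.import_module(module_name)

try:
    import_allowed('os')
except ValueError as exc:
    print(exc)
```

This prevents a DSL source file from reaching arbitrary modules that happen to be importable, such as those in the standard library.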

Python DSLs: Next Steps

Where do we go from here? What’s next? You may be thinking, “Well, this is nice and all, but I need more cowbell! I need to create a real DSL with real syntax and keywords.”

A good next step would be to look at Python parsing libraries. There are many! And their functionality, ease-of-use and documentation vary widely.

One that I’ve used in the past and had success with is the pyparsing module available on PyPI.
For a comprehensive survey, see “Parsing In Python: Tools And Libraries”

If you’d like to use the code used in this tutorial for your own experiments, the full source code is available on GitHub.

Getting a Job as a Self-Taught Python Developer

Tue, 10 Oct 2017 00:00:00 GMT

Getting a Job as a Self-Taught Python Developer

Do you need a university degree to get a coder job? Is a generic Computer Science degree best or are there more specific programs?

I got this email with Python career questions from newsletter reader Brad:

First, with regards to your Python Tricks book, I thought it was well-written and well-priced. I got good use out of, I’d say, 4 or 5 sections.

I’ve been writing in Python for a little under a year now. I’m entirely self-taught and had 0 programming experience prior. I’ve picked it up pretty quickly though just by devouring any book I can get my hands on. (McKinney, Hilpisch, Shaw, Sargent/Stachurski, yours, etc.)

Here is my question:

If I’m thinking about making Python the core of my career/job rather than just a smaller part of it, is formally going back to school necessary in your opinion?

If so, where do I start looking—at general comp sci degrees or are there more specific programs out there? How many self-taught guys do you know who have done very well for themselves?

Alright, I counted at least three questions in there 🙂

Let’s tackle them one by one. I’ll get on the “is formally going back to school necessary to get a coder job” question first:

Getting a formal computer science degree is the “classical” option (it’s the path that I went down.) And I think it’s a thorough and helpful option if you love doing a deep dive on CompSci theory.

I would not do this and get a CS degree purely for career options, however. Do it if you love and enjoy computer science and want to focus a few years on building your skills with a solid theoretical foundation. Don’t do it if your biggest goal is to “get a job” as a dev—

Here’s the reason why:

In my experience most schools don’t teach very many practical skills or help you build up a portfolio as part of their CS programs. So that’s something you’d have to figure out on your own and do it on the side. (Brad sounds really proactive so this might not be a problem.) Also, getting a formal degree can be quite expensive—and, as I said, it’s probably not the fastest route to “employability.”

Let’s talk about the alternatives to general Computer Science degrees that Brad asked about in his email:

If you don’t want to go down the formal education route and your main goal is to get a coder job, another option would be joining a development bootcamp.

That’s a practical, hands-on experience lasting several weeks (and up to around 3 months) where you meet, code, and learn with peers and mentors. The greatest benefit of doing a bootcamp is that you’ll end up with some example projects and code in your portfolio that you can show in an interview.

You know, for employers the biggest challenge in hiring junior/entry-level developers is that there’s little or no data about their past performance. So if someone is still early in their career and looking for their first job, it helps a lot if they can share some example code (on their GitHub profile, etc.).

These programs can work well for someone who’s committed. I’ve worked with people who had entered the dev industry that way and who are now well on their path towards building a programming career.

So, attending a dev bootcamp might be an option worth exploring for you. It’s also a smaller commitment than a CS degree from a time and money perspective. Plus, you can pair it with online training classes to get up to speed on the theoretical fundamentals and to ensure you keep improving after the bootcamp is over.

But just to be clear:

A 3-month coding bootcamp is never going to replace the breadth and depth of a 4-year bachelor’s Computer Science program. There’s a lot of material to cover and it takes time and long-term effort to absorb it all. But if your goal is to get a paid job as a coder as quickly as possible, a bootcamp can be a valid option.

Something else you want to keep in mind is that it can be challenging to find a high-quality Python bootcamp with a good curriculum and engaging teachers—especially if you live outside the United States.

[Got another Python career question? I’m covering more of them in my “Python Q&A” videos on my YouTube channel → Click here to check out the full list of episodes.]

How to Structure Your Python Programs

Tue, 03 Oct 2017 00:00:00 GMT

How to Structure Your Python Programs

Learn a simple trick for keeping your Python code organized and maintainable—even as a project grows over time.

How should you structure your Python programs?

Is there a recommended way to bring “structure to the chaos?”

If you’re writing anything that goes beyond a few lines of Python code, how should you lay out your functions and other building blocks?

Interesting thought on that topic from a newsletter member:

I have always felt that programming is telling a computer a story.

If your story is good, the computer will execute it efficiently, if the story sucks, the execution sucks.

I write code like I would write a technical paper, it has direction and flow, and you can tell when you have reached the conclusion of the story.

Cute! And true:

I’ve had good results with this “narrative” approach. It works especially well for single-file automation or data crunching scripts. And it helps you keep your code organized and maintainable, even as a project grows.

Let’s take a look at how this would work in practice. We’ll first lay out the logical flow for an example program and then we’ll compare different ways to implement this narrative in Python.

Breaking Down the “Program Narrative”

Imagine the following high-level logical flow for a simple report generator program:

Read input data
Perform calculations
Write report

Notice how each stage (after the first one) depends on some byproduct or output of its predecessor:

Read input data
Perform calculations (based on input data)
Write report (based on calculated report data)

The way I see it, you have two choices here: You can either implement this logical flow from the top down or from the bottom up.

“Top-Down” vs “Bottom-Up” Code Layout

If you write your program bottom-up, your function layout will match the logic flow: it’ll go from the fully independent building blocks to the ones that depend on their results.

Here’s a sketch for a “bottom-up” implementation:

def read_input_file(filename):
    pass

def generate_report(data):
    pass

def write_report(report):
    pass

data = read_input_file('data.csv')
report = generate_report(data)
write_report(report)

This structure “makes sense” in an intuitive way, doesn’t it?

We first need to read the input file before we can generate a report, and we need to generate the report before we can write it out to disk.

This logical structure is reflected in the program’s layout.

[Or, in scary-sounding Computer Science terms: This is basically a topological sorting of the dependency graph.]
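To make the topological-sorting remark concrete, here is a minimal sketch using the standard library’s graphlib module (available in Python 3.9 and newer). The dependencies dictionary is my own illustrative encoding of the three steps and is not part of the original program:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each function name maps to the set of
# functions whose output it needs before it can run.
dependencies = {
    "read_input_file": set(),
    "generate_report": {"read_input_file"},
    "write_report": {"generate_report"},
}

# static_order() yields every node only after all of its dependencies.
bottom_up_order = list(TopologicalSorter(dependencies).static_order())
print(bottom_up_order)
```

For this simple chain the resulting order is the same as the order of the function definitions in the bottom-up sketch.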

Let’s take a look at the “top-down” implementation:

For a “top-down” approach you’d flip the same structure on its head and start with the highest-level building block first, fleshing out the details later.

This results in the following program sketch:

def main():
    data = read_input_file('data.csv')
    report = generate_report(data)
    write_report(report)

def write_report(report):
    pass

def generate_report(data):
    pass

def read_input_file(filename):
    pass

# Application entry point -> call main()
main()

See how I started with the high-level, “most dependent” functionality first this time around?

The “main()” function at the top states clearly what this program is going to do—without having defined yet how exactly it will achieve the desired result.
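The top-down layout works because Python looks up function names when main() is called, not when it is defined. Here is a minimal runnable sketch of that idea. The function bodies are placeholders I made up for illustration, and the `if __name__ == '__main__':` guard is a common convention rather than something the layout requires:

```python
def main():
    # These names are resolved when main() runs, not when it is defined,
    # so the helpers may be defined further down in the file.
    data = read_input_file('data.csv')
    report = generate_report(data)
    return write_report(report)

def write_report(report):
    # Placeholder: return the text instead of writing it to disk.
    return f"REPORT: {report}"

def generate_report(data):
    # Placeholder calculation: sum the input values.
    return sum(data)

def read_input_file(filename):
    # Placeholder: pretend the file contained these numbers.
    return [1, 2, 3]

# main() is only called after every helper has been defined.
if __name__ == '__main__':
    print(main())
```

Moving the final call above the helper definitions would raise a NameError, which is why the entry point sits at the very bottom of the file.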

Which Approach Is Better: “Top-Down” or “Bottom-Up?”

I don’t think there’s much of a practical difference between them, to be honest.

The important thing for me is that they both encode a logical narrative—they’re both “telling the computer a story” and they have “direction and flow.”

This is the key insight for me.

The worst thing one can do is to deliberately obfuscate this logical structure, thereby killing the narrative:

def write_report(report):
    pass

def read_input_file(filename):
    pass

def generate_report(data):
    pass

(Yuck!)

Now, obviously I’m using a small “toy” example here—

But imagine what happens with programs that consist of 10, 100, 1000 steps in their “narrative” and they’re organized incoherently?

Maybe it’s my German need for order and stability speaking—but in my experience the result is usually utter chaos and madness:

“If the story sucks, the execution sucks”

The more you practice this “narrative flow” mindset as a way of structuring your programs, the more natural it will feel and the more automatic it will become as a behavior while you’re coding.

If you’re looking for a way to practice this method, try to revisit some of your older Python code and rewrite/refactor it to follow the principles laid out in this article.

Of course, you can also extend this idea to other “building blocks” like classes and modules as well…but more on that at some other time.

Happy Pythoning!

I’m bad at math and I’ll never be a real programmer

Tue, 26 Sep 2017 00:00:00 GMT

I’m bad at math and I’ll never be a real programmer

Do you have to be good at math to be a good programmer? Should you pursue a computer science career if you’re bad at math?

There’s a phase in the life of every coder I call “The Valley of Despair.”

It looks something like this:

It’s 3am and my eyes hurt. I fought my way through a stack of documentation and books—but when I think about writing a simple program my chest tenses up.

After four hours of reading the code and the documentation for the “urllib” module, fetching a URL, parsing the response, and printing some headers to the terminal still feels about as natural as climbing Everest without oxygen.

There’s all this knowledge crammed into my head and for once it’s time to spread my wings and program something useful, some small thing that solves a problem in the real-world… And yet, every time I step close to the edge I recoil:

“I’m not made out to grasp this stuff.”

“I’m bad at math and I’ll never be a real programmer.”

“Everybody thinks I’m a fool for trying to learn this in my spare time and having nothing to show for.”

It’s a catch 22:

If you can’t write your own programs successfully, you can’t build your confidence. And if you don’t have the confidence, you can’t write your own programs.

I think almost everyone has been through some version of this.

I’ve certainly experienced it. And it got so bad that I almost psyched myself out of applying to university for a Computer Science degree because I felt I was inadequate, that I couldn’t do it.

So, one night I decided to work through the weekend and to give myself a challenge to determine my fate:

If I could sit down with an article about the Minimax algorithm and write a Java game “AI” that plays Tic Tac Toe, then I’d know I have what it takes and I’d apply to university.

And if I couldn’t write this program, I’d forget about my dream and would pick a different career…

Now, how did this experiment go?

Well, let’s leave it at this: Monday morning I emerged with bloodshot eyes and less confident about my programming skills than ever before. But I decided I had written something workable and that I might as well apply to university and try my luck, and the rest is history.

Just to be clear, I don’t necessarily recommend this as a “silver bullet” technique you should use in your own life.

But what it did for me (besides giving me a terribly stressful weekend) was that it taught me a valuable lesson about pain tolerance and persistence:

If you want to learn a difficult skill like programming, it *will be* a series of “stuff’s too hard, smack head against wall” moments—interspersed with the occasional intellectual rapture.

There’s ALWAYS a new challenge in this industry and the feelings of frustration and having to stretch yourself will never fully go away. The only way I found to deal with this pressure is to embrace it as a fact of life.

So, if you’re going through “The Valley” right now, realize this:

Literally hundreds of thousands of coders and want-to-be coders are going through the same experience right now. Millions of others have experienced it before you, and many more will live through it in the future.

You’re not walking alone.

It takes courage to push through the frustrations and to make it to the other side.

And you’ll likely arrive there with second-degree burns and a lot of sand in your underwear—but if you dream of becoming a programmer, it’s the only way.

Keep going forward, and don’t let up.

I know you can make it.

P.S. A while ago I was invited as a guest on a Portuguese software development podcast and had a chance to discuss this topic some more. Click here to listen to the show (episode is in English).

Contributing to Python Open-Source Projects

Tue, 19 Sep 2017 00:00:00 GMT

Contributing to Python Open-Source Projects

How can you become a contributor on popular, “high-profile” Python open-source projects like Django, Requests, and so on?

Contributing to open-source projects is a great way to build your programming skills, take part in the community, and to make a real impact with your code…

It can also help you get a job as a professional Python developer, but becoming a contributor in the first place—that’s often tough.

So, let’s talk about this question I got from newsletter member Sudhanshu the other day:

Hi Dan,

I am student from India, I don’t really know whether this is your field or not but I have been doing Django development for 5 to 6 months.

I have made few projects on REST APIs, websites, etc. Then I decided to contribute in Django open source projects, particularly those by the Django organization and Mozilla.

What should I do at this point? How can I improve my level of Python knowledge so that I can contribute to these projects?

It sounds like Sudhanshu is in a good spot already.

I love the fact that he’s been working on his own side-projects to build up a portfolio—that’ll be a great asset when he goes job hunting.

If you’re in Sudhanshu’s shoes right now, here’s what I’d focus on next:

Try to strike up some personal connections with people working on those “high-profile” Python projects you want to contribute to.

See if you can make contact somehow—are they on Twitter? Can you comment or ask a question on a GitHub issue? Maybe you can even cold-email them…

Little by little, you’ll be able to build relationships with some of them. Building trust takes a lot of time and dedication, but eventually the timing will be right to offer your help:

Just ask them if there’s something small you could contribute to, like cleaning up the documentation, or fixing typos—simple things like that.

Open-source maintainers usually appreciate it when others help improve the documentation of a project. So that’s often a good way for you to get the foot in the door, metaphorically speaking.

What I want to say is this:

Getting your contributions accepted comes down much more to having built trust with the right people, rather than “throwing a bunch of code over the wall” and creating random pull-requests.

If you’re interested in some more thoughts on this topic, check out the YouTube video I recorded. It contains additional tips and tactics that will help you break into the open-source world.

Good luck on your Python open-source journey and…Happy Pythoning!

Iterator Chains as Pythonic Data Processing Pipelines

Tue, 12 Sep 2017 00:00:00 GMT

Iterator Chains as Pythonic Data Processing Pipelines

Here’s another great feature of iterators in Python: By chaining together multiple iterators you can write highly efficient data processing “pipelines.”

If you take advantage of Python’s generator functions and generator expressions, you’ll be building concise and powerful iterator chains in no time.

In this tutorial you’ll find out what this technique looks like in practice and how you can use it in your own programs.

The first time I saw this pattern in action in a PyCon presentation by David Beazley, it simply blew my mind.

But first things first—let’s do a quick recap:

Generators and generator expressions are syntactic sugar for writing iterators in Python. They abstract away much of the boilerplate code needed when writing class-based iterators.

While a regular function produces a single return value, generators produce a sequence of results. You could say they generate a stream of values over the course of their lifetime.

For example, I can define the following generator that produces the series of integer values from one to eight by keeping a running counter and yielding a new value every time next() gets called on it:

def integers():
    for i in range(1, 9):
        yield i

You can confirm this behaviour by running the following code in a Python REPL:

>>> chain = integers()
>>> list(chain)
[1, 2, 3, 4, 5, 6, 7, 8]
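To see the running counter at work, here is a small sketch that steps through the same generator by hand with next(). The variable names are only for illustration:

```python
def integers():
    for i in range(1, 9):
        yield i

gen = integers()
first = next(gen)    # runs the body up to the first yield
second = next(gen)   # resumes after the yield and continues the loop
rest = list(gen)     # drains whatever values are left

# Once the loop has finished, further next() calls raise StopIteration.
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, rest, exhausted)
```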

So far, so not-very-interesting. But we’ll quickly change this now. You see, generators can be “connected” to each other in order to build efficient data processing algorithms that work like a pipeline.

Making Generator “Pipelines”

You can take the “stream” of values coming out of the integers() generator and feed them into another generator again. For example, one that takes each number, squares it, and then passes it on:

def squared(seq):
    for i in seq:
        yield i * i

This is what our “data pipeline” or “chain of generators” would do now:

>>> chain = squared(integers())
>>> list(chain)
[1, 4, 9, 16, 25, 36, 49, 64]

And we can keep on adding new building blocks to this pipeline. Data flows in one direction only, and each processing step is shielded from the others via a well-defined interface.

This is similar to how pipelines work in Unix. We chain together a sequence of processes so that the output of each process feeds directly as input to the next one.

Building Longer Generator Chains

Why don’t we add another step to our pipeline that negates each value and then passes it on to the next processing step in the chain:

def negated(seq):
    for i in seq:
        yield -i

If we rebuild our chain of generators and add negated at the end, this is the output we get now:

>>> chain = negated(squared(integers()))
>>> list(chain)
[-1, -4, -9, -16, -25, -36, -49, -64]

My favorite thing about chaining generators is that the data processing happens one element at a time. There’s no buffering between the processing steps in the chain:

The integers generator yields a single value, let’s say 3.
This “activates” the squared generator, which processes the value and passes it on to the next stage as 3 × 3 = 9
The square number yielded by the squared generator gets fed immediately into the negated generator, which modifies it to -9 and yields it again.

You could keep extending this chain of generators to build out a processing pipeline with many steps. It would still perform efficiently and could easily be modified because each step in the chain is an individual generator function.
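The element-by-element behavior can be made visible with a little tracing. The sketch below is my own instrumented variant of the three generators (shortened to three values); each stage records what it yields in a shared log list, which is an assumption added purely for demonstration:

```python
log = []

def integers():
    for i in range(1, 4):
        log.append(f"integers -> {i}")
        yield i

def squared(seq):
    for i in seq:
        log.append(f"squared -> {i * i}")
        yield i * i

def negated(seq):
    for i in seq:
        log.append(f"negated -> {-i}")
        yield -i

result = list(negated(squared(integers())))

# The log shows the stages interleaving: each value travels through the
# whole chain before the next value is produced.
for entry in log:
    print(entry)
```

If the stages buffered their input, all the "integers" entries would appear first. Instead, the log alternates between the three stages for every single value.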

Chained Generator Expressions

Each individual generator function in this processing pipeline is quite concise. With a little trick, we can shrink down the definition of this pipeline even more, without sacrificing much readability:

integers = range(8)
squared = (i * i for i in integers)
negated = (-i for i in squared)

Notice how I’ve replaced each processing step in the chain with a generator expression built on the output of the previous step. This code has the same structure as the chain of generators we built throughout this tutorial, except that range(8) starts counting at 0 instead of 1:

>>> negated
<generator object <genexpr> at 0x1098bcb48>
>>> list(negated)
[0, -1, -4, -9, -16, -25, -36, -49]

The only downside to using generator expressions is that they can’t be configured with function arguments, and you can’t reuse the same generator expression multiple times in the same processing pipeline.

But of course, you could mix-and-match generator expressions and regular generators freely in building these pipelines. This will help improve readability with complex pipelines.
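One way to work around the missing function arguments is to wrap a generator expression in a small function. The following is a sketch under that assumption; the multiplied() helper is hypothetical and not part of the original tutorial code:

```python
def integers():
    for i in range(1, 9):
        yield i

def multiplied(seq, factor):
    # Hypothetical helper: wrapping the generator expression in a
    # function makes it configurable, and every call returns a fresh
    # generator object.
    return (i * factor for i in seq)

# Mix a regular generator function with two configured expressions.
chain = multiplied(multiplied(integers(), 2), -1)
result = list(chain)
print(result)
```

Because multiplied() builds a new generator expression on each call, the same step can appear more than once in a pipeline with different settings.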

Chained Iterators in Python – Key Takeaways

In this tutorial you saw how chaining together multiple iterators lets you write highly efficient data processing “pipelines.” This is another great feature of iterators in Python:

Generators can be chained together to form highly efficient and maintainable data processing pipelines.
Chained generators process each element going through the chain individually.
Generator expressions can be used to write concise pipeline definitions, but this can impact readability.

Extending Python With C Libraries and the “ctypes” Module

Tue, 05 Sep 2017 00:00:00 GMT

Extending Python With C Libraries and the “ctypes” Module

An end-to-end tutorial of how to extend your Python programs with libraries written in C, using the built-in “ctypes” module.

The built-in ctypes module is a powerful feature in Python, allowing you to use existing libraries in other languages by writing simple wrappers in Python itself.

Unfortunately it can be a bit tricky to use. In this article we’ll explore some of the basics of ctypes. We’ll cover:

Loading C libraries
Calling a simple C function
Passing mutable and immutable strings
Managing memory

Let’s start by taking a look at the simple C library we will be using and how to build it, and then jump into loading a C library and calling functions in it.

A Simple C Library That Can Be Used From Python

All of the code to build and test the examples discussed here (as well as the Markdown for this article) is committed to my GitHub repository.

I’ll walk through a little bit about the C library before we get into ctypes.

The C code we’ll use in this tutorial is designed to be as simple as possible while demonstrating the concepts we’re covering. It’s more of a “toy example” and not intended to be useful on its own. Here are the functions we’ll be using:

int simple_function(void) {
    static int counter = 0;
    counter++;
    return counter;
}

The simple_function function simply returns counting numbers. Each time it is called it increments counter and returns that value.

void add_one_to_string(char *input) {
    int ii = 0;
    for (; ii < strlen(input); ii++) {
        input[ii]++;
    }
}

The add_one_to_string function adds one to each character in a char array that is passed in. We’ll use this to talk about Python’s immutable strings and how to work around them when we need to.

char * alloc_C_string(void) {
    char* phrase = strdup("I was written in C");
    printf("C just allocated %p(%ld):  %s\n",
           phrase, (long int)phrase, phrase);
    return phrase;
}

void free_C_string(char* ptr) {
    printf("About to free %p(%ld):  %s\n",
           ptr, (long int)ptr, ptr);
    free(ptr);
}

This pair of functions allocates and frees a string in the C context. This will provide the framework for talking about memory management in ctypes.

Finally, we need a way to build this source file into a library. While there are many build tools, I prefer to use make for projects like this because of its low overhead and ubiquity. Make is available on all Linux-like systems.

Here’s a snippet from the Makefile which builds the C library into a .so file:

clib1.so: clib1.o
    gcc -shared -o libclib1.so clib1.o

clib1.o: clib1.c
    gcc -c -Wall -Werror -fpic clib1.c

The Makefile in the repo is set up to completely build and run the demo from scratch; you only need to run the following command in your shell:

$ make

Loading a C Library With Python’s “ctypes” Module

Ctypes allows you to load a shared library (“DLL” on Windows) and access functions directly from it, provided you take care to “marshal” the data properly.

The most basic form of this is:

import ctypes

# Load the shared library into ctypes.
libc = ctypes.CDLL("./libclib1.so")

Note that this assumes that your shared library is in the same directory as your script and that you are calling the script from that directory. There are a lot of OS-specific details around library search paths which are beyond the scope of this article, but if you can package the .py file alongside the shared library, you can use something like this:

import ctypes
import os

libname = os.path.abspath(
    os.path.join(os.path.dirname(__file__), "libclib1.so"))

libc = ctypes.CDLL(libname)

This will allow you to call the script from any directory.

Once you have loaded the library, it is stored in a Python object which has methods for each exported function.

Calling Simple Functions with ctypes

The great thing about ctypes is that it makes the simple things quite simple. Simply calling a function with no parameters is trivial. Once you have loaded the library, the function is just a method of the library object.

import ctypes

# Load the shared library into ctypes.
libc = ctypes.CDLL("./libclib1.so")

# Call the C function from the library
counter = libc.simple_function()

You’ll remember that the C function we’re calling returns counting numbers as int objects. Again, ctypes makes easy things easy—passing ints around works seamlessly and does pretty much what you expect it to.
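If you don’t have libclib1.so built yet, you can try the same calling pattern against the system C library. This is only a sketch: it assumes a POSIX system, and it calls the standard abs() function as a stand-in for simple_function:

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. find_library("c") locates the C standard
# library; if it returns None, CDLL(None) falls back to the symbols
# already loaded into the current process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# abs() takes and returns a C int, which matches the ctypes defaults,
# so no extra type information is needed for this call.
result = libc.abs(-42)
print(result)
```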

Dealing with Mutable and Immutable Strings as ctypes Parameters

While basic integer types generally get marshalled by ctypes trivially (floats need an explicit ctypes type such as ctypes.c_double), strings pose a problem. In Python, strings are immutable, meaning they cannot change. This produces some odd behavior when passing strings in ctypes.

For this example we’ll use the add_one_to_string function shown in the C library above. If we call this passing in a Python string it runs, but does not modify the string as we might expect. This Python code:

print("Calling C function which tries to modify Python string")
original_string = "starting string"
print("Before:", original_string)

# This call does not change value, even though it tries!
libc.add_one_to_string(original_string)

print("After: ", original_string)

Results in this output:

Calling C function which tries to modify Python string
Before: starting string
After:  starting string

After some testing, I proved to myself that the original_string is not available in the C function at all when doing this. The original string was unchanged, mainly because the C function modified some other memory, not the string. So, not only does the C function not do what you want, but it also modifies memory that it should not, leading to potential memory corruption issues.

If we want the C function to have access to the string we need to do a little marshalling work up front. Fortunately, ctypes makes this fairly easy, too.

We need to convert the original string to bytes using str.encode, and then pass this to ctypes.create_string_buffer. String buffers are mutable, and they are passed to C as a char * as you would expect.

# The ctypes string buffer IS mutable, however.
print("Calling C function with mutable buffer this time")

# Need to encode the original to get bytes for string_buffer
mutable_string = ctypes.create_string_buffer(str.encode(original_string))

print("Before:", mutable_string.value)
libc.add_one_to_string(mutable_string)  # Works!
print("After: ", mutable_string.value)

Running this code prints:

Calling C function with mutable buffer this time
Before: b'starting string'
After:  b'tubsujoh!tusjoh'

Note that the string_buffer is printed as a byte array on the Python side.
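The same buffer technique can be tried without building the example library. The sketch below assumes a POSIX system and substitutes the standard memset() function for add_one_to_string. It overwrites the first eight bytes of the buffer while the original Python string stays untouched:

```python
import ctypes
import ctypes.util

# Assumption: POSIX system; see the fallback behavior of CDLL(None).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# void *memset(void *s, int c, size_t n)
libc.memset.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_size_t]
libc.memset.restype = ctypes.c_void_p

original_string = "starting string"
mutable_string = ctypes.create_string_buffer(original_string.encode())

# C writes directly into the memory owned by the ctypes buffer.
libc.memset(mutable_string, ord("X"), 8)

print(mutable_string.value)
print(original_string)
```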

Specifying Function Signatures in ctypes

Before we get to the final example for this tutorial, we need to take a brief aside and talk about how ctypes passes parameters and returns values. By default ctypes assumes that a function returns a C int, but we can specify the return type if needed.

We can do a similar specification of the function parameters. Ctypes will figure out the type of the pointer and create a default mapping to a Python type, but that is not always what you want. Also, providing a function signature allows Python to check that you are passing in the correct parameters when you call a C function; otherwise crazy things can happen.

Because each of the functions in the loaded library is actually a Python object which has its own properties, specifying the return value is quite simple. To specify the return type of a function, you get the function object and set the restype property like this:

alloc_func = libc.alloc_C_string
alloc_func.restype = ctypes.POINTER(ctypes.c_char)

Similarly, you can specify the types of any arguments passed in to the C function by setting the argtypes property to a list of types:

free_func = libc.free_C_string
free_func.argtypes = [ctypes.POINTER(ctypes.c_char), ]

I’ve found several different clever methods in my studies for how to simplify specifying these, but in the end they all come down to these properties.
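Here is a small sketch of what a declared signature buys you. It assumes a POSIX system and uses the standard strlen() function instead of the tutorial library. With argtypes set, ctypes rejects an argument of the wrong type before the C function is ever called:

```python
import ctypes
import ctypes.util

# Assumption: POSIX system with a loadable C standard library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# size_t strlen(const char *s)
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t

length = strlen(b"I was written in C")

# A float cannot be converted to a char *, so ctypes raises
# ArgumentError instead of passing garbage to the C function.
try:
    strlen(3.14)
    rejected = False
except ctypes.ArgumentError:
    rejected = True

print(length, rejected)
```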

Memory Management Basics in ctypes

One of the great features of moving from C to Python is that you no longer need to spend time doing manual memory management. The golden rule when doing ctypes, or any cross-language marshalling, is that the language that allocates the memory also needs to free the memory.

In the example above this worked quite well as Python allocated the string buffers we were passing around so it could then free that memory when it was no longer needed.

Frequently, however, the need arises to allocate memory in C and then pass it to Python for some manipulation. This works, but you need to take a few more steps to ensure you can pass the memory pointer back to C so it can free it when we’re done.

For this example, I’ll use these two C functions, alloc_C_string and free_C_string. In the example code both functions print out the memory pointer they are manipulating to make it clear what is happening.

As mentioned above, we need to be able to keep the actual pointer to the memory that alloc_C_string allocated so that we can pass it back to free_C_string. To do this, we need to tell ctypes that alloc_C_string should return a ctypes.POINTER to a ctypes.c_char. We saw that earlier.

The ctypes.POINTER objects are not overly useful, but they can be converted to objects which are useful. Once we convert our pointer to a ctypes.c_char_p, we can access its value attribute to get the bytes in Python.

Putting that all together looks like this:

alloc_func = libc.alloc_C_string

# This is a ctypes.POINTER object which holds the address of the data
alloc_func.restype = ctypes.POINTER(ctypes.c_char)

print("Allocating and freeing memory in C")
c_string_address = alloc_func()

# Now we have the POINTER object.
# We should convert that to something we can use
# on the Python side
phrase = ctypes.c_char_p.from_buffer(c_string_address)

print("Bytes in Python {0}".format(phrase.value))

Once we’ve used the data we allocated in C, we need to free it. The process is quite similar, specifying the argtypes attribute instead of restype:

free_func = libc.free_C_string
free_func.argtypes = [ctypes.POINTER(ctypes.c_char), ]
free_func(c_string_address)
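The allocate-in-C, free-in-C pattern can also be sketched with nothing but the system C library. Note that this is a variation, not the code from the repository: it assumes a POSIX system, uses the standard strdup() and free() functions in place of alloc_C_string and free_C_string, and keeps the raw address as a ctypes.c_void_p instead of a ctypes.POINTER:

```python
import ctypes
import ctypes.util

# Assumption: POSIX system with a loadable C standard library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# char *strdup(const char *s): C allocates the copy with malloc().
libc.strdup.argtypes = [ctypes.c_char_p]
libc.strdup.restype = ctypes.c_void_p   # keep the raw address

# void free(void *ptr)
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None

address = libc.strdup(b"I was written in C")

# Copy the bytes into a Python object while the C memory is still valid.
phrase = ctypes.string_at(address)

# Hand the same address back to C so that C frees what C allocated.
libc.free(address)

print(phrase)
```

After the call to free(), the address must not be used again. The phrase variable stays valid because ctypes.string_at() made a copy on the Python side.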

Python’s “ctypes” Module – Conclusion

Python’s built-in ctypes feature allows you to interact with C code from Python quite easily, using a few basic rules to allow you to specify and call those functions. However, you must be careful about memory management and ownership.

If you’d like to see and play with the code I wrote while working on this, please visit my GitHub repository.

Also, be sure to check out part two of this tutorial where you’ll learn more about advanced features and patterns in using the ctypes library to interface Python with C code.

Generator Expressions in Python: An Introduction

Tue, 29 Aug 2017 00:00:00 GMT

Generator Expressions in Python: An Introduction

Generator expressions are a high-performance, memory-efficient generalization of list comprehensions and generators. In this tutorial you’ll learn how to use them from the ground up.

In one of my previous tutorials you saw how Python’s generator functions and the yield keyword provide syntactic sugar for writing class-based iterators more easily.

The generator expressions we’ll cover in this tutorial add another layer of syntactic sugar on top—they give you an even more effective shortcut for writing iterators:

With a simple and concise syntax that looks like a list comprehension, you’ll be able to define iterators in a single line of code.

Here’s an example:

iterator = ('Hello' for i in range(3))

Python Generator Expressions 101 – The Basics

When iterated over, the above generator expression yields the same sequence of values as the bounded_repeater generator function we implemented in my generators tutorial. Here it is again to refresh your memory:

def bounded_repeater(value, max_repeats):
    for i in range(max_repeats):
        yield value

iterator = bounded_repeater('Hello', 3)

Isn’t it amazing how a single-line generator expression now does a job that previously required a four-line generator function or a much longer class-based iterator?

But I’m getting ahead of myself. Let’s make sure our iterator defined with a generator expression actually works as expected:

>>> iterator = ('Hello' for i in range(3))
>>> for x in iterator:
...     print(x)
Hello
Hello
Hello

That looks pretty good to me! We seem to get the same results from our one-line generator expression that we got from the bounded_repeater generator function.

There’s one small caveat though:

Once a generator expression has been consumed, it can’t be restarted or reused. So in some cases there is an advantage to using generator functions or class-based iterators.
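
Here is a small sketch of that caveat in action. The make_greetings helper is a hypothetical name used only for this illustration. It shows one way to get a fresh iterator whenever you need to loop again.

```python
genexpr = ('Hello' for i in range(3))

first_pass = list(genexpr)   # consumes every value
second_pass = list(genexpr)  # the generator is already exhausted

print(first_pass)
print(second_pass)


def make_greetings():
    # Each call builds a brand-new generator object.
    return ('Hello' for i in range(3))


fresh_pass = list(make_greetings())
print(fresh_pass)
```

The second list() call returns an empty list rather than raising an error, which makes this kind of bug easy to miss.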

Generator Expressions vs List Comprehensions

As you can tell, generator expressions are somewhat similar to list comprehensions:

>>> listcomp = ['Hello' for i in range(3)]
>>> genexpr = ('Hello' for i in range(3))

Unlike list comprehensions, however, generator expressions don’t construct list objects. Instead, they generate values “just in time” like a class-based iterator or generator function would.

All you get by assigning a generator expression to a variable is an iterable “generator object”:

>>> listcomp
['Hello', 'Hello', 'Hello']

>>> genexpr
<generator object <genexpr> at 0x1036c3200>

To access the values produced by the generator expression, you need to call next() on it, just like you would with any other iterator:

>>> next(genexpr)
'Hello'
>>> next(genexpr)
'Hello'
>>> next(genexpr)
'Hello'
>>> next(genexpr)
StopIteration

Alternatively, you can also call the list() function on a generator expression to construct a list object holding all generated values:

>>> genexpr = ('Hello' for i in range(3))
>>> list(genexpr)
['Hello', 'Hello', 'Hello']

Of course, this was just a toy example to show how you can “convert” a generator expression (or any other iterator for that matter) into a list. If you need a list object right away, you’d normally just write a list comprehension from the get-go.

Let’s take a closer look at the syntactic structure of this simple generator expression. The pattern you should begin to see looks like this:

genexpr = (expression for item in collection)

The above generator expression “template” corresponds to the following generator function:

def generator():
    for item in collection:
        yield expression

Just like with list comprehensions, this gives you a “cookie-cutter pattern” you can apply to many generator functions in order to transform them into concise generator expressions.
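
To make the template concrete, here is a sketch that fills it in with made-up values. The collection contents and the item.upper() expression are assumptions chosen for the example.

```python
collection = ['spam', 'eggs', 'ham']

# Generator expression following (expression for item in collection)
genexpr = (item.upper() for item in collection)


# The corresponding generator function
def generator():
    for item in collection:
        yield item.upper()


from_expression = list(genexpr)
from_function = list(generator())

print(from_expression)
print(from_function)
```

Both versions produce the same values in the same order. The expression simply skips the def and yield scaffolding.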

⏰ Sidebar: Pythonic Syntactic Sugar

As I learned more about Python’s iterator protocol and the different ways to implement it in my own code, I realized that “syntactic sugar” was a recurring theme.

You see, class-based iterators and generator functions are two expressions of the same underlying design pattern.

Generator functions give you a shortcut for supporting the iterator protocol in your own code, and they avoid much of the verbosity of class-based iterators. With a little bit of specialized syntax, or syntactic sugar, they save you time and make your life as a developer easier.

This is a recurring theme in Python and in other programming languages. As more developers use a design pattern in their programs, there’s a growing incentive for the language creators to provide abstractions and implementation shortcuts for it.

That’s how programming languages evolve over time—and as developers, we reap the benefits. We get to work with more and more powerful building blocks, which reduces busywork and lets us achieve more in less time.

Filtering Values

There’s one more useful addition we can make to this template, and that’s element filtering with conditions. Here’s an example:

>>> even_squares = (x * x for x in range(10)
...                 if x % 2 == 0)

This generator yields the square numbers of all even integers from zero to nine. The filtering condition using the % (modulo) operator will reject any value not divisible by two:

>>> for x in even_squares:
...     print(x)
0
4
16
36
64

Let’s update our generator expression template. After adding element filtering via if-conditions, the template now looks like this:

genexpr = (expression for item in collection
           if condition)

And once again, this pattern corresponds to a relatively straightforward, but longer, generator function. Syntactic sugar at its best:

def generator():
    for item in collection:
        if condition:
            yield expression

In-line Generator Expressions

Because generator expressions are, well…expressions, you can use them in-line with other statements. For example, you can define an iterator and consume it right away with a for-loop:

for x in ('Bom dia' for i in range(3)):
    print(x)

There’s another syntactic trick you can use to make your generator expressions more beautiful. The parentheses surrounding a generator expression can be dropped if the generator expression is used as the single argument to a function:

>>> sum((x * 2 for x in range(10)))
90

# Versus:

>>> sum(x * 2 for x in range(10))
90

This allows you to write concise and performant code. Because generator expressions generate values “just in time” like a class-based iterator or a generator function would, they are very memory efficient.
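
You can get a rough feel for the memory difference with sys.getsizeof. Treat this as a sketch rather than a benchmark: getsizeof only reports the size of the container object itself (not the integers a list refers to), and the exact numbers vary between Python versions and platforms.

```python
import sys

squares_list = [x * x for x in range(100000)]
squares_gen = (x * x for x in range(100000))

list_size = sys.getsizeof(squares_list)  # grows with the number of items
gen_size = sys.getsizeof(squares_gen)    # stays small regardless of the range

print(list_size, gen_size)

# The generator still produces every value when it is consumed.
total = sum(squares_gen)
print(total == sum(squares_list))
```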

Too Much of a Good Thing…

Like list comprehensions, generator expressions allow for more complexity than what we’ve covered so far. Through nested for-loops and chained filtering clauses, they can cover a wider range of use cases:

(expr for x in xs if cond1
      for y in ys if cond2
      ...
      for z in zs if condN)

The above pattern translates to the following generator function logic:

for x in xs:
    if cond1:
        for y in ys:
            if cond2:
                ...
                    for z in zs:
                        if condN:
                            yield expr

And this is where I’d like to place a big caveat:

Please don’t write deeply nested generator expressions like that. They can be very difficult to maintain in the long run.

This is one of those “the dose makes the poison” situations where a beautiful and simple tool can be overused to create hard to read and difficult to debug programs.

Just like with list comprehensions, I personally try to stay away from any generator expression that includes more than two levels of nesting.

Generator expressions are a helpful and Pythonic tool in your toolbox, but that doesn’t mean they should be used for every single problem you’re facing. For complex iterators, it’s often better to write a generator function or even a class-based iterator.

If you need to use nested generators and complex filtering conditions, it’s usually better to factor out sub-generators (so you can name them) and then to chain them together again at the top level.
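
As a sketch of what factoring out sub-generators can look like, here is a small pipeline. The names evens, squares, and small_squares are invented for this example.

```python
numbers = range(20)

# Each step gets a descriptive name instead of living in one nested expression.
evens = (n for n in numbers if n % 2 == 0)
squares = (n * n for n in evens)
small_squares = (n for n in squares if n < 100)

# Nothing has been computed yet. Values flow through the chain
# one at a time once the last generator is consumed.
result = list(small_squares)
print(result)
```

Each stage stays lazy, so chaining them costs no extra memory compared to the single nested expression.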

If you’re on the fence, try out different implementations and then select the one that seems the most readable. Trust me, it’ll save you time in the long run.

Generator Expressions in Python – Summary

Generator expressions are similar to list comprehensions. However, they don’t construct list objects. Instead, generator expressions generate values “just in time” like a class-based iterator or generator function would.
Once a generator expression has been consumed, it can’t be restarted or reused.
Generator expressions are best for implementing simple “ad hoc” iterators. For complex iterators, it’s better to write a generator function or a class-based iterator.


What Are Python Generators?

Tue, 22 Aug 2017 00:00:00 GMT

What Are Python Generators?

Generators are a tricky subject in Python. With this tutorial you’ll make the leap from class-based iterators to using generator functions and the “yield” statement in no time.

If you’ve ever implemented a class-based iterator from scratch in Python, you know that this endeavour requires writing quite a bit of boilerplate code.

And yet, iterators are so useful in Python: They allow you to write pretty for-in loops and help you make your code more Pythonic and efficient.

As a (proud) “lazy” Python developer, I don’t like tedious and repetitive work. And so, I often found myself wondering:

If there only was a more convenient way to write these Python iterators in the first place…

Surprise, there is! Once again, Python helps us out with some syntactic sugar to make writing iterators easier.

In this tutorial you’ll see how to write Python iterators faster and with less code using generators and the yield keyword.

Ready? Let’s go!

Python Generators 101 – The Basics

Let’s start by looking again at the Repeater example that I previously used to introduce the idea of iterators. It implemented a class-based iterator cycling through an infinite sequence of values.

This is what the class looked like in its second (simplified) version:

class Repeater:
    def __init__(self, value):
        self.value = value

    def __iter__(self):
        return self

    def __next__(self):
        return self.value

If you’re thinking, “that’s quite a lot of code for such a simple iterator,” you’re absolutely right. Parts of this class seem rather formulaic, as if they would be written in exactly the same way from one class-based iterator to the next.

This is where Python’s generators enter the scene. If I rewrite this iterator class as a generator, it looks like this:

def repeater(value):
    while True:
        yield value

We just went from seven lines of code to three.

Not bad, eh? As you can see, generators look like regular functions but instead of using the return statement, they use yield to pass data back to the caller.

Will this new generator implementation still work the same way as our class-based iterator did? Let’s bust out the for-in loop test to find out:

>>> for x in repeater('Hi'):
...     print(x)
Hi
Hi
Hi
Hi
Hi
...

Yep! We’re still looping through our greetings forever. This much shorter generator implementation seems to perform the same way that the Repeater class did.

(Remember to hit Ctrl+C if you want out of the infinite loop in an interpreter session.)

Now, how do these generators work? They look like normal functions, but their behavior is quite different. For starters, calling a generator function doesn’t even run the function. It merely creates and returns a generator object:

>>> repeater('Hey')
<generator object repeater at 0x107bcdbf8>

The code in the generator function only executes when next() is called on the generator object:

>>> generator_obj = repeater('Hey')
>>> next(generator_obj)
'Hey'

If you read the code of the repeater function again, it looks like the yield keyword in there somehow stops this generator function in mid-execution and then resumes it at a later point in time:

def repeater(value):
    while True:
        yield value

And that’s quite a fitting mental model for what happens here. You see, when a return statement is invoked inside a function, it permanently passes control back to the caller of the function. When a yield is invoked, it also passes control back to the caller of the function—but it only does so temporarily.

Whereas a return statement disposes of a function’s local state, a yield statement suspends the function and retains its local state.

In practical terms, this means local variables and the execution state of the generator function are only stashed away temporarily and not thrown out completely.

Execution can be resumed at any time by calling next() on the generator:

>>> iterator = repeater('Hi')
>>> next(iterator)
'Hi'
>>> next(iterator)
'Hi'
>>> next(iterator)
'Hi'

This makes generators fully compatible with the iterator protocol. For this reason, I like to think of them primarily as syntactic sugar for implementing iterators.
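
The repeater example has no local state that changes, so here is a sketch with a hypothetical countdown generator that makes the "suspend and resume" behavior easier to see.

```python
def countdown(start):
    remaining = start
    while remaining > 0:
        yield remaining
        # Execution resumes here on the next call to next(),
        # with 'remaining' still holding its previous value.
        remaining -= 1


counter = countdown(3)
first = next(counter)
second = next(counter)
third = next(counter)

print(first, second, third)
```

If yield threw away local state the way return does, every call would start over at 3. Instead, each next() picks up right after the previous yield.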

You’ll find that for most types of iterators, writing a generator function will be easier and more readable than defining a long-winded class-based iterator.

Python Generators That Stop Generating

In this tutorial we started out by writing an infinite generator once again. By now you’re probably wondering how to write a generator that stops producing values after a while, instead of going on and on forever.

Remember, in our class-based iterator we were able to signal the end of iteration by manually raising a StopIteration exception. Because generators are fully compatible with class-based iterators, that’s still what happens behind the scenes.

Thankfully, as programmers we get to work with a nicer interface this time around. Generators stop generating values as soon as control flow returns from the generator function by any means other than a yield statement. This means you no longer have to worry about raising StopIteration at all!

Here’s an example:

def repeat_three_times(value):
    yield value
    yield value
    yield value

Notice how this generator function doesn’t include any kind of loop. In fact it’s dead simple and only consists of three yield statements. If a yield temporarily suspends execution of the function and passes back a value to the caller, what will happen when we reach the end of this generator?

Let’s find out:

>>> for x in repeat_three_times('Hey there'):
...     print(x)
Hey there
Hey there
Hey there

As you may have expected, this generator stopped producing new values after three iterations. We can assume that it did so by raising a StopIteration exception when execution reached the end of the function.

But to be sure, let’s confirm that with another experiment:

>>> iterator = repeat_three_times('Hey there')
>>> next(iterator)
'Hey there'
>>> next(iterator)
'Hey there'
>>> next(iterator)
'Hey there'
>>> next(iterator)
StopIteration
>>> next(iterator)
StopIteration

This iterator behaved just like we expected. As soon as we reach the end of the generator function, it keeps raising StopIteration to signal that it has no more values to provide.
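
If you drive a generator by hand like this in real code, you have to deal with StopIteration yourself. Here is a minimal sketch of two ways to do that, using an explicit try/except and the optional default argument of the built-in next() function. The collected and exhausted_default names are made up for the example.

```python
def repeat_three_times(value):
    yield value
    yield value
    yield value


iterator = repeat_three_times('Hey there')

collected = []
while True:
    try:
        collected.append(next(iterator))
    except StopIteration:
        # This is roughly what a for-in loop does for you behind the scenes.
        break

# next() also accepts a default that is returned instead of raising.
exhausted_default = next(iterator, 'no more values')

print(collected)
print(exhausted_default)
```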

Let’s come back to another example from my Python iterators tutorial. The BoundedRepeater class implemented an iterator that would only repeat a value a set number of times:

class BoundedRepeater:
    def __init__(self, value, max_repeats):
        self.value = value
        self.max_repeats = max_repeats
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.count >= self.max_repeats:
            raise StopIteration
        self.count += 1
        return self.value

Why don’t we try to re-implement this BoundedRepeater class as a generator function? Here’s my first take on it:

def bounded_repeater(value, max_repeats):
    count = 0
    while True:
        if count >= max_repeats:
            return
        count += 1
        yield value

I intentionally made the while loop in this function a little unwieldy. I wanted to demonstrate how invoking a return statement from a generator causes iteration to stop with a StopIteration exception. We’ll soon clean up and simplify this generator function some more, but first let’s try out what we’ve got so far:

>>> for x in bounded_repeater('Hi', 4):
...     print(x)
Hi
Hi
Hi
Hi

Great! Now we have a generator that stops producing values after a configurable number of repetitions. It uses the yield statement to pass back values until it finally hits the return statement and iteration stops.

Like I promised you, we can further simplify this generator. We’ll take advantage of the fact that Python adds an implicit return None statement to the end of every function. This is what our final implementation looks like:

def bounded_repeater(value, max_repeats):
    for i in range(max_repeats):
        yield value

Feel free to confirm that this simplified generator still works the same way. All things considered, we went from a 12-line iterator in the BoundedRepeater class to a three-line generator-based implementation providing the same functionality.
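
One quick way to run that check is to collect the output of both implementations and compare them. This sketch repeats the two definitions from above so it can run on its own.

```python
class BoundedRepeater:
    def __init__(self, value, max_repeats):
        self.value = value
        self.max_repeats = max_repeats
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.count >= self.max_repeats:
            raise StopIteration
        self.count += 1
        return self.value


def bounded_repeater(value, max_repeats):
    for i in range(max_repeats):
        yield value


from_class = list(BoundedRepeater('Hi', 4))
from_generator = list(bounded_repeater('Hi', 4))

print(from_class == from_generator)
```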

That’s a 75% reduction in the number of lines of code—not too shabby!

Generator functions are a great feature in Python, and you shouldn’t hesitate to use them in your own programs.

As you just saw, generators help you “abstract away” most of the boilerplate code otherwise needed when writing class-based iterators. Generators can make your life as a Pythonista much easier and allow you to write cleaner, shorter, and more maintainable iterators.

Python Generators – A Quick Summary

Generator functions are syntactic sugar for writing objects that support the iterator protocol. Generators abstract away much of the boilerplate code needed when writing class-based iterators.
The yield statement allows you to temporarily suspend execution of a generator function and to pass back values from it.
Generators start raising StopIteration exceptions after control flow leaves the generator function by any means other than a yield statement.


Unpacking Nested Data Structures in Python

Tue, 15 Aug 2017 00:00:00 GMT

Unpacking Nested Data Structures in Python

A tutorial on Python’s advanced data unpacking features: How to unpack data with the “=” operator and for-loops.

Have you ever seen Python’s enumerate function being used like this?

for (i, value) in enumerate(values):
   ...

In Python, you can unpack nested data structures in sophisticated ways, but the syntax might seem complicated: Why does the for statement have two variables in this example, and why are they written inside parentheses?

This article answers those questions and many more. I wrote it in two parts:

First, you’ll see how Python’s “=” assignment operator iterates over complex data structures. You’ll learn about the syntax of multiple assignments, recursive variable unpacking, and starred targets.
Second, you’ll discover how the for-statement unpacks data using the same rules as the = operator. Again, we’ll go over the syntax rules first and then dive into some hands-on examples.

Ready? Let’s start with a quick primer on the “BNF” syntax notation used in the Python language specification.

BNF Notation – A Primer for Pythonistas

This section is a bit technical, but it will help you understand the examples to come. The Python 2.7 Language Reference defines all the rules for the assignment statement using a modified form of Backus-Naur notation.

The Language Reference explains how to read BNF notation. In short:

symbol_name ::= starts the definition of a symbol
( ) is used to group symbols
* means appearing zero or more times
+ means appearing one or more times
(a|b) means either a or b
[ ] means optional
"text" means the literal text. For example, "," means a literal comma character.

Here is the complete grammar for the assignment statement in Python 2.7. It looks a little complicated because Python allows many different forms of assignment:

An assignment statement consists of

one or more (target_list "=") groups
followed by either an expression_list or a yield_expression

assignment_stmt ::= (target_list "=")+ (expression_list | yield_expression)

A target list consists of

a target
followed by zero or more ("," target) groups
followed by an optional trailing comma

target_list ::= target ("," target)* [","]

Finally, a target consists of any of the following

a variable name
a nested target list enclosed in ( ) or [ ]
a class or instance attribute
a subscripted list or dictionary
a list slice

target ::= identifier
           | "(" target_list ")"
           | "[" [target_list] "]"
           | attributeref
           | subscription
           | slicing

As you’ll see, this syntax allows you to take some clever shortcuts in your code. Let’s take a look at them now:

#1 – Unpacking and the “=” Assignment Operator

First, you’ll see how Python’s “=” assignment operator iterates over complex data structures. You’ll learn about the syntax of multiple assignments, recursive variable unpacking, and starred targets.

Multiple Assignments in Python:

Multiple assignment is a shorthand way of assigning the same value to many variables. An assignment statement usually assigns one value to one variable:

x = 0
y = 0
z = 0

But in Python you can combine these three assignments into one statement:

x = y = z = 0
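
One thing to keep in mind: multiple assignment binds every name to the same object. With an immutable value like 0 you will never notice, but with a mutable value such as a list it matters. The following sketch uses invented variable names to show the difference.

```python
# One list object, three names pointing at it
x = y = z = []
x.append(1)

print(y, z)         # the change is visible through every name
print(x is y is z)

# Separate objects need separate expressions on the right-hand side
a, b, c = [], [], []
a.append(1)

print(b, c)
```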

Recursive Variable Unpacking:

I’m sure you’ve written [ ] and ( ) on the right side of an assignment statement to pack values into a data structure. But did you know that you can literally flip the script by writing [ ] and ( ) on the left side?

Here’s an example:

[target, target, target, ...] =
or
(target, target, target, ...) =

Remember, the grammar rules allow [ ] and ( ) characters as part of a target:

target ::= identifier
           | "(" target_list ")"
           | "[" [target_list] "]"
           | attributeref
           | subscription
           | slicing

Packing and unpacking are symmetrical and they can be nested to any level. Nested objects are unpacked recursively by iterating over the nested objects and assigning their values to the nested targets.

Here’s what this looks like in action:

(a, b) = (1, 2)
# a == 1
# b == 2

(a, b) = ([1, 2], [3, 4])
# a == [1, 2]
# b == [3, 4]

(a, [b, c]) = (1, [2, 3])
# a == 1
# b == 2
# c == 3

Unpacking in Python is powerful and works with any iterable object. You can unpack:

tuples
lists
dictionaries
strings
ranges
generators
comprehensions
file handles.
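
Here is a short sketch that unpacks a few of these iterable types. The variable names are invented for the example. Note that unpacking a dictionary yields its keys, and the key order shown relies on dictionaries preserving insertion order (guaranteed from Python 3.7 on).

```python
# Strings unpack into single characters
a, b, c = 'abc'

# Ranges unpack into their numbers
first, second, third = range(3)

# Dictionaries unpack into their keys, not their values
key_one, key_two = {'x': 1, 'y': 2}

# Generators are consumed while unpacking
low, high = (n * 10 for n in (1, 2))

print(a, b, c)
print(first, second, third)
print(key_one, key_two)
print(low, high)
```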

Test Your Knowledge: Unpacking

What are the values of a, x, y, and z in the example below?

a = (x, y, z) = 1, 2, 3

Hint: this statement uses both multiple assignment and unpacking.

Starred Targets (Python 3.x Only):

In Python 2.x the number of targets and values must match. This code will produce an error:

x, y, z = 1, 2, 3, 4   # Too many values

Python 3.x introduced starred variables. Python first assigns values to the unstarred targets. After that, it forms a list of any remaining values and assigns it to the starred variable. This code does not produce an error:

x, *y, z = 1, 2, 3, 4
# x == 1
# y == [2, 3]
# z == 4
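
The starred target can sit at the start, in the middle, or at the end of the target list. Here is a sketch (Python 3.x only) with a few made-up variable names, including the case where nothing is left over for the starred target.

```python
# Starred target at the end collects the "rest"
head, *tail = [1, 2, 3, 4]

# Starred target at the start collects everything but the last item
*init, last = 'abcd'

# If no values are left over, the starred target becomes an empty list
first, *middle, final = (1, 2)

print(head, tail)
print(init, last)
print(first, middle, final)
```

The starred target always receives a list, no matter what type of iterable sits on the right-hand side.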

Test Your Knowledge: Starred Variables

Is there any difference between the variables b and *b in these two statements? If so, what is it?

(a, b, c) = 1, 2, 3
(a, *b, c) = 1, 2, 3

#2 – Unpacking and `for`-loops

Now that you know all about target list assignment, it’s time to look at unpacking used in conjunction with for-loops.

In this section you’ll see how the for-statement unpacks data using the same rules as the = operator. Again, we’ll go over the syntax rules first and then we’ll look at a few hands-on examples.

Let’s examine the syntax of the for statement in Python:

for_stmt ::= "for" target_list "in" expression_list ":" suite
             ["else" ":" suite]

Do the symbols target_list and expression_list look familiar? You saw them earlier in the syntax of the assignment statement.

This has massive implications:

Everything you’ve just learned about assignments and nested targets also applies to for loops!

Standard Rules for Assignments:

Let’s take another look at the standard rules for assignments in Python. The Python Language Reference says:

The for statement is used to iterate over the elements of a sequence (such as a string, tuple or list) or other iterable objects … Each item, in turn, is assigned to the target list using the standard rules for assignments.

You already know the standard rules for assignments. You learned them earlier when we talked about the = operator. They are:

assignment to a single target
assignment to multiple targets
assignment to a nested target list
assignment to a starred variable (Python 3.x only)

In the introduction, I promised I would explain this code:

for (i,value) in enumerate(values):
   ...

Now you know enough to figure it out yourself:

enumerate returns an iterator of (number, item) tuples
when Python sees the target list (i,value), it unpacks each (number, item) tuple into the target list.
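
You can verify both steps with a short sketch. The values and points lists are sample data invented for this example. The second loop shows that nested target lists work in for-loops, too.

```python
values = ['a', 'b', 'c']

# Step 1: enumerate produces (number, item) tuples
pairs = list(enumerate(values))
print(pairs)

# Step 2: the for-statement unpacks each tuple into the target list
collected = []
for (i, value) in enumerate(values):
    collected.append((i, value))

# Nested targets follow the same rules as with the = operator
points = [(1, 2), (3, 4)]
flattened = []
for (i, (x, y)) in enumerate(points):
    flattened.append((i, x, y))

print(flattened)
```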

Examples:

I’ll finish by showing you a few more examples that use Python’s unpacking features with for-loops. Here’s some test data we’ll use in this section:

# Test data:
negative_numbers = (-1, -2, -3, -4, -5)
positive_numbers = (1, 2, 3, 4, 5)

The built-in zip function returns pairs of numbers:

>>> list(zip(negative_numbers, positive_numbers))
[(-1, 1), (-2, 2), (-3, 3), (-4, 4), (-5, 5)]

I can loop over the pairs:

for z in zip(negative_numbers, positive_numbers):
    print(z)

Which produces this output:

(-1, 1)
(-2, 2)
(-3, 3)
(-4, 4)
(-5, 5)

I can also unpack the pairs if I wish:

>>> for (neg, pos) in zip(negative_numbers, positive_numbers):
...     print(neg, pos)

-1 1
-2 2
-3 3
-4 4
-5 5

What about starred variables? This example finds a string’s first and last character. The underscore character is often used in Python when we need a dummy placeholder variable:

>>> animals = [
...    'bird',
...    'fish',
...    'elephant',
... ]

>>> for (first_char, *_, last_char) in animals:
...    print(first_char, last_char)

b d
f h
e t

Unpacking Nested Data Structures – Conclusion

In Python, you can unpack nested data structures in sophisticated ways, but the syntax might seem complicated. I hope that with this tutorial I’ve given you a clearer picture of how it all works. Here’s a quick recap of what we covered:

You just saw how Python’s “=” assignment operator iterates over complex data structures. You learned about the syntax of multiple assignments, recursive variable unpacking, and starred targets.
You also learned how Python’s for-statement unpacks data using the same rules as the = operator and worked through a number of examples.

It pays off to go back to the basics and to read the language reference closely—you might find some hidden gems there!


Python String Conversion 101: Why Every Class Needs a “repr”

Tue, 08 Aug 2017 00:00:00 GMT

Python String Conversion 101: Why Every Class Needs a “repr”

How and why to implement Python “to string” conversion in your own classes using Python’s “repr” and “str” mechanisms and associated coding conventions.

When you define a custom class in Python and then try to print one of its instances to the console (or inspect it in an interpreter session), you get a relatively unsatisfying result.

The default “to string” conversion behavior is basic and lacks detail:

class Car:
    def __init__(self, color, mileage):
        self.color = color
        self.mileage = mileage

>>> my_car = Car('red', 37281)
>>> print(my_car)
<__console__.Car object at 0x109b73da0>
>>> my_car
<__console__.Car object at 0x109b73da0>

By default all you get is a string containing the class name and the id of the object instance (which is the object’s memory address in CPython). That’s better than nothing, but it’s also not very useful.

You might find yourself trying to work around this by printing attributes of the class directly, or even by adding a custom to_string() method to your classes:

>>> print(my_car.color, my_car.mileage)
red 37281

The general idea here is the right one—but it ignores the conventions and built-in mechanisms Python uses to handle how objects are represented as strings.

How to Support “To String” Conversion in Your Python Classes?

Instead of building your own class-to-string conversion machinery, modelled after Java’s toString() methods, you’ll be better off adding the __str__ and __repr__ “dunder” methods to your class. They are the Pythonic way to control how objects are converted to strings in different situations. You can learn more about this in the Python data model documentation.

Let’s take a look at how these methods work in practice. To get started, we’re going to add a __str__ method to the Car class we defined earlier:

class Car:
    def __init__(self, color, mileage):
        self.color = color
        self.mileage = mileage

    def __str__(self):
        return f'a {self.color} car'

When you try printing or inspecting a Car instance now, you’ll get a different, slightly improved result:

>>> my_car = Car('red', 37281)
>>> print(my_car)
a red car
>>> my_car
<__console__.Car object at 0x109ca24e0>

Inspecting the car object in the console still gives us the previous result containing the object’s id. But printing the object resulted in the string returned by the __str__ method we added.

__str__ is one of Python’s “dunder” (double-underscore) methods and gets called when you try to convert an object into a string through the various means that are available:

>>> print(my_car)
a red car
>>> str(my_car)
'a red car'
>>> '{}'.format(my_car)
'a red car'

With a proper __str__ implementation, you won’t have to worry about printing object attributes directly or writing a separate to_string() function. It’s the Pythonic way to control string conversion.

By the way, some people refer to Python’s “dunder” methods as “magic methods.” But these methods are not supposed to be magical in any way. The fact that these methods start and end in double underscores is simply a naming convention to flag them as core Python features. It also helps avoid naming collisions with your own methods and attributes. The object constructor __init__ follows the same convention, and there’s nothing magical or arcane about it.

Don’t be afraid to use Python’s dunder methods—they’re meant to help you.

Python’s `repr` vs `str`: What Is the Difference Between Them?

Now, our string conversion story doesn’t end there. Did you see how inspecting my_car in an interpreter session still gave that odd <Car object at ...> result?

This happened because there are actually two dunder methods that control how objects are converted to strings in Python 3. The first one is __str__, and you just learned about it. The second one is __repr__, and the way it works is similar to __str__, but it is used in different situations. (Python 2.x also has a __unicode__ method that I’ll touch on a little later.)

Here’s a simple experiment you can use to get a feel for when __str__ or __repr__ is used. Let’s redefine our car class so it contains both to-string dunder methods with outputs that are easy to distinguish:

class Car:
    def __init__(self, color, mileage):
        self.color = color
        self.mileage = mileage

    def __repr__(self):
        return '__repr__ for Car'

    def __str__(self):
        return '__str__ for Car'

Now, when you play through the previous examples you can see which method controls the string conversion result in each case:

>>> my_car = Car('red', 37281)
>>> print(my_car)
__str__ for Car
>>> '{}'.format(my_car)
'__str__ for Car'
>>> my_car
__repr__ for Car

This experiment confirms that inspecting an object in a Python interpreter session simply prints the result of the object’s __repr__.

Interestingly, containers like lists and dicts always use the result of __repr__ to represent the objects they contain. Even if you call str on the container itself:

>>> str([my_car])
'[__repr__ for Car]'

To manually choose between both string conversion methods, for example, to express your code’s intent more clearly, it’s best to use the built-in str() and repr() functions. Using them is preferable over calling the object’s __str__ or __repr__ directly, as it looks nicer and gives the same result:

>>> str(my_car)
'__str__ for Car'
>>> repr(my_car)
'__repr__ for Car'

Even with this investigation complete, you might be wondering what the “real-world” difference is between __str__ and __repr__. They both seem to serve the same purpose, so it might be unclear when to use each.

With questions like that, it’s usually a good idea to look into what the Python standard library does. Time to devise another experiment. We’ll create a datetime.date object and find out how it uses __repr__ and __str__ to control string conversion:

>>> import datetime
>>> today = datetime.date.today()

The result of the date object’s __str__ function should primarily be readable.

It’s meant to return a concise textual representation for human consumption—something you’d feel comfortable displaying to a user. Therefore, we get something that looks like an ISO date format when we call str() on the date object:

>>> str(today)
'2017-02-02'

With __repr__, the idea is that its result should be, above all, unambiguous.

The resulting string is intended more as a debugging aid for developers. And for that it needs to be as explicit as possible about what this object is. That’s why you’ll get a more elaborate result calling repr() on the object. It even includes the full module and class name:

>>> repr(today)
'datetime.date(2017, 2, 2)'

We could copy and paste the string returned by __repr__ and execute it as valid Python to recreate the original date object. This is a neat approach and a good goal to keep in mind while writing your own reprs.

On the other hand, I find that it is quite difficult to put into practice. Usually it won’t be worth the trouble and it’ll just create extra work for you. My rule of thumb is to make my __repr__ strings unambiguous and helpful for developers, but I don’t expect them to be able to restore an object’s complete state.
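As a small sketch of that round trip, the snippet below feeds the repr of a date back into eval(). The fixed date is only an illustration, and eval() is acceptable here solely because we produced the string ourselves; it should not be used on untrusted input.

```python
import datetime

# A fixed date so the result is reproducible
today = datetime.date(2017, 2, 2)

# repr() yields a string that is also a valid Python expression
text = repr(today)

# Evaluating that expression recreates an equal date object.
# Only do this with strings you generated yourself.
restored = eval(text)

print(text)
print(restored == today)
```

This works for datetime.date because its repr names the module and class and lists every constructor argument. Most hand-written classes will not reach that level of fidelity, which is why the rule of thumb above is the more practical goal.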

Why Every Python Class Needs a `repr`

If you don’t add a __str__ method, Python falls back on the result of __repr__ when looking for __str__. Therefore, I recommend that you always add at least a __repr__ method to your classes. This will guarantee a useful string conversion result in almost all cases, with a minimum of implementation work.

Here’s how to add basic string conversion support to your classes quickly and efficiently. For our Car class we might start with the following __repr__:

def __repr__(self):
    return f'Car({self.color!r}, {self.mileage!r})'

Please note that I’m using the !r conversion flag to make sure the output string uses repr(self.color) and repr(self.mileage) instead of str(self.color) and str(self.mileage).
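To make the effect of !r visible, here is a minimal comparison of the two conversions. The variable names are just stand-ins for the Car attributes:

```python
color = 'red'
mileage = 37281

# Default conversion uses str(): the quotes around the string are lost
with_str = f'Car({color}, {mileage})'

# The !r flag uses repr(): strings keep their quotes, so types stay visible
with_repr = f'Car({color!r}, {mileage!r})'

print(with_str)   # Car(red, 37281)
print(with_repr)  # Car('red', 37281)
```

With !r a reader can tell at a glance that color is a string and mileage is an integer, which is the kind of unambiguous output a __repr__ is meant to give.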

This works nicely, but one downside is that we’ve repeated the class name inside the format string. A trick you can use here to avoid this repetition is to use the object’s __class__.__name__ attribute, which will always reflect the class’ name as a string.

The benefit is you won’t have to modify the __repr__ implementation when the class name changes. This makes it easy to adhere to the Don’t Repeat Yourself (DRY) principle:

def __repr__(self):
    return (f'{self.__class__.__name__}('
            f'{self.color!r}, {self.mileage!r})')

The downside of this implementation is that the format string is quite long and unwieldy. But with careful formatting, you can keep the code nice and PEP 8 compliant.

With the above __repr__ implementation, we get a useful result when we inspect the object or call repr() on it directly:

>>> repr(my_car)
"Car('red', 37281)"

Printing the object or calling str() on it returns the same string because the default __str__ implementation simply calls __repr__:

>>> print(my_car)
Car('red', 37281)
>>> str(my_car)
"Car('red', 37281)"

I believe this approach provides the most value with a modest amount of implementation work. It’s also a fairly cookie-cutter approach that can be applied without much deliberation. For this reason, I always try to add a basic __repr__ implementation to my classes.

Here’s a complete example for Python 3, including an optional __str__ implementation:

class Car:
    def __init__(self, color, mileage):
        self.color = color
        self.mileage = mileage

    def __repr__(self):
        return (f'{self.__class__.__name__}('
                f'{self.color!r}, {self.mileage!r})')

    def __str__(self):
        return f'a {self.color} car'
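To see how the two methods divide the work, the class can be exercised as in the sketch below. The class is repeated so the snippet runs on its own, and the variable names are only illustrative:

```python
class Car:
    def __init__(self, color, mileage):
        self.color = color
        self.mileage = mileage

    def __repr__(self):
        return (f'{self.__class__.__name__}('
                f'{self.color!r}, {self.mileage!r})')

    def __str__(self):
        return f'a {self.color} car'


my_car = Car('red', 37281)

as_str = str(my_car)      # readable text for end users
as_repr = repr(my_car)    # unambiguous text for developers
in_list = str([my_car])   # containers use repr() for their elements

print(as_str)
print(as_repr)
print(in_list)
```

The last line shows why __repr__ is worth having even when __str__ exists: as soon as the object sits inside a list or dict, only __repr__ is consulted.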

Python 2.x Differences: `unicode`

In Python 3 there’s one data type to represent text across the board: str. It holds unicode characters and can represent most of the world’s writing systems.

Python 2.x uses a different data model for strings. There are two types to represent text: str, which is limited to the ASCII character set, and unicode, which is equivalent to Python 3’s str.

Due to this difference, there’s yet another dunder method in the mix for controlling string conversion in Python 2: __unicode__. In Python 2, __str__ returns bytes, whereas __unicode__ returns characters.

For most intents and purposes, __unicode__ is the newer and preferred method to control string conversion. There’s also a built-in unicode() function to go along with it. It calls the respective dunder method, similar to how str() and repr() work.

So far so good. Now, it gets a little more quirky when you look at the rules for when __str__ and __unicode__ are called in Python 2:

The print statement and str() call __str__. The unicode() built-in calls __unicode__ if it exists, and otherwise falls back to __str__ and decodes the result with the system text encoding.

Compared to Python 3, these special cases complicate the text conversion rules somewhat. But there is a way to simplify things again for practical purposes. Unicode is the preferred and future-proof way of handling text in your Python programs.

So generally, what I would recommend you do in Python 2.x is to put all of your string formatting code inside the __unicode__ method and then create a stub __str__ implementation that returns the unicode representation encoded as UTF-8:

def __str__(self):
    return unicode(self).encode('utf-8')

The __str__ stub will be the same for most classes you write, so you can just copy and paste it around as needed (or put it into a base class where it makes sense). All of your string conversion code that is meant for non-developer use then lives in __unicode__.

Here’s a complete example for Python 2.x:

class Car(object):
    def __init__(self, color, mileage):
        self.color = color
        self.mileage = mileage

    def __repr__(self):
        return '{}({!r}, {!r})'.format(
            self.__class__.__name__,
            self.color, self.mileage)

    def __unicode__(self):
        return u'a {self.color} car'.format(
            self=self)

    def __str__(self):
        return unicode(self).encode('utf-8')

When to use `str` vs `repr` in Python:

You can control to-string conversion in your own classes using the __str__ and __repr__ “dunder” methods. Writing your own Java-esque “tostring” methods is considered unpythonic.
The result of the __str__ method should be readable. The result of __repr__ should be unambiguous.
You should always add a __repr__ to your classes. The default implementation for __str__ just calls __repr__ internally, so by implementing repr support you’ll get the biggest benefit.
On Python 2.x you’ll want to use __unicode__ instead of __str__.

If you’d like to dig deeper into the subject, be sure to watch my related YouTube tutorial on when to use __repr__ vs __str__. It’s also embedded at the top of the article. Happy Pythoning!

Python Iterators: A Step-By-Step Introduction

Tue, 01 Aug 2017 00:00:00 GMT

Understanding iterators is a milestone for any serious Pythonista. With this step-by-step tutorial you’ll understand class-based iterators in Python, completely from scratch.

I love how beautiful and clear Python’s syntax is compared to many other programming languages.

Let’s take the humble for-in loop, for example. It speaks for Python’s beauty that you can read a Pythonic loop like this as if it were an English sentence:

numbers = [1, 2, 3]
for n in numbers:
    print(n)

But how do Python’s elegant loop constructs work behind the scenes? How does the loop fetch individual elements from the object it is looping over? And how can you support the same programming style in your own Python objects?

You’ll find the answer to these questions in Python’s iterator protocol:

Objects that support the __iter__ and __next__ dunder methods automatically work with for-in loops.

But let’s take things step by step. Just like decorators, iterators and their related techniques can appear quite arcane and complicated at first glance. So we’ll ease into it.

In this tutorial you’ll see how to write several Python classes that support the iterator protocol. They’ll serve as “non-magical” examples and test implementations you can build upon and deepen your understanding with.

We’ll focus on the core mechanics of iterators in Python 3 first and leave out any unnecessary complications, so you can see clearly how iterators behave at the fundamental level.

I’ll tie each example back to the for-in loop question we started out with. And at the end of this tutorial we’ll go over some differences that exist between Python 2 and 3 when it comes to iterators.

Ready? Let’s jump right in!

Python Iterators That Iterate Forever

We’ll begin by writing a class that demonstrates the bare-bones iterator protocol in Python. The example I’m using here might look different from the examples you’ve seen in other iterator tutorials, but bear with me. I think doing it this way gives you a more applicable understanding of how iterators work in Python.

Over the next few paragraphs we’re going to implement a class called Repeater that can be iterated over with a for-in loop, like so:

repeater = Repeater('Hello')
for item in repeater:
    print(item)

Like its name suggests, instances of this Repeater class will repeatedly return a single value when iterated over. So the above example code would print the string Hello to the console forever.

To start with the implementation we’ll define and flesh out the Repeater class first:

class Repeater:
    def __init__(self, value):
        self.value = value

    def __iter__(self):
        return RepeaterIterator(self)

On first inspection, Repeater looks like a bog-standard Python class. But notice how it also includes the __iter__ dunder method.

What’s the RepeaterIterator object we’re creating and returning from __iter__? It’s a helper class we also need to define for our for-in iteration example to work:

class RepeaterIterator:
    def __init__(self, source):
        self.source = source

    def __next__(self):
        return self.source.value

Again, RepeaterIterator looks like a straightforward Python class, but you might want to take note of the following two things:

In the __init__ method we link each RepeaterIterator instance to the Repeater object that created it. That way we can hold on to the “source” object that’s being iterated over.
In RepeaterIterator.__next__, we reach back into the “source” Repeater instance and return the value associated with it.

In this code example, Repeater and RepeaterIterator are working together to support Python’s iterator protocol. The two dunder methods we defined, __iter__ and __next__, are the key to making a Python object iterable.

We’ll take a closer look at these two methods and how they work together after some hands-on experimentation with the code we’ve got so far.

Let’s confirm that this two-class setup really made Repeater objects compatible with for-in loop iteration. To do that we’ll first create an instance of Repeater that would return the string 'Hello' indefinitely:

>>> repeater = Repeater('Hello')

And now we’re going to try iterating over this repeater object with a for-in loop. What’s going to happen when you run the following code snippet?

>>> for item in repeater:
...     print(item)

Right on! You’ll see 'Hello' printed to the screen…a lot. Repeater keeps on returning the same string value, and so, this loop will never complete. Our little program is doomed to print 'Hello' to the console forever:

Hello
Hello
Hello
Hello
Hello
...

But congratulations—you just wrote a working iterator in Python and used it with a for-in loop. The loop may not terminate yet…but so far, so good!

Next up we’ll tease this example apart to understand how the __iter__ and __next__ methods work together to make a Python object iterable.

Pro tip: If you ran the last example inside a Python REPL session or from the terminal and you want to stop it, hit Ctrl + C a few times to break out of the infinite loop.

How do for-in loops work in Python?

At this point we’ve got our Repeater class that apparently supports the iterator protocol, and we just ran a for-in loop to prove it:

repeater = Repeater('Hello')
for item in repeater:
    print(item)

Now, what does this for-in loop really do behind the scenes? How does it communicate with the repeater object to fetch new elements from it?

To dispel some of that “magic” we can expand this loop into a slightly longer code snippet that gives the same result:

repeater = Repeater('Hello')
iterator = repeater.__iter__()
while True:
    item = iterator.__next__()
    print(item)

As you can see, the for-in was just syntactic sugar for a simple while loop:

It first prepared the repeater object for iteration by calling its __iter__ method. This returned the actual iterator object.
After that, the loop repeatedly calls the iterator object’s __next__ method to retrieve values from it.

If you’ve ever worked with database cursors, this mental model will seem familiar: We first initialize the cursor and prepare it for reading, and then we can fetch data into local variables as needed from it, one element at a time.

Because there’s never more than one element “in flight”, this approach is highly memory-efficient. Our Repeater class provides an infinite sequence of elements and we can iterate over it just fine. Emulating the same with a Python list would be impossible—there’s no way we could create a list with an infinite number of elements in the first place. This makes iterators a very powerful concept.
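As a rough sketch of that point, the snippet below takes just the first three values from the infinite Repeater. It relies on itertools.islice from the standard library, which is not part of the original example and is used here only to cut the endless stream short:

```python
from itertools import islice


class Repeater:
    def __init__(self, value):
        self.value = value

    def __iter__(self):
        return RepeaterIterator(self)


class RepeaterIterator:
    def __init__(self, source):
        self.source = source

    def __next__(self):
        return self.source.value


# islice() pulls values one at a time and stops after three,
# so the infinite sequence is never materialized in memory.
first_three = list(islice(Repeater('Hello'), 3))
print(first_three)
```

Only the three requested elements are ever produced. The rest of the "infinite" sequence exists purely as the ability to call __next__ again.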

In more abstract terms, iterators provide a common interface that allows you to process every element of a container while being completely isolated from the container’s internal structure.

Whether you’re dealing with a list of elements, a dictionary, an infinite sequence like the one provided by our Repeater class, or another sequence type—all of that is just an implementation detail. Every single one of these objects can be traversed in the same way by the power of iterators.

And as you’ve seen, there’s nothing special about for-in loops in Python. If you peek behind the curtain, it all comes down to calling the right dunder methods at the right time.

In fact, you can manually “emulate” how the loop used the iterator protocol in a Python interpreter session:

>>> repeater = Repeater('Hello')
>>> iterator = iter(repeater)
>>> next(iterator)
'Hello'
>>> next(iterator)
'Hello'
>>> next(iterator)
'Hello'
...

This gives the same result: An infinite stream of hellos. Every time you call next() the iterator hands out the same greeting again.

By the way, I took the opportunity here to replace the calls to __iter__ and __next__ with calls to Python’s built-in functions iter() and next().

Internally these built-ins invoke the same dunder methods, but they make this code a little prettier and easier to read by providing a clean “facade” to the iterator protocol.

Python offers these facades for other functionality as well. For example, len(x) is a shortcut for calling x.__len__. Similarly, calling iter(x) invokes x.__iter__ and calling next(x) invokes x.__next__.
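A short sketch of that equivalence, using a plain list so it can be run as-is:

```python
numbers = [1, 2, 3]

# len() is the facade for __len__
same_length = len(numbers) == numbers.__len__()

# iter() is the facade for __iter__
iterator = iter(numbers)

# next() is the facade for __next__; both calls advance the same iterator
first = next(iterator)
second = iterator.__next__()

print(same_length, first, second)
```

Both spellings advance the same iterator, so mixing them works, but the built-in functions read more naturally.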

Generally it’s a good idea to use the built-in facade functions rather than directly accessing the dunder methods implementing a protocol. It just makes the code a little easier to read.

A Simpler Iterator Class

Up until now our iterator example consisted of two separate classes, Repeater and RepeaterIterator. They corresponded directly to the two phases used by Python’s iterator protocol:

First setting up and retrieving the iterator object with an iter() call, and then repeatedly fetching values from it via next().

Many times both of these responsibilities can be shouldered by a single class. Doing this allows you to reduce the amount of code necessary to write a class-based iterator.

I chose not to do this with the first example in this tutorial, because it mixes up the cleanliness of the mental model behind the iterator protocol. But now that you’ve seen how to write a class-based iterator the longer and more complicated way, let’s take a minute to simplify what we’ve got so far.

Remember why we needed the RepeaterIterator class again? We needed it to host the __next__ method for fetching new values from the iterator. But it doesn’t really matter where __next__ is defined. In the iterator protocol, all that matters is that __iter__ returns any object with a __next__ method on it.

So here’s an idea: RepeaterIterator returns the same value over and over, and it doesn’t have to keep track of any internal state. What if we added the __next__ method directly to the Repeater class instead?

That way we could get rid of RepeaterIterator altogether and implement an iterable object with a single Python class. Let’s try it out! Our new and simplified iterator example looks as follows:

class Repeater:
    def __init__(self, value):
        self.value = value

    def __iter__(self):
        return self

    def __next__(self):
        return self.value

We just went from two separate classes and 10 lines of code to just one class and 7 lines of code. Our simplified implementation still supports the iterator protocol just fine:

>>> repeater = Repeater('Hello')
>>> for item in repeater:
...     print(item)

Hello
Hello
Hello
...

Streamlining a class-based iterator like that often makes sense. In fact, most Python iterator tutorials start out that way. But I always felt that explaining iterators with a single class from the get-go hides the underlying principles of the iterator protocol—and thus makes it more difficult to understand.

Who Wants to Iterate Forever

At this point you’ll have a pretty good understanding of how iterators work in Python. But so far we’ve only implemented iterators that kept on iterating forever.

Clearly, infinite repetition isn’t the main use case for iterators in Python. In fact, when you look back all the way to the beginning of this tutorial, I used the following snippet as a motivating example:

numbers = [1, 2, 3]
for n in numbers:
    print(n)

You’ll rightfully expect this code to print the numbers 1, 2, and 3 and then stop. And you probably don’t expect it to go on spamming your terminal window by printing threes forever until you mash Ctrl+C a few times in a wild panic…

And so, it’s time to find out how to write an iterator that eventually stops generating new values instead of iterating forever. Because that’s what Python objects typically do when we use them in a for-in loop.

We’ll now write another iterator class that we’ll call BoundedRepeater. It’ll be similar to our previous Repeater example, but this time we’ll want it to stop after a predefined number of repetitions.

Let’s think about this for a bit. How do we do this? How does an iterator signal that it’s exhausted and out of elements to iterate over? Maybe you’re thinking, “Hmm, we could just return None from the __next__ method.”

And that’s not a bad idea—but the trouble is, what are we going to do if we want some iterators to be able to return None as an acceptable value?
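Here is a minimal sketch of that problem. NoneTerminated and consume are hypothetical names made up for this illustration; they model the "return None when done" idea and are not how Python’s iterator protocol works:

```python
class NoneTerminated:
    """Hypothetical iterator that returns None once it runs out of items."""

    def __init__(self, items):
        self.items = list(items)
        self.index = 0

    def __next__(self):
        if self.index >= len(self.items):
            return None  # "I'm exhausted" under this made-up convention
        item = self.items[self.index]
        self.index += 1
        return item


def consume(iterator):
    """Collect values until the iterator reports None."""
    collected = []
    while True:
        item = iterator.__next__()
        if item is None:
            break
        collected.append(item)
    return collected


# The None in the middle is real data, but it is mistaken for the end marker
result = consume(NoneTerminated([1, None, 3]))
print(result)
```

The consumer stops at the first None and silently drops the 3, because it cannot tell a legitimate None value from the end-of-data signal.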

Let’s see what other Python iterators do to solve this problem. I’m going to construct a simple container, a list with a few elements, and then I’ll iterate over it until it runs out of elements to see what happens:

>>> my_list = [1, 2, 3]
>>> iterator = iter(my_list)

>>> next(iterator)
1
>>> next(iterator)
2
>>> next(iterator)
3

Careful now! We’ve consumed all of the three available elements in the list. Watch what happens if I call next on the iterator again:

>>> next(iterator)
StopIteration

Aha! It raises a StopIteration exception to signal we’ve exhausted all of the available values in the iterator.

That’s right: Iterators use exceptions to structure control flow. To signal the end of iteration, a Python iterator simply raises the built-in StopIteration exception.

If I keep requesting more values from the iterator it’ll keep raising StopIteration exceptions to signal that there are no more values available to iterate over:

>>> next(iterator)
StopIteration
>>> next(iterator)
StopIteration
...

Python iterators normally can’t be “reset”—once they’re exhausted they’re supposed to raise StopIteration every time next() is called on them. To iterate anew you’ll need to request a fresh iterator object with the iter() function.
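The following sketch shows both halves of that statement with a plain list: an exhausted iterator stays exhausted, while a fresh iter() call starts over.

```python
my_list = [1, 2, 3]
iterator = iter(my_list)

first_pass = list(iterator)    # consumes every element
second_pass = list(iterator)   # the same iterator has nothing left to give

# Asking the exhausted iterator directly raises StopIteration again
try:
    next(iterator)
    still_exhausted = False
except StopIteration:
    still_exhausted = True

# A new iterator over the same list starts from the beginning
fresh_pass = list(iter(my_list))

print(first_pass, second_pass, still_exhausted, fresh_pass)
```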

Now we know everything we need to write our BoundedRepeater class that stops iterating after a set number of repetitions:

class BoundedRepeater:
    def __init__(self, value, max_repeats):
        self.value = value
        self.max_repeats = max_repeats
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.count >= self.max_repeats:
            raise StopIteration
        self.count += 1
        return self.value

This gives us the desired result. Iteration stops after the number of repetitions defined in the max_repeats parameter:

>>> repeater = BoundedRepeater('Hello', 3)
>>> for item in repeater:
...     print(item)
Hello
Hello
Hello

If we rewrite this last for-in loop example to take away some of the syntactic sugar, we end up with the following expanded code snippet:

repeater = BoundedRepeater('Hello', 3)
iterator = iter(repeater)
while True:
    try:
        item = next(iterator)
    except StopIteration:
        break
    print(item)

Every time next() is called in this loop we check for a StopIteration exception and break the while loop if necessary.

Being able to write a three-line for-in loop instead of an eight-line while loop is quite a nice improvement. It makes the code easier to read and more maintainable. And this is another reason why iterators in Python are such a powerful tool.

Python 2.x Compatible Iterators

All the code examples I showed here were written in Python 3. There’s a small but important difference between Python 2 and 3 when it comes to implementing class-based iterators:

In Python 3, the method that retrieves the next value from an iterator is called __next__.
In Python 2, the same method is called next (no underscores).

This naming difference can lead to some trouble if you’re trying to write class-based iterators that should work on both versions of Python. Luckily there’s a simple approach you can take to work around this difference.

Here’s an updated version of the single-class Repeater from earlier, renamed InfiniteRepeater, that will work on both Python 2 and Python 3:

class InfiniteRepeater(object):
    def __init__(self, value):
        self.value = value

    def __iter__(self):
        return self

    def __next__(self):
        return self.value

    # Python 2 compatibility:
    def next(self):
        return self.__next__()

To make this iterator class compatible with Python 2 I’ve made two small changes to it:

First, I added a next method that simply calls the original __next__ and forwards its return value. This essentially creates an alias for the existing __next__ implementation so that Python 2 finds it. That way we can support both versions of Python while still keeping all of the actual implementation details in one place.

And second, I modified the class definition to inherit from object in order to ensure we’re creating a new-style class on Python 2. This has nothing to do with iterators specifically, but it’s a good practice nonetheless.
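A more compact variant of the same idea, shown here only as an alternative sketch and not as the approach used above, binds the existing function to a second name in the class body:

```python
class InfiniteRepeater(object):
    def __init__(self, value):
        self.value = value

    def __iter__(self):
        return self

    def __next__(self):
        return self.value

    # Python 2 compatibility: expose the same function under the old name
    next = __next__


repeater = InfiniteRepeater('Hello')
via_builtin = next(repeater)    # what Python 3 uses
via_py2_name = repeater.next()  # what Python 2 would look up

print(via_builtin, via_py2_name)
```

One trade-off to be aware of: the alias is bound when the class is created, so a subclass that overrides __next__ would also have to rebind next. The explicit forwarding method above doesn’t have that limitation.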

Python Iterators – A Quick Summary

Iterators provide a sequence interface to Python objects that’s memory efficient and considered Pythonic. Behold the beauty of the for-in loop!
To support iteration an object needs to implement the iterator protocol by providing the __iter__ and __next__ dunder methods.
Class-based iterators are only one way to write iterable objects in Python. Also consider generators and generator expressions.
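For comparison, here is a rough sketch of how the two repeaters from this tutorial might look as generator functions. The function names are made up for this illustration:

```python
def repeater(value):
    """Generator counterpart of Repeater: yields the value forever."""
    while True:
        yield value


def bounded_repeater(value, max_repeats):
    """Generator counterpart of BoundedRepeater: stops after max_repeats."""
    for _ in range(max_repeats):
        yield value


bounded = list(bounded_repeater('Hello', 3))

endless = repeater('Hi')
first_two = [next(endless), next(endless)]

print(bounded, first_two)
```

Generator objects provide __iter__ and __next__ automatically and raise StopIteration when the function body finishes, so they plug into for-in loops just like the class-based versions.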

How to Install and Uninstall Python Packages Using Pip

Tue, 25 Jul 2017 00:00:00 GMT

A step-by-step introduction to basic Python package management skills with the “pip” command. Learn how to install and remove third-party modules from PyPI.

Python is approaching its third decade of good old age, and over the years many people have contributed to the creation of Python packages that perform specific functions and operations.

As of this writing, there are ~112K packages listed on the PyPI website. PyPI is short for “Python Package Index”, a central repository for free third-party Python modules.

This large and convenient module ecosystem is what makes Python so great to work with:

You see, most Python programmers are really assemblers of Python packages, which take care of a big chunk of the programming load required by modern applications.

Chances are that there is more than one Python package ready to be unleashed and help you with your specific programming needs.

For instance, while reading dbader.org, you may notice that the pages on the site render emoji quite nicely. You may wonder…

I’d like to use emoji on my Python app!

Is there a Python package for that?

Let’s find out!

Here’s what we’ll cover in this tutorial:

Finding Python Packages
What to Look for in a Python Package
Installing Python Packages With Pip
Capturing Installed Python Packages with Requirements Files
Visualizing Installed Packages
Installing Python Packages From a requirements.txt File
Uninstalling Python Packages With Pip
Summary & Conclusion

Finding Python Packages

Let’s use the emoji use case as an example. We find emoji related Python packages by visiting the PyPI website and searching for emoji via the search box on the top right corner of the page.

As of this writing, PyPI lists 94 packages, of which a partial list is shown below.

Notice the “Weight*” header of the middle column. That’s a key piece of information. The weight value is basically a search scoring number, which the site calculates for each package to rank them and list them accordingly.

If we read the footnote it tells us that the number is calculated by “the occurrence of search term weighted by field (name, summary, keywords, description, author, maintainer).”

Does that mean that the top one is the best package?

Not necessarily. Although it’s uncommon, a package maintainer could stuff the word emoji into every field to push the package to the top of the ranking.

Conversely, many developers don’t do their homework and don’t bother filling out all the fields for their packages, which results in those packages being ranked lower.

You still need to research the packages listed, including a consideration for what your specific end use may be. For instance, a key question could be:

Which environment do you want to implement emoji on? A terminal-based app, or perhaps a Django web app?

If you are trying to display emoji on a Django web app, you may be better off with the 10th package down the list shown above (package django-emoji 2.2.0).

For our use case, let’s assume that we are interested in emoji for a terminal based Python app.

Let’s check out the first one on our list (package emoji 0.4.5) by clicking on it.

What to Look for in a Python Package

The following are characteristics of a good Python package:

Decent documentation: By reading it we can get a clue as to whether the package could meet our need or not;
Maturity and stability: It’s been around for some time, proven by both its age and its successive versions;
Number of contributors: Healthy packages (especially complex ones) tend to have a healthy number of maintainers;
Maintenance: It undergoes maintenance on a regular basis (we live in an ever-evolving world).

Although I would check it out, I wouldn’t rely too much on the development status listed for each package, that is, whether it’s a 4 - Beta or 5 - Production/Stable package. That classification is in the eye of the package creator and not necessarily reliable.

In our emoji example, the documentation seems decent. At the top of the page, we get a graphical indication of the package at work (see snippet below), which shows emoji on a Python interpreter. Yay!

The documentation for our emoji package also tells us about installing it, how to contribute to its development, etc., and points us to a GitHub page for the package, which is a great source of useful information about it.

By visiting its GitHub page, we can glean from it that the package has been around for at least two years, was last maintained in the past couple of months, has been starred 300+ times, has been forked 58 times, and has 10 contributors.

It’s looking good! We have identified a good candidate to incorporate emoji-ing into our Python terminal app.

How do we go about installing it?

Installing Python Packages With Pip

At this time, I am assuming that you already have Python installed on your system. There is plenty of info out there as to how to accomplish that.

Once you install Python, you can check whether pip is installed by running pip --version on a terminal.

I get the following output:

$ pip --version
pip 9.0.1 from /Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages (python 3.5)

Since Python 3.4, pip is bundled with the Python installation package. If for some reason it is not installed, go ahead and get it installed.

I highly recommend also that you use a virtual environment (and more specifically, virtualenvwrapper), a set of extensions that…

…include wrappers for creating and deleting virtual environments and otherwise managing your development workflow, making it easier to work on more than one project at a time without introducing conflicts in their dependencies.

For this tutorial, I have created a virtual environment called pip-tutorial, which you will see going forward. My other tutorial walks you through setting up Python and virtualenvwrapper on Windows.

Below you’ll see how package dependencies can bring complexity into our already complex development environments, which is the reason why using virtual environments is a must for Python development.

A great place to start learning about a terminal program is by running it without any options on the terminal. So, on your terminal, run pip. You would get a list of Commands and General Options.

Below is a partial list of the results on my terminal:

From there on you could run pip install --help to read on what the install command does and what you need to specify to run it, for example. Of course, reading the pip documentation is another great place to start.

$ pip install --help

Usage:
  pip install [options] <requirement specifier> [package-index-options] ...
  pip install [options] -r <requirements file> [package-index-options] ...
  pip install [options] [-e] <vcs project url> ...
  pip install [options] [-e] <local project path> ...
  pip install [options] <archive url/path> ...

Description:
  Install packages from:

  - PyPI (and other indexes) using requirement specifiers.
  - VCS project urls.
  - Local project directories.
  - Local or remote source archives.

  pip also supports installing from "requirements files", which provide
  an easy way to specify a whole environment to be installed.

Install Options:
  ...

Let’s take a quick detour and focus on the freeze command next, which will be a key one in dealing with dependencies. Running pip freeze displays a list of all installed Python packages. If I run it with my freshly installed virtual environment active, I should get an empty list, which is the case:

$ pip freeze

Now we can get the Python interpreter going by typing python on our terminal. Once that’s done, let’s try to import the emoji module, upon which python will complain that there isn’t such a module installed, and rightfully so for we haven’t installed it, yet:

$ python
Python 3.5.0 (default)
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import emoji
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'emoji'

To finally install the package, we can go ahead and run pip install emoji on our terminal (pinned to version 0.4.5 in my case). I get the following output:

$ pip install emoji==0.4.5
Collecting emoji==0.4.5
Installing collected packages: emoji
Successfully installed emoji-0.4.5

🚫 Getting a pip install “invalid syntax” error?

Please note that the pip install command needs to be run from the command-line inside your terminal program, and not inside the Python interpreter.

If you’re getting a “SyntaxError: invalid syntax” message from running pip install, then try leaving the interpreter by typing exit() (or pressing Ctrl+D on Linux and macOS, or Ctrl+Z followed by Enter on Windows) and run the pip command again from the terminal prompt.

Pip is a program that installs modules, so you can use them from Python. Once you have installed the module, then you can open the Python shell and import the module.

When installing packages with pip, we can constrain pip to install a version of our preference by using the following operators. Requirements that contain characters special to the shell (such as > and <) are quoted below, because most shells would otherwise treat those characters as redirection instead of passing them on to pip:

A specific version of the package (==):

$ pip install emoji==0.4.1

A version other than the specified one (!=):

$ pip install "emoji!=0.4.1"

A version equal to or greater than a specific version (>=):

$ pip install "emoji>=0.4.0"

A version of the package in the specified range (>=X.Y.T, <=X.Y.Z):

$ pip install "emoji>=0.4.0,<=0.4.9"

For a full specification of the version specifiers, refer to this page. Generally the most useful specifier here is == to pip install a specific version of a package. If we don’t constrain pip, it will grab the latest version of the package.
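To get a feel for how these operators combine, here is a minimal sketch in plain Python. It is not how pip itself evaluates specifiers: it assumes purely numeric, dotted versions with the same number of parts, and the helper names parse_version and satisfies are made up for this illustration.

```python
import operator

# Map each comparison operator from the text to a Python function.
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
}


def parse_version(text):
    """Turn '0.4.5' into (0, 4, 5). Assumes numeric, dotted versions only."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, specifiers):
    """Return True if `version` meets every comma-separated clause."""
    for clause in specifiers.split(","):
        clause = clause.strip()
        for symbol, compare in OPERATORS.items():
            if clause.startswith(symbol):
                wanted = parse_version(clause[len(symbol):])
                if not compare(parse_version(version), wanted):
                    return False
                break
        else:
            # No known operator matched this clause.
            raise ValueError("Unsupported specifier: " + clause)
    return True


print(satisfies("0.4.5", ">=0.4.0, <=0.4.9"))  # True
print(satisfies("0.4.1", "!=0.4.1"))           # False
```

Real version numbers can carry pre-release tags and other suffixes that this sketch ignores, so treat it as a mental model rather than a replacement for pip’s own rules.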

You may be wondering why you would want to install an older version of a Python package in the first place:

One good example of that is if you are following a tutorial which might have used the latest version of a package when it was written but which could be an older version by the time you are reading it. If you want to follow along, you would do well by installing the same version that the author used.
Another example is that if you start writing code for an app today, chances are that the packages that you use today will evolve, and new versions of them will be released in the future (while your app is “stuck” with the versions that you use today).

Programmers freeze requirements to keep track of the versions of the different packages that are installed on development and production environments. One of the objectives is to be able to replicate the environments as needed. Dan’s course on Python dependency management goes into more detail on that topic.

Let’s continue on and run pip freeze again after installing the emoji package. You should now see it included in the list of all installed Python modules:

$ pip freeze
emoji==0.4.5

As expected, pip freeze now lists the emoji package as an added dependency with a specific version number.
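If you are curious, a similar list can be produced from inside Python. The sketch below assumes Python 3.8 or newer (newer than the 3.5 interpreter shown in this tutorial), where importlib.metadata is part of the standard library. The function name frozen_requirements is my own, and unlike pip freeze the result also includes pip, setuptools, and friends.

```python
from importlib import metadata  # standard library since Python 3.8


def frozen_requirements():
    """Return 'name==version' strings for every installed distribution."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs that lack a name
            pins.add("{}=={}".format(name, dist.version))
    return sorted(pins, key=str.lower)


for line in frozen_requirements():
    print(line)
```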

I now go back to my Python interpreter session, and run import emoji, and this time around Python doesn’t complain, which is a good sign. I test it, and get the following output:

Success, at last! We just installed and then imported a third-party Python module. Great job 🙂

It’s typical for an application to have several interdependent packages. For instance, running pip freeze on the virtual environment that I use to develop tumblingprogrammer.com, will output the following list of modules:

appdirs==1.4.3
beautifulsoup4==4.6.0
Django==1.11.1
django-bootstrap3==8.2.3
django-crispy-forms==1.6.1
django-debug-toolbar==1.8
(...)
pyparsing==2.2.0
pytz==2017.2
PyYAML==3.12
selenium==3.4.1
six==1.10.0
sqlparse==0.2.3
tornado==4.5.1

That’s a total of 25 Python packages. And it’s a fairly simple application. Later on I’ll describe a way to visualize the interdependency between packages.

Capturing Installed Python Packages with Requirements Files

Developers get in the habit of freezing requirements every time that a package or a dependency gets installed on their projects. We do that by running the following pip command:

$ pip freeze > requirements.txt

This dumps the output of pip freeze into a requirements.txt file on the working directory.

Let’s assume now that for some reason we need to install MarkupSafe version 0.11. Let’s assume also that we have gone ahead, installed it, tested it, and that our app behaves as we expect it to.

Let’s run pip freeze, which would only output our two packages, as shown below:

$ pip freeze
emoji==0.4.5
MarkupSafe==0.11

To continue with our learning, let’s go ahead and install Flask, a popular web microframework. We’ll grab the latest version of it by running pip install flask.

I get the following output (if you are following along, yours might differ a little bit, for my computer had cached the files from a previous install):

$ pip install flask
Collecting flask
  Using cached Flask-0.12.2-py2.py3-none-any.whl
Collecting itsdangerous>=0.21 (from flask)
Collecting Jinja2>=2.4 (from flask)
  Using cached Jinja2-2.9.6-py2.py3-none-any.whl
Collecting click>=2.0 (from flask)
  Using cached click-6.7-py2.py3-none-any.whl
Collecting Werkzeug>=0.7 (from flask)
  Using cached Werkzeug-0.12.2-py2.py3-none-any.whl
Collecting MarkupSafe>=0.23 (from Jinja2>=2.4->flask)
Installing collected packages: itsdangerous, MarkupSafe, Jinja2, click, Werkzeug, flask
  Found existing installation: MarkupSafe 0.11
    Uninstalling MarkupSafe-0.11:
      Successfully uninstalled MarkupSafe-0.11
Successfully installed Jinja2-2.9.6 MarkupSafe-1.0 Werkzeug-0.12.2 click-6.7 flask-0.12.2 itsdangerous-0.24

Flask, being a more complex package, has some dependencies (Werkzeug, itsdangerous, etc.) which are installed with it automatically through the pip install command.

I want to call your attention to the following lines, extracted from the above listing:

...
  Found existing installation: MarkupSafe 0.11
    Uninstalling MarkupSafe-0.11:
      Successfully uninstalled MarkupSafe-0.11
...

Take a close look…

You’ll see that pip doesn’t have a way of reconciling conflicting dependencies. Without even warning us, it went ahead and replaced version 0.11 with version 1.0 of our MarkupSafe package. And that could be trouble for our application.

At that point in time, we would run our app tests (assuming that we have them), and dig into our application to make sure that the changes between 0.11 and 1.0 of the MarkupSafe package don’t break it.

If I were to face this situation in real life, I would roll back the changes first by uninstalling Flask and its dependencies and restore the packages that I had before. Then I would upgrade MarkupSafe to 1.0, test to make sure that the application works as expected. And then—and only then—would I re-install Flask.
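Silent upgrades like this are easier to spot if you compare the output of pip freeze before and after an install. Here is a minimal sketch of that idea. The helper names parse_pins and changed_versions are hypothetical, and the sample strings are shortened versions of the listings in this tutorial.

```python
def parse_pins(freeze_output):
    """Parse 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in freeze_output.splitlines():
        line = line.strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name] = version
    return pins


def changed_versions(before, after):
    """Return {name: (old, new)} for packages whose pinned version changed."""
    old, new = parse_pins(before), parse_pins(after)
    return {
        name: (old[name], new[name])
        for name in old
        if name in new and old[name] != new[name]
    }


before = "emoji==0.4.5\nMarkupSafe==0.11"
after = "click==6.7\nemoji==0.4.5\nFlask==0.12.2\nMarkupSafe==1.0"
print(changed_versions(before, after))  # {'MarkupSafe': ('0.11', '1.0')}
```

Newly added packages are deliberately ignored here. Only packages that were already pinned and ended up with a different version are reported.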

Assuming that we have gone through rolling back, upgrading, testing, and re-installing Flask, if we run pip freeze now, we get 7 packages in total:

$ pip freeze
click==6.7
emoji==0.4.5
Flask==0.12.2
itsdangerous==0.24
Jinja2==2.9.6
MarkupSafe==1.0
Werkzeug==0.12.2

Let’s go ahead and freeze our requirements into a requirements.txt file by running pip freeze > requirements.txt.

Now we’re going to add another package with dependencies to increase the complexity of our setup. We’ll install version 0.6 of a package called alembic by running:

$ pip install alembic==0.6
Collecting alembic==0.6
Collecting Mako (from alembic==0.6)
Collecting SQLAlchemy>=0.7.3 (from alembic==0.6)
Requirement already satisfied: MarkupSafe>=0.9.2 in /Users/puma/.virtualenvs/pip-tutorial/lib/python3.5/site-packages (from Mako->alembic==0.6)
Installing collected packages: Mako, SQLAlchemy, alembic
Successfully installed Mako-1.0.7 SQLAlchemy-1.1.11 alembic-0.6.0

I now call your attention to the following line from the above listing:

...
Requirement already satisfied: MarkupSafe>=0.9.2 in /Users/puma/.virtualenvs/pip-tutorial/lib/python3.5/site-packages (from Mako->alembic==0.6)
...

Which means that alembic also depends on MarkupSafe. More complexity, huh? Let’s run pip freeze:

$ pip freeze
alembic==0.6.0
click==6.7
emoji==0.4.5
Flask==0.12.2
itsdangerous==0.24
Jinja2==2.9.6
Mako==1.0.7
MarkupSafe==1.0
SQLAlchemy==1.1.11
Werkzeug==0.12.2

The listing above showing all the packages on our emoji application is not very helpful at the moment, for it doesn’t give us info on dependencies (it only lists packages in alphabetical order). Let’s fix that.

Visualizing Installed Packages

One good package to have installed on our environment is pipdeptree, which shows the dependency tree of packages. Let’s go ahead and install the latest version of it by running the following command:

$ pip install pipdeptree

Once it’s done, let’s run pip freeze to see what we get:

$ pip freeze
alembic==0.6.0
click==6.7
emoji==0.4.5
Flask==0.12.2
itsdangerous==0.24
Jinja2==2.9.6
Mako==1.0.7
MarkupSafe==1.0
pipdeptree==0.10.1
SQLAlchemy==1.1.11
Werkzeug==0.12.2

We now get 11 packages, as we have added pipdeptree, which had no dependencies. Let’s run pipdeptree on the terminal to see what it does. Below is the output that I get on my machine:

$ pipdeptree
alembic==0.6.0
  - Mako [required: Any, installed: 1.0.7]
    - MarkupSafe [required: >=0.9.2, installed: 1.0]
  - SQLAlchemy [required: >=0.7.3, installed: 1.1.11]
emoji==0.4.5
Flask==0.12.2
  - click [required: >=2.0, installed: 6.7]
  - itsdangerous [required: >=0.21, installed: 0.24]
  - Jinja2 [required: >=2.4, installed: 2.9.6]
    - MarkupSafe [required: >=0.23, installed: 1.0]
  - Werkzeug [required: >=0.7, installed: 0.12.2]
pipdeptree==0.10.1
  - pip [required: >=6.0.0, installed: 9.0.1]
setuptools==36.2.0
wheel==0.29.0

We notice much more useful information here, including dependencies, and the minimum versions required for dependent packages to work properly.

Notice, once again, how MarkupSafe is listed twice, as both Jinja2 (and Flask) and Mako (and alembic) depend on it. That’s very useful information to troubleshoot things gone ugly.
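To make the “who depends on what” question concrete, here is a small sketch that inverts a dependency mapping. The DEPENDENCIES dictionary is typed in by hand from the pipdeptree output above (it is not read from the environment), and required_by is a name I made up for this example.

```python
# Direct dependencies copied by hand from the pipdeptree output above.
DEPENDENCIES = {
    "alembic": ["Mako", "SQLAlchemy"],
    "Mako": ["MarkupSafe"],
    "Flask": ["click", "itsdangerous", "Jinja2", "Werkzeug"],
    "Jinja2": ["MarkupSafe"],
    "pipdeptree": ["pip"],
}


def required_by(dependencies):
    """Invert {package: [dependencies]} into {dependency: [packages]}."""
    reverse = {}
    for package, needs in dependencies.items():
        for dependency in needs:
            reverse.setdefault(dependency, []).append(package)
    return reverse


print(sorted(required_by(DEPENDENCIES)["MarkupSafe"]))  # ['Jinja2', 'Mako']
```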

We also notice other packages here that pip freeze doesn’t list, including pip, setuptools and wheel. The reason is that by default pip freeze doesn’t list packages that pip itself depends on.

We can use the --all flag to show also those packages. Let’s test this by running pip freeze --all, in which case we get:

$ pip freeze --all
alembic==0.6.0
click==6.7
emoji==0.4.5
Flask==0.12.2
itsdangerous==0.24
Jinja2==2.9.6
Mako==1.0.7
MarkupSafe==1.0
pip==9.0.1
pipdeptree==0.10.1
setuptools==36.2.0
SQLAlchemy==1.1.11
Werkzeug==0.12.2
wheel==0.29.0

Another benefit of using pipdeptree is that it warns us about conflicting dependencies, including circular ones (where packages depend on one another), but I have yet to see that in action. So far I couldn’t replicate the functionality on my system. You can find more info about the tool on its PyPI page.

Installing Python Packages From a `requirements.txt` File

If you have a requirements.txt file, you can install all the packages listed there by running the following command:

$ pip install -r /path/to/the/file/requirements.txt

This is very handy when we want to replicate an environment and have access to a requirements.txt file that reflects its makeup.

Uninstalling Python Packages With Pip

In this section you’ll see how to uninstall individual Python packages from your system or active virtual environment, how you can remove multiple packages at once with a single command, and how you can remove all installed Python packages.

Uninstalling individual packages:

You can do so by running, as an example, pip uninstall alembic. Let’s do that on our setup to see what happens. Here is the output on my end:

$ pip uninstall alembic
Uninstalling alembic-0.6.0:
  /Users/puma/.virtualenvs/pip-tutorial/bin/alembic
  ... a bunch of other files ...
  /Users/puma/.virtualenvs/pip-tutorial/lib/python3.5/site-packages/alembic/util.py
Proceed (y/n)? y
  Successfully uninstalled alembic-0.6.0

Let’s run pipdeptree to see what our setup looks like:

$ pipdeptree
emoji==0.4.5
Flask==0.12.2
  - click [required: >=2.0, installed: 6.7]
  - itsdangerous [required: >=0.21, installed: 0.24]
  - Jinja2 [required: >=2.4, installed: 2.9.6]
    - MarkupSafe [required: >=0.23, installed: 1.0]
  - Werkzeug [required: >=0.7, installed: 0.12.2]
Mako==1.0.7
  - MarkupSafe [required: >=0.9.2, installed: 1.0]
pipdeptree==0.10.1
  - pip [required: >=6.0.0, installed: 9.0.1]
setuptools==36.2.0
SQLAlchemy==1.1.11
wheel==0.29.0

If you look carefully, you may notice that the alembic dependencies are still present, because pip uninstall does not get rid of them, by design.

We have to manually do that (there are other options, which we will cover below). Therefore, it is extremely important that we freeze our requirements and commit changes to our requirements.txt file every time that we install or uninstall packages so we know what our setup should look like if we need to roll back changes.
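One way to reason about such leftovers is to walk the dependency tree from the packages you actually want and see what is never reached. The sketch below does that on a hand-typed copy of the pipdeptree output above. The function name orphaned and the choice of “wanted” packages are assumptions made for this example.

```python
def orphaned(dependencies, wanted):
    """Return installed packages not reachable from the `wanted` packages."""
    reachable = set()
    stack = list(wanted)
    while stack:
        package = stack.pop()
        if package not in reachable:
            reachable.add(package)
            stack.extend(dependencies.get(package, []))
    # Everything mentioned anywhere in the mapping counts as installed.
    installed = set(dependencies)
    for needs in dependencies.values():
        installed.update(needs)
    return sorted(installed - reachable)


# State after `pip uninstall alembic`, copied by hand from pipdeptree.
deps = {
    "emoji": [],
    "Flask": ["click", "itsdangerous", "Jinja2", "Werkzeug"],
    "Jinja2": ["MarkupSafe"],
    "Mako": ["MarkupSafe"],
    "SQLAlchemy": [],
    "pipdeptree": ["pip"],
}
print(orphaned(deps, wanted=["emoji", "Flask", "pipdeptree"]))
# ['Mako', 'SQLAlchemy']
```

MarkupSafe does not show up as an orphan because Jinja2 (and therefore Flask) still needs it.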

Uninstalling multiple Python packages at once:

You can also uninstall several packages at once, by using the following command-line syntax:

$ pip uninstall package1 package2 ...

Another option is reading the list of packages to uninstall from a requirements file. Similar to its install counterpart, if you have a requirements.txt file, you can uninstall all the packages listed there like so:

$ pip uninstall -r /path/to/the/file/requirements.txt

Note that we could wipe out all the packages on our setup, which could actually be quite useful. Let’s take a look at an example.

The output below is a list of my git commits log (gl is an alias on my bash profile for a prettified git log):

$ gl
* 40f4f37 - (HEAD -> master) all packages in (37 minutes ago) <Jose Pumar>
* 2d00cf5 - emoji + markupsafe + flask + alembic (56 minutes ago) <Jose Pumar>
* e52002b - emoji + MarkupSafe + Flask (84 minutes ago) <Jose Pumar>
* 9c48895 - emoji + MarkupSafe (86 minutes ago) <Jose Pumar>
* 3a797b3 - emoji + MarkSafe (2 hours ago) <Jose Pumar>
* ... other commits...

If I change my mind and decide that I don’t need alembic any more, I could delete all the packages by running pip uninstall -r requirements.txt while on commit 40f4f37 (the HEAD).

If I do it, it gives me a bunch of warnings and asks me if I want to proceed several times (once for each package), to which I say yes. I could have avoided that by using the flag -y, as in:

$ pip uninstall -y -r requirements.txt

The -y flag tells pip not to ask for confirmation of uninstall deletions. If we run pip freeze after this operation, we’ll get an empty packages list, which is what we want.

We then check out commit e52002b (the last safe commit before we installed alembic), and run pip install -r requirements.txt to reinstate the packages that we had at that point in time.

Removing all installed Python packages:

Sometimes it can be useful to remove all installed packages in a virtual environment or on your system Python install. It can help you get back to a clean slate.

Running the following command will uninstall all Python packages in the currently active environment:

$ pip freeze | xargs pip uninstall -y

This command works by first listing all installed packages using the freeze command, and then feeding the list of packages into the pip uninstall command to remove them.

Adding the -y flag automatically confirms the uninstallation so you don’t have to stick around hammering the “y” key on your keyboard.

Installing and Uninstalling Python Packages with the “pip” Package Manager – Conclusion

We covered a lot of ground and shed light on key commands and major challenges that you may face when dealing with installing and uninstalling Python packages and their dependencies.

In summary, the workflow for the installation of a Python package with pip is as follows:

Make sure that you are using a virtual environment.
Identify the need for a new package.
Research potential candidate packages: Look for the maturity of the package, its documentation, etc. See what you can find regarding its dependencies. For example, other packages that have to be installed so the package works properly. Sometimes the documentation refers to them.
Install the package and its dependent packages: pip will do this for you. Look for version upgrades in the pip installation log.
Test your application to make sure that the package meets your needs and that neither the package nor its dependent packages break it.
Freeze your requirements: Run pip freeze > requirements.txt if tests show your application is still okay and works as intended.
Commit the changes to Git or the version control tool of your choice.
Repeat.

There is a lot more to cover, especially when it comes to dependency management, which has long-term implications on how we set up and configure our Python projects.

Such complexity is one of the factors that make it necessary to implement different settings and configurations to account for the distinct needs of our development, staging, and production environments.

Happy pip-ing!

How to Check if a File Exists in Python

Tue, 18 Jul 2017 00:00:00 GMT

How to Check if a File Exists in Python

A tutorial on how to find out whether a file (or directory) exists using Python built-ins and functions from the standard library.

The ability to check whether a file exists on disk or not is important for many types of Python programs:

Maybe you want to make sure a data file is available before you try to load it, or maybe you want to prevent overwriting an existing file. The same is true for directories—maybe you need to ensure an output folder is available before your program runs.

In Python, there are several ways to verify a file or directory exists using functions built into the core language and the Python standard library.

In this tutorial you’ll see three different techniques for file existence checks in Python, with code examples and their individual pros and cons.

Let’s take a look!

Option #1: `os.path.exists()` and `os.path.isfile()`

The most common way to check for the existence of a file in Python is using the exists() and isfile() functions from the os.path module in the standard library.

These functions are available on Python 2 and 3, and they’re usually the first suggestion that comes up when you consult the Python docs or a search engine on how to solve this problem.

Here’s a demo of how to work with the os.path.exists() function. I’m checking several paths (files and directories) for existence in the example below:

>>> import os.path
>>> os.path.exists('mydirectory/myfile.txt')
True
>>> os.path.exists('does-not-exist.txt')
False
>>> os.path.exists('mydirectory')
True

As you just saw, calling os.path.exists() will return True for files and directories. If you want to ensure that a given path points to a file and not to a directory, you can use the os.path.isfile() function:

>>> import os.path
>>> os.path.isfile('mydirectory/myfile.txt')
True
>>> os.path.isfile('does-not-exist.txt')
False
>>> os.path.isfile('mydirectory')
False

With both functions it’s important to keep in mind that they will only check if a file exists—and not if the program actually has access to it. If verifying access is important then you should consider simply opening the file while looking out for an I/O exception (IOError) to be raised.

We’ll come back to this technique in the summary at the end of the tutorial. But before we do that, let’s take a look at another option for doing file existence checks in Python.

Option #2: `open()` and `try...except`

You just saw how functions in the os.path module can be used to check for the existence of a file or a folder.

Here’s another straightforward Python algorithm for checking whether a file exists: You simply attempt to open the file with the built-in open() function, like so:

>>> open('does-not-exist.txt')
FileNotFoundError:
"[Errno 2] No such file or directory: 'does-not-exist.txt'"

If the file exists the open call will complete successfully and return a valid file handle. If the file does not exist however, a FileNotFoundError exception will be raised:

“FileNotFoundError is raised when a file or directory is requested but doesn’t exist. Corresponds to errno ENOENT.” (Source: Python Docs)

This means you can watch for this FileNotFoundError exception type in your own code, and use it to detect whether a file exists or not. Here’s a code example that demonstrates this technique:

try:
    f = open('myfile.txt')
    f.close()
except FileNotFoundError:
    print('File does not exist')

Notice how I’m immediately calling the close() method on the file object to release the underlying file handle. This is generally considered a good practice when working with files in Python:

If you don’t close the file handle explicitly it is difficult to know when exactly it will be closed automatically by the Python runtime. This wastes system resources and can make your programs run less efficiently.

Instead of closing the file explicitly with the close() method, another option here would be to use the context manager protocol and the with statement to auto-close the file.
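Here is a minimal sketch of the with-statement variant, wrapped in a small helper. The name file_exists is my own. Note that only FileNotFoundError is handled, so other problems (for example missing permissions) still surface as exceptions.

```python
def file_exists(path):
    """Return True if the file at `path` exists and can be opened."""
    try:
        # The with block closes the file automatically, even on errors.
        with open(path):
            pass
    except FileNotFoundError:
        return False
    return True


print(file_exists("does-not-exist.txt"))  # False, assuming no such file
```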

Now, the same “just attempt to open it” technique also works for ensuring a file is both readable and accessible. Instead of watching for FileNotFoundError exceptions you’ll want to look out for any kind of IOError:

try:
    f = open('myfile.txt')
    f.close()
except IOError:
    print('File is not accessible')
else:
    # Only runs when open() succeeded
    print('File is accessible')

If you frequently use this pattern you can factor it out into a helper function that will allow you to test whether a file exists and is accessible at the same time:

def is_accessible(path, mode='r'):
    """
    Check if the file at `path` can be opened
    by the program using `mode` open flags.
    """
    try:
        f = open(path, mode)
        f.close()
    except IOError:
        return False
    return True

Alternatively, you can use the os.access() function in the standard library to check whether a file exists and is accessible at the same time. This would be more similar to using the os.path.exists() function for checking if a file exists.
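As a rough sketch, a read-access check with os.access() could look like this. The wrapper name is_readable is hypothetical. Keep in mind that this is still a “look before you leap” check, so the answer can be out of date by the time you open the file.

```python
import os


def is_readable(path):
    """Return True if `path` exists and the current user may read it."""
    return os.access(path, os.R_OK)


print(is_readable("does-not-exist.txt"))  # False, assuming no such file
```

os.access() also accepts os.W_OK and os.X_OK to test for write and execute permission, and os.F_OK to test for existence only.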

Using open() and a try...except clause has some advantages when it comes to file handling in Python. It can help you avoid bugs caused by file existence race conditions:

Imagine a file exists in the instant you run the check, only to get removed a millisecond later. When you actually want to open the file to work with it, it’s gone and your program aborts with an error.

I’ll cover this edge case in some more detail in the summary below. But before we get down another rabbit hole—let’s take a look at one more option for checking if a file or folder exists in Python.

Option #3: `pathlib.Path.exists()` (Python 3.4+)

Python 3.4 and above include the pathlib module that provides an object-oriented interface for dealing with file system paths. Using this module is much nicer than treating file paths as simple string objects.

It provides abstractions and helper functions for many file system operations, including existence checks and finding out whether a path points to a file or a directory.

To check whether a path points to an existing file or directory you can use the Path.exists() method. To find out whether a path is a regular file (or a symbolic link pointing to one), rather than a directory, you’ll want to use Path.is_file().

Here’s a working example for both pathlib.Path methods:

>>> import pathlib
>>> path = pathlib.Path('myfile.txt')
>>> path.exists()
True
>>> path.is_file()
True

As you can tell, this approach is very similar to doing an existence check with functions from the os.path module.

The key difference is that pathlib provides a cleaner object-oriented interface for working with the file system. You’re no longer dealing with plain str objects representing file paths—but instead you’re handling Path objects with relevant methods and attributes on them.
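To give you an idea of what those methods and attributes look like, here is a short sketch. It uses pathlib.PurePosixPath, which only manipulates the path text and never touches the disk, so the results do not depend on which files exist on your machine. The file names are just examples.

```python
import pathlib

# PurePosixPath never touches the disk, so this runs the same everywhere.
path = pathlib.PurePosixPath("mydirectory/myfile.txt")

print(path.name)     # myfile.txt
print(path.suffix)   # .txt
print(path.parent)   # mydirectory

# The / operator joins path segments.
print(path.parent / "other.txt")  # mydirectory/other.txt
```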

Using pathlib and taking advantage of its object-oriented interface can make your file handling code more readable and more maintainable. I’m not going to lie to you and say this is a panacea. But in some cases it can help you write “better” Python programs.

The pathlib module is also available as a backported third-party module on PyPI that works on Python 2.x and 3.x. You can find it here: pathlib2

Summary: Checking if a File Exists in Python

In this tutorial we compared three different methods for determining whether a file exists in Python. One method also allowed us to check if a file exists and is accessible at the same time.

Of course, with three implementations to choose from you might be wondering:

What’s the preferred way to check if a file exists using Python?

In most cases where you need a file existence check I’d recommend you use the built-in pathlib.Path.exists() method on Python 3.4 and above, or the os.path.exists() function on Python 2.

However, there’s one important caveat to consider:

Keep in mind that just because a file existed when the check ran won’t guarantee that it will still be there when you’re ready to open it:

While unlikely under normal circumstances, it’s entirely possible for a file to exist in the instant the existence check runs, only to get deleted immediately afterwards.

To avoid this type of race condition, it helps to not only rely on a “Does this file exist?” check. Instead it’s usually better to simply attempt to carry out the desired operation right away. This is also called an “easier to ask for forgiveness than permission” (EAFP) style that’s usually recommended in Python.

For example, instead of checking first if a file exists before opening it, you’ll want to simply try to open it right away and be prepared to catch a FileNotFoundError exception that tells you the file wasn’t available. This avoids the race condition.

So, if you plan on working with a file immediately afterwards, for example by reading its contents or by appending new data to it, I would recommend that you do the existence check via the open() function and exception handling in an EAFP style. This will help you avoid race conditions in your Python file handling code.
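As a sketch of what the EAFP style can look like in practice, the helper below reads a file in one step and falls back to a default value if the file is missing. The function name read_text_or_default and its default parameter are made up for this example.

```python
def read_text_or_default(path, default=""):
    """Return the file's contents, or `default` if the file is missing."""
    try:
        # No separate existence check: just try the real operation.
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        return default


# Prints "(no data)", assuming no such file exists.
print(read_text_or_default("does-not-exist.txt", default="(no data)"))
```

Because the existence check and the read happen in a single open() call, there is no window in which the file can disappear between the two.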

If you’d like to dig deeper into the subject, be sure to watch my YouTube tutorial on file existence checks in Python. It’s also embedded at the top of the article. Happy Pythoning!

How to Reverse a List in Python

Tue, 11 Jul 2017 00:00:00 GMT

How to Reverse a List in Python

A step-by-step tutorial on the three main ways to reverse a Python list or array: in-place reversal, list slicing, and reverse iteration.

Reversing a list is a common operation in Python programming.

For example, imagine you had a sorted list of customer names that your program displays in alphabetical (A-Z) order. Some of your users would like to view the customer list so that the names are in reverse alphabetical order. How are you going to flip the order of this existing list on its head? Or in other words:

What’s the best way to reverse the order of a list in Python?

In this article you’ll see three different ways to achieve this result in “plain vanilla” Python, meaning without the use of any third-party libraries:

Reversing a list in-place with the list.reverse() method
Using the “[::-1]” list slicing trick to create a reversed copy
Creating a reverse iterator with the reversed() built-in function

All examples I’m using here will be based on the following list object containing the numbers 1 through 5:

# You have this:
[1, 2, 3, 4, 5]

# And you want that:
[5, 4, 3, 2, 1]

Ready? Let’s reverse some lists together!

Option #1: Reversing a List In-Place With the `list.reverse()` Method

Every list in Python has a built-in reverse() method you can call to reverse the contents of the list object in-place. Reversing the list in-place means Python won’t create a new list and copy the existing elements to it in reverse order. Instead, it directly modifies the original list object.

Here’s an example:

>>> mylist = [1, 2, 3, 4, 5]
>>> mylist
[1, 2, 3, 4, 5]

>>> mylist.reverse()
None

>>> mylist
[5, 4, 3, 2, 1]

As you can see, calling mylist.reverse() returned None, but modified the original list object. This implementation was chosen deliberately by the developers of the Python standard library:

The reverse() method modifies the sequence in place for economy of space when reversing a large sequence. To remind users that it operates by side effect, it does not return the reversed sequence. (Source: Python 3 Docs)

In-place reversal has some benefits and some downsides. On the plus side, it’s a fast operation—shuffling the list elements around doesn’t require much extra memory, as we’re not creating a full copy of the list.

However, reversing a list in-place overwrites the original sort order. This could be a potential downside. (Of course, to restore the original order you could simply reverse the same list again.)
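A common stumbling block with in-place reversal is assigning the return value of reverse() to a variable. The short sketch below shows what happens, and also that reversing twice gets you back to where you started. The variable names are only for illustration.

```python
mylist = [1, 2, 3, 4, 5]

# Pitfall: reverse() returns None, so `result` does not hold a list.
result = mylist.reverse()
print(result)   # None
print(mylist)   # [5, 4, 3, 2, 1]

# Reversing a second time restores the original order.
mylist.reverse()
print(mylist)   # [1, 2, 3, 4, 5]
```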

From a code readability standpoint, I like this approach. The syntax is clear and easy to understand, even for developers new to Python or someone who comes from another language background.

Option #2: Using the “`[::-1]`” Slicing Trick to Reverse a Python List

Python’s list objects have an interesting feature called slicing. You can view it as an extension of the square-brackets indexing syntax. It includes a special case where slicing a list with “[::-1]” produces a reversed copy:

>>> mylist
[1, 2, 3, 4, 5]

>>> mylist[::-1]
[5, 4, 3, 2, 1]

Reversing a list this way takes up more memory compared to an in-place reversal because it creates a (shallow) copy of the list. And creating the copy requires allocating enough space to hold all of the existing elements.

Note that this only creates a “shallow” copy where the container is duplicated, but not the individual list elements. Instead of duplicating the list elements themselves, references to the original elements are reused in the new copy of the container. If the elements are mutable, modifying an element in the original list will also be reflected in the copy.
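Here is a small sketch of what “shallow” means in practice, using a list of lists so that the elements are mutable. The variable names are made up for this example.

```python
original = [[1, 2], [3, 4]]
reversed_copy = original[::-1]

# Mutating an element is visible through both lists,
# because both containers refer to the same inner list.
original[0].append(99)
print(reversed_copy)                      # [[3, 4], [1, 2, 99]]
print(reversed_copy[1] is original[0])    # True

# Changing the original container itself does not affect the copy.
original.append([5, 6])
print(len(reversed_copy))                 # 2
```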

The biggest downside to reversing a list with the slicing syntax is that it uses a more advanced Python feature that some people would say is “arcane.” I don’t blame them—list slicing is fast, but also a little difficult to understand the first time you encounter its quirky syntax.

When I’m reading Python code that makes use of list slicing I often have to slow down and concentrate to “mentally parse” the statement, to make sure I understand what’s going on. My biggest gripe here is that the “[::-1]” slicing syntax does not communicate clearly enough that it creates a reversed copy of the original list.

Using Python’s slicing feature to reverse a list is a decent solution, but it can be difficult to read to the uninitiated. Be sure to remember the wise words of master Yoda: With great power, great responsibility comes 🙂

Sidebar: How does list slicing work in Python?

Reversing a list this way takes advantage of Python’s “slicing” syntax that can be used to do a number of interesting things. List slicing uses the “[]” indexing syntax with the following “[start:stop:step]” pattern:

# General pattern: mylist[start:stop:step]
>>> mylist
[1, 2, 3, 4, 5]
>>> mylist[1:3]
[2, 3]

Adding the “[1:3]” index tells Python to give us a slice of the list from index 1 to index 2. To avoid off-by-one errors it’s important to remember that the upper bound is exclusive—this is why we only got [2, 3] as the sub-list from the [1:3] slice.

All of the indexes are optional, by the way. You can leave them out and, for example, create a full (shallow) copy of a list like this:

>>> mylist[:]
[1, 2, 3, 4, 5]

The step parameter, sometimes called the stride, is also interesting. Here’s how you can create a copy of a list that only includes every other element of the original:

>>> mylist[::2]
[1, 3, 5]

Earlier we used the same “step” trick to reverse a list using slicing:

>>> mylist[::-1]
[5, 4, 3, 2, 1]

We ask Python to give us the full list (::), but to go over all of the elements from back to front by setting the step to -1. Pretty neat, eh?
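If the bracket syntax feels opaque, it can help to know that a slice expression is shorthand for indexing with a built-in slice object. The sketch below shows the equivalence. The variable names are my own, and giving the slice a name is just one possible way to make the intent more explicit.

```python
mylist = [1, 2, 3, 4, 5]

# mylist[start:stop:step] is shorthand for indexing with
# slice(start, stop, step); None means "left out":
every_other = slice(None, None, 2)
backwards = slice(None, None, -1)

print(mylist[every_other])  # [1, 3, 5]
print(mylist[backwards])    # [5, 4, 3, 2, 1]

# start and stop can be combined with a negative step, too.
# The stop index is still exclusive:
print(mylist[3:0:-1])       # [4, 3, 2]
```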

Option #3: Creating a Reverse Iterator With the `reversed()` Built-In Function

Reversing a list using reverse iteration with the reversed() built-in is another option. It neither reverses a list in-place, nor does it create a full copy. Instead we get a reverse iterator we can use to cycle through the elements of the list in reverse order:

>>> mylist = [1, 2, 3, 4, 5]
>>> for item in reversed(mylist):
...     print(item)
5
4
3
2
1
>>> mylist
[1, 2, 3, 4, 5]

Using reversed() does not modify the original list. In fact all we get is a “view” into the existing list that we can use to look at all the elements in reverse order. This is a powerful technique that takes advantage of Python’s iterator protocol.
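The “view” behavior can be made visible with a short experiment. This is a sketch of what I’d expect for the built-in list type in CPython: the iterator reads from the list on demand, so a change made before iterating shows up in the result. Changing a list while you iterate over it is rarely a good idea in real code, so treat this as a demonstration only.

```python
mylist = [1, 2, 3, 4, 5]
rev_iter = reversed(mylist)

# No copy has been made at this point. The iterator looks up
# elements in the original list only when asked for them:
mylist[0] = 99
first_pass = list(rev_iter)
print(first_pass)   # [5, 4, 3, 2, 99]

# An iterator can only be consumed once:
second_pass = list(rev_iter)
print(second_pass)  # []

# Iterating did not reorder the original list:
print(mylist)       # [99, 2, 3, 4, 5]
```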

So far, all we did was iterate over the elements of a list in reverse order. But how can you create a reversed copy of a list using Python’s reversed() function?

Here’s how:

>>> mylist = [1, 2, 3, 4, 5]
>>> list(reversed(mylist))
[5, 4, 3, 2, 1]

Notice how I’m calling the list() constructor on the result of the reversed() function?

Using the list constructor built-in keeps iterating until the (reverse) iterator is exhausted, and puts all the elements fetched from the iterator into a new list object. And this gives us the desired result: A reversed shallow copy of the original list.

I really like this reverse iterator approach for reversing lists in Python. It communicates clearly what is going on, and even someone new to the language would intuitively understand we’re creating a reversed copy of the list. And while understanding how iterators work at a deeper level is helpful, it’s not absolutely necessary to use this technique.

Summary: Reversing Lists in Python

List reversal is a fairly common operation in programming. In this tutorial we covered three different approaches for reversing a list or array in Python. Let’s do a quick recap on each of those approaches before I give you my final verdict on which option I recommend the most:

Option 1: list.reverse()

Python lists can be reversed in-place with the list.reverse() method. This is a great option to reverse the order of a list (or any mutable sequence) in Python. It modifies the original container in-place which means no additional memory is required. However the downside is, of course, that the original list is modified.

>>> lst = [1, 2, 3, 4, 5]
>>> lst.reverse()
>>> lst
[5, 4, 3, 2, 1]

Reverses the list in-place
Fast, doesn’t take up extra memory
Modifies the original list

Option 2: List Slicing Trick

You can use Python’s list slicing syntax to create a reversed copy of a list. This works well, however it is slightly arcane and therefore not very Pythonic, in my opinion.

>>> lst = [1, 2, 3, 4, 5]
>>> lst[::-1]
[5, 4, 3, 2, 1]

Creates a reversed copy of the list
Takes up memory but doesn’t modify the original

Option 3: reversed()

Python’s built-in reversed() function allows you to create a reverse iterator for an existing list or sequence object. This is a flexible and clean solution that relies on some advanced Python features—but it remains readable due to the clear naming of the reversed() function.

>>> lst = [1, 2, 3, 4, 5]
>>> list(reversed(lst))
[5, 4, 3, 2, 1]

Returns an iterator that returns elements in reverse order
Doesn’t modify the original
Might need to be converted into a list object again

If you’re wondering what the “best” way is to reverse a list in Python my answer will be: “It depends.” Personally, I like the first and third approach:

The list.reverse() method is fast, clear and speaks for itself. Whenever you have a situation where you want to reverse a list in-place and don’t want a copy and it’s okay to modify the original, then I would go with this option.
If that isn’t possible, I would lean towards the reversed iterator approach where you call reversed() on the list object and you either cycle through the elements one by one, or you call the list() function to create a reversed copy. I like this solution because it’s fast and clearly states its intent.

I don’t like the list slicing trick as much. It feels “arcane” and it can be difficult to see at a glance what’s going on. I try to avoid using it for this reason.

Note that there are other approaches like implementing list reversal from scratch or reversing a list using a recursive algorithm that are common interview questions, but not very good solutions for Python programming in the “real world.” That’s why I didn’t cover them in this tutorial.

If you’d like to dig deeper into the subject, be sure to watch my YouTube tutorial on list reversal in Python. It’s also embedded at the top of the article. Happy Pythoning!


Array Data Structures in Python

Tue, 04 Jul 2017 00:00:00 GMT

Array Data Structures in Python

How to implement arrays in Python using only built-in data types and classes from the standard library. Includes code examples and recommendations.

An array is a fundamental data structure available in most programming languages and it has a wide range of uses across different algorithms.

In this article we’ll take a look at array implementations in Python that only use core language features or functionality included in the Python standard library.

You’ll see the strengths and weaknesses of each approach so you can decide which implementation is right for your use case.

But before we jump in—let’s cover some of the basics first.

So, how do arrays work in Python and what are they used for?

Arrays consist of fixed-size data records that allow each element to be efficiently located based on its index.

Because arrays store information in adjoining blocks of memory they’re considered contiguous data structures (as opposed to a linked data structure like a linked list, for example.)

A real world analogy for an array data structure is a parking lot:

You can look at the parking lot as a whole and treat it as a single object. But inside the lot there are parking spots indexed by a unique number. Parking spots are containers for vehicles—each parking spot can either be empty or have a car, a motorbike, or some other vehicle parked on it.

But not all parking lots are the same:

Some parking lots may be restricted to only one type of vehicle. For example, a motorhome parking lot wouldn’t allow bikes to be parked on it. A “restricted” parking lot corresponds to a “typed array” data structure that only allows elements that have the same data type stored in them.

Performance-wise it’s very fast to look up an element contained in an array given the element’s index. A proper array implementation guarantees a constant O(1) access time for this case.

Python includes several array-like data structures in its standard library that each have slightly different characteristics. If you’re wondering how to declare an array in Python, this list will help pick the right data structure.

Let’s take a look at the available options:

✅ `list` – Mutable Dynamic Arrays

Lists are a part of the core Python language. Despite their name, Python’s lists are implemented as dynamic arrays behind the scenes. This means lists allow elements to be added or removed and they will automatically adjust the backing store that holds these elements by allocating or releasing memory.

Python lists can hold arbitrary elements—“everything” is an object in Python, including functions. Therefore you can mix and match different kinds of data types and store them all in a single list.

This can be a powerful feature, but the downside is that supporting multiple data types at the same time means that data is generally less tightly packed and the whole structure takes up more space as a result.

>>> arr = ['one', 'two', 'three']
>>> arr[0]
'one'

# Lists have a nice repr:
>>> arr
['one', 'two', 'three']

# Lists are mutable:
>>> arr[1] = 'hello'
>>> arr
['one', 'hello', 'three']

>>> del arr[1]
>>> arr
['one', 'three']

# Lists can hold arbitrary data types:
>>> arr.append(23)
>>> arr
['one', 'three', 23]

✅ `tuple` – Immutable Containers

Tuples are a part of the Python core language. Unlike lists, Python’s tuple objects are immutable. This means elements can’t be added or removed dynamically: all elements in a tuple must be defined at creation time.

Just like lists, tuples can hold elements of arbitrary data types. Having this flexibility is powerful, but again it also means that data is less tightly packed than it would be in a typed array.

>>> arr = 'one', 'two', 'three'
>>> arr[0]
'one'

# Tuples have a nice repr:
>>> arr
('one', 'two', 'three')

# Tuples are immutable:
>>> arr[1] = 'hello'
TypeError: "'tuple' object does not support item assignment"

>>> del arr[1]
TypeError: "'tuple' object doesn't support item deletion"

# Tuples can hold arbitrary data types:
# (Adding elements creates a copy of the tuple)
>>> arr + (23,)
('one', 'two', 'three', 23)

✅ `array.array` – Basic Typed Arrays

Python’s array module provides space-efficient storage of basic C-style data types like bytes, 32-bit integers, floating point numbers, and so on.

Arrays created with the array.array class are mutable and behave similarly to lists—except they are “typed arrays” constrained to a single data type.

Because of this constraint array.array objects with many elements are more space-efficient than lists and tuples. The elements stored in them are tightly packed and this can be useful if you need to store many elements of the same type.

Also, arrays support many of the same methods as regular lists. For example, to append to an array in Python you can just use the familiar array.append() method.

As a result of this similarity between Python lists and array objects, you might be able to use an array as a “drop-in replacement” for a list without requiring major changes to your application.

>>> import array
>>> arr = array.array('f', (1.0, 1.5, 2.0, 2.5))
>>> arr[1]
1.5

# Arrays have a nice repr:
>>> arr
array('f', [1.0, 1.5, 2.0, 2.5])

# Arrays are mutable:
>>> arr[1] = 23.0
>>> arr
array('f', [1.0, 23.0, 2.0, 2.5])

>>> del arr[1]
>>> arr
array('f', [1.0, 2.0, 2.5])

>>> arr.append(42.0)
>>> arr
array('f', [1.0, 2.0, 2.5, 42.0])

# Arrays are "typed":
>>> arr[1] = 'hello'
TypeError: "must be real number, not str"
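To get a rough feel for the space savings mentioned above, you can compare the memory footprint of a list of floats with an equivalent array. This is only a sketch: sys.getsizeof() reports implementation-specific numbers, and for a list it counts just the container, so I’m adding the sizes of the float objects by hand. The exact byte counts will differ between Python versions and platforms.

```python
import array
import sys

values = [float(i) for i in range(1000)]
as_list = list(values)
as_array = array.array('d', values)  # 'd' = C double

# Each array slot holds the raw number; itemsize is its size in bytes:
print(as_array.itemsize)

# A list only stores references to float objects,
# so count the objects themselves as well:
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)
array_bytes = sys.getsizeof(as_array)

print(list_bytes, array_bytes)
print(array_bytes < list_bytes)
```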

✅ `str` – Immutable Arrays of Unicode Characters

Python 3.x uses str objects to store textual data as immutable sequences of Unicode characters. Practically speaking that means a str is an immutable array of characters. Oddly enough it’s also a recursive data structure—each character in a string is a str object of length 1 itself.

String objects are space-efficient because they’re tightly packed and specialize in a single data type. If you’re storing Unicode text you should use them. Because strings are immutable in Python modifying a string requires creating a modified copy. The closest equivalent to a “mutable string” is storing individual characters inside a list.

>>> arr = 'abcd'
>>> arr[1]
'b'

>>> arr
'abcd'

# Strings are immutable:
>>> arr[1] = 'e'
TypeError: "'str' object does not support item assignment"

>>> del arr[1]
TypeError: "'str' object doesn't support item deletion"

# Strings can be unpacked into a list to
# get a mutable representation:
>>> list('abcd')
['a', 'b', 'c', 'd']
>>> ''.join(list('abcd'))
'abcd'

# Strings are recursive data structures:
>>> type('abc')
"<class 'str'>"
>>> type('abc'[0])
"<class 'str'>"

✅ `bytes` – Immutable Arrays of Single Bytes

Bytes objects are immutable sequences of single bytes (integers in the range of 0 <= x <= 255). Conceptually they’re similar to str objects and you can also think of them as immutable arrays of bytes.

Like strings, bytes have their own literal syntax for creating objects and they’re space-efficient. Bytes objects are immutable, but unlike strings there’s a dedicated “mutable byte array” data type called bytearray that they can be unpacked into. You’ll hear more about that in the next section.

>>> arr = bytes((0, 1, 2, 3))
>>> arr[1]
1

# Bytes literals have their own syntax:
>>> arr
b'\x00\x01\x02\x03'
>>> arr = b'\x00\x01\x02\x03'

# Only valid "bytes" are allowed:
>>> bytes((0, 300))
ValueError: "bytes must be in range(0, 256)"

# Bytes are immutable:
>>> arr[1] = 23
TypeError: "'bytes' object does not support item assignment"

>>> del arr[1]
TypeError: "'bytes' object doesn't support item deletion"

✅ `bytearray` – Mutable Arrays of Single Bytes

The bytearray type is a mutable sequence of integers in the range 0 <= x <= 255. They’re closely related to bytes objects with the main difference being that bytearrays can be modified freely—you can overwrite elements, remove existing elements, or add new ones. The bytearray object will grow and shrink appropriately.

Bytearrays can be converted back into immutable bytes objects but this incurs copying the stored data in full—an operation taking O(n) time.

>>> arr = bytearray((0, 1, 2, 3))
>>> arr[1]
1

# The bytearray repr:
>>> arr
bytearray(b'\x00\x01\x02\x03')

# Bytearrays are mutable:
>>> arr[1] = 23
>>> arr
bytearray(b'\x00\x17\x02\x03')

>>> arr[1]
23

# Bytearrays can grow and shrink in size:
>>> del arr[1]
>>> arr
bytearray(b'\x00\x02\x03')

>>> arr.append(42)
>>> arr
bytearray(b'\x00\x02\x03*')

# Bytearrays can only hold "bytes"
# (integers in the range 0 <= x <= 255)
>>> arr[1] = 'hello'
TypeError: "an integer is required"

>>> arr[1] = 300
ValueError: "byte must be in range(0, 256)"

# Bytearrays can be converted back into bytes objects:
# (This will copy the data)
>>> bytes(arr)
b'\x00\x02\x03*'
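Here’s a short sketch of the round trip between bytes and bytearray. It shows that each conversion produces an independent copy, so later changes to the bytearray don’t leak into the bytes objects. The variable names are made up for the example.

```python
frozen = b'\x00\x01\x02\x03'

# "Unpack" the immutable bytes into a mutable bytearray (copies the data):
buf = bytearray(frozen)
buf[0] = 255
buf.extend(b'\x04')

# Convert back into an immutable bytes object (another O(n) copy):
snapshot = bytes(buf)

# Later changes to the bytearray don't affect either bytes object:
buf[1] = 42
print(frozen)    # b'\x00\x01\x02\x03'
print(snapshot)  # b'\xff\x01\x02\x03\x04'
print(buf)       # bytearray(b'\xff*\x02\x03\x04')
```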

Which array implementation should I use in Python?

There are a number of built-in data structures you can choose from when it comes to implementing arrays in Python. In this article we’ve concentrated on core language features and data structures included in the standard library only.

If you’re willing to go beyond the Python standard library, third-party packages like NumPy offer a wide range of fast array implementations for scientific computing.

But focusing on the array data structures included with Python, here’s what your choice comes down to:

You need to store arbitrary objects, potentially with mixed data types? Use a list or a tuple, depending on whether you want an immutable data structure or not.
You have numeric (integer / floating point) data and tight packing and performance is important? Try out array.array and see if it does everything you need. Consider going beyond the standard library and try out packages like NumPy.
You have textual data represented as Unicode characters? Use Python’s built-in str. If you need a “mutable string” use a list of characters.
You want to store a contiguous block of bytes? Use bytes (immutable) or bytearray (mutable).

Personally, I like to start out with a simple list in most cases and only specialize later on if performance or storage space becomes an issue.

This is especially important when you need to make a choice between using a Python list vs an array. The key difference here is that Python arrays are more space-efficient than lists, but that doesn’t automatically make them the right choice in your specific use case.

Most of the time using a general purpose array data structure like list in Python gives you the fastest development speed and the most programming convenience.

I found that this is usually much more important in the beginning than squeezing out every last drop of performance from the start.

This article is part of the “Fundamental Data Structures in Python” series.


Linked Lists in Python

Tue, 27 Jun 2017 00:00:00 GMT

Linked Lists in Python

Learn how to implement a linked list data structure in Python, using only built-in data types and functionality from the standard library.

Every Python programmer should know about linked lists:

They are among the simplest and most common data structures used in programming.

So, if you ever found yourself wondering, “Does Python have a built-in or ‘native’ linked list data structure?” or, “How do I write a linked list in Python?” then this tutorial will help you.

Python doesn’t ship with a built-in linked list data type in the “classical” sense. Python’s list type is implemented as a dynamic array—which means it doesn’t suit the typical scenarios where you’d want to use a “proper” linked list data structure for performance reasons.

Please note that this tutorial only considers linked list implementations that work on a “plain vanilla” Python install. I’m leaving out third-party packages intentionally. They don’t apply during coding interviews and it’s difficult to keep an up-to-date list that considers all packages available on Python packaging repositories.

Before we get into the weeds and look at linked list implementations in Python, let’s do a quick recap of what a linked list data structure is—and how it compares to an array.

What are the characteristics of a linked list?

A linked list is an ordered collection of values. Linked lists are similar to arrays in the sense that they contain objects in a linear order. However they differ from arrays in their memory layout.

Arrays are contiguous data structures and they’re composed of fixed-size data records stored in adjoining blocks of memory. In an array, data is tightly packed, and we know the size of each data record, which allows us to quickly look up an element given its index in the array.

Linked lists, however, are made up of data records linked together by pointers. This means that the data records that hold the actual “data payload” can be stored anywhere in memory. What creates the linear ordering is how each data record “points” to the next one.

There are two different kinds of linked lists: singly-linked lists and doubly-linked lists. The structure described above is a singly-linked list: each element in it has a reference (a “pointer”) to the next element in the list.

In a doubly-linked list, each element has a reference to both the next and the previous element. Why is this useful? Having a reference to the previous element can speed up some operations, like removing (“unlinking”) an element from a list or traversing the list in reverse order.

How do linked lists and arrays compare performance-wise?

You just saw how linked lists and arrays use different data layouts behind the scenes to store information. This data layout difference reflects in the performance characteristics of linked lists and arrays:

Element Insertion & Removal: Inserting and removing elements from a (doubly) linked list has time complexity O(1) once you hold a reference to the node at that position, whereas doing the same on an array requires an O(n) copy operation in the worst case. On a linked list we can simply “hook in” a new element anywhere we want by adjusting the pointers from one data record to the next. On an array we have to shift the existing elements around (and, if the backing store is full, allocate a bigger storage area first) to leave a blank space to insert the new element into.
Element Lookup: Similarly, looking up an element given its index is a slow O(n) time operation on a linked list—but a fast O(1) lookup on an array. With a linked list we must jump from element to element and search the structure from the “head” of the list to find the index we want. But with an array we can calculate the exact address of an element in memory based on its index and the (fixed) size of each data record.
Memory Efficiency: Because the data stored in arrays is tightly packed they’re generally more space-efficient than linked lists. This mostly applies to static arrays, however. Dynamic arrays typically over-allocate their backing store slightly to speed up element insertions in the average case, thus increasing the memory footprint.

Now, how does this performance difference come into play with Python? Remember that Python’s built-in list type is in fact a dynamic array. This means the performance differences we just discussed apply to it. Likewise, Python’s immutable tuple data type can be considered a static array in this case—with similar performance trade-offs compared to a proper linked list.
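You can observe the insertion trade-off with a quick, unscientific timing experiment. The sketch below inserts at the front of a list (an array, so every insert shifts the existing elements) and at the front of a collections.deque, which is covered further down in this article. The value of N and the function names are arbitrary choices of mine, and the absolute timings will vary from machine to machine.

```python
import collections
import timeit

N = 20_000

def fill_list_front():
    lst = []
    for i in range(N):
        lst.insert(0, i)  # shifts every existing element: O(n) per insert
    return lst

def fill_deque_front():
    dq = collections.deque()
    for i in range(N):
        dq.appendleft(i)  # O(1) per insert
    return dq

list_time = timeit.timeit(fill_list_front, number=3)
deque_time = timeit.timeit(fill_deque_front, number=3)
print(f"list.insert(0, x):   {list_time:.4f}s")
print(f"deque.appendleft(x): {deque_time:.4f}s")
```

Both functions end up with the same sequence of elements. Only the cost of getting there differs, and the gap should widen as N grows.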

Does Python have a built-in or “native” linked list data structure?

Let’s come back to the original question. If you want to use a linked list in Python, is there a built-in data type you can use directly?

The answer is: “It depends.”

As of Python 3.6, CPython doesn’t provide a dedicated linked list data type. There’s nothing like Java’s LinkedList built into Python or into the Python standard library.

Python does however include the collections.deque class which provides a double-ended queue and is implemented as a doubly-linked list internally. Under some specific circumstances you might be able to use it as a “makeshift” linked list. If that’s not an option you’ll need to write your own linked list implementation from scratch.

How do I write a linked list using Python?

If you want to stick with functionality built into the core language and into the Python standard library you have two options for implementing a linked list:

You could either use the collections.deque class from the Python standard library and take advantage of the fact that it’s implemented as a doubly-linked list behind the scenes. But this will only work for some use cases—I’ll go into more details on that further down in the article.
Alternatively, you could define your own linked list type in Python by writing it from scratch using other built-in data types. You’d implement your own custom linked list class or base your implementation on Lisp-style chains of tuple objects. Again, see below for more details.
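To give a rough idea of what a Lisp-style chain of tuples could look like, here’s a minimal sketch. Each node is a (value, rest) pair and None marks the end of the chain. The helper names cons() and iter_chain() are hypothetical. They’re not part of Python or its standard library.

```python
# Each node is a (value, rest) tuple; None marks the end of the chain.
# The helper names below (cons, iter_chain) are made up for this sketch.

def cons(value, rest=None):
    """Prepend `value` to an existing chain in O(1) time."""
    return (value, rest)

def iter_chain(chain):
    """Yield the values of a chain from head to tail in O(n) time."""
    while chain is not None:
        value, chain = chain
        yield value

chain = cons('A', cons('B', cons('C')))
print(chain)                    # ('A', ('B', ('C', None)))
print(list(iter_chain(chain)))  # ['A', 'B', 'C']

# Tuples are immutable, so "modifying" the list means building
# a new chain that shares its tail with the old one:
longer = cons('X', chain)
print(longer[1] is chain)       # True
```

Because the nodes are immutable, this style suits prepend-heavy, read-mostly use cases. In-place operations like removing a node from the middle require rebuilding part of the chain.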

Now that we’ve covered some general questions on linked lists and their availability in Python, read on for examples of how to make both of the above approaches work.

Option 1: Using `collections.deque` as a Linked List

This approach might seem a little odd at first because the collections.deque class implements a double-ended queue, and it’s typically used as the go-to stack or queue implementation in Python.

But using this class as a “makeshift” linked list might make sense under some circumstances. You see, CPython’s deque is powered by a doubly-linked list behind the scenes and it provides a full “list-like” set of functionality.

Under some circumstances, this makes treating deque objects as linked list replacements a valid option. Here are some of the key performance characteristics of this approach:

Inserting and removing elements at the front and back of a deque is a fast O(1) operation. However, inserting or removing in the middle takes O(n) time because we don’t have access to the previous-element or next-element linked list pointers. That’s abstracted away by the deque interface.
Storage is O(n)—but not every element gets its own list node. The deque class uses blocks that hold multiple elements at once and then these blocks are linked together as a doubly-linked list. As of CPython 3.6 the block size is 64 elements. This incurs some space overhead but retains the general performance characteristics given a large enough number of elements.
In-place reversal: In Python 3.2+ the elements in a deque instance can be reversed in-place with the reverse() method. This takes O(n) time and no extra space.

Using collections.deque as a linked list in Python can be a valid choice if you mostly care about insertion performance at the beginning or the end of the list, and you don’t need access to the previous-element and next-element pointers on each object directly.

Don’t use a deque if you need O(1) performance when removing elements. Removing elements by key or by index requires an O(n) search, even if you already have a reference to the element to be removed. This is the main downside of using a deque like a linked list.

If you’re looking for a linked list in Python because you want to implement queues or stacks then a deque is a great choice, however.

Here are some examples on how you can use Python’s deque class as a replacement for a linked list:

>>> import collections
>>> lst = collections.deque()

# Inserting elements at the front
# or back takes O(1) time:
>>> lst.append('B')
>>> lst.append('C')
>>> lst.appendleft('A')
>>> lst
deque(['A', 'B', 'C'])

# However, inserting elements at
# arbitrary indexes takes O(n) time:
>>> lst.insert(2, 'X')
>>> lst
deque(['A', 'B', 'X', 'C'])

# Removing elements at the front
# or back takes O(1) time:
>>> lst.pop()
'C'
>>> lst.popleft()
'A'
>>> lst
deque(['B', 'X'])

# Removing elements at arbitrary
# indexes or by key takes O(n) time again:
>>> del lst[1]
>>> lst.remove('B')

# Deques can be reversed in-place:
>>> lst = collections.deque(['A', 'B', 'X', 'C'])
>>> lst.reverse()
>>> lst
deque(['C', 'X', 'B', 'A'])

# Searching for elements takes
# O(n) time:
>>> lst.index('X')
1
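Since stacks and queues are the use cases where a deque really shines, here’s a brief sketch of both. It only relies on the O(1) operations at either end. The item values are placeholders.

```python
import collections

# Stack (last-in, first-out): push and pop at the same end.
stack = collections.deque()
stack.append('eat')
stack.append('sleep')
stack.append('code')
top = stack.pop()
print(top)    # code

# Queue (first-in, first-out): add at the back, remove from the front.
queue = collections.deque()
queue.append('eat')
queue.append('sleep')
queue.append('code')
first = queue.popleft()
print(first)  # eat
```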

Option 2: Writing Your Own Python Linked Lists

If you need full control over the layout of each linked list node then there’s no perfect solution available in the Python standard library. If you want to stick with the standard library and built-in data types then writing your own linked list is your best bet.

You’ll have to make a choice between implementing a singly-linked or a doubly-linked list. I’ll give examples of both, including some of the common operations like how to search for elements, or how to reverse a linked list.

Let’s take a look at two concrete Python linked list examples: one for a singly-linked list, and one for a doubly-linked list.

✅ A Singly-Linked List Class in Python

Here’s how you might implement a class-based singly-linked list in Python, including some of the standard algorithms:

class ListNode:
    """
    A node in a singly-linked list.
    """
    def __init__(self, data=None, next=None):
        self.data = data
        self.next = next

    def __repr__(self):
        return repr(self.data)


class SinglyLinkedList:
    def __init__(self):
        """
        Create a new singly-linked list.
        Takes O(1) time.
        """
        self.head = None

    def __repr__(self):
        """
        Return a string representation of the list.
        Takes O(n) time.
        """
        nodes = []
        curr = self.head
        while curr:
            nodes.append(repr(curr))
            curr = curr.next
        return '[' + ', '.join(nodes) + ']'

    def prepend(self, data):
        """
        Insert a new element at the beginning of the list.
        Takes O(1) time.
        """
        self.head = ListNode(data=data, next=self.head)

    def append(self, data):
        """
        Insert a new element at the end of the list.
        Takes O(n) time.
        """
        if not self.head:
            self.head = ListNode(data=data)
            return
        curr = self.head
        while curr.next:
            curr = curr.next
        curr.next = ListNode(data=data)

    def find(self, key):
        """
        Search for the first element with `data` matching
        `key`. Return the element or `None` if not found.
        Takes O(n) time.
        """
        curr = self.head
        while curr and curr.data != key:
            curr = curr.next
        return curr  # Will be None if not found

    def remove(self, key):
        """
        Remove the first occurrence of `key` in the list.
        Takes O(n) time.
        """
        # Find the element and keep a
        # reference to the element preceding it
        curr = self.head
        prev = None
        while curr and curr.data != key:
            prev = curr
            curr = curr.next
        # Nothing to remove if the key wasn't found
        # (this also covers the empty list)
        if curr is None:
            return
        # Unlink it from the list
        if prev is None:
            self.head = curr.next
        else:
            prev.next = curr.next
        curr.next = None

    def reverse(self):
        """
        Reverse the list in-place.
        Takes O(n) time.
        """
        curr = self.head
        prev_node = None
        next_node = None
        while curr:
            next_node = curr.next
            curr.next = prev_node
            prev_node = curr
            curr = next_node
        self.head = prev_node

And here’s how you’d use this linked list class in practice:

>>> lst = SinglyLinkedList()
>>> lst
[]

>>> lst.prepend(23)
>>> lst.prepend('a')
>>> lst.prepend(42)
>>> lst.prepend('X')
>>> lst.append('the')
>>> lst.append('end')

>>> lst
['X', 42, 'a', 23, 'the', 'end']

>>> lst.find('X')
'X'
>>> lst.find('y')
None

>>> lst.reverse()
>>> lst
['end', 'the', 23, 'a', 42, 'X']

>>> lst.remove(42)
>>> lst
['end', 'the', 23, 'a', 'X']

>>> lst.remove('not found')

Note that removing an element in this implementation is still an O(n) time operation, even if you already have a reference to a ListNode object.

In a singly-linked list removing an element typically requires searching the list because we need to know the previous and the next element. With a doubly-linked list you could write a remove_elem() method that unlinks and removes a node from the list in O(1) time.

✅ A Doubly-Linked List Class in Python

Let’s have a look at how to implement a doubly-linked list in Python. The following DoublyLinkedList class should point you in the right direction:

class DListNode:
    """
    A node in a doubly-linked list.
    """
    def __init__(self, data=None, prev=None, next=None):
        self.data = data
        self.prev = prev
        self.next = next

    def __repr__(self):
        return repr(self.data)


class DoublyLinkedList:
    def __init__(self):
        """
        Create a new doubly linked list.
        Takes O(1) time.
        """
        self.head = None

    def __repr__(self):
        """
        Return a string representation of the list.
        Takes O(n) time.
        """
        nodes = []
        curr = self.head
        while curr:
            nodes.append(repr(curr))
            curr = curr.next
        return '[' + ', '.join(nodes) + ']'

    def prepend(self, data):
        """
        Insert a new element at the beginning of the list.
        Takes O(1) time.
        """
        new_head = DListNode(data=data, next=self.head)
        if self.head:
            self.head.prev = new_head
        self.head = new_head

    def append(self, data):
        """
        Insert a new element at the end of the list.
        Takes O(n) time.
        """
        if not self.head:
            self.head = DListNode(data=data)
            return
        curr = self.head
        while curr.next:
            curr = curr.next
        curr.next = DListNode(data=data, prev=curr)

    def find(self, key):
        """
        Search for the first element with `data` matching
        `key`. Return the element or `None` if not found.
        Takes O(n) time.
        """
        curr = self.head
        while curr and curr.data != key:
            curr = curr.next
        return curr  # Will be None if not found

    def remove_elem(self, node):
        """
        Unlink an element from the list.
        Takes O(1) time.
        """
        if node.prev:
            node.prev.next = node.next
        if node.next:
            node.next.prev = node.prev
        if node is self.head:
            self.head = node.next
        node.prev = None
        node.next = None

    def remove(self, key):
        """
        Remove the first occurrence of `key` in the list.
        Takes O(n) time.
        """
        elem = self.find(key)
        if not elem:
            return
        self.remove_elem(elem)

    def reverse(self):
        """
        Reverse the list in-place.
        Takes O(n) time.
        """
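        curr = self.head
        last = None
        while curr:
            last = curr
            # Swap the prev/next pointers of the current node
            curr.prev, curr.next = curr.next, curr.prev
            # After the swap, prev points to the original next node
            curr = curr.prev
        # The last node visited becomes the new head; this also
        # handles empty and single-element lists safely
        self.head = last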

Here are a few examples of how to use this class. Notice how we can now remove elements in O(1) time with the remove_elem() method if we already hold a reference to the list node representing the element:

>>> lst = DoublyLinkedList()
>>> lst
[]

>>> lst.prepend(23)
>>> lst.prepend('a')
>>> lst.prepend(42)
>>> lst.prepend('X')
>>> lst.append('the')
>>> lst.append('end')

>>> lst
['X', 42, 'a', 23, 'the', 'end']

>>> lst.find('X')
'X'
>>> lst.find('y')
None

>>> lst.reverse()
>>> lst
['end', 'the', 23, 'a', 42, 'X']

>>> elem = lst.find(42)
>>> lst.remove_elem(elem)

>>> lst.remove('X')
>>> lst.remove('not found')
>>> lst
['end', 'the', 23, 'a']

Both examples of Python linked lists you saw here were class-based. An alternative approach would be to implement a Lisp-style linked list in Python using tuples as the core building blocks (“cons pairs”). Here’s a tutorial that goes into more detail: Functional Linked Lists in Python.

Python Linked Lists: Recap & Recommendations

We just looked at a number of approaches to implement a singly- and doubly-linked list in Python. You also saw some code examples of the standard operations and algorithms, for example how to reverse a linked list in-place.

You should only consider using a linked list in Python when you’ve determined that you absolutely need a linked data structure for performance reasons (or you’ve been asked to use one in a coding interview.)

In many cases the same algorithm implemented on top of Python’s highly optimized list objects will be sufficiently fast. If you know a dynamic array won’t cut it and you need a linked list, then check first if you can take advantage of Python’s built-in deque class.
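
As a minimal sketch of that last suggestion, here is how collections.deque covers the same operations as the linked list classes above. The variable names are only illustrative:

```python
from collections import deque

# deque supports O(1) appends and pops at both ends
lst = deque()
lst.appendleft(23)      # like prepend()
lst.appendleft('a')
lst.append('the')       # like append(), but O(1) instead of O(n)
lst.append('end')

lst.remove('a')         # removes the first match; O(n) like remove()
lst.reverse()           # reverses the deque in place

items = list(lst)
print(items)
```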

If none of these options work for you, and you want to stay within the standard library, only then should you write your own Python linked list.

In an interview situation I’d also advise you to write your own implementation from scratch because that’s usually what the interviewer wants to see. However it can be beneficial to mention that collections.deque offers similar performance under the right circumstances. Good luck and…Happy Pythoning!

Read the full “Fundamental Data Structures in Python” article series here. This article is missing something or you found an error? Help a brother out and leave a comment below.

Enriching Your Python Classes With Dunder (Magic, Special) Methods

Tue, 20 Jun 2017 00:00:00 GMT

What Python’s “magic methods” are and how you would use them to make a simple account class more Pythonic.

What Are Dunder Methods?

In Python, special methods are a set of predefined methods you can use to enrich your classes. They are easy to recognize because they start and end with double underscores, for example __init__ or __str__.

As it quickly became tiresome to say under-under-method-under-under, Pythonistas adopted the term “dunder methods”, a short form of “double under.”

These “dunders” or “special methods” in Python are also sometimes called “magic methods.” But using this terminology can make them seem more complicated than they really are—at the end of the day there’s nothing “magical” about them. You should treat these methods like a normal language feature.

Dunder methods let you emulate the behavior of built-in types. For example, to get the length of a string you can call len('string'). But an empty class definition doesn’t support this behavior out of the box:

class NoLenSupport:
    pass

>>> obj = NoLenSupport()
>>> len(obj)
TypeError: "object of type 'NoLenSupport' has no len()"

To fix this, you can add a __len__ dunder method to your class:

class LenSupport:
    def __len__(self):
        return 42

>>> obj = LenSupport()
>>> len(obj)
42

Another example is slicing. You can implement a __getitem__ method which allows you to use Python’s list slicing syntax: obj[start:stop].
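
Here is a small sketch of that idea. The Playlist class is hypothetical and only exists for this illustration. It delegates to an internal list, so both plain indexing and slicing work:

```python
class Playlist:
    """Hypothetical example class, not part of the article's code."""

    def __init__(self, songs):
        self._songs = list(songs)

    def __getitem__(self, position):
        # `position` is an int for obj[i] and a slice object for obj[a:b]
        return self._songs[position]


songs = Playlist(['intro', 'verse', 'chorus', 'outro'])
first = songs[0]
middle = songs[1:3]
print(first, middle)
```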

Special Methods and the Python Data Model

This elegant design is known as the Python data model and lets developers tap into rich language features like sequences, iteration, operator overloading, attribute access, etc.

You can see Python’s data model as a powerful API you can interface with by implementing one or more dunder methods. If you want to write more Pythonic code, knowing how and when to use dunder methods is an important step.

For a beginner this might be slightly overwhelming at first though. No worries, in this article I will guide you through the use of dunder methods using a simple Account class as an example.

Enriching a Simple Account Class

Throughout this article I will enrich a simple Python class with various dunder methods to unlock the following language features:

Initialization of new objects
Object representation
Enable iteration
Operator overloading (comparison)
Operator overloading (addition)
Method invocation
Context manager support (with statement)

You can find the final code example here. I’ve also put together a Jupyter notebook so you can more easily play with the examples.

Object Initialization: `__init__`

Right upon starting my class I already need a special method. To construct account objects from the Account class I need a constructor which in Python is the __init__ dunder:

class Account:
    """A simple account class"""

    def __init__(self, owner, amount=0):
        """
        This is the constructor that lets us create
        objects from this class
        """
        self.owner = owner
        self.amount = amount
        self._transactions = []

The constructor takes care of setting up the object. In this case it receives the owner name, an optional start amount and defines an internal transactions list to keep track of deposits and withdrawals.

This allows us to create new accounts like this:

>>> acc = Account('bob')  # default amount = 0
>>> acc = Account('bob', 10)

Object Representation: `__str__`, `__repr__`

It’s common practice in Python to provide a string representation of your object for the consumer of your class (a bit like API documentation.) There are two ways to do this using dunder methods:

__repr__: The “official” string representation of an object. This is how you would make an object of the class. The goal of __repr__ is to be unambiguous.
__str__: The “informal” or nicely printable string representation of an object. This is for the end user.

Let’s implement these two methods on the Account class:

class Account:
    # ... (see above)

    def __repr__(self):
        return 'Account({!r}, {!r})'.format(self.owner, self.amount)

    def __str__(self):
        return 'Account of {} with starting amount: {}'.format(
            self.owner, self.amount)

If you don’t want to hardcode "Account" as the name for the class you can also use self.__class__.__name__ to access it programmatically.
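
A short sketch of that variation follows. The SavingsAccount subclass is a hypothetical addition, included only to show that the name follows the actual class:

```python
class Account:
    def __init__(self, owner, amount=0):
        self.owner = owner
        self.amount = amount

    def __repr__(self):
        # Look up the class name at runtime instead of hardcoding it
        return '{}({!r}, {!r})'.format(
            self.__class__.__name__, self.owner, self.amount)


class SavingsAccount(Account):  # hypothetical subclass for illustration
    pass


base_repr = repr(Account('bob', 10))
sub_repr = repr(SavingsAccount('sue', 5))
print(base_repr, sub_repr)
```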

If you wanted to implement just one of these to-string methods on a Python class, make sure it’s __repr__.

Now I can query the object in various ways and always get a nice string representation:

>>> str(acc)
'Account of bob with starting amount: 10'

>>> print(acc)
Account of bob with starting amount: 10

>>> repr(acc)
"Account('bob', 10)"

Iteration: `__len__`, `__getitem__`, `__reversed__`

In order to iterate over our account object I need to add some transactions. So first, I’ll define a simple method to add transactions. I’ll keep it simple because this is just setup code to explain dunder methods, and not a production-ready accounting system:

def add_transaction(self, amount):
    if not isinstance(amount, int):
        raise ValueError('please use int for amount')
    self._transactions.append(amount)

I also defined a property to calculate the balance on the account so I can conveniently access it with account.balance. This method takes the start amount and adds a sum of all the transactions:

@property
def balance(self):
    return self.amount + sum(self._transactions)

Let’s do some deposits and withdrawals on the account:

>>> acc = Account('bob', 10)

>>> acc.add_transaction(20)
>>> acc.add_transaction(-10)
>>> acc.add_transaction(50)
>>> acc.add_transaction(-20)
>>> acc.add_transaction(30)

>>> acc.balance
80

Now I have some data and I want to know:

How many transactions were there?
Index the account object to get transaction number …
Loop over the transactions

With the class definition I have this is currently not possible. All of the following statements raise TypeError exceptions:

>>> len(acc)
TypeError

>>> for t in acc:
...    print(t)
TypeError

>>> acc[1]
TypeError

Dunder methods to the rescue! It only takes a little bit of code to make the class iterable:

class Account:
    # ... (see above)

    def __len__(self):
        return len(self._transactions)

    def __getitem__(self, position):
        return self._transactions[position]

Now the previous statements work:

>>> len(acc)
5

>>> for t in acc:
...    print(t)
20
-10
50
-20
30

>>> acc[1]
-10

To iterate over transactions in reversed order you can implement the __reversed__ special method:

def __reversed__(self):
    return self[::-1]

>>> list(reversed(acc))
[30, -20, 50, -10, 20]

To reverse the list of transactions I used Python’s reverse list slice syntax. I also wrapped the result of reversed(acc) in a list() call because reversed() generally returns a reverse iterator, not a list object we can print nicely in the REPL. Check out this tutorial on iterators in Python if you’d like to learn more about how this approach works in detail.

All in all, this account class is starting to look quite Pythonic to me now.

Operator Overloading for Comparing Accounts: `__eq__`, `__lt__`

We all write dozens of statements daily to compare Python objects:

>>> 2 > 1
True

>>> 'a' > 'b'
False

This feels completely natural, but it’s actually quite amazing what happens behind the scenes here. Why does > work equally well on integers, strings and other objects (as long as they are the same type)? This polymorphic behavior is possible because these objects implement one or more comparison dunder methods.

An easy way to verify this is to use the dir() builtin:

>>> dir('a')
['__add__',
...
'__eq__',    <---------------
'__format__',
'__ge__',    <---------------
'__getattribute__',
'__getitem__',
'__getnewargs__',
'__gt__',    <---------------
...]

Let’s build a second account object and compare it to the first one (I am adding a couple of transactions for later use):

>>> acc2 = Account('tim', 100)
>>> acc2.add_transaction(20)
>>> acc2.add_transaction(40)
>>> acc2.balance
160

>>> acc2 > acc
TypeError:
"'>' not supported between instances of 'Account' and 'Account'"

What happened here? We got a TypeError because I have not implemented any comparison dunders nor inherited them from a parent class.

Let’s add them. To not have to implement all of the comparison dunder methods, I use the functools.total_ordering decorator which allows me to take a shortcut, only implementing __eq__ and __lt__:

from functools import total_ordering

@total_ordering
class Account:
    # ... (see above)

    def __eq__(self, other):
        return self.balance == other.balance

    def __lt__(self, other):
        return self.balance < other.balance

And now I can compare Account instances no problem:

>>> acc2 > acc
True

>>> acc2 < acc
False

>>> acc == acc2
False
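
One caveat: the methods above assume that the other operand is also an Account. A common convention is to return NotImplemented for unsupported types so Python can try the other operand or raise a clean TypeError. The following is a sketch of that convention using a simplified stand-in class that stores its balance directly (an assumption made to keep the example self-contained):

```python
from functools import total_ordering


@total_ordering
class Account:
    # Simplified stand-in: stores the balance directly instead of
    # computing it from transactions
    def __init__(self, owner, balance=0):
        self.owner = owner
        self.balance = balance

    def __eq__(self, other):
        if not isinstance(other, Account):
            return NotImplemented  # let Python try the other operand
        return self.balance == other.balance

    def __lt__(self, other):
        if not isinstance(other, Account):
            return NotImplemented
        return self.balance < other.balance


acc = Account('bob', 80)
acc2 = Account('tim', 160)

same = (acc == 'bob')  # falls back to an identity check
try:
    acc < 5
    ordering_error = None
except TypeError as exc:
    ordering_error = exc
```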

Operator Overloading for Merging Accounts: `__add__`

In Python, everything is an object. We are completely fine adding two integers or two strings with the + (plus) operator, and it behaves in expected ways:

>>> 1 + 2
3

>>> 'hello' + ' world'
'hello world'

Again, we see polymorphism at play: Did you notice how + behaves differently depending on the type of the object? For integers it sums, for strings it concatenates. Again, doing a quick dir() on the object reveals the corresponding “dunder” interface into the data model:

>>> dir(1)
[...
'__add__',
...
'__radd__',
...]

Our Account object does not support addition yet, so when you try to add two instances of it there’s a TypeError:

>>> acc + acc2
TypeError: "unsupported operand type(s) for +: 'Account' and 'Account'"

Let’s implement __add__ to be able to merge two accounts. The expected behavior would be to merge all attributes together: the owner name, as well as starting amounts and transactions. To do this we can benefit from the iteration support we implemented earlier:

def __add__(self, other):
    owner = '{}&{}'.format(self.owner, other.owner)
    start_amount = self.amount + other.amount
    acc = Account(owner, start_amount)
    for t in list(self) + list(other):
        acc.add_transaction(t)
    return acc

Yes, it is a bit more involved than the other dunder implementations so far. It should show you though that you are in the driver’s seat. You can implement addition however you please. If you wanted to ignore historic transactions, that’s fine too. You could also implement it like this:

def __add__(self, other):
    owner = self.owner + other.owner
    start_amount = self.balance + other.balance
    return Account(owner, start_amount)

I think the former implementation would be more realistic though, in terms of what a consumer of this class would expect to happen.

Now we have a new merged account with starting amount $110 (10 + 100) and balance of $240 (80 + 160):

>>> acc3 = acc2 + acc
>>> acc3
Account('tim&bob', 110)

>>> acc3.amount
110
>>> acc3.balance
240
>>> acc3._transactions
[20, 40, 20, -10, 50, -20, 30]

Note this works in both directions because we’re adding objects of the same type. In general, if you would add your object to a builtin (int, str, …) the __add__ method of the builtin wouldn’t know anything about your object. In that case you need to implement the reverse add method (__radd__) as well. You can see an example for that here.
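
As a rough sketch of the reverse add method, consider the built-in sum(), which starts with 0 and therefore evaluates 0 + account first. The class below is a simplified stand-in for the article’s Account (no transactions), and treating 0 as “nothing to merge yet” is a design choice made for this illustration:

```python
class Account:
    # Simplified stand-in for the article's class (no transactions)
    def __init__(self, owner, amount=0):
        self.owner = owner
        self.amount = amount

    def __add__(self, other):
        if not isinstance(other, Account):
            return NotImplemented
        owner = '{}&{}'.format(self.owner, other.owner)
        return Account(owner, self.amount + other.amount)

    def __radd__(self, other):
        # Called for `other + self` when `other` can't handle the addition.
        # sum() starts with 0, so treat 0 as "nothing to merge yet".
        if other == 0:
            return self
        return NotImplemented


merged = sum([Account('bob', 10), Account('tim', 100)])
print(merged.owner, merged.amount)
```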

Callable Python Objects: `__call__`

You can make an object callable like a regular function by adding the __call__ dunder method. For our account class we could print a nice report of all the transactions that make up its balance:

class Account:
    # ... (see above)

    def __call__(self):
        print('Start amount: {}'.format(self.amount))
        print('Transactions: ')
        for transaction in self:
            print(transaction)
        print('\nBalance: {}'.format(self.balance))

Now when I call the object with the double-parentheses acc() syntax, I get a nice account statement with an overview of all transactions and the current balance:

>>> acc = Account('bob', 10)
>>> acc.add_transaction(20)
>>> acc.add_transaction(-10)
>>> acc.add_transaction(50)
>>> acc.add_transaction(-20)
>>> acc.add_transaction(30)

>>> acc()
Start amount: 10
Transactions:
20
-10
50
-20
30

Balance: 80

Please keep in mind that this is just a toy example. A “real” account class probably wouldn’t print to the console when you use the function call syntax on one of its instances. In general, the downside of having a __call__ method on your objects is that it can be hard to see what the purpose of calling the object is.

Most of the time it’s therefore better to add an explicit method to the class. In this case it probably would’ve been more transparent to have a separate Account.print_statement() method.

Context Manager Support and the `with` Statement: `__enter__`, `__exit__`

My final example in this tutorial is about a slightly more advanced concept in Python: Context managers and adding support for the with statement.

Now, what is a “context manager” in Python? Here’s a quick overview:

A context manager is a simple “protocol” (or interface) that your object needs to follow so it can be used with the with statement. Basically all you need to do is add __enter__ and __exit__ methods to an object if you want it to function as a context manager.
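
Before applying this to the Account class, here is a bare-bones sketch of the protocol. The Tracker class is hypothetical and simply records the order in which Python calls the two methods:

```python
class Tracker:
    """Hypothetical class that records the order of with-statement calls."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append('enter')
        return self  # bound to the name after `as`

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Runs when the block ends, whether or not an exception occurred
        self.events.append('exit')
        return False  # a falsy return value lets exceptions propagate


tracker = Tracker()
with tracker as t:
    t.events.append('body')

print(tracker.events)
```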

Let’s use context manager support to add a rollback mechanism to our Account class. If the balance goes negative upon adding another transaction we rollback to the previous state.

We can leverage the Pythonic with statement by adding two more dunder methods. I’m also adding some print calls to make the example clearer when we demo it:

class Account:
    # ... (see above)

    def __enter__(self):
        print('ENTER WITH: Making backup of transactions for rollback')
        self._copy_transactions = list(self._transactions)
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        print('EXIT WITH:', end=' ')
        if exc_type:
            self._transactions = self._copy_transactions
            print('Rolling back to previous transactions')
            print('Transaction resulted in {} ({})'.format(
                exc_type.__name__, exc_val))
        else:
            print('Transaction OK')

As an exception has to be raised to trigger a rollback, I define a quick helper function to validate the transactions in an account:

def validate_transaction(acc, amount_to_add):
    with acc as a:
        print('Adding {} to account'.format(amount_to_add))
        a.add_transaction(amount_to_add)
        print('New balance would be: {}'.format(a.balance))
        if a.balance < 0:
            raise ValueError('sorry cannot go in debt!')

Now I can use an Account object with the with statement. When I make a transaction to add a positive amount, all is good:

acc4 = Account('sue', 10)

print('\nBalance start: {}'.format(acc4.balance))
validate_transaction(acc4, 20)

print('\nBalance end: {}'.format(acc4.balance))

Executing the above Python snippet produces the following printout:

Balance start: 10
ENTER WITH: Making backup of transactions for rollback
Adding 20 to account
New balance would be: 30
EXIT WITH: Transaction OK

Balance end: 30

However when I try to withdraw too much money, the code in __exit__ kicks in and rolls back the transaction:

acc4 = Account('sue', 10)

print('\nBalance start: {}'.format(acc4.balance))
try:
    validate_transaction(acc4, -50)
except ValueError as exc:
    print(exc)

print('\nBalance end: {}'.format(acc4.balance))

In this case we get a different result:

Balance start: 10
ENTER WITH: Making backup of transactions for rollback
Adding -50 to account
New balance would be: -40
EXIT WITH: Rolling back to previous transactions
Transaction resulted in ValueError (sorry cannot go in debt!)
sorry cannot go in debt!

Balance end: 10

Conclusion

I hope you feel a little less afraid of dunder methods after reading this article. A strategic use of them makes your classes more Pythonic, because they emulate builtin types with Python-like behaviors.

As with any feature, please don’t overuse it. Operator overloading, for example, can get pretty obscure. Adding “karma” to a person object with +bob or tim << 3 is definitely possible using dunders—but might not be the most obvious or appropriate way to use these special methods. However, for common operations like comparison and additions they can be an elegant approach.

Showing each and every dunder method would make for a very long tutorial. If you want to learn more about dunder methods and the Python data model I recommend you go through the Python reference documentation.

Also, be sure to check out our dunder method coding challenge where you can experiment and put your newfound “dunder skills” to practice.

Understanding Asynchronous Programming in Python

Tue, 13 Jun 2017 00:00:00 GMT

How to use Python to write asynchronous programs, and why you’d want to do such a thing.

A synchronous program is what most of us started out writing, and can be thought of as performing one execution step at a time, one after another.

Even with conditional branching, loops and function calls, we can still think about the code in terms of taking one execution step at a time, and when complete, moving on to the next.

Here are a couple of example programs that would work this way:

Batch processing programs are often created as synchronous programs: get some input, process it, create some output. One step logically follows another till we create the desired output. There’s really nothing else the program has to pay attention to besides those steps, and in that order.
Command-line programs are often small, quick processes to “transform” something into something else. This can be expressed as a series of program steps executed serially and done.

An asynchronous program behaves differently. It still takes one execution step at a time. However the difference is the system may not wait for an execution step to be complete before moving on.

This means we are continuing onward through execution steps of the program, even though a previous execution step (or multiple steps) is running “elsewhere”. This also implies that when one of those execution steps running “elsewhere” completes, our program code somehow has to handle it.

Why would we want to write a program in this manner? The simple answer is it helps us handle particular kinds of programming problems.

Here’s a conceptual program that might be a candidate for asynchronous programming:

Let’s Take a Look at a Simplistic Web Server

Its basic unit of work is the same as we described above for batch processing; get some input, process it, create the output. Written as a synchronous program this would create a working web server.

It would also be an absolutely terrible web server.

Why? In the case of a web server one unit of work (input, process, output) is not its only purpose. Its real purpose is to handle hundreds, perhaps thousands, of units of work at the same time, and for long periods of time.

Can we make our synchronous web server better? Sure, we can optimize our execution steps to make them as fast as possible. Unfortunately there are very real limits to this approach, and they lead to a web server that can’t respond fast enough and can’t handle enough concurrent users.

What are the real limits of optimizing the above approach? The speed of the network, file IO speed, database query speed, the speed of other connected services, etc. The common feature of this list is they are all IO functions. All of these items are many orders of magnitude slower than our CPU’s processing speed.

In a synchronous program if an execution step starts a database query (for example), the CPU is essentially idle for a long time before the query returns with some data and it can continue with the next execution step.
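
To make that idle time concrete, here is a small sketch. The fake_query() function is a hypothetical stand-in for a database call, and time.sleep() simulates waiting on IO. Because each call blocks until it returns, the total time is roughly the sum of all the waits:

```python
import time


def fake_query(delay):
    # Stand-in for a database call; time.sleep() simulates waiting on IO
    time.sleep(delay)
    return 'rows'


start = time.perf_counter()
# Each call blocks, so the three waits happen one after another
results = [fake_query(0.1) for _ in range(3)]
elapsed = time.perf_counter() - start

print(f'{len(results)} queries took {elapsed:.2f} seconds')
```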

For batch-oriented programs this isn’t a priority: processing the results of that IO is the goal, and often takes far longer than the IO itself. Any optimization efforts would be focused on the processing work, not the IO.

File, network and database IO are all pretty fast, but still way slower than the CPU. Asynchronous programming techniques allow our programs to take advantage of the relatively slow IO processes, and free the CPU to do other work.

When I started trying to understand asynchronous programming, people I asked and documentation I read talked a lot about the importance of writing non-blocking code. Yeah, this never helped me either.

What’s non-blocking code? What’s blocking code? That information was like having a reference manual without any practical context about how to use that technical detail in a meaningful way.

The Real World is Asynchronous

Writing asynchronous programs is different, and kind of hard to get your head around. And that’s interesting because the world we live in, and how we interact with it, is almost entirely asynchronous.

Here’s an example a lot of you can relate to: being a parent trying to do several things at once; balance the checkbook, do some laundry and keep an eye on the kids.

We do this without even thinking about it, but let’s break it down somewhat:

Balancing the checkbook is a task we’re trying to get done, and we could think of it as a synchronous task; one step follows another till it’s done.
However, we can break away from it to do laundry: unloading the dryer, moving clothes from the washer to the dryer and starting another load in the washer. These tasks can be done asynchronously.
While we’re actually working with the washer and dryer that’s a synchronous task and we’re working, but the bulk of the task happens after we start the washer and dryer and walk away to get back to work on the checkbook task. Now the task is asynchronous, the washer and dryer will run independently till the buzzer goes off, notifying us that one or the other needs attention.
Watching the kids is another asynchronous task. Once they are set up and playing, they do so independently (sort of) until they need attention; someone’s hungry, someone gets hurt, someone yells in alarm, and as parents we react to it. The kids are a long running task with high priority, superseding any other task we might be doing, like the checkbook or laundry.

This example illustrates both blocking and non-blocking code. While we’re moving laundry around, for example, the CPU (the parent) is busy and blocked from doing other work.

But it’s okay because the CPU is busy and the task is relatively quick. When we start the washer and dryer and go back to do something else, now the laundry task has become asynchronous because the CPU is doing something else, has changed context if you will, and will be notified when the laundry task is complete by the machine buzzers.

As people this is how we work, we’re naturally always juggling multiple things at once, often without thinking about it. As programmers the trick is how to translate this kind of behavior into code that does kind of the same thing.

Let’s try to “program” this using code ideas you might be familiar with:

Thought Experiment #1: The “Batching” Parent

Think about trying to do these tasks in a completely synchronous manner. If we’re a good parent in this scenario we just watch the kids, waiting for something to happen needing our attention. Nothing else, like the checkbook or laundry, would get done in this scenario.

We could re-prioritize the tasks any way we want, but only one of them would happen at a time in a synchronous, one after another, manner. This would be like the synchronous web server described above, it would work, but it would be a terrible way to live.

Nothing except watching the kids would get done till they were asleep, all other tasks would happen after that, well into the night. A couple of weeks of this and most parents would jump out the window.

Thought Experiment #2: The “Polling” Parent

Let’s change things up so multiple things could get done by using polling. In this approach the parent periodically breaks away from any current task and checks to see if any of the other tasks need attention.

Since we’re programming a parent, let’s make our polling interval something like fifteen minutes. So here every fifteen minutes the parent goes to check if the washer, dryer or kids need any attention, and then goes back to work on the checkbook. If any of those things do need attention, the work gets done and the parent goes back to the checkbook task and continues on with the polling loop.

This works and tasks are getting done, but it has a couple of problems. The CPU (parent) is spending a lot of time checking on things that don’t need attention because they aren’t done, like the washer and dryer. Given the polling interval, it’s entirely possible for tasks to be finished, but they wouldn’t get attention for some time, up to fifteen minutes. And the high priority watching the kids task probably couldn’t tolerate a possible window of fifteen minutes with no attention when something might be going drastically wrong.
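
The polling loop can be sketched in a few lines. This is a simulation with a made-up clock rather than real waiting, and the finish times for the washer and dryer are hypothetical values chosen for illustration:

```python
POLL_INTERVAL = 15  # minutes, as in the thought experiment

# Hypothetical finish times (in minutes) for the background tasks
finish_times = {'washer': 20, 'dryer': 40}

handled_at = {}
checks = 0
clock = 0
while len(handled_at) < len(finish_times):
    clock += POLL_INTERVAL          # work on the checkbook for a while
    for task, done_at in finish_times.items():
        checks += 1                 # every check costs the parent time
        if task not in handled_at and done_at <= clock:
            handled_at[task] = clock

# How long each finished task waited before it got attention
delays = {t: handled_at[t] - finish_times[t] for t in finish_times}
print(delays, checks)
```

Both problems show up here: most checks find nothing to do, and each finished task waits for the next polling round before it is noticed.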

We could address this by shortening our polling interval, but now the CPU is spending even more time context switching between tasks, and we start to hit a point of diminishing returns. And again, a couple of weeks of living like this and, well, see my previous comment about window and jumping.

Thought Experiment #3: The “Threading” Parent

As parents it’s often heard, “if I could only clone myself”. Since we’re pretending we can program parents, we can essentially do this by using threading.

If we think of all the tasks as one “program”, we can break up the tasks and run them as threads, cloning the parent so to speak. Now there is a parent instance for each task; watching the kids, monitoring the dryer, monitoring the washer and doing the checkbook, all running independently. This sounds like a pretty nice solution to the program problem.

But is it? Since we have to tell the parent instances (CPUs) explicitly what to do in a program, we can run into some problems because all instances share everything in the program space.

For example, the parent monitoring the dryer sees the clothes are dry, takes control of the dryer and starts unloading. Let’s say that while the dryer parent is unloading clothes, the washer parent sees the washer is done, takes control of the washer, and then wants to take control of the dryer to move clothes from the washer to the dryer. When the dryer parent is finished unloading clothes that parent wants to take control of the washer and move clothes from the washer to the dryer.

Now those two parents are deadlocked.

Both have control of their own resource, and want control of the other resource. They will wait forever for the other to release control. As programmers we’d have to write code to work this situation out.
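
One common way to work this out is to make every thread acquire the shared resources in the same order. The sketch below assumes two threading.Lock objects standing in for the washer and dryer. It demonstrates the fix rather than the deadlock itself, and the function and variable names are illustrative:

```python
import threading

washer = threading.Lock()
dryer = threading.Lock()
log = []


def move_clothes(parent):
    # Every parent takes the washer first, then the dryer. Acquiring
    # locks in one agreed order means no two threads can each hold one
    # lock while waiting for the other.
    with washer:
        with dryer:
            log.append(parent)


parents = [threading.Thread(target=move_clothes, args=(name,))
           for name in ('washer parent', 'dryer parent')]
for p in parents:
    p.start()
for p in parents:
    p.join()

print(sorted(log))
```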

Here’s another issue that might arise from parent threading. Suppose that unfortunately a child gets hurt and that parent has to take the child to emergent care. That happens right away because that parent clone is dedicated to watching the kids. But at emergent care the parent has to write a fairly large check to cover the deductible.

Meanwhile, the parent working on the checkbook is unaware of this large check being written, and suddenly the family account is overdrawn. Because the parent clones work within the same program, and the family money (checkbook) is a shared resource in that world, we’d have to work out a way for the kid-watching parent to inform the checkbook parent of what’s going on. Or we’d have to provide some kind of locking mechanism so the resource can be used by only one parent at a time, with updates.
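As a rough sketch of that locking idea, assume the checkbook is a small class guarding its balance with a threading.Lock. The Checkbook class, its write_check method, and the dollar amounts are all invented for this illustration.

```python
import threading


class Checkbook:
    """Hypothetical shared family account, for illustration only."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def write_check(self, amount):
        # Only one parent at a time may look at and update the balance,
        # so the check and the subtraction can't be interleaved.
        with self._lock:
            if amount > self.balance:
                return False
            self.balance -= amount
            return True


checkbook = Checkbook(1000)
results = []


def pay(amount):
    results.append(checkbook.write_check(amount))


# Two parent "clones" try to write large checks at the same time
threads = [threading.Thread(target=pay, args=(amount,)) for amount in (800, 500)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results, checkbook.balance)
```

Whichever parent gets the lock first has their check accepted. The other check is refused, so the account is never overdrawn.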

All of these things are manageable in program threading code, but it’s difficult to get right, and hard to debug when it’s wrong.

Let’s Write Some Python Code

Now we’re going to take some of the approaches outlined in these “thought experiments” and we’ll turn them into functioning Python programs.

You can download all of the example code from this GitHub repository.

All the examples in this article have been tested with Python 3.6.1, and the requirements.txt file included with the code examples indicates what modules you’ll need to run all the examples.

I would strongly suggest setting up a Python virtual environment to run the code so as not to interfere with your system Python.

Example 1: Synchronous Programming

This first example shows a somewhat contrived way of having a task pull “work” off a queue and do that work. In this case the work is just getting a number, and the task counts up to that number. It also prints it’s running at every count step, and prints the total at the end. The contrived part is this program provides a naive basis for multiple tasks to process the work on the queue.

"""
example_1.py

Just a short example showing synchronous running of 'tasks'
"""

import queue

def task(name, work_queue):
    if work_queue.empty():
        print(f'Task {name} nothing to do')
    else:
        while not work_queue.empty():
            count = work_queue.get()
            total = 0
            for x in range(count):
                print(f'Task {name} running')
                total += 1
            print(f'Task {name} total: {total}')


def main():
    """
    This is the main entry point for the program
    """
    # create the queue of 'work'
    work_queue = queue.Queue()

    # put some 'work' in the queue
    for work in [15, 10, 5, 2]:
        work_queue.put(work)

    # create some tasks
    tasks = [
        (task, 'One', work_queue),
        (task, 'Two', work_queue)
    ]

    # run the tasks
    for t, n, q in tasks:
        t(n, q)

if __name__ == '__main__':
    main()

The “task” in this program is just a function that accepts a string and a queue. When executed it looks to see if there is anything in the queue to process, and if so it pulls values off the queue, starts a for loop to count up to that value, and prints the total at the end. It continues this till there is nothing left in the queue, and exits.

When we run this task we get a listing showing that task one does all the work. The loop within it consumes all the work on the queue, and performs it. When that loop exits, task two gets a chance to run, but finds the queue empty, so it prints a statement to that effect and exits. There is nothing in the code that allows task one and task two to play nice together and switch between them.

Example 2: Simple Cooperative Concurrency

The next version of the program (example_2.py) adds the ability of the two tasks to play nice together through the use of generators. The addition of the yield statement in the task function means the loop pauses at that point, but maintains its context so it can be restarted later. The “run the tasks” loop later in the program takes advantage of this when it calls next(t). This statement restarts the task at the point where it previously yielded.

This is a form of cooperative concurrency. The program is yielding control of its current context so something else can run. In this case it allows our primitive “run the tasks” scheduler to run two instances of the task function, each one consuming work from the same queue. This is sort of clever, but a lot of work to get the same results as the first program.

"""
example_2.py

Just a short example demonstrating a simple state machine in Python
"""

import queue

def task(name, queue):
    while not queue.empty():
        count = queue.get()
        total = 0
        for x in range(count):
            print(f'Task {name} running')
            total += 1
            yield
        print(f'Task {name} total: {total}')

def main():
    """
    This is the main entry point for the program
    """
    # create the queue of 'work'
    work_queue = queue.Queue()

    # put some 'work' in the queue
    for work in [15, 10, 5, 2]:
        work_queue.put(work)

    # create some tasks
    tasks = [
        task('One', work_queue),
        task('Two', work_queue)
    ]

    # run the tasks
    done = False
    while not done:
        for t in list(tasks):  # copy, because tasks is modified in the loop
            try:
                next(t)
            except StopIteration:
                tasks.remove(t)
            if len(tasks) == 0:
                done = True


if __name__ == '__main__':
    main()

When this program is run the output shows that both task one and two are running, consuming work from the queue and processing it. This is what’s intended: both tasks are processing work, and each ends up processing two items from the queue. But again, it’s quite a bit of work to achieve the results.

The trick here is using the yield statement, which turns the task function into a generator, to perform a “context switch”. The program uses this context switch in order to run two instances of the task.
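Stripped of the queue, the pause-and-resume behavior looks like this. It's a minimal sketch; the counter function is made up for illustration.

```python
def counter(name, limit):
    for step in range(1, limit + 1):
        # Execution pauses here. The local variables (name, step)
        # are kept until next() is called on this generator again.
        yield f'{name} step {step}'


one = counter('One', 2)
two = counter('Two', 2)

# Alternate between the two generators by hand
log = [next(one), next(two), next(one), next(two)]
print(log)
# ['One step 1', 'Two step 1', 'One step 2', 'Two step 2']
```

Calling next(one) one more time raises StopIteration, which is how the “run the tasks” loop knows a task is finished.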

Example 3: Cooperative Concurrency With Blocking Calls

The next version of the program (example_3.py) is exactly the same as the last, except for the addition of a time.sleep(1) call in the body of our task loop. This adds a one second delay to every iteration of the task loop. The delay was added to simulate the effect of a slow IO process occurring in our task.

I’ve also included a simple Elapsed Time class to handle the start time/elapsed time features used in the reporting.
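That class isn't listed in this article. Judging only from how it's used in the examples (create it with ET(), then call the instance to get the elapsed seconds), a minimal stand-in might look like the sketch below. This is an assumption about its behavior, not the actual contents of lib/elapsed_time.py.

```python
import time


class ET:
    """Hypothetical stand-in for lib.elapsed_time.ET."""

    def __init__(self):
        # Remember when the timer was created
        self.start_time = time.time()

    def __call__(self):
        # Calling the instance returns the seconds elapsed since creation
        return time.time() - self.start_time
```

With this in place, et = ET() starts the clock and et() can be formatted as in the examples, e.g. f'{et():.1f}'.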

"""
example_3.py

Just a short example demonstrating a simple state machine in Python
However, this one has delays that affect it
"""

import time
import queue
from lib.elapsed_time import ET


def task(name, queue):
    while not queue.empty():
        count = queue.get()
        total = 0
        et = ET()
        for x in range(count):
            print(f'Task {name} running')
            time.sleep(1)
            total += 1
            yield
        print(f'Task {name} total: {total}')
        print(f'Task {name} total elapsed time: {et():.1f}')


def main():
    """
    This is the main entry point for the program
    """
    # create the queue of 'work'
    work_queue = queue.Queue()

    # put some 'work' in the queue
    for work in [15, 10, 5, 2]:
        work_queue.put(work)


    tasks = [
        task('One', work_queue),
        task('Two', work_queue)
    ]
    # run the scheduler to run the tasks
    et = ET()
    done = False
    while not done:
        for t in list(tasks):  # copy, because tasks is modified in the loop
            try:
                next(t)
            except StopIteration:
                tasks.remove(t)
            if len(tasks) == 0:
                done = True

    print()
    print('Total elapsed time: {}'.format(et()))


if __name__ == '__main__':
    main()

When this program is run the output shows that both task one and two are running, consuming work from the queue and processing it as before. With the addition of the mock IO delay, we’re seeing that our cooperative concurrency hasn’t gotten us anything: the delay stops the processing of the entire program, and the CPU just waits for the IO delay to be over.

This is exactly what’s meant by “blocking code” in asynchronous documentation. Notice the time it takes to run the entire program: this is the cumulative time of all the delays. This again shows running things this way is not a win.

Example 4: Cooperative Concurrency With Non-Blocking Calls (gevent)

The next version of the program (example_4.py) has been modified quite a bit. It makes use of the gevent asynchronous programming module right at the top of the program. The module is imported, along with a module called monkey.

Then a method of the monkey module is called, patch_all(). What in the world is that doing? The simple explanation is it sets the program up so any other module imported having blocking (synchronous) code in it is “patched” to make it asynchronous.

Like most simple explanations, this isn’t very helpful. What it means in relation to our example program is the time.sleep(1) (our mock IO delay) no longer “blocks” the program. Instead it yields control cooperatively back to the system. Notice the “yield” statement from example_3.py is no longer present, it’s now part of the time.sleep(1) call.

So, if the time.sleep(1) function has been patched by gevent to yield control, where is the control going? One of the effects of using gevent is that it runs an event loop in the program. For our purposes this is like the “run the tasks” loop from example_3.py. When the time.sleep(1) delay ends, the event loop returns control to the next executable statement after the time.sleep(1) statement. The advantage of this behavior is the CPU is no longer blocked by the delay, but is free to execute other code.

Our “run the tasks” loop no longer exists. Instead, our tasks list contains two calls to gevent.spawn(...). These two calls start two gevent threads (called greenlets), which are lightweight microthreads that context switch cooperatively, rather than as a result of the system switching contexts like regular threads.

Notice the gevent.joinall(tasks) right after our tasks are spawned. This statement causes our program to wait till task one and task two are both finished. Without this our program would have continued on through the print statements, but with essentially nothing to do.

"""
example_4.py

Just a short example demonstrating a simple state machine in Python
However, this one has delays that affect it
"""

import gevent
from gevent import monkey
monkey.patch_all()

import time
import queue
from lib.elapsed_time import ET


def task(name, work_queue):
    while not work_queue.empty():
        count = work_queue.get()
        total = 0
        et = ET()
        for x in range(count):
            print(f'Task {name} running')
            time.sleep(1)
            total += 1
        print(f'Task {name} total: {total}')
        print(f'Task {name} total elapsed time: {et():.1f}')


def main():
    """
    This is the main entry point for the program
    """
    # create the queue of 'work'
    work_queue = queue.Queue()

    # put some 'work' in the queue
    for work in [15, 10, 5, 2]:
        work_queue.put(work)

    # run the tasks
    et = ET()
    tasks = [
        gevent.spawn(task, 'One', work_queue),
        gevent.spawn(task, 'Two', work_queue)
    ]
    gevent.joinall(tasks)
    print()
    print(f'Total elapsed time: {et():.1f}')


if __name__ == '__main__':
    main()

When this program runs, notice both task one and two start at the same time, then wait at the mock IO call. This is an indication the time.sleep(1) call is no longer blocking, and other work is being done.

At the end of the program notice the total elapsed time, it’s essentially half the time it took for example_3.py to run. Now we’re starting to see the advantages of an asynchronous program.

We’re able to run two or more things concurrently by running IO processes in a non-blocking manner. By using gevent greenlets and controlling the context switches, we’re able to multiplex between tasks without too much trouble.
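To see where “essentially half” comes from, here's a sketch of a toy event loop that runs on a simulated clock, so nothing actually sleeps. It is not how gevent is implemented. The task and run functions, and the convention that a task does yield 1 to “sleep” for one second, are assumptions made for this illustration.

```python
import heapq
from collections import deque


def task(name, work, log):
    while work:
        count = work.popleft()
        for _ in range(count):
            yield 1  # "sleep" one second by handing control to the loop
        log.append((name, count))


def run(tasks):
    """Run generator tasks on a simulated clock and return the finish time."""
    now = 0
    # Heap of (wake-up time, tie-breaker, task); the tie-breaker keeps
    # the heap from ever having to compare two generators.
    sleeping = [(0, i, t) for i, t in enumerate(tasks)]
    heapq.heapify(sleeping)
    while sleeping:
        now, i, t = heapq.heappop(sleeping)
        try:
            delay = next(t)  # run the task until its next "sleep"
        except StopIteration:
            continue  # task is finished, don't reschedule it
        heapq.heappush(sleeping, (now + delay, i, t))
    return now


work = [15, 10, 5, 2]
log = []

# One task doing everything: the delays simply add up
sequential = run([task('One', deque(work), log)])

# Two tasks sharing the queue: the delays overlap
shared = deque(work)
concurrent = run([task('One', shared, log), task('Two', shared, log)])

print(sequential, concurrent)
# 32 17
```

With one task the delays add up to 15 + 10 + 5 + 2 = 32 simulated seconds. With two tasks, one ends up with 15 + 2 and the other with 10 + 5, so the whole run takes 17, which is roughly half.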

Example 5: Synchronous (Blocking) HTTP Downloads

The next version of the program (example_5.py) is kind of a step forward and step back. The program now is doing some actual work with real IO, making HTTP requests to a list of URLs and getting the page contents, but it’s doing so in a blocking (synchronous) manner.

We’ve modified the program to import the wonderful requests module to make the actual HTTP requests, and added a list of URLs to the queue rather than numbers. Inside the task, rather than increment a counter, we’re using the requests module to get the contents of a URL gotten from the queue, and printing how long it took to do so.

"""
example_5.py

Just a short example demonstrating a simple state machine in Python
This version is doing actual work, downloading the contents of
URL's it gets from a queue
"""

import queue
import requests
from lib.elapsed_time import ET


def task(name, work_queue):
    while not work_queue.empty():
        url = work_queue.get()
        print(f'Task {name} getting URL: {url}')
        et = ET()
        requests.get(url)
        print(f'Task {name} got URL: {url}')
        print(f'Task {name} total elapsed time: {et():.1f}')
        yield


def main():
    """
    This is the main entry point for the program
    """
    # create the queue of 'work'
    work_queue = queue.Queue()

    # put some 'work' in the queue
    for url in [
        "http://google.com",
        "http://yahoo.com",
        "http://linkedin.com",
        "http://shutterfly.com",
        "http://mypublisher.com",
        "http://facebook.com"
    ]:
        work_queue.put(url)

    tasks = [
        task('One', work_queue),
        task('Two', work_queue)
    ]
    # run the scheduler to run the tasks
    et = ET()
    done = False
    while not done:
        for t in list(tasks):  # copy, because tasks is modified in the loop
            try:
                next(t)
            except StopIteration:
                tasks.remove(t)
            if len(tasks) == 0:
                done = True

    print()
    print(f'Total elapsed time: {et():.1f}')


if __name__ == '__main__':
    main()

As in an earlier version of the program, we’re using a yield to turn our task function into a generator, and perform a context switch in order to let the other task instance run.

Each task gets a URL from the work queue, gets the contents of the page pointed to by the URL and reports how long it took to get that content.

As before, the yield allows both our tasks to run, but because this program is running synchronously, each requests.get() call blocks the CPU till the page is retrieved. Notice the total time to run the entire program at the end, this will be meaningful for the next example.

Example 6: Asynchronous (Non-Blocking) HTTP Downloads With gevent

This version of the program (example_6.py) modifies the previous version to use the gevent module again. Remember the gevent monkey.patch_all() call modifies any following modules so synchronous code becomes asynchronous, this includes requests.

Now the tasks have been modified to remove the yield call because the requests.get(url) call is no longer blocking, but performs a context switch back to the gevent event loop. In the “run the task” section we use gevent to spawn two instances of the task function, then use joinall() to wait for them to complete.

"""
example_6.py

Just a short example demonstrating a simple state machine in Python
This version is doing actual work, downloading the contents of
URL's it gets from a queue. It's also using gevent to get the
URL's in an asynchronous manner.
"""

import gevent
from gevent import monkey
monkey.patch_all()

import queue
import requests
from lib.elapsed_time import ET


def task(name, work_queue):
    while not work_queue.empty():
        url = work_queue.get()
        print(f'Task {name} getting URL: {url}')
        et = ET()
        requests.get(url)
        print(f'Task {name} got URL: {url}')
        print(f'Task {name} total elapsed time: {et():.1f}')

def main():
    """
    This is the main entry point for the program
    """
    # create the queue of 'work'
    work_queue = queue.Queue()

    # put some 'work' in the queue
    for url in [
        "http://google.com",
        "http://yahoo.com",
        "http://linkedin.com",
        "http://shutterfly.com",
        "http://mypublisher.com",
        "http://facebook.com"
    ]:
        work_queue.put(url)

    # run the tasks
    et = ET()
    tasks = [
        gevent.spawn(task, 'One', work_queue),
        gevent.spawn(task, 'Two', work_queue)
    ]
    gevent.joinall(tasks)
    print()
    print(f'Total elapsed time: {et():.1f}')

if __name__ == '__main__':
    main()

At the end of this program run, take a look at the total time and the individual times to get the contents of the URLs. You’ll see the total time is less than the cumulative time of all the requests.get() calls.

This is because those calls are running asynchronously, so we’re effectively taking better advantage of the CPU by allowing it to make multiple requests at once.

Example 7: Asynchronous (Non-Blocking) HTTP Downloads With Twisted

This version of the program (example_7.py) uses the Twisted module to do essentially the same thing as the gevent module, download the URL contents in a non-blocking manner.

Twisted is a very powerful system, and takes a fundamentally different approach to creating asynchronous programs. Where gevent modifies modules to make their synchronous code asynchronous, Twisted provides its own functions and methods to reach the same ends.

Where example_6.py used the patched requests.get(url) call to get the contents of the URLs, here we use the Twisted function getPage(url).

In this version the @defer.inlineCallbacks function decorator works together with the yield getPage(url) to perform a context switch into the Twisted event loop.

In gevent the event loop was implied, but in Twisted it’s explicitly provided by the reactor.run() statement near the bottom of the program.

"""
example_7.py

Just a short example demonstrating a simple state machine in Python
This version is doing actual work, downloading the contents of
URL's it gets from a work_queue. This version uses the Twisted
framework to provide the concurrency
"""

from twisted.internet import defer
from twisted.web.client import getPage
from twisted.internet import reactor, task

import queue
from lib.elapsed_time import ET


@defer.inlineCallbacks
def my_task(name, work_queue):
    try:
        while not work_queue.empty():
            url = work_queue.get()
            print(f'Task {name} getting URL: {url}')
            et = ET()
            yield getPage(url)
            print(f'Task {name} got URL: {url}')
            print(f'Task {name} total elapsed time: {et():.1f}')
    except Exception as e:
        print(str(e))


def main():
    """
    This is the main entry point for the program
    """
    # create the work_queue of 'work'
    work_queue = queue.Queue()

    # put some 'work' in the work_queue
    for url in [
        b"http://google.com",
        b"http://yahoo.com",
        b"http://linkedin.com",
        b"http://shutterfly.com",
        b"http://mypublisher.com",
        b"http://facebook.com"
    ]:
        work_queue.put(url)

    # run the tasks
    et = ET()
    defer.DeferredList([
        task.deferLater(reactor, 0, my_task, 'One', work_queue),
        task.deferLater(reactor, 0, my_task, 'Two', work_queue)
    ]).addCallback(lambda _: reactor.stop())

    # run the event loop
    reactor.run()

    print()
    print(f'Total elapsed time: {et():.1f}')


if __name__ == '__main__':
    main()

Notice the end result is the same as the gevent version: the total program run time is less than the cumulative time for each URL to be retrieved.

Example 8: Asynchronous (Non-Blocking) HTTP Downloads With Twisted Callbacks

This version of the program (example_8.py) also uses the Twisted library, but shows a more traditional approach to using Twisted.

By this I mean rather than using the @defer.inlineCallbacks / yield style of coding, this version uses explicit callbacks. A “callback” is a function that is passed to the system and can be called later in reaction to an event. In the example below the success_callback() function is provided to Twisted to be called when the getPage(url) call completes.

Notice in the program the @defer.inlineCallbacks decorator is no longer present on the my_task() function. In addition, the function is yielding a variable called d, shorthand for something called a deferred, which is what is returned by the getPage(url) function call.

A deferred is Twisted’s way of handling asynchronous programming, and is what the callback is attached to. When this deferred “fires” (when the getPage(url) completes), the callback function will be called with the variables defined at the time the callback was attached.
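To make the “called with the variables defined at the time the callback was attached” part concrete, here's a toy stand-in written in plain Python. ToyDeferred is invented for this illustration and is not Twisted's implementation. It only mimics the idea that addCallback() stores a function plus extra arguments, and that firing the deferred passes the result in as the first argument.

```python
class ToyDeferred:
    """Illustrative stand-in, not Twisted's Deferred."""

    def __init__(self):
        self._callbacks = []

    def addCallback(self, func, *args):
        # Remember the function together with the extra arguments
        # supplied at the time the callback is attached.
        self._callbacks.append((func, args))
        return self

    def callback(self, result):
        # "Fire" the deferred: each callback receives the result first,
        # followed by the arguments stored when it was attached.
        for func, args in self._callbacks:
            result = func(result, *args)
        return result


def success_callback(results, name, url):
    return f'Task {name} got URL: {url} ({len(results)} bytes)'


d = ToyDeferred()
d.addCallback(success_callback, 'One', 'http://example.com')

# Later, when the "page" arrives, the deferred fires
message = d.callback(b'<html></html>')
print(message)
# Task One got URL: http://example.com (13 bytes)
```

The name and url values were fixed when the callback was attached, while results only became available when the deferred fired.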

"""
example_8.py

Just a short example demonstrating a simple state machine in Python
This version is doing actual work, downloading the contents of
URL's it gets from a queue. This version uses the Twisted
framework to provide the concurrency
"""

from twisted.internet import defer
from twisted.web.client import getPage
from twisted.internet import reactor, task

import queue
from lib.elapsed_time import ET


def success_callback(results, name, url, et):
    print(f'Task {name} got URL: {url}')
    print(f'Task {name} total elapsed time: {et():.1f}')


def my_task(name, queue):
    if not queue.empty():
        while not queue.empty():
            url = queue.get()
            print(f'Task {name} getting URL: {url}')
            et = ET()
            d = getPage(url)
            d.addCallback(success_callback, name, url, et)
            yield d


def main():
    """
    This is the main entry point for the program
    """
    # create the queue of 'work'
    work_queue = queue.Queue()

    # put some 'work' in the queue
    for url in [
        b"http://google.com",
        b"http://yahoo.com",
        b"http://linkedin.com",
        b"http://shutterfly.com",
        b"http://mypublisher.com",
        b"http://facebook.com"
    ]:
        work_queue.put(url)

    # run the tasks
    et = ET()

    # create cooperator
    coop = task.Cooperator()

    defer.DeferredList([
        coop.coiterate(my_task('One', work_queue)),
        coop.coiterate(my_task('Two', work_queue)),
    ]).addCallback(lambda _: reactor.stop())

    # run the event loop
    reactor.run()

    print()
    print(f'Total elapsed time: {et():.1f}')


if __name__ == '__main__':
    main()

The end result of running this program is the same as the previous two examples: the total time of the program is less than the cumulative time of getting the URLs.

Whether you use gevent or Twisted is a matter of personal preference and coding style. Both are powerful libraries that provide mechanisms allowing the programmer to create asynchronous code.

Conclusion

I hope this has helped you see and understand where and how asynchronous programming can be useful. If you’re writing a program that’s calculating PI to the millionth decimal place, asynchronous code isn’t going to help at all.

However, if you’re trying to implement a server, or a program that does a significant amount of IO, it could make a huge difference. It’s a powerful technique that can take your programs to the next level.

A Story About Python Mastery

Thu, 08 Jun 2017 00:00:00 GMT

A Story About Python Mastery

A couple of years ago I’d become quite interested in martial arts. Hours upon hours of watching “The Karate Kid” growing up must’ve taken their toll on me…

And so, I found myself at this smelly little gym, joining my first couple of karate practice sessions.

(By the way, my “Mr. Miyagi” wasn’t the fatherly philosopher from the Karate Kid movies—our sensei was a complete geek, working a day job as a Borland Delphi programmer somewhere. I liked him.)

So anyway, here I was at this dingy gym, working hard to learn how to count in Japanese and getting my hand-eye coordination under control…

(You know, karate practice actually feels more like learning to dance than learning how to fight. At least when you’re a beginner.)

Moments later my friend kicks me in the face because I turned left when I should’ve turned right—

My interest in karate waned quickly after that.

Yeah…I’m a lover, not a fighter.

Why am I telling you this? Well, the question came up in a recent email exchange:

“How does one MASTER the skill of programming Python?”

I like to think mastering programming as a skill is quite similar to mastering a physical skill like karate. (Although I’ve had more success with the former.)

Here, let me explain.

With both, it takes a long time to build up the right foundation. But once “muscle memory” starts kicking in, your progress can skyrocket. It’s all about making it through that first rough patch of slow learning progress without losing your motivation.

Mastering a programming language means lifelong learning. The topic is fractal—there’s always a way to expand your knowledge in some obscure way. One can hit critical mass in terms of knowledge and be called an expert, but it’s unlikely a single person will “know it all.”

A seasoned programmer acts deliberately and with an economy of movement that a beginner can’t yet understand. Biological differences like age or “IQ” play less of a role. The more experienced dev still codes circles around the eager newcomer.

There are road maps but no “one true path” to mastery. Learning progress will depend highly on the motivation and drive of the individual, and the peers they surround themselves with. Mentorship and community play the biggest role in becoming successful.

Like martial “arts”, programming is more of an art than a science. It’s a creative endeavour rather than a strictly mechanical affair. Brute force and applying 10,000 “IF this THEN that” rules might get one a job, but doesn’t lead to the true joy of programming.

(I swear one day I will create a Bob Ross-like show called The Joy of Programming: “Let’s put some little curly braces over here…and here…and there.”)

Mastering a skill like programming seeps into all areas of your life. Just like building physical skills will increase confidence, so will mastering programming. It leads to a sense of accomplishment, a deep satisfaction, and confidence through recognition.

Alright, that’s my (philosophical) update for the week.

If you’d like to avoid getting kicked in the head learning Python, then check out some of the Python training products I offer here on dbader.org.

How to Speed Up Python Code Reviews With Linting

Thu, 08 Jun 2017 00:00:00 GMT

How to Speed Up Python Code Reviews With Linting

Ever introduced code reviews to an existing Python code base? It can be awesome, or pure hell…

One fateful Thursday morning I sat down with a fresh cup of coffee, ready to dig in and give some feedback on a fix we wanted to ship before the end of the sprint.

When I loaded up the first set of changes into my trusty Sublime Text, my eyes nearly fell out—this was a serious “can’t see the forest for the trees” type of situation:

The formatting for this Python code was…All. Over. The. Place.

There was no consistency whatsoever in how the code was indented, how braces were positioned… even the spacing between operators inside expressions was seemingly randomized:

#the  worst code ever .
value +=10*  othervalue

Ugh.

It just seemed so sloppy! And the inconsistent formatting made it really hard to see what the code did, what the intention behind it was.

It felt like my brain was 90% occupied with parsing out the code, instead of being able to focus on the bigger picture and to hunt for actual bugs.

I must’ve spent at least an hour cleaning up the formatting, before I was able to give any substantial feedback on these changes. It was the most tedious code review of my dev career.

My busywork was of little value to the company, too:

They paid me a software engineer’s salary for nudging around braces and juggling whitespace…

That same day I pulled the whole team together to discuss the mandatory use of a code style checker before code reviews.

And guess what? It worked out great.

Most developers on the team were using Sublime Text so we all installed the SublimeLinter package. It’s the most popular code linting framework for Sublime Text and I like it for its focus, simplicity, and performance.

A code linter is a program that analyses your source code for potential errors. Code linters are great at finding “mechanical” issues like syntax errors, structural problems, such as the use of undefined variables, and also best practice or code style violations.

SublimeLinter lets you integrate code linting feedback into your editing environment. Setting up SublimeLinter gives you immediate feedback on your code right when you type it.

When you install SublimeLinter it doesn’t actually include any linter engines. It’s more like a “meta linter” that lets you integrate various command-line linter binaries like Flake8 (Python) or JSHint (JavaScript) under one roof.

The linter binaries do the real work. And that way, SublimeLinter can support more than just one programming language. If you’re doing any kind of full-stack web development, for example, you could install code linters for JavaScript, CSS, Ruby, Go, and Python.

SublimeLinter will then pick the right code linter to run on each file you’re editing. Any errors or warnings found by these separate linters would all be integrated with the same look and feel into your Sublime Text editor window by SublimeLinter.

And because we were using command line tools through SublimeLinter we were able to set up the same set of code style checks on our CI build server very easily. That way no badly formatted code could slip through the cracks ever again.
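As a rough idea of what such a CI check can look like, here's a small sketch that runs a command-line linter and reports whether it passed. The style_check_passes function is hypothetical, and the default command assumes flake8 is installed and on the PATH of the build server. It relies on flake8 exiting with a non-zero status when it finds violations.

```python
import subprocess
import sys


def style_check_passes(paths, command=('flake8',)):
    """Return True if the linter command exits with status 0 for the paths."""
    completed = subprocess.run([*command, *paths])
    return completed.returncode == 0


# On a build server this could be used along the lines of:
#
#     if not style_check_passes(['src/']):
#         sys.exit(1)  # fail the build when the style check fails
```

The command parameter is there so the same helper can be pointed at a different linter binary.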

It made the whole team more productive. And it was great for morale: No more time wasted on nudging braces or juggling whitespace 🙂

Additional Resources & Links

Here are a couple of extra links to help you get set up with SublimeLinter. I listed the most common linter binaries and linter plugins so you can get started right away:

My Sublime Python course
SublimeLinter docs
All official linter plugins for SublimeLinter
JavaScript: JSHint, Flow, JSL, JSXHint, JSCS
Ruby: Ruby (built-in), Rubocop
Python: Flake8, Pylint, Pep8, Pyflakes
PHP: PHP, PHPLint
Go: GoLint, GoType
Lua: LuaCheck, Lua (built-in)
Haskell: Ghc
C++: CppCheck, CppLint
CSS: CssLint
HTML: HTMLTidy
Java: Java (built-in)
Plaintext: Proselint

Records, Structs, and Data Transfer Objects in Python

Tue, 06 Jun 2017 00:00:00 GMT

Records, Structs, and Data Transfer Objects in Python

How to implement records, structs, and “plain old data objects” in Python using only built-in data types and classes from the standard library.

Compared to arrays, record data structures provide a fixed number of fields, where each field can have a name and may have a different type.

I’m using the definition of a “record” loosely in this article. For example, I’m also going to discuss types like Python’s built-in tuple that may or may not be considered “records” in a strict sense because they don’t provide named fields.

Python provides several data types you can use to implement records, structs, and data transfer objects. In this article you’ll get a quick look at each implementation and its unique characteristics. At the end you’ll find a summary and a decision making guide that will help you make your own pick.

Alright, let’s get started:

✅ The `dict` Built-in

Python dictionaries store an arbitrary number of objects, each identified by a unique key. Dictionaries are often also called “maps” or “associative arrays” and allow the efficient lookup, insertion, and deletion of any object associated with a given key.

Using dictionaries as a record data type or data object in Python is possible. Dictionaries are easy to create in Python as they have their own syntactic sugar built into the language in the form of dictionary literals. The dictionary syntax is concise and quite convenient to type.

Data objects created using dictionaries are mutable and there’s little protection against misspelled field names, as fields can be added and removed freely at any time. Both of these properties can introduce surprising bugs and there’s always a trade-off to be made between convenience and error resilience.

car1 = {
    'color': 'red',
    'mileage': 3812.4,
    'automatic': True,
}
car2 = {
    'color': 'blue',
    'mileage': 40231.0,
    'automatic': False,
}

# Dicts have a nice repr:
>>> car2
{'color': 'blue', 'automatic': False, 'mileage': 40231.0}

# Get mileage:
>>> car2['mileage']
40231.0

# Dicts are mutable:
>>> car2['mileage'] = 12
>>> car2['windshield'] = 'broken'
>>> car2
{'windshield': 'broken', 'color': 'blue',
 'automatic': False, 'mileage': 12}

# No protection against wrong field names,
# or missing/extra fields:
car3 = {
    'colr': 'green',
    'automatic': False,
    'windshield': 'broken',
}

✅ The `tuple` Built-in

Python’s tuples are a simple data structure for grouping arbitrary objects. Tuples are immutable—they cannot be modified once they’ve been created.

Performancewise, tuples take up slightly less memory than lists in CPython and they’re faster to construct at instantiation time. As you can see in the bytecode disassembly below, constructing a tuple constant takes a single LOAD_CONST opcode while constructing a list object with the same contents requires several more operations:

>>> import dis
>>> dis.dis(compile("(23, 'a', 'b', 'c')", '', 'eval'))
  1       0 LOAD_CONST           4 ((23, 'a', 'b', 'c'))
          3 RETURN_VALUE

>>> dis.dis(compile("[23, 'a', 'b', 'c']", '', 'eval'))
  1       0 LOAD_CONST           0 (23)
          3 LOAD_CONST           1 ('a')
          6 LOAD_CONST           2 ('b')
          9 LOAD_CONST           3 ('c')
         12 BUILD_LIST           4
         15 RETURN_VALUE

However, you shouldn’t place too much emphasis on these differences. In practice the performance difference will often be negligible, and trying to squeeze extra performance out of a program by switching from lists to tuples will likely be the wrong approach.

A potential downside of plain tuples is that the data you store in them can only be pulled out by accessing it through integer indexes. You can’t give names to individual properties stored in a tuple. This can impact code readability.

Also, a tuple is always an ad-hoc structure. It’s difficult to ensure that two tuples have the same number of fields and the same properties stored on them.

This makes it easy to introduce “slip-of-the-mind” bugs by mixing up the field order, for example. Therefore I would recommend you keep the number of fields stored in a tuple as low as possible.

# Fields: color, mileage, automatic
car1 = ('red', 3812.4, True)
car2 = ('blue', 40231.0, False)

# Tuple instances have a nice repr:
>>> car1
('red', 3812.4, True)
>>> car2
('blue', 40231.0, False)

# Get mileage:
>>> car2[1]
40231.0

# Tuples are immutable:
>>> car2[1] = 12
TypeError: "'tuple' object does not support item assignment"

# No protection against missing/extra fields
# or a wrong order:
>>> car3 = (3431.5, 'green', True, 'silver')

✅ Writing a Custom Class

Classes allow you to define reusable “blueprints” for data objects to ensure each object provides the same set of fields.

Using regular Python classes as record data types is feasible, but it also takes manual work to get the convenience features of other implementations. For example, adding new fields to the __init__ constructor is verbose and takes time.

Also, the default string representation for objects instantiated from custom classes is not very helpful. To fix that you may have to add your own __repr__ method, which again is usually quite verbose and must be updated every time you add a new field.

Fields stored on classes are mutable and new fields can be added freely, which may or may not be what you intend. It’s possible to provide more access control and to create read-only fields using the @property decorator, but this requires writing more glue code.

Writing a custom class is a great option whenever you’d like to add business logic and behavior to your record objects using methods. But this means these objects are technically no longer plain data objects.

class Car:
    def __init__(self, color, mileage, automatic):
        self.color = color
        self.mileage = mileage
        self.automatic = automatic

car1 = Car('red', 3812.4, True)
car2 = Car('blue', 40231.0, False)

# Get the mileage:
>>> car2.mileage
40231.0

# Classes are mutable:
>>> car2.mileage = 12
>>> car2.windshield = 'broken'

# String representation is not very useful
# (must add a manually written __repr__ method):
>>> car1
<Car object at 0x1081e69e8>
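To make the points about __repr__ and @property more concrete, here is a minimal sketch of the same Car class with a hand-written __repr__ and a read-only mileage field. The "_mileage" attribute name is just a convention I picked for this example, and the f-strings assume Python 3.6 or newer:

```python
class Car:
    def __init__(self, color, mileage, automatic):
        self.color = color
        self.automatic = automatic
        # Stored on a "private" attribute and exposed read-only
        # through the property below (naming is an assumption).
        self._mileage = mileage

    @property
    def mileage(self):
        # No setter is defined, so assigning to car.mileage fails.
        return self._mileage

    def __repr__(self):
        # Must be updated by hand whenever a field is added.
        return (f'{self.__class__.__name__}('
                f'color={self.color!r}, mileage={self.mileage!r}, '
                f'automatic={self.automatic!r})')


car1 = Car('red', 3812.4, True)
print(car1)  # Car(color='red', mileage=3812.4, automatic=True)

try:
    car1.mileage = 12
except AttributeError as exc:
    print('mileage is read-only:', exc)
```

Note how much glue code this takes compared to the other options in this article. That is the trade-off you accept in exchange for full control.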

✅ The collections.namedtuple Class

The namedtuple class available in Python 2.6+ provides an extension of the built-in tuple data type. Similarly to defining a custom class, using namedtuple allows you to define reusable “blueprints” for your records that ensure the correct field names are used.

Namedtuples are immutable just like regular tuples. This means you cannot add new fields or modify existing fields after the namedtuple instance was created.

Besides that, namedtuples are, well…named tuples. Each object stored in them can be accessed through a unique identifier. This frees you from having to remember integer indexes, or resorting to workarounds like defining integer constants as mnemonics for your indexes.

Namedtuple objects are implemented as regular Python classes internally. When it comes to memory usage they are also “better” than regular classes and just as memory efficient as regular tuples:

>>> from collections import namedtuple
>>> from sys import getsizeof

>>> p1 = namedtuple('Point', 'x y z')(1, 2, 3)
>>> p2 = (1, 2, 3)

>>> getsizeof(p1)
72
>>> getsizeof(p2)
72

Namedtuples can be an easy way to clean up your code and to make it more readable by enforcing a better structure for your data.

I find that going from ad-hoc data types like dictionaries with a fixed format to namedtuples helps me express the intent of my code more clearly. Often when I apply this refactoring I magically come up with a better solution for the problem I’m facing.

Using namedtuples over unstructured tuples and dicts can also make my coworkers’ lives easier because namedtuples make the data passed around “self-documenting”, at least to a degree.

For more information and code examples, check out my tutorial on namedtuples here on dbader.org.

from collections import namedtuple

Car = namedtuple('Car', 'color mileage automatic')

car1 = Car('red', 3812.4, True)

# Instances have a nice repr:
>>> car1
Car(color='red', mileage=3812.4, automatic=True)

# Accessing fields
>>> car1.mileage
3812.4

# Fields are immutable:
>>> car1.mileage = 12
AttributeError: "can't set attribute"
>>> car1.windshield = 'broken'
AttributeError: "'Car' object has no attribute 'windshield'"
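Because the fields cannot be changed in place, the usual way to "modify" a namedtuple is to build an updated copy. Here is a short sketch using the _replace() method that every namedtuple provides. The variable names are only for illustration:

```python
from collections import namedtuple

Car = namedtuple('Car', 'color mileage automatic')
car1 = Car('red', 3812.4, True)

# _replace() returns a new instance and leaves the original untouched.
car2 = car1._replace(mileage=12)

print(car1)  # Car(color='red', mileage=3812.4, automatic=True)
print(car2)  # Car(color='red', mileage=12, automatic=True)

# A namedtuple is still a tuple, so index access keeps working.
print(car1[1] == car1.mileage)  # True
```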

✅ The typing.NamedTuple Class

This class, added in Python 3.6, is the younger sibling of collections.namedtuple. It is very similar to namedtuple; the main differences are an updated syntax for defining new record types and added support for type hints.

Please note that type annotations are not enforced without a separate type checking tool like mypy—but even without tool support they can provide useful hints to other programmers (or be terribly confusing if the type hints get out of date.)

from typing import NamedTuple

class Car(NamedTuple):
    color: str
    mileage: float
    automatic: bool

car1 = Car('red', 3812.4, True)

# Instances have a nice repr
>>> car1
Car(color='red', mileage=3812.4, automatic=True)

# Accessing fields
>>> car1.mileage
3812.4

# Fields are immutable
>>> car1.mileage = 12
AttributeError: "can't set attribute"
>>> car1.windshield = 'broken'
AttributeError: "'Car' object has no attribute 'windshield'"

# Type annotations are not enforced without
# a separate type checking tool like mypy:
>>> Car('red', 'NOT_A_FLOAT', 99)
Car(color='red', mileage='NOT_A_FLOAT', automatic=99)

⚠️ The struct.Struct Class

This class performs conversions between Python values and C structs serialized into Python bytes objects. It can be used to handle binary data stored in files or from network connections, for example.

Structs are defined using a format string-like mini language that allows you to define the arrangement of various C data types, like char, int, and long, as well as their unsigned variants.

Serialized structs are seldom used to represent data objects that are meant to be handled purely inside Python code. They’re intended primarily as a data exchange format, rather than a way of holding data in memory that’s only used by Python code.

In some cases packing primitive data into structs may use less memory than keeping it in other data types—but that would be a quite advanced (and probably unnecessary) optimization.

from struct import Struct

MyStruct = Struct('i?f')

data = MyStruct.pack(23, False, 42.0)

# All you get is a blob of data:
>>> data
b'\x17\x00\x00\x00\x00\x00\x00\x00\x00\x00(B'

# Data blobs can be unpacked again:
>>> MyStruct.unpack(data)
(23, False, 42.0)
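If you do use struct.Struct for data exchange, the size of the packed blob is worth a look. The sketch below compares the native layout used above with a "standard" little-endian layout (the '<' prefix), which uses fixed sizes and no padding. The variable names are mine:

```python
from struct import Struct

# Native mode (no prefix): the platform's C alignment rules apply,
# so padding bytes may be inserted between the fields.
native = Struct('i?f')

# Standard mode ('<' = little-endian): fixed sizes and no padding.
standard = Struct('<i?f')

print(native.size)    # platform-dependent
print(standard.size)  # 9 = 4 (int) + 1 (bool) + 4 (float)

data = standard.pack(23, False, 42.0)
print(standard.unpack(data))  # (23, False, 42.0)
```

When the blob has to be read on another machine, an explicit byte order prefix is usually the safer choice, because the native layout can differ between platforms.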

⚠️ The types.SimpleNamespace Class

Here’s one more “esoteric” choice for implementing data objects in Python. This class was added in Python 3.3 and it provides attribute access to its namespace. It also includes a meaningful __repr__ by default.

As its name proclaims, SimpleNamespace is simple—it’s basically a glorified dictionary that allows attribute access and prints nicely. Attributes can be added, modified, and deleted freely.

from types import SimpleNamespace
car1 = SimpleNamespace(color='red', mileage=3812.4, automatic=True)

# The default repr:
>>> car1
namespace(automatic=True, color='red', mileage=3812.4)

# Instances are mutable
>>> car1.mileage = 12
>>> car1.windshield = 'broken'
>>> del car1.automatic
>>> car1
namespace(color='red', mileage=12, windshield='broken')

Which type should I use for data objects in Python?

As you’ve seen there’s quite a number of different options to implement records or data objects in Python. Generally your decision will depend on your use case:

You only have a few (2-3) fields: Using a plain tuple object may be okay because the field order is easy to remember or field names are superfluous. For example, think of an (x, y, z) point in 3D space.
You need immutable fields: In this case plain tuples, collections.namedtuple, and typing.NamedTuple would all make good options for implementing this type of data object.
You need to lock down field names to avoid typos: collections.namedtuple and typing.NamedTuple are your friends.
You want to keep things simple: A plain dictionary object might be a good choice due to the convenient syntax that closely resembles JSON.
You need full control over your data structure: It’s time to write a custom class with @property setters and getters.
You need to add behavior (methods) to the object: You should write a custom class. Either from scratch or by extending collections.namedtuple or typing.NamedTuple.
You need to pack data tightly to serialize it to disk or send it over the network: Time to bust out struct.Struct, this is a great use case for it.
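As a sketch of the "add behavior" option above, here is a typing.NamedTuple subclass with one method. The description() helper is made up for this example, and defining methods in the class body assumes Python 3.6.1 or newer:

```python
from typing import NamedTuple


class Car(NamedTuple):
    color: str
    mileage: float
    automatic: bool

    def description(self):
        # Hypothetical helper method, only here to show that
        # behavior can live next to the immutable fields.
        gearbox = 'automatic' if self.automatic else 'manual'
        return f'{self.color} car, {self.mileage:.1f} miles, {gearbox}'


car1 = Car('red', 3812.4, True)
print(car1.description())  # red car, 3812.4 miles, automatic
```

The instance stays immutable and keeps its nice repr, so you get the method without giving up the namedtuple benefits.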

If you’re looking for a safe default choice, my general recommendation for implementing a plain record, struct, or data object in Python would be to:

use collections.namedtuple in Python 2.x; and
its younger sibling typing.NamedTuple in Python 3.

Read the full “Fundamental Data Structures in Python” article series here. This article is missing something or you found an error? Help a brother out and leave a comment below.


An Overview of Python’s “ipaddress” Module

Thu, 01 Jun 2017 00:00:00 GMT

An Overview of Python’s “ipaddress” Module

An introduction to the ipaddress module available on Python 3.3+ for manipulation of IPv4 and IPv6 addresses.

In this article we’ll take a look at the ipaddress module that is available on Python 3.3 and above. This tutorial is intended to serve as a handy reference for any network engineer wondering how to parse and work with IP addresses in Python.

In this overview article you’ll learn:

What the difference between IPv4 and IPv6 addresses is.
How to work with IPv4 addresses using Python’s ipaddress module.
How to work with IPv6 addresses using Python’s ipaddress module.

IPv4 vs IPv6 Addresses – A Primer

At a high level, IPv4 and IPv6 addresses are used for like purposes and functions. However, since there are major differences in the address structure for each protocol, this tutorial is split into separate sections, one each for IPv4 and IPv6.

In today’s Internet, the IPv4 protocol handles the majority of IP processing and will continue to do so for the near future. The enhancements in scale and functionality that come with IPv6 are necessary for the future of the Internet and adoption is progressing. The adoption rate, however, remains slow to this date.

An IPv4 address is composed of 32 bits, organized into four eight-bit groups referred to as “octets”. The word “octet” is used to identify an eight-bit structure in place of the more common term “byte”, but they carry the same definition. The four octets are referred to as octet1, octet2, octet3, and octet4. Addresses are written in “dotted decimal” format, where each eight-bit octet is shown as a decimal value from 0 to 255.

IPv4 address example: 192.168.100.10

IPv4 address example (CIDR notation): 192.168.100.10/24

The /24 is CIDR notation to indicate that the leading 24 of the 32 bits are used to identify the network portion of the address. Remembering that each octet is 8 bits long, this means that the first three octets (3 × 8 = 24) identify the network (192.168.100.x) and the remaining eight bits of the address identify the node (x.x.x.10).

In principle an IPv4 prefix length can be anything from /0 to /32. In practice most networks fall between /8 and /30, with /31 used for point-to-point links and /32 for a single host, but /24 is often used. For example, your home network, or your school or company network, is most likely identified with a /24 CIDR.

An older format for expressing the network identification is a network mask where the CIDR is expressed as a separate dotted decimal number. For example, a /24 CIDR equates to a network mask of 255.255.255.0.
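The relationship between a prefix length and a network mask is plain bit arithmetic. The following sketch derives the mask for a /24 and splits the example address into its network and host parts. The names prefix_len and mask_int are my own, and the ipaddress module is only used here to convert between integers and dotted decimal:

```python
import ipaddress

prefix_len = 24

# Set the leading 24 bits to 1 and the trailing 8 bits to 0.
mask_int = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
print(ipaddress.ip_address(mask_int))  # 255.255.255.0

address = int(ipaddress.ip_address('192.168.100.10'))

# AND with the mask keeps the network bits...
network_part = ipaddress.ip_address(address & mask_int)
# ...and AND with the inverted mask keeps the host bits.
host_part = address & ~mask_int & 0xFFFFFFFF

print(network_part)  # 192.168.100.0
print(host_part)     # 10
```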

An IPv6 address is 128 bits long, which is a significant increase over the 32 bits in an IPv4 address. There are many differences between IPv4 and IPv6, but the most notable difference is in the addressing structure. The additional length provides an exponential increase in the number of networks and hosts that can be supported.

IPv6 address example: 2001:db8:abcd:100::1/64

Where the IPv4 address uses a dotted decimal format, the IPv6 protocol uses hexadecimal notation. Each position in an IPv6 address represents four bits with a value from 0 to f, organized as follows:

The 128 bits are divided into 8 groupings of 16 bits each separated by colons. A group is referred to as a “quartet” or “hextet” each with four hexadecimal characters (4 hex characters times 4 bits = 16 bits). In the above example, the first quartet is “2001”.
Leading zeros in any quartet are suppressed/condensed. In the above example, the second quartet is “db8”, which is actually “0db8” with the leading zero suppressed. The last quartet is “1”, which is actually “0001” with three leading zeros suppressed.
If a quartet contains all zeros, it is suppressed to a single zero. For example: a quartet with “:0000:” would be compressed to “:0:”.
If an address contains a contiguous string of quartets that are all zeros, the contiguous string of zeros is condensed and represented with double colons. In the above example, the double colon represents three all zero quartets, or “:0000:0000:0000:” condensed to “::”. Since the example address has five quartets with values, the number of condensed quartets must be three (eight total minus five populated).
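You don’t have to apply these compression rules by hand. As a quick sketch, the address objects in the ipaddress module expose both spellings through their exploded and compressed attributes:

```python
import ipaddress

addr = ipaddress.ip_address('2001:db8:abcd:100::1')

# All eight quartets, with every leading zero written out:
print(addr.exploded)    # 2001:0db8:abcd:0100:0000:0000:0000:0001

# The shortest valid spelling:
print(addr.compressed)  # 2001:db8:abcd:100::1

# Both spellings describe the same address:
full = ipaddress.ip_address('2001:0db8:abcd:0100:0000:0000:0000:0001')
print(full == addr)     # True
```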

All IPv6 address structures use CIDR notation to determine how many of the leading bits are used for network identification with the balance used for host/interface identification. Given 128 bits, many options are available.

Python’s `ipaddress` Module and IPv4 Addresses

The ipaddress module is designed around CIDR notation, which is recommended because of its brevity and ease of use. The ipaddress module also includes methods to revert to a network mask if required.

The original definition of IPv4 addresses includes a “class” that is defined by address ranges in the first octet. The ipaddress module does not recognize IPv4 classes, so they are not covered in this tutorial.

The ipaddress module includes three specific IPv4 address object types:

a “host” or an individual address object that does not include CIDR notation,
an individual interface address object that includes CIDR notation, and
a network address object that refers to the range of IP addresses for the entire network.

The major difference between a “host” and an “interface” is that a host or ip_address object does not include CIDR notation, whereas an ip_interface object includes the CIDR notation:

The ip_address object is most useful when working with IP packets that do not need nor use CIDR notation.
The ip_interface object is most useful when working with node and interface identification for connection to an IP network which must include network/subnet identification.
The ip_network object includes all addresses within a network and is most useful for network identification.
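Here is a short side-by-side sketch of the three object types and how they relate to each other. The factory functions used here are covered one by one in the following sections, and the variable names are only for illustration:

```python
import ipaddress

host = ipaddress.ip_address('192.168.100.10')
iface = ipaddress.ip_interface('192.168.100.10/24')
net = ipaddress.ip_network('192.168.100.0/24')

print(type(host).__name__)   # IPv4Address
print(type(iface).__name__)  # IPv4Interface
print(type(net).__name__)    # IPv4Network

# An interface is an address plus the network it lives in:
print(iface.ip == host)      # True
print(iface.network == net)  # True

# Membership tests work between addresses and networks:
print(host in net)           # True
```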

Creating IPv4 Host Address Objects with ipaddress:

The ipaddress.ip_address() factory function is used to create an ip_address object. This automatically determines whether to create an IPv4 or IPv6 address based on the passed-in value (IPv6 addressing will be discussed at a later point in this tutorial). As noted above, this object represents an IP Address as found in a packet traversing a network where CIDR is not required.

In many cases, the value used to create an ip_address object will be a string in the IPv4 dotted decimal format as per this example:

>>> import ipaddress
>>> my_ip = ipaddress.ip_address('192.168.100.10')
>>> my_ip
IPv4Address('192.168.100.10')

Alternatively, the IPv4 address may be entered in binary, as a decimal value of the full 32 bit binary value, or in hexadecimal format as per this example:

# All 32 binary bits can be used to create an IPv4 address:
>>> ipaddress.ip_address(0b11000000101010000110010000001010)
IPv4Address('192.168.100.10')

# The decimal value of the 32 bit binary number can also be used:
>>> ipaddress.ip_address(3232261130)
IPv4Address('192.168.100.10')

# As can the hexadecimal value of the 32 bits:
>>> ipaddress.ip_address(0xC0A8640A)
IPv4Address('192.168.100.10')

The first example uses the full 32-bit binary value, and the second example is the decimal value of the same 32 bits. Both are unwieldy, error-prone and of limited value. The third example uses a hexadecimal value which can be useful as most packet formats from parsing or sniffing are represented in hexadecimal format.
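When the address comes out of a captured packet you will often have raw bytes or a hex string rather than an integer literal. As a small sketch (the hex string is simply the example address from above), ip_address() also accepts a 4-byte bytes object, and the resulting object converts back to an integer or to packed bytes:

```python
import ipaddress

# Four raw bytes, e.g. as sliced out of a packet header:
raw = bytes.fromhex('c0a8640a')

addr = ipaddress.ip_address(raw)
print(addr)                 # 192.168.100.10

# Converting back:
print(hex(int(addr)))       # 0xc0a8640a
print(addr.packed == raw)   # True
```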

Creating IPv4 Interface Address Objects with ipaddress:

The ipaddress.ip_interface() factory function is used to create an ip_interface object, which automatically determines whether to create an IPv4 or IPv6 address based on the passed-in value (IPv6 addressing will be discussed at a later point in this tutorial).

As previously discussed, the ip_interface object represents the ip address found on a host or network interface where the CIDR (or mask) is required for proper handling of the packet.

# An ip_interface object is used to represent IP addressing
# for a host or router interface, including the CIDR:
>>> my_ip = ipaddress.ip_interface('192.168.100.10/24')
>>> my_ip
IPv4Interface('192.168.100.10/24')

# This method translates the CIDR into a mask as would normally
# be used on a host or router interface
>>> my_ip.netmask
IPv4Address('255.255.255.0')

One can use the same options in the creation of an ip_interface object as with an ip_address object (binary, decimal value, hexadecimal). However, the only way to effectively create an ip_interface with the proper CIDR notation or mask is with a dotted decimal IPv4 address string.
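Besides netmask, an interface object carries a few more attributes that come in handy. This sketch shows the ones I reach for most often, and also that a network mask can be used in place of the prefix length when creating the object:

```python
import ipaddress

iface = ipaddress.ip_interface('192.168.100.10/24')

print(iface.ip)              # 192.168.100.10
print(iface.network)         # 192.168.100.0/24
print(iface.with_prefixlen)  # 192.168.100.10/24
print(iface.with_netmask)    # 192.168.100.10/255.255.255.0

# The string may also carry a network mask instead of a prefix length:
same = ipaddress.ip_interface('192.168.100.10/255.255.255.0')
print(same == iface)         # True
```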

Creating IPv4 Network Address Objects with ipaddress:

The ipaddress.ip_network() factory function is used to create an ip_network object, which automatically determines whether to create an IPv4 or IPv6 address based on the passed-in value (IPv6 addressing will be discussed at a later point in this tutorial).

An IP network is defined as a range of consecutive IP addresses that define a network or subnet. Example:

192.168.100.0/24 is the 192.168.100.0 network where the /24 specifies that the first three octets comprise the network identification.
The 4th octet is used for assignment to individual hosts and router interfaces.
The usable host address range is 192.168.100.1 through to 192.168.100.254.
192.168.100.0 is used to define the network/subnet and 192.168.100.255 is the broadcast address for this network. Neither can be used for assignment to a host or router interface.
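The breakdown above can be checked directly against an ip_network object. This is a minimal sketch using the network_address, broadcast_address, and num_addresses attributes and the hosts() method:

```python
import ipaddress

net = ipaddress.ip_network('192.168.100.0/24')

print(net.network_address)    # 192.168.100.0
print(net.broadcast_address)  # 192.168.100.255
print(net.num_addresses)      # 256

# hosts() yields only the addresses that can be assigned,
# i.e. everything except the network and broadcast address:
hosts = list(net.hosts())
print(hosts[0], hosts[-1])    # 192.168.100.1 192.168.100.254
print(len(hosts))             # 254
```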

The creation of an ip_network object follows the same syntax as the creation of an ip_interface object:

# Creates an ip_network object. The IPv4 address and CIDR must be
# a valid network address, the first address in an address range:
>>> ipaddress.ip_network('192.168.100.0/24')
IPv4Network('192.168.100.0/24')

In the above example, the network address used must be a valid network address, which is the first address in the range of IPv4 addresses that constitute the network. If this is not the case, Python will throw an exception:

# Python will throw an exception if the address used is not
# a valid network address. In the following, ".10" is a host address
# not a valid network address identification, which is ".0":
>>> ipaddress.ip_network('192.168.100.10/24')
ValueError: "192.168.100.10/24 has host bits set"

When working with host or router interfaces, it is often necessary to determine the network address. This can be calculated by hand, but that takes several steps; the strict=False option (strict=True is the default) accomplishes the same thing in a single step.

# If the network address needs to be calculated,
# use the strict=False option. This will calculate and populate
# the ip_network object with the network rather than the
# interface address:
>>> my_ip = ipaddress.ip_interface('192.168.100.10/24')
>>> my_ip
IPv4Interface('192.168.100.10/24')

>>> my_ip_net = ipaddress.ip_network(my_ip, strict=False)
>>> my_ip_net
IPv4Network('192.168.100.0/24')

In the above example, the ip_interface address is known (192.168.100.10) but not the ip_network the interface belongs to. Using the strict=False option, the ip_network address (192.168.100.0/24) is calculated and populated in the ip_network object.

Python’s `ipaddress` Module and IPv6 Addresses

As with IPv4, the ipaddress module uses the same three basic factory functions for IPv6. The resulting object types are:

a “host” or an individual address object that does not include CIDR notation,
an interface address object that includes CIDR notation, and
a network address object that refers to the range of IP addresses for the entire network.

Since the detail is covered in the section on IPv4, only a brief overview is necessary.

Creating IPv6 Host Address Objects with ipaddress:

The ipaddress.ip_address() factory function is used to create an ip_address object. This automatically knows to use the IPv6 address format based on the passed-in value. Note that the CIDR notation is not used with the ip_address function.

In the majority of cases, the value used to create an ip_address object for IPv6 will be a string in the IPv6 quartet/hextet format as per this example:

# Create an IPv6 Address Object for a Global Address:
>>> ipaddress.ip_address('2001:db8:abcd:100::1')
IPv6Address('2001:db8:abcd:100::1')

# Create an IPv6 Address Object for a link-local address:
>>> ipaddress.ip_address('fe80::1')
IPv6Address('fe80::1')

As with IPv4, it is possible to create an IPv6 address object using the full binary, decimal, or hexadecimal value. This is unwieldy with 32 bits for an IPv4 address and is even more awkward for a 128 bit IPv6 address. As a practical matter, it is anticipated that the string representation of the eight quartets will be the norm.
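For completeness, here is a small sketch of the integer round trip for IPv6. One detail to watch for: ip_address() treats integers that fit into 32 bits as IPv4, so for small values the IPv6Address class has to be used explicitly:

```python
import ipaddress

addr = ipaddress.ip_address('2001:db8:abcd:100::1')
print(addr.version)  # 6

# Round trip through the 128-bit integer value:
as_int = int(addr)
print(ipaddress.ip_address(as_int) == addr)  # True

# Small integers are interpreted as IPv4 by the factory function:
print(ipaddress.ip_address(1))   # 0.0.0.1
print(ipaddress.IPv6Address(1))  # ::1
```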

Creating IPv6 Interface Address Objects with ipaddress:

The ipaddress.ip_interface() factory function is used to create an ip_interface object, which automatically creates an IPv6 address based on the passed-in value. Note that the CIDR notation must be included in the function.

# Creates an IP Interface Object for a Global Address:
>>> ipaddress.ip_interface('2001:db8:abcd:100::1/64')
IPv6Interface('2001:db8:abcd:100::1/64')

# Creates an IP Interface Object for a Link-local Address:
>>> ipaddress.ip_interface('fe80::1/64')
IPv6Interface('fe80::1/64')

Creating IPv6 Network Address Objects with ipaddress:

The ipaddress.ip_network() factory function is used to create an ip_network object for IPv6 based on the passed-in value.

As with IPv4, an IPv6 network is defined as a range of consecutive IP addresses that can be assigned to specific host or router interfaces.

Using our previous example 2001:db8:abcd:100::/64, the /64 CIDR specifies that the first four quartets make up the full network identification. Remember that the first three quartets are the global ID assigned by the ISP and the fourth quartet identifies the internal subnet number. The remaining 64 bits are used for host identification with a range from “0000:0000:0000:0001” through to “ffff:ffff:ffff:fffe”.

As with IPv4 addressing, the first and last address in an IPv6 subnet cannot be used for host addressing. Given a /64 CIDR, this means that there are 2 to the 64th power (minus 2) possible host addresses, which means there are 18,446,744,073,709,551,614 mathematically possible host addresses per network/subnet.

# Creates an IP Network Object for a Global Address:
>>> myIPv6net = ipaddress.ip_network('2001:db8:abcd:100::/64')
>>> myIPv6net
IPv6Network('2001:db8:abcd:100::/64')

# Creates an IP Network Object for a Link-local Address:
>>> myIPv6 = ipaddress.ip_network('fe80::/64')
>>> myIPv6
IPv6Network('fe80::/64')

The above global address is broken down as follows:

Global Identifier assigned by ISP: 2001:db8:abcd::/48
Subnet identification: 2001:db8:abcd:100::/64
First usable address in the subnet: 2001:db8:abcd:100::1/64
Last usable address in the subnet: 2001:db8:abcd:100:ffff:ffff:ffff:fffe/64
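The same breakdown can be reproduced with an ip_network object. This sketch follows the convention used in this article of leaving out the first and last address of the subnet, and it relies on indexing into the network and on the supernet() method:

```python
import ipaddress

net = ipaddress.ip_network('2001:db8:abcd:100::/64')

# Widen the /64 subnet back to the /48 global identifier:
print(net.supernet(new_prefix=48))  # 2001:db8:abcd::/48

# Index 1 is the address after the network address,
# index -2 is the one before the very last address:
print(net[1])    # 2001:db8:abcd:100::1
print(net[-2])   # 2001:db8:abcd:100:ffff:ffff:ffff:fffe

# 2 to the 64th power addresses in total:
print(net.num_addresses == 2 ** 64)  # True
print(net.num_addresses - 2)         # 18446744073709551614
```

Indexing is used here instead of list(net.hosts()) because materializing all addresses of a /64 is not feasible.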

Additional Resources

These are some additional resources where you can learn about the ipaddress module in Python:


How to Prepare for a Python Coding Interview

Tue, 30 May 2017 00:00:00 GMT

How to Prepare for a Python Coding Interview

A “minimalist guide” on how to prepare for your upcoming Python interview in three steps.

Participating in a “Silicon Valley style” coding interview can feel scary as heck.

Unlike other professionals, it seems to be okay for software developers to expect to get humiliated at a job interview:

“What, you can’t code up a recursive descent parser on a whiteboard in 7.5 minutes? How DARE YOU even apply for this job!”

Yeah, it’s one of the things that sucks about our industry—

Personally, I believe that 80-90% of the questions that you get asked during a typical coding interview have very little to do with your real performance on the job.

But unfortunately these interviews aren’t going to go away overnight.

If you want a well-paid job as a software developer, you’re likely going to encounter some coding quiz as part of your interviewing experience.

For the foreseeable future, interviewers are going to keep squeezing you through the same processes and will keep asking you those same questions…

And if you’re like me, there’s a pretty slim chance you’ll pass an interview like that without some serious prep work—either to learn the right skills or to refresh your memory.

Alright, that all sounds pretty glum, no?

But here’s what you need to realize:

Interviewing is a skill you can learn like any other.

It’s something you get better at with practice.

It’s true—just remember that all that prep work needs time. So be sure to plan ahead with ample buffer to get enough study days in before your “big day.”

If I had a coding interview coming up in 1-2 months, here’s a rough outline of what I’d do to prepare:

Step 1:

Buy the following two books:

“Elements of Programming Interviews (Python Ed.)” by Aziz, Lee and Prakash; and
“Cracking the Coding Interview” by Gayle Laakmann McDowell

Step 2:

Buy a whiteboard and some markers. Put the whiteboard on an actual wall, and make sure you get a board with a decent size. This is where 90% of your prep work will happen over the next few weeks.

Step 3:

Every day, stand in front of your whiteboard and work on at least one problem from the books listed in Step 1.

Talk out loud about what you’re doing, and snap a photo of the board when you’re done. Set a 30 minute timer for each problem to put some pressure on yourself.

If you can’t solve a problem, pick up the book and go through all the motions with the solution in front of you. Rinse and repeat.

The closer you get to your interview date the more you want to practice—ramp it up to around 5 problems per day in the last two weeks before your interview.

Just repeat after me:

“Interviewing is a learned skill.”

The more “reps” you can get on each problem, the better your chances of getting a job offer will be. I know this sounds tough—but with persistence and regular practice you can do it. Keep at it and you’ll eventually succeed. It’ll be worth it!


In Love, War, and Open-Source: Never Give Up

Thu, 25 May 2017 00:00:00 GMT

In Love, War, and Open-Source: Never Give Up

I’ll never forget launching my first open-source project and sharing it publicly on Reddit…

I had spent a couple of days at my parents’ place over Christmas that year and decided to use some of my spare time to work on a Python library I christened schedule.

The idea behind schedule was very simple and had a narrow focus (I find that that’s always a good idea for libraries by the way):

Developers would use it like a timer to periodically call a function inside their Python programs.

The kicker was that schedule used a funky “natural sounding” syntax to specify the timer interval. For example, if you wanted to run a function every 10 minutes you’d do this:

schedule.every(10).minutes.do(myfunc)

Or, if you wanted to run a particular task every day at 10:30 in the morning, you’d do this:

schedule.every().day.at('10:30').do(mytask)

Because I was so frustrated with Cron’s syntax I thought this approach was really cool. And so I decided this would be the first Python module I’d release as open-source.

I cleaned up the code and spent some time coming up with a nice README file—because that’s really the first thing that your potential users will see when they check out your library.

Once I had my module available on PyPI and the source code on GitHub I decided to call some attention to the project. The same night I posted a link to the repository to Reddit and a couple of other sites.

I still remember that I had shaky hands when I clicked the “submit” button…

It’s scary to put your work out there for the whole world to judge! Also, I didn’t know what to expect.

Would people call me stupid for writing a “simple” library like that?

Would they think my code wasn’t good enough?

Would they find all kinds of bugs and publicly shame me for them? I felt almost a physical sense of dread about pushing the “submit” button on Reddit that night!

The next morning I woke up and immediately checked my email. Were there any comments? Yes, about twenty or so!

I started reading through all of them, faster and faster—

And of course my still frightful mind immediately zoomed in on the negative ones, like

“Cool idea, but not particularly useful”,

and

“The documentation is not enough”,

“Not a big fan of the pseudo-english syntax. Way too clever and gimmicky.”

At this point I was starting to feel a little discouraged… I’d never really shared my code publicly before and, to be honest, my skin was paper thin when it came to receiving criticism on it. After all, this was just something I wrote in a couple of hours and gave away for free.

The comment that really made my stomach churn was one from a well known member of the Python community:

“And another library with global state :-( … Such an API should not even exist. It sets a bad example.”

Ouch, that stung. I really looked up to that person and had used some of their libraries in other projects… It was almost like my worst fears were now playing out in front of me!

I’d never be able to get another job as a Python developer after this…

At the time I didn’t see the positive and supportive comments in that discussion thread. I didn’t see the almost 70 upvotes. I didn’t see the valuable lessons hidden in the seemingly rude comments. I dwelled on the negative and felt terrible and depressed that whole day.

So how do you think this story ends?

Did I delete the schedule repo, switch careers, and never look at Reddit again?

Wrong!

schedule now has almost 3,000 stars on GitHub and is among the top 70 Python repositories (out of more than 215,000). When PyPI’s download statistics were still working I saw that it got several thousand downloads per month. I get emails every week from people asking questions about it or thanking me for writing it…

Isn’t that crazy!? How’s that possible after all of these disheartening comments?

My answer is “I don’t know”—and I also don’t think that schedule is a particularly great library that deserves all this attention, by the way.

But, it seems to solve a problem for some people. It also seems to have a polarizing effect on developers who see it—some love it, some hate it.

Today I’m glad I shipped schedule that night.

Glad because it was helpful to so many people over the years and glad because it helped me develop a thicker skin when it comes to sharing and launching things publicly.

I’m partly writing this meandering post because not very long ago I found this comment buried in my Reddit message history:

As someone who has posted a number of projects and blog posts in r/Python, just wanted to drop you a line and encourage that you don’t let the comments in your thread get you down. You see all those upvotes?

Those are people that like your library, but don’t really have a comment to make in the thread proper. My biggest issue with /r/Python is that it tends towards cynicism and sometimes cruelty rather than encouragement and constructive criticism.

Keep up the great work,

Rob

Wow! What a positive and encouraging comment!

Back when I felt discouraged by all of these negative comments I must’ve missed it. But reading it a few years later made me re-live that whole situation and it showed me how much I’d grown as a developer and as a person in the meantime.

If you find yourself in a similar situation, maybe feeling bogged down by the developer community who can be unfiltered and pretty rude sometimes, don’t get discouraged.

Even if some people don’t like what you did there can be thousands who love your work.

It’s a big pond, and sometimes the best ideas are polarizing.

The only way to find out is to ship, ship, ship.

The Meaning of Underscores in Python

Tue, 23 May 2017 00:00:00 GMT

The Meaning of Underscores in Python

The various meanings and naming conventions around single and double underscores (“dunder”) in Python, how name mangling works and how it affects your own Python classes.

Single and double underscores have a meaning in Python variable and method names. Some of that meaning is merely by convention and intended as a hint to the programmer—and some of it is enforced by the Python interpreter.

If you’re wondering “What’s the meaning of single and double underscores in Python variable and method names?” I’ll do my best to get you the answer here.

In this article I’ll discuss the following five underscore patterns and naming conventions and how they affect the behavior of your Python programs:

Single Leading Underscore: _var
Single Trailing Underscore: var_
Double Leading Underscore: __var
Double Leading and Trailing Underscore: __var__
Single Underscore: _

At the end of the article you’ll also find a brief “cheat sheet” summary of the five different underscore naming conventions and their meaning, as well as a short video tutorial that gives you a hands-on demo of their behavior.

Let’s dive right in!

1. Single Leading Underscore: `_var`

When it comes to variable and method names, the single underscore prefix has a meaning by convention only. It’s a hint to the programmer—and it means what the Python community agrees it should mean, but it does not affect the behavior of your programs.

The underscore prefix is meant as a hint to another programmer that a variable or method starting with a single underscore is intended for internal use. This convention is defined in PEP 8.

This isn’t enforced by Python. Python does not have strong distinctions between “private” and “public” variables like Java does. It’s like someone put up a tiny underscore warning sign that says:

“Hey, this isn’t really meant to be a part of the public interface of this class. Best to leave it alone.”

Take a look at the following example:

class Test:
    def __init__(self):
        self.foo = 11
        self._bar = 23

What’s going to happen if you instantiate this class and try to access the foo and _bar attributes defined in its __init__ constructor? Let’s find out:

>>> t = Test()
>>> t.foo
11
>>> t._bar
23

You just saw that the leading single underscore in _bar did not prevent us from “reaching into” the class and accessing the value of that variable.

That’s because the single underscore prefix in Python is merely an agreed upon convention—at least when it comes to variable and method names.

However, leading underscores do impact how names get imported from modules. Imagine you had the following code in a module called my_module:

# This is my_module.py:

def external_func():
    return 23

def _internal_func():
    return 42

Now if you use a wildcard import to import all names from the module, Python will not import names with a leading underscore (unless the module defines an __all__ list that overrides this behavior):

>>> from my_module import *
>>> external_func()
23
>>> _internal_func()
NameError: "name '_internal_func' is not defined"
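The `__all__` exception is easy to try out without creating any files. Here’s a minimal sketch: the module name demo_module is hypothetical and the module is built in memory purely for illustration. Because `__all__` lists `_internal_func` explicitly, the wildcard import picks it up despite the leading underscore:

```python
import sys
import types

# Source for a throwaway module (demo_module is a made-up name).
source = '''
__all__ = ["external_func", "_internal_func"]

def external_func():
    return 23

def _internal_func():
    return 42
'''

# Build the module in memory and register it so "import" can find it.
demo_module = types.ModuleType("demo_module")
exec(source, demo_module.__dict__)
sys.modules["demo_module"] = demo_module

# Run the wildcard import inside a fresh namespace and see what arrived.
namespace = {}
exec("from demo_module import *", namespace)
imported = sorted(name for name in namespace if not name.startswith("__"))
print(imported)  # ['_internal_func', 'external_func']
```

Remove the `__all__` line from the source string and only `external_func` should make it into the namespace.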

By the way, wildcard imports should be avoided as they make it unclear which names are present in the namespace. It’s better to stick to regular imports for the sake of clarity.

Unlike wildcard imports, regular imports are not affected by the leading single underscore naming convention:

>>> import my_module
>>> my_module.external_func()
23
>>> my_module._internal_func()
42

I know this might be a little confusing at this point. If you stick to the PEP 8 recommendation that wildcard imports should be avoided, then really all you need to remember is this:

Single underscores are a Python naming convention indicating a name is meant for internal use. It is generally not enforced by the Python interpreter and meant as a hint to the programmer only.

2. Single Trailing Underscore: `var_`

Sometimes the most fitting name for a variable is already taken by a keyword. Therefore names like class or def cannot be used as variable names in Python. In this case you can append a single underscore to break the naming conflict:

>>> def make_object(name, class):
SyntaxError: "invalid syntax"

>>> def make_object(name, class_):
...     pass

In summary, a single trailing underscore (postfix) is used by convention to avoid naming conflicts with Python keywords. This convention is explained in PEP 8.
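If you generate names programmatically, the standard library’s keyword module can tell you when this workaround is needed. A small sketch; the safe_name helper is a hypothetical name I made up for this example:

```python
import keyword

def safe_name(name):
    """Append an underscore if `name` collides with a Python keyword."""
    return name + "_" if keyword.iskeyword(name) else name

print(safe_name("class"))   # class_
print(safe_name("colour"))  # colour
```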

3. Double Leading Underscore: `__var`

The naming patterns we covered so far received their meaning from agreed upon conventions only. With Python class attributes (variables and methods) that start with double underscores, things are a little different.

A double underscore prefix causes the Python interpreter to rewrite the attribute name in order to avoid naming conflicts in subclasses.

This is also called name mangling—the interpreter changes the name of the variable in a way that makes it harder to create collisions when the class is extended later.

I know this sounds rather abstract. This is why I put together this little code example we can use for experimentation:

class Test:
    def __init__(self):
        self.foo = 11
        self._bar = 23
        self.__baz = 23

Let’s take a look at the attributes on this object using the built-in dir() function:

>>> t = Test()
>>> dir(t)
['_Test__baz', '__class__', '__delattr__', '__dict__', '__dir__',
 '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__',
 '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__',
 '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__',
 '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
 '__weakref__', '_bar', 'foo']

This gives us a list with the object’s attributes. Let’s take this list and look for our original variable names foo, _bar, and __baz—I promise you’ll notice some interesting changes.

The self.foo variable appears unmodified as foo in the attribute list.
self._bar behaves the same way—it shows up on the class as _bar. Like I said before, the leading underscore is just a convention in this case. A hint for the programmer.
However with self.__baz, things look a little different. When you search for __baz in that list you’ll see that there is no variable with that name.

So what happened to __baz?

If you look closely you’ll see there’s an attribute called _Test__baz on this object. This is the name mangling that the Python interpreter applies. It does this to protect the variable from getting overridden in subclasses.

Let’s create another class that extends the Test class and attempts to override its existing attributes added in the constructor:

class ExtendedTest(Test):
    def __init__(self):
        super().__init__()
        self.foo = 'overridden'
        self._bar = 'overridden'
        self.__baz = 'overridden'

Now what do you think the values of foo, _bar, and __baz will be on instances of this ExtendedTest class? Let’s take a look:

>>> t2 = ExtendedTest()
>>> t2.foo
'overridden'
>>> t2._bar
'overridden'
>>> t2.__baz
AttributeError: "'ExtendedTest' object has no attribute '__baz'"

Wait, why did we get that AttributeError when we tried to inspect the value of t2.__baz? Name mangling strikes again! It turns out this object doesn’t even have a __baz attribute:

>>> dir(t2)
['_ExtendedTest__baz', '_Test__baz', '__class__', '__delattr__',
 '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__',
 '__getattribute__', '__gt__', '__hash__', '__init__', '__le__',
 '__lt__', '__module__', '__ne__', '__new__', '__reduce__',
 '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
 '__subclasshook__', '__weakref__', '_bar', 'foo']

As you can see __baz got turned into _ExtendedTest__baz to prevent accidental modification:

>>> t2._ExtendedTest__baz
'overridden'

But the original _Test__baz is also still around:

>>> t2._Test__baz
23
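To see why this is useful, here’s a short sketch with two made-up classes, Base and Child. Each class uses the same “private” attribute name, and thanks to name mangling the methods of each class keep seeing their own value:

```python
class Base:
    def __init__(self):
        self.__token = "base"      # stored as _Base__token

    def base_token(self):
        return self.__token        # always reads _Base__token


class Child(Base):
    def __init__(self):
        super().__init__()
        self.__token = "child"     # stored as _Child__token

    def child_token(self):
        return self.__token        # always reads _Child__token


c = Child()
print(c.base_token())    # base
print(c.child_token())   # child
print(sorted(vars(c)))   # ['_Base__token', '_Child__token']
```

Without the double underscore prefix, the assignment in Child would have silently replaced the value Base relies on.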

Double underscore name mangling is fully transparent to the programmer. Take a look at the following example that will confirm this:

class ManglingTest:
    def __init__(self):
        self.__mangled = 'hello'

    def get_mangled(self):
        return self.__mangled

>>> ManglingTest().get_mangled()
'hello'
>>> ManglingTest().__mangled
AttributeError: "'ManglingTest' object has no attribute '__mangled'"

Does name mangling also apply to method names? It sure does—name mangling affects all names that start with two underscore characters (“dunders”) in a class context:

class MangledMethod:
    def __method(self):
        return 42

    def call_it(self):
        return self.__method()

>>> MangledMethod().__method()
AttributeError: "'MangledMethod' object has no attribute '__method'"
>>> MangledMethod().call_it()
42

Here’s another, perhaps surprising, example of name mangling in action:

_MangledGlobal__mangled = 23

class MangledGlobal:
    def test(self):
        return __mangled

>>> MangledGlobal().test()
23

In this example I declared a global variable called _MangledGlobal__mangled. Then I accessed the variable inside the context of a class named MangledGlobal. Because of name mangling I was able to reference the _MangledGlobal__mangled global variable as just __mangled inside the test() method on the class.

The Python interpreter automatically expanded the name __mangled to _MangledGlobal__mangled because it begins with two underscore characters. This demonstrated that name mangling isn’t tied to class attributes specifically. It applies to any name starting with two underscore characters used in a class context.
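A related detail: mangling is applied to identifiers in the source code, not to strings. An attribute name passed to getattr() as a string is looked up exactly as written. The Vault class below is a made-up example to illustrate this:

```python
class Vault:
    def __init__(self):
        self.__secret = "s3cr3t"   # stored as _Vault__secret

    def by_attribute(self):
        # The identifier __secret is mangled at compile time.
        return self.__secret

    def by_string(self):
        # The string "__secret" is NOT mangled, so the lookup misses.
        return getattr(self, "__secret", "not found")


v = Vault()
print(v.by_attribute())                # s3cr3t
print(v.by_string())                   # not found
print(getattr(v, "_Vault__secret"))    # s3cr3t
```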

Now this was a lot of stuff to absorb.

To be honest with you I didn’t write these examples and explanations down off the top of my head. It took me some research and editing to do it. I’ve been using Python for years but rules and special cases like that aren’t constantly on my mind.

Sometimes the most important skills for a programmer are “pattern recognition” and knowing where to look things up. If you feel a little overwhelmed at this point, don’t worry. Take your time and play with some of the examples in this article.

Make these concepts sink in enough so that you’ll recognize the general idea of name mangling and some of the other behaviors I showed you. If you encounter them “in the wild” one day, you’ll know what to look for in the documentation.

⏰ Sidebar: What’s a “dunder” in Python?

If you’ve heard some experienced Pythonistas talk about Python or watched a few conference talks, you may have heard the term dunder. If you’re wondering what that is, here’s your answer:

Double underscores are often referred to as “dunders” in the Python community. The reason is that double underscores appear quite often in Python code and to avoid fatiguing their jaw muscles Pythonistas often shorten “double underscore” to “dunder.”

For example, you’d pronounce __baz as “dunder baz”. Likewise __init__ would be pronounced as “dunder init”, even though one might think it should be “dunder init dunder.” But that’s just yet another quirk in the naming convention.

It’s like a secret handshake for Python developers 🙂

4. Double Leading and Trailing Underscore: `__var__`

Perhaps surprisingly, name mangling is not applied if a name starts and ends with double underscores. Variables surrounded by a double underscore prefix and postfix are left unscathed by the Python interpreter:

class PrefixPostfixTest:
    def __init__(self):
        self.__bam__ = 42

>>> PrefixPostfixTest().__bam__
42

However, names that have both leading and trailing double underscores are reserved for special use in the language. This rule covers things like __init__ for object constructors, or __call__ to make an object callable.

These dunder methods are often referred to as magic methods—but many people in the Python community, including myself, don’t like that.

It’s best to stay away from using names that start and end with double underscores (“dunders”) in your own programs to avoid collisions with future changes to the Python language.
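Implementing the dunder methods the language already defines is perfectly fine, of course. As a quick sketch, here’s a made-up Multiplier class that uses `__call__` to make its instances callable and `__repr__` to control how they are displayed:

```python
class Multiplier:
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, value):
        # Invoked when an instance is called like a function.
        return value * self.factor

    def __repr__(self):
        # Invoked by repr() and by the interactive interpreter.
        return f"Multiplier({self.factor!r})"


double = Multiplier(2)
print(double(21))     # 42
print(repr(double))   # Multiplier(2)
```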

5. Single Underscore: `_`

Per convention, a single standalone underscore is sometimes used as a name to indicate that a variable is temporary or insignificant.

For example, in the following loop we don’t need access to the running index and we can use “_” to indicate that it is just a temporary value:

>>> for _ in range(32):
...     print('Hello, World.')

You can also use single underscores in unpacking expressions as a “don’t care” variable to ignore particular values. Again, this meaning is “per convention” only and there’s no special behavior triggered in the Python interpreter. The single underscore is simply a valid variable name that’s sometimes used for this purpose.

In the following code example I’m unpacking a car tuple into separate variables but I’m only interested in the values for color and mileage. However, in order for the unpacking expression to succeed I need to assign all values contained in the tuple to variables. That’s where “_” is useful as a placeholder variable:

>>> car = ('red', 'auto', 12, 3812.4)
>>> color, _, _, mileage = car

>>> color
'red'
>>> mileage
3812.4
>>> _
12

Besides its use as a temporary variable, “_” is a special variable in most Python REPLs that represents the result of the last expression evaluated by the interpreter.

This is handy if you’re working in an interpreter session and you’d like to access the result of a previous calculation. Or if you’re constructing objects on the fly and want to interact with them without assigning them a name first:

>>> 20 + 3
23
>>> _
23
>>> print(_)
23

>>> list()
[]
>>> _.append(1)
>>> _.append(2)
>>> _.append(3)
>>> _
[1, 2, 3]

📓 Python Underscore Naming Patterns – Summary

Here’s a quick summary or “cheat sheet” of what the five underscore patterns I covered in this article mean in Python:

Pattern	Example	Meaning
Single Leading Underscore	`_var`	Naming convention indicating a name is meant for internal use. Generally not enforced by the Python interpreter (except in wildcard imports) and meant as a hint to the programmer only.
Single Trailing Underscore	`var_`	Used by convention to avoid naming conflicts with Python keywords.
Double Leading Underscore	`__var`	Triggers name mangling when used in a class context. Enforced by the Python interpreter.
Double Leading and Trailing Underscore	`__var__`	Indicates special methods defined by the Python language. Avoid this naming scheme for your own attributes.
Single Underscore	`_`	Sometimes used as a name for temporary or insignificant variables (“don’t care”). Also: The result of the last expression in a Python REPL.

📺 Underscore Patterns – Video Tutorial

Watch a short video tutorial to see first-hand how things like double underscore name mangling work in Python and how they affect your own classes and modules:

Did I miss anything in this explanation? Want to add your own thoughts on the matter? Leave a comment below, I’d appreciate it.

When to Use Python

Thu, 18 May 2017 00:00:00 GMT

When to Use Python

What is the Python programming language used for in the real world, and when is using Python the right choice?

When I grew up in Germany as a kid there was this craze about “desks that can grow with you.” The idea was you’d buy your kid an adjustable desk and then they’d be able to use it throughout their whole education career.

As your kid grows taller, so does his or her desk. Just turn the little crank handle every few months… And voila, you’re right on track for raising the next Albert Einstein or Marie Curie.

Python is a great “adjustable desk” language.

With the small but important difference that Python is also a much prettier desk. One that you wouldn’t be embarrassed of using past elementary school. And one you’d be okay with showing to your girlfriend/boyfriend. (Okay, time to stop with that desk analogy.)

My point is this:

What I love about Python is how it scales so well (no pun intended)—from writing simple prototypes to validate an idea, all the way to building “production grade” systems.

Sure, sometimes it would be nice to have a compiler and static type checks to lean on—but often I realized that I would’ve never come this far in so little time with Java or C++. And with optional type hints in Python 3 and type checking tools like mypy this gap is starting to close.

But not only does Python scale and grow with the project at hand, it also scales and grows with your skills as a developer.

It’s relatively easy to get started with Python—but it’s not going to prevent you from growing as a developer and getting impressive real-world work done with it. My friend and fellow Python wrangler Michael Kennedy refers to it as a “full spectrum” language. And I really like that as an analogy.

Python spans the gamut from print('hello, world') all the way to running the back-end infrastructure for massive applications like Reddit, Instagram, or YouTube.

Now, is using Python always the right choice?

No.

No single programming language is.

For example, it’s unlikely you’re going to write a real-time operating system kernel in Python. Neither will id Software use it to implement their next-generation rendering engine…

But millions of developers around the world are using Python to build web applications, write data-crunching pipelines, generate reports, automate tests, conduct research, and do all kinds of other amazing work in a multitude of domains.

By learning Python you’re not limiting yourself to a specific niche.

And that’s what I love about this adorable, “adjustable desk” of a language.

Happy Pythoning!

Stacks in Python

Tue, 16 May 2017 00:00:00 GMT

Stacks in Python

How to implement a stack data structure (LIFO) in Python using built-in types and classes from the standard library.

A stack is a collection of objects that supports fast last-in, first-out (LIFO) semantics for inserts and deletes. Unlike lists or arrays, stacks typically don’t allow for random access to the objects they contain. The insert and delete operations are also often called push and pop.

A useful real-world analogy for a stack data structure is a stack of plates:

New plates are added to the top of the stack. And because the plates are precious and heavy, only the topmost plate can be moved (last-in, first-out). To reach the plates lower down in the stack the topmost plates must be removed one by one.

Stacks and queues are similar. They’re both linear collections of items and the difference lies in the order that items are accessed in:

With a queue you remove the item least recently added (first-in, first-out or FIFO); and with a stack you remove the item most recently added (last-in, first-out or LIFO).

Performance-wise, a proper stack implementation is expected to take O(1) time for insert and delete operations.

Stacks have a wide range of uses in algorithms, for example in language parsing and runtime memory management (“call stack”). A short and beautiful algorithm using a stack is depth-first search (DFS) on a tree or graph data structure.
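To make the DFS idea concrete, here’s a minimal sketch of an iterative depth-first traversal that uses a plain Python list as the stack. The function name dfs and the adjacency-dict graph format are my own choices for this illustration:

```python
def dfs(graph, start):
    """Return the nodes reachable from `start` in depth-first order."""
    visited = []
    seen = set()
    stack = [start]

    while stack:
        node = stack.pop()          # take the most recently added node
        if node in seen:
            continue
        seen.add(node)
        visited.append(node)
        # Push neighbors in reverse so the first-listed one is popped first.
        stack.extend(reversed(graph.get(node, [])))

    return visited


graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}
print(dfs(graph, "A"))  # ['A', 'B', 'D', 'C']
```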

Python ships with several stack implementations that each have slightly different characteristics. Let’s take a look at them:

✅ The list Built-in

Python’s built-in list type makes a decent stack data structure as it supports push and pop operations in amortized O(1) time.

Python’s lists are implemented as dynamic arrays internally, which means they occasionally need to resize the storage space for elements stored in them when elements are added or removed. The list over-allocates its backing storage so that not every push or pop requires resizing and you get an amortized O(1) time complexity for these operations.

The downside is that this makes their performance less consistent than the stable O(1) inserts and deletes provided by a linked list based implementation (like collections.deque, see below). On the other hand lists do provide fast O(1) time random access to elements on the stack which can be an added benefit.

Here’s an important performance caveat when using lists as stacks:

To get the amortized O(1) performance for inserts and deletes new items must be added to the end of the list with the append() method and removed again from the end using pop(). Stacks based on Python lists grow to the right and shrink to the left.

Adding and removing from the front is much slower and takes O(n) time, as the existing elements must be shifted around to make room for the new element.

# How to use a Python list as a stack (LIFO):

s = []

s.append('eat')
s.append('sleep')
s.append('code')

>>> s
['eat', 'sleep', 'code']

>>> s.pop()
'code'
>>> s.pop()
'sleep'
>>> s.pop()
'eat'

>>> s.pop()
IndexError: "pop from empty list"
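If you’d like to see the “wrong end” penalty for yourself, a rough timing sketch like the one below can help. The helper names and the value of n are arbitrary choices, and the absolute numbers will vary from machine to machine, so I’m not showing any expected output here:

```python
from timeit import timeit

def push_pop_end(n):
    """Use the right-hand end of the list: append() and pop()."""
    s = []
    for i in range(n):
        s.append(i)
    while s:
        s.pop()

def push_pop_front(n):
    """Use the left-hand end of the list: insert(0, ...) and pop(0)."""
    s = []
    for i in range(n):
        s.insert(0, i)
    while s:
        s.pop(0)

n = 20_000
end_time = timeit(lambda: push_pop_end(n), number=5)
front_time = timeit(lambda: push_pop_front(n), number=5)

print(f"end of list:   {end_time:.4f}s")
print(f"front of list: {front_time:.4f}s")
```

On a typical machine the front-of-list version should come out noticeably slower, and the gap should widen as n grows.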

✅ The collections.deque Class

The deque class implements a double-ended queue that supports adding and removing elements from either end in O(1) time (non-amortized).

Because deques support adding and removing elements from either end equally well, they can serve both as queues and as stacks.

Python’s deque objects are implemented as doubly-linked lists which gives them excellent and consistent performance for inserting and deleting elements, but poor O(n) performance for randomly accessing elements in the middle of the stack.

collections.deque is a great choice if you’re looking for a stack data structure in Python’s standard library with the performance characteristics of a linked-list implementation.

# How to use collections.deque as a stack (LIFO):

from collections import deque
q = deque()

q.append('eat')
q.append('sleep')
q.append('code')

>>> q
deque(['eat', 'sleep', 'code'])

>>> q.pop()
'code'
>>> q.pop()
'sleep'
>>> q.pop()
'eat'

>>> q.pop()
IndexError: "pop from an empty deque"
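Because a deque works from both ends, the same object can act as a stack or as a queue depending on which method you call. A brief sketch:

```python
from collections import deque

items = deque(['eat', 'sleep', 'code'])

last_in = items.pop()        # stack behavior: take from the right end
first_in = items.popleft()   # queue behavior: take from the left end

print(last_in)      # code
print(first_in)     # eat
print(list(items))  # ['sleep']
```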

✅ The queue.LifoQueue Class

This stack implementation in the Python standard library is synchronized and provides locking semantics to support multiple concurrent producers and consumers.

The queue module contains several other classes implementing multi-producer, multi-consumer queues that are useful for parallel computing.

Depending on your use case the locking semantics might be helpful, or they might just incur unneeded overhead. In the latter case you’d be better off using a list or a deque as a general purpose stack.

# How to use queue.LifoQueue as a stack:

from queue import LifoQueue
s = LifoQueue()

s.put('eat')
s.put('sleep')
s.put('code')

>>> s
<queue.LifoQueue object at 0x108298dd8>

>>> s.get()
'code'
>>> s.get()
'sleep'
>>> s.get()
'eat'

>>> s.get_nowait()
queue.Empty

>>> s.get()
# Blocks / waits forever...
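The locking only pays off once several threads share the stack. Here’s a minimal sketch of that scenario with three worker threads pulling numbers from one LifoQueue. The worker function, the None sentinel, and the squaring “work” are all assumptions made up for this example:

```python
import threading
from queue import LifoQueue

tasks = LifoQueue()
results = []
results_lock = threading.Lock()

def worker():
    while True:
        item = tasks.get()          # blocks until an item is available
        if item is None:            # sentinel: time to shut down
            tasks.task_done()
            break
        with results_lock:
            results.append(item * item)
        tasks.task_done()

workers = [threading.Thread(target=worker) for _ in range(3)]
for t in workers:
    t.start()

for n in range(10):
    tasks.put(n)

tasks.join()                        # wait until every number is processed

for _ in workers:                   # one sentinel per worker
    tasks.put(None)
for t in workers:
    t.join()

print(sorted(results))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

The results are sorted before printing because the order in which the threads finish is not predictable.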

A good default choice: `collections.deque`

If you’re not looking for parallel processing support (or don’t want to handle locking and unlocking manually) your choice comes down to the built-in list type or collections.deque.

The difference lies in the data structure used behind the scenes and ease of use.

list is backed by a dynamic array which makes it great for fast random access but requires occasional resizing when elements are added or removed. The list over-allocates its backing storage so that not every push or pop requires resizing and you get an amortized O(1) time complexity for these operations. But you do need to be careful to only insert and remove items from the right-hand side (append and pop) or otherwise performance slows down to O(n).
collections.deque is backed by a doubly-linked list which optimizes appends and deletes at both ends and provides consistent O(1) performance for these operations. Not only is its performance more stable, the deque class is also easier to use because you don’t have to worry about adding or removing items from “the wrong end.”

For these reasons, collections.deque makes an excellent choice for implementing a stack (LIFO queue) data structure in Python.

Read the full “Fundamental Data Structures in Python” article series here. This article is missing something or you found an error? Help a brother out and leave a comment below.

Let’s Program with Python: Reacting to User Input (Part 4)

Thu, 11 May 2017 00:00:00 GMT

Let’s Program with Python: Reacting to User Input (Part 4)

In the fourth (and final) class in this series you’ll learn how to make your Python programs interactive by letting them react to user input.

In this guest post series by Doug Farrell you’ll learn the basics of programming with Python from scratch. If you’ve never programmed before or need a fun little class to work through with your kids, you’re welcome to follow along.

Looking for the rest of the “Let’s Program with Python” series? Here you go:

Part 1: Statements, Variables, and Loops
Part 2: Functions and Lists
Part 3: Conditionals and “if” Statements
Part 4: Reacting to User Input (This article)

Table of Contents – Part 4

Let’s Write a Program Together
Getting Information From the Player
Converting a String to a Number
Another Kind of Loop
More Things We Can Do With Lists
How Many Items Are in a List?
How to Pick Random Things From a List?
Our Completed “Guess My Number” Program
Congratulations!
Appendix – Python Info That Doesn’t Fit in Class

Let’s Write a Program Together

For this class we’re going to write a “Guess My Number” game program. In this game the program will pick a random number from 1 to 10 and the player will try to guess what the number is. The program will respond in different ways depending on whether the player guessed correctly or incorrectly. The player can also end the game whenever they want by telling the program to “quit”.

The interesting part of this program is you’re going to tell me how to write it instead of the other way around. But before we get started, we need to learn a few more things about Python to help us build our game.

Getting Information From the Player

In order to play our game the player has to interact with it. We need a way to get guesses from the player so the game can compare its secret number to the player’s guess. To do this we use the input() function.

The input() function lets us ask the user for some information, and then wait for them to enter something using the keyboard. In the Python interactive mode it looks like this:

>>> guess = input("Please enter a number: ")
Please enter a number:

At the point where the input() function runs, the cursor is at the end of the "Please enter a number: " string, waiting for you to type something.

You can type anything you want. When you hit the <ENTER> key, whatever you typed will be assigned to the guess variable as a string. This is a very simple way to get input from the user using the keyboard.

Converting a String to a Number

We haven’t talked about this yet, but there is a difference between a string like "10" and the number 10. Try this in the interactive mode:

>>> 10 == 10
True
>>> "10" == 10
False

On the first line we are comparing the two number 10’s to each other to see if they are equal. Python knows they are, so it responds by printing True to the screen.

But what about the next comparison, "10" == 10? Why does Python respond with False? The simple answer is Python doesn’t think they’re equal.

But why aren’t they equal? This can be confusing, "10" looks like the number ten. And 10 definitely looks like the number ten as well. For Python however, this isn’t true.

The number 10 is exactly that, the numerical value 10. The string "10" is just a string, it has no numerical value, even though "10" looks like ten to us.

The difference is the representation. The "10" represents a string to Python, it doesn’t know that string represents ten to us. The 10 however does mean numerical ten to Python, ten things, ten cars, ten whatever.

What does this have to do with our game? A lot actually. When the game starts the program will randomly pick a number from 1 to 10, not a string, a number. However when the player types something into our guess = input("Please enter a number: ") prompt, guess is a string variable.

Even if the player enters a “1” and then a “0” and then hits enter, the guess variable will be a string. This is where a problem comes in. Let’s say we call the game’s variable for its number secret_number. If we write some Python code that compares them, like this:

if secret_number == guess:

This code won’t work the way we want because comparing a string to a number with == will always be False. We need to make Python compare two of the same kinds of things. For our game, both things need to be numbers. We need to convert the player’s guess variable to a number. Python can do this using the int() function. It looks like this:

guess_number = int(guess)

With this code we’re taking the player’s input, guess, which could be something like “8”, and converting it to the numerical value 8 and assigning it to the new variable guess_number. Now when we compare guess_number with secret_number, they are the same kind of thing (numbers) and will compare correctly when we write Python code like this:

if guess_number == secret_number:
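
One thing to watch for: int() only works when the string really looks like a whole number. If the player types something like "eight", int() raises a ValueError and the program stops. Here’s a minimal sketch of one way to guard against that. The helper name to_number() is made up for this example (it isn’t built into Python), and it uses try and except, which we haven’t covered in class:

```python
def to_number(text):
    """Return text as an integer, or None if it isn't a whole number."""
    try:
        return int(text)
    except ValueError:
        # int() raises ValueError for strings like "eight" or ""
        return None


print(to_number("8"))      # a number the game can compare
print(to_number("eight"))  # None, so the game could ask again
```

A game could check for None and ask the player to try again instead of crashing.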

Another Kind of Loop

We’ve only used the for loop so far because it’s handy when you know ahead of time how many times you want to loop. For our game program we won’t know ahead of time how many guesses it will take our player to guess the secret_number. We also don’t know how many times they’ll want to play the game.

This is a perfect use for the other loop Python supports, the while loop. The while loop is called a conditional loop because it will continue looping as long as the condition it is testing is True. Here’s an example of a while loop:

game_running = True
while game_running:
    # Run some Python statements

What these program lines mean is that while the variable game_running is True, the while loop will keep looping. This also means something in the while loop will have to change the value of game_running in order for the program to exit the loop.

Forgetting to provide a way for the while loop to end creates what’s called an infinite loop. This is usually a bad thing and means in order to exit the program it has to be crashed or stopped in some other way.
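
To make this concrete, here’s a small sketch of a while loop that does end. The countdown variable is just something I made up for the example; the important part is that something inside the loop eventually sets game_running to False:

```python
countdown = 3
game_running = True

while game_running:
    print("Counting down:", countdown)
    countdown = countdown - 1

    # This is what lets the loop end. Without it,
    # game_running would stay True forever.
    if countdown == 0:
        game_running = False

print("Loop finished!")
```

If you removed the if statement, the loop would never stop, which is exactly the infinite loop described above.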

More Things We Can Do With Lists

We’ve used Python lists before to hold things we want to deal with as one thing, like lists of turtles. We’ve created lists and appended things to lists. So far we’ve used the things in the list one at a time using the for loop. But how do we get to the individual things inside a list? For example, suppose I have this list in Python:

names = ["Andy", "George", "Sally", "Sharon", "Sam", "Chris"]

How can I get just the "Sally" name from the names list variable? We use something called list indexing to do that. Everything in a list has a position in the list, and all lists in Python start at position 0. The position is called an index, so to get "Sally" from the list, remembering all lists start at index 0, we do this:

name = names[2]

When we do this the variable name will be equal to "Sally" from our list. The [2] above is called the index into the list. We’ve told Python we want the thing inside the names list at index 2.

How Many Items Are in a List?

It’s often useful to be able to find out how many things are in a list. For instance, our names list above has six strings in it. But how could we find this out using Python? We use the len() function. It looks like this:

number_of_names_in_list = len(names)

This will set the variable number_of_names_in_list equal to six. Notice something about the number of items in the names list and the largest index, which belongs to the name “Chris”. To get the name “Chris” from our names list we would do this:

name = names[5]

The last thing in the list is at index 5, but the number of things in the list is 6. This is because all lists start with index 0, which is included in the number of things in the list. So for the names list we have indexes 0, 1, 2, 3, 4 and 5, totaling 6 things.
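
Here’s a short sketch that puts indexing and len() together. The variable names are only for illustration. The last line uses a negative index, which we haven’t talked about in class: Python counts backwards from the end of the list when the index is negative.

```python
names = ["Andy", "George", "Sally", "Sharon", "Sam", "Chris"]

first_name = names[0]          # index 0 is the first item
last_index = len(names) - 1    # 6 items, so the last index is 5
last_name = names[last_index]

# Python also accepts negative indexes, counting back from the end
also_last_name = names[-1]

print(first_name, last_index, last_name, also_last_name)
```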

How to Pick Random Things From a List?

Now we know how to pick individual things from a list, how to determine how long a list is and what the maximum index value in a list is. Can we use this information to choose a random thing from a list? For a minute let’s think about our turtle programs, we had a list something like this:

colors = ["black", "red", "orange", "yellow", "green", "blue"]

How could we pick a random color from this list to use when we were creating a turtle? We know the smallest index is 0, which would be the color “black”. We also know by looking at the list that our largest index is 5, the color blue. This is one less than the number of colors in the list. So we could do something like this:

colors = ["black", "red", "orange", "yellow", "green", "blue"]
turtle_color = colors[random.randint(0, 5)]

This Python statement would set the turtle_color variable to a random color from our colors list. But what if we added more colors to our list? Something like this:

colors = ["black", "red", "orange", "yellow", "green", "blue", "violet", "pink"]
turtle_color = colors[random.randint(0, 5)]

Unless we change the 5 in the random.randint(0, 5) call we’ll still be picking from the first six colors and ignoring the new ones we added. And if we’re picking random colors all over our program, we’d have to change every line that picks a color each time we changed the number of colors in our colors list. Can we get Python to handle this for us? Sure we can. We can use the len() function to help us out. We can change our code to look like this:

colors = ["black", "red", "orange", "yellow", "green", "blue", "violet", "pink"]
turtle_color = colors[random.randint(0, len(colors) - 1)]

What’s going on here? We still have our colors list variable, but now we’re using the len() function inside our random.randint() function. This is okay, the len() function returns a number and random.randint() expects a number as its second parameter.

But now we’re telling random.randint() the upper index limit of the numbers we want to choose from is one less than the number of things in the colors list variable. And as we’ve seen, one less than the number of things in a list will always be the highest index in the list. By using the code above we can add or subtract as many items from the colors list as we want and our random selection will still work, using all the things in the list.
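
As a side note, the random module also has a function called choice() that picks a random item from a list directly. We don’t use it in class, but here’s a small sketch showing both approaches side by side (other_color is just a name I picked for the example):

```python
import random

colors = ["black", "red", "orange", "yellow", "green", "blue", "violet", "pink"]

# The approach from the text: pick a random index, then look up the color
turtle_color = colors[random.randint(0, len(colors) - 1)]

# A shortcut from the same module: pick a random item directly
other_color = random.choice(colors)

print(turtle_color, other_color)
```

Both lines keep working no matter how many colors we add or remove, as long as the list isn’t empty.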

Our Completed “Guess My Number” Program

Here’s our Guess My Number program, complete with comments:

#
# Guess My Number
#

import random

# Set our game running flag to True
game_running = True

while game_running:
    # Greet the user to our game
    print()
    print("I'm thinking of a number between 1 and 10, can you guess it?")

    # Have the program pick a random number between 1 and 10
    secret_number = random.randint(1, 10)

    # Set the player's guess number to something outside the range
    guess_number = -1

    # Loop until the player guesses our number
    while guess_number != secret_number:

        # Get the player's guess from the player
        print()
        guess = input("Please enter a number: ")

        # Does the user want to quit playing?
        if guess == "quit":
            game_running = False
            break

        # Otherwise, nope, player wants to keep going
        else:
            # Convert the player's guess from a string to an integer
            guess_number = int(guess)


        # Did the player guess the program's number?
        if guess_number == secret_number:
            print()
            print("Congratulations, you guessed my number!")

        # Otherwise, whoops, nope, go around again
        else:
            print()
            print("Oh, too bad, that's not my number...")

# Say goodbye to the player
print()
print("Thanks for playing!")

Congratulations!

We’ve completed our course and I hope you’ve had as much fun as I had! We’ve written some pretty amazing programs together and learned quite a bit about programming and Python along the way. My wish is this interested you enough to keep learning about programming and to continue on to discover new things you can do with Python.

Appendix – Python Info That Doesn’t Fit in Class

Differences Between Python and Other Languages

There are many programming languages out in the wild you can use to program a computer. Some have been around for a long time, like Fortran and C, and some are quite new, like Dart or Go. Python falls in the middle ground of being fairly new, but quite mature.

Why would a programmer choose one language to learn over another? That’s a somewhat complicated question as most languages will allow you to do anything you want. However it can be difficult to express what you want to do with a particular language instead of something else.

For instance, Fortran excels at computation and in fact its name comes from Formula Translation (ForTran). However it’s not known as a great language if you need to do a lot of string/text manipulation. The C programming language is a great language if your goal is to maximize the performance of your program. If you program it well you can create extremely fast programs. Notice I said “if you program it well”. If you don’t, you can completely crash not only your program, but perhaps even your computer. The C language doesn’t hold your hand to prevent you from doing things that could be bad for your program.

Beyond how well a language fits the problem you’re trying to solve, there are other reasons to pass on one: it might not work with the tools you like, it might not provide the tools you need, or it may simply not appeal to you visually.

My choice of teaching Python fits a “sweet spot” for me. It’s fast enough to create the kinds of programs I want to create. It’s visually very appealing to me, and the grammar and syntax of the language fit the way I want to express the problems I’m trying to solve.

Python Vocabulary

Let’s talk about some of the vocabulary used in the class and what it means. Programming languages have their own “jargon”, or words, meaning specific things to programmers and that language. Here are some terms we’ve used in relation to Python.

IDLE – command prompt: IDLE is the programming environment that comes with Python. It’s what’s called an IDE, or Integrated Development Environment, and pulls together some useful things to help write Python programs. When you start IDLE it opens up a window that has the Python interactive prompt >>> in it.

This is a window running the Python interpreter in interactive mode. This is where you can play around with some simple Python program statements. It’s kind of a sandbox where you can try things out. However, there is no way to save or edit your work; once the Python interpreter runs your statements, they’re gone.

IDLE – editor window: The file window (File → New Window) opens up a simple text editor. This is like Notepad in Windows, except it knows about Python code, how to format it and colorizes the text. This is where you can write, edit and save your work and run it again later. When you run this code, behind the scenes IDLE is running the program in the Python interpreter, just like it is in the first IDLE window.

Syntax Highlighting: When we edit code in the file window of IDLE it knows about Python code. One of the things this means is the editor can “colorize”, or syntax highlight, various parts of the Python code you’re entering. It sets the keywords of Python, like for and if, to certain colors, strings to other colors, and comments to yet another. This is just the file window being helpful and providing syntax highlighting to make it easier for the programmer to read and understand what’s going on in the program.

Python Command Line: In Windows if you open up a command line window, what used to be called a DOS box, and run python, the system will respond with the Python command prompt >>>. At this point you’re running Python in its interactive mode, just like when you’re inside of IDLE. In fact they are the same thing: IDLE is running its own Python command line inside the window, and they are functionally identical.

You might think “what use is that?”, and I agree, I’d rather work in IDLE if I’m going to use the interactive mode and play in the sandbox with the >>> command prompt. The real use of the Python command line is when you enter something like this at the system command prompt:

python myprogram.py

If I’ve written a program called myprogram.py and entered the line above, instead of going into interactive mode, Python will read myprogram.py and run the code. This is very useful if you’ve written a program you want to use and not run inside of IDLE. As a programmer I run programs in this manner all day long, and in many cases these programs run essentially forever as servers.

Attribute and Property: We’ve thrown around the terms “attribute” and “property” kind of randomly, and this can lead to some confusion. The reason it’s confusing is these things mean essentially the same thing. When talking about programming there is always the goal to use specific words and terms to eliminate confusion about what you’re talking about.

For instance let’s talk about you. You have many qualities that different people want to express. Your friends want to know your name and phone number. Your school wants to know that as well, and your age, the grade you’re in and your attendance record. In programming terms we can think of these as attributes or properties about you.

The attributes and properties of a thing (you for example) help get more specific information about the thing. And the specific information wanted depends on the audience asking. For example when meeting someone new they are more likely to be interested in your name property. Whereas your school might be more interested in your attendance property.

In Python we’ve been working with turtles, and those turtles have attributes and properties. For example a turtle has a property called forward. This property happens to be a function that moves the turtle forward, but it’s still a property of the turtle. In fact all the properties and attributes associated with a turtle are expressed as functions. These functions either make the turtle do something, or tell us something about the turtle.

Attributes and properties lead into a concept of Object Oriented Programming (OOP) that adds the concept of “things” to programs rather than just data and statements. Object Oriented Programming is beyond the scope of this book, but is very interesting and useful.

Interpreter vs Compiler

In class you’ve heard me talk about the Python interpreter, but what does this mean? As we’ve talked about, computer languages are a way for people to tell a computer what to do. But the truth is a computer only understands 0’s and 1’s, so how does a computer understand a language like Python? That’s where a translation layer comes into play, and that translation layer is the interpreter for Python (and other interpreted languages) and a compiler for compiled languages. Let’s talk about compilers first.

Compiler: A compiler is a translator that converts a computer language into machine code, the 0’s and 1’s a computer understands. A compiler usually produces an executable file, on Windows machines this is a file that ends in .exe. This file contains machine code information the computer can run directly. Languages like C, C++ and Fortran are compiled languages and have to be processed by a compiler before the program can run. One thing this means is you can’t run a compiled language directly, you have to compile it first. It also means there is nothing like the interactive mode (the >>> prompt in Python) in a compiled language. The entire program has to be compiled, it can’t compile and run single statements.

Interpreter: Here’s where things get a little more confusing. Most interpreted languages also have a compiled step, but the output of that step isn’t machine code, no 0’s and 1’s. Instead the compilation step produces what is called ByteCode. The ByteCode is kind of an intermediate step between the near English computer language and the machine code understood by the computer.

The ByteCode can’t be run directly, it is run by a thing called a virtual machine. When the program is run, the virtual machine reads the ByteCode and it generates the computer specific machine code that actually is run by the computer. When you run the program the virtual machine is constantly “interpreting” the ByteCode and generating computer specific machine code. Unlike a compiled language, languages like Python with virtual machines can provide an interactive mode (the >>> prompt) as the interpreter and virtual machine can translate and run program statements on the fly.
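
If you’re curious what ByteCode looks like, Python will show you. The sketch below uses the dis module from the standard library, which we don’t cover in class, and a tiny function called add_one() that I made up for the example. The exact listing it prints differs from one Python version to the next, so don’t worry if yours looks different from someone else’s:

```python
import dis


def add_one(number):
    return number + 1


# Every Python function carries its compiled ByteCode around with it
raw_bytecode = add_one.__code__.co_code
print(type(raw_bytecode), len(raw_bytecode))

# dis.dis() prints a human-readable listing of that ByteCode.
# The exact instructions differ between Python versions.
dis.dis(add_one)
```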

Advantages And Disadvantages: So why would a programmer pick a compiled language over an interpreted language, and vice versa? Well, what we said before still applies, expressiveness of the language, style, etc, those are important things to think about when choosing a language for a project. But there are some differences beyond that. In general compiled languages produce programs that run faster than programs produced by an interpreter. Remember, compiled languages produce programs containing machine code that can be run directly, whereas interpreted languages usually have a virtual machine between the ByteCode and the machine code, so there’s a speed penalty there. However, also keep in mind modern computers are so fast this difference is less important. In addition, interpreted languages are being constantly improved so their performance gets better and better, so the performance difference between the two is shrinking.

Most interpreted languages also offer safety features to prevent the programmer from crashing the program. Interpreted languages make it hard to corrupt memory. They make it difficult to get direct access to the hardware. They don’t force the programmer to manage memory explicitly. Compiled languages like C offer none of this, and therefore it’s easy to do all of those things, which can put your program at risk, unless you’re a skilled programmer. The safety features can be added to a C program, but this has to be done manually by the programmer and isn’t handled by the language natively.

Python Reference Materials

Included below is a list of reference materials to help you go further in your study of Python.

Python Website – Main Python website
Python Documentation – Official Python 3 Documentation
Python Turtle Documentation – Official Python Documentation for Turtle
Python Tutorials for Beginners on dbader.org
Learn Python – An interesting tutorial to help learn Python
How To Think Like A Computer Scientist – Interesting and interactive way to learn Python
PyGame – An add on module for writing games with Python

PythonistaCafe: A Peer-to-Peer Learning Community for Python Developers

Tue, 09 May 2017 00:00:00 GMT

PythonistaCafe: A Peer-to-Peer Learning Community for Python Developers

Introducing PythonistaCafe—an invite-only, online community of Python and software development enthusiasts helping each other succeed and grow.

Most programmers I know say they sometimes feel “stuck” in their learning progress. Whether you’re a beginner, an intermediate developer, or an experienced senior dev—you’ll eventually hit a point where you feel like you’re no longer making progress:

Where you think you’re no longer learning new things.
Where it feels like you’re making backwards progress, and your skills seemingly atrophy and get worse over time; or
Where you recognize you’re good at what you do—but you don’t know what to learn next, or how.

It happens to all of us. For example, I found that I’m the happiest when I can learn new things and then apply them in practice or teach them to others. Every time I hit a plateau like that, this feeling of being “stuck” strikes at the core of my identity. If it lasts for too long I get uneasy. It’s almost like I’m losing my sense of purpose.

Not a nice feeling at all.

But I’ve also been around the block enough that I know I can overcome it, that I can shake it off eventually. What usually helps me get my bearings straight again is talking to my friends who also work in the tech industry. I found it’s important to talk with people who are in the same boat:

The folks that I went to university with or former colleagues who can relate to my “geek angst.”

I get it that many of the things us programmers are struggling with seem funny or weird to “outsiders”—

I’m not expecting my friend who works as an insurance broker to understand the strange headspace I can get in when I think that “my learning progress is stuck” (after having worked in the industry for a decade.)

Let’s be honest here, this sounds ridiculous to a “normal” person… 🙂

But it’s a real feeling—and one that’s quite common in the programming community.

The quickest way I found to overcome it is to talk to people that you feel comfortable pouring your geeky heart out to. Whether you’re learning Python on your own, working by yourself as a freelancer, or if you’re the only Pythonista at your company:

Experiencing this sense of community and exchanging your thoughts with other techies will have a huge benefit on your quality of life. I know it did on mine. And I want every developer I know to be able to experience the same:

That’s why I started PythonistaCafe, a peer-to-peer learning community for Python developers.

It’s an idea I’ve been kicking around and refining since July 2016. And over the last few months it finally became a reality.

A good way to think of PythonistaCafe is to see it as a club of mutual improvement for Python enthusiasts.

We have members located all over the world, and with a wide range of proficiency levels. I’m impressed by their diverse skill set and the depth and quality of the conversations we’ve had. Every day we discuss a broad range of programming questions, career advice, and other topics.

We now even have some open-source projects and Kaggle data science competitions we’re collaborating on to help people build up their portfolio and gain experience. It’s been a ton of fun—and a great “Python support group.”

You can learn more about PythonistaCafe, our community values, and what we’re all about at www.pythonistacafe.com.

In part, I also started PythonistaCafe to “scratch my own itch”—it is my new home when it comes to Python. Each week I receive a ton of emails asking me for programming or career advice. And to be honest, it’s hard to keep up with them all.

If you need access to me to help solve a Python problem or get advice on what direction to go, the PythonistaCafe forums are where you can find me. I’m checking the forums and replying to topics and questions every single day. If you’re looking for an alternative to my 1:1 Python coaching program, a membership in PythonistaCafe might be a great fit for you.

Let’s Program with Python: Conditionals and “if” Statements (Part 3)

Thu, 04 May 2017 00:00:00 GMT

Let’s Program with Python: Conditionals and “if” Statements (Part 3)

In part three of this four-part Python introduction you’ll see how to teach your program how to make decisions with conditionals and if-statements.

Looking for the rest of the “Let’s Program with Python” series? Here you go:

Part 1: Statements, Variables, and Loops
Part 2: Functions and Lists
Part 3: Conditionals and “if” Statements (This article)
Part 4: Reacting to User Input

Let’s Get Those Turtles Thinking

In our last class we used a Python list to help us get multiple turtles drawing on the screen. We could keep adding turtles to our hearts’ content and the program would faithfully make each turtle draw our flower. This worked great for drawing the well controlled structure of the flower.

But what if we want to draw something that’s randomly generated, something where the turtles draw something and we don’t know ahead of time what that will be? How can we use what we know already to help us do that?

Let’s teach our program how to make decisions and do things on its own. Here is an image of one possible graphical outcome for our class:

New Turtle Drawing Functions

We’re going to create a new program where our turtles use some new drawing functions and new modules to create a randomly drawn image. Let’s learn the new turtle drawing functions first.

Let’s start out by starting Idle, opening a new program editor window and creating a new Python program. In this new program let’s start as we’ve done before by entering this Python statement:

import turtle

Save this program to a new file name, somewhere you can remember where to find it.

Get the Turtle Screen: turtle.Screen()

The first new turtle drawing function we’re going to learn isn’t really about the turtles at all, but about the screen they draw on. Up until now we haven’t cared too much about the screen the turtles are drawing on. We’ve just let the turtles create it as needed and away we go.

But now we want to modify something about the screen. In order to do that we have to first get the screen in a manner that we can change it. As with everything we do with Python programming, any time we want to get something so we can modify it, we save it to a variable. To get the screen we enter the following into our new program:

screen = turtle.Screen()

This calls another function of our turtle module, Screen(), which gets the screen the module will use to draw turtles on, and saves it in the newly created variable screen.

Notice how the Screen() function of the turtle module has its first letter capitalized, like when we create a turtle with Turtle().

Set the Screen Size: turtle.setup()

Until now we’ve let the turtle module create our window to be whatever size it wants. We can control this using the setup() function of the turtle module. I’m not sure why this is a turtle function instead of a screen function, but sometimes programming is like that. This function looks like this:

turtle.setup(1024, 768)

This Python statement sets our turtle drawing window to be 1024 pixels wide by 768 pixels tall.

Set the Background Color of the Screen: screen.bgcolor()

Now that we have a variable that represents the screen, we can modify a feature of it. We’re going to change the background color from white to some other color. We do this using this Python statement:

screen.bgcolor("#FFFFE0")

This statement shows how to use the screen variable and call one of its functions, bgcolor() (short for background color) to set the background color on the screen.

If we save and run this you’ll see an empty turtle window that has a light yellow color instead of white. The light yellow color is the "#FFFFE0" we passed as a parameter to the bgcolor() function.

So what does "#FFFFE0" mean? We could have just passed "yellow" to the bgcolor() function, like we’ve done with our turtles, but that yellow is pretty intense and I wanted something lighter for a background color.

So we’ve used a different way to define a color, this way comes right out of HTML (web page) coding. The "#FFFFE0" value represents setting the RGB (Red / Green / Blue) color value, each two character portion of the string FFFFE0 represents a value from 0 - 255 in hexadecimal (base 16, common in programming). This breaks down like this:

FF FF E0
 |  |  |
 |  |  +--- 224 Blue
 |  +------ 255 Green
 +--------- 255 Red

This somewhat complex color code lets us pick a color much more precisely than the limited pre-defined set of named colors (like "red" or "yellow") that are inside the turtle module.
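
If you’d like to check the breakdown above yourself, here’s a small sketch that pulls the three pieces out of the color code and converts each one from hexadecimal. It uses list-style slicing on a string and the two-parameter form of int(), neither of which we’ve covered in class, and the variable names are just ones I picked for the example:

```python
color_code = "#FFFFE0"

# Skip the leading "#" and slice out each two-character piece
red_hex = color_code[1:3]
green_hex = color_code[3:5]
blue_hex = color_code[5:7]

# int(text, 16) reads the text as a base 16 (hexadecimal) number
red = int(red_hex, 16)
green = int(green_hex, 16)
blue = int(blue_hex, 16)

print(red, green, blue)
```

Running this prints 255 255 224, matching the red, green and blue values in the diagram.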

Turtles Are Rubber Stamps!

We can also use our turtles as rubber stamps! By this I mean we can tell the turtle to leave a permanent image of itself at any point the turtle exists on the screen. We do this by using the turtle stamp() function, which looks like this:

turtle.stamp()

Running this Python statement makes a “stamp” of our turtle on the screen. When next we move the turtle you’ll see the stamp it left behind, kind of like bread crumbs of where it’s been. Let’s see how this works by entering the following into our program to make it look like this:

import turtle

screen = turtle.Screen()
turtle.setup(1024, 768)
screen.bgcolor("#FFFFE0")

t1 = turtle.Turtle()
t1.speed(0)
t1.shape("turtle")
t1.width(3)
t1.color("red")

for side in range(4):
    t1.forward(100)
    t1.stamp()
    t1.right(90)

When we save and run this program we should end up with a box outlined in red and a turtle “stamp” at each corner. The screen should look like this:

New Modules and Functions

In order to make our new program have random behavior we need to import a new module, logically enough called “random”. The random module, like the turtle module, brings additional functionality into our program so we can use it.

Add this line at the top of our program right under the import turtle statement:

import random

Just like the turtle module this doesn’t do anything immediately, but now our program has access to the functions in the random module.

Pick a Number, Any Number: random.randint()

The module random, as the name suggests, creates randomness. We’ll use the functions in the random module to make our turtle drawing less predictable, and maybe more interesting.

One of those functions on the module is called randint(), and it generates random integers. If we jump over to our Idle interactive window we can try out the function.

Enter this into our Idle interactive window to try the randint() function out:

>>> random.randint(0, 10)
4
>>> random.randint(0, 10)
10

You can see that just like the functions in the turtle module we have to use the module name random and a dot (.) character before the function we want.

In the above lines we’ve used the randint() function twice and it returned a different number each time. This is what randint() does, it returns randomly generated integers. This also means that the numbers you’ll see in your Idle window when you run this example will (likely) be different.

The two numbers we passed to it (0 and 10) are parameters telling randint() the beginning and ending limits of numbers we want it to generate. In our case we want integer numbers ranging from 0 to 10, including both 0 and 10. Random number generators are used a lot in game programming to create unexpected behavior and challenges for the player.
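
To convince yourself that randint() stays inside the limits you give it, you can ask for a lot of numbers and look at the smallest and largest ones. This is just a quick sketch; the numbers list and the count of 1000 are choices I made for the example:

```python
import random

# Ask for a lot of random numbers and remember each one
numbers = []
for count in range(1000):
    numbers.append(random.randint(0, 10))

# Every number falls inside the limits we asked for
print("Smallest:", min(numbers))
print("Largest:", max(numbers))
```

With this many tries you’ll almost certainly see 0 as the smallest and 10 as the largest, and you’ll never see anything outside that range.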

Let’s Get Our Program Going

Let’s get our random turtle program going so we can add things to it. Make your program look like this:

import turtle
import random

screen = turtle.Screen()
screen.bgcolor("#FFFFE0")

for move in range(100):
    for a_turtle in turtles:
        move_turtle(a_turtle)

If we save and try to run the above program we’ll get errors. Why is this?

Well for a couple of reasons, we have no variable named turtles and the function move_turtle() isn’t defined. Let’s fix that. Like our flower program we want to create a list of turtles, and we’ll need to define our move_turtle() function.

So make your program look like this:

import turtle
import random

screen = turtle.Screen()
screen.bgcolor("#FFFFE0")

def move_turtle(t):
    pass

turtles = []

for move in range(100):
    for a_turtle in turtles:
        move_turtle(a_turtle)

Now when we save and run our program it doesn’t crash with an error, but it doesn’t do anything other than open a light yellow window.

Why is that? Again, a couple of reasons. We haven’t defined any turtles in our turtles list variable. We’ve also defined our move_turtle() function, but it doesn’t do anything. The pass statement is just a placeholder that lets the program run, but doesn’t provide any functionality. First things first, let’s create our turtles.

Getting a Variable From a Function

In our flower program when we wanted to create our turtles we did so by copying the turtle creation and setup code for every turtle we wanted. Then we put all those turtles into a list we called turtles.

This works fine, but let’s do something clever and create a function to create our turtle for us. And let’s define it so we can set the color of the turtle by passing the color as a parameter to the function. Here’s a function that will do just that:

def create_turtle(color):
    t = turtle.Turtle()
    t.speed(0)
    t.width(3)
    t.shape("turtle")
    t.color(color)
    return t

Notice at the end of the create_turtle(color) definition, the return t statement. What does this do?

This is how to return the turtle we just created for use in the rest of the program. We’ve seen this before when we used the t1 = turtle.Turtle() statement. The turtle.Turtle() function returns a turtle, and that returned turtle is assigned to the variable t1. In our case we’re returning the turtle we created, what we called t, so it can be saved someplace in our program and used later.
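The same idea works with any kind of value, not just turtles. Here is a minimal sketch using a made-up function called double() that returns a number instead of a turtle:

```python
# A made-up example function: it builds a value and hands it back.
def double(number):
    result = number * 2
    return result

# The returned value is saved in the variable answer for later use.
answer = double(21)
print(answer)  # prints 42
```

Without the return statement the function would still do its work, but the result would be thrown away and answer would be set to None.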

Now we have a function that will create a turtle for us that will draw with the color we’ve asked for. But we need to create multiple turtles to put into our turtles list variable.

The create_turtle() function only creates one turtle, so how can we create multiple turtles with it? An easy way to do this is to create another function that uses create_turtle() inside a loop to create our list of turtles. Here’s a function that does that:

def create_turtles(colors):
    turtles = []
    for color in colors:
        t = create_turtle(color)
        turtles.append(t)
    return turtles

Here we’ve created a function create_turtles(colors) (notice the plural on both the name of the function and the parameter, this just helps us be clear what our intent is) that creates a list of turtles. We use this function like this:

colors = ["black", "red", "orange", "green"]
turtles = create_turtles(colors)

In the above code we created a variable colors containing a list of four valid turtle colors. We then passed the list to our create_turtles() function. Inside that function we create an empty turtles list with the turtles = [] statement.

Then we start a for loop that takes one color at a time from the colors list parameter and passes it to our create_turtle() function, which creates a turtle that draws in that color.

We then use the turtles.append(t) statement to add the turtle to our turtles variable. The append() function is part of the functionality associated with lists, and lets us add elements to the end of the list programmatically. At the end of the loop we return our turtles list variable so it can be used later.

If we save and run this program it works, but doesn’t draw anything but the last green turtle on the screen. Remember turtles are all created in the center of the screen, so all four are there, just stacked on top of each other.

Let’s put some code in our move_turtle(t) function to get those turtles moving.

Moving Turtles Randomly

We want our turtles to draw randomly around the screen, so inside the move_turtle(t) function is where we’re going to use the random.randint() function we learned about earlier. We also want to stamp a turtle on the screen with every move, which is where we’ll use our stamp() function. Here’s a function that will turn a turtle a random angle and move it a random distance:

def move_turtle(t):
    t.stamp()
    angle = random.randint(-90, 90)
    t.right(angle)
    distance = random.randint(50, 100)
    t.forward(distance)

This function does a couple of things. First, it expects a turtle as a parameter variable, in the example above that parameter variable is t. The first thing the function does is use our turtle t to stamp() a turtle image on the screen.

It then uses the random.randint() function to create an angle variable set to between -90 and 90 degrees. This allows our turtle to turn left or right some random amount. We pass this random angle variable to our t.right(angle) function to turn our t turtle. A negative angle passed to right() turns the turtle to the left.

We then do a similar thing to create a random distance variable set to between 50 and 100. We use this variable in our t.forward(distance) function call to move our t turtle forward some random distance.

Our Program So Far

Let’s see what we’ve got for our program so far:

import turtle
import random

screen = turtle.Screen()
turtle.setup(1024, 768)
screen.bgcolor("#FFFFE0")

# The number of turtles to create and what color to create them with
colors = ["black", "red", "orange", "green"]

# Create a new turtle with a certain color
def create_turtle(color):
    t = turtle.Turtle()
    t.speed(0)
    t.width(3)
    t.shape("turtle")
    t.color(color)
    return t

# Create a list of turtles from a list of colors
def create_turtles(colors):
    turtles = []
    for color in colors:
        t = create_turtle(color)
        turtles.append(t)
    return turtles

def move_turtle(t):
    t.stamp()
    angle = random.randint(-90, 90)
    t.right(angle)
    distance = random.randint(50, 100)
    t.forward(distance)

turtles = create_turtles(colors)

for move in range(100):
    for a_turtle in turtles:
        move_turtle(a_turtle)

If you save and run our program it will generate a screen that looks something like this:

You probably noticed that your turtles might have wandered off the screen, sometimes never to return. How can we keep our turtles on the screen so we can see what they’re drawing?

We have them make decisions so they know how to turn around if they go off the screen. This is where we use something called conditionals in programming, a way of making a decision based on a condition that is happening in our program.

Conditionals and “if” Statements

As we briefly talked about in our first class, the way to make programs act smarter is to have them make decisions. To do this we use something called conditionals.

Conditionals are just a way for a program to look at something (a condition) and make a decision to do something or something else. For instance, here’s some possible Python conditional program statements:

if x < -250 or x > 250:
    outside_box = True

Here’s what’s happening in these Python statements:

Use the if statement to test whether the variable x is less than negative 250, or greater than positive 250
If x is outside those two values, set the variable outside_box to Boolean True
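To watch this conditional make decisions, here is a small sketch that tries it on a handful of sample values for x. The list of sample values and the results list are made up for this example:

```python
results = []

# Try the conditional on a few sample values of x.
for x in [-300, -250, 0, 250, 251]:
    outside_box = False
    if x < -250 or x > 250:
        outside_box = True
    results.append(outside_box)
    print(x, outside_box)
```

Only -300 and 251 end up with outside_box set to True. The values -250 and 250 sit exactly on the limits, and because the test uses < and > (not <= and >=) they still count as inside.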

How can we use conditionals to keep our turtles inside a viewable area? First off let’s make our viewable area a box that’s inside our screen so we can see what our turtles do when they go outside that box.

In our program we’ll create a variable box_size equal to the size of the box we want to make our viewable area, let’s say 500. We’ll also use one of our turtles to draw this viewable box on the screen so we can see the box edges.

Let’s make our program look like this:

import turtle
import random

screen = turtle.Screen()
turtle.setup(1024, 768)
screen.bgcolor("#FFFFE0")

colors = ["black", "red", "orange", "green"]

box_size = 500

def create_turtle(color):
    t = turtle.Turtle()
    t.speed(0)
    t.width(3)
    t.shape("turtle")
    t.color(color)
    return t

def create_turtles(colors):
    turtles = []
    for color in colors:
        t = create_turtle(color)
        turtles.append(t)
    return turtles

def move_turtle(t):
    t.stamp()
    angle = random.randint(-90, 90)
    t.right(angle)
    distance = random.randint(50, 100)
    t.forward(distance)

turtles = create_turtles(colors)

t1 = turtles[0]
t1.penup()
t1.goto(box_size / 2, box_size / 2)
t1.pendown()

for side in range(4):
    t1.right(90)
    t1.forward(box_size)

t1.penup()
t1.goto(0, 0)
t1.pendown()

for move in range(100):
    for a_turtle in turtles:
        move_turtle(a_turtle)

Right under where we create our colors list we’ve created the box_size variable and set it equal to 500. Further down under where we created our turtles list variable, we’ve used the first turtle from the list, t1 = turtles[0], to draw our viewable boundary box. After we’re done drawing the box the turtle is moved back to its starting position.

So how do we use a conditional to keep our turtles inside the box we’ve just drawn? First things first, we need to know where the turtle is in order to figure out if it’s outside the boundary box. To do this we need another turtle function.

Where’s My Turtle: xcor() and ycor()

A turtle has two functions telling us where it is in relation to the home position, (0, 0). Those functions are called xcor() and ycor(), which are short for x coordinate and y coordinate. They are used like this:

x = t.xcor()
y = t.ycor()

As you might have guessed, the t.xcor() function returns the current x coordinate of the turtle t, and t.ycor() returns the current y coordinate of the turtle.

Now we have enough information to decide if a turtle is inside or outside our boundary box. We know where the edges of the boundary box are in relation to where we started drawing it, plus and minus 250 pixels in relation to the starting position of the turtles, (0, 0). We also can figure out where our turtles are any time we want, which we can compare to the boundary box edges.

Let’s create a function that returns True if the turtle is outside the box and False otherwise. The function will need the turtle to test and information about the box. That function looks like this:

def is_turtle_outside_box(t, size):
    outside_box = False
    x = t.xcor()
    y = t.ycor()
    if x < (size / -2) or x > (size / 2):
        outside_box = True
    if y < (size / -2) or y > (size / 2):
        outside_box = True
    return outside_box

This function expects a turtle to be passed as the first parameter and a number for the size of the boundary box as the second parameter. It then sets the return variable outside_box initially to False. It then creates the x and y variables, setting them to the x and y coordinates of the passed in turtle t respectively. Then, using two if statements, it compares the x and y variables to minus and plus the size divided by 2.

Why is the size divided by 2? Because my intention is to pass the box_size variable to this function, and the boundary box is centered on the screen, with half (250 pixels) on each side of that.
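If you want to check the boundary logic without opening a turtle window, one option is to test it with a stand-in object. The FakeTurtle class below is not part of the turtle module; it is a made-up helper for this sketch that only remembers a position and answers xcor() and ycor() the way a real turtle would:

```python
# Hypothetical stand-in for a real turtle: it only remembers a position.
class FakeTurtle:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def xcor(self):
        return self.x

    def ycor(self):
        return self.y

# The boundary test: True if the position is outside a centered box.
def is_turtle_outside_box(t, size):
    outside_box = False
    x = t.xcor()
    y = t.ycor()
    if x < (size / -2) or x > (size / 2):
        outside_box = True
    if y < (size / -2) or y > (size / 2):
        outside_box = True
    return outside_box

print(is_turtle_outside_box(FakeTurtle(0, 0), 500))      # False
print(is_turtle_outside_box(FakeTurtle(300, 0), 500))    # True
print(is_turtle_outside_box(FakeTurtle(0, -251), 500))   # True
print(is_turtle_outside_box(FakeTurtle(250, 250), 500))  # False
```

A turtle sitting exactly on the edge of the box (250, 250) still counts as inside.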

Now that we have this function, how can we use it? Inside our inner most loop we move our turtle, at which point it might be outside the boundary box, so this seems like a good place to use our is_turtle_outside_box() function. Here’s just the looping portion of our current program showing the inclusion of the new function:

for move in range(100):
    for a_turtle in turtles:
        move_turtle(a_turtle)
        if is_turtle_outside_box(a_turtle, box_size) == True:
            a_turtle.right(180)
            a_turtle.forward(100)

What we’ve done is, after our move_turtle() function call, add an if statement using our is_turtle_outside_box() function to figure out if our turtle a_turtle is outside the boundary box. If the return value of is_turtle_outside_box() is True, we turn a_turtle around 180 degrees from where it’s currently facing and move it 100 pixels back toward the inside of the boundary box. Then the loop moves on to the next turtle and, once every turtle has moved, to the next move for all turtles.

Here’s our completed program with comments:

import turtle
import random

# Change the color of the background
screen = turtle.Screen()
screen.bgcolor("#FFFFE0")

# The number of turtles to create and what color to create them with
colors = ["black", "red", "orange", "green"]

# Size of our box
box_size = 500

# Create a new turtle with a certain color
def create_turtle(color):
    t = turtle.Turtle()
    t.speed(0)
    t.width(3)
    t.shape("turtle")
    t.color(color)
    return t

# Create a list of turtles from a list of colors
def create_turtles(colors):
    turtles = []
    for color in colors:
        t = create_turtle(color)
        turtles.append(t)
    return turtles


# Stamp and move the turtle
def move_turtle(t):
    t.stamp()
    angle = random.randint(-90, 90)
    t.right(angle)
    distance = random.randint(50, 100)
    t.forward(distance)

# Is the turtle outside the box?
def is_turtle_outside_box(t, size):
    outside_box = False
    x = t.xcor()
    y = t.ycor()
    if x < (size / -2)  or x > (size / 2):
        outside_box = True
    if y < (size / -2) or y > (size / 2):
        outside_box = True
    return outside_box

# Create our list of turtles
turtles = create_turtles(colors)

# Use the first turtle to draw our boundary box
t1 = turtles[0]
t1.penup()
t1.goto(box_size / 2, box_size / 2)
t1.pendown()

for side in range(4):
    t1.right(90)
    t1.forward(box_size)

t1.penup()
t1.goto(0, 0)
t1.pendown()

# Move all the turtles a hundred times
for move in range(100):

    # Move a particular turtle from our list of turtles
    for a_turtle in turtles:
        move_turtle(a_turtle)

        # Is the turtle outside the boundary box?
        if is_turtle_outside_box(a_turtle, box_size) == True:

            # Turn the turtle around and move it back
            a_turtle.right(180)
            a_turtle.forward(100)

When we run our program the screen should look something like this:

Conclusion

You’re all getting to be real Python programmers now! You’ve created a program that draws with turtles, and makes decisions based on where those turtles are, very, very cool!

In the fourth (and final) class in this series you’ll learn how to make your Python programs interactive by letting them react to user input:

Let’s Program with Python: Reacting to User Input (Part 4)


Queues in Python

Tue, 02 May 2017 00:00:00 GMT

Queues in Python

How to implement a FIFO queue data structure in Python using only built-in data types and classes from the standard library.

A queue is a collection of objects that supports fast first-in, first-out (FIFO) semantics for inserts and deletes. The insert and delete operations are sometimes called enqueue and dequeue. Unlike lists or arrays, queues typically don’t allow for random access to the objects they contain.

Here’s a real-world analogy for a first-in, first-out queue:

Imagine a line of Pythonistas waiting to pick up their conference badges on day one of PyCon registration. New additions to the line are made to the back of the queue as new people enter the conference venue and “queue up” to receive their badges. Removal (serving) happens in the front of the queue, as developers receive their badges and conference swag bags and leave the queue.

Another way to memorize the characteristics of a queue data structure is to think of it as a pipe:

New items (water molecules, ping-pong balls, …) are put in at one end and travel to the other where you or someone else removes them again. While the items are in the queue, a solid metal pipe, you can’t get at them. The only way to interact with the items in the queue is to add new items at the back (enqueue) or to remove items at the front (dequeue) of the pipe.

Queues are similar to stacks and the difference between them is in removing items:

With a queue you remove the item least recently added (first-in, first-out or FIFO); and with a stack you remove the item most recently added (last-in, first-out or LIFO).

Performance-wise, a proper queue implementation is expected to take O(1) time for insert and delete operations. These are the two main operations performed on a queue and they should be fast in a correct implementation.

Queues have a wide range of applications in algorithms, and they help solve scheduling and parallel programming problems. A short and beautiful algorithm using a queue is breadth-first search (BFS) on a tree or graph data structure.
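As an illustration, here is a minimal BFS sketch that uses collections.deque from the standard library as its FIFO queue. The function name bfs and the small example graph (a dict mapping each node to a list of its neighbors) are assumptions made up for this example:

```python
from collections import deque

def bfs(graph, start):
    """Return the nodes reachable from start in breadth-first order."""
    seen = {start}
    order = []
    q = deque([start])
    while q:
        # Dequeue the node that has been waiting the longest.
        node = q.popleft()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                # Enqueue newly discovered nodes at the back.
                q.append(neighbor)
    return order

graph = {
    'a': ['b', 'c'],
    'b': ['d'],
    'c': ['d'],
    'd': [],
}

print(bfs(graph, 'a'))  # ['a', 'b', 'c', 'd']
```

Because nodes leave the queue in the order they entered it, every node one step away from the start is visited before any node two steps away.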

Scheduling algorithms often use priority queues internally. These are specialized queues: instead of retrieving the next element by insertion time, a priority queue retrieves the highest-priority element. The priority of individual elements is decided by the queue based on the ordering applied to their keys.

A regular queue, however, won’t re-order the items it carries. You get what you put in, and in exactly that order (remember the pipe example?)
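The difference can be sketched with two standard library tools: collections.deque as a plain FIFO queue and the heapq module as a simple priority queue. The task tuples are made up for this example, and the sketch assumes that a lower number means a higher priority (heapq always pops the smallest item first):

```python
import heapq
from collections import deque

# (priority, name) pairs, listed in insertion order.
tasks = [(2, 'code'), (1, 'eat'), (3, 'sleep')]

# FIFO queue: items come out in the order they went in.
fifo = deque(tasks)
fifo_order = []
while fifo:
    fifo_order.append(fifo.popleft()[1])

# Priority queue: the item with the smallest priority number comes out first.
heap = []
for task in tasks:
    heapq.heappush(heap, task)
priority_order = []
while heap:
    priority_order.append(heapq.heappop(heap)[1])

print(fifo_order)      # ['code', 'eat', 'sleep']
print(priority_order)  # ['eat', 'code', 'sleep']
```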

Python ships with several queue implementations that each have slightly different characteristics. Let’s take a look at them:

⛔ The list Built-in

It’s possible to use a regular list as a queue but this is not ideal from a performance perspective. Lists are quite slow for this purpose because inserting or deleting an element at the beginning requires shifting all of the other elements by one, requiring O(n) time.

Therefore I would not recommend you use a list as a makeshift queue in Python (unless you’re dealing with a small number of elements only).

# How to use Python's list as a FIFO queue:

q = []

q.append('eat')
q.append('sleep')
q.append('code')

>>> q
['eat', 'sleep', 'code']

# Careful: This is slow!
>>> q.pop(0)
'eat'

✅ The collections.deque Class

The deque class implements a double-ended queue that supports adding and removing elements from either end in O(1) time.

Python’s deque objects are implemented as doubly-linked lists which gives them excellent performance for enqueuing and dequeuing elements, but poor O(n) performance for randomly accessing elements in the middle of the queue.

Because deques support adding and removing elements from either end equally well, they can serve both as queues and as stacks.

collections.deque is a great default choice if you’re looking for a queue data structure in Python’s standard library.

# How to use collections.deque as a FIFO queue:

from collections import deque
q = deque()

q.append('eat')
q.append('sleep')
q.append('code')

>>> q
deque(['eat', 'sleep', 'code'])

>>> q.popleft()
'eat'
>>> q.popleft()
'sleep'
>>> q.popleft()
'code'

>>> q.popleft()
IndexError: "pop from an empty deque"
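To get a feel for the performance difference between list.pop(0) and deque.popleft(), here is a rough timing sketch using the standard library’s timeit module. The helper names drain_list and drain_deque and the queue size n are assumptions for this example, and the exact numbers will vary from machine to machine:

```python
from collections import deque
from timeit import timeit

def drain_list(n):
    # Remove every element from the front of a list: each pop(0) is O(n).
    q = list(range(n))
    while q:
        q.pop(0)

def drain_deque(n):
    # Remove every element from the front of a deque: each popleft() is O(1).
    q = deque(range(n))
    while q:
        q.popleft()

n = 20000
list_seconds = timeit(lambda: drain_list(n), number=1)
deque_seconds = timeit(lambda: drain_deque(n), number=1)

print(f"list.pop(0):     {list_seconds:.4f} s")
print(f"deque.popleft(): {deque_seconds:.4f} s")
```

The gap typically widens as n grows, because the total work for the list grows quadratically while the deque’s grows linearly.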

✅ The queue.Queue Class

This queue implementation in the Python standard library is synchronized and provides locking semantics to support multiple concurrent producers and consumers.

The queue module contains several other classes implementing multi-producer, multi-consumer queues that are useful for parallel computing.

Depending on your use case the locking semantics might be helpful, or they might just incur unneeded overhead. In the latter case you’d be better off using collections.deque as a general purpose queue.

# How to use queue.Queue as a FIFO queue:

from queue import Queue
q = Queue()

q.put('eat')
q.put('sleep')
q.put('code')

>>> q
<queue.Queue object at 0x1070f5b38>

>>> q.get()
'eat'
>>> q.get()
'sleep'
>>> q.get()
'code'

>>> q.get_nowait()
queue.Empty

>>> q.get()
# Blocks / waits forever...
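Here is a minimal sketch of what the locking semantics are for: one producer (the main thread) and one consumer thread sharing a queue.Queue. The consumer function and the use of None as a "no more work" signal are conventions chosen for this example, not something the queue module requires:

```python
import threading
from queue import Queue

q = Queue()
results = []

def consumer():
    while True:
        # get() blocks until an item is available.
        item = q.get()
        if item is None:
            # Sentinel value: the producer has no more work for us.
            break
        results.append(item.upper())

worker = threading.Thread(target=consumer)
worker.start()

# The main thread acts as the producer.
for item in ['eat', 'sleep', 'code']:
    q.put(item)
q.put(None)

worker.join()
print(results)  # ['EAT', 'SLEEP', 'CODE']
```

Here the blocking behavior of get() is what we want: the consumer simply waits until the producer has put something on the queue.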

✅ The multiprocessing.Queue Class

This is a shared job queue implementation that allows queued items to be processed in parallel by multiple concurrent workers. Process-based parallelization is popular in Python due to the global interpreter lock (GIL).

multiprocessing.Queue is meant for sharing data between processes and can store any pickle-able object.

# How to use multiprocessing.Queue as a FIFO queue:

from multiprocessing import Queue
q = Queue()

q.put('eat')
q.put('sleep')
q.put('code')

>>> q
<multiprocessing.queues.Queue object at 0x1081c12b0>

>>> q.get()
'eat'
>>> q.get()
'sleep'
>>> q.get()
'code'

>>> q.get()
# Blocks / waits forever...

A good default choice: `collections.deque`

If you’re not looking for parallel processing support the implementation offered by collections.deque is an excellent default choice for implementing a FIFO queue data structure in Python.

It provides the performance characteristics you’d expect from a good queue implementation and can also be used as a stack (LIFO Queue).
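A short sketch of both uses side by side. The only difference is which end you remove from: popleft() gives queue (FIFO) behavior and pop() gives stack (LIFO) behavior. The variable names are made up for this example:

```python
from collections import deque

# Used as a queue: add at the right, remove from the left.
q = deque(['eat', 'sleep', 'code'])
first_in = q.popleft()

# Used as a stack: add at the right, remove from the right.
s = deque(['eat', 'sleep', 'code'])
last_in = s.pop()

print(first_in)  # eat
print(last_in)   # code
```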

Read the full “Fundamental Data Structures in Python” article series here. This article is missing something or you found an error? Help a brother out and leave a comment below.


Let’s Program with Python: Functions and Lists (Part 2)

Thu, 27 Apr 2017 00:00:00 GMT

Let’s Program with Python: Functions and Lists (Part 2)

In part two of this four-part Python introduction you’ll see how to write reusable “code building blocks” in your Python programs with functions.

Part 1: Statements, Variables, and Loops
Part 2: Functions and Lists (This article)
Part 3: Conditionals and “if” Statements
Part 4: Reacting to User Input

Table of Contents – Part 2

Programmers Are Lazy
Introduction to Functions
New Turtle Drawing Functions
Drawing With Multiple Turtles
Grouping Things With Lists
Conclusion

Programmers Are Lazy

We mentioned this in the last class, but if you’re going to be a programmer, you have to embrace basic laziness. Programmers don’t like to repeat themselves and always look for ways to write less code rather than more to get the same things done.

In our last class we saw how using a for loop could reduce the amount of code we had to write to draw a flower. We used a loop to repeat drawing the “petals” of our flower so we didn’t have to write code for every one.

Let’s learn about another tool we can put in our programmer’s toolbelt called functions.

Introduction to Functions

Functions allow us to use the same set of Python statements over and over again, and even change what the Python code does without having to change the code. We’ve already used functions in the previous session in our turtle program. We used the range() function as part of a for loop.

The range() function is built into Python, but what does it do?

It generates a range of numbers we can use inside a for loop, as simple as that. Let’s start Idle, get into interactive mode and enter this at the Python command prompt:

>>> range(10)
range(0, 10)

The range(10) function created something that will generate a count from 0 to 9 (that’s 10 numbers in total). Notice we told the range() function how big the range we wanted was by passing 10 as the parameter of the function.

Using this in a for loop shows the values generated by range(10):

>>> for x in range(10):
...     print(x)
0
1
2
3
4
5
6
7
8
9

What we’ve done is:

Create a for loop that’s going to assign the range of values generated one at a time to the variable x.
Then inside the loop we’re just printing the latest value of x.

You’ll notice that the value of x goes from 0 to 9, not 10 as you might expect. There are still ten values, but because Python is zero based (starts things at zero, unless told otherwise), the range(10) function goes from 0 → 9.
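Here is a small sketch of the "unless told otherwise" part: range() also accepts a starting value as a first parameter. Wrapping the range in list() is just a convenient way to see all the values at once, and the variable names are made up for this example:

```python
# One parameter: start at 0 and stop before 10.
zero_based = list(range(10))
print(zero_based)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Two parameters: start at 1 and stop before 11.
one_based = list(range(1, 11))
print(one_based)   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Both ranges contain ten values; only the starting point differs.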

In our flower drawing turtle program we called range() like this:

>>> range(36)
range(0, 36)

This generated a range of 36 values, from 0 to 35. These two examples demonstrate we are changing what the range() function does based on the value we give to it.

The value we give to the range() function is called a parameter, and the value of that parameter is used to change what the range() function does. In the examples above the parameter tells the range() function how many numbers to generate and gives back to our program a way to use them.

We’ve also used functions when we were working with our turtle. For example when I changed the color of my turtle t, with the color() function, like this:

>>> t.color("yellow", "red")

I was calling the color() function of the turtle variable t, and passed it two parameters, "yellow" and "red":

The "yellow" parameter changed the color of the t turtle and the color it draws with.
The "red" parameter changed the color the turtle used when filling a shape.

Flower Drawing Using Functions

Okay, so it’s great Python provides a bunch of functions we can use to do different things, how do functions help me be lazy?

Well, Python also lets us create our own functions and use them just like we would any built in function.

In Idle let’s open our turtle program code from last class and try something out. Modify your program to look like this:

import turtle

t1 = turtle.Turtle()
t1.shape("turtle")
t1.speed(0)
t1.color("yellow", "red")
t1.width(3)

def draw_box(t):
    t.begin_fill()
    t.forward(100)
    t.right(90)
    t.forward(100)
    t.right(90)
    t.forward(100)
    t.right(90)
    t.forward(100)
    t.right(90)
    t.end_fill()

for petal in range(36):
    draw_box(t1)
    t1.right(10)

Save and run our program and it should create our flower exactly as it did before. You’re probably thinking “what’s the big deal, it did exactly the same thing”, and you’d be right!

Notice I renamed our turtle variable from t to t1. Why did I do this?

I’m getting ready to draw with two turtles at the same time (coming soon to a lesson near you!). Notice also the function I’ve defined, draw_box, has a t in between the parentheses. Even though my turtle variable is defined as t1, I’m using a variable called t inside the draw_box function.

The draw_box function is defined by beginning the program line with the Python keyword def, followed by a name of our choosing, parentheses (containing any parameters) and finally a colon character ‘:’.

Just like the range(36) function, where I pass it a value of 36 so it generates 36 numbers, here I’m passing a parameter I’m calling t, and it’s using it to draw with.

Inside my for loop notice I’m calling draw_box with my newly renamed t1 variable. This is because the variable name passed to a function as a parameter has nothing to do with the variable name inside the function when it’s defined.
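The same thing can be seen without a turtle. In this minimal sketch the function shout() and the variables greeting and farewell are made up for the example; the function only knows its parameter as word, no matter what the variable we pass in is called:

```python
# Inside the function the value is always called word.
def shout(word):
    return word.upper() + "!"

greeting = "hello"
farewell = "bye"

# Outside the function our variables have completely different names.
print(shout(greeting))  # HELLO!
print(shout(farewell))  # BYE!
```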

Notice also that all the drawing code in the draw_box function is indented. Just like the for loop this indicates these Python statements are part of the function definition for draw_box().

When our program runs the for loop calls our draw_box function 36 times, and each time it turns our turtle (t1) 10 degrees to the right.

New Turtle Drawing Functions

We’re getting ready to draw multiple flowers with multiple turtles. To do that and have them look good on the screen we’ll learn some more turtle drawing functions.

Turtle Pen Up: penup()

We can move our turtle without drawing a line by lifting our pen up. In this way we can move the turtle and no line will be drawn. To do this we use the turtle penup() function. It looks like this:

t1.penup()

Turtle Pen Down: pendown()

Once we’ve moved our turtle where we want it to be without drawing a line, we need to put the pen down again, and the turtle system provides this. We use the pendown() function. It looks like this:

t1.pendown()

Turtle Goto: goto()

We can move our turtle to a specific position on the screen using the goto() function. We pass x and y coordinates to the goto() function to position our turtle. One thing to be aware of is the 0, 0 coordinates are where our turtle is created (center of the screen) when we did this t1 = turtle.Turtle().

So the coordinates we pass to goto() are relative to that starting position. The goto() function looks like this to move our turtle up and to the right:

t1.goto(150, 150)

Let’s update our program and move our t1 turtle up and to the right a bit just to see how these new drawing functions work. Make your flower program look like this:

import turtle

t1 = turtle.Turtle()
t1.shape("turtle")
t1.speed(0)
t1.width(3)
t1.color("yellow", "red")

t1.penup()
t1.goto(150, 150)
t1.pendown()

def draw_box(t):
    t.begin_fill()
    t.forward(100)
    t.right(90)
    t.forward(100)
    t.right(90)
    t.forward(100)
    t.right(90)
    t.forward(100)
    t.right(90)
    t.end_fill()

for petal in range(36):
    draw_box(t1)
    t1.right(10)

Save and run your program and you should see your flower, but it’s offset up and to the right side of the screen by 150 pixels. Those are the offsets we passed as the first and second parameter to the t1.goto(150, 150) function call.

Drawing With Multiple Turtles

We want to draw with multiple turtles, and our goal for this class is to create this image:

So far our flower drawing program is working pretty well, but can we change it even more to draw two, or perhaps more, flowers at once?

Sure we can, we’re programmers! In order to use two turtles we’ll have to create a second turtle. I’m going to call the second turtle t2 just to stay consistent. Add this to your program right below where we created our first turtle t1:

t2 = turtle.Turtle()
t2.shape("turtle")
t2.color("blue", "orange")
t2.speed(0)
t2.width(3)

This creates a second turtle with a different variable name, drawing color and fill color. When we create a turtle its starting position is right in the center of the screen, so our second turtle starts out right in the middle of the screen.

Let’s move it left and down so t1 and t2 don’t draw on top of each other. Add these lines for turtle t2 under the same lines for t1:

t2.penup()
t2.goto(-150, -150)
t2.pendown()

Houston We Have a Problem

At this point our program should look like this:

import turtle

t1 = turtle.Turtle()
t1.shape("turtle")
t1.speed(0)
t1.width(3)
t1.color("yellow", "red")

t2 = turtle.Turtle()
t2.shape("turtle")
t2.speed(0)
t2.width(3)
t2.color("blue", "orange")

t1.penup()
t1.goto(150, 150)
t1.pendown()

t2.penup()
t2.goto(-150, -150)
t2.pendown()

def draw_box(t):
    t.begin_fill()
    t.forward(100)
    t.right(90)
    t.forward(100)
    t.right(90)
    t.forward(100)
    t.right(90)
    t.forward(100)
    t.right(90)
    t.end_fill()

for petal in range(36):
    draw_box(t1)
    t1.right(10)

If we save our program and run it our turtle screen looks like this:

Where’s The Second Flower?

When you get your program running you’ll notice the second turtle didn’t draw a flower. Why not? Well, we didn’t tell it to draw anything, so it just waited around while the first turtle drew a flower.

How do we get it to draw its own flower? We add it to the for loop. Our updated program now looks like this:

import turtle

t1 = turtle.Turtle()
t1.shape("turtle")
t1.speed(0)
t1.width(3)
t1.color("yellow", "red")

t2 = turtle.Turtle()
t2.shape("turtle")
t2.speed(0)
t2.width(3)
t2.color("blue", "orange")

t1.penup()
t1.goto(150, 150)
t1.pendown()

t2.penup()
t2.goto(-150, -150)
t2.pendown()

def draw_box(t):
    t.begin_fill()
    t.forward(100)
    t.right(90)
    t.forward(100)
    t.right(90)
    t.forward(100)
    t.right(90)
    t.forward(100)
    t.right(90)
    t.end_fill()

for petal in range(36):
    draw_box(t1)
    t1.right(10)

    draw_box(t2)
    t2.right(10)

Just by adding two lines we got our second turtle t2 to draw its own complete flower. This is a definite win for laziness. All we had to do was add a couple Python statements to draw a complete second flower!

By setting things up and using a function we are able to build more and more interesting programs. In fact we could keep going and add more and more turtles to fill the screen with flowers and all we’d have to do is create more turtles and add statements to our for loop.

But this is starting to look like when we were adding flower petals to start with. Can we be even lazier and organize things differently to handle multiple turtles better? Yes of course, we can use something Python calls lists.

Grouping Things With Lists

Lists are a way of grouping things together so we can work with them all at once, and of giving that group a name. There’s nothing magical about this; we can create lists easily with Python. If we enter these statements in the interactive window:

>>> my_list = [4, 2, 3, 0]
>>> print(my_list)
[4, 2, 3, 0]

We created a variable we called my_list containing the list [4, 2, 3, 0].

You can see the things in the list don’t have to be in order. Lists are created by surrounding a set of things separated by commas with the [ and ] characters at either end.
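Here’s a small sketch of a few other things we can do with a list. The names colors, first_color and how_many are made up for this example and aren’t part of our turtle program:

```python
# A list can hold any kind of thing, for example color names
colors = ["yellow", "red", "blue"]

# Python counts positions in a list starting at 0
first_color = colors[0]

# len() tells us how many things are in the list
how_many = len(colors)

# append() adds one more thing to the end of the list
colors.append("orange")

print(first_color)  # yellow
print(how_many)     # 3
print(colors)       # ['yellow', 'red', 'blue', 'orange']
```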

We can use a list to organize our turtles. Let’s create a list of turtles like this in our program:

turtles = [t1, t2]

This creates a variable called turtles that is a list containing our two turtles. Now we can create a new for loop that gets a turtle from our turtles list one at a time and draws with it. We do this with these Python statements:

for a_turtle in turtles:
    draw_box(a_turtle)
    a_turtle.right(10)

We’re using a for loop to get each turtle one at a time from our turtles list, assigning it to the variable a_turtle and calling draw_box(a_turtle) and a_turtle.right(10) with that variable.

If we put this inside our main for loop, it will be called for each petal the main for loop wants to draw.
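To see the order in which the two loops run, here’s a small sketch that uses plain names instead of real turtles, so it runs without opening a drawing window. The steps list and the petal count of 2 are just assumptions to keep the example short:

```python
# Stand-ins for our turtles so we can watch the loop order
turtles = ["t1", "t2"]

# Keep a record of every box that would be drawn
steps = []

# Outer loop: runs once per petal (only 2 petals here)
for petal in range(2):

    # Inner loop: runs once per turtle, for every petal
    for a_turtle in turtles:
        steps.append((petal, a_turtle))

for step in steps:
    print(step)
```

Each turtle gets a turn for petal 0 before any turtle moves on to petal 1, which is why the flowers appear to grow at the same time.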

We can now add a third turtle easily by creating a new turtle and adding it to the turtles list.

Let’s do that in our updated, three turtle program. I’ve added comments to describe what’s going on:

import turtle

# Create our t1 turtle
t1 = turtle.Turtle()
t1.shape("turtle")
t1.speed(0)
t1.width(3)
t1.color("yellow", "red")

# Create our t2 turtle
t2 = turtle.Turtle()
t2.shape("turtle")
t2.speed(0)
t2.width(3)
t2.color("blue", "orange")

# Create our t3 turtle
t3 = turtle.Turtle()
t3.shape("turtle")
t3.speed(0)
t3.width(3)
t3.color("red", "blue")

# Move t1 to its starting position
t1.penup()
t1.goto(150, 150)
t1.pendown()

# Move t2 to its starting position
t2.penup()
t2.goto(-150, -150)
t2.pendown()

# Move t3 to its starting position
t3.penup()
t3.goto(-150, 150)
t3.pendown()

# Define our draw_box function
def draw_box(t):
    t.begin_fill()
    t.forward(100)
    t.right(90)
    t.forward(100)
    t.right(90)
    t.forward(100)
    t.right(90)
    t.forward(100)
    t.right(90)
    t.end_fill()

# Create our list of turtles
turtles = [t1, t2, t3]

# Create our for loop for 36 petals of the flower
for petal in range(36):

    # Create our for loop to draw a flower petal with
    # each turtle in the turtles list
    for a_turtle in turtles:

        # Draw and rotate each turtle
        draw_box(a_turtle)
        a_turtle.right(10)

I created a third turtle called t3 and just added t3 to the turtles list. Notice that our main for loop didn’t change. As far as it’s concerned, it’s just looping 36 times.

The inner for loop is responsible for calling the draw_box() function with each turtle variable, and then turning that turtle right 10 degrees. Here’s what the output of the program looks like:

Conclusion

Congratulations, you’re a multi-turtle genius now! You saw how to use Python lists to help us get multiple turtles drawing on the screen. We could keep adding turtles to our hearts’ content and the program would faithfully make each turtle draw our flower. This worked very well for drawing the well-controlled structure of the flower.

In the next class in this series you’ll teach our program how to make decisions and do things on its own:

Let’s Program with Python: Conditionals and “if” Statements (Part 3)

Sets and Multisets in Python

Tue, 25 Apr 2017 00:00:00 GMT

Sets and Multisets in Python

How to implement mutable and immutable set and multiset (bag) data structures in Python using built-in data types and classes from the standard library.

A set is an unordered collection of objects that does not allow duplicate elements. Typically, sets are used to quickly test a value for membership in the set, to insert new values into or delete existing values from a set, and to compute the union or intersection of two sets.

In a “proper” set implementation, membership tests are expected to run in O(1) time. Union, intersection, difference, and subset operations should take O(n) time on average. The set implementations included in Python’s standard library follow these performance characteristics.

Just like dictionaries, sets get special treatment in Python and have some syntactic sugar that makes it easier to create sets. For example, the curly-braces set expression syntax and set comprehensions allow you to conveniently define new set instances:

vowels = {'a', 'e', 'i', 'o', 'u'}
squares = {x * x for x in range(10)}

Careful: To create an empty set you’ll need to call the set() constructor, as using empty curly-braces ({}) is ambiguous and will create a dictionary instead.
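A quick sketch to confirm this behavior; the variable names are only for illustration:

```python
empty_dict = {}      # empty curly braces create a dict
empty_set = set()    # the set() constructor creates an empty set

print(type(empty_dict).__name__)  # dict
print(type(empty_set).__name__)   # set
```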

Python and its standard library provide the following set implementations:

✅ The set Built-in

The built-in set implementation in Python. The set type in Python is mutable and allows the dynamic insertion and deletion of elements. Python’s sets are backed by the dict data type and share the same performance characteristics. Any hashable object can be stored in a set.

>>> vowels = {'a', 'e', 'i', 'o', 'u'}
>>> 'e' in vowels
True

>>> letters = set('alice')
>>> letters.intersection(vowels)
{'a', 'e', 'i'}

>>> vowels.add('x')
>>> vowels
{'i', 'a', 'u', 'o', 'x', 'e'}

>>> len(vowels)
6
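The other operations mentioned earlier work the same way. Here’s a minimal sketch using the operator forms. The evens and small sets are made up for this example, and the results are sorted before printing only because sets have no defined order:

```python
evens = {0, 2, 4, 6, 8}
small = {0, 1, 2, 3}

# Each of these operators builds a new set
union = evens | small
intersection = evens & small
difference = evens - small

# Subset test: is every element of {0, 2} also in evens?
is_subset = {0, 2} <= evens

# discard() removes an element and is a no-op if it is missing
evens.discard(8)
evens.discard(100)

print(sorted(union))         # [0, 1, 2, 3, 4, 6, 8]
print(sorted(intersection))  # [0, 2]
print(sorted(difference))    # [4, 6, 8]
print(is_subset)             # True
print(sorted(evens))         # [0, 2, 4, 6]
```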

✅ The frozenset Built-in

An immutable version of set that cannot be changed after it has been constructed. Frozensets are static and only allow query operations on their elements (no inserts or deletions). Because frozensets are static and hashable, they can be used as dictionary keys or as elements of another set.

>>> vowels = frozenset({'a', 'e', 'i', 'o', 'u'})
>>> vowels.add('p')
AttributeError: "'frozenset' object has no attribute 'add'"
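Here’s a short sketch of the hashability point. The team_scores and groups names and their contents are hypothetical, chosen only to show a frozenset used as a dictionary key and as a set element:

```python
# A frozenset is hashable, so it can serve as a dictionary key
team_scores = {
    frozenset({'alice', 'bob'}): 12,
    frozenset({'jack'}): 7,
}

# Element order doesn't matter when looking up the key
score = team_scores[frozenset({'bob', 'alice'})]

# Frozensets can also be elements of another set
groups = {frozenset({1, 2}), frozenset({2, 1}), frozenset({3})}

# A regular (mutable) set is unhashable and can't be used as a key
mutable = {1, 2}
try:
    lookup = {mutable: 'nope'}
except TypeError as err:
    error_message = str(err)

print(score)        # 12
print(len(groups))  # 2, because {1, 2} and {2, 1} are equal
print(error_message)
```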

✅ The collections.Counter Class

The collections.Counter class in the Python standard library implements a multiset (or bag) type that allows elements in the set to have more than one occurrence.

This is useful if you need to keep track not only if an element is part of a set but also how many times it is included in the set.

>>> from collections import Counter
>>> inventory = Counter()

>>> loot = {'sword': 1, 'bread': 3}
>>> inventory.update(loot)
>>> inventory
Counter({'bread': 3, 'sword': 1})

>>> more_loot = {'sword': 1, 'apple': 1}
>>> inventory.update(more_loot)
>>> inventory
Counter({'bread': 3, 'sword': 2, 'apple': 1})

Careful with counting the number of elements in a Counter object. Calling len() returns the number of unique elements in the multiset, whereas the total number of elements must be retrieved slightly differently:

>>> len(inventory)
3  # Unique elements
>>> sum(inventory.values())
6  # Total no. of elements
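Counter objects also support multiset arithmetic. The following is a small sketch with two made-up bags; it shows the +, -, & and | operators that collections.Counter provides:

```python
from collections import Counter

bag_a = Counter({'sword': 2, 'bread': 3})
bag_b = Counter({'sword': 1, 'apple': 1})

combined = bag_a + bag_b    # add the counts together
remaining = bag_a - bag_b   # subtract, dropping counts of zero or less
shared = bag_a & bag_b      # keep the minimum of each count
either = bag_a | bag_b      # keep the maximum of each count

print(combined['sword'])    # 3
print(dict(remaining))      # {'sword': 1, 'bread': 3}
print(dict(shared))         # {'sword': 1}
print(either['apple'])      # 1
print(bag_a['shield'])      # 0, missing elements count as zero
```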

📺🐍 Learn More With This Video Tutorial

I recorded a step-by-step video tutorial to go along with the article. See how sets work in general and how to use them in Python. Watch the video embedded below or on my YouTube channel:

Read the full “Fundamental Data Structures in Python” article series here. This article is missing something or you found an error? Help a brother out and leave a comment below.

Let’s Program with Python: Statements, Variables, and Loops (Part 1)

Thu, 20 Apr 2017 00:00:00 GMT

Let’s Program with Python: Statements, Variables, and Loops (Part 1)

In this four-part introduction for new programmers you’ll learn the basics of programming with Python using step-by-step descriptions and graphical examples.

Looking for the rest of the “Let’s Program with Python” series? Here you go:

Part 1: Statements, Variables, and Loops (This article)
Part 2: Functions and Lists
Part 3: Conditionals and “if” Statements
Part 4: Reacting to User Input

Table of Contents – Part 1

What is Python?
Natural Language vs. Formal Language
Elements of Programming
Enough of That, Let’s Write Some Python!
Statements in Python
Creating Python Program Files
Saving and Running a Python Program
Variables in Python
Let’s Get Back to Drawing!
Loops in Python
Conclusion

What is Python?

Since you’re reading this I’m hoping you’re interested in learning how to program in Python.

Python is a programming language, which means it’s a language both people and computers can understand. A computer language is a formal subset of a natural language, like English. A computer language lets people express what they want a computer to do, and tells a computer how to do it.

A computer program is a set of instructions written in a particular computer language. There are lots of different computer languages in the world. Most were created to solve certain kinds of problems in different ways, and most overlap in the kinds of things they can do.

Python was developed by a Dutch software engineer named Guido van Rossum, who created the language to solve some problems he saw in computer languages of the time.

Python draws from a lot of good ideas in other languages and pulls them together in one place. Python is a pretty easy computer language to learn, and yet is very powerful. The name Python comes from Guido’s favorite comedy group, Monty Python’s Flying Circus.

This course uses Python 3.6.1, but the examples should work with any version of Python 3 and greater.

Natural Language vs. Formal Language

English is a natural language that’s evolved over time to help us talk with each other. It has a big vocabulary, lots of multiple meanings and depends a lot on how it’s used to make the meaning clear.

Natural languages work well for people because we fill in the gaps where needed. This kind of language fails completely for computers because they need exact instructions in order to run. Formal languages (all programming languages) have limited vocabularies and almost no multiple meanings.

Let’s take an English example that’s something like a “program” for a person, how to make scrambled eggs:

1. Place a frying pan on the stove burner
2. Turn the burner to medium
3. Melt butter in the pan
4. Crack two eggs into pan
5. Stir the eggs to cook and scramble them
6. When finished, serve the eggs on a plate

If the steps above are followed in order, someone should be able to make scrambled eggs. This simple set of steps describes how to perform a task. A computer program is very much the same: a set of steps telling a computer how to perform a task.

Elements of Programming

As you learn to program you’ll find you need to do certain things to make the program do what you want: make the computer do something, remember things, do things over and over, and make decisions. Almost all programming languages provide ways to do these four basic things, and they’re known as:

Statements: the things a program can do, like performing calculations, drawing on the screen, etc.
Variables: these are the “things” (information) you want your program to work on and remember
Loops: doing things over and over again very quickly
Conditionals: these are choices a program can make about what to do; this is what makes programs “appear” smart.

We’ll make use of these four things as we go along.

Enough of That, Let’s Write Some Python!

Our goal is to create a Python program that will draw an image on our computer screen. The image we’re going to create looks something like a flower, and we’re going to learn how to use Python to create it. The end results will look like this:

So how do we create a Python program? There are two ways to work with Python: working with it directly, and creating Python program files.

This is where we can use the tool called Idle. Idle is a program that lets you both work with Python directly and create Python program files.

So let’s start Idle. When you installed Python you should also have gotten the Idle program installed. If so, let’s start it up!

Starting Idle should give you a window that looks something like this:

This window provides a Python command prompt (hit return a couple of times if you don’t see it) that allows you to run Python statements line by line.

This is called interactive mode as it allows us to ‘interact’ with Python. The command prompt in interactive mode looks like this:

>>>

It’s at this prompt where you enter Python statements to try things out.

Statements in Python

Statements are the program commands that make the computer do something in a Python program. Statements can be as simple or as complicated as we’d like to make them.

Here are some examples:

>>> print("Hello there")
Hello there

>>> print(12)
12

>>> 12 * 3
36

>>> 12 / 3
4.0

The statements above print out a welcome string and a number and perform some basic math, and Python responds with the result of each one. What we’ve done above is enter some Python statements at our command prompt, and Python ran them.

Creating Python Program Files

Working with interactive mode is great for trying things out with Python. However, we want to create a Python program we can run over and over again without having to re-type it every time.

This is where creating a Python program and saving it as a file is very handy. Python program files are just like any other text file, but usually have the extension “.py”.

We can create a Python program file in Idle by clicking the File → New Window menu item. This opens up a new, empty window that is a simple text editor.

You’ll notice there is no >>> Python command prompt in the window. This is because in the file window we’re not interacting with Python directly, we’re creating a Python program file.

Let’s use our new Python program file to create our first Python program.

The “Turtle” Graphics Module

Python comes with a large library of modules that let us do some interesting things, and one of those modules is called turtle.

The turtle module is a nice tool for drawing graphics on the screen. You can find the turtle graphics documentation here.

The turtle module is based on the idea of a “turtle” on the screen that draws a line as it moves around, as if it had a marker taped to its shell.

In order to use the turtle module we have to “import” it into our Python program. Importing a module adds the features and capabilities of that module to our Python program.

To import the turtle module add this line to our Python program:

import turtle

Drawing With Turtle

Once the turtle module is available to us we can use it to draw things with a turtle. Enter the following lines into our program:

t = turtle.Turtle()
t.shape("turtle")
t.forward(100)

Saving and Running a Python Program

Once you’ve got this entered, let’s run the program. To do that we have to save the file first, which we can do from the File → Save menu selection.

Give our program a name and save it to a directory on the hard disk where you can find it again.

To run the program select Run → Run Module. If your program runs without any errors (an error usually means you have a typo in your program), a window will open up with a turtle shape at the end of a short line.

That window should look something like this:

This is what our program told Python to do: use the turtle module to create a turtle we’re calling t, change its shape to look like a ‘turtle’ and move it forward 100 pixels.

Our turtle, t, is the first variable we’ve created with Python in our program.

Variables in Python

In Python things like our turtle t are represented by variables. Variables let us give a name to something so you and the program can remember it and use it later.

For instance, here’s a variable assignment:

x = 10

This looks a lot like math, and that’s actually where the idea of assigning variables came from.

This simple Python statement assigns the number 10 to a variable called x. The equal sign (=) in the line above creates the variable x and assigns it a value.

In our program we’ve done this by using the turtle module to create a turtle (the Python statement turtle.Turtle()) and assigned the results, a turtle object, to a variable we called t.
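Here’s a tiny sketch of how variables remember values so we can use them later. The variable y is made up for this example:

```python
# Give the name x to the number 10
x = 10

# Use the variable later; Python remembers its value
y = x * 2

# Assigning to x again replaces the old value
x = x + 5

print(x)  # 15
print(y)  # 20
```

Notice that y stays at 20 even after x changes, because y was given its value before x was reassigned.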

Let’s Get Back to Drawing!

Let’s add some more statements to our program to make it draw some more. Let’s make our Python program look like this:

import turtle

t = turtle.Turtle()
t.shape("turtle")
t.forward(100)
t.right(90)
t.forward(100)
t.right(90)
t.forward(100)
t.right(90)
t.color("red")
t.forward(100)
t.right(90)

When you save and run our program the screen your turtle is drawing on should look like this:

So what’s going on here? What we’ve done is given our turtle a set of commands (Python program statements), and it has run them. Here’s what the statements we entered were doing:

Line 1: import the turtle module so our program can use it
Line 3: use the turtle module to create our turtle, t
Line 4: change the shape of our turtle to look like a turtle
Line 5: from where the turtle is, move forward 100 pixels
Line 6: from where the turtle is, turn right 90 degrees, a right angle
Line 7: from where the turtle is, move forward 100 pixels
Line 8: from where the turtle is, turn right 90 degrees, a right angle
Line 9: from where the turtle is, move forward 100 pixels
Line 10: from where the turtle is, turn right 90 degrees, a right angle
Line 11: change the color used by the turtle to red
Line 12: from where the turtle is, move forward 100 pixels
Line 13: from where the turtle is, turn right 90 degrees, a right angle. This brings our turtle back to its original starting position.

These statements made the turtle draw a box with the last side of the box drawn in red. You can see something interesting about drawing with our turtle; what it draws is based on where it is on the screen and which way it’s headed.

Let’s learn some more Python statements to draw with our turtles.

Turtle Speed: To make our turtle draw faster we use the turtle speed() method. To use this we’ll add this statement to our program:

t.speed(0)

The number 0 between the parentheses is called a parameter, which is being given to the turtle’s speed() method, making our turtle draw as fast as it can.

Turtle Line Width: We can make our turtle draw with a thicker line, making it easier to see on screen. We do this with the turtle width() method. We can pass a parameter to the width method, expressing a value in pixels. So for example adding this line to our program makes our turtle draw with a line 3 pixels wide:

t.width(3)

Filling-in Shapes: We can also fill a shape (like our box) with color using two other turtle methods, begin_fill() and end_fill(), and by modifying our t.color() method. If we use these Python statements:

t.color("yellow", "red")
t.begin_fill()
# Draw shape
t.end_fill()

We have told our turtle to draw with “yellow” and fill in any shapes with “red”.

Then we use begin_fill() at the start of drawing a closed shape, draw our shape, and then use end_fill() to fill that shape with “red”.

The following line is a Python comment:

# Draw shape

I’m using the comment here to indicate where our shape drawing code should go. Comments in Python are used to tell us, the programmers, what’s going on, but are ignored by Python.

Putting It All Together

Taking the things we’ve learned from our box drawing, let’s draw something a little more complicated by drawing a sort of flower.

We’ll do this by doing two things: drawing multiple boxes filled with color, and turning the turtle slightly between each one, using the new turtle methods we just learned about.

To draw our flower we’re going to draw multiple boxes, turning each box slightly every time. One way to do that would be to just repeat our box code over and over like this:

import turtle

t = turtle.Turtle()
t.speed(0)
t.color("yellow", "red")
t.width(3)

# Draw our first filled in box
t.begin_fill()
t.forward(100)
t.right(90)
t.forward(100)
t.right(90)
t.forward(100)
t.right(90)
t.forward(100)
t.right(90)
t.end_fill()

t.right(10)

# Draw our second filled in box
t.begin_fill()
t.forward(100)
t.right(90)
t.forward(100)
t.right(90)
t.forward(100)
t.right(90)
t.forward(100)
t.right(90)
t.end_fill()

t.right(10)

# Keep going till you've drawn your flower

This would work fine, but we would have to repeat these statements for as many petals as we want to give our flower.

One thing you should know about being a programmer is we’re very lazy and don’t like to repeat ourselves if we don’t have to.

Can Python help us not repeat ourselves? Yes, it can by letting us use a loop to repeat drawing the box multiple times.

Loops in Python

One of the elements of programming mentioned earlier is being able to create loops. Loops are statements in a programming language that allow us to repeat a set of program statements over and over in a controlled way. In our program we’d like to repeat the statements:

t.begin_fill()
t.forward(100)
t.right(90)
t.forward(100)
t.right(90)
t.forward(100)
t.right(90)
t.forward(100)
t.right(90)
t.end_fill()
t.right(10)

This set of statements creates our outlined and filled in box. We’d like to repeat these statements, with a slight turn of 10 degrees each time, in order to create a flower. Creating a loop lets us do this. One of the looping statements in Python is called a “for loop”, and it’s used to create a loop that repeats a fixed number of times.

Let’s do a little math to figure out how many times we need to repeat this if we want to make a full circle when our turtle is turning right by 10 degrees after every filled in box.

In case you didn’t know already, there are 360 degrees in a full circle. That means dividing a circle of 360 degrees by 10 degrees gives us a value of 36. This means we want to repeat our box 36 times in order to create our flower.
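We can let Python do this bit of math for us. This is just a sketch, and the variable names are made up for the example:

```python
degrees_in_circle = 360
turn_per_box = 10

# The // operator divides and gives back a whole number
boxes_needed = degrees_in_circle // turn_per_box

print(boxes_needed)  # 36
```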

How can we do this with Python?

The “for” loop: We want to repeat our filled in box 36 times, so we know beforehand how many times we want to loop. This is where the Python for loop comes in handy.

It is made for repeating things a known number of times. It looks like this as a Python definition statement:

for <thing> in <list of things>:
    # Everything we want to do on the loop

What does that mean? That’s a kind of formal presentation of what a for loop should look like in Python. What it means is this:

Take one <thing> from the <list of things>
End the for statement with a : character
All the statements indented under the for loop should be run every time through the loop. Indentation is very important to Python
Go back to the start of the for loop, get another <thing> from <list of things>
Keep doing this till there are no more things in <list of things>
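As a tiny sketch of those steps, here’s a for loop that goes through three color names. The names color and count are made up for this example:

```python
count = 0

# Our <list of things> holds three color names
for color in ["red", "yellow", "blue"]:
    # These indented lines run once for each color
    print(color)
    count = count + 1

# This line is not indented, so it runs once, after the loop ends
print(count)  # 3
```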

This sounds a lot harder than it actually is. For our program we’ll use a for loop that looks like this:

for petal in range(36):
    t.begin_fill()
    t.forward(100)
    t.right(90)
    t.forward(100)
    t.right(90)
    t.forward(100)
    t.right(90)
    t.forward(100)
    t.right(90)
    t.end_fill()
    t.right(10)

Our box code is in there, under the for loop, but what’s that strange looking range(36) thing at the end of our for loop where the <list of things> should go?

The range(36) thing is what provides the <list of things> from our formal definition of a for loop. Let’s talk about that.

The “range” function: Let’s jump back over to our Idle interactive window for a minute and enter this statement:

>>> range(36)
range(0, 36)

Python runs this code and prints out what looks like a description of the range itself. What does this mean?

To Python the range(36) function is going to provide 36 things when used inside a for loop. Each time through the for loop Python will take a value from the range defined (0 to 35) and assign it to a variable, in our case that variable is called petal.

It will continue this until there are no values left in the range. The way we’ve set things up it will loop 36 times, which is what we want. In this case we’re not using the petal variable, but it’s required by Python to create a correct for loop.
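If you’d like to see the values a range provides, here’s a small sketch. Wrapping the range in list() is only done here so we can look at all the values at once:

```python
# Turn the range into a list so we can look inside it
numbers = list(range(36))

print(len(numbers))  # 36
print(numbers[0])    # 0
print(numbers[-1])   # 35

# A shorter range, used in a for loop
for petal in range(3):
    print(petal)     # prints 0, then 1, then 2
```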

To create our flower using a for loop make our program look like this:

import turtle

t = turtle.Turtle()
t.speed(0)
t.color("yellow", "red")
t.width(3)

for petal in range(36):
    t.begin_fill()
    t.forward(100)
    t.right(90)
    t.forward(100)
    t.right(90)
    t.forward(100)
    t.right(90)
    t.forward(100)
    t.right(90)
    t.end_fill()
    t.right(10)

Let’s go through this line by line.

Line 1: import our turtle module
Line 3: create our turtle object and use our variable t to keep track of it
Line 4: set our turtle drawing speed to fast
Line 5: tell our turtle to draw in “yellow” and fill in shapes with “red”
Line 6: set our turtle drawing width to 3 pixels
Line 8: begin our for loop and tell it to loop 36 times
Line 9-19: draw our box and then turn slightly right 10 degrees.

Notice how lines 9 through 19 are indented under the for loop. This is important as it tells Python all these lines are part of the for loop. In Python the indentation is required to tell the program a set of statements are part of a block like this.

Once you’ve got this entered, save our program and then run it. If your program runs without any syntax errors (a syntax error usually means you have a typo in your program), you should get a window like this:

Conclusion

Congratulations, you’ve written your first colorful, interesting Python program! In the next class in this series you’ll learn how to write reusable “code building blocks” with functions:

Let’s Program with Python: Functions and Lists (Part 2)

Dictionaries, Maps, and Hash Tables in Python

Tue, 18 Apr 2017 00:00:00 GMT

Dictionaries, Maps, and Hash Tables in Python

Need a dictionary, map, or hash table to implement an algorithm in your Python program? Read on to see how the Python standard library can help you.

In Python, dictionaries (or “dicts”, for short) are a central data structure:

Dicts store an arbitrary number of objects, each identified by a unique dictionary key. Dictionaries are often also called maps, hashmaps, lookup tables, or associative arrays. They allow the efficient lookup, insertion, and deletion of any object associated with a given key.

To give a more practical explanation—phone books are a decent real-world analog for dictionaries:

Phone books allow you to quickly retrieve the information (phone number) associated with a given key (a person’s name). Instead of having to read a phonebook front to back in order to find someone’s number you can jump more or less directly to a name and look up the associated number.

This analogy breaks down somewhat when it comes to how the information is organized to allow for fast lookups. But the fundamental performance characteristics hold:

Dictionaries allow you to quickly find the information associated with a given key.

Python Dictionaries, Hashmaps, and Hash Tables

The dictionary abstract data type is one of the most frequently used and most important data structures in computer science. Because of this importance Python features a robust dictionary implementation as one of its built-in data types (dict).

Python even provides some useful syntactic sugar for working with dictionaries in your programs. For example, the curly-braces dictionary expression syntax ({}) and dictionary comprehensions allow you to conveniently define new dictionaries:

phonebook = {
    'bob': 7387,
    'alice': 3719,
    'jack': 7052,
}

squares = {x: x * x for x in range(10)}

Python’s dictionaries are indexed by keys that can be of any hashable type. A hashable object has a hash value which never changes during its lifetime (see __hash__), and it can be compared to other objects (see __eq__).

In addition, hashable objects which compare equal must have the same hash value. Immutable types like strings and numbers work well as dictionary keys. You can also use tuples as dictionary keys as long as they contain only hashable types themselves.
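A short sketch of these rules in action. The grid dictionary and its contents are hypothetical and only serve to show which key types are accepted:

```python
# Tuples of hashable values work as dictionary keys
grid = {
    (0, 0): 'origin',
    (1, 2): 'treasure',
}
label = grid[(1, 2)]

# Lists are mutable and unhashable, so they are rejected as keys
try:
    grid[[3, 4]] = 'trap'
except TypeError as err:
    error = err

# Objects that compare equal share a hash value and act as the same key
same_hash = hash(1) == hash(1.0)
lookup = {1: 'one'}[1.0]

print(label)      # treasure
print(same_hash)  # True
print(lookup)     # one
```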

✅ Built-in dict type

For most use cases you’ll face, Python’s built-in dictionary implementation will do everything you need. Dictionaries are highly optimized and underlie many parts of the language. For example, class attributes and variables in a stack frame are both stored internally in dictionaries.

Python dictionaries are based on a well-tested and finely tuned hash table implementation that provides the performance characteristics you’d expect: O(1) time complexity for lookup, insert, update, and delete operations in the average case.

There’s little reason to not use the standard dict implementation included with Python. However, specialized third-party dictionary data structures exist, for example skip lists or B-tree based dictionary implementations.

>>> phonebook = {'bob': 7387, 'alice': 3719, 'jack': 7052}
>>> phonebook['alice']
3719
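Here’s a brief sketch of the other basic operations on the same phonebook. The added name and the new numbers are made up for illustration:

```python
phonebook = {'bob': 7387, 'alice': 3719, 'jack': 7052}

phonebook['carol'] = 4120   # insert a new key
phonebook['bob'] = 1000     # update an existing key
del phonebook['jack']       # delete a key

has_alice = 'alice' in phonebook   # membership test on keys
missing = phonebook.get('dave')    # get() returns None for a missing key

print(has_alice)        # True
print(missing)          # None
print(len(phonebook))   # 3
```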

Interestingly, Python ships with a number of specialized dictionary implementations in its standard library. These specialized dictionaries are all based on the built-in dictionary implementation (and share its performance characteristics) but add some convenience features:

✅ collections.OrderedDict – Remember the insertion order of keys

A dictionary subclass that remembers the insertion order of keys added to the collection.

Standard dict instances preserve the insertion order of keys in CPython 3.6, but there this is just a side effect of the CPython implementation; only from Python 3.7 on is it guaranteed by the language spec. If key order is important for your algorithm to work, it’s best to communicate this clearly by using the OrderedDict class.

OrderedDict is not a built-in part of the core language and must be imported from the collections module in the standard library.

>>> import collections
>>> d = collections.OrderedDict(one=1, two=2, three=3)

>>> d
OrderedDict([('one', 1), ('two', 2), ('three', 3)])

>>> d['four'] = 4
>>> d
OrderedDict([('one', 1), ('two', 2), ('three', 3), ('four', 4)])

>>> d.keys()
odict_keys(['one', 'two', 'three', 'four'])
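OrderedDict also has a few order-aware features that a plain dict lacks. The sketch below shows move_to_end(), popitem(last=False), and order-sensitive equality; the variable names are only for illustration:

```python
import collections

d = collections.OrderedDict(one=1, two=2, three=3)

# Move an existing key to the end of the ordering
d.move_to_end('one')
keys_after_move = list(d)

# Pop the oldest item instead of the newest
first_item = d.popitem(last=False)

# Two OrderedDicts are equal only if their order matches too
a = collections.OrderedDict([('x', 1), ('y', 2)])
b = collections.OrderedDict([('y', 2), ('x', 1)])

print(keys_after_move)       # ['two', 'three', 'one']
print(first_item)            # ('two', 2)
print(a == b)                # False
print(dict(a) == dict(b))    # True
```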

✅ collections.defaultdict – Return default values for missing keys

Another dictionary subclass that accepts a default factory (a callable such as list) in its constructor. If a requested key cannot be found in a defaultdict instance, the factory is called to create a default value, which is stored under that key and returned. This can save some typing and make the programmer’s intention clearer compared to using the get() method or catching a KeyError exception in regular dictionaries.

>>> from collections import defaultdict
>>> dd = defaultdict(list)

# Accessing a missing key creates it and initializes it
# using the default factory, i.e. list() in this example:
>>> dd['dogs'].append('Rufus')
>>> dd['dogs'].append('Kathrin')
>>> dd['dogs'].append('Mr Sniffles')

>>> dd['dogs']
['Rufus', 'Kathrin', 'Mr Sniffles']
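To make the comparison with a regular dictionary concrete, here’s a sketch that groups words by their first letter both ways. The word list and variable names are made up for this example:

```python
from collections import defaultdict

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

# Plain dict: we have to handle the missing key by hand
by_letter_plain = {}
for word in words:
    if word[0] not in by_letter_plain:
        by_letter_plain[word[0]] = []
    by_letter_plain[word[0]].append(word)

# defaultdict: the list() factory fills in missing keys for us
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

print(by_letter['a'])                # ['apple', 'avocado']
print(by_letter == by_letter_plain)  # True

# Careful: merely reading a missing key also creates it
empty = by_letter['z']
print(empty)                         # []
print('z' in by_letter)              # True
```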

✅ collections.ChainMap – Search multiple dictionaries as a single mapping

This data structure groups multiple dictionaries into a single mapping. Lookups search the underlying mappings one by one until a key is found. Insertions, updates, and deletions only affect the first mapping added to the chain.

>>> from collections import ChainMap
>>> dict1 = {'one': 1, 'two': 2}
>>> dict2 = {'three': 3, 'four': 4}
>>> chain = ChainMap(dict1, dict2)

>>> chain
ChainMap({'one': 1, 'two': 2}, {'three': 3, 'four': 4})

# ChainMap searches each collection in the chain
# from left to right until it finds the key (or fails):
>>> chain['three']
3
>>> chain['one']
1
>>> chain['missing']
KeyError: 'missing'
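The following sketch shows how writes and deletions only touch the first mapping in the chain. The defaults and overrides dictionaries are hypothetical settings made up for this example:

```python
from collections import ChainMap

defaults = {'color': 'blue', 'size': 'medium'}
overrides = {'color': 'red'}

# overrides comes first, so it is searched first
settings = ChainMap(overrides, defaults)
color_before = settings['color']

# Writes go to the first mapping only
settings['size'] = 'large'

# Deletions also affect only the first mapping, which
# uncovers the value stored further down the chain
del settings['color']
color_after = settings['color']

print(color_before)  # red
print(color_after)   # blue
print(overrides)     # {'size': 'large'}
print(defaults)      # {'color': 'blue', 'size': 'medium'}
```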

✅ types.MappingProxyType – A wrapper for making read-only dictionaries

A wrapper around a standard dictionary that provides a read-only view into the wrapped dictionary’s data. This class was added in Python 3.3 and it can be used to create immutable proxy versions of dictionaries.

>>> from types import MappingProxyType
>>> read_only = MappingProxyType({'one': 1, 'two': 2})

>>> read_only['one']
1
>>> read_only['one'] = 23
TypeError: "'mappingproxy' object does not support item assignment"
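One detail that’s easy to miss: the proxy is a live view, not a frozen copy. As a quick sketch (the variable names are only for illustration), changes made to the wrapped dictionary are visible through the proxy, even though the proxy itself rejects writes.

```python
from types import MappingProxyType

writable = {'one': 1, 'two': 2}
read_only = MappingProxyType(writable)

# Writing through the proxy is rejected...
try:
    read_only['three'] = 3
except TypeError as exc:
    print('rejected:', exc)

# ...but updates to the wrapped dict show up in the proxy.
writable['three'] = 3
print(read_only['three'])  # 3
```

So if you need a truly immutable snapshot, keep the wrapped dictionary private or wrap a copy of it.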

Using Dictionaries in Python: Conclusion

All of the Python hashmap implementations I listed in this tutorial are valid implementations built into the Python standard library.

If you’re looking for a general recommendation on which mapping type to use in your Python programs, then I’d point you to the built-in dict data type. It’s a versatile and optimized dictionary implementation that’s built directly into the core language.

Only if you have special requirements that go beyond what’s provided by dict would I recommend that you use one of the other data types listed here. Yes, I still believe they’re valid options—but usually your code will be clearer and easier to maintain by other developers if it relies on standard Python dictionaries most of the time.

Read the full “Fundamental Data Structures in Python” article series here. This article is missing something or you found an error? Help a brother out and leave a comment below.

[ Improve Your Python with Dan's 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Priority Queues in Python

Wed, 12 Apr 2017 00:00:00 GMT

Priority Queues in Python

What are the various ways you can implement a priority queue in Python? Read on and find out what the Python standard library has to offer.

A priority queue is a container data structure that manages a set of records with totally-ordered keys (for example, a numeric weight value) to provide quick access to the record with the smallest or largest key in the set.

You can think of a priority queue as a modified queue: instead of retrieving the next element by insertion time, it retrieves the highest-priority element. The priority of individual elements is decided by the ordering applied to their keys.

Priority queues are commonly used for dealing with scheduling problems, where tasks with higher urgency should be given precedence.

Take an operating system task scheduler, for example. Ideally, high-priority tasks on the system (e.g. playing a real-time game) should take precedence over lower-priority tasks (e.g. downloading updates in the background). By organizing pending tasks in a priority queue that uses the task urgency as the key, the task scheduler can allow the highest-priority tasks to run first.

Let’s take a look at a few options for how you can implement Priority Queues in Python using built-in data structures or data structures that ship with Python’s standard library. They each have their up- and downsides, but in my mind there’s a clear winner for most common scenarios. But see for yourself:

⛔ Keeping a Manually Sorted List

You can use a sorted list to quickly identify and delete the smallest or largest element. The downside is that inserting new elements into a list is a slow O(n) operation.

While bisect.insort in the standard library finds the insertion point in O(log n) time, the overall cost is always dominated by the slow O(n) insertion step.

Maintaining the order by appending to the list and re-sorting also takes at least O(n log n) time.

Therefore sorted lists are only suitable when there will be few insertions into the priority queue.

q = []

q.append((2, 'code'))
q.append((1, 'eat'))
q.append((3, 'sleep'))

# NOTE: Remember to re-sort every time
#       a new element is inserted, or use
#       bisect.insort().
q.sort(reverse=True)

while q:
    next_item = q.pop()
    print(next_item)

# Result:
#   (1, 'eat')
#   (2, 'code')
#   (3, 'sleep')
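For comparison, here’s a sketch of the bisect.insort variant mentioned in the note above. It keeps the list sorted in ascending order after every insertion, so no re-sorting is needed. Keep in mind that q.pop(0) on a list is itself an O(n) operation.

```python
import bisect

q = []

# insort locates the position by binary search and inserts there,
# so the list stays sorted (ascending) after every call.
bisect.insort(q, (2, 'code'))
bisect.insort(q, (1, 'eat'))
bisect.insort(q, (3, 'sleep'))

results = []
while q:
    # The smallest element sits at the front of the list.
    next_item = q.pop(0)
    results.append(next_item)
    print(next_item)

# Result:
#   (1, 'eat')
#   (2, 'code')
#   (3, 'sleep')
```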

✅ The heapq Module

This is a binary heap implementation usually backed by a plain list and it supports insertion and extraction of the smallest element in O(log n) time.

This module is a good choice for implementing priority queues in Python. Because heapq technically only provides a min-heap implementation, extra steps must be taken to ensure sort stability and other features typically expected from a “practical” priority queue.

import heapq

q = []

heapq.heappush(q, (2, 'code'))
heapq.heappush(q, (1, 'eat'))
heapq.heappush(q, (3, 'sleep'))

while q:
    next_item = heapq.heappop(q)
    print(next_item)

# Result:
#   (1, 'eat')
#   (2, 'code')
#   (3, 'sleep')
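One of the “extra steps” for sort stability can be sketched as follows. It adds an increasing sequence number as a tie-breaker, so entries with equal priority come out in insertion order and the task values themselves never get compared. The push and pop helpers are hypothetical names for this example and not part of heapq.

```python
import heapq
import itertools

q = []
counter = itertools.count()  # increasing sequence numbers: 0, 1, 2, ...

def push(priority, task):
    # The sequence number breaks ties between equal priorities.
    heapq.heappush(q, (priority, next(counter), task))

def pop():
    priority, _, task = heapq.heappop(q)
    return priority, task

push(2, 'code')
push(1, 'eat')
push(2, 'review')
push(1, 'drink')

order = [pop() for _ in range(4)]
print(order)
# [(1, 'eat'), (1, 'drink'), (2, 'code'), (2, 'review')]
```

Without the counter, (1, 'drink') would be popped before (1, 'eat') because the tuples would fall back to comparing the strings.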

✅ The queue.PriorityQueue Class

This priority queue implementation uses heapq internally and shares the same time and space complexities.

The difference is that PriorityQueue is synchronized and provides locking semantics to support multiple concurrent producers and consumers.

Depending on your use case this might be helpful, or just incur unneeded overhead. In any case you might prefer its class-based interface over using the function-based interface provided by heapq.

from queue import PriorityQueue

q = PriorityQueue()

q.put((2, 'code'))
q.put((1, 'eat'))
q.put((3, 'sleep'))

while not q.empty():
    next_item = q.get()
    print(next_item)

# Result:
#   (1, 'eat')
#   (2, 'code')
#   (3, 'sleep')
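To illustrate the locking semantics, here’s a minimal sketch with two producer threads feeding one queue. The producer function and the task tuples are made up for this example. Because put() is synchronized, both threads can safely add items at the same time.

```python
import threading
from queue import PriorityQueue

q = PriorityQueue()

def producer(items):
    for item in items:
        q.put(item)  # put() acquires the queue's internal lock

threads = [
    threading.Thread(target=producer, args=([(2, 'code'), (4, 'rest')],)),
    threading.Thread(target=producer, args=([(1, 'eat'), (3, 'sleep')],)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until both producers are done

results = []
while not q.empty():
    results.append(q.get())

print(results)
# [(1, 'eat'), (2, 'code'), (3, 'sleep'), (4, 'rest')]
```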

A good default choice: `queue.PriorityQueue`

Now which priority queue implementation should you use in your Python programs? They each have slightly different use cases. But in my mind queue.PriorityQueue is a good default choice.

Sure, it might incur some unnecessary locking overhead—but it has a nice object-oriented interface and a name that states its intent clearly.

Read the full “Fundamental Data Structures in Python” article series here. This article is missing something or you found an error? Help a brother out and leave a comment below.


Fundamental Data Structures in Python

Tue, 11 Apr 2017 00:00:00 GMT

Fundamental Data Structures in Python

In this article series we’ll take a tour of some fundamental data structures and implementations of abstract data types (ADTs) available in Python’s standard library.

Data structures are the fundamental constructs around which you build your applications. Each data structure provides a particular way of organizing data so it can be accessed efficiently, depending on the use case at hand.

Python ships with an extensive set of data structures in its standard library. However, due to naming differences, it’s often unclear how even well-known “abstract data types” correspond to a specific implementation in Python.

Other languages like Java stick to a more “computer sciencey” and explicit naming scheme for their standard data structures. For example, a list isn’t just a “list” in Java—it’s either a LinkedList or an ArrayList. This makes it easier to recognize the computational complexity of these types.

Python favors a simpler and more “human” naming scheme. The downside is that to a Python initiate it’s unclear whether the built-in list type is implemented as a linked list or a dynamic array.

My goal with this article series is to clarify how the most common abstract data types map to Python’s naming scheme and to provide a brief description for each. This information will also help you in Python coding interviews.

If you’re looking for a good book to brush up your data structures knowledge I highly recommend Steven S. Skiena’s The Algorithm Design Manual.

It strikes a great balance between teaching you fundamental (and more advanced) data structures and then showing you how to put them to practical use in various algorithms. Steve’s book was a great help in the writing of this series.

Alright, let’s get started. This article serves as the “hub” for the individual data structure tutorials that I’ll link in the list below:

Python Data Structures Tutorials

By the way, I’m always looking to improve these tutorials so if you find an error or would like to suggest an addition—please leave a comment on the article or reach out to me via email or Twitter.


Python Decorators: A Step-By-Step Introduction

Tue, 04 Apr 2017 00:00:00 GMT

Python Decorators: A Step-By-Step Introduction

Understanding decorators is a milestone for any serious Python programmer. Here’s your step-by-step guide to how decorators can help you become a more efficient and productive Python developer.

Python’s decorators allow you to extend and modify the behavior of a callable (functions, methods, and classes) without permanently modifying the callable itself.

Any sufficiently generic functionality you can “tack on” to an existing class or function’s behavior makes a great use case for decoration. This includes:

logging,
enforcing access control and authentication,
instrumentation and timing functions,
rate-limiting,
caching; and more.

Why Should I Master Decorators in Python?

That’s a fair question. After all, what I just mentioned sounded quite abstract and it might be difficult to see how decorators can benefit you in your day-to-day work as a Python developer. Here’s an example:

Imagine you’ve got 30 functions with business logic in your report-generating program. One rainy Monday morning your boss walks up to your desk and says:

“Happy Monday! Remember those TPS reports? I need you to add input/output logging to each step in the report generator. XYZ Corp needs it for auditing purposes. Oh, and I told them we can ship this by Wednesday.”

Depending on whether or not you’ve got a solid grasp on Python’s decorators, this request will either send your blood pressure spiking—or leave you relatively calm.

Without decorators you might be spending the next three days scrambling to modify each of those 30 functions and clutter them up with manual logging calls. Fun times.

If you do know your decorators, you’ll calmly smile at your boss and say:

“Don’t worry Jim, I’ll get it done by 2pm today.”

Right after that you’ll type the code for a generic @audit_log decorator (that’s only about 10 lines long) and quickly paste it in front of each function definition. Then you’ll commit your code and grab another cup of coffee.

I’m dramatizing here. But only a little. Decorators can be that powerful 🙂

I’d go as far as to say that understanding decorators is a milestone for any serious Python programmer. They require a solid grasp of several advanced concepts in the language—including the properties of first-class functions.

But:

Understanding Decorators Is Worth It 💡

The payoff for understanding how decorators work in Python is huge.

Sure, decorators are relatively complicated to wrap your head around for the first time—but they’re a highly useful feature that you’ll often encounter in third-party frameworks and the Python standard library.

Explaining decorators is also a make or break moment for any good Python tutorial. I’ll do my best here to introduce you to them step by step.

Before you dive in, now would be an excellent moment to refresh your memory on the properties of first-class functions in Python. I wrote a tutorial on them here on dbader.org and I would encourage you to take a few minutes to review it. The most important “first-class functions” takeaways for understanding decorators are:

Functions are objects—they can be assigned to variables and passed to and returned from other functions; and
Functions can be defined inside other functions—and a child function can capture the parent function’s local state (lexical closures.)

Alright, ready to do this? Let’s start with some:

Python Decorator Basics

Now, what are decorators really? They “decorate” or “wrap” another function and let you execute code before and after the wrapped function runs.

Decorators allow you to define reusable building blocks that can change or extend the behavior of other functions. And they let you do that without permanently modifying the wrapped function itself. The function’s behavior changes only when it’s decorated.

Now what does the implementation of a simple decorator look like? In basic terms, a decorator is a callable that takes a callable as input and returns another callable.

The following function has that property and could be considered the simplest decorator one could possibly write:

def null_decorator(func):
    return func

As you can see, null_decorator is a callable (it’s a function), it takes another callable as its input, and it returns the same input callable without modifying it.

Let’s use it to decorate (or wrap) another function:

def greet():
    return 'Hello!'

greet = null_decorator(greet)

>>> greet()
'Hello!'

In this example I’ve defined a greet function and then immediately decorated it by running it through the null_decorator function. I know this doesn’t look very useful yet (I mean we specifically designed the null decorator to be useless, right?) but in a moment it’ll clarify how Python’s decorator syntax works.

Instead of explicitly calling null_decorator on greet and then reassigning the greet variable, you can use Python’s @ syntax for decorating a function in one step:

@null_decorator
def greet():
    return 'Hello!'

>>> greet()
'Hello!'

Putting an @null_decorator line in front of the function definition is the same as defining the function first and then running it through the decorator. Using the @ syntax is just syntactic sugar, and a shortcut for this commonly used pattern.

Note that using the @ syntax decorates the function immediately at definition time. This makes it difficult to access the undecorated original without brittle hacks. Therefore you might choose to decorate some functions manually in order to retain the ability to call the undecorated function as well.
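Here’s a rough sketch of what manual decoration looks like. The exclaim decorator is a hypothetical example, and the wrapper pattern it uses is explained in the next section. The point is only that binding the decorated function to a new name keeps the original reachable.

```python
def exclaim(func):
    # Hypothetical decorator used only for this illustration.
    def wrapper():
        return func() + '!!!'
    return wrapper

def greet():
    return 'Hello'

# Decorate manually and bind the result to a *new* name, so the
# undecorated function stays available as `greet`.
excited_greet = exclaim(greet)

print(greet())          # Hello
print(excited_greet())  # Hello!!!
```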

So far, so good. Let’s see how:

Decorators Can Modify Behavior

Now that you’re a little more familiar with the decorator syntax, let’s write another decorator that actually does something and modifies the behavior of the decorated function.

Here’s a slightly more complex decorator which converts the result of the decorated function to uppercase letters:

def uppercase(func):
    def wrapper():
        original_result = func()
        modified_result = original_result.upper()
        return modified_result
    return wrapper

Instead of simply returning the input function like the null decorator did, this uppercase decorator defines a new function on the fly (a closure) and uses it to wrap the input function in order to modify its behavior at call time.

The wrapper closure has access to the undecorated input function and it is free to execute additional code before and after calling the input function. (Technically, it doesn’t even need to call the input function at all.)

Note how up until now the decorated function has never been executed. Actually calling the input function at this point wouldn’t make any sense—you’ll want the decorator to be able to modify the behavior of its input function when it gets called eventually.

Time to see the uppercase decorator in action. What happens if you decorate the original greet function with it?

@uppercase
def greet():
    return 'Hello!'

>>> greet()
'HELLO!'

I hope this was the result you expected. Let’s take a closer look at what just happened here. Unlike null_decorator, our uppercase decorator returns a different function object when it decorates a function:

>>> greet
<function greet at 0x10e9f0950>

>>> null_decorator(greet)
<function greet at 0x10e9f0950>

>>> uppercase(greet)
<function uppercase.<locals>.wrapper at 0x10da02f28>

And as you saw earlier, it needs to do that in order to modify the behavior of the decorated function when it finally gets called. The uppercase decorator is a function itself. And the only way to influence the “future behavior” of an input function it decorates is to replace (or wrap) the input function with a closure.

That’s why uppercase defines and returns another function (the closure) that can then be called at a later time, run the original input function, and modify its result.

Decorators modify the behavior of a callable through a wrapper, so you don’t have to permanently modify the original. Its behavior changes only when it’s decorated.

This lets you “tack on” reusable building blocks, like logging and other instrumentation, to existing functions and classes. It’s what makes decorators such a powerful feature in Python that’s frequently used in the standard library and in third-party packages.

⏰ A Quick Intermission

By the way, if you feel like you need a quick coffee break at this point—that’s totally normal. In my opinion closures and decorators are some of the most difficult concepts to understand in Python. Take your time and don’t worry about figuring this out immediately. Playing through the code examples in an interpreter session one by one often helps make things sink in.

I know you can do it 🙂

Applying Multiple Decorators to a Single Function

Perhaps not surprisingly, you can apply more than one decorator to a function. This accumulates their effects and it’s what makes decorators so helpful as reusable building blocks.

Here’s an example. The following two decorators wrap the output string of the decorated function in HTML tags. By looking at how the tags are nested you can see which order Python uses to apply multiple decorators:

def strong(func):
    def wrapper():
        return '<strong>' + func() + '</strong>'
    return wrapper

def emphasis(func):
    def wrapper():
        return '<em>' + func() + '</em>'
    return wrapper

Now let’s take these two decorators and apply them to our greet function at the same time. You can use the regular @ syntax for that and just “stack” multiple decorators on top of a single function:

@strong
@emphasis
def greet():
    return 'Hello!'

What output do you expect to see if you run the decorated function? Will the @emphasis decorator add its <em> tag first or does @strong have precedence? Here’s what happens when you call the decorated function:

>>> greet()
'<strong><em>Hello!</em></strong>'

This clearly shows in what order the decorators were applied: from bottom to top. First, the input function was wrapped by the @emphasis decorator, and then the resulting (decorated) function got wrapped again by the @strong decorator.

To help me remember this bottom to top order I like to call this behavior decorator stacking. You start building the stack at the bottom and then keep adding new blocks on top to work your way upwards.

If you break down the above example and avoid the @ syntax to apply the decorators, the chain of decorator function calls looks like this:

decorated_greet = strong(emphasis(greet))

Again you can see here that the emphasis decorator is applied first and then the resulting wrapped function is wrapped again by the strong decorator.
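If you want to convince yourself that the order matters, here’s a short self-contained sketch that applies the same two decorators manually in both orders and compares the output:

```python
def strong(func):
    def wrapper():
        return '<strong>' + func() + '</strong>'
    return wrapper

def emphasis(func):
    def wrapper():
        return '<em>' + func() + '</em>'
    return wrapper

def greet():
    return 'Hello!'

# Same as stacking @strong on top of @emphasis:
strong_outside = strong(emphasis(greet))

# Same as stacking @emphasis on top of @strong:
emphasis_outside = emphasis(strong(greet))

print(strong_outside())    # <strong><em>Hello!</em></strong>
print(emphasis_outside())  # <em><strong>Hello!</strong></em>
```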

This also means that deep levels of decorator stacking will have an effect on performance eventually because they keep adding nested function calls. Usually this won’t be a problem in practice, but it’s something to keep in mind if you’re working on performance intensive code.

Decorating Functions That Accept Arguments

All examples so far only decorated a simple nullary greet function that didn’t take any arguments whatsoever. So the decorators you saw here up until now didn’t have to deal with forwarding arguments to the input function.

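If you try to apply one of these decorators to a function that takes arguments it will not work correctly: the wrapper closure accepts no arguments, so calling the decorated function with any arguments raises a TypeError. How do you decorate a function that takes arbitrary arguments?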
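To see the failure concretely, here’s a minimal sketch that applies a compact version of the earlier uppercase decorator to a hypothetical greet(name) function. The exact wording of the error message differs between Python versions, so the example only prints it.

```python
def uppercase(func):
    def wrapper():  # accepts no arguments at all
        return func().upper()
    return wrapper

@uppercase
def greet(name):
    return f'Hello, {name}!'

error = None
try:
    # `greet` now refers to wrapper(), which takes no arguments.
    greet('Jane')
except TypeError as exc:
    error = exc
    print(f'TypeError: {exc}')
```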

This is where Python’s *args and **kwargs feature for dealing with variable numbers of arguments comes in handy. The following proxy decorator takes advantage of that:

def proxy(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

There are two notable things going on with this decorator:

It uses the * and ** operators in the wrapper closure definition to collect all positional and keyword arguments and stores them in variables (args and kwargs).
The wrapper closure then forwards the collected arguments to the original input function using the * and ** “argument unpacking” operators.

(It’s a bit unfortunate that the meaning of the star and double-star operators is overloaded and changes depending on the context they’re used in. But I hope you get the idea.)
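The two meanings of the star operators can be shown side by side in a small sketch. The collect and describe functions are hypothetical helpers for this illustration:

```python
def collect(*args, **kwargs):
    # In a function definition, * and ** pack the incoming
    # arguments into a tuple and a dict.
    return args, kwargs

def describe(name, line, punctuation='.'):
    return f'{name}: {line}{punctuation}'

args, kwargs = collect('Jane', 'Hello', punctuation='!')
print(args)    # ('Jane', 'Hello')
print(kwargs)  # {'punctuation': '!'}

# In a function call, * and ** unpack them again into
# separate positional and keyword arguments.
result = describe(*args, **kwargs)
print(result)  # Jane: Hello!
```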

Let’s expand the technique laid out by the proxy decorator into a more useful practical example. Here’s a trace decorator that logs function arguments and results during execution time:

def trace(func):
    def wrapper(*args, **kwargs):
        print(f'TRACE: calling {func.__name__}() '
              f'with {args}, {kwargs}')

        original_result = func(*args, **kwargs)

        print(f'TRACE: {func.__name__}() '
              f'returned {original_result!r}')

        return original_result
    return wrapper

Decorating a function with trace and then calling it will print the arguments passed to the decorated function and its return value. This is still somewhat of a toy example—but in a pinch it makes a great debugging aid:

@trace
def say(name, line):
    return f'{name}: {line}'

>>> say('Jane', 'Hello, World')
TRACE: calling say() with ('Jane', 'Hello, World'), {}
TRACE: say() returned 'Jane: Hello, World'
'Jane: Hello, World'

Speaking of debugging—there are some things you should keep in mind when debugging decorators:

How to Write “Debuggable” Decorators

When you use a decorator, really what you’re doing is replacing one function with another. One downside of this process is that it “hides” some of the metadata attached to the original (undecorated) function.

For example, the original function name, its docstring, and parameter list are hidden by the wrapper closure:

def greet():
    """Return a friendly greeting."""
    return 'Hello!'

decorated_greet = uppercase(greet)

If you try to access any of that function metadata you’ll see the wrapper closure’s metadata instead:

>>> greet.__name__
'greet'
>>> greet.__doc__
'Return a friendly greeting.'

>>> decorated_greet.__name__
'wrapper'
>>> decorated_greet.__doc__
None

This makes debugging and working with the Python interpreter awkward and challenging. Thankfully there’s a quick fix for this: the functools.wraps decorator included in Python’s standard library.

You can use functools.wraps in your own decorators to copy over the lost metadata from the undecorated function to the decorator closure. Here’s an example:

import functools

def uppercase(func):
    @functools.wraps(func)
    def wrapper():
        return func().upper()
    return wrapper

Applying functools.wraps to the wrapper closure returned by the decorator carries over the docstring and other metadata of the input function:

@uppercase
def greet():
    """Return a friendly greeting."""
    return 'Hello!'

>>> greet.__name__
'greet'
>>> greet.__doc__
'Return a friendly greeting.'

As a best practice I’d recommend that you use functools.wraps in all of the decorators you write yourself. It doesn’t take much time and it will save you (and others) debugging headaches down the road.
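Putting the pieces together, a decorator that both forwards arguments and preserves metadata might look like the sketch below. It’s a trimmed-down variant of the earlier trace example. It also shows that functools.wraps stores the original function in a __wrapped__ attribute on the wrapper.

```python
import functools

def trace(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f'TRACE: calling {func.__name__}()')
        return func(*args, **kwargs)
    return wrapper

@trace
def say(name, line):
    """Format a line of dialog."""
    return f'{name}: {line}'

print(say.__name__)  # say
print(say.__doc__)   # Format a line of dialog.

# The undecorated function is available as __wrapped__,
# so this call prints no TRACE line:
print(say.__wrapped__('Jane', 'Hi'))  # Jane: Hi
```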

Python Decorators – Key Takeaways

Decorators define reusable building blocks you can apply to a callable to modify its behavior without permanently modifying the callable itself.
The @ syntax is just a shorthand for calling the decorator on an input function. Multiple decorators on a single function are applied bottom to top (decorator stacking).
As a debugging best practice, use the functools.wraps helper in your own decorators to carry over metadata from the undecorated callable to the decorated one.

Was this tutorial helpful? Got any suggestions on how it could be improved that could help other learners? Leave a comment below and share your thoughts.


Finding and Choosing Quality Python Packages

Tue, 28 Mar 2017 00:00:00 GMT

Finding and Choosing Quality Python Packages

PyPI, the Python packaging repository, just crossed 100,000 third-party packages in total the other week. That’s an overwhelming number of packages to choose from.

The Quest for the Perfect Python Package

Back when I got “serious” about building my Python skills, mastering the syntax of the language was not the hardest part. Python’s syntax seemed quite clear and intuitive by comparison, and there was a (relatively) obvious path to learning it from books and other resources.

But when it came to Python’s tens of thousands of libraries and frameworks that was simply an overwhelming number to choose from. Memorizing them was (and still is) an impossible task.

And this feeling of overwhelm and “choice paralysis” is exactly what held me back earlier on in my Python career.

Mastering Python ≠ Mastering the Syntax

What tripped me up as a fledgling Pythonista was this: I had the basics of Python under my belt, but I struggled when it came to adopting the right workflows and tools of the “ecosystem” surrounding the core language.

Thus, I wasted time reinventing existing solutions left and right—sometimes I spent days writing my own (terrible) versions of common building blocks like config file parsers, data validators, or visualization tools.

Now, sure I learned quite a bit from doing that…

Overcoming “Reinventing the Wheel Disease”

But I kept repeating the same mistake and was “reinventing the wheel” even when under a tight deadline. In hindsight, my ignorance caused me a ton of undue stress and sleep deprivation.

Part of it was overconfidence in my abilities, and another part was a lack of experience using “bread and butter” tools like the pip package manager, virtual environments, and requirements files.

Once I got the hang of Python’s dependency management tools and workflows I was able to overcome my “reinventing the wheel disease” quickly.

Dependency Management Skills Are Key

Mastering those tools and coming up with strategies for identifying high-quality Python packages opened up a whole new world to me:

By leveraging Python’s packaging ecosystem I was suddenly coding at a higher level of abstraction—and it had a massive impact on my productivity and efficiency. Saying it allowed me to “10X” my output would not be too far off.

If you use Python and you’re wondering how to go from “writing scripts” to “building applications”—then there’s a good chance you could benefit from focusing on your dependency management skills.

You might be ready for a similar “quantum leap” in your productivity.

To discover the strategies and exact steps I used to break through this barrier, check out my new “Managing Python Dependencies” course:

Click to Learn More About “Managing Python Dependencies” →


Python’s Functions Are First-Class

Tue, 21 Mar 2017 00:00:00 GMT

Python’s Functions Are First-Class

Python’s functions are first-class objects. You can assign them to variables, store them in data structures, pass them as arguments to other functions, and even return them as values from other functions.

Grokking these concepts intuitively will make understanding advanced features in Python like lambdas and decorators much easier. It also puts you on a path towards functional programming techniques.

In this tutorial I’ll guide you through a number of examples to help you develop this intuitive understanding. The examples will build on top of one another, so you might want to read them in sequence and even to try out some of them in a Python interpreter session as you go along.

Wrapping your head around the concepts we’ll be discussing here might take a little longer than expected. Don’t worry—that’s completely normal. I’ve been there. You might feel like you’re banging your head against the wall, and then suddenly things will “click” and fall into place when you’re ready.

Throughout this tutorial I’ll be using this yell function for demonstration purposes. It’s a simple toy example with easily recognizable output:

def yell(text):
    return text.upper() + '!'

>>> yell('hello')
'HELLO!'

Functions Are Objects

All data in a Python program is represented by objects or relations between objects. Things like strings, lists, modules, and functions are all objects. There’s nothing particularly special about functions in Python.

Because the yell function is an object in Python you can assign it to another variable, just like any other object:

>>> bark = yell

This line doesn’t call the function. It takes the function object referenced by yell and creates a second name pointing to it, bark. You could now also execute the same underlying function object by calling bark:

>>> bark('woof')
'WOOF!'

Function objects and their names are two separate concerns. Here’s more proof: You can delete the function’s original name (yell). Because another name (bark) still points to the underlying function you can still call the function through it:

>>> del yell

>>> yell('hello?')
NameError: "name 'yell' is not defined"

>>> bark('hey')
'HEY!'

By the way, Python attaches a string identifier to every function at creation time for debugging purposes. You can access this internal identifier with the __name__ attribute:

>>> bark.__name__
'yell'

While the function’s __name__ is still “yell” that won’t affect how you can access it from your code. This identifier is merely a debugging aid. A variable pointing to a function and the function itself are two separate concerns.

(Since Python 3.3 there’s also __qualname__ which serves a similar purpose and provides a qualified name string to disambiguate function and class names.)
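Here’s a short sketch of the difference between the two attributes. The outer, inner, and Greeter names are hypothetical, and the output shown assumes the code runs at module level:

```python
def outer():
    def inner():
        pass
    return inner

class Greeter:
    def greet(self):
        pass

inner = outer()

# __name__ is just the bare name given in the def statement:
print(inner.__name__)              # inner
print(Greeter.greet.__name__)      # greet

# __qualname__ includes the path to where it was defined:
print(inner.__qualname__)          # outer.<locals>.inner
print(Greeter.greet.__qualname__)  # Greeter.greet
```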

Functions Can Be Stored In Data Structures

As functions are first-class citizens you can store them in data structures, just like you can with other objects. For example, you can add functions to a list:

>>> funcs = [bark, str.lower, str.capitalize]
>>> funcs
[<function yell at 0x10ff96510>,
 <method 'lower' of 'str' objects>,
 <method 'capitalize' of 'str' objects>]

Accessing the function objects stored inside the list works like it would with any other type of object:

>>> for f in funcs:
...     print(f, f('hey there'))
<function yell at 0x10ff96510> HEY THERE!
<method 'lower' of 'str' objects> hey there
<method 'capitalize' of 'str' objects> Hey there

You can even call a function object stored in the list without assigning it to a variable first. You can do the lookup and then immediately call the resulting “disembodied” function object within a single expression:

>>> funcs[0]('heyho')
'HEYHO!'

Functions Can Be Passed To Other Functions

Because functions are objects you can pass them as arguments to other functions. Here’s a greet function that formats a greeting string using the function object passed to it and then prints it:

def greet(func):
    greeting = func('Hi, I am a Python program')
    print(greeting)

You can influence the resulting greeting by passing in different functions. Here’s what happens if you pass the yell function to greet:

>>> greet(yell)
HI, I AM A PYTHON PROGRAM!

Of course you could also define a new function to generate a different flavor of greeting. For example, the following whisper function might work better if you don’t want your Python programs to sound like Optimus Prime:

def whisper(text):
    return text.lower() + '...'

>>> greet(whisper)
hi, i am a python program...

The ability to pass function objects as arguments to other functions is powerful. It allows you to abstract away and pass around behavior in your programs. In this example, the greet function stays the same but you can influence its output by passing in different greeting behaviors.

Functions that can accept other functions as arguments are also called higher-order functions. They are a necessity for the functional programming style.

The classical example for higher-order functions in Python is the built-in map function. It takes a function and an iterable and calls the function on each element in the iterable, yielding the results as it goes along.

Here’s how you might format a sequence of greetings all at once by mapping the yell function to them:

>>> list(map(yell, ['hello', 'hey', 'hi']))
['HELLO!', 'HEY!', 'HI!']

map has gone through the entire list and applied the yell function to each element.
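
To make the “as it goes along” part concrete, here’s a small sketch (assuming Python 3, where map returns a lazy iterator). It pulls one result at a time and compares the outcome with an equivalent list comprehension. The yell function is repeated so the snippet runs on its own:

```python
def yell(text):
    return text.upper() + '!'

greetings = ['hello', 'hey', 'hi']

mapped = map(yell, greetings)  # nothing has been computed yet
first = next(mapped)           # yell is called for the first item only
rest = list(mapped)            # consumes the remaining items

# The same result written as a list comprehension
comprehension = [yell(g) for g in greetings]

print(first, rest)      # HELLO! ['HEY!', 'HI!']
print(comprehension)    # ['HELLO!', 'HEY!', 'HI!']
```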

Functions Can Be Nested

Python allows functions to be defined inside other functions. These are often called nested functions or inner functions. Here’s an example:

def speak(text):
    def whisper(t):
        return t.lower() + '...'
    return whisper(text)

>>> speak('Hello, World')
'hello, world...'

Now, what’s going on here? Every time you call speak it defines a new inner function whisper and then calls it.

And here’s the kicker—whisper does not exist outside speak:

>>> whisper('Yo')
NameError: "name 'whisper' is not defined"

>>> speak.whisper
AttributeError: "'function' object has no attribute 'whisper'"

But what if you really wanted to access that nested whisper function from outside speak? Well, functions are objects—you can return the inner function to the caller of the parent function.

For example, here’s a function defining two inner functions. Depending on the argument passed to the top-level function it selects and returns one of the inner functions to the caller:

def get_speak_func(volume):
    def whisper(text):
        return text.lower() + '...'
    def yell(text):
        return text.upper() + '!'
    if volume > 0.5:
        return yell
    else:
        return whisper

Notice how get_speak_func doesn’t actually call one of its inner functions—it simply selects the appropriate function based on the volume argument and then returns the function object:

>>> get_speak_func(0.3)
<function get_speak_func.<locals>.whisper at 0x10ae18>

>>> get_speak_func(0.7)
<function get_speak_func.<locals>.yell at 0x1008c8>

Of course you could then go on and call the returned function, either directly or by assigning it to a variable name first:

>>> speak_func = get_speak_func(0.7)
>>> speak_func('Hello')
'HELLO!'

Let that sink in for a second here… This means not only can functions accept behaviors through arguments but they can also return behaviors. How cool is that?

You know what, this is starting to get a little loopy here. I’m going to take a quick coffee break before I continue writing (and I suggest you do the same.)

Functions Can Capture Local State

You just saw how functions can contain inner functions and that it’s even possible to return these (otherwise hidden) inner functions from the parent function.

Best put your seat belts on now because it’s going to get a little crazier still. We’re about to enter even deeper functional programming territory. (You had that coffee break, right?)

Not only can functions return other functions, these inner functions can also capture and carry some of the parent function’s state with them.

I’m going to slightly rewrite the previous get_speak_func example to illustrate this. The new version takes a “volume” and a “text” argument right away to make the returned function immediately callable:

def get_speak_func(text, volume):
    def whisper():
        return text.lower() + '...'
    def yell():
        return text.upper() + '!'
    if volume > 0.5:
        return yell
    else:
        return whisper

>>> get_speak_func('Hello, World', 0.7)()
'HELLO, WORLD!'

Take a good look at the inner functions whisper and yell now. Notice how they no longer have a text parameter? But somehow they can still access the text parameter defined in the parent function. In fact, they seem to capture and “remember” the value of that argument.

Functions that do this are called lexical closures (or just closures, for short). A closure remembers the values from its enclosing lexical scope even when the program flow is no longer in that scope.

In practical terms this means not only can functions return behaviors but they can also pre-configure those behaviors. Here’s another bare-bones example to illustrate this idea:

def make_adder(n):
    def add(x):
        return x + n
    return add

>>> plus_3 = make_adder(3)
>>> plus_5 = make_adder(5)

>>> plus_3(4)
7
>>> plus_5(4)
9

In this example make_adder serves as a factory to create and configure “adder” functions. Notice how the “adder” functions can still access the n argument of the make_adder function (the enclosing scope).
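
If you’re curious where the remembered value lives, here’s a short sketch that peeks at the closure and then builds a closure that also updates its captured state. The make_counter function is a hypothetical example added for illustration. It relies on the nonlocal statement, which is available in Python 3:

```python
def make_adder(n):
    def add(x):
        return x + n
    return add

plus_3 = make_adder(3)

# The captured value is stored in a "cell" attached to the function object
captured = plus_3.__closure__[0].cell_contents
print(captured)  # 3

def make_counter():
    count = 0
    def increment():
        nonlocal count  # rebind the variable in the enclosing scope
        count += 1
        return count
    return increment

counter_a = make_counter()
counter_b = make_counter()
print(counter_a(), counter_a(), counter_b())  # 1 2 1
```

Each call to make_counter creates a fresh count variable, so the two counters don’t interfere with each other.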

Objects Can Behave Like Functions

Objects aren’t functions in Python. But they can be made callable, which allows you to treat them like functions in many cases.

If an object is callable it means you can use round parentheses () on it and pass function call arguments to it. Here’s an example of a callable object:

class Adder:
    def __init__(self, n):
        self.n = n
    def __call__(self, x):
        return self.n + x

>>> plus_3 = Adder(3)
>>> plus_3(4)
7

Behind the scenes, “calling” an object instance as a function attempts to execute the object’s __call__ method.

Of course not all objects will be callable. That’s why there’s a built-in callable function to check whether an object appears callable or not:

>>> callable(plus_3)
True
>>> callable(yell)
True
>>> callable(False)
False
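
One reason to prefer a callable object over a plain function is that it can carry extra state between calls. Here’s a minimal sketch. CountingAdder is a hypothetical variant of the Adder class above that also counts how often it was called:

```python
class CountingAdder:
    """Hypothetical variant of Adder that also counts its calls."""

    def __init__(self, n):
        self.n = n
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.n + x

plus_3 = CountingAdder(3)

# A callable object can be passed anywhere a function is expected
results = list(map(plus_3, [1, 2, 3]))

print(results)       # [4, 5, 6]
print(plus_3.calls)  # 3

# Classes are callable too (calling one creates an instance)
print(callable(CountingAdder), callable(plus_3), callable(3))  # True True False
```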

Key Takeaways

Everything in Python is an object, including functions. You can assign them to variables, store them in data structures, and pass or return them to and from other functions (first-class functions.)
First-class functions allow you to abstract away and pass around behavior in your programs.
Functions can be nested and they can capture and carry some of the parent function’s state with them. Functions that do this are called closures.
Objects can be made callable which allows you to treat them like functions in many cases.

How to Make Your Python Loops More Pythonic

Tue, 14 Mar 2017 00:00:00 GMT

How to Make Your Python Loops More Pythonic

Pythonize your C-style “for” and “while” loops by refactoring them using generators and other techniques.

One of the easiest ways to spot a developer with a background in C-style languages who only recently picked up Python is to look at how they write loops.

For example, a code snippet like this is usually a sign of someone trying to write Python like it’s C or Java:

my_items = ['a', 'b', 'c']

i = 0
while i < len(my_items):
    print(my_items[i])
    i += 1

Now, what’s so “unpythonic” about this code?

Two things could be improved in this code example:

First, it keeps track of the index i manually—initializing it to zero and then carefully incrementing it upon every loop iteration.
And second, it uses len() to get the size of a container in order to determine how often to iterate.

In Python you can write loops that handle both of these responsibilities automatically. It’s a great idea to take advantage of that.

If your code doesn’t have to keep track of a running index it’s much harder to write accidental infinite loops, for example. It also makes the code more concise and therefore more readable.

How to track the loop index automatically

To refactor the while-loop in the code example, I’ll start by removing the code that manually updates the index. A good way to do that is with a for-loop in Python.

Using Python’s range() built-in I can generate the indexes automatically (without having to increment a running counter variable):

>>> range(len(my_items))
range(0, 3)

>>> list(range(0, 3))
[0, 1, 2]

The range type represents an immutable sequence of numbers. Its advantage over a regular list is that it always takes the same small amount of memory, no matter how many numbers it represents.

Range objects don’t actually store the individual values representing the number sequence. Instead they calculate the sequence values on the fly as you iterate over them.

(This is true for Python 3. In Python 2 you’ll need to use the xrange() built-in to get this memory-saving behavior, as range() will construct a list object containing all the values.)

Now, instead of incrementing i on each loop iteration, I could write a refactored version of that loop like this:

for i in range(len(my_items)):
    print(my_items[i])

This is better. However it still isn’t super Pythonic—in most cases when you see code that uses range(len(...)) to iterate over a container it can be improved and simplified even further.

Let’s have a look at how you might do that in practice.

💡 Python’s “for” loops are “for-each” loops

As I mentioned, for-loops in Python are really “for-each” loops that can iterate over items from a container or sequence directly, without having to look them up by index. I can take advantage of that and simplify my loop even further:

for item in my_items:
    print(item)

I would consider this solution to be quite Pythonic. It’s nice and clean and almost reads like pseudo code from a text book. I don’t have to keep track of the container’s size or a running index to access elements.

The container itself takes care of handing out the elements so they can be processed. If the container is ordered, so will be the resulting sequence of elements. If the container is not ordered it will return its elements in an arbitrary order but the loop will still cover all of them.
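
Here’s a small sketch showing that the same loop shape works for several built-in container types. The collect helper is made up for this example. It assumes Python 3.7 or newer, where dictionaries keep their insertion order:

```python
def collect(container):
    items = []
    for item in container:  # same loop, no matter what the container is
        items.append(item)
    return items

print(collect('abc'))               # ['a', 'b', 'c'] (a string yields characters)
print(collect(('x', 'y')))          # ['x', 'y']
print(collect({'k1': 1, 'k2': 2}))  # ['k1', 'k2'] (a dict yields its keys)

# A set has no defined order, so sort the result before displaying it
print(sorted(collect({3, 1, 2})))   # [1, 2, 3]
```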

What if I need the item index?

Now, of course you won’t always be able to rewrite your loops like that. What if you need the item index, for example? There’s a Pythonic way to keep a running index that avoids the range(len(...)) construct I recommended against.

The enumerate() built-in is helpful in this case:

>>> for i, item in enumerate(my_items):
...     print(f'{i}: {item}')

0: a
1: b
2: c
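
If you’d rather start counting at 1, enumerate() accepts an optional start argument. This short sketch also converts the result to a list so you can see the (index, item) pairs it produces:

```python
my_items = ['a', 'b', 'c']

# Count from 1 instead of 0
for position, item in enumerate(my_items, start=1):
    print(f'{position}: {item}')

# Each step of the iteration produces one (index, item) tuple
pairs = list(enumerate(my_items, start=1))
print(pairs)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```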

You see, iterators in Python can return more than just one value. They can return tuples with an arbitrary number of values that can then be unpacked right inside the for-statement.

This is very powerful. For example, you can use the same technique to iterate over the keys and values of a dictionary at the same time:

>>> emails = {
...     'Bob': 'bob@example.com',
...     'Alice': 'alice@example.com',
... }

>>> for name, email in emails.items():
...     print(f'{name} → {email}')

Bob → bob@example.com
Alice → alice@example.com

Okay, what if I just have to write a C-style loop?

There’s one more example I’d like to show you. What if you absolutely, positively need to write a C-style loop? For example, what if you must control the step size for the index? Imagine you had the following original C or Java for-loop:

for (int i = a; i < n; i+=s) {
    // ...
}

How would this pattern translate to Python?

The range() function comes to our rescue again—it can accept extra parameters to control the start value for the loop (a), the stop value (n), and the step size (s).

Therefore our example C-style loop could be implemented as follows:

for i in range(a, n, s):
    # ...
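
To check that the translation holds, here’s a sketch that spells out the C-style loop with a manual index and compares it to range(). The c_style_indexes helper and the concrete values are made up for this example, and it only covers a positive step size:

```python
def c_style_indexes(a, n, s):
    # Manual equivalent of: for (int i = a; i < n; i += s)
    indexes = []
    i = a
    while i < n:
        indexes.append(i)
        i += s
    return indexes

print(c_style_indexes(2, 10, 3))  # [2, 5, 8]
print(list(range(2, 10, 3)))      # [2, 5, 8]

# range() also accepts a negative step for counting down
print(list(range(10, 0, -3)))     # [10, 7, 4, 1]
```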

Main Takeaways

Writing C-style loops in Python is considered unpythonic. Avoid managing loop indexes and stop conditions manually if possible.
Python’s for-loops are really “for-each” loops that can iterate over items from a container or sequence directly.

5 Python Development Setup Tips to Boost Your Productivity

Tue, 07 Mar 2017 00:00:00 GMT

5 Python Development Setup Tips to Boost Your Productivity

I struggled with setting up an effective development environment as a new Python developer. It was difficult to build the right habits and to find a set of tools I enjoyed using.

Back then I didn’t understand how much this impacted my productivity. I didn’t even know some of the most valuable practices and tools I’m using today existed!

As my experience grew I understood this was a common pain among Python developers. No matter who I spoke with—colleagues, strangers at a conference, or developers on web forums and mailing lists—I saw similar struggles.

Today I believe entry-level Python programmers can make leaps in their productivity by adopting a few key practices and tools into their workflow.

This article helps you identify and fix 5 common issues in your Python development setup. I experienced them all myself and in some cases helped others through them as a colleague and a team lead. If you can avoid these issues you’ll become a happier and more effective Python developer.

#1 – Don’t waste time doing the compiler’s job

When developers do by hand what computers can do much better, that’s usually a costly mistake. One example is programmers spending time hunting bugs that could be spotted just as well by automated tools.

For some reason, maybe because of Python’s dynamic nature and earlier status as a “scripting” language, it’s still rare to see it used with static code analysis tools and linters.

But these tools are fantastic. They can help detect and avoid certain bugs and classes of errors completely. For example, they can catch functional bugs like misspelled identifiers or reveal code quality issues like unused variables and imports.

I won’t say code analysis tools are a miracle cure—but they can help reduce debugging and code review time with a small initial time investment.

If you’re looking for just one tool that will improve the quality of your Python code without getting in the way with false positives and verbose messages, then I’d recommend the Pyflakes code linter. Pyflakes is open-source, available for free, and easy to set up.

To get immediate feedback and catch bugs early I recommend you integrate Pyflakes with your code editor and build server. Automatic linting for code changes as part of your continuous integration process makes your life easier. It ensures all developers on your team use the same settings and no uncaught warnings slip through the cracks.
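
To give you an idea of the kind of problem this catches, here’s a contrived sketch with the two issues mentioned earlier: an unused import and a misspelled identifier. The total_price function is made up for this example. Python itself only notices the misspelling when the faulty line actually runs, whereas a linter like Pyflakes is designed to report both issues without executing the code:

```python
import os  # unused import: a code quality issue a linter can point out

def total_price(prices):
    total = 0
    for price in prices:
        total += price
    return totl  # misspelled identifier: only fails once this line runs

try:
    total_price([1, 2, 3])
except NameError as exc:
    print('Only found at runtime:', exc)
```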

Tip 1: Use static code analysis tools like Pyflakes.

#2 – Avoid fruitless code style discussions

Your team does code reviews? Great! But be aware that a common mistake among inexperienced code reviewers is to spend too much time on feedback that automated tools could give for them. I’m talking about code style issues.

It’s easy for development teams to get into a habit where they mostly talk about code style issues in code reviews: “We need an extra space character here.” or “Class names should use camel case.”

This is a form of bikeshedding that prevents developers from looking at the real issues. The ones that cost money and cause maintenance problems later on.

A quick fix here is to pick one of the Python style guides available on the internet, like PEP 8 or Google’s Python Style Guide, and to put automated tools in place that make sure committed code follows the style guide.

I recommend using PEP 8 as a style guide in combination with the pycodestyle or flake8 code style checker. This will help avoid most code style discussions and allow your team to focus on the issues that matter.

Tip 2: Pick a code style (PEP 8) and enforce it with automated tools.

#3 – Micro delays and death by a thousand cuts

Usability research shows the large effect website page load time has on user abandonment: If people get bored waiting for something to happen it increases the chances that they’ll abandon the original task they had in mind.

As software developers, waiting on tools to complete their job is a normal part of our day to day workflow. We’re always waiting for a module to install, a test to run, or a commit to finish (“It’s compiling!”). Of course we’re not “abandoning” our work every time we have to wait a few seconds for a tool to run—keeping focused on the task at hand is part of our job after all.

Yet, keeping that focus costs mental energy that we might then lack in other areas of our work: We get tired a little quicker in the afternoon, or introduce a tiny little extra bug with our latest commit.

In my experience even small forced pauses and delays add up. Switching files in a slow editor or jumping between apps on a slow computer is frustrating! We can even apply this at a microscopic level and look into editor typing latencies. I believe these micro delays add up, too. They cost us productivity and cause frustration.

Got time for a little thought experiment? Let’s say you’re waiting for a task to complete for about 1 out of every 10 seconds you spend on productive work. That adds up to half a day per week, or 2 days a month, or a whole month of productive work you might be losing over the course of a year.

Maybe this estimate is too high—but what if you could get an additional week of productive time a year just by spending an afternoon on optimizing your tools? I’d say that’s worth a try!
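
The numbers in the thought experiment can be checked with a few lines of arithmetic. This sketch assumes a five-day work week and about 20 working days per month, neither of which is spelled out above:

```python
waiting_fraction = 1 / 10  # 1 out of every 10 seconds spent waiting

# Assumed calendar, not stated in the text
workdays_per_week = 5
workdays_per_month = 20
months_per_year = 12

lost_days_per_week = waiting_fraction * workdays_per_week
lost_days_per_month = waiting_fraction * workdays_per_month
lost_months_per_year = waiting_fraction * months_per_year

print(f'{lost_days_per_week:.1f} days per week')      # 0.5 days per week
print(f'{lost_days_per_month:.1f} days per month')    # 2.0 days per month
print(f'{lost_months_per_year:.1f} months per year')  # 1.2 months per year
```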

Tip 3: Your development tools should be fast. Favor simplicity.

#4 – Don’t work with an unpleasant editing environment

Working with tools that I don’t enjoy crushes my productivity. You might know the feeling. Some tools are so frustrating to work with that they sap your energy levels and motivation.

What’s the most important tool that you work with every day as a developer? For me it’s my code editor. For some developers it might be their email client or a team chat app—but let’s hope that a large part of your day is spent writing code.

This means it pays off in terms of productivity (and happiness!) to invest into an enjoyable code editing environment.

As Python developers we have many editors and IDEs to choose from: Vim, Emacs, PyCharm, Wing IDE, Atom, Eclipse PyDev, Sublime Text—just to name a few.

I spent a lot of time fine-tuning my editing environment over the years. After trying other editors and IDEs I eventually settled on Sublime Text. I like its speed, simplicity, and stability. It just feels right for my programming workflow. And I arrived at this choice by trying as many other options as I could.

Your choice might be different. The point I’m trying to make is you need to find out which tool works best for you and your unique needs. Go and try out some editors and see which one you enjoy the most. Your productivity will thank you for it.

Tip 4: Find the right editor and tailor it to your needs.

#5 – Invest in your setup

I once worked with someone who used a commercial editor to write code. But that developer didn’t want to spend the money to purchase a license for it. Instead they used the trial version of the editor for months on end.

The trial version of this particular editor has a nag screen that pops up every few minutes when you save a file, asking you to buy the full version. This developer constantly saved files out of habit and therefore got to see that nag screen hundreds of times a day…

A license that would’ve removed the nag screen cost about $70. I love a frugal mindset but this was ridiculous! Trying to save some money on a critical tool you use all day was the wrong choice—I’m sure the nag screens and the subtle frustrations they caused added up to more than $70 of lost productivity.

If you’re working for yourself then these license costs will be a business expense you can deduct from your taxes. If you’re working for a company I’m sure they’ll gladly invest in your tools if you explain how they make you more productive and more valuable as an employee.

License costs for software development tools are low compared to what graphic designers or architects have to put up with, for example. Some of the best tools and editors are even available for free. Invest money in the right tools where it makes sense and your life (and career) will be better for it.

Tip 5: Invest in tools that make you happy and more effective.

Where to start?

I showed you five common development setup issues that can harm your productivity as a Python programmer. Luckily most of them are easy to fix with the right approach:

Tip 1: Use static code analysis tools like Pyflakes.
Tip 2: Pick a code style (PEP 8) and enforce it with automated tools.
Tip 3: Your development tools should be fast. Favor simplicity.
Tip 4: Find the right editor and tailor it to your needs.
Tip 5: Invest in tools that make you happy and more effective.

Here’s a good way to start: Find the one problem that irritates you the most. You’ll want to divide and conquer instead of trying to achieve perfection immediately. Fix one small thing at a time. Then iterate and keep making improvements from there.

Think of it as an investment—even small changes will compound over time and give you a nice long-term productivity gain. In my experience, success is all about building the right habits and a mindset of continuous improvement.

A great development environment makes you feel confident and productive. When you feel right at home in your setup programming Python becomes even more enjoyable and fun. Good luck!

(This article was originally published on TechBeacon.)

Context Managers and the “with” Statement in Python

Tue, 28 Feb 2017 00:00:00 GMT

Context Managers and the “with” Statement in Python

The “with” statement in Python is regarded as an obscure feature by some. But when you peek behind the scenes of the underlying Context Manager protocol you’ll see there’s little “magic” involved.

So what’s the with statement good for? It helps simplify some common resource management patterns by abstracting their functionality and allowing them to be factored out and reused.

In turn this helps you write more expressive code and makes it easier to avoid resource leaks in your programs.

A good way to see this feature used effectively is by looking at examples in the Python standard library. A well-known example involves the open() function:

with open('hello.txt', 'w') as f:
    f.write('hello, world!')

Opening files using the with statement is generally recommended because it ensures that open file descriptors are closed automatically after program execution leaves the context of the with statement. Internally, the above code sample translates to something like this:

f = open('hello.txt', 'w')
try:
    f.write('hello, world!')
finally:
    f.close()

You can already tell that this is quite a bit more verbose. Note that the try...finally statement is significant. It wouldn’t be enough to just write something like this:

f = open('hello.txt', 'w')
f.write('hello, world')
f.close()

This implementation won’t guarantee the file is closed if there’s an exception during the f.write() call—and therefore our program might leak a file descriptor. That’s why the with statement is so useful. It makes acquiring and releasing resources properly a breeze.
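
You can verify this behavior yourself. The following sketch raises an exception in the middle of a with block and then checks the file object’s closed attribute. It writes to a temporary directory so it doesn’t touch any real files:

```python
import os
import tempfile

# Work in a throwaway directory that is removed again afterwards
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'hello.txt')
    try:
        with open(path, 'w') as f:
            f.write('hello, world!')
            raise RuntimeError('something went wrong mid-write')
    except RuntimeError:
        pass  # the error propagated out of the with block

    # The file was closed on the way out, despite the exception
    print(f.closed)  # True
```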

Another good example where the with statement is used effectively in the Python standard library is the threading.Lock class:

import threading

some_lock = threading.Lock()

# Verbose and easy to get wrong:
some_lock.acquire()
try:
    pass  # Do something...
finally:
    some_lock.release()

# Better:
with some_lock:
    pass  # Do something...

In both cases using a with statement allows you to abstract away most of the resource handling logic. Instead of having to write an explicit try...finally statement each time, with takes care of that for us.

The with statement can make code dealing with system resources more readable. It also helps avoid bugs or leaks by making it almost impossible to forget cleaning up or releasing a resource after we’re done with it.

Supporting `with` in Your Own Objects

Now, there’s nothing special or magical about the open() function or the threading.Lock class and the fact that they can be used with a with statement. You can provide the same functionality in your own classes and functions by implementing so-called context managers.

What’s a context manager? It’s a simple “protocol” (or interface) that your object needs to follow so it can be used with the with statement. Basically all you need to do is add __enter__ and __exit__ methods to an object if you want it to function as a context manager. Python will call these two methods at the appropriate times in the resource management cycle.

Let’s take a look at what this would look like in practical terms. Here’s what a simple implementation of the open() context manager might look like:

class ManagedFile:
    def __init__(self, name):
        self.name = name

    def __enter__(self):
        self.file = open(self.name, 'w')
        return self.file

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self.file:
            self.file.close()

Our ManagedFile class follows the context manager protocol and now supports the with statement, just like the original open() example did:

>>> with ManagedFile('hello.txt') as f:
...    f.write('hello, world!')
...    f.write('bye now')

Python calls __enter__ when execution enters the context of the with statement and it’s time to acquire the resource. When execution leaves the context again, Python calls __exit__ to free up the resource.
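
To see the order of these calls, here’s a small sketch of a context manager that records what happens to it. The Traced class is made up for this illustration. It also shows that __exit__ receives the exception type when the block fails, and that returning False lets the exception propagate:

```python
class Traced:
    """Hypothetical context manager that records the calls it receives."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append('enter')
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # exc_type is None when the block finished without an error
        name = exc_type.__name__ if exc_type else None
        self.events.append(('exit', name))
        return False  # don't swallow the exception

t = Traced()

with t:
    t.events.append('body')

try:
    with t:
        raise ValueError('boom')
except ValueError:
    t.events.append('propagated')

print(t.events)
# ['enter', 'body', ('exit', None), 'enter', ('exit', 'ValueError'), 'propagated']
```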

Writing a class-based context manager isn’t the only way to support the with statement in Python. The contextlib utility module in the standard library provides a few more abstractions built on top of the basic context manager protocol. This can make your life a little easier if your use case matches what’s offered by contextlib.

For example, you can use the contextlib.contextmanager decorator to define a generator-based factory function for a resource that will then automatically support the with statement. Here’s what rewriting our ManagedFile context manager with this technique looks like:

from contextlib import contextmanager

@contextmanager
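def managed_file(name):
    # Open the file before entering the try block: if open() fails
    # there is nothing to close, and f would be unbound in finally.
    f = open(name, 'w')
    try:
        yield f
    finally:
        f.close()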

>>> with managed_file('hello.txt') as f:
...     f.write('hello, world!')
...     f.write('bye now')

In this case, managed_file() is a generator that first acquires the resource. Then it temporarily suspends its own execution and yields the resource so it can be used by the caller. When the caller leaves the with context, the generator continues to execute so that any remaining cleanup steps can happen and the resource gets released back to the system.

The class-based and the generator-based implementations are practically equivalent. Depending on which one you find more readable you might prefer one over the other.

A downside of the @contextmanager-based implementation might be that it requires understanding of advanced Python concepts, like decorators and generators.

Once again, making the right choice here comes down to what you and your team are comfortable using and find the most readable.

Writing Pretty APIs With Context Managers

Context managers are quite flexible and if you use the with statement creatively you can define convenient APIs for your modules and classes.

For example, what if the “resource” we wanted to manage was text indentation levels in some kind of report generator program? What if we could write code like this to do it:

with Indenter() as indent:
    indent.print('hi!')
    with indent:
        indent.print('hello')
        with indent:
            indent.print('bonjour')
    indent.print('hey')

This almost reads like a domain-specific language (DSL) for indenting text. Also, notice how this code enters and leaves the same context manager multiple times to change indentation levels. Running this code snippet should lead to the following output and print neatly formatted text:

    hi!
        hello
            bonjour
    hey

How would you implement a context manager to support this functionality?

By the way, this could be a great exercise to wrap your head around how context managers work. So before you check out my implementation below you might take some time and try to implement this yourself as a learning exercise.

Ready? Here’s how we might implement this functionality using a class-based context manager:

class Indenter:
    def __init__(self):
        self.level = 0

    def __enter__(self):
        self.level += 1
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.level -= 1

    def print(self, text):
        print('    ' * self.level + text)

Another good exercise would be trying to refactor this code to be generator-based.

Things to Remember

The with statement simplifies exception handling by encapsulating standard uses of try/finally statements in so-called Context Managers.
Most commonly it is used to manage the safe acquisition and release of system resources. Resources are acquired by the with statement and released automatically when execution leaves the with context.
Using with effectively can help you avoid resource leaks and make your code easier to read.

Installing Python and Pip on Windows

Sat, 25 Feb 2017 00:00:00 GMT

Installing Python and Pip on Windows

In this tutorial you’ll learn how to set up Python and the Pip package manager on Windows 10, completely from scratch.

Step 1: Download the Python Installer

The best way to install Python on Windows is by downloading the official Python installer from the Python website at python.org.

To do so, open a browser and navigate to https://python.org/. After the page has finished loading, click Downloads.

The website should detect that you’re on Windows and offer the latest version of Python 3 or Python 2 for download. If you don’t know which version of Python to use then I recommend Python 3. You should pick Python 2 only if you know you’ll need to work with legacy Python 2 code.

Under Downloads → Download for Windows, click the “Python 3.X.X” (or “Python 2.X.X”) button to begin downloading the installer.

Sidebar: 64-bit Python vs 32-bit Python

If you’re wondering whether you should use a 32-bit or a 64-bit version of Python then you might want to go with the 32-bit version.

It’s sometimes still problematic to find binary extensions for 64-bit Python on Windows, which means that some third-party modules might not install correctly with a 64-bit version of Python.

My thinking is that it’s best to go with the version currently recommended on python.org. If you click the Python 3 or Python 2 button under “Download for Windows” you’ll get just that.

Remember that if you get this choice wrong and you’d like to switch to another version of Python you can just uninstall Python and then re-install it by downloading another installer from python.org.

Step 2: Run the Python Installer

Once the Python installer file has finished downloading, launch it by double-clicking on it in order to begin the installation.

Be sure to select the Add Python X.Y to PATH checkbox in the setup wizard.

Please make sure the “Add Python X.Y to PATH” checkbox was enabled in the installer because otherwise you will have problems accessing your Python installation from the command line. If you accidentally installed Python without checking the box, follow this tutorial to add python.exe to your system PATH.

Click Install Now to begin the installation process. The installation should finish quickly and then Python will be ready to go on your system. We’re going to make sure everything was set up correctly in the next step.

Step 3: Verify Python Was Installed Correctly

Once the Python installer has finished its work, Python should be installed on your system. Let’s make sure everything went correctly by testing if Python can be accessed from the Windows Command Prompt:

Open the Windows Command Prompt by launching cmd.exe
Type pip and hit Return
You should see the help text from Python’s “pip” package manager. If you get an error message running pip go through the Python install steps again to make sure you have a working Python installation. Most issues you will encounter here will have something to do with the PATH not being set correctly. Re-installing and making sure that the “Add Python to PATH” option is enabled in the installer should resolve this.
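Once pip responds, a short script can confirm which interpreter ended up on your PATH and whether it is a 32-bit or 64-bit build. This is only a minimal sketch; the function name describe_interpreter is my own choice for illustration and not part of any installer tooling. Save it as a .py file and run it with python from the Command Prompt:

```python
import struct
import sys


def describe_interpreter():
    """Return a one-line summary of the Python interpreter running this script."""
    # A C pointer takes 4 bytes on a 32-bit build and 8 bytes on a 64-bit build.
    bits = struct.calcsize("P") * 8
    version = "{}.{}.{}".format(*sys.version_info[:3])
    return "Python {} ({}-bit) at {}".format(version, bits, sys.executable)


print(describe_interpreter())
```

If the reported path or bitness is not what you expected, you most likely have more than one Python installation and the PATH points at a different one.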

What Now?

Assuming everything went well and you saw the output from pip in your command prompt window: congratulations, you just installed Python on your system!

Wondering where to go from here? Click here to get some pointers for Python beginners.


Sublime Text Settings for Writing Clean Python

Tue, 21 Feb 2017 00:00:00 GMT

Sublime Text Settings for Writing Clean Python

How to write beautiful and clean Python by tweaking your Sublime Text settings so that they make it easier to adhere to the PEP 8 style guide recommendations.

There are a few settings you can change to make it easier for you to write PEP 8 compliant Python with Sublime Text 3. PEP 8 is the most common Python style guide and widely used in the Python community.

The tweaks I describe in this article mainly deal with getting the placement of whitespace correct so that you don’t have to manage this (boring) aspect yourself.

I’ll also show you how to get visual indicators for the maximum allowed line-lengths in your editor window so that your lines can be concise and beautifully PEP 8 compliant—just like Guido wants them to be 🙂

Optional: Opening Sublime’s Syntax-Specific Settings for Python

The settings we’re changing now are specific to Python. Feel free to place them in your User settings; that will work just fine. However, if you’d like to apply some or all of the settings in this tutorial only to Python code, then here’s how you can do that:

Open a Python file in Sublime Text (or create a new file, open the Command Palette and execute the “Set Syntax: Python” command)
Click on Sublime Text → Preferences → Settings – More → Syntax Specific – User to open your Python-specific user settings. Make sure this opens a new editor tab called Python.sublime-settings. That’s the one you want!

If you’d like to learn more about how Sublime Text’s preferences system works, then check out this tutorial I wrote.

Better Whitespace Handling

The following changes you can make to your (Syntax Specific) User Settings will help you keep the whitespace in your Python code clean and consistent:

"tab_size": 4,
"translate_tabs_to_spaces": true,
"trim_trailing_white_space_on_save": true,
"ensure_newline_at_eof_on_save": true

A tab_size of 4 is the general recommendation for writing Python. You’ll also want to enable translate_tabs_to_spaces to ensure that you don’t have a mixture of tabs and spaces in your Python files, which should be avoided.

The trim_trailing_white_space_on_save option will remove superfluous whitespace at the end of lines or on empty lines. I highly recommend enabling this because it can save headaches and merge conflicts when working with Git and other forms of source control.

PEP 8 recommends that Python files should end with a single newline character to ensure that POSIX tools can process the file correctly. If you want to never have to worry about this again then turn on the ensure_newline_at_eof_on_save setting as this will make sure that your Python files end with a newline automatically.
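To make the effect of these whitespace settings concrete, here is a rough Python approximation of what they do to a file. This is not how Sublime Text implements them (for example, translate_tabs_to_spaces affects what the Tab key inserts rather than rewriting existing tabs), and the function name clean_whitespace is a hypothetical name I picked for this sketch:

```python
def clean_whitespace(source, tab_size=4):
    """Approximate the whitespace clean-ups described above on a string of code."""
    cleaned_lines = []
    for line in source.splitlines():
        # Roughly what tab_size + translate_tabs_to_spaces aim for: no tab characters.
        line = line.expandtabs(tab_size)
        # Roughly trim_trailing_white_space_on_save: drop whitespace at line ends.
        line = line.rstrip()
        cleaned_lines.append(line)
    # Roughly ensure_newline_at_eof_on_save: the text always ends with a newline.
    return "\n".join(cleaned_lines) + "\n"


messy = "def answer():\n\treturn 42   "
print(repr(clean_whitespace(messy)))
```

The tab in front of return becomes four spaces, the trailing spaces disappear, and the result ends with a newline.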

Enable PEP 8 Line-Length Indicators

Another setting that’s really handy for writing PEP 8 compliant code is the “rulers” feature. It enables visual indicators in the editor area that show you the preferred maximum line length.

You can enable several rulers with different line lengths at the same time. This helps you follow the PEP 8 recommendations of limiting your docstrings to 72 characters and limiting all other lines to 79 characters.

Here’s how to set up the rulers feature for Python development. Open your (Syntax Specific) User Settings and add the following setting:

"rulers": [
    72,
    79
]

This will add two line-length indicators—one at 72 characters for docstrings, and one at 79 characters for regular lines. You can see them in the screenshot as vertical lines on the right-hand side of the editor area.
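The rulers are purely visual, so they won’t stop you from going over the limit. If you’d like a quick programmatic check as well, a small script can flag the offending lines. This is just a minimal sketch and the name find_long_lines is made up for this example; a real linter does a much more thorough job:

```python
def find_long_lines(source, limit=79):
    """Return (line_number, length) pairs for every line longer than `limit`."""
    return [
        (number, len(line))
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > limit
    ]


# Two lines of sample code: a short one and one that is 86 characters long.
sample = "short = 1\n" + "x = '" + "a" * 80 + "'\n"
print(find_long_lines(sample))            # [(2, 86)]
print(find_long_lines(sample, limit=90))  # []
```

Passing limit=72 would apply the stricter docstring limit instead of the default 79.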

Turn On Word Wrapping

I like enabling Sublime’s word-wrapping feature when I’m writing Python. Most of my projects follow the PEP 8 style guide and therefore use a maximum line length of 79 characters.

I don’t want to get into an argument about whether that’s a good idea, but one benefit I found from limiting the lengths of my lines is that I can comfortably fit several files on my screen at once using Sublime’s “split layouts” feature.

This is especially useful if you’re following a test-heavy development process because you can see and edit the test and the production code at the same time.

Of course sometimes you’ll encounter a file that uses line-lengths above the 79 characters recommended by PEP 8. If I’m using split layouts with multiple editor panes at the same time it impacts my productivity if I have to scroll around horizontally.

The idea is to see all of the code at once. So, how can we fix that?

The best way I found to handle this is to enable Sublime’s word-wrap feature. This will visually break apart lines that are longer than the maximum line length. It might look a little odd sometimes but it’s still light years better than having to scroll around horizontally.

Here’s how you enable word wrapping. Open your (Syntax Specific) User Settings and add (or modify) the following options:

"word_wrap": true,
"wrap_width": 80

I’m setting the wrap_width to 80 which is one character past the 79 characters recommended by PEP 8. Therefore any line that goes beyond the PEP 8 recommendations will get wrapped.


Writing Clean Python With Namedtuples

Tue, 14 Feb 2017 00:00:00 GMT

Writing Clean Python With Namedtuples

Python comes with a specialized “namedtuple” container type that doesn’t seem to get the attention it deserves. It’s one of these amazing features in Python that’s hidden in plain sight.

Namedtuples can be a great alternative to defining a class manually and they have some other interesting features that I want to introduce you to in this article.

Now, what’s a namedtuple and what makes it so special? A good way to think about namedtuples is to view them as an extension of the built-in tuple data type.

Python’s tuples are a simple data structure for grouping arbitrary objects. Tuples are also immutable—they cannot be modified once they’ve been created.

>>> tup = ('hello', object(), 42)
>>> tup
('hello', <object object at 0x105e76b70>, 42)
>>> tup[2]
42
>>> tup[2] = 23
TypeError: 'tuple' object does not support item assignment

A downside of plain tuples is that the data you store in them can only be pulled out by accessing it through integer indexes. You can’t give names to individual properties stored in a tuple. This can impact code readability.

Also, a tuple is always an ad-hoc structure. It’s hard to ensure that two tuples have the same number of fields and the same properties stored on them. This makes it easy to introduce “slip-of-the-mind” bugs by mixing up the field order.
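To see how a namedtuple addresses both of these downsides, here is a minimal sketch. The Car type and its color and mileage fields are hypothetical names picked purely for illustration:

```python
from collections import namedtuple

# "Car", "color" and "mileage" are made-up names for this example.
Car = namedtuple("Car", ["color", "mileage"])

my_car = Car(color="red", mileage=3812.4)

# Fields can be read by name, which is easier to follow than my_car[0].
print(my_car.color)

# Index access and unpacking still work, because it is still a tuple.
print(my_car[1])
color, mileage = my_car

# Like a plain tuple, a namedtuple is immutable.
try:
    my_car.color = "blue"
    changed = True
except AttributeError:
    changed = False

# The number of fields is fixed, so leaving one out is an error.
try:
    Car("red")
    built = True
except TypeError:
    built = False

print(changed, built)
```

Because every Car has exactly the same named fields, it becomes much harder to mix up the field order by accident.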


Lambda Functions in Python: What Are They Good For?

Tue, 07 Feb 2017 00:00:00 GMT

Lambda Functions in Python: What Are They Good For?

An introduction to “lambda” expressions in Python: What they’re good for, when you should use them, and when it’s best to avoid them.

The lambda keyword in Python provides a shortcut for declaring small anonymous functions. Lambda functions behave just like regular functions declared with the def keyword. They can be used whenever function objects are required.

For example, this is how you’d define a simple lambda function carrying out an addition:

>>> add = lambda x, y: x + y
>>> add(5, 3)
8

You could declare the same add function with the def keyword:

>>> def add(x, y):
...     return x + y
>>> add(5, 3)
8

Now you might be wondering: Why the big fuss about lambdas? If they’re just a slightly more terse version of declaring functions with def, what’s the big deal?

Take a look at the following example and keep the words function expression in your head while you do that:

>>> (lambda x, y: x + y)(5, 3)
8

Okay, what happened here? I just used lambda to define an “add” function inline and then immediately called it with the arguments 5 and 3.

Conceptually the lambda expression lambda x, y: x + y is the same as declaring a function with def, just written inline. The difference is I didn’t bind it to a name like add before I used it. I simply stated the expression I wanted to compute and then immediately evaluated it by calling it like a regular function.

Before you move on, you might want to play with the previous code example a little to really let the meaning of it sink in. I still remember this took me a while to wrap my head around. So don’t worry about spending a few minutes in an interpreter session.

There’s another syntactic difference between lambdas and regular function definitions: Lambda functions are restricted to a single expression. This means a lambda function can’t use statements or annotations—not even a return statement.

How do you return values from lambdas then? Executing a lambda function evaluates its expression and then automatically returns its result. So there’s always an implicit return statement. That’s why some people refer to lambdas as single expression functions.
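A typical place where the single-expression form and the implicit return come in handy is a sort key. The following is a small sketch; the pairs list and the second_item helper are made up for this example:

```python
# Hypothetical sample data: (number, letter) pairs in no particular order.
pairs = [(1, "d"), (2, "b"), (4, "a"), (3, "c")]

# The lambda's single expression, pair[1], is evaluated and returned implicitly.
by_letter = sorted(pairs, key=lambda pair: pair[1])
print(by_letter)  # [(4, 'a'), (2, 'b'), (3, 'c'), (1, 'd')]


# The equivalent def needs a name and an explicit return statement.
def second_item(pair):
    return pair[1]


print(sorted(pairs, key=second_item) == by_letter)  # True
```

Both versions produce the same function object as far as sorted() is concerned. The lambda just keeps the short key function next to the call that uses it.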


Why Learn Python? Here Are 8 Data-Driven Reasons

Tue, 31 Jan 2017 00:00:00 GMT

Why Learn Python? Here Are 8 Data-Driven Reasons

Is Python worth learning? We’ve interviewed experts and surveyed the job market to identify the key reasons why you should learn Python today.

Python had a great year in 2016. The latest Stack Overflow Developer Survey ranked Python as the 6th most popular and the 4th most wanted technology of the year.

Python is also one of the hottest skills to have according to research by Dice, and the 2nd most popular programming language in the world based on the PYPL Popularity of Programming Language Index.

So why the hype? What makes Python so popular? Should you stop what you’re doing and start learning Python right now? I’ve searched far and wide to find out why Python is one of the world’s most loved and most used technologies. Without further ado, here’s why Python is worth learning in 2017 and the years ahead:

1. You Can Use Python for Pretty Much Anything

One significant advantage of learning Python is that it’s a general-purpose language that can be applied in a large variety of projects. Below are just some of the most common fields where Python has found its use:

Data science
Scientific and mathematical computing
Web development
Finance and trading
System automation and administration
Computer graphics
Basic game development
Security and penetration testing
General and application-specific scripting
Mapping and geography (GIS software)

In preparation for this post, I posted the question “Is Python worth learning?” on Google+, Quora, and LinkedIn in order to collect some professional opinions on the matter. Here’s one of the responses I got that supports my point:

“I had the opportunity to start learning Python 6 years ago. Since that time, I’ve used Python for everything from work related stuff to home automation tasks, and I have never stumbled upon a problem that can’t be solved with Python.”

— Anass Bensrhir, Senior Data Scientist and Managing Director at Bold Data

2. Python Is Widely Used in Data Science


Python’s application in data science and data engineering is what’s really fuelling its popularity today. Pandas, NumPy, SciPy, and other tools combined with the ability to prototype quickly and then “glue” systems together enable data engineers to maintain high efficiency when using Python.

Justin McGrath, a researcher at the University of Illinois, Urbana-Champaign agrees:

“Python is probably going to become the de facto standard for scientific and statistical analyses. If you’re going into those fields, it’s certainly worth learning.”

3. Python Pays Well

It’s all well and good, but what about the pay, I hear you ask? It turns out Python engineers have some of the highest salaries in the industry, at least in the US.

At nearly $103,500 per year, Python is the second best-paying programming language in the country (beating out Java, C++, and JavaScript) according to Gooroo, a skill and salary analytics platform.

Indeed’s salary calculator gives an even larger figure—a whopping $116,000 per year. Of course, tech salaries differ greatly from one state to another. So to add some context, here’s a breakdown of how much Python engineers make in the states featured on Indeed:


4. Demand for Python Developers Is High (And Growing)

Based on Indeed’s job trends, it looks like having Python under your belt can help you land a job in short order. The graph below displays a steady growth in the number of job postings featuring Python since 2012, and there has been a strong spike in popularity over the last six months.

What’s more, the demand for Python skills clearly outstrips jobseeker interest. The job market outlook for Python developers is excellent at the moment.

5. Python Saves Time

I’m pretty sure that the majority of the developers who’ve used Python would agree that making anything with this language takes a lot less time and code than most other technologies.

Even the classic “Hello, world” program illustrates this point:

print("Hello, world")

For comparison, this is what the same program looks like in Java:

public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, world");
    }
}

6. Python Is Beginner Friendly

Following up from the previous point, Python’s famously clean and readable syntax makes it newbie-friendly. A well-written Python program can look like it came straight out of an algorithms textbook. There’s little superfluous boilerplate, allowing beginners and experts alike to focus on the job at hand instead of the code.

Python’s efficiency and readability have also made it the number one most commonly taught introductory language at top US universities. This will have ramifications on the future job market and likely make Python an even more popular technology choice.

7. All the Big Names Use Python

Ever wanted to work for a tech giant like Google or Facebook? Python could be your way in, as these companies, as well as YouTube, IBM, Yahoo, Dropbox, Quora, Mozilla, Instagram, and many others all use Python for a wide array of purposes, and are constantly hiring Python developers.


Dropbox’s code base, for instance, uses Python for almost everything, including analytics, the server backend, the API backend, and the desktop clients.

8. Python Has an Amazing Ecosystem

Last but not least, there’s a huge number of resources developed for Python that keep getting updated, including an impressive standard library with built-in functionality, a built-in unit testing framework, and more than enough frameworks and environments that allow you to focus on writing the website or app at hand.

Django is the most commonly used Python web framework, but there’s also Flask, Pyramid, web2py, Zope 2, and a few more.

What Do Python Community Leaders Think?

I thought it would also be a good idea to top things up with a few expert opinions on the advantages of Python as well as its future. It’s always a good idea to get a second (or third) opinion. So I reached out to several influencers and leaders in the Python space.

I asked each of these experts three questions:

What advantages does Python have over other programming languages?
What future do you see for Python in 3–5 years?
What will the job market look like for a Python developer in the coming years?

Here’s what I was able to learn:

Michael Kennedy, Python Coach and Host of the Talk Python and Python Bytes Podcasts

“You start easy but you rarely outgrow Python like you do other easy to learn languages”

What advantages does Python have over other programming languages?

I often think of programming languages as falling into two buckets.

The first group would be the “With great power comes great responsibility” type of languages. This would be C, C++, and to a lesser degree C# and Java. The others are “I just need to ship something, don’t waste my time with minutiae” languages. Visual Basic (pre-VB.NET) and JavaScript seem solidly in this camp, although JavaScript appears to be trying to escape with the massive decoupling seen in typical Node.js code and TypeScript.

You choose C++ or C# if you need to really control the system and build large professional software. Is it mission critical enterprise software running the company with 100k lines of code? You might choose these. If you need a quick app to get the job done, like writing that “forms over data” app for something internal, VB 6 used to be a great answer for finishing that in a week, but you risked coding yourself into a box if the app grew too big or needed low-level capabilities.

Python is one of the few languages that is:

Easy to learn
Solves that “Don’t waste my time” set of problems well
Yet, is also well designed with OOP and solid modern language features
Can grow in power to match the powerful languages in capabilities

In short, it’s one of the few languages that spans the spectrum of these capabilities. You start easy but you rarely outgrow Python like you do other easy to learn languages.

We could also go into things like data science, scientific computing, web development, microcontrollers, things like Raspberry Pi, and how Python spans more technologies and areas of focus than most programming languages do.

But the full spectrum aspect is the most powerful to me.

What future do you see for Python in 3–5 years?

In terms of predictions, I’m willing to make a few:

Python will continue to expand into new areas of computing. It will be the primary IoT programming language.
We will see Python interpreters/runtimes evolve and innovate. The YouTube team just released a project running Python on the Go runtime for example.
The Python 3 vs Python 2 schism that has turned off countless new developers and generally been a cloud over the community will be closed, and Python 3 will be just “Python”.

What will the job market look like for a Python developer in the coming years?

Given the growth numbers as well as the wide areas of computing that Python occupies, I think the job prospects for Python developers are very solid.

Some folks may feel Python is kind of a niche language or a small time scripting language. But very major applications are written in Python, including Dropbox and YouTube.

Other areas outside web development where Python shines are places like the Large Hadron Collider where the team that found the Higgs Boson and won the Nobel Prize made heavy use of Python. Netflix uses Python to manage their AWS servers which cumulatively handle up to 35% of the bandwidth of the United States during the evenings.

You’ll find that some locations in the world are more Python-centric than others. But there are many opportunities for Python developers.

Michael Kennedy is a Python coach and host of the popular Talk Python and Python Bytes podcasts.

Ankur Gupta, Curator at ImportPython

“There is a demand-supply mismatch for Python developers with 2 to 6 years of experience”

What advantages does Python have over other programming languages?

Python is an easier language to learn compared to, say, C++, C, C#, or Java, but that’s not it. We often tend to credit syntax, core team, feature roadmap, etc. for the success of a certain language.

They’re beyond doubt important, but when it comes to Python, it’s the global, diverse, and vibrant community that makes it so widely adopted. Initiatives like Django Girls and the scale at which they operate are unique. There are at least three dozen free books on Python, thousands of free videos to learn from, as well as the PyCon events all around the world.

Active local and online regional Python communities are the biggest advantage that Python has over other languages. It’s the people behind the language that make it special.

What future do you see for Python in 3–5 years?

10 years ago, mentioning Python was guaranteed to invite blank stares. But today, Python is a pretty mainstream language. I think Python is here to stay.

In 3–5 years I foresee:

2.x codebase becoming a minority
Python developers being available in abundance thanks to schools and colleges that teach Python as an introductory language
People using different Python runtime interpreters instead of just CPython

What will the job market look like for a Python developer in the coming years?

Back in 2007–2008, I’d get no more than 3–4 calls a month concerning Python job openings, and most of those calls had to do with Python scripting for test automation (India). But if I were to look for a job today, I’m sure my phone would ring multiple times per day.

There is a demand-supply mismatch for Python developers with 2 to 6 years of experience because of all these companies wanting to use Python for data science, data processing, machine learning, web application development, and so on.

This situation will be gradually improving over the next couple of years, which means today is definitely the best time to be a Python developer.

Ankur Gupta is the curator of the weekly newsletter over at ImportPython.com, which keeps you updated on everything happening in the world of Python programming.

Sebastian Vetter, Python Engineer at Eventbase, PyCon Speaker and Meetup Host

“The community around Python is the most welcoming and inclusive one out of all those that I’ve experienced”

What advantages does Python have over other programming languages?

Community. The community around Python is the most welcoming and inclusive one out of all those that I’ve experienced. Many times I’ve been inspired by the progressive effort at meetups and conferences to be inclusive to newcomers, underrepresented groups and minorities.
Readability. A lot of effort has gone into developing Python as a language that has readability as one of its main features, rather than considering it as an afterthought. As Robert C. Martin wrote in Clean Code, “the ratio of time spent reading versus writing is well over 10 to 1.”
Consistency. One of the things that I’ve always loved about Python is the fact that it uses whitespace to determine blocks instead of using various types of brackets. Although this is a little unintuitive when starting out, in my opinion, the advantage is that it ensures that Python code is relatively similar across different projects. It improves consistency and readability.

What future do you see for Python in 3–5 years?

In my opinion, the use of Python and the number of developers working with it will grow significantly in scientific fields. The number of science-related topics at Python conferences (and beyond) and releases of new tools to help the scientific community will make it easier to adopt the language. This will give the scientific community access to a very inclusive and welcoming developer community that will help improve the quality of development and simplify the tooling for scientific and research-related applications.

The mobile space is going to be very interesting in about 3–5 years. As Russell Keith-Magee pointed out in his presentation “Python on the Move: the State of Mobile Python” at PyCon AU 2015, the future of Python as a language will most likely depend in part on how the community moves into the mobile development space. Although the Python community is very diverse and the language is used in a lot of different fields, we currently don’t have any decent support for mobile platforms. Looking at Russell’s efforts to bridge this gap with his project under the BeeWare umbrella, I’m confident that this gap will be closed within the next few years, and we’ll be able to maintain a strong position even in these new areas.

Over the last several years, there’s been a lot of disagreement over Python 3 and whether it’s a step in the right direction. I do understand some of the critical arguments made against Python 3. Several highly qualified Pythonistas with vastly more experience than myself have raised valid concerns and pointed out flaws. Regardless of these concerns, I’m convinced that the adoption of Python 3 will pick up steam over the next two or three years, moving faster towards it being the mainstream version. This is indicated by projects like Django dropping support for Python 2.7 within 2017 with their release of Django 2.0 and the broader adoption of asyncio and coroutine-based frameworks and libraries.

Making the Python community a more inclusive space for individuals of underrepresented groups such as women and other minorities will help us build a community made up of all different types of people. I’m sure that over the next 5 years, we’ll see the first major benefits of these initiatives contributing to a much stronger community. Making everyone welcome and embracing the differences in perspectives and experiences will serve as a model for companies, proving that such an environment results in better software and happier employees. I also think that individuals from within the Python community who’ve experienced this atmosphere will impact their employers by demanding a similar environment in their professional lives, drawing from the support of the community.

What will the job market look like for a Python developer in the coming years?

The next few years will most likely see a much more diverse landscape of Python jobs. With the increased application of Python in scientific fields, more research positions will become available. In addition, I think the growing need for programming skills within the scientific community will lead to a combination of researchers and programmers forming a skilled workforce that is capable in the scientific aspect as well as in development best practises and tooling.

The position of Data Scientist is going to become more and more important in the tech industry and will therefore increase the demand within the Python community specifically. We already have a large number of scientists in our community using Python as the main language for their research. Their skills in statistics and the use of the language will make them prime candidates for positions that are related to data-driven systems. With the demand for such systems growing fast, there will be a high demand for these individuals, and for anybody within the Python community willing to level up on either the development aspects or the scientific skills.

The Python community is strongly committed to improving its inclusiveness and diversity. Mandating and enforcing codes of conduct at conferences and meetups as well as openly stating the inclusive nature of communities around projects like the Django framework are helping to improve the representation of underprivileged individuals within the community. I hope and believe that this will, over the next few years, help make the community a place that will thrive, because individuals from these underrepresented groups will feel safe and welcome. This will make the Python community an exceptional pool to tap into for companies that are making an effort to improve the diversity of their development and science teams.

Sebastian Vetter is a Senior Python Engineer at Eventbase, PyCon speaker and Python meet-up host.


The 4 Major Ways to Do String Formatting in Python

Tue, 24 Jan 2017 00:00:00 GMT

The 4 Major Ways to Do String Formatting in Python

Remember the Zen of Python and how there should be “one obvious way to do something in Python”? You might scratch your head when you find out that there are four major ways to do string formatting in Python.

In this article I’ll demonstrate how these four string formatting approaches work and what their respective strengths and weaknesses are. I’ll also give you my simple “rule of thumb” for how I pick the best general purpose string formatting approach.

Let’s jump right in, as we’ve got a lot to cover. In order to have a simple toy example for experimentation, let’s assume we’ve got the following variables (or constants, really) to work with:

>>> errno = 50159747054
>>> name = 'Bob'

And based on these variables we’d like to generate an output string containing a simple error message:

'Hey Bob, there is a 0xbadc0ffee error!'

Hey… now that error could really spoil a dev’s Monday morning. But we’re here to discuss string formatting. So let’s get to work.

#1 – “Old Style” String Formatting (%-operator)

Strings in Python have a unique built-in operation that can be accessed with the %-operator. This lets you do simple positional formatting very easily. If you’ve ever worked with a printf-style function in C you’ll recognize how this works instantly. Here’s a simple example:

>>> 'Hello, %s' % name
'Hello, Bob'

I’m using the %s format specifier here to tell Python where to substitute the value of name, represented as a string.

There are other format specifiers available that let you control the output format. For example it’s possible to convert numbers to hexadecimal notation or to add whitespace padding to generate nicely formatted tables and reports (cf. Python Docs: “printf-style String Formatting”).
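As a quick illustration of the whitespace padding mentioned above, here is a minimal sketch. The column width of 10 and the sample number 42 are arbitrary choices for this example:

```python
name = 'Bob'

# A width after the % pads the value with spaces (right-aligned by default).
right_aligned = '%10s|' % name
# A minus sign before the width left-aligns the value instead.
left_aligned = '%-10s|' % name
# The same works for numbers; a leading 0 pads with zeros instead of spaces.
padded_number = '%5d|' % 42
zero_padded = '%05d|' % 42

print(right_aligned)  # '       Bob|'
print(left_aligned)   # 'Bob       |'
print(padded_number)  # '   42|'
print(zero_padded)    # '00042|'
```

Lining up several rows formatted with the same widths is what gives you the table-like output.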

Here, we can use the %x format specifier to convert an int value to a string and to represent it as a hexadecimal number:

>>> '%x' % errno
'badc0ffee'

The “old style” string formatting syntax changes slightly if you want to make multiple substitutions in a single string. Because the %-operator only takes one argument you need to wrap the right-hand side in a tuple, like so:

>>> 'Hey %s, there is a 0x%x error!' % (name, errno)
'Hey Bob, there is a 0xbadc0ffee error!'

It’s also possible to refer to variable substitutions by name in your format string, if you pass a mapping to the %-operator:

>>> 'Hey %(name)s, there is a 0x%(errno)x error!' % {
...     "name": name, "errno": errno }
'Hey Bob, there is a 0xbadc0ffee error!'

This makes your format strings easier to maintain and easier to modify in the future. You don’t have to worry about making sure the order you’re passing in the values matches up with the order the values are referenced in the format string. Of course the downside is that this technique requires a little more typing.

I’m sure you’ve been wondering why this printf-style formatting is called “old style” string formatting. It was technically superseded by “new style” formatting, which we’re going to talk about in a minute.

#2 – “New Style” String Formatting (str.format)

Python 3 introduced a new way to do string formatting that was also later back-ported to Python 2.7. This “new style” string formatting gets rid of the %-operator special syntax and makes the syntax for string formatting more regular. Formatting is now handled by calling a format() function on a string object (cf. Python Docs: “str.format”).

You can use the format() method to do simple positional formatting, just like you could with “old style” formatting:

>>> 'Hello, {}'.format(name)

'Hello, Bob'

Or, you can refer to your variable substitutions by name and use them in any order you want. This is quite a powerful feature as it allows for re-arranging the order of display without changing the arguments passed to the format function:

>>> 'Hey {name}, there is a 0x{errno:x} error!'.format(
...     name=name, errno=errno)

'Hey Bob, there is a 0xbadc0ffee error!'

This also shows that the syntax to format an int variable as a hexadecimal string has changed. Now we need to pass a format spec by adding a :x suffix. The format string syntax has become more powerful without complicating the simpler use cases. It pays off to read up on this string formatting mini-language in the Python documentation (cf. Python Docs: “Format String Syntax”).
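To give a feel for the format spec mini-language, here is a short sketch with a few common specs. The variable names are only for illustration; the specs themselves (alignment, thousands separator, zero padding, alternate hex form) are part of the standard format string syntax.

```python
errno = 50159747054

# Positional indexes let you reorder arguments in the output.
reordered = "{1} before {0}".format("a", "b")

# < left-aligns, > right-aligns, ^ centers within the given width.
aligned = "[{:<6}][{:>6}][{:^6}]".format("ab", "cd", "ef")

# A comma asks for a thousands separator.
grouped = "{:,}".format(1234567)

# Width 8, zero padded, 3 digits after the decimal point.
padded = "{:08.3f}".format(3.14159)

# The # flag adds the 0x prefix for hexadecimal output.
hex_error = "{errno:#x}".format(errno=errno)

print(reordered, aligned, grouped, padded, hex_error, sep="\n")
```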

In Python 3, this “new style” string formatting is to be preferred over %-style formatting. While “old style” formatting has been de-emphasized it has not been deprecated. It is still supported in the latest versions of Python. According to this discussion on the Python dev email list and this issue on the Python dev bug tracker, %-formatting is going to stick around for a long time to come.

Still, the official Python 3.X documentation doesn’t exactly recommend “old style” formatting or speak too fondly of it:

The formatting operations described here exhibit a variety of quirks that lead to a number of common errors (such as failing to display tuples and dictionaries correctly). Using the newer formatted string literals or the str.format() interface helps avoid these errors. These alternatives also provide more powerful, flexible and extensible approaches to formatting text. (Source: Python 3 Docs)
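The tuple quirk mentioned in that quote is easy to reproduce. The following sketch shows the failure and two ways around it; the point variable is just sample data.

```python
point = (1, 2)

# Quirk: the tuple is unpacked into two arguments for a single %s.
try:
    "Point: %s" % point
    tuple_error = None
except TypeError as exc:
    tuple_error = str(exc)

# Workaround for %-formatting: wrap the value in a one-element tuple.
old_style = "Point: %s" % (point,)

# str.format() treats the tuple as one argument, so no wrapping is needed.
new_style = "Point: {}".format(point)

print(tuple_error)
print(old_style)
print(new_style)
```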

This is why I’d personally try to stick with str.format for new code moving forward. Starting with Python 3.6 there’s yet another way to format your strings. I’ll tell you all about it in the next section.

#3 – Literal String Interpolation (Python 3.6+)

Python 3.6 adds a new string formatting approach called Formatted String Literals. This new way of formatting strings lets you use embedded Python expressions inside string constants. Here’s a simple example to give you a feel for the feature:

>>> f'Hello, {name}!'

'Hello, Bob!'

This new formatting syntax is powerful. Because you can embed arbitrary Python expressions you can even do inline arithmetic with it. See here for example:

>>> a = 5
>>> b = 10
>>> f'Five plus ten is {a + b} and not {2 * (a + b)}.'

'Five plus ten is 15 and not 30.'

Formatted string literals are a Python parser feature that converts f-strings into a series of string constants and expressions. They then get joined up to build the final string.

Imagine we had the following greet() function that contains an f-string:

>>> def greet(name, question):
...     return f"Hello, {name}! How's it {question}?"
...

>>> greet('Bob', 'going')
"Hello, Bob! How's it going?"

When we disassemble the function and inspect what’s going on behind the scenes we can see that the f-string in the function gets transformed into something similar to the following:

>>> def greet(name, question):
...    return "Hello, " + name + "! How's it " + question + "?"

The real implementation is slightly faster than that because it uses the BUILD_STRING opcode as an optimization. But functionally they’re the same:

>>> import dis
>>> dis.dis(greet)
  2           0 LOAD_CONST               1 ('Hello, ')
              2 LOAD_FAST                0 (name)
              4 FORMAT_VALUE             0
              6 LOAD_CONST               2 ("! How's it ")
              8 LOAD_FAST                1 (question)
             10 FORMAT_VALUE             0
             12 LOAD_CONST               3 ('?')
             14 BUILD_STRING             5
             16 RETURN_VALUE
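Keep in mind that the exact opcode names in a listing like this depend on the CPython version you run, so your output may look different. A version-tolerant sketch is to collect the opcode names programmatically and to check that the f-string and the hand-written concatenation (called greet_concat here, a name chosen for this example) produce the same result:

```python
import dis


def greet(name, question):
    return f"Hello, {name}! How's it {question}?"


def greet_concat(name, question):
    # Hand-written equivalent of what the f-string computes.
    return "Hello, " + name + "! How's it " + question + "?"


# Collect opcode names instead of relying on one version's listing.
opnames = [instruction.opname for instruction in dis.get_instructions(greet)]
print(opnames)

same_result = greet("Bob", "going") == greet_concat("Bob", "going")
print(same_result)
```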

Formatted string literals also support the existing format string syntax of the str.format() method. That allows you to solve the same formatting problems we’ve discussed in the previous two sections:

>>> f"Hey {name}, there's a {errno:#x} error!"

"Hey Bob, there's a 0xbadc0ffee error!"

Python’s new Formatted String Literals are similar to the JavaScript Template Literals added in ES2015. I think they’re quite a nice addition to the language and I’ve already started using them in my day to day (Python 3) work. You can learn more about Formatted String Literals in the official Python documentation (cf. Python Docs: “Formatted string literals”).
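Here is a short sketch of a few more things f-strings can do with format specs, assuming Python 3.6 or newer. The variable names are illustrative only.

```python
name = "Bob"
errno = 50159747054
pi_estimate = 3.14159
precision = 2

hex_error = f"{errno:#x}"                  # alternate hex form with 0x prefix
grouped = f"{errno:,}"                     # thousands separator
aligned = f"[{name:>8}]"                   # right-align in 8 columns
rounded = f"{pi_estimate:.{precision}f}"   # nested field inside the spec
as_repr = f"{name!r}"                      # !r applies repr() to the value

print(hex_error, grouped, aligned, rounded, as_repr, sep="\n")
```

The nested field in rounded shows that the format spec itself can be computed at run time.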

#4 – Template Strings (standard library)

Here’s one more technique for string formatting in Python: Template Strings. It’s a simpler and less powerful mechanism, but in some cases this might be exactly what you’re looking for.

Let’s take a look at a simple greeting example:

>>> from string import Template
>>> t = Template('Hey, $name!')
>>> t.substitute(name=name)

'Hey, Bob!'

You see here that we need to import the Template class from Python’s built-in string module. Template strings are not a core language feature but they’re supplied by a module in the standard library.

Another difference is that template strings don’t allow format specifiers. So in order to get our error string example to work we need to transform our int error number into a hex-string ourselves:

>>> templ_string = 'Hey $name, there is a $error error!'
>>> Template(templ_string).substitute(
...     name=name, error=hex(errno))

'Hey Bob, there is a 0xbadc0ffee error!'

That worked great. So when should you use template strings in your Python programs? In my opinion the best use case for template strings is when you’re handling format strings generated by users of your program. Due to their reduced complexity template strings are a safer choice.
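When templates come from users, placeholders may be missing or malformed. The sketch below shows how the Template class reacts in those cases; the template texts are invented for this example.

```python
from string import Template

greeting = Template("Hey $name, there is a $error error!")

# substitute() raises KeyError when a placeholder has no value.
try:
    greeting.substitute(name="Bob")
    missing_key = None
except KeyError as exc:
    missing_key = exc.args[0]

# safe_substitute() leaves unknown placeholders untouched instead.
partial = greeting.safe_substitute(name="Bob")

# A malformed placeholder (missing closing brace) raises ValueError.
try:
    Template("Hey ${name").substitute(name="Bob")
    malformed = False
except ValueError:
    malformed = True

# "$$" is the escape sequence for a literal dollar sign.
price_note = Template("Total: $$${amount}").substitute(amount="9.99")

print(missing_key, partial, malformed, price_note, sep="\n")
```

Whether you prefer substitute() or safe_substitute() depends on whether a missing value should be treated as an error in your program.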

The more complex formatting mini-languages of the other string formatting techniques might introduce security vulnerabilities to your programs. For example, it’s possible for format strings to access arbitrary variables in your program.

That means, if a malicious user can supply a format string they can potentially leak secret keys and other sensitive information! Here’s a simple proof of concept of how this attack might be used:

>>> SECRET = 'this-is-a-secret'
>>> class Error:
...     def __init__(self):
...         pass
>>> err = Error()
>>> user_input = '{error.__init__.__globals__[SECRET]}'

# Uh-oh...
>>> user_input.format(error=err)

'this-is-a-secret'

See how a hypothetical attacker was able to extract our secret string by accessing the __globals__ dictionary? Scary, huh? Template Strings close this attack vector. And this makes them a safer choice if you’re handling format strings generated from user input:

>>> user_input = '${error.__init__.__globals__[SECRET]}'
>>> Template(user_input).substitute(error=err)

ValueError:
"Invalid placeholder in string: line 1, col 1"

Which String Formatting Method Should I Use?

I totally get that having so much choice for how to format your strings in Python can feel very confusing. This is an excellent cue to bust out this handy flowchart infographic I’ve put together for you:

This flowchart is based on the following rule of thumb that I apply when I’m writing Python:


Assert Statements in Python

Wed, 18 Jan 2017 00:00:00 GMT

Assert Statements in Python

How to use assertions to help automatically detect errors in your Python programs in order to make them more reliable and easier to debug.

What Are Assertions & What Are They Good For?

Python’s assert statement is a debugging aid that tests a condition. If the condition is true, it does nothing and your program just continues to execute. But if the assert condition evaluates to false, it raises an AssertionError exception with an optional error message.

The proper use of assertions is to inform developers about unrecoverable errors in a program. They’re not intended to signal expected error conditions, like “file not found”, where a user can take corrective action or just try again.

Another way to look at it is to say that assertions are internal self-checks for your program. They work by declaring some conditions as impossible in your code. If one of these conditions doesn’t hold that means there’s a bug in the program.

If your program is bug-free, these conditions will never occur. But if they do occur the program will crash with an assertion error telling you exactly which “impossible” condition was triggered. This makes it much easier to track down and fix bugs in your programs.

To summarize: Python’s assert statement is a debugging aid, not a mechanism for handling run-time errors. The goal of using assertions is to let developers find the likely root cause of a bug more quickly. An assertion error should never be raised unless there’s a bug in your program.

Assert in Python — An Example

Here’s a simple example so you can see where assertions might come in handy. I tried to give this some semblance of a real world problem you might actually encounter in one of your programs.

Suppose you were building an online store with Python. You’re working to add a discount coupon functionality to the system and eventually write the following apply_discount function:

def apply_discount(product, discount):
    price = int(product['price'] * (1.0 - discount))
    assert 0 <= price <= product['price']
    return price

Notice the assert statement in there? It will guarantee that, no matter what, discounted prices cannot be lower than $0 and they cannot be higher than the original price of the product.

Let’s make sure this actually works as intended if we call this function to apply a valid discount:

#
# Our example product: Nice shoes for $149.00
#
>>> shoes = {'name': 'Fancy Shoes', 'price': 14900}

#
# 25% off -> $111.75
#
>>> apply_discount(shoes, 0.25)
11175

Alright, this worked nicely. Now, let’s try to apply some invalid discounts:

#
# A "200% off" discount:
#
>>> apply_discount(shoes, 2.0)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    apply_discount(shoes, 2.0)
  File "<input>", line 4, in apply_discount
    assert 0 <= price <= product['price']
AssertionError

#
# A "-30% off" discount:
#
>>> apply_discount(shoes, -0.3)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    apply_discount(shoes, -0.3)
  File "<input>", line 4, in apply_discount
    assert 0 <= price <= product['price']
AssertionError

As you can see, trying to apply an invalid discount raises an AssertionError exception that points out the line with the violated assertion condition. If we ever encounter one of these errors while testing our online store it will be easy to find out what happened by looking at the traceback.

This is the power of assertions, in a nutshell.

Python’s Assert Syntax

It’s always a good idea to study up on how a language feature is actually implemented in Python before you start using it. So let’s take a quick look at the syntax for the assert statement according to the Python docs:

assert_stmt ::= "assert" expression1 ["," expression2]

In this case expression1 is the condition we test, and the optional expression2 is an error message that’s displayed if the assertion fails.

At execution time, the Python interpreter transforms each assert statement into roughly the following:

if __debug__:
    if not expression1:
        raise AssertionError(expression2)

You can use expression2 to pass an optional error message that will be displayed with the AssertionError in the traceback. This can simplify debugging even further—for example, I’ve seen code like this:

if cond == 'x':
    do_x()
elif cond == 'y':
    do_y()
else:
    assert False, ("This should never happen, but it does occasionally. "
                   "We're currently trying to figure out why. "
                   "Email dbader if you encounter this in the wild.")

Is this ugly? Well, yes. But it’s definitely a valid and helpful technique if you’re faced with a heisenbug-type issue in one of your applications. 😉
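For a less extreme use of the optional message, here is a sketch that adds one to the apply_discount example from earlier. The message wording is my own, and the snippet assumes assertions are enabled (that is, Python is not running with -O).

```python
def apply_discount(product, discount):
    price = int(product['price'] * (1.0 - discount))
    # Only the message is parenthesized, never the whole assert.
    assert 0 <= price <= product['price'], (
        f"discounted price {price} is outside 0..{product['price']} "
        f"for discount {discount}"
    )
    return price


shoes = {'name': 'Fancy Shoes', 'price': 14900}

try:
    apply_discount(shoes, 2.0)
    failure_message = None
except AssertionError as exc:
    failure_message = str(exc)

print(failure_message)
```

With the message in place the traceback tells you which values triggered the failure, not just which line.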

Common Pitfalls With Using Asserts in Python

Before you move on, there are two important caveats with using assertions in Python that I’d like to call out.

The first one has to do with introducing security risks and bugs into your applications, and the second one is about a syntax quirk that makes it easy to write useless assertions.

This sounds (and potentially is) pretty horrible, so you might at least want to skim these two caveats or read their summaries below.

Caveat #1 – Don’t Use Asserts for Data Validation

Asserts can be turned off globally in the Python interpreter. Don’t rely on assert expressions to be executed for data validation or data processing.

The biggest caveat with using asserts in Python is that assertions can be globally disabled with the -O and -OO command line switches, as well as the PYTHONOPTIMIZE environment variable in CPython.

This turns any assert statement into a null-operation: the assertions simply get compiled away and won’t be evaluated, which means that none of the conditional expressions will be executed.

This is an intentional design decision used similarly by many other programming languages. As a side-effect it becomes extremely dangerous to use assert statements as a quick and easy way to validate input data.
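You can watch assertions disappear with a small experiment. This sketch runs the same one-line program twice in a child interpreter, once normally and once with -O. It assumes Python 3.7+ (for capture_output) and removes PYTHONOPTIMIZE from the child environment so the first run really has assertions enabled.

```python
import os
import subprocess
import sys

snippet = "assert False, 'assertions are on'; print('reached the end')"

# Make sure an inherited PYTHONOPTIMIZE does not skew the comparison.
env = {k: v for k, v in os.environ.items() if k != "PYTHONOPTIMIZE"}

normal = subprocess.run([sys.executable, "-c", snippet],
                        capture_output=True, text=True, env=env)
optimized = subprocess.run([sys.executable, "-O", "-c", snippet],
                           capture_output=True, text=True, env=env)

print("normal exit code:   ", normal.returncode)
print("optimized exit code:", optimized.returncode)
print("optimized output:   ", optimized.stdout.strip())
```

In the normal run the assert stops the program; with -O the assert is skipped and the print call executes.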

Let me explain—if your program uses asserts to check if a function argument contains a “wrong” or unexpected value this can backfire quickly and lead to bugs or security holes.

Let’s take a look at a simple example. Imagine you’re building an online store application with Python. Somewhere in your application code there’s a function to delete a product as per a user’s request:

def delete_product(product_id, user):
    assert user.is_admin(), 'Must have admin privileges to delete'
    assert store.product_exists(product_id), 'Unknown product id'
    store.find_product(product_id).delete()

Take a close look at this function. What happens if assertions are disabled?

There are two serious issues in this three-line function example, caused by the incorrect use of assert statements:

Checking for admin privileges with an assert statement is dangerous. If assertions are disabled in the Python interpreter, this turns into a null-op. Therefore any user can now delete products. The privileges check doesn’t even run. This likely introduces a security problem and opens the door for attackers to destroy or severely damage the data in your customer’s or company’s online store. Not good.
The product_exists() check is skipped when assertions are disabled. This means find_product() can now be called with invalid product ids—which could lead to more severe bugs depending on how our program is written. In the worst case this could be an avenue for someone to launch Denial of Service attacks against our store. If the store app crashes if we attempt to delete an unknown product, it might be possible for an attacker to bombard it with invalid delete requests and cause an outage.

How might we avoid these problems? The answer is to not use assertions to do data validation. Instead we could do our validation with regular if-statements and raise validation exceptions if necessary. Like so:

def delete_product(product_id, user):
    if not user.is_admin():
        raise AuthError('Must have admin privileges to delete')

    if not store.product_exists(product_id):
        raise ValueError('Unknown product id')

    store.find_product(product_id).delete()

This updated example also has the benefit that instead of raising unspecific AssertionError exceptions, it now raises semantically correct exceptions like ValueError or AuthError (which we’d have to define ourselves).
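To make that concrete, here is a self-contained sketch with a custom AuthError. The User and Store classes are hypothetical in-memory stand-ins invented for this example (the real store API is not shown in this article), and the stand-in store deletes by id instead of using find_product().

```python
class AuthError(Exception):
    """Raised when a user lacks the privileges for an operation."""


class User:
    # Hypothetical stand-in for the application's user object.
    def __init__(self, admin):
        self._admin = admin

    def is_admin(self):
        return self._admin


class Store:
    # Hypothetical in-memory store holding only product ids.
    def __init__(self, product_ids):
        self._products = set(product_ids)

    def product_exists(self, product_id):
        return product_id in self._products

    def remove(self, product_id):
        self._products.remove(product_id)


store = Store({1, 2, 3})


def delete_product(product_id, user):
    if not user.is_admin():
        raise AuthError('Must have admin privileges to delete')

    if not store.product_exists(product_id):
        raise ValueError('Unknown product id')

    store.remove(product_id)


delete_product(1, User(admin=True))
print(store.product_exists(1))
```

These checks run regardless of the -O switch, and callers can catch AuthError and ValueError separately.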

Caveat #2 – Asserts That Never Fail

It’s easy to accidentally write Python assert statements that always evaluate to true. I’ve been bitten by this myself in the past. I wrote a longer article about this specific issue you can check out by clicking here.

Alternatively, here’s the executive summary:

When you pass a tuple as the first argument in an assert statement, the assertion always evaluates as true and therefore never fails.

For example, this assertion will never fail:

assert(1 == 2, 'This should fail')

This has to do with non-empty tuples always being truthy in Python. If you pass a tuple to an assert statement, the assert condition is always true. That makes the above assert statement useless because it can never fail and trigger an exception.

It’s relatively easy to accidentally write bad multi-line asserts due to this unintuitive behavior. This quickly leads to broken test cases that give a false sense of security in our test code. Imagine you had this assertion somewhere in your unit test suite:

assert (
    counter == 10,
    'It should have counted all the items'
)

Upon first inspection this test case looks completely fine. However, this test case would never catch an incorrect result: it always evaluates to True, regardless of the state of the counter variable.
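The underlying behavior can be checked without writing a broken assert. The sketch below builds the same tuple explicitly and then shows the correct form, where the condition and the message are separated by a comma without enclosing parentheses. It assumes assertions are enabled.

```python
# This is the object the broken assert actually tests: a 2-tuple.
condition_and_message = (1 == 2, 'This should fail')

# A non-empty tuple is truthy, even though its first element is False.
tuple_is_truthy = bool(condition_and_message)

# Correct form: no parentheses around the condition/message pair.
try:
    assert 1 == 2, 'This should fail'
    message = None
except AssertionError as exc:
    message = str(exc)

print(tuple_is_truthy)
print(message)
```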

Like I said, it’s rather easy to shoot yourself in the foot with this (mine still hurts). Luckily, there are some countermeasures you can apply to prevent this syntax quirk from causing trouble:

>> Read the full article on bogus assertions to get the dirty details.

Python Assertions — Summary

Despite these caveats I believe that Python’s assertions are a powerful debugging tool that’s frequently underused by Python developers.

Understanding how assertions work and when to apply them can help you write more maintainable and easier to debug Python programs. It’s a great skill to learn that will help bring your Python to the next level and make you a more well-rounded Pythonista.


Comprehending Python’s Comprehensions

Wed, 11 Jan 2017 00:00:00 GMT

Comprehending Python’s Comprehensions

List comprehensions are one of my favorite features in Python. They can seem a bit arcane at first but when you break them down they are actually a very simple construct.

The key to understanding list comprehensions is that they’re just for-loops over a collection expressed in a more terse and compact syntax. Let’s take the following list comprehension as an example:

>>> squares = [x * x for x in range(10)]

It computes a list of the squares of all integers from 0 to 9:

>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

If we wanted to build the same list using a plain for-loop we’d probably write something like this:

>>> squares = []
>>> for x in range(10):
...    squares.append(x * x)

That’s a pretty straightforward loop, right? If you try and generalize some of this structure you might end up with a template similar to this:

(values) = [ (expression) for (item) in (collection) ]

The above list comprehension is equivalent to the following plain for-loop:

(values) = []
for (item) in (collection):
    (values).append( (expression) )

Again, a fairly simple cookiecutter pattern you can apply to most for loops. Now there’s one more useful element we need to add to this template, and that is element filtering with conditions.

List comprehensions can filter values based on some arbitrary condition that decides whether or not the resulting value becomes a part of the output list. Here’s an example:

>>> even_squares = [x * x for x in range(10)
                    if x % 2 == 0]

This list comprehension will compute a list of the squares of all even integers from 0 to 9.

If you’re not familiar with what the modulo (%) operator does—it returns the remainder after division of one number by another. In this example the %-operator gives us an easy way to test if a number is even by checking the remainder after we divide the number by 2.

>>> even_squares
[0, 4, 16, 36, 64]

Similarly to the first example, this new list comprehension can be transformed into an equivalent for-loop:

even_squares = []
for x in range(10):
    if x % 2 == 0:
        even_squares.append(x * x)

Let’s try and generalize the above list comprehension to for-loop transform again. This time we’re going to add a filter condition to our template to decide which values end up in the resulting list.

Here’s the list comprehension template:

values = [expression
          for item in collection
          if condition]

And we can transform this list comprehension into a for-loop with the following pattern:

values = []
for item in collection:
    if condition:
        values.append(expression)

Again, this is a straightforward transformation—we simply apply our cookiecutter pattern again. I hope this dispelled some of the “magic” in how list comprehensions work. They’re really quite a useful tool.

Before you move on I want to point out that Python not only supports list comprehensions but also has similar syntax for sets and dictionaries.

Here’s what a set comprehension looks like:

>>> { x * x for x in range(-9, 10) }
set([64, 1, 36, 0, 49, 9, 16, 81, 25, 4])

And this is a dict comprehension:

>>> { x: x * x for x in range(5) }
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

Both are useful tools in practice. There’s one caveat to Python’s comprehensions: as you get more proficient at using them it becomes easier and easier to write code that’s difficult to read. If you’re not careful you might soon have to deal with monstrous list, set, and dict comprehensions. Remember, too much of a good thing is usually a bad thing.

After much chagrin I’m personally drawing the line at one level of nesting for comprehensions. I found that in most cases it’s better (as in “more readable” and “easier to maintain”) to use for-loops beyond that point.
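To illustrate where that line sits, here is a sketch with sample data of my own: a comprehension with one level of nesting that flattens a matrix, followed by a nested loop with a filter written as plain for-loops.

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# One level of nesting: still readable as "for each row, for each value".
flattened = [value for row in matrix for value in row]

# Adding a filter on top is where I would switch to plain for-loops.
even_values = []
for row in matrix:
    for value in row:
        if value % 2 == 0:
            even_values.append(value)

print(flattened)
print(even_values)
```

The for clauses in a comprehension read in the same order as the equivalent nested loops, outermost first.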

📺🐍 Learn More With This Video Tutorial

I recorded a step-by-step video tutorial that teaches you how list comprehensions work in Python to go along with the article. Watch it embedded below or on my YouTube channel:

Key Takeaways

Comprehensions are a key feature in Python. Understanding and applying them will make your code much more Pythonic.
Comprehensions are just fancy syntax for a simple for-loop pattern. Once you understand the pattern, you’ll develop an intuitive understanding for comprehensions.
There are more than just list comprehensions.


The Difference Between “is” and “==” in Python

Wed, 28 Dec 2016 00:00:00 GMT

The Difference Between “is” and “==” in Python

Python has two operators for equality comparisons, “is” and “==” (equals). In this article I’m going to teach you the difference between the two and when to use each with a few simple examples.

When I was a kid, our neighbors had twin cats.

Both cats looked seemingly identical—same charcoal fur, same piercing green eyes. Some personality quirks aside, you couldn’t tell them apart just from looking at them. But of course they were two different cats, two separate beings, even though they looked exactly the same.

There’s a difference in meaning between equal and identical. And this difference is important when you want to understand how Python’s is and == comparison operators behave.

The == operator compares by checking for equality: If these cats were Python objects and we’d compare them with the == operator, we’d get “both cats are equal” as an answer.

The is operator, however, compares identities: If we compared our cats with the is operator, we’d get “these are two different cats” as an answer.

But before I get all tangled up in this ball of twine of a cat analogy, let’s take a look at some real Python code.

First, we’ll create a new list object and name it a, and then define another variable b that points to the same list object:

>>> a = [1, 2, 3]
>>> b = a

Let’s inspect these two variables. We can see they point to identical looking lists:

>>> a
[1, 2, 3]
>>> b
[1, 2, 3]

Because both variables refer to lists that look the same, we’ll get the expected result when we compare them for equality using the == operator:

>>> a == b
True

However, that doesn’t tell us whether a and b are actually pointing to the same object. Of course, we know they do because we assigned them earlier, but suppose we didn’t know—how might we find out?

The answer is comparing both variables with the is operator. This confirms both variables are in fact pointing to one list object:

>>> a is b
True

Let’s see what happens when we create an identical copy of our list object. We can do that by calling list() on the existing list to create a copy we’ll name c:

>>> c = list(a)

Again you’ll see that the new list we just created looks identical to the list object pointed to by a and b:

>>> c
[1, 2, 3]

Now this is where it gets interesting—let’s compare our list copy c with the initial list a using the == operator. What answer do you expect to see?

>>> a == c
True

Okay, I hope this was what you expected. What this result tells us is that c and a have the same contents. They’re considered equal by Python. But are they actually pointing to the same object? Let’s find out with the is operator:

>>> a is c
False

Boom—this is where we get a different result. Python is telling us that c and a are pointing to two different objects, even though their contents might be the same.

So, to recap let’s try and break the difference between is and == down to two short definitions:

An is expression evaluates to True if two variables point to the same (identical) object.
An == expression evaluates to True if the objects referred to by the variables are equal (have the same contents).
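The twin cats can be modeled directly. In this sketch the Cat class is invented for illustration and uses a dataclass (Python 3.7+), which generates an __eq__ method that compares field values.

```python
from dataclasses import dataclass


@dataclass
class Cat:
    fur: str
    eyes: str


first_twin = Cat(fur="charcoal", eyes="green")
second_twin = Cat(fur="charcoal", eyes="green")
same_cat = first_twin  # a second name for the first object

print(first_twin == second_twin)   # equal contents
print(first_twin is second_twin)   # but two separate objects
print(first_twin is same_cat)      # two names, one object
```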

Just remember, think of twin cats (dogs should work, too) whenever you need to decide between using is and == in Python. You’ll be fine.


How to Speak at a Python Conference

Tue, 13 Dec 2016 00:00:00 GMT

How to Speak at a Python Conference

My tutorial on how you can get a first-time speaking gig at a tech conference like PyCon as a software developer.

I gave a talk at PyCon Germany this year and I was just chatting with my friend Sergei who wants to get into presenting at tech conferences and is looking for a way to get started. Here’s Sergei’s question:

My goal for now is to get ready for the spring season of conferences, and pick one where I could present something.

I want to try presenting stuff on conferences, as it is a great public speaking experience for me and… free attendance of a good conference :)

Where do you start if you want to present something?

If you’re thinking about speaking at a tech conference that’s a fantastic idea! It’s a lot of fun (it’s a lot of work, too)—but it’s totally worth it.

It’s great for getting in touch with people, making new connections, finding a new or better job, and improving your public speaking skills. Plus, you get tagged as a speaker on your conference badge—a great conversation starter 😜

Alright, I’m sold—How do I become a first-time conference speaker?

For most tech conferences you can become a speaker by applying with a talk proposal a few months before the conference takes place.

Basically, all the legwork is on you. You’re picking the topic and pitching it to the conference organizers and they decide which talks they accept. If you’ve built a bit of a reputation for yourself then conference organizers might contact you and ask you to speak at their conference. But for first time speakers that’s unlikely.

The key thing here is to apply early enough and before the “call for proposals” deadline. The deadline is usually well in advance of the conference. Planning ahead absolutely increases your chances of getting in.

Where can I find upcoming conferences to speak at?

For conferences related to Python there’s the Python Events calendar at:

Use the events calendar to find a conference that looks interesting. Then check out the conference website to see what the application process is like and when the call for proposals (submission deadline) ends.

At this point you should know which event you want to speak at, who to send your talk proposal to, and when the submission deadline is. The next step is to put together and send in your talk proposal.

How do I write a good talk proposal?

The most important thing you need to realize about talk proposals is that they’re all about the organizers. You need to put yourself in the shoes of the organizers when you’re writing the proposal—that’s the biggest trick to getting your first talk accepted.

Your goal is to convince the organizers that:

inviting you as a speaker won’t embarrass them; and
their audience is likely going to enjoy your talk.

Keep this in mind while writing your proposal and it’ll increase your chances to get your talk accepted immensely. Whatever information you can give to convince the organizers your talk will do these two things for them helps.

For example, be sure to mention the intended audience for your talk and to share your previous speaking experience (meetup/work/university presentations). Also mention your own expertise on the talk’s topic.

If you don’t have previous public speaking experience it can be a bit tough to get into conferences at first. To “hack” this system you could also point people to a YouTube video tutorial you recorded. That way the organizers can get a sense for your presentation skills (I always include a link to my YouTube channel for that reason.)

To make this a bit easier for you I’ve put together a conference proposal template you can use. Click here and I’ll email it over to you.

My template will help you cover all of the relevant info. I used this template to get a talk into PyCon Germany not long ago, for example. Good luck 😃

What happens after I send in my proposal?

After you’ve sent in your proposal all you need to do is wait for an answer. If you get accepted—congratulations, you’ll be speaking at a conference soon!

Usually you won’t get feedback when your talk gets rejected (it’s kind of similar to a job interview that way). But don’t be too upset if your talk is rejected! You can always try again with the same talk at a different conference.

My gut feeling is that the biggest factor in whether or not your talk gets accepted is “quality indicators” like your previous speaking experience—so be sure to talk about that in your proposal.

It’ll get much easier to get your talks accepted once you’ve spoken a few times.

Can I attend for free if I’m speaking?

Most non-commercial tech conferences don’t waive the attendance fee even if you’re a speaker. I believe it’s different with keynote speakers but if you’re a regular speaker at a small to medium size conference you’ll often pay the full fee or a slightly reduced fee. Still worth it though 😊

Got any more tips?

Sure do! Watch this video I recorded to get some extra tips on how to bootstrap your speaking career (there’s more videos on this playlist):

Extra Resources


Cool new features in Python 3.6

Thu, 08 Dec 2016 00:00:00 GMT

Cool new features in Python 3.6

Python 3.6 adds a couple of new features and improvements that’ll affect the day to day work of Python coders. In this article I’ll give you an overview of the new features I found the most interesting.

Improved numeric literals

This is a syntactic tweak that makes numeric literals easier to read. You can now add underscores to numbers in order to group them to your liking. This is handy for expressing large quantities or constants in binary or hexadecimal:

>>> six_figures = 100_000
>>> six_figures
100000

>>> programmer_error = 0xbad_c0ffee
>>> flags = 0b_0111_0101_0001_0101

Remember, this change doesn’t introduce any new semantics. It’s just a way to represent numeric literals differently in your source code. A small but neat addition.

You can learn more about this change in PEP 515.
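To convince yourself that the underscores are purely visual, here is a small sketch you can paste into a Python 3.6+ session. The variable names are just for illustration. It also uses two related pieces that, as far as I know, arrived with the same PEP: int() accepting underscores in strings, and the "_" format option.

```python
# Underscores are ignored by the parser: both spellings yield the same int.
six_figures = 100_000
assert six_figures == 100000

# int() also accepts underscores inside numeric strings (Python 3.6+).
parsed = int("1_000_000")

# The "_" format option inserts underscores as a thousands separator.
pretty = format(parsed, "_")

print(parsed, pretty)
```

The value stored in memory is a plain integer either way; only the source code and the formatted output differ.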

String interpolation

Python 3.6 adds yet another way to format strings called Formatted String Literals. This new way of formatting strings lets you use embedded Python expressions inside string constants. Here are two simple examples to give you a feel for the feature:

>>> name = 'Bob'
>>> f'Hello, {name}!'
'Hello, Bob!'

>>> a = 5
>>> b = 10
>>> f'Five plus ten is {a + b} and not {2 * (a + b)}.'
'Five plus ten is 15 and not 30.'

Formatted string literals also support the existing format string syntax of the str.format() method. That allows you to do things like:

>>> error = 50159747054
>>> f'Programmer Error: {error:#x}'
'Programmer Error: 0xbadc0ffee'

Python’s new Formatted String Literals are similar to the JavaScript Template Literals added in ES2015/ES6. I think they’re quite a nice addition to the language and I look forward to using them in my day to day work.

You can learn more about this change in PEP 498.
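Here is a slightly larger sketch of what you can put between the braces. The names (name, price, width) are made up for this example; the format specifiers and the !r conversion work the same way as in str.format().

```python
name = "Bob"
price = 1234.5
width = 8

# Any expression works inside the braces, including method calls.
greeting = f"Hello, {name.upper()}!"

# Format specifiers follow a colon, just like in str.format().
formatted_price = f"{price:,.2f}"

# Format specifiers can themselves contain nested replacement fields.
padded = f"[{name:>{width}}]"

# The !r conversion calls repr() on the value before formatting it.
quoted = f"{name!r}"

print(greeting, formatted_price, padded, quoted)
```

Because the expressions are evaluated at runtime in the surrounding scope, an f-string always reflects the current values of the variables it mentions.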

Type annotations for variables

Starting with Python 3.5 you could add type hints to functions and methods using annotations:

>>> def my_add(a: int, b: int) -> int:
...    return a + b

In Python 3.6 you can use a syntax similar to type annotations for function arguments to type-hint standalone variables:

>>> python_version: float = 3.6

Nothing has changed in terms of the semantics–CPython simply records the type as a type annotation but doesn’t validate or check types in any way.

Type-checking is purely optional and you’ll need a tool like Mypy for that, which basically works like a code linter.

You can learn more about this change in PEP 526.
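To see that annotations are recorded but not enforced, here is a minimal sketch. The Settings class and its attributes are hypothetical; I'm using a class here because its __annotations__ dictionary is easy to inspect.

```python
class Settings:
    # Variable annotations (PEP 526) work on class attributes as well.
    retries: int = 3
    # Deliberately wrong: the annotation says int, but the value is a str.
    timeout: int = "thirty"


# CPython stores the annotations but never checks them at runtime.
print(Settings.__annotations__)
print(type(Settings.timeout))
```

Running this raises no error. A type checker like Mypy, on the other hand, should complain about the timeout line.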

Watch a video summary of the best new features in Python 3.6

Other notable changes

I think Python 3.6 will be an interesting release. There are many more interesting additions and improvements that are worth checking out. You can learn more about them in the links below or by reading the official “What’s New In Python 3.6” announcement.


3 Reasons why you need a programming blog

Wed, 30 Nov 2016 00:00:00 GMT

3 Reasons why you need a programming blog

One of the best things I ever did for my dev career: A little story and three reasons why you should start a programming portfolio website right now.

At PyCon Germany I chatted with Astrid, a freelance Python (Django) developer looking for ways to improve her career and to find more contracts.

Astrid seemed quite frustrated with her situation—it was tough for her to get the contracts and jobs she really wanted.

Often when she sent out her resume for more desirable gigs she wouldn’t even receive an answer. It sounded like she was stuck with a certain quality of clients and couldn’t really push past that invisible barrier.

I always love to help a sister (or brother) out and went into full-on diagnosing mode. Usually I just end up spouting unsolicited advice in these situations but with Astrid I think I actually hit the nail on the head… 😉

Eventually I asked Astrid if she had a website or blog as a “programmer portfolio” of sorts.

She did not.

And I think that was a BIG mistake –

Looking back I’d say starting my personal website at dbader.org was probably the best thing I ever did for my programming career:

Reason #1: Employers loved it–it made it much easier to get interviews

In fact, once I had my website up for a while, companies started contacting me through it. And these were no longer the crappy recruiter emails I got through LinkedIn, but messages from managers and dev leads at companies that I actually found interesting.

Reason #2: It was easier to get started than I thought

I launched my site with just 3 articles I wrote over the holidays hanging out with my family one year. I was surprised to find I got more (not less) traffic over time even though I didn’t post new stuff constantly. More people started linking to my posts and they ranked higher in Google (also search engines seem to favor content that has been around for a while). It was incredibly fun to see that growth and to find new ways of reaching developers.

Reason #3: It put me in touch with so many fine folks (like you!)

Most of the places I lived in didn’t have strong software dev / meetup communities. Starting a website was a fantastic way to make friends with other developers around the world and to exchange ideas.

How you can get started today

I know it seems super difficult to get everything set up in the beginning. And the work involved can seem kind of boring at first… “it’s just a website”.

What finally got me started with setting up my own website was turning it into a programming exercise.

Instead of using a pre-fab framework like Wordpress I wrote my own Python framework for generating the website.

I figured even if I didn’t follow through with the site I’d learn some web development skills in the process… And this was exactly true 😃

Putting myself in Astrid’s shoes again I really believe every software developer should have a personal website. The time investment is so small in comparison to the awesome benefits and opportunities it can generate for you.

If you’re sold on the idea of starting a programming blog but you don’t know how to go about it yet then check out this video I created for you.

In the video embedded below I’m going over my own website as an example and how it looks very different today compared to when I started it in 2012.

It doesn’t take much to get started with your own programming blog or portfolio website and the benefits can be huge.


A Python Riddle: The Craziest Dict Expression in the West

Wed, 23 Nov 2016 00:00:00 GMT

A Python Riddle: The Craziest Dict Expression in the West

Let’s pry apart this slightly unintuitive Python dictionary expression to find out what’s going on in the uncharted depths of the Python interpreter.

Sometimes you strike upon a tiny code example that has real depth to it—a single line of code that can teach you a lot about a programming language if you ponder it enough. Such a code snippet feels like a Zen kōan: a question or statement used in Zen practice to provoke doubt and test the student’s progress.

The tiny little code snippet we’ll discuss in this tutorial is one such example. Upon first glance, it might seem like a straightforward dictionary expression, but when considered at close range, it takes you on a mind-expanding journey through the CPython interpreter.

I get such a kick out of this little one-liner that at one point I had it printed on my Python conference badges as a conversation starter. It also led to some rewarding conversations with members of my Python newsletter.

So without further ado, here is the code snippet. Take a moment to reflect on the following dictionary expression and what it will evaluate to:

>>> {True: 'yes', 1: 'no', 1.0: 'maybe'}

I’ll wait here…

Ok, ready?

This is the result we get when evaluating the above dict expression in a CPython interpreter session:

>>> {True: 'yes', 1: 'no', 1.0: 'maybe'}
{True: 'maybe'}

I’ll admit I was pretty surprised about this result the first time I saw it. But it all makes sense when you investigate what happens, step by step. So, let’s think about why we get this—I want to say slightly unintuitive—result.

Where Baby Dictionaries Come From

When Python processes our dictionary expression, it first constructs a new empty dictionary object; and then it assigns the keys and values to it in the order given in the dict expression.

Therefore, when we break it down, our dict expression is equivalent to this sequence of statements that are executed in order:

>>> xs = dict()
>>> xs[True] = 'yes'
>>> xs[1] = 'no'
>>> xs[1.0] = 'maybe'

Oddly enough, Python considers all dictionary keys used in this example to be equal:

>>> True == 1 == 1.0
True

Okay, but wait a minute here. I’m sure you can intuitively accept that 1.0 == 1, but why would True be considered equal to 1 as well? The first time I saw this dictionary expression it really stumped me.

After doing some digging in the Python documentation, I learned that Python treats bool as a subclass of int. This is the case in Python 2 and Python 3:

“The Boolean type is a subtype of the integer type, and Boolean values behave like the values 0 and 1, respectively, in almost all contexts, the exception being that when converted to a string, the strings ‘False’ or ‘True’ are returned, respectively.” (Source)

And yes, this means you can technically use bools as indexes into a list or tuple in Python:

>>> ['no', 'yes'][True]
'yes'

But you probably should not use boolean variables like that for the sake of clarity (and the sanity of your colleagues.)
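If you'd like to poke at the bool/int relationship yourself, here is a short sketch. The temperatures list is made-up sample data. Counting with sum() is one place where treating True as 1 is commonly done on purpose.

```python
# bool really is a subclass of int.
assert issubclass(bool, int)
assert isinstance(True, int)

# Booleans take part in arithmetic as 0 and 1.
total = True + True

# A common use: counting how many items satisfy a condition.
temperatures = [18, 25, 31, 22, 35]
hot_days = sum(t > 30 for t in temperatures)

# The one visible difference: converting to a string.
as_text = str(True)

print(total, hot_days, as_text)
```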

Anyway, let’s come back to our dictionary expression.

As far as Python is concerned, True, 1, and 1.0 all represent the same dictionary key. As the interpreter evaluates the dictionary expression, it repeatedly overwrites the value for the key True. This explains why, in the end, the resulting dictionary only contains a single key.

Before we move on, let’s have another look at the original dictionary expression:

>>> {True: 'yes', 1: 'no', 1.0: 'maybe'}
{True: 'maybe'}

Why do we still get True as the key here? Shouldn’t the key also change to 1.0 at the end, due to the repeated assignments?

After some more research in the CPython interpreter source code, I learned that Python’s dictionaries don’t update the key object itself when a new value is associated with it:

>>> ys = {1.0: 'no'}
>>> ys[True] = 'yes'
>>> ys
{1.0: 'yes'}

Of course this makes sense as a performance optimization: if the keys are considered identical, why spend time updating the original? As the last example shows, the key object that was inserted first is never replaced. In our original dict expression that first key is True, so the dictionary’s string representation still prints the key as True (instead of 1 or 1.0).
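You can verify which key object survives by pulling it back out of the dictionary and checking its type. This is just a small sketch; the variable names are mine.

```python
d = {True: 'yes', 1: 'no', 1.0: 'maybe'}

# Only one entry is left after the three assignments.
entry_count = len(d)

# The surviving key is the object that was inserted first: the bool.
first_key = next(iter(d))
key_type = type(first_key)

# All three spellings of the key find the same, last-written value.
lookups = (d[True], d[1], d[1.0])

print(entry_count, key_type, lookups)
```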

With what we know now, it looks like the values in the resulting dict are getting overwritten only because they compare as equal. However, it turns out that this effect isn’t caused by the __eq__ equality check alone, either.

Wait, What About the Hash Code?

Python dictionaries are backed by a hash table data structure. When I first saw this surprising dictionary expression, my hunch was that this behavior had something to do with hash collisions.

You see, a hash table internally stores the keys it contains in different “buckets” according to each key’s hash value. The hash value is derived from the key as a numeric value of a fixed length that is meant to identify the key.

This allows for fast lookups. It’s much quicker to search for a key’s numeric hash value in a lookup table instead of comparing the full key object against all other keys and checking for equality.

However, the way hash values are typically calculated isn’t perfect. And eventually, two or more keys that are actually different will have the same derived hash value, and they will end up in the same lookup table bucket.

If two keys have the same hash value, that’s called a hash collision, and it’s a special case that the hash table’s algorithms for inserting and finding elements need to handle.
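To make the bucket idea more concrete, here is a toy hash table. Please treat it as a simplified sketch and not as how CPython's dict actually works: the ToyHashTable class, its method names, and the use of lists as buckets are all my own assumptions for illustration (CPython uses a different collision-handling scheme internally).

```python
class ToyHashTable:
    """Simplified hash table that stores [key, value] pairs in list buckets."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket_for(self, key):
        # The hash value decides which bucket a key belongs to.
        return self.buckets[hash(key) % len(self.buckets)]

    def set(self, key, value):
        bucket = self._bucket_for(key)
        for entry in bucket:
            if entry[0] == key:
                # Equal key found: overwrite the value, keep the old key.
                entry[1] = value
                return
        # No equal key in this bucket: store a new entry.
        bucket.append([key, value])

    def get(self, key):
        for stored_key, value in self._bucket_for(key):
            if stored_key == key:
                return value
        raise KeyError(key)


table = ToyHashTable()
table.set(1, 'one')
table.set(9, 'nine')   # hash(9) % 8 == 1, so this collides with key 1
table.set(1.0, 'uno')  # same hash as 1 and equal to it: value is replaced

print(table.get(1), table.get(9), table.buckets[1])
```

Keys 1 and 9 land in the same bucket but stay separate entries because they aren't equal. Key 1.0 lands in that bucket too, compares equal to 1, and therefore only replaces the value.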

Based on that assessment, it’s fairly likely that hashing has something to do with the surprising result we got from our dictionary expression. So let’s find out if the keys’ hash values also play a role here.

I’m defining the following class as our little detective tool:

class AlwaysEquals:
     def __eq__(self, other):
         return True

     def __hash__(self):
         return id(self)

This class is special in two ways.

First, because its __eq__ dunder method always returns True, all instances of this class will pretend they’re equal to any other object:

>>> AlwaysEquals() == AlwaysEquals()
True
>>> AlwaysEquals() == 42
True
>>> AlwaysEquals() == 'waaat?'
True

And second, each AlwaysEquals instance will also return a unique hash value generated by the built-in id() function:

>>> objects = [AlwaysEquals(),
               AlwaysEquals(),
               AlwaysEquals()]
>>> [hash(obj) for obj in objects]
[4574298968, 4574287912, 4574287072]

In CPython, id() returns the address of the object in memory, which is guaranteed to be unique among objects that exist at the same time.

With this class we can now create objects that pretend to be equal to any other object but have a unique hash value associated with them. That’ll allow us to test if dictionary keys are overwritten based on their equality comparison result alone.

And, as you can see, the keys in the next example are not getting overwritten, even though they always compare as equal:

>>> {AlwaysEquals(): 'yes', AlwaysEquals(): 'no'}
{ <AlwaysEquals object at 0x110a3c588>: 'yes',
  <AlwaysEquals object at 0x110a3cf98>: 'no' }

We can also flip this idea around and check to see if returning the same hash value is enough to cause keys to get overwritten:

class SameHash:
    def __hash__(self):
        return 1

Instances of this SameHash class will compare as non-equal with each other but they will all share the same hash value of 1:

>>> a = SameHash()
>>> b = SameHash()
>>> a == b
False
>>> hash(a), hash(b)
(1, 1)

Let’s look at how Python’s dictionaries react when we attempt to use instances of the SameHash class as dictionary keys:

>>> {a: 'a', b: 'b'}
{ <SameHash object at 0x7f7159020cb0>: 'a',
  <SameHash object at 0x7f7159020cf8>: 'b' }

As this example shows, the “keys get overwritten” effect isn’t caused by hash value collisions alone either.

Umm Okay, What’s the Executive Summary Here?

Python dictionaries check for equality and compare the hash value to determine if two keys are the same. Let’s try and summarize the findings of our investigation:

The {True: 'yes', 1: 'no', 1.0: 'maybe'} dictionary expression evaluates to {True: 'maybe'} because the keys True, 1, and 1.0 all compare as equal, and they all have the same hash value:

>>> True == 1 == 1.0
True
>>> (hash(True), hash(1), hash(1.0))
(1, 1, 1)

Perhaps not-so-surprising anymore, that’s how we ended up with this result as the dictionary’s final state:

>>> {True: 'yes', 1: 'no', 1.0: 'maybe'}
{True: 'maybe'}
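As a final check, here is a sketch of a class that combines both ingredients: its instances compare as equal to each other and share one hash value. The Twin class and its label attribute are hypothetical names I made up for this experiment.

```python
class Twin:
    """Instances are all equal to each other and share a single hash value."""

    def __init__(self, label):
        self.label = label

    def __eq__(self, other):
        return isinstance(other, Twin)

    def __hash__(self):
        return 1

    def __repr__(self):
        return f"Twin({self.label!r})"


twins = {Twin('first'): 'a', Twin('second'): 'b'}

# Equal keys with equal hashes collapse into a single entry.
surviving_key = next(iter(twins))
print(twins, surviving_key.label)
```

With both conditions met, the second assignment overwrites the value while the first key object stays in place, just like True did in the original expression.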

We touched on a lot of subjects here, and this particular Python Trick can be a bit mind-boggling at first—that’s why I compared it to a Zen kōan in the beginning.

If it’s difficult to understand what’s going on in this tutorial, try playing through the code examples one by one in a Python interpreter session. You’ll be rewarded with an expanded knowledge of Python’s internals.

It’s a Python Trick!

There’s one more thing I want to tell you about:

I’ve started a series of these Python “tricks” delivered over email. You can sign up at dbader.org/python-tricks and I’ll send you a new Python trick as a code screenshot every couple of days.

This is still an experiment and a work in progress but I’ve heard some really positive feedback from the developers who’ve tried it out so far.

Thanks to JayR, Murat, and kurashu89 for their feedback on this article.


The Ultimate List of Python Podcasts

Tue, 15 Nov 2016 00:00:00 GMT

The Ultimate List of Python Podcasts

I couldn’t find a good and updated list of Python developer or Python programming podcasts online. So I created my own list with the best Python podcasts.

I enjoy listening to all kinds of podcasts when I’m at the gym or driving. There are some really good podcasts about Python development out there but I just couldn’t find a good (and updated) list.

I initially created this list based on forum posts and by searching the iTunes podcast directory, and I will continue to grow it with user feedback. I plan to keep this list updated, so feel free to shoot me an email if you think anything is missing or if you’d like to see your own Python podcast added.

My criteria for inclusion on this list are:

episode download links must work; and
the podcast must be active (new episodes are coming out) OR at least have an interesting archive with old episodes worth listening to.

Enjoy the podcasts! 🎙🐍

The Real Python Podcast

A weekly Python podcast hosted by Christopher Bailey with interviews, coding tips, and conversation with guests from the Python community. The show covers a wide range of topics including Python programming best practices, career tips, and related software development topics. Join us to hear what’s new in the world of Python programming and become a more effective Pythonista.

Website: realpython.com/podcast
Twitter: @realpython
Subscribe: RSS ⋅ iTunes ⋅ Overcast.fm

Talk Python To Me

Talk Python to Me is a weekly podcast hosted by Michael Kennedy. The show covers a wide array of Python topics as well as many related topics (e.g. MongoDB, AngularJS, DevOps). The format is a casual 45 minute conversation with industry experts.

Website: talkpython.fm
Twitter: @TalkPython
Subscribe: RSS ⋅ iTunes ⋅ Overcast.fm

Podcast.init

A podcast about Python and the people who make it great. Hosted by Tobias Macey.

Website: www.podcastinit.com
Twitter: @Podcast__init__
Subscribe: RSS ⋅ iTunes ⋅ Overcast.fm

Python Bytes

Python Bytes is a weekly podcast hosted by Michael Kennedy and Brian Okken. Python Bytes podcast delivers headlines directly to your earbuds. If you want to stay up on the Python developer news but don’t have time to scour reddit, twitter, and other news sources, just subscribe and you’ll get the best picks delivered weekly.

Website: pythonbytes.fm
Twitter: @PythonBytes
Subscribe: RSS ⋅ iTunes ⋅ Overcast.fm

Test & Code

A podcast about Software Development, Software Testing, and Python. How did you become a software developer/tester/engineer/lead, etc? Odds are we all are missing some important information to do our jobs most effectively. This podcast is an attempt to fill those education gaps. I focus on testing and process questions like “How do I know it works?”, “How do I effectively test?”, and the like. But really, anything in the software development realm is fair game.

Website: pythontesting.net/podcast
Twitter: @TestPodcast
Subscribe: RSS ⋅ iTunes ⋅ Overcast.fm

Teaching Python

We’re two teachers from South Florida teaching Python to middle school students. One of us has taught for a long time and just recently started coding in Python. The other one is making this website. Our goal is to help teachers with the art and science of teaching Python so that more students can learn how to code.

Website: www.teachingpython.fm
Twitter: @teachingpython
Subscribe: RSS ⋅ iTunes ⋅ Overcast.fm

Import This: A Python Podcast for Humans

Featuring Kenneth Reitz and a random Python co-host.

Website: www.kennethreitz.org/import-this/
Twitter: @kennethreitz
Subscribe: RSS ⋅ iTunes ⋅ Overcast.fm

Radio Free Python

A monthly podcast focused on the Python programming language and its community.

(No updates since 2013 but an interesting archive of episodes.)

Website: radiofreepython.com
Twitter: @RadioFreePython
Subscribe: RSS

from python import podcast

A small-batch artisanal podcast for irreverent pythonistas. Easy-going, conversational, often silly, and occasionally earning our iTunes “explicit” tag, From Python Import Podcast is news, analysis, discussion, and general shenanigans about the Python language and community. Put on your headphones and come hang out with us!

(No updates since 2014 but an interesting archive of episodes.)

Website: frompythonimportpodcast.com
Twitter: @__fpip__
Subscribe: RSS ⋅ iTunes ⋅ Overcast.fm

Castálio Podcast (Portuguese & English)

A podcast inspired by Castálio! A podcast that aims to interview and, at the same time, introduce people and projects that are a source of inspiration for its listeners. New episodes every week.

A primarily Portuguese podcast that often features Python topics and people in the Python community.

Website: castalio.info
Twitter: @castaliopod
Subscribe: RSS ⋅ iTunes ⋅ Overcast.fm

If you think anything is missing from this list or if you’d like to see your own Python podcast added, then please email me at mail@dbader.org.


How code linting will make you awesome at Python

Thu, 10 Nov 2016 00:00:00 GMT

How code linting will make you awesome at Python

In Python code reviews I’ve seen over and over that it can be tough for developers to format their Python code in a consistent way: extra whitespace, irregular indentation, and other “sloppiness” often lead to actual bugs in the program.

Luckily automated tools can help with this common problem. Code linters make sure your Python code is always formatted consistently—and their benefits go way beyond that.

What code linters can do for you

A code linter is a program that analyses your source code for potential errors. The kinds of errors a linter can detect include:

syntax errors;
structural problems like the use of undefined variables;
best practice or code style guideline violations.

I find code linting to be an indispensable productivity tool for writing Python. It’s possible to integrate linting into your editing environment. This gives you immediate feedback on your code as you type it.

For some classes of errors, linting can shorten the usual “write code, run code, catch error, fix error” loop to “write code, see and fix error.” This difference might not seem like much, but over the course of a day these time savings add up quickly and can have a huge impact on your productivity.
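Here is a small sketch of the kind of code a linter helps with. The function and the typo are invented for this example, and the codes in the comments are what I would expect Flake8 to report in its default configuration; the exact output depends on your Flake8 version and settings.

```python
import os  # Flake8 would likely report F401: module imported but unused


def rectangle_area(width,height):  # E231: missing whitespace after ','
    # F821: undefined name 'hieght' (a typo for 'height')
    return width * hieght


# Without a linter, the typo only surfaces once the function actually runs.
try:
    rectangle_area(2, 3)
except NameError as exc:
    error_message = str(exc)
    print(error_message)
```

The file imports and defines the function without complaint. The bug stays hidden until rectangle_area() is called, which is exactly the gap a linter closes.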

In short, code linters are great!

Which Python linter should I use?

Python has several good options for code linters. The ones I’m listing here are available for free and are open-source software:

Flake8 is my personal favorite these days. It’s fast and has a low rate of false positives. Flake8 is actually a combination of several other tools, mainly the Pyflakes static analysis tool and the pycodestyle (formerly pep8) code style checker.
Pylint is another good choice. It takes a little more effort to set up than Flake8 and also triggers more false positives. On the other hand it provides a more comprehensive analysis. Definitely not a bad choice—but I’d stick with Flake8 if you’re just starting out.

I’m sold—what’s the quickest way to get started?

If you’re not using a linter yet you’re missing out on some really great benefits. But don’t worry, I’ve got your back—I recorded a 5 minute Python linting video tutorial you can watch below.

In the video I’ll give you the rundown on how to set up the Flake8 Python linter from scratch. With a few simple steps you’ll be able to run a code linter on your own Python programs. I’ll also demonstrate how linter feedback can be integrated with your code editor (I’m using Sublime Text 3 in the video).

I’ve seen great results from using linters. I believe they’re one of the quickest ways to improve your Python skills. Spend 5 minutes to try out Flake8—I’m sure it’ll be well worth your time 😊

Enjoy the video:


A Python refactoring gone wrong

Thu, 03 Nov 2016 00:00:00 GMT

A Python refactoring gone wrong

Ever witnessed a colleague make a refactoring to “clean up” some Python code only to make it worse and harder to understand?

I know I have. And I’ve also been that colleague to others many times 😊

There’s often a fine line between making code better by “cleaning it up” and just shuffling around or even making it slightly worse. Refactoring is hard!

It’s challenging to come up with good examples for this – so I was delighted when I got this Python question from Bev:

I came across something in Python that I’m having difficulty understanding. As part of a code rewrite in a Python related Youtube video, an “if” statement was changed to:

if any([self.temperature > MAX_TEMPERATURE,
        self.pressure > MAX_PRESSURE]):

Why was that used rather than the simpler:

if (self.temperature > MAX_TEMPERATURE
    or self.pressure > MAX_PRESSURE):

Why create a list and call a function in the if statement when there are only two (2) comparisons?

I agree with Bev, this is a surprising change and I don’t think it’s for the better!

It complicates the code for no apparent gain.

Let’s take a look at the definition for the any() function first:

Return True if any element of the iterable is true. If the iterable is empty, return False (Source: Python docs)

any() – and its colleague, all() – are handy if you need to check the elements of an iterable (like a list or a generator) for truthiness.

In some cases using any() can help avoid having to write a loop to check the elements of the iterable individually¹. Imagine you needed to do something like this:

result = False
for elem in my_iterable:
    if elem:
        result = True
        break

You could replace these 5 lines with a simple assignment using the any() function:

result = any(my_iterable)

In this case it makes sense to go with the any() solution because it is shorter and easier to understand.

Yet, too much of a good thing can make you sick… 😷

In my mind it doesn’t make much sense to use any() if all you have is a fixed size list with 2-3 elements, like in the example Bev found:

Constructing a list so we have an iterable to pass to any() is confusing. It adds visual clutter and is a non-standard pattern.
On top of that, using any() is also going to be slower than a nice and simple “or” expression: The Python interpreter first needs to construct that list and then call any() on it².
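To see the difference in evaluation order, here is a small experiment. The check() helper and the calls list are hypothetical names I'm using to record which comparisons actually run.

```python
calls = []


def check(name, result):
    """Record that a check ran, then return its result."""
    calls.append(name)
    return result


# 1) List argument: both checks run while the list is being built,
#    before any() is even called.
any([check("temperature", True), check("pressure", False)])
list_calls = list(calls)

# 2) Plain "or": evaluation stops after the first truthy operand.
calls.clear()
check("temperature", True) or check("pressure", False)
or_calls = list(calls)

# 3) Generator argument: any() pulls items lazily and stops early.
calls.clear()
checks = [("temperature", True), ("pressure", False)]
any(check(name, result) for name, result in checks)
gen_calls = list(calls)

print(list_calls, or_calls, gen_calls)
```

Only the list version evaluates the second comparison. For two fixed conditions, the plain "or" expression gives you the short-circuit behavior without any extra machinery.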

In summary, I think this refactoring was not helpful. I’d say it actually made the code worse by making it slightly harder to read and understand.

As developers we need to be careful and deliberate about using the tools and patterns we know. “Fancy” often won’t equal “better”.

Great question, Bev! I think you were spot-on in doubting this refactoring 😃

¹ It’s good to note that any() has short-circuit evaluation semantics: it stops iterating at the first truthy element, much like a chain of or operations. With a list argument, however, every comparison is still evaluated up front when the list is built. (Thanks to Lev Maximov for his feedback.)
² In all fairness, this won’t make a big real-world difference in performance or memory usage – but it’s always good to keep these things in mind.


Python Code Review: Unplugged – Episode 3

Wed, 26 Oct 2016 00:00:00 GMT

Python Code Review: Unplugged – Episode 3

In this third episode of my video code review series I take a look at a reader’s web scraping project and start adding some unit tests to it.

This is a Python code review I did for Sunny’s web scraping project on GitHub. Sunny reached out to me after watching one of my previous code review videos, asking me if I could give him some feedback on his web scraping pet project.

In this episode you’ll see Flake8 and Python code linting tools make a comeback. Also I’m doing an intro to adding Pytest unit tests to an existing Python code base in the second half of the video.

By the way, I love how eager Sunny was to get feedback on his Python code:

As a pet project I wanted to learn all the best practice as much as I could about python, trying to code as pythonic as I can, checking pep8 and autopep8 (didn’t knew about flake8!!) Eventually I re-factored using classes and methods. If you’ll see that it originally did everything in one monolithic function in the 2.x pull request on github. I still have a few things that I want to implement, so I welcome all kinds of feedback about it!

This is exactly the right mindset that turns people into productive and successful software engineers. Even tiny – but constant – improvements add up and compound over time.

I’ve seen this in friends and colleagues alike. Those developers who seek out constant small improvements eventually go on and do amazing things.

Enjoy the video! And be sure to check out my other Python screencasts if you liked this code review 😊

Links & Resources:

» Click here to watch my other Python Code Review: Unplugged videos


Click & jump to any file or folder from the terminal

Mon, 17 Oct 2016 00:00:00 GMT

Click & jump to any file or folder from the terminal

iTerm2 for macOS has a little known feature that lets you open files and folders simply by Cmd+Clicking on them in the terminal. Among other things, this is super handy for debugging tests.

With this so called Semantic History feature you can configure iTerm2 to open folders and files in their default application when you press Cmd and then click on them.

So if you click on a folder name it will open in the Finder, and if you click on a .py file, for example, it will open in your editor.

The amazingly cool part is that this also works with line numbers, so if you click on something like test_myapp.py:42 in the terminal your editor opens test_myapp.py and moves the cursor to line 42! 😀

This is unbelievably handy if you’re running your unit tests from the command line. I use it all the time to click and jump to failed test cases with the Pytest test runner, for example.

Here’s how to set up Semantic History in iTerm2:

Open the iTerm2 preferences by clicking on iTerm2 → Preferences in the menu bar (or press Cmd+,)
Click on Profiles in the top row, then click Advanced all the way to the right. Find the section that says Semantic History.
Under Semantic History, set the first option to Open with editor… and then pick your favorite editor (I use Sublime Text 3).
Close the preferences window – that’s it!

If you need some more help setting this up and a quick demo of what you can do with this feature, watch my video below:

Like I said, I found this “click to jump to file” feature extremely helpful for working with tests.

I usually run my Python tests with Pytest and it prints test failure messages in a format that iTerm2 understands. So I can simply Cmd+click on a failed test assertion and that’ll open up the test case in Sublime Text, placing the cursor at the exact line that caused the test to fail.

This feature should be completely language agnostic by the way. You’ll be able to use it with any test runner or programming language – and any editor.

P.S. Unfortunately iTerm2 is only available on macOS. I’d love to learn if there’s a way to get the same functionality on Windows or Linux; so far I haven’t been able to find anything. If you know how to do this on Linux or Windows please get in touch and tell me how to do it :) Thanks!


Sublime Text plugin review: Djaneiro

Tue, 11 Oct 2016 00:00:00 GMT

Sublime Text plugin review: Djaneiro

A review of Djaneiro, a Sublime Text plugin for Django development.

I’ll admit I was skeptical at first when a friend of mine recommended Djaneiro to enhance my Django development workflow in Sublime Text.

I’d been happy with the Python development setup I built for myself over the years and I didn’t really understand what Djaneiro was going to add to that.

But when I tried out Djaneiro I was impressed how helpful it turned out to be! I decided to write another Sublime Text plugin review to share my findings.

Djaneiro’s main selling points are adding:

syntax highlighting for Django HTML templates; and
code completion snippets for Django HTML templates and Python files.

In this review I’ll explain how Djaneiro can make your Django development workflow more productive and I’ll go over the pros and cons of the plugin as I experienced them. After that I’ll take a look at alternatives to Djaneiro in the Sublime Text plugin landscape. At the end I’ll share my final verdict and rating.

Pros

Syntax highlighting for Django templates: In its default configuration Sublime Text doesn’t have syntax definitions for Django’s HTML templating syntax.

This means that typos and syntax errors in templates are harder to catch visually. As you can see in the screenshot below (in the editing pane on the left), the standard HTML syntax highlighting in Sublime Text 3 uses a uniform white color for Django’s template tags.

Djaneiro adds a HTML (Django) syntax that properly highlights Django’s template tags. As you can see in the right-hand editing pane in the screenshot, proper syntax highlighting makes these templates quite a bit easier to read. Also, syntax errors and typos stand out more due to the proper highlighting.

This simple change adds a lot of value – I found that I was making fewer typos in my templates with Djaneiro’s syntax highlighting. Also, templates seemed easier to read and scan quickly with Djaneiro installed.

Improved syntax highlighting for Django Python files: Djaneiro also makes some small tweaks to the default Python syntax highlighting. For example, it knows the standard Django settings constants like INSTALLED_APPS and highlights them differently so that they stand out more and typos are easier to find.

This also happens for things like field definitions when writing Django model classes, which I found handy. In summary I found that the syntax highlighting changes introduced by Djaneiro make it easier to grasp the structure of the code I’m writing.

Code completion snippets for Django templates and Python files: Another helpful feature provided by Djaneiro is a library of pre-made code completion snippets for common Django code and patterns.

For example, you can insert an {% if _____ %} {% endif %} block by typing if as an abbreviation and hitting the auto-complete key (Tab by default). You can see a quick demo of that in the screenshot below. Generally, I found the list of snippets included with Djaneiro to be comprehensive and well-chosen.

Besides snippets for Django HTML templates, Djaneiro also includes a snippet library for Django Python code. These snippets let you quickly scaffold out whole view definitions or barebones model classes, for example.

Once you’ve gotten used to these snippets they can save you a lot of typing. Be sure to check out the full list of snippets in the Djaneiro README.

Cons

Snippets might get in the way: Because Djaneiro adds quite a substantial number of new code snippets I found myself triggering some of them accidentally, especially in the beginning. I really don’t want to hold this against Djaneiro because the snippets added a lot of value once I learned to use them well.

It’s possible to disable individual code snippets in Sublime Text but unfortunately this process is a bit involved.

If you find that the snippets get in your way occasionally you can temporarily switch them off by selecting a different syntax highlighting definition. Just open the Sublime Text Command Palette, type Set syntax, and select the default HTML or Python syntax.

Alternatives

There are a few more Django-specific plugins available on Package Control but Djaneiro seems to be the most popular and also the most powerful of the pack.

The verdict

I’ve grown quite fond of Djaneiro since I started using it. I immediately loved the improved syntax highlighting for Django templates and I’d say Djaneiro is worth installing for that feature alone.

Once I’d gotten the hang of Djaneiro’s code snippets and their shortcodes I felt a noticeable improvement in my productivity. The snippets added by Djaneiro cover many things I encountered in day to day Django development. It’s great not having to go through the work of writing these snippets myself.

I’d recommend any Django developer using Sublime Text to at least try out Djaneiro for a few days. There aren’t any substantial downsides to it and I’m sure it will make you more productive. You can install Djaneiro via Package Control.

🐍 🐍 🐍 🐍 🐍 (5 out of 5 snakes)


The Complete Guide to Setting up Sublime Text for Python Developers – Now Available!

Tue, 04 Oct 2016 00:00:00 GMT

The Complete Guide to Setting up Sublime Text for Python Developers – Now Available!

Hey folks, I’m super excited to announce the launch of my first book – It’s called “The Complete Guide to Setting up Sublime Text for Python Developers”.

It’s a detailed, step-by-step guidebook aimed at getting you to a kickass, professional-grade Python development setup built around Sublime Text in the shortest amount of time possible.

I created this because I’ve been using Sublime Text for almost four years now in my Python workflow and I think it’s an amazing combo.

However, I kept getting emails and questions about this development setup whenever I used it in my screencasts.

That made me realize how difficult it can be to set up an enjoyable Python development environment – and I decided to do something about it by writing the ULTIMATE setup guide for Sublime Text + Python 😃.

If you want to become a better and more productive developer then this guide is really going to help you get more out of your Python workflow.

Check out SublimeTextPython.com to see what it’s all about! Thanks so much for your support! Enjoy the guide and let me know what you think!


Python Code Review: Unplugged – Episode 2

Tue, 27 Sep 2016 00:00:00 GMT

Python Code Review: Unplugged – Episode 2

This is the second episode of my video code review series where I record myself giving feedback and refactoring a reader’s Python code.

The response to the first Code Review: Unplugged video was super positive. I got a ton of emails and comments on YouTube saying that the video worked well as a teaching tool and that I should do more of them.

And so I did just that 😃. Milton sent me a link to his Python 3 project on GitHub and I recorded another code review based on his code. You can watch it below:

Milton is on the right track with his Python journey. I liked how he split up his web scraper program into functions that each handle a different phase, like fetching the HTML, parsing it, and generating the output file.

The main thing that this code base could benefit from would be consistent formatting. Making the formatting as regular and consistent as possible really helps with keeping the “mental overhead” low when you’re working on the code or handing it off to someone else.

And the beautiful thing is that there’s an easy fix for this, too. I demo a tool called Flake8 in the video. Flake8 is a code linter and code style checker – and it’s great for making sure your code has consistent formatting and avoids common pitfalls or anti-patterns.

You can even integrate Flake8 into your editing environment so that it checks your code as you write it.

(Shameless plug: The book I’m working on has a whole chapter on integrating Flake8 into the Sublime Text editor. Check it out if you’d like to learn how to set up a Python development environment just like the one I’m using in the video).

Besides formatting, the video also covers things like writing a great GitHub README, how to name functions and modules, and the use of constants to simplify your Python code. So be sure to watch the whole thing when you get the chance.

Again, I left the video completely unedited. That’s why I’m calling this series Code Review: Unplugged. It’s definitely not a polished tutorial or course. But based on the feedback I got so far that seems to be part of the appeal.


One more quick tip for you: You can turn these videos into a fun Python exercise for yourself. Just pause the video before I dig into the code and do your own code review first. Spend 10 to 20 minutes taking notes and refactoring the code and then continue with the video to compare your solution with mine. Let me know how this worked out! 😊

» Click here to watch my other Python Code Review: Unplugged videos


How Do I Make My Own Command-Line Commands Using Python?

Fri, 23 Sep 2016 00:00:00 GMT

How Do I Make My Own Command-Line Commands Using Python?

How to turn your Python scripts into “real” command-line commands you can run from the system terminal.

The Python script you just wrote would make a great little command-line tool, but having to type python myscript.py all the time to launch your program gets tedious fast.

At the end of this tutorial you’ll know how to make your Python command-line script executable, so that you can start it from the terminal without explicitly calling the python interpreter.

Together we’ll write a little Python program that simulates the Unix echo command and can be started from the command-line just like it:

$ myecho Hello, World!

Ready? Let’s jump right in!

Imagine you have the following short Python script called myecho.py that just prints the command-line arguments you pass to it back to the console:

import sys

for arg in sys.argv:
    print(arg)

You can run this command just fine by passing it to the Python interpreter like so:

$ python myecho.py Hello, World!
myecho.py
Hello,
World!
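Notice that the first line of the output is the script name itself. That’s because sys.argv[0] holds the name the script was started with, and the actual arguments only begin at index 1. If you want something closer to the real echo command, a minimal sketch could skip that first entry and join the rest with spaces. The echo() helper here is just an illustrative name, not part of the original script:

```python
import sys


def echo(argv):
    """Join everything after the script name with single spaces, like echo does."""
    return " ".join(argv[1:])


if __name__ == "__main__":
    print(echo(sys.argv))
```

I’ll stick with the simpler loop for the rest of this tutorial, because seeing sys.argv[0] in the output makes it easy to tell how the script was launched.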

But how can you give your users a more polished experience that allows them to simply type myecho Hello, World! and get the same result?

Easy—there are three things you need to do:

Step 1: Mark your Python file as executable

The first thing you’ll need to do is mark your Python script as executable in the file system, like so:

$ chmod +x myecho.py

This sets the executable flag on myecho.py, which tells the shell that it’s a program that can be run directly from the command-line. Let’s try it:

$ ./myecho.py Hello, World!

We need to prefix our command with ./ because usually the current directory is not included in the PATH environment variable on Unix. This “dot slash” prefix is a security feature. If you’re wondering how it works exactly then check out this in-depth article.
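As a quick aside, the executable flag can also be set from Python itself, which is handy in install scripts. The following is a rough sketch and not an exact replacement for chmod +x: it simply adds the execute bits for user, group, and others on top of the existing permissions and ignores the umask. The make_executable name is hypothetical.

```python
import os
import stat


def make_executable(path):
    """Add the execute bits to a file's existing permissions and return the new mode."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return os.stat(path).st_mode
```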

Anyway—the result will be that you get a crazy error message when you try to run myecho.py. It’ll probably look something like this:

./myecho.py: line 4: syntax error near unexpected token `print'
./myecho.py: line 4: `    print(arg)'

The reason for that is that now the system doesn’t know it’s supposed to execute a Python script. So instead it takes a wild guess and tries to run your Python script like a shell script with the /bin/sh interpreter.

That’s why you’re getting these odd syntax errors. But there’s an easy fix for this in the next step. You just need to …

Step 2: Add an interpreter “shebang”

Okay, admittedly this sounds completely crazy if you’ve never heard of Unix shebangs before…😃 But it’s actually a really simple concept and super useful:

Whenever you run a script file on a Unix-like operating system (like Linux or macOS) the program loader responsible for loading and executing your script checks the first line for an interpreter directive. Here’s an example:

#!/bin/sh

You’ve probably seen those before. These interpreter directives are also called shebangs in Unix jargon. They tell the program loader which interpreter should execute the script. You can read more about Unix shebangs here.
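To make the idea a little more concrete, here’s a simplified sketch of that check written in Python. It reads the first line of a file and, if it starts with #!, returns the interpreter command as a list of words. The real program loader has its own rules for handling arguments, so treat the whitespace splitting (and the read_shebang name) as assumptions made for illustration only.

```python
def read_shebang(path):
    """Return the interpreter command from a script's first line, or None."""
    with open(path, "rb") as script:
        first_line = script.readline()
    if not first_line.startswith(b"#!"):
        # No interpreter directive on the first line
        return None
    return first_line[2:].decode("utf-8").strip().split()
```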

The point is, you can use this mechanism to your advantage by adding a shebang line that points to the system Python interpreter:

#!/usr/bin/env python

You may be wondering why you should be using env to load the Python interpreter instead of simply using an absolute path like /usr/local/bin/python.

The reason for that is that the Python interpreter will be installed in different locations on different systems. On a Mac using Homebrew it might be in /usr/local/bin/python. On an Ubuntu Linux box it might be in /usr/bin/python.

Using another level of indirection through env you can select the Python interpreter that’s on the PATH environment variable. That’s usually the right way to go about it. If you’re interested in a quick detour you can learn more about env and its merits here.
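If you’re curious which interpreter a PATH lookup would pick on your machine, the standard library’s shutil.which() performs a similar search. This is only a sketch of the lookup idea, not of how env works internally, and the find_interpreter wrapper is a hypothetical helper:

```python
import shutil


def find_interpreter(name="python", search_path=None):
    """Return the first executable called `name` on the search path, or None.

    With search_path=None, shutil.which() falls back to the PATH
    environment variable.
    """
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    print(find_interpreter("python"))
```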

Okay, so now that you’ve added that #!/usr/bin/env python line your script should look like this:

#!/usr/bin/env python
import sys

for arg in sys.argv:
    print(arg)

Let’s try to run it again!

$ ./myecho.py Hello, World!
./myecho.py
Hello,
World!

Yes! Success!

Now that you’re using the interpreter directive shebang in the script you can also drop the .py extension. This will make your script look even more like a system tool:

$ mv myecho.py myecho

This is starting to look pretty good now:

$ ./myecho Hello, World!
./myecho
Hello,
World!

Step 3: Make sure your program is on the PATH

The last thing you need to change to make your Python script really seem like a shell command or system tool is to make sure it’s on your PATH.

That way you’ll be able to launch it from any directory by simply running myecho Hello, World!, just like the “real” echo command.

Here’s how to achieve that.

I don’t recommend that you try to copy your script to a system directory like /usr/bin/ or /usr/local/bin because that can lead to all kinds of odd naming conflicts (and, in the worst case, break your operating system install).

So instead, what you’ll want to do is to create a bin directory in your user’s home directory and then add that to the PATH.

First, you need to create the ~/bin directory:

$ mkdir -p ~/bin

Next, copy your script to ~/bin:

$ cp myecho ~/bin

Finally, add ~/bin to your PATH:

export PATH="$PATH:$HOME/bin"

Adding ~/bin to the PATH like this is only temporary, however. It won’t stick across terminal sessions or system restarts. If you want to make your command permanently available on a system, do the following:

Add this line to .profile or .bash_profile in your home directory: export PATH="$PATH:$HOME/bin".
You can either use an editor to do it or run the following command: echo 'export PATH="$PATH:$HOME/bin"' >> ~/.profile
Changes to .profile or .bash_profile only go into effect when your shell reloads these files. You can trigger a reload by either opening a new terminal window or running this command: source ~/.profile
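If you’re not sure whether the change took effect, you can check from Python whether a directory shows up in the PATH. Here’s a minimal sketch under the assumption that a plain comparison of normalized entries is good enough (it doesn’t resolve symlinks). The is_on_path name is made up for this example.

```python
import os


def is_on_path(directory, path_value=None):
    """Check whether `directory` is one of the entries in a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = os.path.normpath(os.path.expanduser(directory))
    entries = [
        os.path.normpath(os.path.expanduser(entry))
        for entry in path_value.split(os.pathsep)
        if entry  # skip empty entries
    ]
    return target in entries


if __name__ == "__main__":
    print(is_on_path("~/bin"))
```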

Okay great, now you’ll get the wanted result—your Python script can be run like a “real” shell command from the command-line, without needing a python prefix to work:

$ myecho Hello, World!
/Users/youruser/bin/myecho
Hello,
World!

There’s much more to learn about writing user-friendly command-line apps with Python. Check out this tutorial on writing Python command-line apps with the click module to learn more about structuring your command-line scripts, parsing arguments and options, and more.


Using get() to return a default value from a Python dict

Wed, 14 Sep 2016 00:00:00 GMT

Using get() to return a default value from a Python dict

Python’s dictionaries have a “get” method to look up a key while providing a fallback value. This short screencast tutorial gives you a real-world example where this might come in handy.

Imagine we have the following data structure mapping user IDs to user names:

name_for_userid = {
    382: "Alice",
    950: "Bob",
    590: "Dilbert",
}

Now we’d like to write a function greeting() which returns a greeting for a user given their user ID. Our first implementation might look something like this:

def greeting(userid):
    return "Hi %s!" % name_for_userid[userid]

This implementation works if the user ID is a valid key in name_for_userid, but it throws an exception if we pass in an invalid user ID:

>>> greeting(382)
"Hi Alice!"

>>> greeting(33333333)
KeyError: 33333333

Let’s modify our greeting function to return a default greeting if the user ID cannot be found. Our first idea might be to simply do a “key in dict” membership check:

def greeting(userid):
    if userid in name_for_userid:
        return "Hi %s!" % name_for_userid[userid]
    else:
        return "Hi there!"

>>> greeting(382)
"Hi Alice!"

>>> greeting(33333333)
"Hi there!"

While this implementation gives us the expected result, it isn’t great:

it’s inefficient because it queries the dictionary twice
it’s verbose, as parts of the greeting string are repeated, for example
it’s not pythonic – the official Python documentation recommends an “easier to ask for forgiveness than permission” (EAFP) coding style:

“This common Python coding style assumes the existence of valid keys or attributes and catches exceptions if the assumption proves false.” (Python glossary: “EAFP”)

Therefore a better implementation that follows EAFP could use a try…except block to catch the KeyError instead of doing a membership test:

def greeting(userid):
    try:
        return "Hi %s!" % name_for_userid[userid]
    except KeyError:
        return "Hi there!"

Again, this implementation would be correct – but we can come up with a cleaner solution still! Python’s dictionaries have a get() method on them which supports a default argument that can be used as a fallback value:

def greeting(userid):
    return "Hi %s!" % name_for_userid.get(userid, "there")

When get() is called it checks if the given key exists in the dict. If it does, the value for that key is returned. If it does not exist then the value of the default argument is returned instead.

As you can see, this implementation of greeting works as intended:

>>> greeting(950)
"Hi Bob!"

>>> greeting(333333)
"Hi there!"
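Two more details about get() are worth knowing. If you leave out the default argument, a missing key gives you None instead of raising a KeyError. And get() only reads from the dictionary, so a failed lookup never adds the key. A short sketch using the same example data (the variable names below are just for illustration):

```python
name_for_userid = {
    382: "Alice",
    950: "Bob",
    590: "Dilbert",
}

# Missing key with an explicit fallback value
with_default = name_for_userid.get(33333333, "there")

# Missing key without a fallback: the result is None, no exception is raised
without_default = name_for_userid.get(33333333)

# The failed lookups above did not modify the dictionary
key_was_added = 33333333 in name_for_userid
```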

Our final implementation of greeting() is concise, clean, and only uses features from the Python standard library. Therefore I believe it is the best solution for this particular situation.

P.S. If you enjoyed this screencast and you’d like to see more just like it then subscribe to my » YouTube channel with free screencasts and video tutorials for Python developers «


Watch me do a “live” Python code review for a reader

Thu, 08 Sep 2016 00:00:00 GMT

Watch me do a “live” Python code review for a reader

This is a bit of an experiment – but you might find it interesting!

A couple of days ago I had a Twitter conversation with Labeeb who’s just getting into Python. (Good news, so far he loves it!)

I think we started out with a classic “Emacs vs Sublime” discussion (😂) until Labeeb mentioned he was struggling with some aspects of Object-Oriented Programming in Python.

I asked him to send me some sample code and offered to take a look and give him some feedback.

He later emailed me an implementation of Conway’s Game of Life (that’s a great coding exercise by the way).

After taking a quick look at his code, I decided the best way forward would be to record myself doing a code review pass and to send Labeeb the screencast recording.

So I did just that. And it turned out to be a pretty… interesting experience. I was happy to hear that Labeeb liked the video:

“The video is extremely helpful. The way you present things was very much appealing. :)”

That made me feel all warm and fuzzy inside. I thought I was onto something… and that this video could help other Python developers, too.

(After asking Labeeb for permission) I’m now sharing the raw and unfiltered “live” code review video with you here:

Note that the video is completely unedited. This is really more of a “Code Review: Unplugged” session than a polished tutorial or course. But based on the feedback I got so far that seems to be part of the appeal, haha.

I figured it was worth the experiment if there’s a chance it’ll help more people… Shoot me a quick email and let me know how you liked it, I might do more of these in the future if it’s helpful.

P.S. Labeeb is currently doing a programming mentorship program. He’s looking to get a job as a data analyst working with Python. You can check out his GitHub here: github.com/labeebee Give him some love, I think he’s on the right track 😊

P.P.S. This has now turned into a series. Click here to watch my other Python Code Review: Unplugged videos.


What books should I read to move past the beginner stage in Python?

Wed, 31 Aug 2016 00:00:00 GMT

What books should I read to move past the beginner stage in Python?

Recommendations for intermediate-level Python books that help you get past the basics so you can start working on small projects.

I want to answer a question that I got on Twitter the other day:

Someone asked for book recommendations to move past the “beginner” stage in Python – The person was looking for intermediate-level books that would help them get past the basics so they could improve their skills by working on small projects.

Let me start by saying that I really like this approach to learning a new programming language!

It’s a good idea to start working on real projects as soon as possible, even if they’re small. There’s only so much you can learn from repeated let’s implement this algorithm exercises.

These are the books I recommended:

Automate the Boring Stuff with Python by Al Sweigart has some great “project like” exercises. It covers common real world tasks like web scraping or filling out online forms. This really helps keep your motivation up and gives you a sense of accomplishment. The book is free to read online under a Creative Commons license (but you can buy a copy to support Al).
Effective Python by Brett Slatkin is also a great book that will help take your Python skills to the next level. It focuses on teaching you to write more pythonic code and learning the community best practices, without running the danger of overusing some of Python’s more arcane features to the detriment of your code. It’s all about hitting that sweet spot and Brett teaches this lesson well!
Fluent Python by Luciano Ramalho is intended as a hands-on guide covering the features that make Python special. I like how Luciano focuses on teaching the pythonic way to do things, which helps if you’re trying to “unlearn” patterns you’ve picked up from working with other languages. (Jim Anderson emailed me to recommend this book. Thanks Jim!)
Python Cookbook, 3rd Ed. by David Beazley and Brian Jones is more project-based again. It’s chock-full of recipes for common tasks across various application domains like data processing or network programming. This is probably the most advanced-level book on this list, covering topics like metaprogramming. But there’s just so much information in there that I’m sure you’ll learn something useful from it even with beginner-level Python skills.

I hope that helped you out!

P.S. What are your favorite books and resources for moving from junior/entry-level Python to intermediate and beyond? I’m thinking about writing a longer article about this topic and would love to hear about your best resources and learning strategies. Leave a comment below if you’ve got a minute!

Update (2017): I wrote my own Python book for intermediate developers looking to write clean and Pythonic code. This is a bit of a shameless plug, but if you like this list I’m sure you’ll love Python Tricks: The Book – A Buffet of Awesome Python Features.


Make your Python code more readable with custom exception classes

Thu, 18 Aug 2016 00:00:00 GMT

Make your Python code more readable with custom exception classes

In this short screencast I’ll walk you through a simple code example that demonstrates how you can use custom exception classes in your Python code to make it easier to understand, easier to debug, and more maintainable.

Let’s say we want to validate an input string representing a person’s name in our application. A simple toy example might look like this:

def validate(name):
    if len(name) < 10:
        raise ValueError

If the validation fails it throws a ValueError. That feels kind of Pythonic already… We’re doing great!

However, there’s one downside to this piece of code: Imagine one of our teammates calls this function as part of a library and doesn’t know much about its internals.

When a name fails to validate it’ll look like this in the debug stacktrace:

>>> validate('joe')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    validate('joe')
  File "<input>", line 3, in validate
    raise ValueError
ValueError

This stacktrace isn’t really all that helpful. Sure, we know that something went wrong and that the problem has to do with an “incorrect value” of sorts.

But to be able to fix the problem our teammate almost certainly has to look up the implementation of validate(). But reading code costs time. And it adds up quickly…

Luckily we can do better! Let’s introduce a custom exception type for when a name fails validation. We’ll base our new exception class on Python’s built-in ValueError, but make it more explicit by giving it a different name:

class NameTooShortError(ValueError):
    pass

def validate(name):
    if len(name) < 10:
        raise NameTooShortError(name)

See how we’re passing name to the constructor of our custom exception class when we instantiate it inside validate? The updated code results in a much nicer stacktrace for our teammate:

>>> validate('jane')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    validate('jane')
  File "<input>", line 3, in validate
    raise NameTooShortError(name)
NameTooShortError: jane

Now, imagine you are the teammate we were talking about… Even if you’re working on a code base by yourself, custom exception classes will make it easier to understand what’s going on when things go wrong. A few weeks or months down the road you’ll have a much easier time maintaining your code. I’ll vouch for that 😃
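One nice side effect of basing NameTooShortError on ValueError is that existing code that catches ValueError keeps working unchanged. Here’s a small sketch of that. The is_valid() helper is a hypothetical caller I made up for this example, not something from the screencast.

```python
class NameTooShortError(ValueError):
    pass


def validate(name):
    if len(name) < 10:
        raise NameTooShortError(name)


def is_valid(name):
    """A caller that only knows about the built-in ValueError."""
    try:
        validate(name)
    except ValueError:
        # NameTooShortError is a subclass of ValueError, so it is caught here too
        return False
    return True
```

Callers that care about the specific failure can catch NameTooShortError directly, while everyone else can keep treating it as a plain ValueError.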

P.S. If you enjoyed this screencast and you’d like to see more just like it then subscribe to my » YouTube channel with free screencasts and video tutorials for Python developers «


Sublime Text for Python development — My 2016 review

Tue, 09 Aug 2016 00:00:00 GMT

Sublime Text for Python development — My 2016 review

When you ask for editor recommendations as a Python developer one of the top choices you’ll hear about is Sublime Text. In this post I’ll review the status of Python development with Sublime Text as of 2016.

Disclaimer: I’m a fan and long-time user of Sublime Text. I used a number of editors and IDEs for writing Python including PyCharm, IntelliJ with Python plugins, Atom, Visual Studio Code, BBEdit, and emacs. I’ve worked with colleagues who are big Vim proponents. And while I never got the hang of Vim, I feel like I’m qualified to give Sublime Text a relatively unbiased review. You’ve been warned though 😃.

What I like about Sublime Text

Performance: Sublime is one of the fastest editors available. Pavel Fatin compared typing latencies between several popular editors and Sublime Text is consistently among the fastest and most responsive ones in his list. My (unscientific) personal impression comparing Sublime with similar editors like Atom or VS Code confirms this. Also note that Sublime starts up super fast. I don’t restart my development environment too often, but when I do it’s nice to be back up and running within a few seconds — rather than waiting half a minute for a ginormous IDE to boot up.
Stability and reliability: I’ve been using Sublime as my main editor for almost four years and it’s always been rock solid for me in terms of stability. I don’t think I’ve ever lost any data due to a crash or some other issue. I think that’s impressive. I like my tools to be reliable.
Plugin ecosystem: Something that’s drawn me towards Sublime is its fantastic community, which has written thousands of plugins for it. That way you can build a custom editor setup that does exactly what you want and how you want it. Several fantastic packages for Python development are available. I’ve reviewed some of them here: Sublime Text Plugin Reviews.
Package Control: Sublime Text has Package Control which is a plugin manager that lets you install and uninstall other plugins directly from within the editor. It’s kind of a “meta plugin” that makes tinkering with your setup super easy. Package Control comes with a directory of available plugins which makes it easy to pick out the good ones based on popularity and recent activity.
Plugins are written in Python: Most Sublime plugins are written in Python. Sublime Text includes an embedded Python interpreter that’s used to run the plugin code. It’s nice being able to look under the hood and read through a plugin’s code to judge its quality. If you’re a Python developer and you’re interested in writing your own Sublime Text package then that’s also a bonus.
It’s pretty: There’s a wide variety of themes for Sublime Text available which allows you to set up the look and feel of your editor to your liking. On top of that, Sublime’s font rendering is excellent. I’m peculiar about the way my editor looks. If I’m going to be staring at this thing for several hours each day then it better be as pretty as it can be 😀. I found Sublime Text to be easier to “prettify” than other editors.
Soft learning curve: Compared to some other editing environments like Vim or Emacs, Sublime Text has a soft learning curve. This is great for beginners. In my experience it’s difficult to be successful with Vim or Emacs without going all-in and spending at least a few weeks or months learning the system. Sublime Text is much easier to pick up in comparison.
UI state restoration: Sublime Text remembers the state of your editor windows when you shut it down so that when you restart Sublime everything looks the way you left it, including modified or unsaved files. This feature is brilliant! I haven’t seen anything quite like it and it’s something that discouraged me from using Atom, for example. I often use new editor tabs as scratchpads for notes. And while those are temporary it’s nice not having to worry about losing them due to an editor crash or restart.
Multiple cursors: Like some other editors Sublime supports editing with multiple cursors at the same time. This is super handy when you want to rename a local variable, for example. Select the variable, hit cmd+d a couple of times to select all other occurrences and then type the new name. Done. The same approach works in other situations like re-formatting several lines of code at once or cutting out parts from a log file.
Cross-platform: Sublime Text is available for Mac, Linux, and Windows. It’s nice being able to use a familiar editing environment across multiple platforms.
Handles large files: Sublime is good at dealing with large files, like an occasional giant CSV file or a log file you want to take a look at in a familiar environment. I like not having to switch to other tools (like less) for that job, knowing Sublime will handle the file just fine and won’t freeze or crash. Atom dealt with the same files much less gracefully. It often froze for seconds at a time or even crashed.
Fast global search: Sublime’s global text search is fast. I find it comparable to tools like ack, which is nice because that means I have to switch to the command line less. Sublime also indexes your source files and has a Goto Symbol in Project command that lets you quickly jump to specific identifiers, functions, or classes. This feature is aware of Python’s syntax so it’s usually accurate.
Command palette: I’m bad at remembering keyboard shortcuts for commands I use infrequently. Sublime’s solution to that problem is the Command Palette. You can open it with cmd+shift+p and find what you’re looking for with a fuzzy text search. Let’s say I want to rename a file and I can’t remember the keyboard shortcut for that – what I’ll do is open the Command Palette and type ren to select the File: Rename command and then hit return. Boom, this lets me rename a file without ever moving my hands away from the keyboard – and without having to remember some arcane shortcut. This feature is a great time saver!

Things I dislike

Can be difficult to set up for a beginner: While using Sublime Text the way it comes out of the box is okay, getting most of the good stuff requires spending some time. It’s not as simple as installing an IDE like PyCharm that comes with batteries included. On the other hand, you can start with a simple setup using Sublime. Then simply add more plugins and custom configurations over time to turn it into a completely personalized tool.
It’s not free: I was on the fence about adding this point because I believe in paying for the tools that allow me to do my job better. I realize though that some people might find a free solution (like Atom, Emacs, or Vim) more attractive.
Not open-source, “bus factor”: Many of the Sublime Text alternatives are open-source, which makes them more future-proof. Sublime Text is developed by just one developer, ex-Googler Jon Skinner. And while Jon is clearly a genius and great at what he’s doing, it’s an open question what would happen if Jon decided (or was forced) to halt development of Sublime Text. Would the project just disappear? Would he be able to pull a TextMate and open-source the project? What if he decides to sell Sublime Text to a company and they do a bad job maintaining it? Essentially, one of the biggest problems with Sublime Text is that it has a bad bus factor: there’s just one developer working on it and its source code is not publicly available. Of course I hope the best for Jon and Sublime Text. My perspective on this issue is that I chose not to worry about it. I’d rather use the best tool for the job now than waste time trying to future-proof my setup. If it doesn’t work out I can always switch later. (Edit: Will Bond, the creator of Package Control, joined the ST team in February 2016. This makes Sublime Text’s long-term survival more likely. But it still has a comparatively small team behind it and isn’t open-source like some of the alternatives. If ST ever stops being maintained we’ll probably see open-source reimplementations of the core editor functionality. There are already projects like Lime Text, an open-source editor that aims to be compatible with Sublime’s plugin API.)
No great solution for “semantic auto-complete”: While there are packages that offer IntelliSense-like code completion, those I’ve tried weren’t satisfying. Due to the dynamic nature of Python as a language it will be difficult to get to the point where the auto-complete works as well as it does for Java in IntelliJ or for C# in Visual Studio. So it’s difficult to hold this against Sublime Text and its plugin ecosystem. However, if you’re relying on a feature like that then it may be worth trying out the PyCharm IDE. I found its implementation of Python auto-complete the most promising. (Update: I’ve done more research on Python code completion with Sublime Text and after trying out several plugins I think the Anaconda plugin is the best solution. Configured correctly its auto-complete rivals that of PyCharm. I’m now happily using Anaconda in my Python development workflow.)

Conclusion

All things considered I believe Sublime Text is still the top editor choice for Python development. I haven’t found an alternative that would make me want to switch.

In my mind Sublime Text offers the best combination of performance, stability, and ergonomics. With some tuning it can look attractive, too. It does everything I want out of my programming environment and has been a central tool for me over more than three years.

By the way, if you’re looking for help setting up Sublime Text for Python development then check out this tutorial I wrote: » Setting up Sublime Text for Python development «

[ Improve Your Python with Dan's 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

How to use Python’s min() and max() with nested lists

Tue, 26 Jul 2016 00:00:00 GMT

Let’s talk about using Python’s min and max functions on a list containing other lists. Sometimes this is referred to as a nested list or a list of lists.

Finding the minimum or maximum element of a list of lists¹ based on a specific property of the inner lists is a common situation that can be challenging for someone new to Python.

To give us a more concrete example to work with, let’s say we have the following list of item, weight pairs²:

nested_list = [['cherry', 7], ['apple', 100], ['anaconda', 1360]]

We want Python to select the minimum and maximum element based on each item’s weight stored at index 1. We expect min and max to return the following elements:

min(nested_list) should be ['cherry', 7]
max(nested_list) should be ['anaconda', 1360]

But if we simply call min and max on that nested list we don’t get the results we expected.

The ordering we get seems to be based on the item’s name, stored at index 0:

>>> min(nested_list)
['anaconda', 1360]  # Not what we expected!

>>> max(nested_list)
['cherry', 7]  # Not what we expected!

Alright, why does it pick the wrong elements?

Let’s stop for a moment to think about how Python’s max function works internally. The algorithm looks something like this:

def my_max(sequence):
    """Return the maximum element of a sequence"""
    if not sequence:
        raise ValueError('empty sequence')

    maximum = sequence[0]

    for item in sequence:
        if item > maximum:
            maximum = item

    return maximum

The interesting bit of behavior here can be found in the condition that selects a new maximum: if item > maximum:.

This condition works nicely if sequence only contains primitive types like int or float because comparing those is straightforward (in the sense that it’ll give an answer that we intuitively expect; like 3 > 2).

However, if sequence contains other sequences then things get a little more complex. Let’s look at the Python docs to learn how Python compares sequences:

Sequence objects may be compared to other objects with the same sequence type. The comparison uses lexicographical ordering: first the first two items are compared, and if they differ this determines the outcome of the comparison; if they are equal, the next two items are compared, and so on, until either sequence is exhausted.

When max needs to compare two sequences to find the “larger” element then Python’s default comparison behavior might not be what we want³.
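To see the lexicographical ordering described in the docs in action, here’s a small sketch that compares a few of the inner lists directly. It only uses the nested_list from above plus one extra made-up pair (['apple', 7]) to show what happens when the first items are equal:

```python
nested_list = [['cherry', 7], ['apple', 100], ['anaconda', 1360]]

# The first items differ, so they alone decide the outcome.
# 'cherry' sorts after 'apple', and the weights are never looked at.
first_item_decides = ['cherry', 7] > ['apple', 100]
print(first_item_decides)  # True

# Only when the first items are equal does Python move on
# to the second items (['apple', 7] is a made-up pair).
second_item_decides = ['apple', 100] > ['apple', 7]
print(second_item_decides)  # True

# This is why plain max() picks the alphabetically last name.
print(max(nested_list))  # ['cherry', 7]
```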

Now that we understand why we get an unexpected result we can think about ways to fix our code.

How can we change the comparison behavior?

We need to tell max to compare the items differently.

In our example, Python’s max looks at the first item in each inner list (the string cherry, apple, or anaconda) and compares it with the current maximum element. That’s why it returns cherry as the maximum element if we just call max(nested_list).

How do we tell max to compare the second item of each inner list?

Let’s imagine we had an updated version of my_max called my_max_by_weight that uses the second element of each inner list for comparison:

def my_max_by_weight(sequence):
    if not sequence:
        raise ValueError('empty sequence')

    maximum = sequence[0]

    for item in sequence:
        # Compare elements by their weight stored
        # in their second element.
        if item[1] > maximum[1]:
            maximum = item

    return maximum

That would do the trick! We can see that my_max_by_weight selects the maximum element we expected:

>>> my_max_by_weight(nested_list)
['anaconda', 1360]

Now imagine we needed to find the maximum of different kinds of lists.

Perhaps the index (or key) we’re interested in won’t always be the second item. Maybe sometimes it’ll be the third or fourth item, or a different kind of lookup is necessary altogether.

Wouldn’t it be great if we could reuse the bulk of the code in our implementation of my_max? Some parts of it will always work the same, for example checking if an empty sequence was passed to the function.

How can we make max() more flexible?

Because Python allows us to treat functions as data we can extract the code selecting the comparison key into its own function. We’ll call that the key func. We can write different kinds of key funcs and pass them to my_max as necessary.

This gives us complete flexibility! Instead of just being able to choose a specific list index for the comparison, like index 1 or 2, we can tell our function to select something else entirely – for example, the length of the item’s name.

Let’s have a look at some code that implements this idea:

def identity(x):
    return x

def my_max(sequence, key_func=None):
    """
    Return the maximum element of a sequence.
    key_func is an optional one-argument ordering function.
    """
    if not sequence:
        raise ValueError('empty sequence')

    if not key_func:
        key_func = identity

    maximum = sequence[0]

    for item in sequence:
        # Ask the key func which property to compare
        if key_func(item) > key_func(maximum):
            maximum = item

    return maximum

In the code example you can see how by default we let my_max use a key func we called identity, which just uses the whole, unmodified item to do the comparison.

With identity as the key func we expect my_max to behave the same way max behaves.

nested_list = [['cherry', 7], ['apple', 100], ['anaconda', 1360]]

>>> my_max(nested_list)
['cherry', 7]

And we can confirm that we’re still getting the same (incorrect) result as before, which is a pretty good indication that we didn’t screw up the implementation completely 😃.

Now comes the cool part – we’re going to override the comparison behavior by writing a key_func that returns the second sub-element instead of the element itself⁴:

def weight(x):
    return x[1]

>>> my_max(nested_list, key_func=weight)
['anaconda', 1360]

And voilà, this is the maximum element we expected to get!

Just to demonstrate the amount of flexibility this refactoring gave us, here’s a key_func that selects the maximum element based on the length of the item’s name:

def name_length(x):
    return len(x[0])

>>> my_max(nested_list, key_func=name_length)
['anaconda', 1360]

Is there a shorthand for this stuff?

Instead of defining the key func explicitly with def and giving it a name we can also use Python’s lambda keyword to define a function anonymously. This shortens the code quite a bit (and won’t create a named function):

>>> my_max(nested_list, key_func=lambda x: x[1])
['anaconda', 1360]

To make the naming a little slicker (albeit less expressive) imagine we’ll shorten the key_func arg to key and we’ve arrived at a code snippet that works with the max function in vanilla Python.

This means we’ll no longer need our own re-implementation of Python’s max function to find the “correct” maximum element:

# This is pure, vanilla Python:
>>> max(nested_list, key=lambda x: x[1])
['anaconda', 1360]

The same also works for Python’s built-in min:

>>> min(nested_list, key=lambda x: x[1])
['cherry', 7]

It even works for Python’s sorted function, making the “key func” concept really valuable in a number of situations you might face as a Python developer:

>>> sorted(nested_list, key=lambda x: x[1])
[['cherry', 7], ['apple', 100], ['anaconda', 1360]]
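As an alternative to the lambda, the standard library’s operator.itemgetter builds the same kind of key func for you. Here’s a short sketch; the by_weight name is just something I picked for illustration:

```python
from operator import itemgetter

nested_list = [['cherry', 7], ['apple', 100], ['anaconda', 1360]]

# itemgetter(1) returns a function equivalent to: lambda x: x[1]
by_weight = itemgetter(1)

lightest = min(nested_list, key=by_weight)
heaviest = max(nested_list, key=by_weight)
ordered = sorted(nested_list, key=by_weight, reverse=True)

print(lightest)  # ['cherry', 7]
print(heaviest)  # ['anaconda', 1360]
print(ordered)   # heaviest item first
```

Whether you prefer this over a lambda is mostly a matter of taste. Both produce the same results here.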

A better Python REPL: bpython vs python

Fri, 22 Jul 2016 00:00:00 GMT

A quick video that demonstrates bpython, an awesome alternative Python interpreter.

Compared to the vanilla Python interpreter bpython knows a few extra tricks like syntax highlighting, auto indent (yay!), and auto completion.

Check it out, it’s a really great tool!

If you’d like to learn more about bpython, the following links should help you out:

6 things you’re missing out on by never using classes in your Python code

Fri, 15 Jul 2016 00:00:00 GMT

Maybe you’ve been using Python for a while now and you’re starting to feel like you’re getting the hang of it. But one day you catch yourself thinking: “What about classes?”

You see other developers at your company or online discussing the merits of object oriented programming. “Classes are incredibly useful and everyone should know about them,” they say.

You’re beginning to get worried about writing your code the right way:

“I’ve never defined a class… I mean I know how to get stuff done quickly. I use functions and modules for everything, often many small functions whose output I feed into another. I simply don’t use classes because I never felt the need for them.

What am I really missing out on?

Am I shortchanging myself by never using classes in my Python code?”

I noticed that this is a common question for Python developers with a (data) science or numerical computing background.

You may be familiar with languages like R or Matlab and you haven’t really felt the need to use classes in your day to day work writing Python code. Most of your computations are sequential in nature and you found that a structured programming style worked just fine for the category of analytical problems you’re facing.

But still … there’s this nagging feeling in the back of your head – “What am I really missing out on by not writing classes?”

Let me try and help you out there. In this post I’m going over 6 things you might be missing out on by never using classes in your Python code. I’m not an OOP zealot by any means but I figured a post like that would be helpful to some folks in the Python community.

Alright, let’s dive in!

1. Easier collections of fields

If you categorically avoid the use of classes it’s easy to find yourself in a situation where you just re-invent your own “ad-hoc classes” using other built-in data structures, like lists or dicts.

For example, you might end up with lots of lists or dicts that share the same keys to access different kinds of data associated with a single logical object:

car_colors[23] = 'yellow'  # Color of Car 23
car_mileage[23] = 38189.4  # Mileage of Car 23

By switching to classes you could have a single list of objects, each of which has several named fields on it to address the associated data:

cars[23].color = 'yellow'
cars[23].mileage = 38189.4

Instead of using lists and dictionaries that happen to contain all your data you can keep all of it under one roof, which makes accessing and passing these objects around much more convenient.

You’ll also no longer need to pass around big tuples of stuff from function to function.

(Side note: You may want to go and learn more about Python’s namedtuple objects which reduce the amount of boilerplate class-code you need to write if you just want simple C-style records of fields. Notice that namedtuple objects are implemented as Python classes internally.)
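Here’s a minimal sketch of what the namedtuple approach from the side note could look like for the car example. The Car type and its two fields mirror the snippets above; storing the cars in a dict keyed by car number is my own assumption for the sake of a runnable example:

```python
from collections import namedtuple

# A lightweight record type with two named fields
Car = namedtuple('Car', ['color', 'mileage'])

# Assumption: cars are looked up by number, so a dict is used here
cars = {23: Car(color='yellow', mileage=38189.4)}

print(cars[23].color)    # yellow
print(cars[23].mileage)  # 38189.4

# namedtuples are immutable, so "changing" a field means
# creating an updated copy with _replace()
cars[23] = cars[23]._replace(mileage=38200.0)
print(cars[23])
```

Because namedtuples are immutable you can’t assign to cars[23].color directly. If you need mutable fields, a regular class is the better fit.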

2. Simpler domain models

It’s easier to think about a domain model with a number of classes that have some relation to each other, than it is to think about lists and dictionaries that are linked together by shared keys and indexes.

Some domains lend themselves well to being modeled with classes, for example GUI controls. I found that with an object-oriented approach the domain model sometimes becomes easier to discuss and reason about with other developers. And that’s always a good thing.

3. The ability to chain objects together and let them interact in an expressive way

This is where object oriented programming really shines – when the objects you’re dealing with have behavior on them. For example, a button that can be clicked or a car that can accelerate and brake.

In this case it helps if you encapsulate those behaviors by making methods on your Button and Car classes so that other objects can call those methods and change the Button’s or Car’s internal state without knowing how that operation is implemented.

Especially when you have a lot of behavior on your objects it’s helpful to have it all in the same place and under one roof that is the object itself. That way you can chain objects together and let them interact in an expressive way that’s hard to emulate in a procedural coding style.

For an example of this compare the following two code samples implementing the same sequence of interactions.

The OOP version:

if not garage.is_full:
    garage.add(my_car)
    my_car.turn_off()
    garage.close()

vs the non-OOP / procedural version:

if not is_garage_full(garage):
    add_car_to_garage(my_car, garage)
    turn_off_car(my_car)
    close_garage(garage)
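To make the OOP version runnable, here’s a rough sketch of what the Garage and Car classes behind it might look like. The snippet above only implies is_full, add, turn_off, and close; everything else (the capacity parameter, the engine_running and door_open attributes) is made up for illustration:

```python
class Car:
    def __init__(self):
        # Hypothetical internal state
        self.engine_running = True

    def turn_off(self):
        self.engine_running = False


class Garage:
    def __init__(self, capacity=1):
        # capacity, cars, and door_open are assumptions for this sketch
        self.capacity = capacity
        self.cars = []
        self.door_open = True

    @property
    def is_full(self):
        return len(self.cars) >= self.capacity

    def add(self, car):
        self.cars.append(car)

    def close(self):
        self.door_open = False


garage = Garage(capacity=1)
my_car = Car()

# Callers change the objects' state without knowing
# how that state is stored internally.
if not garage.is_full:
    garage.add(my_car)
    my_car.turn_off()
    garage.close()
```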

4. Custom exceptions

If you want to define custom exception types in Python there isn’t really a way around using classes. Custom exceptions help communicate the programmer’s intent more clearly and they’re a great debugging aid.

Here’s an example of a more specific exception type that’s based on one of Python’s built-in exceptions:

class ValueTooSmallError(ValueError):
    pass
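Here’s a small sketch of how such an exception might be raised and handled. The check_threshold function and its minimum parameter are hypothetical names I made up for this example:

```python
class ValueTooSmallError(ValueError):
    pass


def check_threshold(value, minimum=10):
    # Hypothetical validation helper, for illustration only
    if value < minimum:
        raise ValueTooSmallError(
            'expected at least {}, got {}'.format(minimum, value))
    return value


message = None
try:
    check_threshold(3)
except ValueTooSmallError as err:
    # Callers can handle this specific problem by name...
    message = str(err)
    print('Value too small:', message)

# ...while code that catches ValueError keeps working,
# because ValueTooSmallError is a subclass of it.
```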

I’ve recorded a quick screencast tutorial that shows you how to use this technique to make your code more readable:

5. Fitting in with the coding style used by your colleagues

Another good reason for learning about OOP is that it is still a popular paradigm in the programming community. Therefore it is likely that you’ll be working with an object-oriented code base at some point in your career.

It can be difficult to integrate and maintain code in a single code base that was written in a different style, let’s say functional vs object-oriented.

If you’d rather avoid being the one engineer sticking out with an idiosyncratic coding style then it might be worth familiarizing yourself with OOP and classes.

6. Useful OOP design patterns

There’s a large body of OOP design patterns out there that can speed up development and improve readability. Design patterns are not a panacea and as with all things they can be overused – but in some situations being able to apply common design patterns will be helpful.

I’m going to shamelessly plug one of my open-source Python modules, schedule, as an example here. It uses the Builder pattern to let you schedule periodic jobs with a developer-friendly API:

import schedule

def job():
    print("I'm working...")

schedule.every(10).minutes.do(job)
schedule.every().hour.do(job)
schedule.every().day.at("10:30").do(job)

Parsing ISO 8601 timestamps in plain Django

Mon, 11 Jul 2016 00:00:00 GMT

“How do I parse an ISO 8601 formatted date in Django without bringing in extra dependencies?”

If you do any web development with Python and Django then you’ll inevitably find yourself wanting to parse ISO 8601 timestamps into Python’s native datetime.datetime objects at some point. In other words, given a timestamp string like '2016-12-11T09:27:24.895' we want to convert it into a proper Python datetime object for further processing.

If you search Google on how to do this you’ll often find people recommend the 3rd party python-dateutil module. Python-dateutil is a great choice – but in some cases it does more than you really need.

If you’re already using Django you can parse ISO 8601 timestamps without bringing in another dependency using django.utils.dateparse.parse_datetime.

Here’s how:

from datetime import datetime

from django.utils.dateparse import parse_datetime

parsed = parse_datetime('2001-12-11T09:27:24.895551')
assert parsed == datetime(2001, 12, 11, 9, 27, 24, 895551)

Note that if you pass a malformed value to parse_datetime it can return None (when the string isn’t well formatted) or throw a KeyError, ValueError, or TypeError exception, so you might want to be ready to handle both cases.

Importantly, parse_datetime also understands timezone-aware timestamps and correctly sets the UTC offset on the resulting datetime object:

import datetime

from django.utils.dateparse import parse_datetime
from django.utils.timezone import is_aware

# datetime.timezone.utc is the standard library's UTC tzinfo
expected = datetime.datetime(
    2016, 4, 27, 11, 18, 42, 303886, tzinfo=datetime.timezone.utc)
parsed = parse_datetime('2016-04-27T11:18:42.303886+00:00')

assert parsed == expected
assert is_aware(parsed)

Catching bogus Python asserts on CI

Wed, 30 Mar 2016 00:00:00 GMT

It’s easy to accidentally write Python assert statements that always evaluate to true. Here’s how to avoid this mistake and catch bad assertions as part of your continuous integration build.

Asserts that are always true

There’s an easy mistake to make with Python’s assert:

When you pass it a tuple as the first argument, the assertion always evaluates as true and therefore never fails.

To give you a simple example, this assertion will never fail:

assert(1 == 2, 'This should fail')

Especially for developers new to Python this can be a surprising result.

Let’s take a quick look at the syntax for Python’s assert statement to find out why this assertion is bogus and will never fail.

Here’s the syntax for assert from the Python docs:

assert_stmt ::=  "assert" expression1 ["," expression2]

expression1 is the condition we test, and the optional expression2 is an error message that’s displayed if the assertion fails.

At execution time, the Python interpreter transforms each assert statement into the following:

if __debug__:
    if not expression1:
        raise AssertionError(expression2)

Let’s take the broken example assertion and apply the transform.

assert(1 == 2, 'This should fail')

becomes the following:

if __debug__:
    if not (1 == 2, 'This should fail'):
        raise AssertionError()

Now we can see where things go wrong.

Because assert is a statement and not a function call, the parentheses lead to expression1 containing the whole tuple (1 == 2, 'This should fail').

Non-empty tuples are always truthy in Python and therefore the assertion will always evaluate to true, which is maybe not what we expected.

This behavior can make writing multi-line asserts error-prone. Imagine we have the following assert statement somewhere in our test code:

assert (
    counter == 10,
    'It should have counted all the items'
)

This test case would never catch an incorrect result. The assertion always evaluates to True regardless of the state of the counter variable.
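Here’s a small sketch that demonstrates the problem and one way to write the multi-line assert safely. The counter value of 7 is made up so that the check should fail:

```python
counter = 7  # made-up value; the check below *should* fail

# This is what the parenthesized assert actually tests:
# a two-element tuple, which is truthy no matter what it contains.
condition = (counter == 10, 'It should have counted all the items')
print(bool(condition))  # True

# A safe multi-line form: keep the condition outside the
# parentheses and only wrap the message.
failure_message = None
try:
    assert counter == 10, (
        'It should have counted all the items'
    )
except AssertionError as err:
    failure_message = str(err)

print(failure_message)
```

With the condition outside the parentheses the assertion fails as it should and reports the message.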

Pytest encourages you to use plain assert statements in unit tests instead of the assertEqual, assertTrue, …, assertXYZ methods provided by the unittest module in the standard library.

It’s relatively easy to accidentally write bad multi-line asserts this way. They can lead to broken test cases that give a false sense of security in our test code.

Why isn’t this a warning in Python?

Well, it actually is a syntax warning in Python 2.6+:

>>> assert (1==2, 'This should fail')
<input>:2: SyntaxWarning: assertion is always true, perhaps remove parentheses?

The trouble is that when you use the py.test test runner, these warnings are hidden:

$ cat test.py
def test_foo():
    assert (1==2, 'This should fail')

$ py.test -v test.py
======= test session starts =======
platform darwin -- Python 3.5.1, pytest-2.9.0,
py-1.4.31, pluggy-0.3.1
rootdir: /Users/daniel/dev/, inifile: pytest.ini
collected 1 items

test.py::test_foo PASSED

======= 1 passed in 0.03 seconds =======

This makes it easy to write a broken assert in a test case. The assertion will always pass and it’s hard to notice that anything is wrong because py.test hides the syntax warning.

The solution: Code linting

Luckily, the “bogus assert” issue is the kind of problem that can be easily caught with a good code linter.

pyflakes is an excellent Python linter that strikes a nice balance between helping you catch bugs and avoiding false positives. I highly recommend it.

Starting with pyflakes 1.1.0 asserts against a tuple cause a warning, which will help you find bad assert statements in your code.

I also recommend running the linter as part of the continuous integration build. The build should fail if the program isn’t lint free. This helps avoid issues when developers forget to run the linter locally before committing their code.

Besides going with code linting as a purely technical solution, it’s also a good idea to adopt the following technique when writing tests:

When writing a new unit test, always make sure the test actually fails when the code under test is broken or delivers the wrong result.

The interim solution

If you’re not using pyflakes but still want to be informed of faulty asserts you can use the following grep command as part of your build process:

(egrep 'assert *\(' --include '*.py' --recursive my_app/ || exit 0 && exit 1;)

This will fail the build if there’s an assert statement followed by an opening parenthesis. This is obviously not perfect and you should use pyflakes, but in a pinch it’s better than nothing.
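If the grep check produces too many false positives (it also flags valid statements like assert (a == b)), a slightly more precise check can be sketched with Python’s ast module. This is not how pyflakes implements its warning, just a minimal illustration of the idea; find_tuple_asserts is a name I made up:

```python
import ast


def find_tuple_asserts(source, filename='<string>'):
    """Return the line numbers of asserts that test a non-empty tuple."""
    tree = ast.parse(source, filename=filename)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Assert)
        and isinstance(node.test, ast.Tuple)
        and node.test.elts
    )


sample = (
    "def test_foo():\n"
    "    assert (1 == 2, 'This should fail')\n"
    "    assert (1 == 1), 'parenthesized condition, but fine'\n"
)

bad_lines = find_tuple_asserts(sample)
print(bad_lines)  # [2]
```

In a build script you would read each .py file, run it through a function like this, and exit with a non-zero status if any line numbers come back.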

(Go with pyflakes if you can! 😃)

Generate and host your API documentation for free with open-source tools

Thu, 17 Mar 2016 00:00:00 GMT

How to generate documentation for a RESTful API as part of your continuous integration build and then automatically deploy it to a website. Includes a full example project on GitHub.

For a recent project I worked on a distributed system that communicated internally through RESTful APIs. The implementation work was spread across different teams working in several timezones.

Early on we identified integrating the different systems as one of the likely challenges for the project. To reduce the communication risk, we decided to follow a shared set of API design guidelines across all systems.

The goal was to have up-to-date API documentation for each system available at all times. We used a combination of tools and automated workflows that’s been working out well for the project.

In this article I’d like to share the approach we used for auto-generating our API documentation. We used the apiDoc documentation generator, automated builds for the documentation on CI, and automated deploys of the generated documentation to a website.

Tl;dr: Here’s what we’re going to build in this tutorial: apidoc-example.surge.sh

Intro to apiDoc
- Setting up apiDoc
Hosting your API docs
- Deploying to Surge
Setting up auto-deploys
A full, live example on GitHub
- apidoc-example.surge.sh
- github.com/dbader/apidoc-example

1. Intro to apiDoc

apiDoc is a command-line tool for generating API documentation directly from annotations in the source code of your app. Its syntax is similar to JavaDoc and relatively easy to pick up.

apiDoc works with most popular programming languages, which means you can use the same annotation syntax across multiple projects in a polyglot environment.

To give you a taste of apiDoc’s syntax, here are some example annotations describing a simple endpoint on a RESTful API. I’m using Python as an example here but things would look similar for JavaScript or Ruby:

@app.route('/api/v1/random/', method=['OPTIONS', 'GET'])
def get_random_number():
    """
    @api {GET} /v1/random/ Generate a random number
    @apiName GetRandomNumber
    @apiGroup Random

    @apiDescription Generates a random number in the range `[0.0, 1.0)`.

    @apiSuccess (Success 200) {UUID}   request_id Unique id for the request
    @apiSuccess (Success 200) {Number} result     Random number in `[0.0, 1.0)`

    @apiSampleRequest /v1/random/

    @apiExample cURL example
    $ curl https://apidoc-example.herokuapp.com/api/v1/random/

    @apiSuccessExample {js} Success-Response:
        HTTP/1.0 200 OK
        {
            "request_id": "ad506913-a073-4d23-9f95-388d1c1e2c46",
            "result": 0.3606252123151169
        }
    """
    # ...

apiDoc parses out these annotations and then generates a static website with API documentation meant for people to read. Here’s an example of what that website export looks like:

The standard website template that comes with apiDoc looks fairly clean and organized. You can fully customize the look and feel, for example, to use your company branding.

The sites generated by apiDoc also include fancy features like the ability to send a sample request to your API directly from the website and inspect the returned result. Here’s what that looks like¹:

How to stop Django Rest Framework from leaking docstrings into OPTIONS responses

Thu, 03 Mar 2016 00:00:00 GMT

When you make an HTTP OPTIONS request against an endpoint in a Django Rest Framework app you might be surprised about what you’ll find in the response to that request.

In its default configuration Rest Framework returns a bunch of metadata that you might not want to return as part of the response. Here’s an example:

$ http OPTIONS localhost:8000/api/v1/test/

HTTP/1.0 200 OK
Allow: POST, OPTIONS
Content-Type: application/json
Date: Tue, 02 Mar 2016 8:23:00 GMT
Server: WSGIServer/0.2 CPython/3.5.1
Vary: Cookie

{
    "description": "This is the docstring of the view handling the
        request\nThis might contain information you don't want to leak
        out in an OPTIONS request.\n",
    "name": "Test Endpoint",
    "parses": [
        "application/x-www-form-urlencoded",
        "multipart/form-data",
        "application/json"
    ],
    "renders": [
        "application/json"
    ]
}

As you can see, by default the response includes the full docstring for the view as part of the description field. If that’s not what you want you can configure the metadata returned by Django Rest Framework through the metadata scheme mechanism.

Here’s a null metadata scheme that configures OPTIONS responses to be empty:

from rest_framework.metadata import BaseMetadata

class NoMetaData(BaseMetadata):
    def determine_metadata(self, request, view):
        return None

To set that metadata class globally we can use the DEFAULT_METADATA_CLASS setting in Rest Framework:

REST_FRAMEWORK = {
    'DEFAULT_METADATA_CLASS': 'yourapp.metadata.NoMetaData'
}

When we make the same OPTIONS request now we get the empty response we wanted:

$ http OPTIONS localhost:8000/api/v1/test/

HTTP/1.0 200 OK
Allow: POST, OPTIONS
Content-Type: application/json
Date: Tue, 02 Mar 2016 8:42:00 GMT
Server: WSGIServer/0.2 CPython/3.5.1
Vary: Cookie

Debugging memory usage in a live Python web app

Tue, 09 Feb 2016 00:00:00 GMT

I worked on a Python web app a while ago that was struggling with using too much memory in production. A helpful technique for debugging this issue was adding a simple API endpoint that exposed memory stats while the app was running.

Enter Pympler

There’s a great module called Pympler for debugging memory stats in CPython. It walks your process heap and reports the object types, number of objects, and their size in bytes for all allocated Python objects.

The following function generates a memory summary using Pympler and returns it as a string:

def memory_summary():
    # Only import Pympler when we need it. We don't want it to
    # affect our process if we never call memory_summary.
    from pympler import summary, muppy
    mem_summary = summary.summarize(muppy.get_objects())
    rows = summary.format_(mem_summary)
    return '\n'.join(rows)

Let’s plug this into an example app that allocates some memory and then calls memory_summary:

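"""
Don't forget to $ pip install pympler.
"""

def memory_summary():
    # Only import Pympler when we need it. We don't want it to
    # affect our process if we never call memory_summary.
    from pympler import summary, muppy
    mem_summary = summary.summarize(muppy.get_objects())
    rows = summary.format_(mem_summary)
    return '\n'.join(rows)

# Allocate some memory: one large string plus many small objects
my_str = 'a' * 2**26

class MyObject(object):
    def __init__(self):
        self.memory = str(id(self)) * 2**10

my_objs = [MyObject() for _ in range(2**16)]

print(memory_summary())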

Running this example will result in a printout like the one below, which should give you a rough idea which objects are taking up the most space in your app:

                       types |   # objects |   total size
============================ | =========== | ============
                         str |        6727 |     64.61 MB
   <class '__main__.MyObject |       65536 |      4.00 MB
                        dict |         596 |    950.84 KB
                        list |         251 |    601.54 KB
                        code |        1872 |    234.00 KB
          wrapper_descriptor |        1094 |     85.47 KB
                        type |          96 |     85.45 KB
  builtin_function_or_method |         726 |     51.05 KB
           method_descriptor |         586 |     41.20 KB
                         set |         135 |     36.59 KB
                     weakref |         386 |     33.17 KB
                       tuple |         384 |     28.27 KB
            _sre.SRE_Pattern |          42 |     19.31 KB
         <class 'abc.ABCMeta |          20 |     17.66 KB
           member_descriptor |         231 |     16.24 KB

For example, we see that the str objects we allocated take up the biggest chunk of memory at around 65 MB. And as expected, there are also 2^16 = 65536 MyObject instances, taking up 4 MB of space in total.

But how can we access this information in a production web app?

I ended up just exposing the output of memory_summary() as a /debug/memory plaintext endpoint secured with HTTP basic auth. This allowed us to access the allocation stats for the app while it was running in production.
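
As a rough illustration of such an endpoint, here is a minimal sketch written as a plain WSGI app so that it does not depend on any particular web framework. The names make_debug_app and summary_func are made up for this example, and the default memory_summary here is only a standard-library stand-in (it counts live objects per type with gc and reports no sizes); in a real app you would pass in the Pympler-based function from above.

```python
import base64
import gc
import hmac
from collections import Counter


def memory_summary(limit=10):
    # Stand-in for the Pympler-based function: counts live, gc-tracked
    # objects per type using only the standard library (no sizes).
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    rows = ['%30s | %10d' % (name, n) for name, n in counts.most_common(limit)]
    return '\n'.join(rows)


def make_debug_app(username, password, summary_func=memory_summary):
    # Hypothetical helper: builds a WSGI app that serves /debug/memory
    # as plain text, protected by HTTP basic auth.
    credentials = ('%s:%s' % (username, password)).encode('utf-8')
    expected = b'Basic ' + base64.b64encode(credentials)

    def app(environ, start_response):
        if environ.get('PATH_INFO') != '/debug/memory':
            start_response('404 Not Found', [('Content-Type', 'text/plain')])
            return [b'not found\n']

        supplied = environ.get('HTTP_AUTHORIZATION', '').encode('utf-8')
        # compare_digest avoids leaking information through timing.
        if not hmac.compare_digest(supplied, expected):
            start_response('401 Unauthorized', [
                ('Content-Type', 'text/plain'),
                ('WWW-Authenticate', 'Basic realm="debug"'),
            ])
            return [b'unauthorized\n']

        body = summary_func().encode('utf-8')
        start_response('200 OK', [('Content-Type', 'text/plain; charset=utf-8')])
        return [body]

    return app
```

To try it locally you could serve the app with wsgiref.simple_server.make_server from the standard library. Basic auth sends the password with every request, so an endpoint like this should only be reachable over HTTPS.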

A more advanced way to track these stats in a production web app would be to feed them into a service like DataDog to plot and track them over time. However, in many cases a simple solution like printing the stats to the application log might suffice.

Please also note that these stats are per interpreter process. If you’re running your web app as multiple CPython processes behind a load balancer (like you should) then you’ve got to be sure to take that into account when making sense of these memory stats.

Still, I found that just getting a rough sample of which objects are taking up the most space gave me a better idea of the memory usage pattern of the app and helped reduce memory consumption with some follow-up work.

How to store photos in the cloud and avoid vendor lock-in

Sat, 28 Nov 2015 00:00:00 GMT

I’ve been burned by relying 100% on a cloud service before. Some time ago a photographer friend convinced me to sign up for an awesome photo storage service called Everpix.

My internet connection ran red hot for a couple of days and nights until my whole photo library was finally transferred to Everpix’s cloud. And I loved the service. It was fast and had great UX. Finally, a cloud photo storage solution that worked well for me.

It was simply a joy to use.

At least for a few days—then they sent me an email telling me they ran out of money and had to shut down the company.

There was a grace period where Everpix let you download your photos in their original quality as a giant zip archive. So at least people didn’t lose any data if they acted quickly enough.

A few weeks later Everpix was finally gone and I felt frustrated¹.

I had really enjoyed being able to access all of my photos from any device I owned. I had liked the fact that I didn’t have to worry about manual backups as much.

I decided that I wasn’t going to be tied to a single cloud service ever again and set out to build my own photo storage solution. It’s not as fancy as Everpix was but it gets the job done and feels much more future-proof.

Let me give you a quick overview of how it works.

One folder structure to rule them all

Instead of using a proprietary storage format like Apple’s Photos.app or Everpix, all of my photos simply go into a nested folder structure based on their timestamp.

I give each photo a path and filename based on the time it was taken and then I sort it into the following folder structure:

├── 2014
│   ├── 2014-01
│   │   ├── 2014-01-05 13.24.45.jpg
│   │   ├── 2014-01-05 21.28.48.jpg
│   │   ├── 2014-01-05 21.28.48-1.jpg
│   │   ├── 2014-01-06 21.14.38.jpg
│   ├── 2014-02
│   │   ├── ...
│   ├── ...
│   └── 2014-12
├── 2015
│   ├── 2015-01
│   ├── 2015-02
│   ├── ...
│   └── 2015-12
├── ...

This is a dead simple scheme that I’ll be able to keep using as long as there are hierarchical file systems. And the good news is that all of this sorting and structuring can happen automatically based on EXIF timestamps or file creation dates.

I found that a simple folder structure is a perfect fit for my photo storage needs. I sometimes create “albums” by moving some photos into a separate folder, for example:

├── 2015
│   ├── 2015-01
│   ├── 2015-02
│   ├── 2015-02 My Album
│   ├── 2015-03
├── ...

This lets me keep the year-month sort order in the yearly folders and provides enough structure to find important events quickly. Occasionally I also create “virtual” albums in Carousel to share with friends and family, but more on that in a minute.

Dropbox & Photosorter

The setup I use now is built around Dropbox for cloud storage and my open-source photosorter tool. The complete workflow is fully automated and looks like this:

New photos go into the Camera Uploads folder on my Dropbox. This either happens by me manually copying them off an SD card into the Camera Uploads folder or the Dropbox iOS app automatically uploads new photos when my phone has a Wi-Fi connection.
Photosorter runs on my home server and watches Camera Uploads for new photos. It then takes them and moves them into the appropriate place in my Photos folder, which also lives in my Dropbox. Photosorter detects and ignores duplicates through their SHA1 hash. Different photos taken in the same instant are kept apart by adding a suffix (-1, -2, etc.) to the filename.
Dropbox picks up the new files in my Photos folder and distributes them to all my devices. Once the photos are in Dropbox I can also access them from anywhere using the Dropbox website.
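
The sorting step in the middle can be sketched in a few lines of Python. This is not photosorter's actual code, just a minimal illustration of the scheme described above; the function names sha1_of and sort_photo are made up, and the capture time is passed in by the caller (reading it from EXIF data or the file creation date is left out).

```python
import hashlib
import shutil
from datetime import datetime
from pathlib import Path


def sha1_of(path):
    """Return the SHA1 hex digest of a file's contents."""
    digest = hashlib.sha1()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            digest.update(chunk)
    return digest.hexdigest()


def sort_photo(src, photos_root, taken_at):
    """Move src into photos_root/YYYY/YYYY-MM/ and return the new path.

    Returns None and leaves src untouched if an identical file
    (same SHA1 hash) already exists at the destination.
    """
    src = Path(src)
    folder = (Path(photos_root)
              / taken_at.strftime('%Y')
              / taken_at.strftime('%Y-%m'))
    folder.mkdir(parents=True, exist_ok=True)

    stem = taken_at.strftime('%Y-%m-%d %H.%M.%S')
    extension = src.suffix.lower()
    src_hash = sha1_of(src)

    counter = 0
    while True:
        # First try the plain name, then "-1", "-2", ...
        name = stem + ('-%d' % counter if counter else '') + extension
        dest = folder / name
        if not dest.exists():
            shutil.move(str(src), str(dest))
            return dest
        if sha1_of(dest) == src_hash:
            return None  # exact duplicate, ignore it
        counter += 1
```

Calling sort_photo('IMG_0001.JPG', 'Photos', datetime(2014, 1, 5, 21, 28, 48)) would move the file to Photos/2014/2014-01/2014-01-05 21.28.48.jpg, matching the folder structure shown earlier.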

This setup has the nice side-effect that I have a physical backup of my photos in several places, like my home server and my Mac. This works because my photo library is only about 100 GB in size. For a larger library I’m either going to just buy more storage or keep a full backup on my home server and disable syncing on my Mac.

If you want to give photosorter a try there’s documentation and a deployment example on its GitHub page.

Carousel

Update: Dropbox is going to kill Carousel, meh.

Since I wrote this article Dropbox announced that they will shut down Carousel on March 31st, 2016. This is a bit of a bummer because parts of the workflow I’m describing here worked really well with Carousel.

However, they said that they’ll port most of Carousel’s functionality back into the Dropbox app and website. That’s fine by me and would work well with my photos workflow. I don’t really care which app I need to launch to look at my photos (that’s also kind of the whole point of this article). Once Carousel’s gone I’ll update the article with new recommendations for tools.

Carousel is Dropbox’s new product for managing photos in your Dropbox account. I really like the Carousel app and website. It’s a super convenient way to browse through my photos from anywhere I want. I also frequently use it to share photos with friends and family by creating ad-hoc albums on Carousel.

Their iOS app lets me access all of my photos while not taking up much space on my phone. This is thanks to Carousel’s smart caching system that only keeps high-quality versions of photos you viewed recently on your phone. It’s similar to iCloud photos on iOS 9, works well and usually requires zero babysitting.

Carousel also has a cool flashbacks feature that shows you photos that you took in the same week one or more years ago. Everpix had that too and it’s a neat way to enjoy older photos from my library.

Like I said before I also use the iOS app to automatically upload new photos from my iPhone when I’m on Wi-Fi. This pretty much guarantees that I won’t lose photos while I’m traveling. It also helps keep enough free space on my phone so I can continue taking photos.

The future

I’m currently running photosorter on my home server. At some point I might replace it with a virtual machine on S3 or Digital Ocean which will provide cheaper storage and better fault tolerance. I don’t really trust that little Toshiba notebook drive spinning 24/7.

This setup has served me well over the past two years. Obviously setting this up is more involved than just using a turnkey solution. But I also feel like it’s more future proof than using an off-the-shelf service like Apple’s iCloud Photo Library or Google Photos.

I’ve been burned by Everpix’s sudden disappearance and if Dropbox goes away I’ll just use a different filesystem-based sync service like BitTorrent Sync. If you’re worried about privacy then running your own photo storage solution might be appealing, too.

¹ I think I’m going to feel frustrated pretty soon again when Rdio shuts down…

OS X notifications for your pytest runs

Mon, 18 May 2015 00:00:00 GMT

This article shows you how to use pytest-osxnotify, a plugin for pytest that adds native Mac OS X notifications to the pytest terminal runner.

pytest + OS X notifications = happy developers

pytest-osxnotify is a plugin for the pytest testing tool. It adds OS X notifications to your test runs so you know when a test run completes and whether it failed or succeeded without looking at your terminal window.

This is especially useful when you re-run your tests automatically every time a source file is modified.

A quick example

Installing pytest-osxnotify is easy. Let’s set up a simple example that shows you how to use pytest so that it watches your source files for modifications and re-runs the tests as necessary.

We start by installing pytest, pytest-xdist and pytest-osxnotify¹.

$ pip install pytest pytest-xdist pytest-osxnotify

Let’s also create a simple test file for us to run. Save the following as example_test.py in the current folder.

def test_example1():
    assert True

def test_example2():
    assert True

def test_example3():
    assert True

Now we start the pytest watcher that monitors our source file for modifications and re-runs the tests when necessary.

$ py.test -f example_test.py

That’s it. We can now move our terminal to the background and hack away in our favourite editor knowing that we’ll stay informed about the results of our test runs.

¹ You’ll typically want to install your dependencies into a Python virtualenv so that they don’t pollute your system install. Look here for a good tutorial on using virtualenv.

Software engineer reading list: My favourite books about programming

Sat, 24 May 2014 00:00:00 GMT

Reading books is one of the best ways to improve your craftsmanship and to become a better software developer. This is a continuously updated list of my favourite programming books, sorted by topic. I link to the ebook version where possible, but most books should be available in dead-tree form as well.

Architecture & System Design

How to build reliable software that works well.

Release It! by Michael T. Nygard
The Architecture of Open Source Applications by Amy Brown
The Architecture of Open Source Applications, Volume II by Amy Brown
The Performance of Open Source Applications by Tavish Armstrong

Craftsmanship

Books about best practices, code quality and professionalism. Every single one of these books is fantastic and I got so much out of them. If you don’t know which area to focus on first then start here.

Clean Code by Robert C. Martin
Team Geek by Brian W. Fitzpatrick
The Clean Coder by Robert C. Martin
The Passionate Programmer by Chad Fowler
The Zen Programmer by Christian Grobmeier
HBR’s 10 Must Reads on Managing Yourself
Better: A surgeon’s notes on performance by Atul Gawande
Code Complete by Steve McConnell

Programming Languages

Books about specific programming languages that I enjoyed. There are often free resources available online but sometimes it’s nice to just buy a book that takes you through many aspects of a language. Some of these books are great reads even if you’re not interested in the language specifically, as they teach you important universal concepts.

Haskell

Learn You a Haskell for Great Good! by Miran Lipovača
Parallel and Concurrent Programming in Haskell by Simon Marlow

JavaScript

Effective JavaScript by David Herman
JavaScript: The Good Parts by Douglas Crockford

Python

Writing Idiomatic Python by Jeff Knupp
Effective Python by Brett Slatkin
Python Cookbook, 3rd Ed. by David Beazley and Brian Jones
Two Scoops of Django by Daniel and Audrey Roy Greenfeld
Fluent Python by Luciano Ramalho
Automate the Boring Stuff with Python by Al Sweigart

Scala

Programming in Scala by Martin Odersky

Interviews & Hiring

These books work both ways. If you’re trying to be hired as an engineer or hiring others then you can learn a lot from them.

Elements of Programming Interviews (Python Ed.) by Aziz, Lee and Prakash
Cracking the Coding Interview by Gayle Laakmann McDowell
Programming Interviews Exposed by John Mongan

Leadership & Managing developers

These are useful even if you’re not in a leadership position. They’ll help you understand your manager better and will make you a more effective communicator.

Managing Humans by Michael Lopp
Leading Snowflakes by Oren Ellenbogen
How to Win Friends & Influence People by Dale Carnegie
It’s Not All About Me by Robin Dreeke

CompSci fundamentals, algorithms, and math

This stuff is important. Languages and frameworks come and go but the foundations remain largely static. Re-visit these every once in a while.

The Algorithm Design Manual by Steven S. Skiena
Algorithms by Dasgupta, Papadimitriou, and Vazirani
Introduction to Algorithms by Thomas H. Cormen
Concrete Mathematics by Ronald L. Graham

Postmortems

The best software engineering war stories around. I get inspired by reading about successful or failed software projects that others have worked on. These books let you learn from the experiences and careers of some of the best people in the field.

Coders at Work by Peter Seibel
FoxTales by Kerry Nietz
Masters of Doom by David Kushner
Postmortems from Game Developer by Austin Grossman
Showstopper by G. Pascal Zachary
The Future Was Here: The Commodore Amiga by Jimmy Maher
The Making of Karateka by Jordan Mechner
The Making of Prince of Persia by Jordan Mechner

Writing

Being able to communicate succinctly in writing is often more important than raw technical ability, especially if you want to convince others. These books have helped me structure my thinking and improve my English. If English is your second language, like it is for me, this is an area you should focus on.

On Writing Well by William Zinsser
Oxford Guide to Plain English by Martin Cutts
Writing for Computer Science by Justin Zobel

Abstract Base Classes in Python

Mon, 21 Oct 2013 00:00:00 GMT

Abstract Base Classes (ABCs) ensure that derived classes implement particular methods from the base class. In this tutorial you’ll learn about the benefits of abstract base classes and how to define them with Python’s built-in abc module.

What the ABC?

So what are Abstract Base Classes good for? A while ago I had a discussion at work about which pattern to use for implementing a maintainable class hierarchy in Python. More specifically, the goal was to define a simple class hierarchy for a service backend in the most programmer-friendly and maintainable way.

We had a BaseService class that defined a common interface and several concrete implementations. The concrete implementations do different things but all of them provide the same interface (MockService, RealService, and so on). To make this relationship explicit, the concrete implementations all subclass BaseService.

To make this code as maintainable and programmer-friendly as possible we wanted to make sure that:

instantiating the base class is impossible; and
forgetting to implement interface methods in one of the subclasses raises an error as early as possible.

When to Use Python’s `abc` Module

Now why would you want to use Python’s abc module to solve this problem? The above design is pretty common in more complex systems. To enforce that a derived class implements a number of methods from the base class, something like this Python idiom is typically used:

class Base:
    def foo(self):
        raise NotImplementedError()

    def bar(self):
        raise NotImplementedError()

class Concrete(Base):
    def foo(self):
        return 'foo() called'

    # Oh no, we forgot to override bar()...
    # def bar(self):
    #     return "bar() called"

So, what do we get from this first attempt at solving the problem? Calling methods on an instance of Base correctly raises NotImplementedError exceptions:

>>> b = Base()
>>> b.foo()
NotImplementedError

Furthermore, instantiating and using Concrete works as expected. And, if we call an unimplemented method like bar() on it, this also raises an exception:

>>> c = Concrete()
>>> c.foo()
'foo() called'
>>> c.bar()
NotImplementedError

This first implementation is decent, but it isn’t perfect yet. The downsides here are that we can still:

instantiate Base just fine without getting an error; and
provide incomplete subclasses—instantiating Concrete will not raise an error until we call the missing method bar().

With Python’s abc module that was added in Python 2.6, we can do better and solve these remaining issues. Here’s an updated implementation using an Abstract Base Class defined with the abc module:

from abc import ABCMeta, abstractmethod

class Base(metaclass=ABCMeta):
    @abstractmethod
    def foo(self):
        pass

    @abstractmethod
    def bar(self):
        pass

class Concrete(Base):
    def foo(self):
        pass

    # We forget to declare bar() again...

This still behaves as expected and creates the correct class hierarchy:

assert issubclass(Concrete, Base)

Yet, we do get another very useful benefit here. Subclasses of Base raise a TypeError at instantiation time whenever we forget to implement any abstract methods. The raised exception tells us which method or methods we’re missing:

>>> c = Concrete()
TypeError:
"Can't instantiate abstract class Concrete \
with abstract methods bar"

Without abc, we’d only get a NotImplementedError if a missing method was actually called. Being notified about missing methods at instantiation time is a great advantage. It makes it more difficult to write invalid subclasses. This might not be a big deal if you’re writing new code, but a few weeks or months down the line, I promise it’ll be helpful.
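
To tie this back to the service hierarchy from the beginning of the article, here is a minimal sketch. BaseService and MockService are the names mentioned above; the fetch method, the BrokenService class, and the try_instantiate helper are made up for illustration. The sketch uses abc.ABC, a convenience base class that has ABCMeta as its metaclass (available since Python 3.4).

```python
from abc import ABC, abstractmethod


class BaseService(ABC):
    @abstractmethod
    def fetch(self, key):
        """Return the value stored under key (hypothetical interface)."""


class MockService(BaseService):
    def __init__(self, data):
        self._data = dict(data)

    def fetch(self, key):
        return self._data[key]


class BrokenService(BaseService):
    # fetch() is not implemented, so this class stays abstract.
    pass


def try_instantiate(cls, *args):
    """Return an instance of cls, or the TypeError raised while creating it."""
    try:
        return cls(*args)
    except TypeError as exc:
        return exc


print(MockService({'a': 1}).fetch('a'))               # 1
print(type(try_instantiate(BaseService)).__name__)    # TypeError
print(type(try_instantiate(BrokenService)).__name__)  # TypeError
```

Both goals from the list above are met: the base class cannot be instantiated, and the incomplete subclass fails as soon as someone tries to create an instance rather than when the missing method is first called.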

This pattern is not a full replacement for compile-time type checking, of course. However, I found it often makes my class hierarchies more robust and more readily maintainable. Using ABCs states the programmer’s intent clearly and thus makes the code more communicative. I’d encourage you to read the abc module documentation and to keep an eye out for situations where applying this pattern makes sense.

Functional linked lists in Python

Mon, 05 Aug 2013 00:00:00 GMT

Linked lists are fundamental data structures that every programmer should know. This article explains how to implement a simple linked list data type in Python using a functional programming style.

Inspiration

The excellent book Programming in Scala inspired me to play with functional programming concepts in Python. I ended up implementing a basic linked list data structure using a Lisp-like functional style that I want to share with you.

I wrote most of this using Pythonista on my iPad. Pythonista is a Python IDE-slash-scratchpad and surprisingly fun to work with. It’s great when you’re stuck without a laptop and want to explore some CS fundamentals :)

So without further ado, let’s dig into the implementation.

Constructing linked lists

Our linked list data structure consists of two fundamental building blocks: Nil and cons. Nil represents the empty list and serves as a sentinel for longer lists. The cons operation extends a list at the front by inserting a new value.

The lists we construct using this method consist of nested 2-tuples. For example, the list [1, 2, 3] is represented by the expression cons(1, cons(2, cons(3, Nil))) which evaluates to the nested tuples (1, (2, (3, Nil))).

Nil = None

def cons(x, xs=Nil):
    return (x, xs)

assert cons(0) == (0, Nil)
assert cons(0, (1, (2, Nil))) == (0, (1, (2, Nil)))

Why should we use this structure?

First, the cons operation is deeply rooted in the history of functional programming. From Lisp’s cons cells to ML’s and Scala’s :: operator, cons is everywhere – you can even use it as a verb.

Second, tuples are a convenient way to define simple data structures. For something as simple as our list building blocks, we don’t necessarily have to define a proper class. Also, it keeps this introduction short and sweet.

Third, tuples are immutable in Python which means their state cannot be modified after creation. Immutability is often a desired property because it helps you write simpler and more thread-safe code. I like this article by John Carmack where he shares his views on functional programming and immutability.

Abstracting away the tuple construction using the cons function gives us a lot of flexibility on how lists are represented internally as Python objects. For example, instead of using 2-tuples we could store our elements in a chain of anonymous functions with Python’s lambda keyword.

def cons(x, xs=Nil):
    return lambda i: x if i == 0 else xs
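
To make the closure-based representation a little more concrete, here is a minimal sketch of how element access could look with it. The accessors mirror the head, tail, and is_empty operations used throughout this article; to_pylist is a made-up helper for inspecting the result. One trade-off: two closures never compare equal with ==, so the tuple-based assertions in the rest of the article would not work with this representation.

```python
Nil = None


def cons(x, xs=Nil):
    # The "cell" is a function: index 0 yields the value,
    # anything else yields the rest of the list.
    return lambda i: x if i == 0 else xs


def head(xs):
    return xs(0)


def tail(xs):
    return xs(1)


def is_empty(xs):
    return xs is Nil


def to_pylist(xs):
    """Hypothetical helper: copy the elements into a regular Python list."""
    out = []
    while not is_empty(xs):
        out.append(head(xs))
        xs = tail(xs)
    return out


numbers = cons(1, cons(2, cons(3)))
print(to_pylist(numbers))  # [1, 2, 3]
```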

To write simpler tests for more complex list operations we’ll introduce the helper function lst. It allows us to define list instances using a more convenient syntax and without deeply nested cons calls.

def lst(*xs):
    if not xs:
        return Nil
    else:
        return cons(xs[0], lst(*xs[1:]))

assert lst() == Nil
assert lst(1) == (1, Nil)
assert lst(1, 2, 3, 4) == (1, (2, (3, (4, Nil))))

Basic operations

All operations on linked lists can be expressed in terms of the three fundamental operations head, tail, and is_empty.

head returns the first element of a list.
tail returns a list containing all elements except the first.
is_empty returns True if the list contains zero elements.

You’ll see later that these three operations are enough to implement a simple sorting algorithm like insertion sort.

def head(xs):
    return xs[0]

assert head(lst(1, 2, 3)) == 1

def tail(xs):
    return xs[1]

assert tail(lst(1, 2, 3, 4)) == lst(2, 3, 4)

def is_empty(xs):
    return xs is Nil

assert is_empty(Nil)
assert not is_empty(lst(1, 2, 3))

Length and concatenation

The length operation returns the number of elements in a given list. To find the length of a list we need to scan all of its n elements. Therefore this operation has a time complexity of O(n).

def length(xs):
    if is_empty(xs):
        return 0
    else:
        return 1 + length(tail(xs))

assert length(lst(1, 2, 3, 4)) == 4
assert length(Nil) == 0

concat takes two lists as arguments and concatenates them. The result of concat(xs, ys) is a new list that contains all elements in xs followed by all elements in ys. We implement the function with simple recursion over xs: the elements of xs are copied one by one in front of ys, which is reused unchanged.

def concat(xs, ys):
    if is_empty(xs):
        return ys
    else:
        return cons(head(xs), concat(tail(xs), ys))

assert concat(lst(1, 2), lst(3, 4)) == lst(1, 2, 3, 4)

Last, init, and list reversal

The basic operations head and tail have corresponding operations last and init. last returns the last element of a non-empty list and init returns all elements except the last one (the initial elements).

def last(xs):
    if is_empty(tail(xs)):
        return head(xs)
    else:
        return last(tail(xs))

assert last(lst(1, 3, 3, 4)) == 4

def init(xs):
    # A single-element list has no initial elements.
    if is_empty(tail(xs)):
        return Nil
    else:
        return cons(head(xs), init(tail(xs)))

assert init(lst(1)) == Nil
assert init(lst(1, 2, 3, 4)) == lst(1, 2, 3)

Both operations need O(n) time to compute their result. Therefore it’s a good idea to reverse a list if you frequently use last or init to access its elements. The reverse function below implements list reversal, but in a slow way that takes O(n²) time.

def reverse(xs):
    if is_empty(xs):
        return xs
    else:
        return concat(reverse(tail(xs)), cons(head(xs), Nil))

assert reverse(Nil) == Nil
assert reverse(cons(0, Nil)) == (0, Nil)
assert reverse(lst(1, 2, 3, 4)) == lst(4, 3, 2, 1)
assert reverse(reverse(lst(1, 2, 3, 4))) == lst(1, 2, 3, 4)
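
The quadratic cost comes from calling concat once per element. As a possible alternative, here is a sketch of a linear-time reversal that carries the result along in an accumulator argument. The name reverse_acc is made up for this example, and the block repeats the tuple-based cons and lst definitions so it can be run on its own.

```python
Nil = None


def cons(x, xs=Nil):
    return (x, xs)


def lst(*xs):
    return cons(xs[0], lst(*xs[1:])) if xs else Nil


def reverse_acc(xs, acc=Nil):
    # Each step moves the first element of xs to the front of acc,
    # so every element is touched exactly once: O(n) overall.
    if xs is Nil:
        return acc
    first, rest = xs
    return reverse_acc(rest, cons(first, acc))


assert reverse_acc(Nil) == Nil
assert reverse_acc(lst(1, 2, 3, 4)) == lst(4, 3, 2, 1)
```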

Prefixes and suffixes

The following operations take and drop generalize head and tail by returning arbitrary prefixes and suffixes of a list. For example, take(2, xs) returns the first two elements of the list xs whereas drop(3, xs) returns everything except the first three elements in xs.

def take(n, xs):
    if n == 0:
        return Nil
    else:
        return cons(head(xs), take(n-1, tail(xs)))

assert take(2, lst(1, 2, 3, 4)) == lst(1, 2)

def drop(n, xs):
    if n == 0:
        return xs
    else:
        return drop(n-1, tail(xs))

assert drop(1, lst(1, 2, 3)) == lst(2, 3)
assert drop(2, lst(1, 2, 3, 4)) == lst(3, 4)

Element selection

Random element selection on linked lists doesn’t really make sense in terms of time complexity – accessing an element at index n requires O(n) time. However, the element access operation apply is simple to implement using head and drop.

def apply(i, xs):
    return head(drop(i, xs))

assert apply(0, lst(1, 2, 3, 4)) == 1
assert apply(2, lst(1, 2, 3, 4)) == 3

More complex examples

The three basic operations head, tail, and is_empty are all we need to implement a simple (and slow) sorting algorithm like insertion sort.

def insert(x, xs):
    if is_empty(xs) or x <= head(xs):
        return cons(x, xs)
    else:
        return cons(head(xs), insert(x, tail(xs)))

assert insert(0, lst(1, 2, 3, 4)) == lst(0, 1, 2, 3, 4)
assert insert(99, lst(1, 2, 3, 4)) == lst(1, 2, 3, 4, 99)
assert insert(3, lst(1, 2, 4)) == lst(1, 2, 3, 4)

def isort(xs):
    if is_empty(xs):
        return xs
    else:
        return insert(head(xs), isort(tail(xs)))

assert isort(lst(1, 2, 3, 4)) == lst(1, 2, 3, 4)
assert isort(lst(3, 1, 2, 4)) == lst(1, 2, 3, 4)

The following to_string operation flattens the recursive structure of a given list and returns a Python-style string representation of its elements. This is useful for debugging and makes for a nice little programming exercise.

def to_string(xs, prefix="[", sep=", ", postfix="]"):
    def _to_string(xs):
        if is_empty(xs):
            return ""
        elif is_empty(tail(xs)):
            return str(head(xs))
        else:
            return str(head(xs)) + sep + _to_string(tail(xs))
    return prefix + _to_string(xs) + postfix

assert to_string(lst(1, 2, 3, 4)) == "[1, 2, 3, 4]"

Where to go from here

This article is more of a thought experiment than a guide on how to implement a useful linked list in Python. Keep in mind that the above code has severe restrictions and is not fit for real life use. For example, if you use this linked list implementation with larger example lists you’ll quickly hit recursion depth limits (CPython doesn’t optimize tail recursion).
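
One way around the recursion limit is to replace recursion with plain loops. It is less elegant, but it works for lists of any length. The following is a minimal sketch on the same tuple representation; the names from_iterable, length_iter, and to_pylist are made up for this example.

```python
Nil = None


def cons(x, xs=Nil):
    return (x, xs)


def from_iterable(items):
    """Build a cons list from any iterable without recursion."""
    xs = Nil
    # Consing prepends, so walk the items back to front.
    for item in reversed(list(items)):
        xs = cons(item, xs)
    return xs


def length_iter(xs):
    """Count the elements with a loop instead of recursive calls."""
    n = 0
    while xs is not Nil:
        n += 1
        xs = xs[1]
    return n


def to_pylist(xs):
    """Copy the elements into a regular Python list."""
    out = []
    while xs is not Nil:
        out.append(xs[0])
        xs = xs[1]
    return out


big = from_iterable(range(50000))
print(length_iter(big))  # 50000
```

Keep in mind that other operations on such a deeply nested tuple, for example comparing two long lists with == or printing them with repr, can still recurse internally and hit the same limit.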

I spent a few fun hours playing with functional programming concepts in Python and I hope I inspired you to do the same. If you want to explore functional programming in ‘real world’ Python check out the following resources:

Setting up Sublime Text for Python development

Sun, 12 May 2013 00:00:00 GMT

I recently started using Sublime Text 2 more and more as my main editor for Python development. This article explains my setup and some tweaks that make Python programmers happy.

Why Sublime Text?

I’ve been an avid user of TextMate for a long time. It’s light-weight, open-source, and as a native OS X application it feels very Mac-esque. While TextMate is a great editor it seems very bare bones sometimes.

For some projects I used the more beefy IntelliJ IDEA with the Python plug-in. I especially like its debugger and test runner. Yet, often a full-blown IDE like IntelliJ is overkill when working on small to medium-sized projects.

Over the last few weeks I began using Sublime Text more and more. Once I took the time to set it up I felt very much at home. It’s really fast, receives steady updates, and – as a big bonus – fully cross-platform. What finally won me over compared to TextMate was Sublime’s great plug-in ecosystem. There are several plug-ins available that make Python development very smooth and enjoyable.

I’m still switching editors on a per project basis now. But I noticed that for me Sublime Text seems to hit the sweet spot between a bare bones editor and a full-blown IDE for Python development.

Update: Is Sublime Text still the best choice for Python devs?

Since I wrote this article quite a few things have changed in the world of Python editors and IDEs. If you’re wondering whether Sublime Text is still the right choice for you then this review article I wrote may be helpful:

» Sublime Text for Python development — My 2016 review «

Font choice

Ubuntu Mono is a great font. I switched to it from Menlo a few days ago and I’m not regretting it so far.

With Ubuntu Mono, I find font size 16 very comfortable to read on my 15-inch MacBook. At 1680 × 1050 the sidebar plus two editor views (wrapped at 80 characters) fit nicely next to each other.

If you want to go nuclear on making the ideal font choice, this topic on slant.co gives a good overview. It includes screenshots and download links for popular programming fonts.

Installed plug-ins

As mentioned before, Sublime has a very extensive plug-in ecosystem. I’m currently using the following plug-ins:

Package Control: A package manager for installing additional plug-ins directly from within Sublime. This should be the only package you have to install manually. All other packages listed here can be installed via Package Control. It’s also possible to update installed packages with Package Control. Simply think of it as the apt-get of Sublime packages.
Color Scheme - Tomorrow Night: Color schemes determine the font colors used for syntax highlighting in the editor view. Tomorrow is a nice dark color scheme.
Theme - Soda Dark: Themes change the color and style of Sublime’s UI elements. This one fits perfectly with the Tomorrow color scheme.
SideBarEnhancements: This plug-in provides additional context menu options in the sidebar, such as “New file” or “New Folder”. These should be in there by default, but they are not.
All Autocomplete: Sublime’s default autocomplete only considers words found in the current file. This plug-in extends the autocomplete word list to find matches across all open files.
SublimeCodeIntel: Enhances autocomplete for some languages including Python. The plug-in also lets you jump to symbol definitions across files by pressing alt and then clicking on a symbol. Very handy.
SublimeREPL: Allows you to run a Python interpreter session in an editor view. I tend to use bpython in a separate terminal window but sometimes SublimeREPL is helpful.
GitGutter: Adds little icons to the editor’s gutter area indicating whether a line has been inserted, modified, or deleted according to Git. To get colored icons update your color scheme file as instructed in the GitGutter readme.
Pylinter: This plug-in provides the best pylint editor integration I’ve seen so far. It automatically lints .py files whenever they’re saved and displays pylint violations directly in the editor view. It also has a convenient shortcut that locally disables a pylint check by inserting a #pylint: disable comment. This plug-in sealed the deal for me.

Preferences files

One of the nice things about Sublime Text is that it can be completely configured using simple JSON-based preferences files. This allows you to easily transfer your settings to another system. I’ve also seen people use Dropbox to automatically synchronize their settings on every computer they’re using.

Preferences.sublime-settings configures Sublime’s look-and-feel and its built-in behavior. You can open the prefs file for editing within Sublime via Preferences > Settings – User. I’m using the following settings:

{
    // Colors
    "color_scheme": "Packages/Tomorrow Color Schemes/Tomorrow-Night.tmTheme",
    "theme": "Soda Dark.sublime-theme",

    // Font
    "font_face": "Ubuntu Mono",
    "font_size": 16.0,
    "font_options": ["subpixel_antialias", "no_bold"],
    "line_padding_bottom": 0,
    "line_padding_top": 0,

    // Cursor style - no blinking and slightly wider than default
    "caret_style": "solid",
    "wide_caret": true,

    // Editor view look-and-feel
    "draw_white_space": "all",
    "fold_buttons": false,
    "highlight_line": true,
    "auto_complete": false,
    "show_minimap": false,
    "show_full_path": true,

    // Editor behavior
    "scroll_past_end": false,
    "highlight_modified_tabs": true,
    "find_selected_text": true,

    // Word wrapping - follow PEP 8 recommendations
    "rulers": [ 72, 79 ],
    "word_wrap": true,
    "wrap_width": 80,

    // Whitespace - no tabs, trimming, end files with \n
    "tab_size": 4,
    "translate_tabs_to_spaces": true,
    "trim_trailing_white_space_on_save": true,
    "ensure_newline_at_eof_on_save": true,

    // Sidebar - exclude distracting files and folders
    "file_exclude_patterns":
    [
        ".DS_Store",
        "*.pid",
        "*.pyc"
    ],
    "folder_exclude_patterns":
    [
        ".git",
        "__pycache__",
        "env",
        "env3"
    ]
}

Pylinter.sublime-settings configures the pylinter plug-in. I use the following settings to lint Python files automatically on save and to display graphical icons for lint violations:

{
    // Configure pylint's behavior
    "pylint_rc": "/Users/daniel/dev/pylintrc",

    // Show different icons for errors, warnings, etc.
    "use_icons": true,

    // Automatically run Pylinter when saving a Python document
    "run_on_save": true,

    // Don't hide pylint messages when moving the cursor
    "message_stay": true
}

Key bindings

Sublime’s key bindings are also fully user-configurable via JSON-based sublime-keymap preferences files. I’ve made a few changes to the default bindings to better serve my existing TextMate/IntelliJ muscle memory. You may not need to make changes to the key bindings at all. But if you want to, modifying them is very easy and transferable across platforms. I use the following additional key bindings:

[
    // Rebind "go to file" to cmd+shift+O
    { "keys": ["super+shift+o"], "command": "show_overlay", "args": {
        "overlay": "goto",
        "show_files": true
    }},

    // Rebind swap line up/down to cmd+shift+up/down
    { "keys": ["super+shift+up"], "command": "swap_line_up" },
    { "keys": ["super+shift+down"], "command": "swap_line_down" },

    // Delete a line with cmd+delete
    { "keys": ["super+backspace"], "command": "run_macro_file", "args": {
        "file": "Packages/Default/Delete Line.sublime-macro"
    }},

    // Reindent selection with cmd+alt+L
    { "keys": ["super+alt+l"], "command": "reindent"}
]

Command line tools

Similarly to TextMate’s mate, Sublime Text includes a command line tool that allows you to open the editor from the shell. The tool called subl is not enabled by default. To make it available from any shell do the following:

ln -s /Applications/Sublime\ Text\ 2.app/Contents/SharedSupport/bin/subl /usr/local/bin/subl

To use Sublime as the default editor for interactive Git commands, for example when composing commit messages, add the following line to your ~/.profile:

export GIT_EDITOR="subl --wait --new-window"

I’ve recorded a quick screencast that shows you how to do this in some more detail: » Using Sublime Text as your Git editor «

Further inspiration

I hope this little guide was helpful to you. If you’ve got any comments or suggested improvements, please feel free to drop me a line on Twitter or send an email. I’d like to thank the following authors for their articles on setting up Sublime. They inspired my setup and may teach you some more tricks as well:


Monochrome font rendering with FreeType and Python

Tue, 30 Apr 2013 00:00:00 GMT

Monochrome font rendering with FreeType and Python

For my Raspberry Pi internet radio project I needed a way to render text suitable for a low resolution monochrome LCD. This article describes how to render 1-bit text using FreeType and Python.

What we’re going to do

I’ve structured this tutorial into four main sections. First, there’ll be a brief introduction to the FreeType font rendering library. Second, we’ll attempt to render bitmap images of single characters. Third, we expand the previous functionality to render strings of multiple characters. Fourth, you’ll learn how to add support for kerning in order to improve the visual quality of your font rendering. The image above shows what results to expect from this tutorial.

At the end of the article you’ll also find the full example code for download.

Update: What it looks like on a real display

Some people have asked for images of the font rendering code being used with a real LCD. The above picture shows an earlier version of the code running on a Raspberry Pi Model B connected to the “Raspi-LCD” board by Emsystech Engineering. The board contains a backlit 128 × 64 pixels display and five buttons. It comes with a C library that I use from Python with the ctypes module. The board is high quality and the haptics of the buttons are very good as well (they’re very clicky). I recommend it very much.

The FreeType library

FreeType is a popular open source C library for rendering fonts. Apparently more than a billion consumer devices with a graphical display use FreeType to display text. The widespread use and high-quality output make the library an ideal choice for rendering text. FreeType works with the most common font formats like TrueType (.ttf files) and OpenType (.otf files).

For using FreeType with Python I recommend freetype-py by Nicolas Rougier which provides Python bindings for FreeType 2.

Rendering single characters

The first thing we want to achieve is to render monochromatic images for single characters. Once we can do that it’ll be reasonably simple to extend our code to display strings with multiple characters. To generate a bitmap image representation for a single character (glyph) with FreeType we need to do the following:

Load the font file.
Get the glyph bitmap for the given character.
Unpack the glyph bitmap into a more convenient format.

After this we’re able to render monochrome bitmaps for single characters. For example, the character e would look like this:

We’re going to work on this list from top to bottom and start by defining a class Font that represents a fixed-size font as loaded from a file on disk:

import freetype

class Font(object):
  def __init__(self, filename, size):
    self.face = freetype.Face(filename)
    self.face.set_pixel_sizes(0, size)

  def glyph_for_character(self, char):
    # Let FreeType load the glyph for the given character and
    # tell it to render a monochromatic bitmap representation.
    self.face.load_char(char, freetype.FT_LOAD_RENDER |
                              freetype.FT_LOAD_TARGET_MONO)
    return Glyph.from_glyphslot(self.face.glyph)

  def render_character(self, char):
    glyph = self.glyph_for_character(char)
    return glyph.bitmap

We’ve used an as yet undefined class called Glyph in the glyph_for_character() method. The Glyph class is our wrapper around FreeType’s glyph representations and primarily helps with unpacking FreeType’s bitmap format for monochrome glyphs. FreeType stores monochrome bitmaps in a packed format where multiple pixels are encoded within a single byte. This format is slightly inconvenient to use because it involves some bit-fiddling.

To give an example of how to access individual pixels in this format we’re going to unpack the glyph bitmap into a Python bytearray. In this unpacked format each pixel is represented by a single byte. A value of 0 means that the pixel is off and any other value means that it is on. The Glyph class with the bitmap unpacking code looks as follows:

class Glyph(object):
  def __init__(self, pixels, width, height):
    self.bitmap = Bitmap(width, height, pixels)

  @staticmethod
  def from_glyphslot(slot):
    """Construct and return a Glyph object from a FreeType GlyphSlot."""
    pixels = Glyph.unpack_mono_bitmap(slot.bitmap)
    width, height = slot.bitmap.width, slot.bitmap.rows
    return Glyph(pixels, width, height)

  @staticmethod
  def unpack_mono_bitmap(bitmap):
    """
    Unpack a freetype FT_LOAD_TARGET_MONO glyph bitmap into a bytearray where
    each pixel is represented by a single byte.
    """
    # Allocate a bytearray of sufficient size to hold the glyph bitmap.
    data = bytearray(bitmap.rows * bitmap.width)

    # Iterate over every byte in the glyph bitmap. Note that we're not
    # iterating over every pixel in the resulting unpacked bitmap --
    # we're iterating over the packed bytes in the input bitmap.
    for y in range(bitmap.rows):
      for byte_index in range(bitmap.pitch):

        # Read the byte that contains the packed pixel data.
        byte_value = bitmap.buffer[y * bitmap.pitch + byte_index]

        # We've processed this many bits (=pixels) so far. This determines
        # where we'll read the next batch of pixels from.
        num_bits_done = byte_index * 8

        # Pre-compute where to write the pixels that we're going
        # to unpack from the current byte in the glyph bitmap.
        rowstart = y * bitmap.width + byte_index * 8

        # Iterate over every bit (=pixel) that's still a part of the
        # output bitmap. Sometimes we're only unpacking a fraction of a byte
        # because glyphs may not always fit on a byte boundary. So we make sure
        # to stop if we unpack past the current row of pixels.
        for bit_index in range(min(8, bitmap.width - num_bits_done)):

          # Unpack the next pixel from the current glyph byte.
          bit = byte_value & (1 << (7 - bit_index))

          # Write the pixel to the output bytearray. We ensure that `off`
          # pixels have a value of 0 and `on` pixels have a value of 1.
          data[rowstart + bit_index] = 1 if bit else 0

    return data

Clearly, the most important parts of the Glyph class are in the bitmap unpacking code. Once we’re rendering multi-character strings we’ll extend the class with additional metadata, such as the advance width that tells us the horizontal distance between glyphs.
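To see the bit-fiddling in isolation, here is a small sketch that unpacks a single row of packed pixels without involving FreeType at all. The function name unpack_row and the sample bytes are made up for illustration; the bit order (left-most pixel in the most significant bit) matches the unpacking code above.

```python
def unpack_row(packed, width):
    """Unpack one row of 1-bit pixels (MSB first) into a list of 0/1 values."""
    pixels = []
    for x in range(width):
        byte_value = packed[x // 8]
        # The left-most pixel lives in the most significant bit.
        bit = byte_value & (1 << (7 - x % 8))
        pixels.append(1 if bit else 0)
    return pixels

# A 10 pixel wide row needs two bytes; the last 6 bits are padding.
row = bytes([0b10110000, 0b11000000])
unpacked = unpack_row(row, 10)
```

The padding bits are why the real unpacking code stops early on the last byte of each row instead of always reading all eight bits.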

The final part that’s missing is the Bitmap class. It’s a simple helper class for working with bytearray-based bitmaps:

class Bitmap(object):
  """
  A 2D bitmap image represented as a list of byte values. Each byte indicates
  the state of a single pixel in the bitmap. A value of 0 indicates that
  the pixel is `off` and any other value indicates that it is `on`.
  """
  def __init__(self, width, height, pixels=None):
    self.width = width
    self.height = height
    self.pixels = pixels or bytearray(width * height)

  def __repr__(self):
    """Return a string representation of the bitmap's pixels."""
    rows = ''
    for y in range(self.height):
        for x in range(self.width):
            rows += '*' if self.pixels[y * self.width + x] else ' '
        rows += '\n'
    return rows

The class allows us to quickly experiment with font rendering in the Python REPL. Calling repr() on a Bitmap object returns a textual representation of the 2D image encoded in the bitmap. This is going to be very helpful when we start debugging our font rendering code. Next, let’s actually try to render a single glyph bitmap:

>>> fnt = Font("helvetica.ttf", 24)
>>> ch = fnt.render_character("e")
>>> print(repr(ch))

   *****
  *******
 ***   ***
***     **
**       **
***********
***********
**
**       **
 **     **
  ********
   *****

Great, that means our glyph rendering code works. The most complicated thing here was the bitmap unpacking code. We now continue with rendering strings with multiple characters.

Rendering multiple characters

Now that we know how to render single character glyphs we’re going to extend that functionality into rendering strings with several characters. The critical part here is glyph placement, that is, ensuring that all characters line up correctly. To render multi-character strings we make the following changes to the existing code:

Extend the Glyph class with additional metadata that tells us how characters are placed next to each other (advance width, top-side bearing, ascent, and descent).
Implement a two pass algorithm for rendering strings:
- Pass 1: Compute the dimensions of the bitmap for a given string.
- Pass 2: Successively draw the glyph for each character into an output bitmap.

Once we’ve completed these steps we’ll be able to render strings such as this one:

We start with extending the Glyph class with fields for the glyph’s advance width, top-side bearing, ascent, and descent. I’ll briefly explain the purpose of these fields before we continue. If you want to learn more about these glyph metrics take a look at the FreeType documentation.

The advance width tells us where to place the next character horizontally, that is, how many pixels we move to the right (or to the left) to draw the next glyph.

The ascent, descent, and the top-side bearing determine the vertical placement of the glyph. To understand vertical glyph placement the concept of the baseline is very important. The baseline is defined to be the line upon which most letters sit. The ascent and descent determine how the glyph should be placed relative to the baseline.

In western typography most letters extend above the baseline. We say that they have a positive ascent. Some letters, such as g, extend below the baseline. This means that both their ascent and descent are positive. Of course, other mixtures are also possible, for example, there may be letters with an ascent of zero but a positive descent, and so on.

The top-side bearing is the vertical distance from the glyph’s baseline to its bitmap’s top-most scanline. We need this value to compute the glyph’s ascent and descent.

While these glyph metrics seem straightforward to compute, it took me a few tries and some pencil drawing to get them right. The updated version of the Glyph class with added metrics looks like this:

class Glyph(object):
  def __init__(self, pixels, width, height, top, advance_width):
    self.bitmap = Bitmap(width, height, pixels)

    # The glyph bitmap's top-side bearing, i.e. the vertical distance from the
    # baseline to the bitmap's top-most scanline.
    self.top = top

    # Ascent and descent determine how many pixels the glyph extends
    # above or below the baseline.
    self.descent = max(0, self.height - self.top)
    self.ascent = max(0, max(self.top, self.height) - self.descent)

    # The advance width determines where to place the next character
    # horizontally, that is, how many pixels we move to the right
    # to draw the next glyph.
    self.advance_width = advance_width

  @property
  def width(self):
    return self.bitmap.width

  @property
  def height(self):
    return self.bitmap.height
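To make the ascent and descent formulas more concrete, here is a short worked example that applies them to a few made-up glyph sizes. The helper ascent_descent and all of the numbers are hypothetical; only the two formulas are taken from the Glyph class above.

```python
def ascent_descent(height, top):
    """Return (ascent, descent) using the same formulas as Glyph.__init__."""
    descent = max(0, height - top)
    ascent = max(0, max(top, height) - descent)
    return ascent, descent

# Hypothetical metrics, chosen only for illustration:
sits_on_baseline = ascent_descent(height=12, top=12)   # e.g. an "e"
dips_below = ascent_descent(height=17, top=12)         # e.g. a "g"
entirely_below = ascent_descent(height=2, top=-1)      # e.g. an underscore
```

With these inputs the first glyph has no descent, the second one extends 5 pixels below the baseline, and the third one has an ascent of zero.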

Next, we’re going to work on the Font class and extend it with a two-pass algorithm for rendering multi-character strings.

The first pass computes the space occupied by the given string, that is, the dimensions of the given text as if it were rendered into a bitmap. Besides the width and height of the resulting bitmap in pixels, we also need to know the position of the baseline for correct vertical glyph placement.

We compute the overall width by summing up the advance widths for all glyphs. The overall height is determined by the maximum ascent and descent. The baseline of a multi-character string equals the maximum descent of all glyphs within¹ the string.

The resulting function text_dimensions() looks as follows:

class Font(object):
  def text_dimensions(self, text):
    """
    Return (width, height, baseline) of `text` rendered in the current font.
    """
    width = 0
    max_ascent = 0
    max_descent = 0
    previous_char = None

    # For each character in the text string we get the glyph
    # and update the overall dimensions of the resulting bitmap.
    for char in text:
      glyph = self.glyph_for_character(char)
      max_ascent = max(max_ascent, glyph.ascent)
      max_descent = max(max_descent, glyph.descent)
      width += glyph.advance_width
      previous_char = char

    height = max_ascent + max_descent
    return (width, height, max_descent)
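The first pass can also be tried out without loading a font. The sketch below uses a namedtuple as a stand-in for the Glyph class and feeds it made-up metrics; FakeGlyph and dimensions are names invented for this example.

```python
from collections import namedtuple

# Stand-in for the Glyph class; only the fields used by the first pass.
FakeGlyph = namedtuple("FakeGlyph", "advance_width ascent descent")

def dimensions(glyphs):
    """Return (width, height, baseline) for a sequence of glyph metrics."""
    width = sum(g.advance_width for g in glyphs)
    max_ascent = max((g.ascent for g in glyphs), default=0)
    max_descent = max((g.descent for g in glyphs), default=0)
    return width, max_ascent + max_descent, max_descent

# Made-up metrics for the string "hg".
glyphs = [FakeGlyph(13, 17, 0), FakeGlyph(13, 12, 5)]
dims = dimensions(glyphs)
```

Here the tall "h" determines the maximum ascent and the "g" determines the baseline, so the bitmap ends up 26 pixels wide and 22 pixels high with the baseline 5 pixels above the bottom edge.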

The second pass successively draws the glyph images into an output Bitmap. For the second pass we must know the text dimensions in order to allocate a bitmap of sufficient size and to correctly place each character vertically.

You can see the render_text() function that performs the second pass here:

class Font(object):
  def render_text(self, text, width=None, height=None, baseline=None):
    """
    Render the given `text` into a Bitmap and return it.

    If `width`, `height`, and `baseline` are not specified they
    are computed using the `text_dimensions` method.
    """
    if None in (width, height, baseline):
        width, height, baseline = self.text_dimensions(text)

    x = 0
    previous_char = None
    outbuffer = Bitmap(width, height)

    for char in text:
      glyph = self.glyph_for_character(char)
      y = height - glyph.ascent - baseline
      outbuffer.bitblt(glyph.bitmap, x, y)
      x += glyph.advance_width
      previous_char = char

    return outbuffer

Drawing characters into the outbuffer bitmap is done by Bitmap.bitblt(). It performs a bit blit operation to copy pixels from one bitmap into another:

class Bitmap(object):
  def bitblt(self, src, x, y):
    """Copy all pixels from `src` into this bitmap, starting at (`x`, `y`)."""
    srcpixel = 0
    dstpixel = y * self.width + x
    row_offset = self.width - src.width

    for sy in range(src.height):
      for sx in range(src.width):
        self.pixels[dstpixel] = src.pixels[srcpixel]
        srcpixel += 1
        dstpixel += 1
      dstpixel += row_offset
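The index arithmetic is easier to follow on a tiny example. This sketch repeats the same loop on plain bytearrays; the free function blit and the bitmap sizes are assumptions made for illustration and are not part of the Bitmap class.

```python
def blit(dst, dst_width, src, src_width, src_height, x, y):
    """Copy `src` into `dst` at (x, y); both are flat, row-major bytearrays."""
    srcpixel = 0
    dstpixel = y * dst_width + x
    # After each source row, skip the rest of the destination row.
    row_offset = dst_width - src_width
    for _ in range(src_height):
        for _ in range(src_width):
            dst[dstpixel] = src[srcpixel]
            srcpixel += 1
            dstpixel += 1
        dstpixel += row_offset

dst = bytearray(5 * 3)              # 5 x 3 pixels, all off
src = bytearray([1, 1, 1, 1])       # 2 x 2 pixels, all on
blit(dst, 5, src, 2, 2, x=2, y=1)

# Slice the flat buffer back into rows to inspect the result.
rows = [list(dst[r * 5:(r + 1) * 5]) for r in range(3)]
```

Note that neither this sketch nor the method above performs bounds checking, so the destination has to be large enough to hold the source at the given position.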

Using the new code we’re able to render our first multi-character string:

>>> fnt = Font("helvetica.ttf", 24)
>>> txt = fnt.render_text("hello")
>>> print(repr(txt))

**                        **   **
**                        **   **
**                        **   **
**                        **   **
**                        **   **
** *****        *****     **   **      ******
*********      *******    **   **     ********
****   ***    ***   ***   **   **    ***    ***
***     **   ***     **   **   **   ***      ***
**      **   **       **  **   **   **        **
**      **   ***********  **   **   **        **
**      **   ***********  **   **   **        **
**      **   **           **   **   **        **
**      **   **       **  **   **   ***      ***
**      **    **     **   **   **    ***    ***
**      **     ********   **   **     ********
**      **      *****     **   **      ******

Great, this is starting to look useful. The tricky parts in this section were handling the advance width and vertical glyph placement correctly. So, be sure to also try some combinations of characters that descend below the baseline. For example, the string “greetings, world” should render correctly with parts of the g and the comma descending below the baseline.

Adding kerning support

Kerning adjusts the horizontal space between glyphs to achieve visually pleasing typography. A typical example where kerning leads to a more pleasing result is the letter pair AV. With kerning the bounding boxes of both letters overlap slightly to prevent superfluous horizontal space. In the following picture the first line was rendered without kerning and the second line was rendered with kerning:

As you can see, kerning is a visual optimization – it’s not mandatory but can make quite a difference in the quality of your text rendering. For displaying text on a 128 × 64 pixels monochrome display it’s probably overkill to implement kerning². But with FreeType it’s reasonably simple to add kerning support so let’s go ahead with it anyways.

To add kerning to our existing codebase we need to make three changes:

Add a way to access kerning information for a character pair.
Take kerning information into account during multi-character rendering.
Fix a small visual artefact in the glyph drawing code.

So we start by extending the Font class with the following function that returns the kerning offset for a character pair, that is, two characters that are to be drawn in sequence:

class Font(object):
  def kerning_offset(self, previous_char, char):
    """
    Return the horizontal kerning offset in pixels when rendering `char`
    after `previous_char`.
    """
    kerning = self.face.get_kerning(previous_char, char)

    # The kerning offset is given in FreeType's 26.6 fixed point format,
    # which means that the pixel values are multiples of 64. Use integer
    # division so the offset stays an int and can be used as a pixel index.
    return kerning.x // 64

We then use the resulting kerning offset to adjust the glyph’s drawing position. This reduces extraneous horizontal whitespace.
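As a quick illustration of the 26.6 fixed point format: values are stored in units of 1/64 of a pixel, so converting to whole pixels means dividing by 64. The helper name and the sample values below are made up; the sketch also shows why integer division is the safer choice if the code runs under Python 3.

```python
def fixed_26_6_to_pixels(value):
    """Convert a 26.6 fixed point value to whole pixels (rounding down)."""
    return value // 64

no_kerning = fixed_26_6_to_pixels(0)
tighten_by_one = fixed_26_6_to_pixels(-64)
widen_by_two = fixed_26_6_to_pixels(128)

# In Python 3, `/` always yields a float, which cannot be used as an index
# into a bytearray.
as_float = -64 / 64
```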

Let’s go back briefly to our kerning example with the letter pair AV. We saw there that the glyph bitmaps for A and V overlapped slightly. In this case the glyph for V has a negative horizontal kerning offset and it is moved slightly left towards the A. To do this automatically we update Font.text_dimensions() and Font.render_text() to take the kerning offset into account:

class Font(object):
  def text_dimensions(self, text):
    width = 0
    max_ascent = 0
    max_descent = 0
    previous_char = None

    for char in text:
      glyph = self.glyph_for_character(char)
      max_ascent = max(max_ascent, glyph.ascent)
      max_descent = max(max_descent, glyph.descent)
      kerning_x = self.kerning_offset(previous_char, char)

      # With kerning, the advance width may be less than the width of the
      # glyph's bitmap. Make sure we compute the total width so that
      # all of the glyph's pixels fit into the returned dimensions.
      width += max(glyph.advance_width + kerning_x, glyph.width + kerning_x)

      previous_char = char

    height = max_ascent + max_descent
    return (width, height, max_descent)

class Font(object):
  def render_text(self, text, width=None, height=None, baseline=None):
    if None in (width, height, baseline):
        width, height, baseline = self.text_dimensions(text)

    x = 0
    previous_char = None
    outbuffer = Bitmap(width, height)

    for char in text:
      glyph = self.glyph_for_character(char)

      # Take kerning information into account before we render the
      # glyph to the output bitmap.
      x += self.kerning_offset(previous_char, char)

      # The vertical drawing position should place the glyph
      # on the baseline as intended.
      y = height - glyph.ascent - baseline

      outbuffer.bitblt(glyph.bitmap, x, y)

      x += glyph.advance_width
      previous_char = char

    return outbuffer
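To see how the kerning offset shifts the drawing position, here is a sketch that only tracks the x coordinate, using the same order of operations as render_text() above. The function pen_positions and all of the numbers are made up for illustration.

```python
def pen_positions(advance_widths, kerning_offsets):
    """Return the x position at which each glyph is drawn."""
    x = 0
    positions = []
    for advance, kerning in zip(advance_widths, kerning_offsets):
        x += kerning          # shift this glyph relative to the previous one
        positions.append(x)
        x += advance          # move the pen past this glyph
    return positions

# Made-up values for "AVA": each glyph advances 16 pixels and every
# pair is pulled together by 2 pixels. The first glyph has no predecessor.
without_kerning = pen_positions([16, 16, 16], [0, 0, 0])
with_kerning = pen_positions([16, 16, 16], [0, -2, -2])
```

With kerning applied, the second and third glyphs start 2 and 4 pixels further left than they would otherwise.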

If we run the code at this stage we’ll see that it adjusts the glyph placement correctly – but produces unpleasant visual artefacts in some cases. If the glyph bounding boxes overlap, the glyph rendered last overwrites some of the previous glyph’s pixels.

To fix this visual artefact we update Bitmap.bitblt() with a simple blending operation. We need this to draw text that contains glyphs with overlapping bounding boxes correctly. The updated method looks as follows:

class Bitmap(object):
  def bitblt(self, src, x, y):
    """Copy all pixels from `src` into this bitmap"""
    srcpixel = 0
    dstpixel = y * self.width + x
    row_offset = self.width - src.width

    for sy in range(src.height):
      for sx in range(src.width):
        # Perform an OR operation on the destination pixel and the source pixel
        # because glyph bitmaps may overlap if character kerning is applied,
        # e.g. in the string "AVA", the "A" and "V" glyphs must be rendered
        # with overlapping bounding boxes.
        self.pixels[dstpixel] = self.pixels[dstpixel] or src.pixels[srcpixel]
        srcpixel += 1
        dstpixel += 1
      dstpixel += row_offset
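The difference between overwriting and OR-ing pixels shows up even on a single row. In this sketch two made-up glyph rows overlap by one pixel; blit_overwrite and blit_or are hypothetical one-dimensional stand-ins for the two versions of bitblt().

```python
def blit_overwrite(dst, src, x):
    """Copy `src` into `dst` at offset x, replacing whatever was there."""
    for i, value in enumerate(src):
        dst[x + i] = value

def blit_or(dst, src, x):
    """Copy `src` into `dst` at offset x, keeping pixels that are already on."""
    for i, value in enumerate(src):
        dst[x + i] = dst[x + i] or value

# Two made-up 3-pixel glyph rows whose bounding boxes overlap by one pixel.
first = bytearray([1, 1, 1])
second = bytearray([0, 1, 1])

overwritten = bytearray(5)
blit_overwrite(overwritten, first, 0)
blit_overwrite(overwritten, second, 2)

blended = bytearray(5)
blit_or(blended, first, 0)
blit_or(blended, second, 2)
```

In the overwritten row the empty left edge of the second glyph erases the last pixel of the first one, while the blended row keeps it.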

Once you’ve made the change you should see the visual artefacts from glyph overlapping disappear. Congratulations on implementing kerning support! This also concludes the tutorial.

Example code / Demo

To see how it all fits together you can access the full source code here as a GitHub Gist.

For the example program to run you need to install freetype-py. Additionally, place a font file called helvetica.ttf in the program’s working directory.

What next?

Here are a few ideas for making this code more useful and/or to have some fun with it. If this article was helpful to you or if you’ve got suggestions I’d love to hear from you.

Add a glyph cache to optimize text rendering. Rendering the same characters repeatedly should not require unpacking the glyph’s bitmap each time.
Add support for rendering multiline text. This should take the font’s linegap value into account. Check the FreeType documentation for more information.
Add support for vertical text rendering.
Define your own file format for (bitmap) fonts and make the code work without FreeType.
Use this code to implement a homebrew version of BSD’s banner.
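The first idea in the list above, a glyph cache, could be sketched roughly as follows. GlyphCache is a hypothetical helper and not part of the example code; it assumes that glyphs for a given character never change, which holds as long as the font and size stay the same.

```python
class GlyphCache(object):
    """Memoize glyphs per character so each one is only built once."""

    def __init__(self, load_glyph):
        # `load_glyph` is any callable mapping a character to a glyph,
        # for example a Font's glyph_for_character method.
        self._load_glyph = load_glyph
        self._glyphs = {}

    def get(self, char):
        if char not in self._glyphs:
            self._glyphs[char] = self._load_glyph(char)
        return self._glyphs[char]

# Demo with a fake loader that records how often it is called.
calls = []

def fake_loader(char):
    calls.append(char)
    return "glyph:" + char

cache = GlyphCache(fake_loader)
rendered = [cache.get(c) for c in "hello"]
```

In the demo the loader only runs once for the repeated letter l, so five lookups cause four loads.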

¹ A character string doesn’t really contain glyphs. Instead it contains characters that each map to a glyph as determined by the font face.
² It is overkill but I couldn’t really stop before seeing it work. Currently, I’m also not using any fonts that have kerning information on my radio LCD. I learned quite a bit about typography, though…


A countdown timer extension for Alfred

Sat, 26 Jan 2013 00:00:00 GMT

A countdown timer extension for Alfred

I wrote a countdown timer extension for the Alfred application launcher for OS X. The extension is open-source, written in Python and uses Mountain Lion’s user notifications.

What is this?

I use countdown timers several times each day. Need to brew some tea? Set up a timer. Need to catch the bus in 20 minutes? Set up a timer. Waiting for the laundry to finish? Set up a timer. You get the idea. Because I use this functionality so much it has to be convenient. Getting out my phone, launching a timer app and telling it to start the countdown is not convenient. Much in the same way, launching applications on OS X is also not convenient if you do it by navigating to the Applications folder and double-clicking an icon. Luckily there is a nice solution for both problems. It is called Alfred.

Alfred is a Spotlight-based application launcher that uses a text-based interface. But this description does not really do it justice. Alfred is much more than that. Alfred not only launches applications, it also allows you to navigate the filesystem and to access various contextual actions, for example deleting a file or emailing it to someone. Alfred is very flexible and you can extend it with custom commands called extensions. Extensions can either provide new contextual actions or additional commands.

For my daily countdown timer needs I wrote an Alfred extension. The extension allows me to start countdown timers quickly and without hassle. The extension uses Mountain Lion’s user notifications and sounds to tell you when the time is up.

All code for the extension is available on my GitHub. It is written in Python, so take a look if you are interested in extending Alfred with Python or if you want to find out how to work with Mountain Lion’s user notifications from Python.

What are the benefits?

Helps you make great tea.
Solves your Pomodoro needs.
Uses Mountain Lion’s User Notifications to tell you when time’s up.
Plays a non-intrusive alarm sound.
Allows you to run multiple timers at the same time.
Allows you to add an optional label to the timer, e.g. “Laundry is done!”.
Shows you how to write Alfred extensions in Python.

How to install it?

You need a different version of the extension depending on whether you’re running Alfred 1 or Alfred 2. Please also note that this extension / workflow requires OS X Mountain Lion (10.8) or later to work. Additionally, you need the Alfred PowerPack.

For Alfred 2, download and double-click Timer.alfredworkflow to install the workflow.

For Alfred 1, download and double-click Timer.alfredextension to install the extension.

How to use it?

The general syntax is timer [minutes] [optional:title]
timer 5 sets a countdown timer that goes off after 5 minutes.
timer 0:30 or timer 0.5 sets a timer that goes off after 30 seconds.
timer 40 Laundry is done! adds an optional title to the timer.
timer displays usage information.
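
To make the syntax above concrete, here is a minimal sketch of how such an argument string could be parsed in Python. It is not the extension’s actual code: the function name parse_timer_args and its return format are made up for illustration, and it assumes that a value like 0:30 means minutes:seconds while a value like 0.5 means fractional minutes, as the examples suggest.

```python
def parse_timer_args(query):
    """Split a query such as '3:30 tea is done' into (seconds, label).

    Hypothetical helper for illustration only, not taken from the
    extension's source code.
    """
    # The first whitespace-separated token is the interval,
    # everything after it is the optional label.
    parts = query.strip().split(None, 1)
    if not parts:
        raise ValueError("usage: timer [minutes] [optional:title]")

    interval = parts[0]
    label = parts[1] if len(parts) > 1 else ""

    if ":" in interval:
        # Assumed format: minutes:seconds, e.g. '0:30' or '3:30'
        minutes, seconds = interval.split(":", 1)
        total_seconds = int(minutes) * 60 + int(seconds)
    else:
        # Assumed format: whole or fractional minutes, e.g. '5' or '0.5'
        total_seconds = float(interval) * 60

    return int(round(total_seconds)), label


if __name__ == "__main__":
    for example in ("5", "0:30", "0.5", "40 Laundry is done!"):
        print(example, "->", parse_timer_args(example))
```

An empty query raises a ValueError in this sketch, which a caller could catch in order to display the usage information.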

Update: Even more notification fixes

Thanks to some more hacking (GitHub issues #1 and #6) the problem where notifications would fail to display was resolved. Additionally, all notification windows now correctly display Alfred’s application icon. These fixes are included in the latest version of the workflow / extension.

Update: Support for Alfred v2

I’ve repackaged the extension into an Alfred v2 workflow. This means it’s now also possible to use the timer from Alfred v2. Functionality and usage are the same in both versions.

Update: Timer labels

Alexander Lehmann suggested that the timer could be improved by adding an additional label argument. The label is displayed when the timer starts and when it fires. This helps when running multiple timers at once without getting confused. Labels are added simply by typing the label after the time interval. For example, timer 3:30 tea is done starts a timer of three and a half minutes labeled “tea is done”.

If you are interested in Scala or writing raytracers in Lisp then you should definitely check out Alexander’s blog.

Update: Notification fixes

Jay Zawrotny has reported an issue with the extension where notifications would not fire correctly. I believe that there is a code-signing problem on systems where Mountain Lion’s vanilla Python install has been replaced. If you have issues with getting notifications to show up, please try Jay’s suggested fix from this pull request on GitHub. Thanks Jay!

Interfacing Python and C: The CFFI Module

The C Library Code

‘Out-of-line’ vs ‘in-line’ interfaces

Building and running CFFI-based scripts on Linux

Building the C Interface

Creating simple Python classes to mirror C structures

Passing structures by reference

Working around some CFFI limitations

Conclusion

Write More Pythonic Code by Applying the Things You Already Know

Now, what’s the takeaway from all of this?

Working With File I/O in Python

Binary vs Text Files in Python

Where to Find Python’s File I/O Tools

Opening a File in Python

Closing a File in Python

Working With Python File Objects

Reading Data From a File in Python

Reading Text Files Line-by-Line With readline()

Processing an Entire Text File Line-By-Line

Writing to a File With Python Using write()

File Seeks: Moving the Read/Write Pointer

Editing an Existing Text File with Python

Python File I/O – Additional Resources

How to Reverse a String in Python

Option 1: Reversing a Python String With the “[::-1]” Slicing Trick

Option 2: Reversing a Python String Using reversed() and str.join()

Option 3: The “Classic” In-Place String Reversal Algorithm Ported to Python

Performance Comparison

Summary: Reversing Strings in Python

Mastering Click: Writing Advanced Python Command-Line Apps

Building on our existing Python command-line app

Storing the API key in an environment variable

Separating functionality into sub-commands

Storing the API key in a configuration file using another sub-command

Asking the user for command-line input

Introducing Click’s parameter types

Building a custom parameter type to validate user input

Using the Click context to pass parameters between commands

Advanced Python CLIs with Click — Summary

Full code example

Working with Random Numbers in Python

Generating Random Floats Between 0.0 and 1.0

Generating Random Ints Between x and y

Generating Random Floats Between x and y

Picking a Random Element From a List

Randomizing a List of Elements

Picking n Random Samples From a List of Elements

Generating Cryptographically Secure Random Numbers

How to Send an Email With Python

Understanding Email Basics

Sending Email in Python With the smtplib Module

🔐 Enabling SMTP access in Gmail

Additional Resources

Python Tricks: The Book Is Now Available on Kindle

Anyway, here’s the deal:

Where to Get the Kindle Version

Thank You

Learn More About the Kindle Version

Python Multi-line Comments: Your Two Best Options

Option 1: Consecutive Single-line Comments

Option 2: Using Multi-line Strings as Comments

Multi-line Comments in Python – Key Takeaways

Writing Python Command-Line Tools With Click

Why should you write Python command-line scripts and tools?

Basics of a command-line interface

Command-line frameworks available in the Python 3.x standard library

click vs argparse: A better alternative?

Building a simple Python command-line interface with click

A more realistic Python CLI example with click

⏰ Sidebar: Making your click command executable

Parsing a mandatory parameter with click

Parsing optional parameters with click

Adding auto-generated usage instructions to your Python command-line tool

Python CLIs with click: Summary & Recap

Python’s enumerate() Function Demystified

Make Your Loops More Pythonic With enumerate()

Changing the Starting Index

How enumerate() Works Behind The Scenes

The `enumerate` Function in Python – Key Takeaways

⏰ Sidebar: `timeit.timeit` Arguments

Python Memoization with `functools.lru_cache`

Why You Should Prefer `functools.lru_cache`

Wrapping `ctypes` Functions