Python Training by Dan Bader

The 4 Major Ways to Do String Formatting in Python

Remember the Zen of Python and how there should be “one obvious way to do something in Python”? You might scratch your head when you find out that there are *four* major ways to do string formatting in Python.

String Formatting in Python (updated for Python 3.6 and above)

In this article I’ll demonstrate how these four string formatting approaches work and what their respective strengths and weaknesses are. I’ll also give you my simple “rule of thumb” for how I pick the best general purpose string formatting approach.

Let’s jump right in, as we’ve got a lot to cover. In order to have a simple toy example for experimentation, let’s assume we’ve got the following variables (or constants, really) to work with:

>>> errno = 50159747054
>>> name = 'Bob'

And based on these variables we’d like to generate an output string containing a simple error message:

'Hey Bob, there is a 0xbadc0ffee error!'

Hey… now that error could really spoil a dev’s Monday morning. But we’re here to discuss string formatting. So let’s get to work.

#1 – “Old Style” String Formatting (%-operator)

Strings in Python have a unique built-in operation that can be accessed with the %-operator. This lets you do simple positional formatting very easily. If you’ve ever worked with a printf-style function in C you’ll recognize how this works instantly. Here’s a simple example:

>>> 'Hello, %s' % name

'Hello, Bob'

I’m using the %s format specifier here to tell Python where to substitute the value of name, represented as a string.

There are other format specifiers available that let you control the output format. For example, it’s possible to convert numbers to hexadecimal notation or to add whitespace padding to generate nicely formatted tables and reports (cf. Python Docs: “printf-style String Formatting”).

Here, we can use the %x format specifier to convert an int value to a string and to represent it as a hexadecimal number:

>>> '%x' % errno

'badc0ffee'
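The whitespace padding mentioned above works along the same lines. Here is a minimal sketch; the rows list and the column widths are made up for illustration and are not part of the running example:

```python
# Hypothetical sample data, used only for illustration.
rows = [('Bob', 3), ('Alice', 12)]

# %-8s left-aligns a string in 8 columns; %5d right-aligns an integer in 5.
table = ['%-8s|%5d' % (name, count) for name, count in rows]

# %08x zero-pads a hexadecimal number to 8 digits.
padded_hex = '%08x' % 0xC0FFEE

for line in table:
    print(line)
print(padded_hex)
```

Every row ends up with the same width, which is what makes this handy for plain-text tables and reports.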

The “old style” string formatting syntax changes slightly if you want to make multiple substitutions in a single string. Because the %-operator only takes one argument you need to wrap the right-hand side in a tuple, like so:

>>> 'Hey %s, there is a 0x%x error!' % (name, errno)

'Hey Bob, there is a 0xbadc0ffee error!'

It’s also possible to refer to variable substitutions by name in your format string, if you pass a mapping to the %-operator:

>>> 'Hey %(name)s, there is a 0x%(errno)x error!' % {
...     "name": name, "errno": errno }

'Hey Bob, there is a 0xbadc0ffee error!'

This makes your format strings easier to maintain and easier to modify in the future. You don’t have to worry about making sure the order you’re passing in the values matches up with the order the values are referenced in the format string. Of course the downside is that this technique requires a little more typing.
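As a small sketch of that maintainability benefit, the same mapping can feed differently ordered format strings, and a name can appear more than once. The values dictionary and the second message are assumptions made up for this illustration:

```python
# The same mapping serves both format strings, regardless of placeholder order.
values = {'name': 'Bob', 'errno': 50159747054}

original = 'Hey %(name)s, there is a 0x%(errno)x error!' % values

# Placeholders are reordered and 'name' is used twice; the mapping is unchanged.
reordered = 'Error 0x%(errno)x reported for %(name)s. Sorry, %(name)s!' % values

print(original)
print(reordered)
```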

I’m sure you’ve been wondering why this printf-style formatting is called “old style” string formatting. It was technically superseded by “new style” formatting, which we’re going to talk about in a minute.

#2 – “New Style” String Formatting (str.format)

Python 3 introduced a new way to do string formatting that was also made available in Python 2, starting with version 2.6. This “new style” string formatting gets rid of the %-operator special syntax and makes the syntax for string formatting more regular. Formatting is now handled by calling the format() method on a string object (cf. Python Docs: “str.format”).

You can use the format() method to do simple positional formatting, just like you could with “old style” formatting:

>>> 'Hello, {}'.format(name)

'Hello, Bob'

Or, you can refer to your variable substitutions by name and use them in any order you want. This is quite a powerful feature as it allows for re-arranging the order of display without changing the arguments passed to format():

>>> 'Hey {name}, there is a 0x{errno:x} error!'.format(
...     name=name, errno=errno)

'Hey Bob, there is a 0xbadc0ffee error!'

This also shows that the syntax to format an int variable as a hexadecimal string has changed. Now we need to pass a format spec by adding a :x suffix. The format string syntax has become more powerful without complicating the simpler use cases. It pays off to read up on this string formatting mini-language in the Python documentation (cf. Python Docs: “Format String Syntax”).
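To give a feel for that mini-language, here are a few format specs in action. This is only a small sample chosen for illustration, not a complete tour:

```python
errno = 50159747054

right_aligned = '{:>8}'.format('Bob')        # pad to 8 columns, right-aligned
with_commas = '{:,}'.format(errno)           # thousands separators
two_decimals = '{:.2f}'.format(3.14159)      # fixed-point, two decimal places
prefixed_hex = '{:#x}'.format(errno)         # hex with the 0x prefix added
reused = '{0}, {1}, {0}'.format('a', 'b')    # positional indexes can repeat

print(right_aligned, with_commas, two_decimals, prefixed_hex, reused, sep='\n')
```

Note how the #x spec adds the 0x prefix for you, so it doesn’t have to be typed into the format string by hand.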

In Python 3, this “new style” string formatting is to be preferred over %-style formatting. While “old style” formatting has been de-emphasized it has not been deprecated. It is still supported in the latest versions of Python. According to this discussion on the Python dev email list and this issue on the Python dev bug tracker, %-formatting is going to stick around for a long time to come.

Still, the official Python 3.X documentation doesn’t exactly recommend “old style” formatting or speak too fondly of it:

The formatting operations described here exhibit a variety of quirks that lead to a number of common errors (such as failing to display tuples and dictionaries correctly). Using the newer formatted string literals or the str.format() interface helps avoid these errors. These alternatives also provide more powerful, flexible and extensible approaches to formatting text. (Source: Python 3 Docs)
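The tuple quirk mentioned in that quote is easy to reproduce. Here’s a minimal sketch, where point is just a made-up value for illustration:

```python
point = (1, 2)

# The %-operator treats a tuple on the right-hand side as the argument list,
# so a two-item tuple doesn't fit a format string with a single %s.
error_message = None
try:
    'Point: %s' % point
except TypeError as exc:
    error_message = str(exc)

# Workaround for old style: wrap the value in a one-element tuple.
old_style = 'Point: %s' % (point,)

# str.format() takes the tuple as one ordinary argument.
new_style = 'Point: {}'.format(point)

print(error_message)
print(old_style)
print(new_style)
```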

This is why I’d personally try to stick with str.format for new code moving forward. Starting with Python 3.6 there’s yet another way to format your strings. I’ll tell you all about it in the next section.

#3 – Literal String Interpolation (Python 3.6+)

Python 3.6 adds a new string formatting approach called Formatted String Literals. This new way of formatting strings lets you use embedded Python expressions inside string constants. Here’s a simple example to give you a feel for the feature:

>>> f'Hello, {name}!'

'Hello, Bob!'

This new formatting syntax is powerful. Because you can embed arbitrary Python expressions you can even do inline arithmetic with it. See here for example:

>>> a = 5
>>> b = 10
>>> f'Five plus ten is {a + b} and not {2 * (a + b)}.'

'Five plus ten is 15 and not 30.'

Formatted string literals are a Python parser feature that converts f-strings into a series of string constants and expressions. They then get joined up to build the final string.

Imagine we had the following greet() function that contains an f-string:

>>> def greet(name, question):
...     return f"Hello, {name}! How's it {question}?"
...

>>> greet('Bob', 'going')
"Hello, Bob! How's it going?"

When we disassemble the function and inspect what’s going on behind the scenes we can see that the f-string in the function gets transformed into something similar to the following:

>>> def greet(name, question):
...     return "Hello, " + name + "! How's it " + question + "?"

The real implementation is slightly faster than that because it uses the BUILD_STRING opcode as an optimization. But functionally they’re the same (this disassembly comes from Python 3.6; the exact opcodes vary between Python versions):

>>> import dis
>>> dis.dis(greet)
  2           0 LOAD_CONST               1 ('Hello, ')
              2 LOAD_FAST                0 (name)
              4 FORMAT_VALUE             0
              6 LOAD_CONST               2 ("! How's it ")
              8 LOAD_FAST                1 (question)
             10 FORMAT_VALUE             0
             12 LOAD_CONST               3 ('?')
             14 BUILD_STRING             5
             16 RETURN_VALUE
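If you don’t feel like reading bytecode, you can also compare the two versions directly. This is a rough sketch: the function names greet_fstring and greet_concat are made up for the comparison, and str() stands in for the formatting step that an f-string applies to each value:

```python
def greet_fstring(name, question):
    return f"Hello, {name}! How's it {question}?"


def greet_concat(name, question):
    # str() approximates the formatting step; for plain strings it changes nothing.
    return "Hello, " + str(name) + "! How's it " + str(question) + "?"


same_for_strings = greet_fstring('Bob', 'going') == greet_concat('Bob', 'going')
same_for_numbers = greet_fstring(42, 3.5) == greet_concat(42, 3.5)

print(same_for_strings, same_for_numbers)
```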

Formatted string literals also support the existing format string syntax of the str.format() method. That allows you to solve the same formatting problems we’ve discussed in the previous two sections:

>>> f"Hey {name}, there's a {errno:#x} error!"

"Hey Bob, there's a 0xbadc0ffee error!"

Python’s new Formatted String Literals are similar to the JavaScript Template Literals added in ES2015. I think they’re quite a nice addition to the language and I’ve already started using them in my day-to-day (Python 3) work. You can learn more about Formatted String Literals in the official Python documentation (cf. Python Docs: “Formatted string literals”).
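Here are a few more f-string features in one place, as a quick sketch. The price and width variables are made up for this illustration:

```python
name = 'Bob'
errno = 50159747054
price = 3.14159   # hypothetical value
width = 10        # hypothetical column width

with_repr = f'{name!r}'              # !r applies repr() to the value
rounded = f'{price:.2f}'             # same format specs as str.format()
nested = f'{name:>{width}}'          # the spec itself can contain expressions
method_call = f'{name.upper()} has error {errno:#x}'

print(with_repr, rounded, nested, method_call, sep='\n')
```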

#4 – Template Strings (standard library)

Here’s one more technique for string formatting in Python: Template Strings. It’s a simpler and less powerful mechanism, but in some cases this might be exactly what you’re looking for.

Let’s take a look at a simple greeting example:

>>> from string import Template
>>> t = Template('Hey, $name!')
>>> t.substitute(name=name)

'Hey, Bob!'

You see here that we need to import the Template class from Python’s built-in string module. Template strings are not a core language feature but they’re supplied by a module in the standard library.

Another difference is that template strings don’t allow format specifiers. So in order to get our error string example to work we need to transform our int error number into a hex-string ourselves:

>>> templ_string = 'Hey $name, there is a $error error!'
>>> Template(templ_string).substitute(
...     name=name, error=hex(errno))

'Hey Bob, there is a 0xbadc0ffee error!'

That worked great. So when should you use template strings in your Python programs? In my opinion the best use case for template strings is when you’re handling format strings generated by users of your program. Due to their reduced complexity template strings are a safer choice.
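Here’s a rough sketch of what handling a user-supplied template could look like. The render_user_template() helper and the sample templates are assumptions made up for this illustration; only Template, substitute() and safe_substitute() come from the standard library:

```python
from string import Template


def render_user_template(user_template, values):
    """Render a user-supplied template; return None if it can't be filled in."""
    try:
        return Template(user_template).substitute(values)
    except KeyError:
        # The template names a placeholder that is missing from values.
        return None
    except ValueError:
        # The template contains a malformed placeholder, such as a lone "$".
        return None


values = {'name': 'Bob', 'error': hex(50159747054)}

rendered = render_user_template('Hey $name, there is a $error error!', values)
unknown = render_user_template('Hey $nickname!', values)
malformed = render_user_template('That costs 5$', values)

# safe_substitute() leaves unknown placeholders in place instead of raising.
lenient = Template('Hey $name, meet $nickname!').safe_substitute(values)

print(rendered, unknown, malformed, lenient, sep='\n')
```

Whether you reject a bad template outright or fall back to safe_substitute() is a design choice that depends on how forgiving you want to be with user input.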

The more complex formatting mini-languages of the other string formatting techniques might introduce security vulnerabilities to your programs. For example, it’s possible for format strings to access arbitrary variables in your program.

That means, if a malicious user can supply a format string they can potentially leak secret keys and other sensitive information! Here’s a simple proof of concept of how this attack might be used:

>>> SECRET = 'this-is-a-secret'
>>> class Error:
...     def __init__(self):
...         pass
>>> err = Error()
>>> user_input = '{error.__init__.__globals__[SECRET]}'

# Uh-oh...
>>> user_input.format(error=err)

'this-is-a-secret'

See how a hypothetical attacker was able to extract our secret string by accessing the __globals__ dictionary? Scary, huh? Template Strings close this attack vector. And this makes them a safer choice if you’re handling format strings generated from user input:

>>> user_input = '${error.__init__.__globals__[SECRET]}'
>>> Template(user_input).substitute(error=err)

ValueError: Invalid placeholder in string: line 1, col 1

Which String Formatting Method Should I Use?

I totally get that having so much choice for how to format your strings in Python can feel very confusing. This is an excellent cue to bust out this handy flowchart infographic I’ve put together for you:

String Formatting in Python -- Flowchart

This flowchart is based on the following rule of thumb that I apply when I’m writing Python:


This article was filed under: programming and python.
