Python Training by Dan Bader

How to use Python’s min() and max() with nested lists

Let’s talk about using Python’s min and max functions on a list containing other lists. Sometimes this is referred to as a nested list or a list of lists.

Finding the minimum or maximum element of a list of lists [1] based on a specific property of the inner lists is a common situation that can be challenging for someone new to Python.

To give us a more concrete example to work with, let’s say we have the following list of (item, weight) pairs [2]:

nested_list = [['cherry', 7], ['apple', 100], ['anaconda', 1360]]

We want Python to select the minimum and maximum element based on each item’s weight stored at index 1. We expect min and max to return the following elements:

  • min(nested_list) should be ['cherry', 7]
  • max(nested_list) should be ['anaconda', 1360]

But if we simply call min and max on that nested list we don’t get the results we expected.

The ordering we get seems to be based on the item’s name, stored at index 0:

>>> min(nested_list)
['anaconda', 1360]  # Not what we expected!

>>> max(nested_list)
['cherry', 7]  # Not what we expected!

Alright, why does it pick the wrong elements?

Let’s stop for a moment to think about how Python’s max function works internally. The algorithm looks something like this:

def my_max(sequence):
    """Return the maximum element of a sequence"""
    if not sequence:
        raise ValueError('empty sequence')

    maximum = sequence[0]

    for item in sequence:
        if item > maximum:
            maximum = item

    return maximum

The interesting bit of behavior here can be found in the condition that selects a new maximum: if item > maximum:.

This condition works nicely if sequence only contains primitive types like int or float because comparing those is straightforward (in the sense that it’ll give an answer that we intuitively expect; like 3 > 2).

However, if sequence contains other sequences then things get a little more complex. Let’s look at the Python docs to learn how Python compares sequences:

Sequence objects may be compared to other objects with the same sequence type. The comparison uses lexicographical ordering: first the first two items are compared, and if they differ this determines the outcome of the comparison; if they are equal, the next two items are compared, and so on, until either sequence is exhausted.
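As a rough illustration of the rule quoted above, here is a small sketch that compares a few sequences directly. The variable names are made up for this example; the comparisons themselves are plain Python:

```python
# Lists are compared element by element, from left to right.
names_first = ['anaconda', 1360] < ['apple', 100]  # decided at index 0: 'anaconda' < 'apple'
numbers = [1, 2, 3] < [1, 3, 0]                    # index 0 ties, index 1 decides
prefix = [1, 2] < [1, 2, 0]                        # a shorter prefix sorts first

# Strings are sequences too, so they are compared character by character.
strings = 'anaconda' < 'apple' < 'cherry'

print(names_first, numbers, prefix, strings)
```

Note that the weights at index 1 never get looked at in the first comparison, because the names at index 0 already differ.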

When max needs to compare two sequences to find the “larger” element, Python’s default comparison behavior might not be what we want [3].

Now that we understand why we get an unexpected result we can think about ways to fix our code.

How can we change the comparison behavior?

We need to tell max to compare the items differently.

In our example, Python’s max looks at the first item in each inner list (the string cherry, apple, or anaconda) and compares it with the current maximum element. That’s why it returns cherry as the maximum element if we just call max(nested_list).

How do we tell max to compare the second item of each inner list?

Let’s imagine we had an updated version of my_max called my_max_by_weight that uses the second element of each inner list for comparison:

def my_max_by_weight(sequence):
    if not sequence:
        raise ValueError('empty sequence')

    maximum = sequence[0]

    for item in sequence:
        # Compare elements by their weight stored
        # in their second element.
        if item[1] > maximum[1]:
            maximum = item

    return maximum

That would do the trick! We can see that my_max_by_weight selects the maximum element we expected:

>>> my_max_by_weight(nested_list)
['anaconda', 1360]

Now imagine we needed to find the maximum of different kinds of lists.

Perhaps the index (or key) we’re interested in won’t always be the second item. Maybe sometimes it’ll be the third or fourth item, or a different kind of lookup is necessary altogether.

Wouldn’t it be great if we could reuse the bulk of the code in our implementation of my_max? Some parts of it will always work the same, for example checking if an empty sequence was passed to the function.

How can we make max() more flexible?

Because Python allows us to treat functions as data we can extract the code selecting the comparison key into its own function. We’ll call that the key func. We can write different kinds of key funcs and pass them to my_max as necessary.

This gives us complete flexibility! Instead of just being able to choose a specific list index for the comparison, like index 1 or 2, we can tell our function to select something else entirely – for example, the length of the item’s name.

Let’s have a look at some code that implements this idea:

def identity(x):
    return x

def my_max(sequence, key_func=None):
    """Return the maximum element of a sequence.

    key_func is an optional one-argument ordering function.
    """
    if not sequence:
        raise ValueError('empty sequence')

    if not key_func:
        key_func = identity

    maximum = sequence[0]

    for item in sequence:
        # Ask the key func which property to compare
        if key_func(item) > key_func(maximum):
            maximum = item

    return maximum

In the code example you can see how by default we let my_max use a key func we called identity, which just uses the whole, unmodified item to do the comparison.

With identity as the key func we expect my_max to behave the same way max behaves.

nested_list = [['cherry', 7], ['apple', 100], ['anaconda', 1360]]

>>> my_max(nested_list)
['cherry', 7]

And we can confirm that we’re still getting the same (incorrect) result as before, which is a pretty good indication that we didn’t screw up the implementation completely 😃.

Now comes the cool part – we’re going to override the comparison behavior by writing a key_func that returns the second sub-element instead of the element itself [4]:

def weight(x):
    return x[1]

>>> my_max(nested_list, key_func=weight)
['anaconda', 1360]

And voilà, this is the maximum element we expected to get!

Just to demonstrate the amount of flexibility this refactoring gave us, here’s a key_func that selects the maximum element based on the length of the item’s name:

def name_length(x):
    return len(x[0])

>>> my_max(nested_list, key_func=name_length)
['anaconda', 1360]

Is there a shorthand for this stuff?

Instead of defining the key func explicitly with def and giving it a name we can also use Python’s lambda keyword to define a function anonymously. This shortens the code quite a bit (and won’t create a named function):

>>> my_max(nested_list, key_func=lambda x: x[1])
['anaconda', 1360]

To make the naming a little slicker (albeit less expressive), imagine we shorten the key_func argument to key. With that change we arrive at a code snippet that works with the built-in max function in vanilla Python.

This means we’ll no longer need our own re-implementation of Python’s max function to find the “correct” maximum element:

# This is pure, vanilla Python:
>>> max(nested_list, key=lambda x: x[1])
['anaconda', 1360]

The same also works for Python’s built-in min:

>>> min(nested_list, key=lambda x: x[1])
['cherry', 7]

It even works for Python’s sorted function, making the “key func” concept really valuable in a number of situations you might face as a Python developer:

>>> sorted(nested_list, key=lambda x: x[1])
[['cherry', 7], ['apple', 100], ['anaconda', 1360]]
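As a further sketch of how far the key func idea carries, the snippet below sorts the same list heaviest-first and by name length. The helper name weight mirrors the one used earlier; heaviest_first and by_name_length are names picked only for this example, and reverse is a standard argument of sorted:

```python
nested_list = [['cherry', 7], ['apple', 100], ['anaconda', 1360]]

def weight(item):
    # The weight is stored at index 1 of each inner list.
    return item[1]

# Same key func, but sorted from heaviest to lightest.
heaviest_first = sorted(nested_list, key=weight, reverse=True)

# A different key func: order by the length of the item's name.
by_name_length = sorted(nested_list, key=lambda item: len(item[0]))

print(heaviest_first)
print(by_name_length)
```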

Try it out yourself

I hope this post helped you out. What started out as a simple question ended up being a little more involved than you may have expected. But it’s often like that when you learn about new programming concepts.

Feel free to drop me a line on Twitter or over email if you got stuck anywhere. I’d love to improve this tutorial over time :)

  1. Sometimes you’ll see tuples used for the inner lists. Using tuples instead of lists doesn’t really make a difference for how min and max work, but in some cases it can bring a performance benefit. Nothing we’ll have to worry about for now. The code in this tutorial will work fine on a list of tuples. 

  2. I actually googled these for you. Apparently the average cherry sold in a supermarket weighs 7 grams. I’m not 100 per cent sure about anacondas though. 

  3. Note that Python strings are also sequences so when you compare two strings they will be compared character by character. 

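  4. Because this is such a common piece of functionality the Python standard library includes often-used key func implementations in the operator module. You may want to check out operator.itemgetter and operator.attrgetter, which build one-argument key funcs for you, as well as operator.getitem, which takes the container and the index as two separate arguments.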
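To illustrate note 4, here is a minimal sketch that uses operator.itemgetter in place of a hand-written key func. It also uses tuples for the inner items, as mentioned in note 1. The names pairs, by_weight, lightest, and heaviest are made up for this example:

```python
from operator import itemgetter

# The same data as in the tutorial, stored as a list of tuples.
pairs = [('cherry', 7), ('apple', 100), ('anaconda', 1360)]

# itemgetter(1) returns a function that behaves like lambda x: x[1].
by_weight = itemgetter(1)

lightest = min(pairs, key=by_weight)
heaviest = max(pairs, key=by_weight)

print(lightest, heaviest)
```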


This article was filed under: programming, and python.
