How to use Python’s min() and max() with nested lists
Let’s talk about using Python’s min and max functions on a list containing other lists. Sometimes this is referred to as a nested list or a lists of lists.
Finding the minimum or maximum element of a list of lists1 based on a specific property of the inner lists is a common situation that can be challenging for someone new to Python.
To give us a more concrete example to work with, let’s say we have the following list of item, weight pairs2:
nested_list = [['cherry', 7], ['apple', 100], ['anaconda', 1360]]
We want Python to select the minimum and maximum element based on each item’s weight stored at index 1. We expect min
and max
to return the following elements:
min(nested_list)
should be['cherry', 7]
max(nested_list)
should be['anaconda', 1360]
But if we simply call min
and max
on that nested list we don’t get the results we expected.
The ordering we get seems to be based on the item’s name, stored at index 0:
>>> min(nested_list) ['anaconda', 1360] # Not what we expected! >>> max(nested_list) ['cherry', 7] # Not what we expected!
Alright, why does it pick the wrong elements?
Let’s stop for a moment to think about how Python’s max
function works internally. The algorithm looks something like this:
def my_max(sequence): """Return the maximum element of a sequence""" if not sequence: raise ValueError('empty sequence') maximum = sequence[0] for item in sequence: if item > maximum: maximum = item return maximum
The interesting bit of behavior here can be found in the condition that selects a new maximum: if item > maximum:
.
This condition works nicely if sequence
only contains primitive types like int
or float
because comparing those is straightforward (in the sense that it’ll give an answer that we intuitively expect; like 3 > 2
).
However, if sequence
contains other sequences then things get a little more complex. Let’s look at the Python docs to learn how Python compares sequences:
Sequence objects may be compared to other objects with the same sequence type. The comparison uses lexicographical ordering: first the first two items are compared, and if they differ this determines the outcome of the comparison; if they are equal, the next two items are compared, and so on, until either sequence is exhausted.
When max
needs to compare two sequences to find the “larger” element then Python’s default comparison behavior might not be what we want3.
Now that we understand why we get an unexpected result we can think about ways to fix our code.
How can we change the comparison behavior?
We need to tell max
to compare the items differently.
In our example, Python’s max
looks at the first item in each inner list (the string cherry
, apple
, or anaconda
) and compares it with the current maximum element. That’s why it returns cherry
as the maximum element if we just call max(nested_list)
.
How do we tell max
to compare the second item of each inner list?
Let’s imagine we had an updated version of my_max
called my_max_by_weight
that uses the second element of each inner list for comparison:
def my_max_by_weight(sequence): if not sequence: raise ValueError('empty sequence') maximum = sequence[0] for item in sequence: # Compare elements by their weight stored # in their second element. if item[1] > maximum[1]: maximum = item return maximum
That would do the trick! We can see that my_max_by_weight
selects the maximum element we expected:
>>> my_max_by_weight(nested_list) ['anaconda', 1360]
Now imagine we needed to find the maximum of different kinds of lists.
Perhaps the index (or key) we’re interested in won’t always be the second item. Maybe sometimes it’ll be the third or fourth item, or a different kind of lookup is necessary all together.
Wouldn’t it be great if we could reuse the bulk of the code in our implementation of my_max
? Some parts of it will always work the same, for example checking if an empty sequence was passed to the function.
How can we make max() more flexible?
Because Python allows us to treat functions as data we can extract the code selecting the comparison key into its own function. We’ll call that the key func. We can write different kinds of key funcs and pass them to my_max
as necessary.
This gives us complete flexibility! Instead of just being able to choose a specific list index for the comparison, like index 1 or 2, we can tell our function to select something else entirely – for example, the length of the item’s name.
Let’s have a look at some code that implements this idea:
def identity(x): return x def my_max(sequence, key_func=None): """ Return the maximum element of a sequence. key_func is an optional one-argument ordering function. """ if not sequence: raise ValueError('empty sequence') if not key_func: key_func = identity maximum = sequence[0] for item in sequence: # Ask the key func which property to compare if key_func(item) > key_func(maximum): maximum = item return maximum
In the code example you can see how by default we let my_max
use a key func we called identity
, which just uses the whole, unmodified item to do the comparison.
With identity
as the key func we expect my_max
to behave the same way max
behaves.
nested_list = [['cherry', 7], ['apple', 100], ['anaconda', 1360]] >>> my_max(nested_list) ['cherry', 7]
And we can confirm that we’re still getting the same (incorrect) result as before, which is a pretty good indication that we didn’t screw up the implementation completely 😃.
Now comes the cool part – we’re going to override the comparison behavior by writing a key_func
that returns the second sub-element instead of the element itself4:
def weight(x): return x[1] >>> my_max(nested_list, key_func=weight) ['anaconda', 1360]
And voilà, this is the maximum element we expected to get!
Just to demonstrate the amount of flexibility this refactoring gave us, here’s a key_func
that selects the maximum element based on the length of the item’s name:
def name_length(x): return len(x[0]) >>> my_max(nested_list, key_func=name_length) ['anaconda', 1360]
Is there a shorthand for this stuff?
Instead of defining the key func explicitly with def
and giving it a name we can also use Python’s lambda
keyword to define a function anonymously. This shortens the code quite a bit (and won’t create a named function):
my_max(nested_list, key_func=lambda x: x[1]) >>> ['anaconda', 1360]
To make the naming a little slicker (albeit less expressive) imagine we’ll shorten the key_func
arg to key
and we’ve arrived at a code snippet that works with the max
function in vanilla Python.
This means we’ll no longer need our own re-implementation of Python’s max
function to find the “correct” maximum element:
# This is pure, vanilla Python: >>> max(nested_list, key=lambda x: x[1]) ['anaconda', 1360]
The same also works for Python’s built-in min
:
>>> min(nested_list, key=lambda x: x[1]) ['cherry', 7]
It even works for Python’s sorted
function, making the “key func” concept really valuable in a number of situations you might face as a Python developer:
>>> sorted(nested_list, key=lambda x: x[1]) [['cherry', 7], ['apple', 100], ['anaconda', 1360]]