Finding the minimum or maximum element of a list of lists1 based on a specific property of the inner lists is a common situation that can be challenging for someone new to Python.
To give us a more concrete example to work with, let’s say we have the following list of item, weight pairs2:
nested_list = [['cherry', 7], ['apple', 100], ['anaconda', 1360]]
We want Python to select the minimum and maximum element based on each item’s weight stored at index 1. We expect
max to return the following elements:
But if we simply call
max on that nested list we don’t get the results we expected.
The ordering we get seems to be based on the item’s name, stored at index 0:
>>> min(nested_list) ['anaconda', 1360] # Not what we expected! >>> max(nested_list) ['cherry', 7] # Not what we expected!
Alright, why does it pick the wrong elements?
Let’s stop for a moment to think about how Python’s
max function works internally. The algorithm looks something like this:
def my_max(sequence): """Return the maximum element of a sequence""" if not sequence: raise ValueError('empty sequence') maximum = sequence for item in sequence: if item > maximum: maximum = item return maximum
The interesting bit of behavior here can be found in the condition that selects a new maximum:
if item > maximum:.
This condition works nicely if
sequence only contains primitive types like
float because comparing those is straightforward (in the sense that it’ll give an answer that we intuitively expect; like
3 > 2).
sequence contains other sequences then things get a little more complex. Let’s look at the Python docs to learn how Python compares sequences:
Sequence objects may be compared to other objects with the same sequence type. The comparison uses lexicographical ordering: first the first two items are compared, and if they differ this determines the outcome of the comparison; if they are equal, the next two items are compared, and so on, until either sequence is exhausted.
max needs to compare two sequences to find the “larger” element then Python’s default comparison behavior might not be what we want3.
Now that we understand why we get an unexpected result we can think about ways to fix our code.
How can we change the comparison behavior?
We need to tell
max to compare the items differently.
In our example, Python’s
max looks at the first item in each inner list (the string
anaconda) and compares it with the current maximum element. That’s why it returns
cherry as the maximum element if we just call
How do we tell
max to compare the second item of each inner list?
Let’s imagine we had an updated version of
my_max_by_weight that uses the second element of each inner list for comparison:
def my_max_by_weight(sequence): if not sequence: raise ValueError('empty sequence') maximum = sequence for item in sequence: # Compare elements by their weight stored # in their second element. if item > maximum: maximum = item return maximum
That would do the trick! We can see that
my_max_by_weight selects the maximum element we expected:
>>> my_max_by_weight(nested_list) ['anaconda', 1360]
Now imagine we needed to find the maximum of different kinds of lists.
Perhaps the index (or key) we’re interested in won’t always be the second item. Maybe sometimes it’ll be the third or fourth item, or a different kind of lookup is necessary all together.
Wouldn’t it be great if we could reuse the bulk of the code in our implementation of
my_max? Some parts of it will always work the same, for example checking if an empty sequence was passed to the function.
How can we make max() more flexible?
Because Python allows us to treat functions as data we can extract the code selecting the comparison key into its own function. We’ll call that the key func. We can write different kinds of key funcs and pass them to
my_max as necessary.
This gives us complete flexibility! Instead of just being able to choose a specific list index for the comparison, like index 1 or 2, we can tell our function to select something else entirely – for example, the length of the item’s name.
Let’s have a look at some code that implements this idea:
def identity(x): return x def my_max(sequence, key_func=None): """ Return the maximum element of a sequence. key_func is an optional one-argument ordering function. """ if not sequence: raise ValueError('empty sequence') if not key_func: key_func = identity maximum = sequence for item in sequence: # Ask the key func which property to compare if key_func(item) > key_func(maximum): maximum = item return maximum
In the code example you can see how by default we let
my_max use a key func we called
identity, which just uses the whole, unmodified item to do the comparison.
identity as the key func we expect
my_max to behave the same way
nested_list = [['cherry', 7], ['apple', 100], ['anaconda', 1360]] >>> my_max(nested_list) ['cherry', 7]
And we can confirm that we’re still getting the same (incorrect) result as before, which is a pretty good indication that we didn’t screw up the implementation completely 😃.
Now comes the cool part – we’re going to override the comparison behavior by writing a
key_func that returns the second sub-element instead of the element itself4:
def weight(x): return x >>> my_max(nested_list, key_func=weight) ['anaconda', 1360]
And voilà, this is the maximum element we expected to get!
Just to demonstrate the amount of flexibility this refactoring gave us, here’s a
key_func that selects the maximum element based on the length of the item’s name:
def name_length(x): return len(x) >>> my_max(nested_list, key_func=name_length) ['anaconda', 1360]
Is there a shorthand for this stuff?
Instead of defining the key func explicitly with
def and giving it a name we can also use Python’s
lambda keyword to define a function anonymously. This shortens the code quite a bit (and won’t create a named function):
my_max(nested_list, key_func=lambda x: x) >>> ['anaconda', 1360]
To make the naming a little slicker (albeit less expressive) imagine we’ll shorten the
key_func arg to
key and we’ve arrived at a code snippet that works with the
max function in vanilla Python.
This means we’ll no longer need our own re-implementation of Python’s
max function to find the “correct” maximum element:
# This is pure, vanilla Python: >>> max(nested_list, key=lambda x: x) ['anaconda', 1360]
The same also works for Python’s built-in
>>> min(nested_list, key=lambda x: x) ['cherry', 7]
It even works for Python’s
sorted function, making the “key func” concept really valuable in a number of situations you might face as a Python developer:
>>> sorted(nested_list, key=lambda x: x) [['cherry', 7], ['apple', 100], ['anaconda', 1360]]