Monday 29 December 2014

comprehending comprehensions

One of the features of Python I really like, and am using more and more, is its list comprehension.  This allows the items in a list to be manipulated as a whole, rather than having to explicitly iterate over them.

The basic form of a list comprehension is
[ expr for item in list ]
where expr is an expression in the variable item.
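To see what that shorthand stands for, here is a minimal sketch of the equivalent explicit loop. The names items, squares_loop and squares_comp are ones I have made up purely for illustration.

```python
# Illustrative data; any iterable would do here.
items = [1, 2, 3, 4]

# Explicit loop: build the result one item at a time.
squares_loop = []
for item in items:
    squares_loop.append(item * item)

# The same thing written as a list comprehension.
squares_comp = [item * item for item in items]

print(squares_loop == squares_comp)  # True
```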

For example, let’s say I have heard that the term x³ − x, where x is an integer, is always divisible by 3.  Before trying to prove (or disprove!) this, I want to have a quick look at a few examples to check there isn’t a really simple counter-example.  I can write:
>>> [ x * x * x - x for x in range(-4,10) ]
[-60, -24, -6, 0, 0, 0, 6, 24, 60, 120, 210, 336, 504, 720]
Well, that looks plausible, but maybe my mental arithmetic isn’t too hot, and I need confirmation that all those numbers really are divisible by 3.

A list comprehension can also take an optional condition, in which case it returns the expression only for those items where the condition is true.
[ expr for item in list if cond ]
A few of the integers not divisible by 3 are:
>>> [ x for x in range(-4,10) if x%3 != 0 ]
[-4, -2, -1, 1, 2, 4, 5, 7, 8]
Are any of the cubic expressions not divisible by 3?
>>> [ x for x in range(-4,10) if (x * x * x - x)%3 != 0 ]
[]
Okay, so it’s probably worth having a shot at the proof then.
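As a sketch of how I might widen the search first: run the same test over a bigger range, and wrap it in all() to get a single yes/no answer. The range bounds and the variable names are arbitrary choices of mine. This is still only evidence, not a proof, although the factorisation x*x*x - x = (x - 1)*x*(x + 1), a product of three consecutive integers, hints at where a proof might come from.

```python
# Arbitrary wider range, chosen just for illustration.
candidates = range(-1000, 1001)

# Collect any counter-examples to "x*x*x - x is divisible by 3".
counter_examples = [x for x in candidates if (x * x * x - x) % 3 != 0]

# all() over a generator expression gives a single True/False answer.
always_divisible = all((x * x * x - x) % 3 == 0 for x in candidates)

print(counter_examples)   # []
print(always_divisible)   # True
```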

Of course, comprehensions aren’t limited to lists of numbers.  Let’s say I have a list of strings, and want to get a list of the lengths of the long strings (more than 5 characters long, say).
>>> animals = [ 'cat', 'dog', 'horse', 'elephant', 'meerkat', 'ox']
>>> [ len(s) for s in animals ]
[3, 3, 5, 8, 7, 2]
>>> [ s for s in animals if 5 < len(s) ]
['elephant', 'meerkat']
>>> [ len(s) for s in animals if 5 < len(s) ]
[8, 7]

List comprehensions can be applied to nested lists, too. I might have a list of lists, let’s say a list containing a list of animals and a list of planets (why not?).
>>> planets = [ 'Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune', 'Pluto']
>>> big_list = [ animals, planets ]
>>> big_list
[['cat', 'dog', 'horse', 'elephant', 'meerkat', 'ox'], ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune', 'Pluto']]
(I’m sorry, but as far as I’m concerned, Pluto is indeed a planet.)

I can get a list of the lengths of all the long strings:
>>> [ s for lst in big_list for s in lst ]
['cat', 'dog', 'horse', 'elephant', 'meerkat', 'ox', 'Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune', 'Pluto']
>>> [ len(s) for lst in big_list for s in lst ]
[3, 3, 5, 8, 7, 2, 7, 5, 5, 4, 7, 6, 6, 7, 5]
>>> [ len(s) for lst in big_list for s in lst if 5 < len(s) ]
[8, 7, 7, 7, 6, 6, 7]
This way, I can process the nested lists as one big flattened list.
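The order of the for clauses can be confusing at first, so here is a sketch of the same computation as explicit nested loops. The clauses nest in the order they are written, left to right. The result name long_lengths is my own.

```python
animals = ['cat', 'dog', 'horse', 'elephant', 'meerkat', 'ox']
planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter',
           'Saturn', 'Uranus', 'Neptune', 'Pluto']
big_list = [animals, planets]

# Explicit version of:
#   [ len(s) for lst in big_list for s in lst if 5 < len(s) ]
long_lengths = []
for lst in big_list:        # first for clause is the outer loop
    for s in lst:           # second for clause is the inner loop
        if 5 < len(s):      # the trailing condition filters items
            long_lengths.append(len(s))

print(long_lengths)  # [8, 7, 7, 7, 6, 6, 7]
```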

But what if I didn’t want to flatten the list?  Well, since list comprehensions are themselves expressions, they can form the expression in a list comprehension!  Used this way, they maintain the structure of the original nesting.  So
>>> [ [s for s in lst] for lst in big_list ]
[['cat', 'dog', 'horse', 'elephant', 'meerkat', 'ox'], ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune', 'Pluto']]
>>> [ [len(s) for s in lst] for lst in big_list ]
[[3, 3, 5, 8, 7, 2], [7, 5, 5, 4, 7, 6, 6, 7, 5]]
>>> [ [len(s) for s in lst if 5 < len(s)] for lst in big_list ]
[[8, 7], [7, 7, 6, 6, 7]]
This approach becomes even more powerful when the list items are objects, and we need to operate on object attribute values.  Let’s assume that the list items above are not strings, but objects with a name attribute that is the relevant string.
>>> [ [s.name for s in lst if 5 < len(s.name)] for lst in big_list ]
[['elephant', 'meerkat'], ['Mercury', 'Jupiter', 'Saturn', 'Uranus', 'Neptune']]
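To make that runnable, here is a minimal sketch that builds such objects. The Thing type is a hypothetical stand-in of mine for "an object with a name attribute"; any class with a name attribute would do just as well.

```python
from collections import namedtuple

# Hypothetical stand-in for an object with a 'name' attribute.
Thing = namedtuple('Thing', ['name'])

animal_names = ['cat', 'dog', 'horse', 'elephant', 'meerkat', 'ox']
planet_names = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter',
                'Saturn', 'Uranus', 'Neptune', 'Pluto']

# Build the list of lists of objects, itself with a nested comprehension.
big_list = [[Thing(n) for n in names]
            for names in (animal_names, planet_names)]

# Pull out the long names, keeping the nested structure.
long_names = [[s.name for s in lst if 5 < len(s.name)] for lst in big_list]

print(long_names)
# [['elephant', 'meerkat'], ['Mercury', 'Jupiter', 'Saturn', 'Uranus', 'Neptune']]
```
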
When using a list comprehension as an expression inside another comprehension in this way, we don’t need to have it as the top-level expression.  Let’s assume that we want a list of numpy arrays of the lengths (maybe because we want to do numpy operations on the resulting arrays, such as averaging them).
>>> from numpy import array
>>> [ array([len(s.name) for s in lst if 5 < len(s.name)]) for lst in big_list ]
[array([8, 7]), array([7, 7, 6, 6, 7])]

List comprehensions are powerful, removing the need for explicit for loops that iterate over the list items.  This results in more compact, and often more efficient, code, and encourages thinking of lists as a whole.  Take care that the compact code remains comprehensible, however, especially with nested comprehensions.

And there are dictionary comprehensions, too.
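As a quick sketch of what one looks like, using the animals from earlier: the form is { key_expr: value_expr for item in list if cond }, with curly braces and a key: value pair in place of the single expression. The name lengths is my own.

```python
animals = ['cat', 'dog', 'horse', 'elephant', 'meerkat', 'ox']

# Map each long animal name to its length.
lengths = {s: len(s) for s in animals if 5 < len(s)}

print(lengths == {'elephant': 8, 'meerkat': 7})  # True
```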
