Travis Oliphant, who is involved with the really useful conda project, recently tweeted about David Beazley’s talk at PyCon in Chicago. Here’s the link to the whole thing, which is definitely worth watching.
There are a few great tips in there, all around the idea that you only have to use Python built-ins to do cool stuff. This seems like an interesting version of Jack White’s idea about how making things difficult for yourself can inspire creativity. Even better, there is a worked example looking at food hygiene info from Chicago!
I just thought I would pull out a few of my favourites, in case you don’t have time to watch the whole thing (although 1.5x speed on YouTube really helps!) and to help myself remember them.
Named Tuples – you can think of a tuple as a row out of a database/csv file. However, its elements are accessed by numeric index, e.g. `my_tuple[3]`, which makes the code much harder to read and debug. The answer to that is `namedtuple` from the collections module, which lets you access the elements of the tuple by attribute name, rather than index. Much more readable. The advantage over a dict is that it keeps the order, not sure what the difference with ordered dict is…
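As a rough sketch of the idea (the `Inspection` type and its field names are made up for illustration, not taken from the talk), a named tuple might look like this:

```python
from collections import namedtuple

# Hypothetical record type; the field names are invented for this example.
Inspection = namedtuple("Inspection", ["name", "address", "result"])

row = Inspection("Example Cafe", "123 Main St", "Pass")

# Access by attribute name instead of by position.
print(row.result)  # same value as row[2]

# A named tuple is still a tuple, so indexing and unpacking keep working.
name, address, result = row
```

The attribute version says what the field is, whereas `row[2]` makes you go and look it up.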
Counter – a counter is a dictionary that is meant for tabulation, and can be imported from the collections module. These are really useful when you have a list or a string and you want to know how many of each element there are, so if you say `Counter('ATGATCGATCGTACG')`, it will return `Counter({'A': 4, 'T': 4, 'G': 4, 'C': 3})`. Likewise, with a list of strings, ints, etc. Useful.
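A small sketch of Counter in use, with the same DNA string plus a made-up list of results:

```python
from collections import Counter

# Counting characters in a string.
bases = Counter("ATGATCGATCGTACG")
print(bases["A"])  # 4
print(bases["N"])  # 0, missing keys count as zero rather than raising KeyError

# Counting items in a list works the same way (illustrative data).
results = Counter(["Pass", "Fail", "Pass"])
print(results["Pass"])  # 2
```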
List comprehensions – these are really useful, and totally worth wrapping your head around. The syntax is a little ass backwards, so just work through them a few times and they are really quite straightforward. Essentially they are useful if you have a list and you want to get another list back which relies in some way on the initial list, e.g. filtering or transforming the initial list. They save a lot of faffing about with for loops.
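To make the filtering and transforming cases concrete, here is a minimal sketch with an arbitrary list of numbers, alongside the for loop that one of the comprehensions replaces:

```python
numbers = [1, 2, 3, 4, 5, 6]

# Transforming: a new list with every element squared.
squares = [n * n for n in numbers]

# Filtering: a new list with only the even elements.
evens = [n for n in numbers if n % 2 == 0]

# The filtering comprehension is equivalent to this loop.
evens_loop = []
for n in numbers:
    if n % 2 == 0:
        evens_loop.append(n)
```

Reading it as "give me `n` for each `n` in `numbers` if it is even" helps with the back-to-front feel of the syntax.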
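Default dict – `defaultdict` (also in the collections module) behaves like a normal dict, except that when you look up a key that isn't there yet, it is created automatically with a default value produced by whatever type or function you supplied, e.g. `list`, `int` or `Counter`. This is just a way of making things a bit simpler, since you don't have to check whether a key exists before updating it.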
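For example, grouping words by their length (an arbitrary illustration, not from the talk) needs no "is this key there yet?" check:

```python
from collections import defaultdict

# Each missing key gets a fresh empty list the first time it is looked up.
by_length = defaultdict(list)

for word in ["cat", "dog", "horse", "cow", "sheep"]:
    by_length[len(word)].append(word)

print(by_length[3])  # ['cat', 'dog', 'cow']
```

With a plain dict you would have to create the empty list yourself, or use `setdefault`, before appending.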
csv module – you can read a csv file which has a header into a list of dicts, with the header columns as keys, using `list(csv.DictReader(open('filename.csv')))`. Note that DictReader takes an open file (or any iterable of lines), not a filename. Handy!
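A self-contained sketch of this, using an in-memory `io.StringIO` with made-up rows as a stand-in for a real file so that it runs anywhere:

```python
import csv
import io

# Stand-in for an open csv file; the columns and rows are invented.
# With a real file you would use something like:
#     with open('filename.csv', newline='') as f:
#         rows = list(csv.DictReader(f))
sample = io.StringIO("name,result\nExample Cafe,Pass\nSample Diner,Fail\n")

rows = list(csv.DictReader(sample))

# Each row is a dict keyed by the header columns; all values are strings.
print(rows[0]["name"])    # Example Cafe
print(rows[1]["result"])  # Fail
```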
using exotic smashes of defaultdicts and Counters – this one took a while to get my head around, but is pretty elegant once you grok it. This code example shows that you can do in 4 lines what it otherwise takes 11 lines to do. Essentially the critical part is that defaultdict and Counter combinations allow you to simultaneously add to the dict and the counter in one line. Useful where you have two categorical variables and you want to get a breakdown of one by the other.
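Here is a minimal sketch of the defaultdict-plus-Counter idea. This is my own reconstruction with invented data and column names (`year`, `result`), not necessarily the code from the talk:

```python
from collections import Counter, defaultdict

# Hypothetical rows, e.g. as produced by csv.DictReader.
rows = [
    {"year": "2012", "result": "Pass"},
    {"year": "2012", "result": "Fail"},
    {"year": "2013", "result": "Fail"},
    {"year": "2012", "result": "Fail"},
]

# Outer dict: one Counter per year, created on first access.
by_year = defaultdict(Counter)

for row in rows:
    # One line both creates the year's Counter if needed and updates the tally.
    by_year[row["year"]][row["result"]] += 1

print(by_year["2012"]["Fail"])  # 2
```

The nested lookup works because a missing year yields an empty Counter, and a missing result inside that Counter starts from zero.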
“The advantage over a dict is that it keeps the order, not sure what the difference with ordered dict is…”
The difference is that tuples are immutable types – the references they contain can’t be altered. (A mutable type referred to inside a tuple can still be altered, of course.) This seems kind of pointless until you start looking at distributed or parallel computing, where you can simplify a lot of map-reduce style problems by treating individual processes as functions with no side effects, just return values. In this scenario tuples make good return values since they can’t be mutated, and that has positive implications for speed under the hood. Many times have I come to desire the presence of tuples in other languages!
Wow, interesting, thanks Justin!