There is a certain amount of snootiness amongst bioinformaticians when it comes to Excel. If a graph with the default excel colour scheme is showed at a conference there may be sideways glances and sniggers amongst the technorati. However, excel remains the most commonly used tool for bioinformatics (citation needed). When lab staff first join our section of the lab for a 3-month placement, or visiting students with no bioinfx experience arrive, the first things I show them are 1) tech support 2) useful papers and 3) five useful things in excel. Because I hate repeating myself, I thought I would write them down for future reference.
1) Ok, just so we are on the same page, you do know about the use of ctrl, shift and the direction arrows right? No? Give yourself a firm slap in the face, and then read the link. Also, double clicking is super useful.
2) Right, now that’s out the way – VLOOKUP
You have two tables, that have some overlap, but that aren’t exactly the same. You want to transfer information from one to the other – how to do it? You could do a load of ‘finds’ or sort both lists and transfer it by hand, but c’mon, it’s the 21st century! Use a vlookup! This takes a name and another table, and looks up the name in the table. Then, it will move over the information from the column of your choosing to be next to the original. Simple, but can be a real time saver.
3) Pivot tables
Pivot tables can produce those ‘ah ha!’ moments people get when they realise they have been doing something wrong their whole lives. Say you have a couple of columns of information and you want to see how they co-relate. For example, the antibiotic resistance profile and the MLST sequence type for a bunch of strains. You could sort them, do some counts by hand etc, but then if you change the data you are back to square 1. Use a pivot table instead! Quick as a flash you will have a breakdown of r-type by ST. You can add more columns to the data, use other functions like counting, summing etc. Another use case I like is to compare lists i.e. what is common and unique between two lists. Take your two lists of info, paste them into a single column, give them arbitrary names in the next column, e.g. list1 and list2. Then do a pivot table and you have the comparison between the lists. NB you might have to add a ‘frequency’ column, where you set every cell to 1 to use the count function in the pivot table. Pivot tables are a great way of summarising your data, as they count up each instance of each of your fields at once.
4) Use the filter function
People often use excel sheets as ‘databases’ to store and analyse data (btw, you shouldn’t do this for anything project critical, learn microsoft access or equivalent). One of the many awful things that can happen if you are doing this is you accidentally select only a subset of your table to sort, which means your entire table is fucked. You can guard against this by using the filter function, which will lock your table into a single entity (this is a good thing). It also provides loads of useful filtering functions (as the name implies) where you can exclude things based on whether they are higher, lower, or equal to a number; or whether they start or end with a certain word. Very very useful.
5) text to columns
Finally, got some gene name of interest or other bit of data buried in a longer word? Use text to columns (use the delimited option) to split the text based on anything e.g. spaces, semi-colons, periods etc. Then, if that hasn’t quite got you there, use left or right to get the left or rightmost characters out of a text box.
Here is a file with some examples of the above, and when in doubt, google!