I have just read Sean Eddy’s thought-provoking blog post on the problems with how biology is handling high-throughput sequencing, which you should definitely read.
One of his themes is that bioinformatics is going to be a key part of the 21st-century biologist’s toolkit, and that biologists doing sequencing experiments should be able to tinker in Perl or Python. After all, you wouldn’t dream of doing the wet-lab work required for a genomics experiment without knowing how to pipette or run and analyse a gel, so why would you generate big data without being able to write a few lines of code? (In this context, and possibly for most biologists, big data means more than can be comfortably analysed in an Excel spreadsheet.)
Nick Loman tweeted an excerpt from Sean’s blog post saying something along the lines of ‘bioinformatics should be more like kits’. I have to admit my first thought was ‘ugh, no!’. A string of black boxes where you put data in one end and get a result out of the other would be terrible! However, on reading the whole post, I think I can see where Sean is coming from.
Wet-lab kits aren’t designed to tell you the answer to your experiment; they are just a nice tool to help you on the way. They are a highly optimised, convenient and reliable way to extract DNA or RNA, run SDS-PAGE, etc. You could argue that something like SPAdes is already equivalent to a kit: data goes in one end, results come out the other. Yes, you can tweak various things if you know what you are doing, and yes, there are numerous caveats to the results, but the defaults will do a very good job on their own. Then something like LS-BSR is a neat package (also kit-like) for looking at core/accessory genomes, into which you could feed your assemblies. Then you have something that you can start to do some biology on!
However, there are two main problems I see with this.
1) Tool installation. Bioinformaticians often seem to forget about this, but installing bioinformatics tools on the command line is usually more daunting than running them! There is a lot of assumed knowledge, and when things go wrong at this stage it is very difficult to work out how to fix them. This is one reason why I think Docker is very exciting.
2) Lab recipes. If you go into any reasonably well-established wet lab, they will have protocols for most of the workhorse assays you will need to do, or step-by-step instructions from the kit manufacturer. However, when it comes to bioinformatics assays, there will not be the same infrastructure. Which of the 30 flags for your tool are vital (duh, why didn’t you use –exp_cov auto?!), and which are obscure? Two things I think will help with this are generic tool installation (see above) and blow-by-blow methods descriptions, probably in something like GitHub, associated with a paper.
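To make the idea of a bioinformatics ‘recipe’ a bit more concrete, here is a minimal sketch in Python of recording exactly what was run, so it can be committed alongside a paper. Everything named here is hypothetical and only for illustration: the make_recipe helper, the tool name ‘assembler’, its version and its flags do not refer to any real program.

```python
import json
import shlex
from datetime import date


def make_recipe(tool, version, args, note):
    """Build a small, human-readable record of one analysis step."""
    return {
        "tool": tool,
        "version": version,
        # The exact command line, quoted so it can be pasted into a shell.
        "command": shlex.join([tool, *args]),
        "note": note,
        "date": date.today().isoformat(),
    }


# Hypothetical tool name, version and flags, used only as placeholders.
recipe = make_recipe(
    tool="assembler",
    version="1.0.0",
    args=["--threads", "4", "--out", "assembly_dir", "reads.fastq"],
    note="Defaults used except for thread count.",
)

# Print the record as JSON; in practice this would be saved next to the results.
print(json.dumps(recipe, indent=2))
```

The point is not the format but the habit: the flags that mattered, and why, are written down in the same way a wet-lab protocol would be.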
Finally, we need to get biologists thinking about bioinformatics in the same way as wet-lab experiments, i.e. with positive and negative controls. These are even more important when you are first starting out and have no feel for what a reasonable answer looks like.
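As a rough illustration of what controls could look like in code, the sketch below runs a toy ‘analysis’ (counting occurrences of a motif) on a negative control that cannot contain the motif and on a positive control where the motif has been planted at known positions. The function names, the sequence length and the planted positions are all assumptions made up for this example; a real analysis would swap in the actual tool and suitable control data.

```python
import random


def count_motif(sequence, motif):
    """Toy 'analysis': count occurrences of motif, including overlapping ones."""
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        start = sequence.find(motif, start + 1)
    return count


def random_dna(length, rng, alphabet="ACGT"):
    """Return a random sequence drawn from the given alphabet."""
    return "".join(rng.choice(alphabet) for _ in range(length))


rng = random.Random(42)  # fixed seed so the controls are reproducible
motif = "GAATTC"

# Negative control: built from A and C only, so the motif cannot occur.
negative = random_dna(1000, rng, alphabet="AC")

# Positive control: the same background with the motif planted at known positions.
positions = [100, 500, 900]
bases = list(negative)
for p in positions:
    bases[p:p + len(motif)] = motif
positive = "".join(bases)

# The analysis should find nothing in the negative control
# and exactly the planted copies in the positive control.
assert count_motif(negative, motif) == 0
assert count_motif(positive, motif) == len(positions)
```

If either assertion fails, the problem lies with the analysis rather than the biology, which is exactly what a control is for.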