I have just read Sean Eddy's thought-provoking blog post on the problems with how biology is handling high-throughput sequencing; you should definitely read it.
One of his themes is that bioinformatics is going to be a key part of the 21st-century biologist's toolkit, and that biologists doing sequencing experiments should be able to tinker in perl or python. After all, you wouldn't dream of doing the wet-lab work required for a genomics experiment without knowing how to pipette or run/analyse a gel, so why would you generate big data without being able to write a few lines of code? (In this context, and possibly for most biologists, 'big data' means more than can be comfortably analysed in an Excel spreadsheet.)
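As a taste of what 'a few lines of code' buys you, here is a sketch that summarises a FASTQ file far too big for a spreadsheet. I've used awk here, but perl or python would do just as well, and the file name is made up:

```bash
# Count the reads and mean read length in a FASTQ file.
# FASTQ records are four lines each; line 2 of every record is the sequence.
awk 'NR % 4 == 2 { n++; total += length($0) }
     END { print n " reads, mean read length " total/n }' reads.fastq
```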
Nick Loman tweeted an excerpt from Sean's blog post saying something along the lines of 'bioinformatics should be more like kits'. I have to admit my first thought was 'ugh, no!' A string of black boxes where you put data in one end and get a result out of the other would be terrible! However, on reading the whole post, I think I can see where Sean is coming from.
Wet-lab kits aren't designed to tell you the answer to your experiment; they are just a nice tool to help you on the way. They are a highly optimised, convenient and reliable way to extract DNA/RNA, run SDS-PAGE, etc. You could argue that something like SPAdes is already equivalent to a kit: data goes in one end, results come out the other. Yes, you can tweak various things if you know what you are doing, and yes, there are numerous caveats to the results, but the defaults will do a very good job on their own (see the sketch below). Then something like LS-BSR is a neat package (also kit-like) for looking at core/accessory genomes, into which you could feed your assemblies. Then you have something that you can start to do some biology on!
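To illustrate the kit-like feel, a minimal SPAdes run with all defaults; the read file names here are hypothetical:

```bash
# Paired-end reads in, assembly out, no tweaking -- the 'kit' experience.
spades.py -1 reads_1.fastq.gz -2 reads_2.fastq.gz -o assembly_out
# The assembled contigs end up in assembly_out/contigs.fasta
```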
However, there are two main problems I see with this.
1) Tool installation. Bioinformaticians often seem to forget about this, but installing bioinformatics tools on the command line is usually more daunting than running them! There is a lot of assumed knowledge, and when things go wrong at this stage it is very difficult to fix. This is one reason why I think Docker is very exciting (see the sketch below).
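A sketch of why Docker helps, assuming someone has published an image for your tool (the image name below is made up; in practice you would use whatever the project provides):

```bash
# Pull a pre-built image instead of wrestling with compilers and dependencies
docker pull example/spades:3.5.0

# Run the tool with the current directory mounted into the container at /data
docker run -v "$(pwd)":/data example/spades:3.5.0 \
    spades.py -1 /data/reads_1.fastq.gz -2 /data/reads_2.fastq.gz \
    -o /data/assembly_out
```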
2) Lab recipes. If you go into any reasonably well-established wet lab, they will have protocols for most of the workhorse assays you will need to do, or step-by-step instructions from the kit manufacturer. When it comes to bioinformatics assays, however, there will not be the same infrastructure. Which of the 30 flags for your tool are vital (d'uh, why didn't you use -exp_cov auto?!), and which are obscure? Two things I think will help here are generic tool installation (see above) and blow-by-blow methods descriptions, probably in something like GitHub, associated with a paper (see the sketch below).
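Something as simple as a commented shell script committed alongside the paper would go a long way. A sketch using the Velvet flags mentioned above (the file names, k-mer value and cutoff choices are illustrative, not recommendations):

```bash
#!/usr/bin/env bash
# methods.sh -- the exact commands used for the assembly in the paper
# Velvet; k-mer length 31 chosen after a k-sweep (not shown here)
velveth assembly_dir 31 -shortPaired -fastq -separate reads_1.fq reads_2.fq

# The flags that actually matter:
#   -exp_cov auto     estimate expected k-mer coverage from the data
#   -cov_cutoff auto  set the coverage cutoff automatically
velvetg assembly_dir -exp_cov auto -cov_cutoff auto -min_contig_lgth 200
```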
Finally, we need to get biologists thinking about bioinformatics in the same way as wet-lab experiments, i.e. with positive and negative controls. These are even more important when you are first starting out and have no feel for what a reasonable answer looks like.
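A sketch of what a positive control could look like, assuming wgsim (a read simulator) is installed; the file names are made up:

```bash
# Positive control: simulate reads from a genome you already know,
# push them through the same pipeline, and check you get that genome back.
wgsim -N 100000 known_ref.fasta pos_1.fq pos_2.fq > /dev/null
spades.py -1 pos_1.fq -2 pos_2.fq -o positive_control
# If positive_control/contigs.fasta doesn't closely match known_ref.fasta,
# distrust the pipeline before you distrust your real sample.
```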
Dare I point out that wet-lab mol. bio. kits are (mostly) made by companies who have successfully commoditised, mass-produced and standardised them.
Is it crazy to suggest that this will (and needs to) increasingly occur with bioinformatic tools? That while bioinformaticians in research will continue to make bespoke tools and develop new methods – and some biologists will prefer to use or make these themselves too – there will be a significant and growing market for commercially produced kits too?
I accept this is far from the current situation, but I do wonder whether there could be less focus on teaching reluctant biologists to use the command line and more on producing 'kits' that allow them to run the analyses they need.
To be clear… I speak as a lazy ex-experimentalist who has loathed having to learn to code :) I also salute the efforts of you and your colleagues at PHE to at least meet people like me halfway by using Galaxy :)
Yes, I did think about addressing this in the post but didn't want it to sprawl into a 2,000-word rant.
I think one reason the commoditisation model will not be so widely adopted in bioinformatics is that there is an open source culture in bioinfx inherited from comp sci/the open source movement, which is a good thing! Most bioinformaticians have an innate discomfort with proprietary solutions. Saying that, I think there is definitely an argument for enterprise grade bioinfx software, perhaps especially in NHS/clinical labs where there are all sorts of accreditation issues that academic developers aren’t used to dealing with and no history of DIY bioinformatics.
Btw, being lazy is a great start to being a bioinformatician; it is all about enlightened laziness, i.e. if I take the effort to learn to code, I won't have to do it manually 100 times 😉
Thanks for the reply Phil. I think you've picked out the important distinctions. For those with the skills (and the will to get the skills), the open source, home-baked approach is likely to be the most powerful and flexible (particularly in research). However, there should also be a route to doing high-quality genomic analysis that does not require you to acquire those skills.
I am of course coming at this from a health service planning perspective and am still in multiple minds about whether we should support outsourced, enterprise-provided black-box solutions, attempt to meet our needs through in-house capacity development (PHE or clinical bioinformatics STP style), or a bit of both. I share the concerns of academic bioinformaticians about the enterprise black box, but also wonder how we're going to find (and afford) enough 'command line ninjas' to develop and maintain bespoke tools for every health service genomics, molecular path and micro lab that wants them over the next few years. Impressive as the PHE set-up is in terms of bioinformatics capacity and expertise, I'm not sure it can or will be replicated beyond Colindale…
Thanks for the discussion. Illuminating:)
“Saying that, I think there is definitely an argument for enterprise grade bioinfx software, perhaps especially in NHS/clinical labs where there are all sorts of accreditation issues that academic developers aren’t used to dealing with and no history of DIY bioinformatics.”
I wouldn't put too much faith in commercial developers understanding the accreditation issues involved in an NHS Molecular Genetics lab either. An idea of what's happening to the data from the sequencers, and how the reads are aligned, metrics generated and variants called, is pretty essential to producing a valid report for the Clinicians.
If you're going to rely on an external company to provide this as a service, you'll never be sure whether they have changed part of the process unless you have someone in the lab to keep an eye on it – and that's based on our own unfortunate experiences with not just one but multiple commercial companies (from variant annotation software through to all-in-one solutions).
At least with open source academic software you can follow what the software is doing and be able to fix any issues without going back to a commercial company which might roll out a fix in their next annual upgrade for only £20,000 extra for a new “feature”.
Hopefully training up Clinical Bioinformaticians will include modules on how Linux works (beyond abstract descriptions, with actual hands-on experience), more scripting experience now they have moved from Java to Python (nothing wrong with Java, it's just not as easy to jump into from scratch), etc.
Hi, thanks for your comment.
I have a lot of time for your point of view i.e. open source is better, and I have heard of similar experiences with commercial developers. I think there is a lot of uncertainty around who is going to provide the software side of NGS analysis.
Clinical Bioinformatics seems like a very thorough training scheme but is sadly lacking any microbiology focus (https://bitsandbugs.org/2014/03/20/clinical-bioinformatics-for-microbiology/). There is also an understandably limited number of trainees per year (as there are not many people who can train them), so I'm not sure how much of an impact they will have in the next few years. Also, expecting them to produce high-quality (GUI) software is a big ask.
Do you think that the Red Hat model of nicely wrapping and supporting open source tools for businesses could have some legs in this area? Is there anyone already doing this in the healthcare sector?
Hi Phil (sorry should have said Garan here)
Not sure why the focus was purely on human molecular genetics for the STP Clinical Bioinformatics pathway – it does seem to leave a lot for the STP Microbiology people to catch up on.
We have two trainees here atm; producing the sort of software that a commercial company would sell is beyond the remit of the training scheme, but being able to understand and possibly contribute to open source alternatives might well fit in nicely with the aims at the end of the training. They do have to produce database applications and some web applications, and it's probably not a huge jump to framework-based web applications like https://github.com/pasted/clinical_variant_database
The Red Hat, service-oriented type of commercial offering could work, but given the prices we've had from Appistry for GATK I'm not so sure that many companies (especially US-based ones) understand the kind of constraints the NHS budget is under. Ideally there would be a centralised NHS software development / sysadmin team to help develop, maintain and install the required software and OS across the entire NHS, contributing to various open source projects on a quid pro quo basis.
I think many of the commercial offerings incorporate academic software as part of their overall package, whether wholesale or by re-interpreting common algorithms / services. Even more use databases and web services produced from publications, such as HGMD / ClinVar. So it's hard to argue that many of these don't already charge mainly for the support they provide rather than for the software they have developed.
I guess that you guys already have a decent pipeline up and running for the WGS epidemiology stuff.
To be honest, once you start diving into enterprise-grade software you would probably want a department in the NHS that deals with it. Otherwise you will end up with different hospitals using/buying their own software/package tools, meaning that standards will differ from one place to the next. An internal department would provide uniform access to the same tools (through Galaxy or otherwise) and would be responsible for those tools being accredited (either using internal manpower or going back to the original producers of the tools). Finally, one could staff the department with applied/clinical bioinformaticians, computer scientists and testers. As it would be publicly funded, what they produce would be open source.
In my opinion.
Sounds good to me Alex!
What do you think of the RedHat for bioinfx idea?
I think that would be really cool to have that. 🙂
😀 Alex beat me to it with the centralised NHS bioinformatics development – I guess separate NHS trusts would have the budget or requirements for this sort of thing
Hi Garan! (WordPress isn't letting me reply to your above comment)
Apparently the Clinical Bioinformatics syllabus was put together with the 100,000 (human) Genomes Project in mind, hence the focus on human genetics.
Yeah right! With separate budgets and/or goals, individual Trusts may suffer from short-sightedness with regards to data and tools, so I would always be in favour of a unified and necessarily centralised department/group because, as you have mentioned, different trusts may have different goals and priorities :). I would be very excited to see something like that happening. But the whole patient records system saga suggests I will be waiting for some time!
Agree with Alex about the separate budgets part – my post should have read "separate NHS trusts wouldn't have the budget or requirements". I think probably the only way would be to have an arm's-length quango, like the Health Protection Agency used to be, with funding directly from the government or the MRC.
The central NHS bioinformatics solution is attractive, but with the current direction of the NHS (privatisation) it would likely be easier to set up a 'company' to do it – then you become the black box! However, currently no one size fits all in terms of the tests NHS labs do and the analyses they require; labs have already gone their separate ways and done lots of arduous validation utilising 'lone bioinformaticians'! Perhaps the move to WES and WGS in the near future will change this and GeL 100K (Illumina) will standardise the tests and analyses. However, labs will then no doubt need to be able to do the 'next' thing, or omics, so I think the in-house bioinformatician (the STP route) does have legs. Building a community where lone bioinformaticians can share issues and examples of best practice (e.g. Docker containers) is critical to avoiding mistakes.
Hi Chris. Yeah, I think the community building aspect is vital to prevent duplicating effort. Do you have anything in mind for that?
The current group of NHS-based bioinformaticians working in Regional Genetics labs across the UK (n < 20) have started an informal group, "NHS-NGS". The group largely formed off the back of the Bioinformatics STP training programme, as we were all suddenly asked to be the new trainees' trainers. Preliminary discussions around forming a "professional body" through talking to the Association for Clinical Genetic Science (ACGS) are underway, but there's no telling how long this will take or what will come of it.