CLIMB is fricking awesome
Last weekend, the MRC CLIMB initiative hosted a hackathon with the broad aim of doing some cool stuff with the CLIMB resource, which Nick Loman has recently got up and running at Birmingham. There was lots of pizza, beer and BBQ, as well as hacking.
One of the things we wanted to achieve was to set up our SNP calling pipeline so that it can be easily used by other people. CLIMB is the perfect place to do this, for reasons I will go into below.
What is CLIMB?
CLIMB is a cloud infrastructure for microbial bioinformatics. The cloud basically means that anyone with an internet connection can log on to a big ass server and create an ‘instance’, which is essentially your own computer in the sky. Normally, you have to pay Amazon or Google a not insignificant amount to use such things, but CLIMB gives it to you for free.
The really brilliant thing about the cloud is that a bioinformatician can set up a machine just right, install all the tools needed for their pipeline, and then create an ‘image’ of that machine. This image can then be loaded by anyone when they start up an instance. The tools get used and there are no install headaches – everyone wins!
I thought you said SNPs?
So how does this relate to SNPs? Well, I have created an image of our SNP calling pipeline. Essentially, all this involved was installing and configuring the requirements (bwa, GATK, various Python libraries, postgresql) and the Python scripts that wrap it all together. This means that instead of having to install the correct version of every dependency for our pipeline, you can just load an instance on CLIMB, base it on the phe-gastro-snapperdb image, and have everything set up and ready to use. It is all a bit rough and ready at the moment, and not quite ready for prime time, but if anyone does use it, I would be interested to hear about their experiences.
Is it useful in real life?
It sure is! For example, we can set up an instance with some of our strains and their SNPs in a SNPdb. A collaborator can then upload some of their own strains to their own CLIMB instance, which has our pipeline installed (by us), and run the pipeline, with the results going into the same SNPdb. Everyone retains control of their data and does their own analysis, and yet we get all the advantages of collaboration.
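To make the shared-SNPdb idea a bit more concrete, here is a toy sketch. It is not the real pipeline or its schema: it uses an in-memory SQLite database as a stand-in for postgresql, the table layout and the function names (make_snpdb, add_strain, snp_distance) are made up for illustration, and the variant calls are typed in by hand rather than produced by bwa and GATK. It only shows the principle of two groups writing strains into one database and then comparing them.

```python
import sqlite3


def make_snpdb():
    """Create a toy shared SNP database (in-memory SQLite, hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE variants ("
        "strain TEXT, owner TEXT, position INTEGER, base TEXT, "
        "PRIMARY KEY (strain, position))"
    )
    return conn


def add_strain(conn, strain, owner, snps):
    """Store one strain's variant calls; snps maps reference position -> called base."""
    conn.executemany(
        "INSERT INTO variants VALUES (?, ?, ?, ?)",
        [(strain, owner, pos, base) for pos, base in snps.items()],
    )
    conn.commit()


def snp_distance(conn, strain_a, strain_b):
    """Count positions where two strains differ (no entry means the reference base)."""

    def calls(strain):
        cursor = conn.execute(
            "SELECT position, base FROM variants WHERE strain = ?", (strain,)
        )
        return dict(cursor.fetchall())

    a, b = calls(strain_a), calls(strain_b)
    return sum(1 for pos in set(a) | set(b) if a.get(pos) != b.get(pos))


# We load one of our strains; a collaborator adds one of theirs to the same database.
conn = make_snpdb()
add_strain(conn, "ours_1", "us", {101: "A", 250: "T"})
add_strain(conn, "theirs_1", "collaborator", {101: "A", 300: "G"})

# The strains share the SNP at 101 and differ at 250 and 300.
print(snp_distance(conn, "ours_1", "theirs_1"))  # prints 2
```

The owner column is there only to hint at how each group's strains could stay identifiable inside a shared database; how the real SNPdb handles this may well be different.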
Cloud – computers in the sky! (actually Birmingham)
Instance – your own slice of the cloud, a ‘virtual machine’ where you can install programs and generally run amok.
Image – basically, a pre-configured virtual machine, with lots of useful stuff (hopefully) installed on it.
CLIMB – a special cloud for microbial bioinformatics, aren’t we lucky?