HelixIO for sample classification

TL;DR

  • HelixIO is a slick platform for telling you what is in your fastq sample – very useful for microbiologists!
  • It will be interesting to see how it stacks up against KmerID on a larger number of samples.
  • Interesting ycombinator thread

HelixIO have recently launched a public beta of their intriguing bioinformatics platform for ‘fast, portable and scalable’ sequence analysis. You definitely couldn’t accuse them of lacking vision for their platform: they say their target applications include ‘clinical medicine, food safety and biosecurity’. C’mon guys, leave something for the rest of us to do! 😉

While I doubt that this method would be really useful for brass tacks public health micro (e.g. outbreak detection), I’m interested in what it can do in terms of assessing purity of sequence data.

As a very quick test, I ran two fastq files through their system: one which our existing KmerID methodology had detected as mixed, and one which came back as pure.

HelixIO returned the same results as our existing methodology, and seemed very speedy (I wasn’t really paying attention, but < 1 min) at processing ~100 Mb of gzipped fastqs. I would also say that their interface is very slick, and a lot of thought has been put into their output format (very much OSX compared with NCBI-BLAST’s Windows 95).

In the pure sample, 93.6% of reads were ‘Salmonella enterica’, while 2.8% were ‘Salmonella’ and 2.6% were ‘Enterobacteriaceae’. Presumably, those ‘Salmonella’ and ‘Enterobacteriaceae’ reads are ambiguous across species/genera; I find it reassuring that they return that kind of hierarchical result. The mixed sample returned 81% ‘Salmonella enterica’, 3.9% ‘Enterobacteriaceae’ and 3.1% ‘Bacteroides fragilis’, the same contaminant that our KmerID found. The $64,000 question is, at what threshold is a sample mixed? They show a taxonomic chart of the top 90% of reads, but I doubt this has a ‘functional’ implication.
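To make the threshold question a bit more concrete, here is a rough Python sketch of one way you could turn those read percentages into a pure/mixed call. To be clear, this is not how HelixIO or KmerID actually decide anything: the `LINEAGE` table is hand-written just for these two samples, the function names are made up, and the 1% default cut-off is an arbitrary assumption. The idea is simply that reads assigned to a higher rank on the dominant species' own lineage (‘Salmonella’, ‘Enterobacteriaceae’) count as ambiguous rather than as contamination, and only reads from elsewhere in the tree count towards ‘mixed’.

```python
# Hypothetical, hand-written lineages (family -> genus -> species) for the taxa
# reported in this post. A real tool would look these up in a taxonomy database.
LINEAGE = {
    "Salmonella enterica": ["Enterobacteriaceae", "Salmonella", "Salmonella enterica"],
    "Salmonella": ["Enterobacteriaceae", "Salmonella"],
    "Enterobacteriaceae": ["Enterobacteriaceae"],
    "Bacteroides fragilis": ["Bacteroidaceae", "Bacteroides", "Bacteroides fragilis"],
}


def off_lineage_percent(read_percents):
    """Sum the % of reads assigned to taxa unrelated to the dominant taxon.

    A taxon is treated as compatible if it is an ancestor or a descendant
    of the taxon with the most reads (or is that taxon itself).
    """
    dominant = max(read_percents, key=read_percents.get)
    total = 0.0
    for taxon, pct in read_percents.items():
        is_ancestor = taxon in LINEAGE[dominant]
        is_descendant = dominant in LINEAGE[taxon]
        if not (is_ancestor or is_descendant):
            total += pct
    return total


def is_mixed(read_percents, threshold=1.0):
    """Call a sample mixed if off-lineage reads reach `threshold` percent.

    The default of 1.0 is an arbitrary assumption, not a validated cut-off.
    """
    return off_lineage_percent(read_percents) >= threshold


# Percentages as reported above (the remainder of each sample is not reported).
pure = {"Salmonella enterica": 93.6, "Salmonella": 2.8, "Enterobacteriaceae": 2.6}
mixed = {"Salmonella enterica": 81.0, "Enterobacteriaceae": 3.9, "Bacteroides fragilis": 3.1}

print(is_mixed(pure), is_mixed(mixed))
```

With the 1% cut-off the two samples come out the way KmerID called them, but push the threshold up to 5% and the ‘mixed’ sample would pass as pure, which is exactly why the choice of threshold matters.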

It will definitely be worth keeping an eye on this software, and perhaps I will see how it stacks up against KmerID with a larger number of mixed samples. 

From the website, it is unclear what their business model will be. Normally with bioinfx software this isn’t even a question, but the startup slickness and Ycombinator association imply some sort of pay-for-use. However, this post on ycombinator says that it will be open source and free for academic use (hmmm). Homologus has some speculation on the HelixIO guts here.

 
