I think some biologists have a bit of a blind spot for bioinformatics when it comes to critical thinking. There may be a bit of a ‘well, the computer says this, therefore it is true’ attitude.
I think that any biologist who is interacting with bioinformatics/NGS results should take the following steps:
1. Take a complete reference genome from NCBI
2. Turn it into fastqs using e.g. pIRS
3. De novo assemble those fastqs. Oh, look at that, gives you a load of contigs!
4. Compare those contigs against the reference – start sweating.
5. Map your reads back to your contigs.
6. Call variants from that alignment – how does this strain have variants compared to itself? Have I created a new form of life, capable of mutating in silico? No, you haven’t.
7. Go and lie down in a dark room
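For anyone who wants a feel for steps 2, 5 and 6 before installing anything, here is a toy sketch in Python. It is not pIRS, and it is not a real assembler, mapper or variant caller: the reference is random, the reads only carry substitution errors, each read is "mapped" using the position it was sampled from, and the caller is deliberately naive. The function names, the 1% error rate and the thresholds are all assumptions made up for illustration.

```python
import random
from collections import Counter

BASES = "ACGT"

def simulate_reads(reference, n_reads, read_len, error_rate, rng):
    """Sample reads uniformly from the reference and add substitution errors.

    Returns (start, read) pairs; the true start stands in for a mapping step.
    """
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(reference) - read_len + 1)
        read = []
        for base in reference[start:start + read_len]:
            if rng.random() < error_rate:
                # Replace the true base with one of the other three.
                base = rng.choice([b for b in BASES if b != base])
            read.append(base)
        reads.append((start, "".join(read)))
    return reads

def naive_variant_calls(reference, reads, min_depth=2, min_alt_fraction=0.2):
    """Report every non-reference base that reaches min_alt_fraction of the depth."""
    pileup = [Counter() for _ in reference]
    for start, read in reads:
        for offset, base in enumerate(read):
            pileup[start + offset][base] += 1
    calls = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue
        for base, count in counts.items():
            if base != reference[pos] and count / depth >= min_alt_fraction:
                calls.append((pos, reference[pos], base, count, depth))
    return calls

if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical 2 kb "genome" at roughly 5x coverage.
    reference = "".join(rng.choice(BASES) for _ in range(2000))
    reads = simulate_reads(reference, n_reads=100, read_len=100,
                           error_rate=0.01, rng=rng)
    calls = naive_variant_calls(reference, reads)
    print(f"{len(calls)} 'variants' called against the genome the reads came from")
```

With the error rate set to zero the caller reports nothing, so anything it reports here comes from the simulated errors and the thresholds, not from the genome. Real pipelines add assembly and mapping ambiguity on top, which this sketch leaves out entirely.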
I realize your post was meant to be tongue-in-cheek, but I’m still trying this anyway.
Question:
What is the correct reference to grab from NCBI if we’re using pIRS?
For example:
1) ftp://ftp.ncbi.nih.gov/genomes/Arabidopsis_lyrata/
File:NZ_ADBK00000000.scaffold.fna.tgz or File:NZ_ADBK00000000.contig.fna.tgz
or
2) ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Escherichia_coli_ABU_83972_uid161975/
I assume the sequence data is segregated into directories for each *[not sure what?]*
or, most of the reference genome folders look like this
3) ftp://ftp.ncbi.nlm.nih.gov/genomes/Xenopus_Silurana_tropicalis/
What would I choose in this case?
I’m afraid I don’t have much experience with eukaryotic genomes. However, it looks like maybe the Xenopus genome is finished, while the Arabidopsis genome is de novo assembled contigs from a short read experiment.
For the E. coli you point to, there is a plasmid and a chromosome in that directory. It is the .fna files you want (the chromosome is the larger file). For the Xenopus, I imagine you want the CHR Un directory, and the .fa file within that. I guess the CHR Mt is the mitochondrion? Please bear in mind that this is all complete guesswork 🙂
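If file size alone feels too loose for telling the chromosome from the plasmid, one option is to count the bases under each FASTA header. The sketch below assumes plain, uncompressed FASTA text; the function name and the toy records are made up for illustration and are not taken from the NCBI directories above.

```python
def fasta_lengths(lines):
    """Map each FASTA header (without the leading '>') to its sequence length."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths

if __name__ == "__main__":
    # Toy records standing in for a chromosome and a plasmid.
    example = [">toy_chromosome", "ACGTACGTAC", "GTACGT", ">toy_plasmid", "ACGT"]
    lengths = fasta_lengths(example)
    for name, length in sorted(lengths.items(), key=lambda item: -item[1]):
        print(name, length)
```

For a real file, pass an open file handle instead of the list, for example `with open(path) as handle: fasta_lengths(handle)`, where `path` is wherever you saved the .fna file.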