Intro to bacterial genomics

Here, in the interests of ‘if you have to email it twice, write a blog’, is my high-level overview of what a bacterial genomics pipeline looks like.

1. Quality-assess your FASTQs with e.g. FastQC, and visualise the results across your dataset with MultiQC. If the data are particularly bad, do quality trimming; if not, then don’t.

2. Do species-level identification and check for mixed cultures with e.g. mash, kmerid or kraken.

3. Do variant calling against an appropriate reference genome (choose this using the top mash hit) with e.g. PHEnix.

4. Read the variants into a database (e.g. SnapperDB) or make a consensus genome by modifying the reference with all the variants you have identified. Most variant calling pipelines have an option to make this consensus genome, or you can do it from the VCF and the reference genome. Two important points for making a consensus genome: 1) if a position is mixed in the mapping to the reference, it should be called as an N, not as reference; 2) all the consensus genomes should be the same length, so it’s fine to have deletions as ‘-’, but insertions should not be included.

5. Gather all your consensus genomes into a single multi-FASTA file and run e.g. snp-sites to extract the variable sites.

6. Make a phylogenetic tree with IQ-TREE or RAxML or FastTree

7. Annotate and view the tree with FigTree, Microreact or iTOL.
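To make the two consensus rules in step 4 and the variable-site extraction in step 5 concrete, here is a minimal Python sketch. It is not how PHEnix, SnapperDB or snp-sites are implemented. The `Variant` class, its `mixed` flag and both function names are made up for illustration, and I am assuming simple left-anchored indels as they usually appear in a VCF (e.g. REF=AC, ALT=A for a one-base deletion). Real pipelines also mask low-coverage and low-quality positions, which is left out here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Variant:
    """Hypothetical, simplified stand-in for one VCF record."""
    pos: int             # 1-based position on the reference, as in a VCF
    ref: str             # reference allele
    alt: str             # alternate allele
    mixed: bool = False  # True if the reads support more than one allele


def build_consensus(reference: str, variants: List[Variant]) -> str:
    """Apply variants to the reference without ever changing its length."""
    seq = list(reference)
    for v in variants:
        start = v.pos - 1  # convert to a 0-based index
        if v.mixed:
            # Rule 1: a mixed position becomes N, never the reference base.
            for j in range(start, start + len(v.ref)):
                seq[j] = "N"
        elif len(v.ref) == len(v.alt):
            # SNP (or same-length substitution): swap in the alternate bases.
            for offset, base in enumerate(v.alt):
                seq[start + offset] = base
        elif len(v.ref) > len(v.alt):
            # Rule 2: a deletion is written as '-' so the length is preserved.
            for j in range(start + len(v.alt), start + len(v.ref)):
                seq[j] = "-"
        else:
            # Rule 2: an insertion would lengthen the genome, so it is skipped.
            pass
    return "".join(seq)


def variable_sites(alignment: Dict[str, str]) -> Dict[str, str]:
    """Keep only columns where at least two different A/C/G/T bases occur."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all consensus genomes must be the same length")
    keep = []
    for col in range(length):
        bases = {alignment[name][col].upper() for name in names} & set("ACGT")
        if len(bases) > 1:
            keep.append(col)
    return {name: "".join(alignment[name][c] for c in keep) for name in names}


reference = "ACGTACGTAC"
sample_a = build_consensus(reference, [
    Variant(2, "C", "T"),    # SNP
    Variant(5, "AC", "A"),   # one-base deletion
])
sample_b = build_consensus(reference, [
    Variant(2, "C", "T", mixed=True),  # mixed position
    Variant(8, "T", "TGG"),            # insertion, ignored
    Variant(9, "A", "G"),              # SNP
])
print(sample_a)  # ATGTA-GTAC
print(sample_b)  # ANGTACGTGC
print(variable_sites({"ref": reference, "a": sample_a, "b": sample_b}))
# {'ref': 'CA', 'a': 'TA', 'b': 'NG'}
```

Because every consensus is the same length as the reference, the sequences are already aligned to each other, which is why step 5 can simply concatenate them into one file before pulling out the variable columns.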

—–

https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

http://mash.readthedocs.io/en/latest/tutorials.html#screening-a-read-set-for-containment-of-refseq-genomes

https://github.com/phe-bioinformatics/PHEnix

https://github.com/phe-bioinformatics/snapperdb

https://github.com/sanger-pathogens/snp-sites

http://www.iqtree.org/

http://tree.bio.ed.ac.uk/software/figtree/

https://microreact.org/

—

For AMR gene detection, Mykrobe and ARIBA are useful. SRST2 is a nice option for mapping-based results (if you ask yourself ‘which is better, mapping or assembly?’, the answer is both!).

–

Mykrobe

https://github.com/iqbal-lab/Mykrobe-predictor

https://www.nature.com/articles/ncomms10063

–

Ariba

https://github.com/sanger-pathogens/ariba

http://mgen.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000131

—

For GWAS, bugwas and SEER are the primary options.

–

Bugwas code – https://github.com/sgearle/bugwas

Bugwas blog post – http://blog.danielwilson.me.uk/2016/04/making-most-of-bacterial-gwas-new-paper.html

Bugwas paper – https://www.nature.com/articles/nmicrobiol201641

Seer code (pyseer, the Python reimplementation of SEER) – https://github.com/mgalardini/pyseer

Seer paper – https://www.nature.com/articles/ncomms12797

–

Here is a review on pathogen GWAS: https://www.nature.com/articles/nrg.2016.132

—

Here is a paper on how to organise bioinformatics projects. I loosely follow this structure.

http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000424

Plenty for you to get stuck into there!
