Quality Assessment and Trimming
- Trimmomatic
- http://www.usadellab.org/cms/index.php?page=trimmomatic
- Trimmomatic: A flexible trimmer for Illumina Sequence Data. Bolger, A. M., Lohse, M., & Usadel, B. (2014). Bioinformatics, btu170.
- A flexible read-trimming tool that removes Illumina adapters, low-quality read ends and reads below a minimum length.
- Seqtk
- https://github.com/lh3/seqtk
- Tool for processing sequences in FASTA or FASTQ format that can be used for adapter removal and trimming of low-quality bases.
- FastX
- http://hannonlab.cshl.edu/fastx_toolkit/
- Toolkit for FASTQ and FASTA preprocessing that can be used for trimming, clipping, barcode splitting, formatting and quality trimming.
- FastQC
- http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
- A quality control tool for assessing the quality of NGS data
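Several of the tools above trim low-quality read ends. As a rough illustration of the idea, the Python sketch below scans a read with a sliding window, cuts it at the first window whose mean Phred quality falls below a threshold, and discards reads left shorter than a minimum length. It assumes Phred+33 quality encoding; the function names, window size and thresholds are hypothetical choices, and this is not Trimmomatic's actual implementation.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality]


def sliding_window_trim(seq: str, quality: str, window: int = 4,
                        min_mean_q: float = 20.0, min_len: int = 36):
    """Cut the read at the first window whose mean quality is below min_mean_q.

    Hypothetical helper for illustration only. Returns the trimmed
    (seq, quality) pair, or None if the read ends up shorter than min_len.
    """
    scores = phred_scores(quality)
    cut = len(seq)  # keep the whole read unless a window fails
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_mean_q:
            cut = start  # trim from the start of the failing window
            break
    if cut < min_len:
        return None  # read too short after trimming: discard it
    return seq[:cut], quality[:cut]
```

For example, a 40-base read whose last ten quality characters are `#` (Q2) is cut to 29 bases when called with `min_len=20`, and is discarded entirely with the default `min_len=36`.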
Assembly
- VelvetK
- http://bioinformatics.net.au/software.velvetk.shtml
- Perl script to estimate the best k-mer size to use for a Velvet de novo assembly.
- KmerGenie
- http://kmergenie.bx.psu.edu/
- Informed and Automated k-Mer Size Selection for Genome Assembly. Chikhi R., Medvedev P. HiTSeq 2013.
- Best k-mer length estimator for single-k genome assemblers such as Velvet.
- Khmer
- http://khmer.readthedocs.io/en/v2.0/
- The khmer software package: enabling efficient nucleotide sequence analysis. Crusoe et al. (2015). F1000Research. http://dx.doi.org/10.12688/f1000research.6924.1
- Set of command-line tools for working with large, noisy datasets, normalising and scaling the data to make genome assembly more efficient.
- Minia
- http://minia.genouest.org/
- Space-efficient and exact de Bruijn graph representation based on a Bloom filter. Chikhi, Rayan and Rizk, Guillaume. Algorithms for Molecular Biology, BioMed Central, 2013, 8 (1), pp.22.
- Short-read assembler based on a de Bruijn graph for low-memory assembly.
- SPAdes
- http://bioinf.spbau.ru/spades
- SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. Anton Bankevich, Sergey Nurk, Dmitry Antipov, Alexey A. Gurevich, Mikhail Dvorkin, Alexander S. Kulikov, Valery M. Lesin, Sergey I. Nikolenko, Son Pham, Andrey D. Prjibelski, Alexey V. Pyshkin, Alexander V. Sirotkin, Nikolay Vyahhi, Glenn Tesler, Max A. Alekseyev, and Pavel A. Pevzner. Journal of Computational Biology 19(5) (2012), 455-477. doi:10.1089/cmb.2012.0021
- De Bruijn graph assembler for short reads and hybrid short/long-read data; it also performs error correction and combines multiple k-mer sizes (multi-k).
- Velvet
- https://www.ebi.ac.uk/~zerbino/velvet/
- Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Daniel R. Zerbino and Ewan Birney. Genome Res. May 2008 18: 821-829; Published in Advance March 18, 2008, doi:10.1101/gr.074492.107
- De novo short-read genome assembler that performs error correction to produce high-quality unique contigs.
- Canu
- http://canu.readthedocs.io/en/stable/index.html
- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Sergey Koren, Brian P. Walenz, Konstantin Berlin, Jason R. Miller, Adam M. Phillippy doi: http://dx.doi.org/10.1101/071282
- Long-read assembler designed for high-noise data such as that generated by PacBio or Oxford Nanopore MinION. Canu also performs error correction.
- Bandage
- http://rrwick.github.io/Bandage/
- Bandage: interactive visualization of de novo genome assemblies. Ryan R. Wick, Mark B. Schultz, Justin Zobel, and Kathryn E. Holt. Bioinformatics (2015) 31 (20): 3350-3352 first published online June 22, 2015 doi:10.1093/bioinformatics/btv383
- Program for visualising de novo assembly graphs; it displays connections that are not present in the contigs file, which helps in assessing an assembly.
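Velvet, Minia and SPAdes all build on de Bruijn graphs, and VelvetK and KmerGenie exist because the choice of k matters. A minimal sketch of the underlying data structure, assuming error-free reads and ignoring reverse complements (which real assemblers must handle): each k-mer seen in a read becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and unambiguous paths are walked to spell out a contig. The function names are hypothetical, and this is not the code of any listed assembler.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mers that follow it."""
    graph = defaultdict(set)
    for kmer in count_kmers(reads, k):
        graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from start while each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop if the path loops back on itself
            break
        contig += nxt[-1]  # each step adds one new base
        seen.add(nxt)
        node = nxt
    return contig
```

With the two overlapping reads `ACGTTGCA` and `GTTGCATG` and k = 4, walking from `ACG` reconstructs `ACGTTGCATG`. Sequencing errors would add spurious low-count k-mers and branches, which is why k-mer abundance matters for both error correction and choosing k.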
Annotation
- Prokka
- http://www.vicbioinformatics.com/software.prokka.shtml
- Prokka: rapid prokaryotic genome annotation. Seemann T. Bioinformatics. 2014 Jul 15;30(14):2068-9. PMID:24642063
- Software tool for the rapid annotation of prokaryotic genomes.
- RAST
- http://rast.nmpdr.org/
- The RAST Server: Rapid Annotations using Subsystems Technology. Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, Kubal M, Meyer F, Olsen GJ, Olson R, Osterman AL, Overbeek RA, McNeil LK, Paarmann D, Paczian T, Parrello B, Pusch GD, Reich C, Stevens R, Vassieva O, Vonstein V, Wilke A, Zagnitko O. BMC Genomics, 2008
- Fully automated service for annotating complete or nearly complete bacterial and archaeal genomes.
- Genix
- http://labbioinfo.ufpel.edu.br/cgi-bin/genix_index.py
- Fully automated pipeline for bacterial genome annotation.
Alignment
- BLAST
- http://blast.ncbi.nlm.nih.gov/Blast.cgi
- Basic local alignment search tool. Stephen F. Altschul, Warren Gish, Webb Miller, Eugene W. Myers, David J. Lipman. Journal of Molecular Biology, Volume 215, Issue 3, 5 October 1990, Pages 403-410
- Search tool that finds regions of similarity between biological sequences by aligning them and calculating the statistical significance of the matches.
- MUMmer
- http://mummer.sourceforge.net/
- Versatile and open software for comparing large genomes. A.L. Delcher, A. Phillippy, J. Carlton, and S.L. Salzberg, Nucleic Acids Research (2002), Vol. 30, No. 11 2478-2483.
- A system for rapidly aligning entire genomes and finding matches in DNA sequences.
- MEGA
- http://www.megasoftware.net/
- MEGA7: Molecular Evolutionary Genetics Analysis version 7.0 for bigger datasets. Kumar S, Stecher G, and Tamura K (2016). Molecular Biology and Evolution 33:1870-1874.
- Sophisticated and user-friendly software suite for analysing DNA and protein sequence data from species and populations.
Mapping
- BWA
- http://bio-bwa.sourceforge.net/
- Fast and accurate short read alignment with Burrows-Wheeler Transform. Li H. and Durbin R. (2009) Bioinformatics, 25:1754-60. [PMID: 19451168].
- Software package for mapping low-divergent sequences against a large reference genome using the Burrows-Wheeler transform algorithm.
- Bowtie 2
- http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
- Fast gapped-read alignment with Bowtie 2. Langmead B, Salzberg S. Nature Methods. 2012, 9:357-359.
- Tool for aligning sequencing reads to long reference genomes, also based on the Burrows-Wheeler transform.
- Tablet
- https://ics.hutton.ac.uk/tablet/
- Using Tablet for visual exploration of second-generation sequencing data. Milne I, Stephen G, Bayer M, Cock PJA, Pritchard L, Cardle L, Shaw PD and Marshall D. 2013. Briefings in Bioinformatics 14(2), 193-202.
- Lightweight, high-performance graphical viewer for next-generation sequence assemblies and alignments that can be used to view mappings.
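BWA and Bowtie 2 both index the reference with the Burrows-Wheeler transform. The toy sketch below builds the BWT by sorting all rotations of the text (fine for short strings, far too slow for a genome) and uses FM-index backward search to count exact occurrences of a pattern. It only illustrates the technique; it is not the indexing code used by either tool, and the function names are hypothetical.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations; '$' marks the end."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Count exact matches of pattern using FM-index backward search."""
    # C[c]: number of characters in the text that sort before c.
    C, total = {}, 0
    for c in sorted(set(bwt_str)):
        C[c] = total
        total += bwt_str.count(c)

    def occ(c, i):
        # Occurrences of c in bwt_str[:i]; real indexes precompute this.
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)  # current suffix-array interval [lo, hi)
    for c in reversed(pattern):  # extend the match one base leftwards
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `bwt("banana")` gives `annb$aa`, and backward search on it finds `ana` twice. Because each step only narrows an interval, the cost of a lookup depends on the pattern length rather than the reference length once `occ` is precomputed.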
Variant Calling
- SAMtools
- http://samtools.sourceforge.net/
- The Sequence alignment/map (SAM) format and SAMtools. Li H.*, Handsaker B.*, Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. and 1000 Genome Project Data Processing Subgroup (2009) Bioinformatics, 25, 2078-9. [PMID: 19505943]
- Toolkit that provides various utilities for manipulating alignments in the SAM format and can also be used for generating consensus sequences and calling variants.
- GATK
- https://software.broadinstitute.org/gatk/
- The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Aaron McKenna, Matthew Hanna, Eric Banks, Andrey Sivachenko, Kristian Cibulskis, Andrew Kernytsky, Kiran Garimella, David Altshuler, Stacey Gabriel, Mark Daly, and Mark A. DePristo. Genome Res. September 2010 20: 1297-1303; Published in Advance July 19, 2010, doi:10.1101/gr.107524.110
- Toolkit with a primary focus on variant discovery and genotyping.
- Picard
- http://broadinstitute.github.io/picard/
- A set of command line tools (in Java) for manipulating high-throughput sequencing data and formats.
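Real variant callers such as those in SAMtools and GATK model base and mapping qualities statistically. As a deliberately simplified sketch of the idea behind consensus generation and SNP calling, the Python below looks at the read bases covering one reference position and makes a call only when depth and majority agreement pass thresholds. The function name and default thresholds are hypothetical and not taken from any listed tool.

```python
from collections import Counter


def call_site(ref_base: str, observed: str, min_depth: int = 10,
              min_fraction: float = 0.8):
    """Naive call at one reference position from the read bases covering it.

    Hypothetical helper. Returns (consensus_base, is_variant); the
    consensus is 'N' when depth or agreement is too low to call.
    """
    bases = [b for b in observed.upper() if b in "ACGT"]  # drop N/gaps
    if len(bases) < min_depth:
        return "N", False
    base, count = Counter(bases).most_common(1)[0]
    if count / len(bases) < min_fraction:
        return "N", False  # mixed site: no confident majority
    return base, base != ref_base.upper()
```

Sites with too little coverage or a mixed signal come back as `N` rather than being forced into a call, which mirrors, in a crude way, how consensus sequences mask uncertain positions.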
Phylogenetic analysis
- RAxML
- http://sco.h-its.org/exelixis/web/software/raxml/index.html
- RAxML Version 8: A tool for Phylogenetic Analysis and Post-Analysis of Large Phylogenies. A. Stamatakis. Bioinformatics (2014) 30 (9): 1312-1313.
- Randomized Axelerated Maximum Likelihood program for sequential and parallel Maximum Likelihood based inference of large phylogenetic trees.
- FastTree
- http://www.microbesonline.org/fasttree/
- FastTree: Computing Large Minimum-Evolution Trees with Profiles instead of a Distance Matrix. Price, M.N., Dehal, P.S., and Arkin, A.P. (2009). Molecular Biology and Evolution 26:1641-1650, doi:10.1093/molbev/msp077.
- Tool for fast inference of approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences.
- CSI Phylogeny
- https://cge.cbs.dtu.dk/services/CSIPhylogeny/
- Solving the Problem of Comparing Whole Bacterial Genomes across Different Sequencing Platforms. Rolf S. Kaas, Pimlapas Leekitcharoenphon, Frank M. Aarestrup, Ole Lund. PLoS ONE 2014; 9(8): e104984.
- Tool that calls SNPs, filters them, validates sites and infers a phylogeny through a graphical user interface.
- Harvest
- https://www.cbcb.umd.edu/software/harvest
- The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes. Treangen TJ, Ondov BD, Koren S, Phillippy AM. Genome Biology, 15 (11), 1-15
- Suite of core-genome alignment and visualization tools for quickly analysing thousands of intraspecific microbial genomes, including variant calls, recombination detection, and phylogenetic trees.
- Gubbins
- http://sanger-pathogens.github.io/gubbins/
- Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins. Croucher N. J., Page A. J., Connor T. R., Delaney A. J., Keane J. A., Bentley S. D., Parkhill J., Harris S.R. doi:10.1093/nar/gku1196, Nucleic Acids Research, 2014.
- Gubbins (Genealogies Unbiased By recomBinations In Nucleotide Sequences) is an algorithm that iteratively identifies loci containing elevated densities of base substitutions while concurrently constructing a phylogeny based on the putative point mutations outside of these regions.
- BEAST
- http://beast.bio.ed.ac.uk/
- Bayesian phylogenetics with BEAUti and the BEAST 1.7. Drummond AJ, Suchard MA, Xie D & Rambaut A (2012). Molecular Biology And Evolution 29: 1969-1973.
- Cross-platform program for Bayesian analysis of molecular sequences using Markov chain Monte Carlo (MCMC).
- FigTree
- http://tree.bio.ed.ac.uk/software/figtree/
- Available for Windows, Mac OS X and Linux.
- A graphical viewer of phylogenetic trees and program for producing publication-ready figures of trees.
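Many of the phylogenetic tools above start from a whole-genome or core-genome SNP alignment. As a minimal sketch of one common way of summarising such an alignment, the Python below computes pairwise SNP distances between aligned sequences of equal length. Treating gaps and `N` as "not a difference" is an assumption made here for illustration; the helper names are hypothetical and this is not how any listed tool computes distances.

```python
from itertools import combinations

IGNORED = set("-N")  # assumption: gaps and Ns are not counted as differences


def snp_distance(a: str, b: str) -> int:
    """Count positions where two aligned sequences differ, skipping gaps/Ns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x != y and x not in IGNORED and y not in IGNORED
    )


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], int]:
    """Pairwise SNP distances for every pair of named sequences."""
    return {
        (n1, n2): snp_distance(s1, s2)
        for (n1, s1), (n2, s2) in combinations(alignment.items(), 2)
    }
```

Usage: `distance_matrix({"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "ACNTTCGA"})` returns one distance per unordered pair of isolates; the `N` in `s3` is skipped rather than counted.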
Virulence and antimicrobial resistance gene prediction
- PathogenFinder
- https://cge.cbs.dtu.dk/services/PathogenFinder/
- PathogenFinder – Distinguishing Friend from Foe Using Bacterial Whole Genome Sequence Data. Cosentino S, Voldby Larsen M, Møller Aarestrup F, Lund O. (2013) PLoS ONE 8(10): e77302.
- Web server that predicts bacterial pathogenicity from a proteome, genome or raw reads supplied by the user.
- ResFinder
- https://cge.cbs.dtu.dk//services/ResFinder/
- Identification of acquired antimicrobial resistance genes. Zankari E, Hasman H, Cosentino S, Vestergaard M, Rasmussen S, Lund O, Aarestrup FM, Larsen MV. J Antimicrob Chemother. 2012 Jul 10.
- Web server that identifies acquired antimicrobial resistance genes in fully or partially sequenced bacterial isolates.
Species identification
- Kraken
- https://ccb.jhu.edu/software/kraken/
- Kraken: ultrafast metagenomic sequence classification using exact alignments. Wood DE, Salzberg SL. Genome Biology 2014, 15:R46.
- System for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomics studies.
Comparative genomic tools
- BEDTools
- http://bedtools.readthedocs.io/en/latest/index.html
- BEDTools: a flexible suite of utilities for comparing genomic features. Aaron R. Quinlan and Ira M. Hall. Bioinformatics (2010) 26 (6): 841-842 first published online January 28, 2010 doi:10.1093/bioinformatics/btq033
- Toolkit for manipulating genomic data, performing analysis tasks such as intersecting, merging and comparing genomic intervals from multiple files.
- Roary
- https://sanger-pathogens.github.io/Roary/
- Roary: Rapid large-scale prokaryote pan genome analysis. Andrew J. Page, Carla A. Cummins, Martin Hunt, Vanessa K. Wong, Sandra Reuter, Matthew T. G. Holden, Maria Fookes, Daniel Falush, Jacqueline A. Keane, Julian Parkhill. Bioinformatics, 2015;31(22):3691-3693 doi:10.1093/bioinformatics/btv421.
- High speed stand-alone pan genome pipeline, which takes annotated assemblies in GFF3 format and calculates the pan genome.
- Mauve
- http://darlinglab.org/mauve/mauve.html
- Mauve: Multiple Alignment of Conserved Genomic Sequence With Rearrangements. Aaron C.E. Darling, Bob Mau, Frederick R. Blattner, and Nicole T. Perna. Genome Res. July 2004 14: 1394-1403; doi:10.1101/gr.2289704.
- Interactive genome alignment software that allows for easy browsing of multiple genomes to look for similarities and differences.
- ACT
- http://www.sanger.ac.uk/science/tools/artemis-comparison-tool-act
- ACT: the Artemis Comparison Tool. Carver TJ, Rutherford KM, Berriman M, Rajandream MA, Barrell BG and Parkhill J. Bioinformatics (Oxford, England) 2005;21;16;3422-3. PUBMED: 15976072; DOI: 10.1093/bioinformatics/bti553.
- Java application for displaying pairwise comparisons between two or more DNA sequences, with browsing of detailed annotation.
- BRIG
- http://brig.sourceforge.net/
- BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons. NF Alikhan, NK Petty, NL Ben Zakour, SA Beatson (2011). BMC Genomics, 12:402. PMID: 21824423.
- Image-generating software that displays circular BLAST comparisons between a large number of genomes or DNA sequences.
- EasyFig
- http://easyfig.sourceforge.net/
- Easyfig: a genome comparison visualiser. Sullivan MJ, Petty NK, Beatson SA. (2011). Bioinformatics; 27 (7): 1009-1010. PMID: 21278367
- Python application for creating linear comparison figures of multiple genomic loci with an easy-to-use graphical user interface (GUI)
- SeqFindR
- https://github.com/mscook/SeqFindR
- Tool to easily create informative genomic feature plots by detecting the presence or absence of genomic features from a database in a set of genomes.
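BEDTools works on genomic intervals. As a rough illustration of interval overlap under the BED convention (0-based start, exclusive end), the Python below reports the overlapping portion of every pair of intervals from two lists using a sort-and-sweep. It is a simplified stand-in for the kind of operation `bedtools intersect` performs, not its implementation; the `Interval` type and function name are assumptions for illustration.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive


def intersect(a: list[Interval], b: list[Interval]) -> list[Interval]:
    """Return the overlapping pieces of every pair of intervals from a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    hits = []
    j = 0
    for iv in a_sorted:
        # Skip b intervals that cannot overlap iv or any later a interval.
        while j < len(b_sorted) and (
            (b_sorted[j].chrom, b_sorted[j].end) <= (iv.chrom, iv.start)
        ):
            j += 1
        k = j
        # Scan b intervals that start before iv ends on the same chromosome.
        while k < len(b_sorted) and (
            (b_sorted[k].chrom, b_sorted[k].start) < (iv.chrom, iv.end)
        ):
            other = b_sorted[k]
            if other.chrom == iv.chrom and other.end > iv.start:
                hits.append(Interval(iv.chrom, max(iv.start, other.start),
                                     min(iv.end, other.end)))
            k += 1
    return hits
```

Because ends are exclusive, `chr2:0-50` and `chr2:50-70` are adjacent rather than overlapping, so that pair produces no hit; getting this convention right is a common source of off-by-one errors when moving between BED and 1-based formats.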
Cloud computing
- MRC CLIMB
- http://www.climb.ac.uk/
- Microbial bioinformatics cyber-infrastructure.
- Amazon Web Services
- https://aws.amazon.com
- Pay-per-use cloud computing service run by Amazon, suitable for temporary, on-demand processing of large datasets.
Blogs and Twitter
- Blogs
- Bits and bugs https://bitsandbugs.org/
- Loman Labs http://lab.loman.net/page3/
- Opiniomics http://www.opiniomics.org/
- The genome factory http://thegenomefactory.blogspot.co.uk/
- Simpson Lab Blog http://simpsonlab.github.io/2016/08/23/R9/
- Jonathan Eisen’s Lab https://phylogenomics.wordpress.com/
- Living in an Ivory Basement http://ivory.idyll.org/blog/
- Holt Lab https://holtlab.net/
- Heng Li’s blog https://lh3.github.io/
- The Darling lab http://darlinglab.org/blog/
- The Quinlan Lab http://quinlanlab.org/
- Bioinformaticians to follow on Twitter
- @pathogenomenick @BioMickWatson @flashton2003 @WvSchaik @mattloose @torstenseemann @tomrconnor @MikeyJ @jaredtsimpson @aphillipy @BillHanage @happy_khan @daanensen @jennifergardy @genomiss @Becctococcus @phylogenomics @ctitusbrown @DrKatHolt @ZaminIqbal @TimDallman @bioinformant @LaurenCowley4 @gkapatai @keithajolley @froggleston @lexnederbragt @jacarrico @biocomputerist @mjpallen @Bio_mscook @bawee @lh3lh3 @andrewjpage @aaronquinlan @koadman @ewanbirney