Parsing treebreaker output

Treebreaker is a nice piece of software which detects changes in the distribution of a phenotype across a phylogenetic tree. Code here. Paper here. In my case, I’m using it to find clades of lineage 4 M. tuberculosis which are associated with East/Southeast Asia.

It takes a phylogenetic tree and a phenotype file (one line per leaf: the leaf label and its binary phenotype, encoded as 0 or 1, tab separated), and produces an output file. The modified tree with the per-node posterior probability (because it’s Bayesian dontcha know) is written as the last line of the output file.
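As a rough sketch of the file handling described above: the two helpers below write a phenotype file in that tab-separated layout and pull the annotated tree off the end of the output file. The function names and file paths are made up for illustration, and skipping trailing blank lines when looking for the last line is my own assumption.

```python
def write_phenotype_file(phenotypes, path):
    """Write one 'leaf_label<TAB>0_or_1' line per leaf.

    `phenotypes` maps leaf label -> 0 or 1 (hypothetical helper, not part of treebreaker).
    """
    with open(path, "w") as handle:
        for label, value in phenotypes.items():
            if value not in (0, 1):
                raise ValueError(f"phenotype for {label!r} must be 0 or 1, got {value!r}")
            handle.write(f"{label}\t{value}\n")


def read_final_tree_line(path):
    """Return the last non-empty line of the output file, i.e. the annotated tree."""
    last = None
    with open(path) as handle:
        for line in handle:
            stripped = line.strip()
            if stripped:  # assumption: ignore any trailing blank lines
                last = stripped
    if last is None:
        raise ValueError(f"{path} contains no non-empty lines")
    return last
```

Something like `read_final_tree_line("treebreaker_output.txt")` (hypothetical file name) then gives you the tree string to save into its own file for the next step.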

The only slight problem I had with treebreaker was parsing the output tree. It’s Newick format, but the nodes have annotations. This setup isn’t covered by the Newick specification, so it isn’t parsed correctly by ete3 or dendropy. If you’re an R-aficionado (an aficionaRdo?), then ape seems to parse the treebreaker output correctly, but I’m not an R person.

Therefore, after a bit of tinkering, I found this workaround.

  1. Open the tree in FigTree, which does correctly parse the node labels – more Rambaut magic!
  2. Export the tree from FigTree in NEXUS format – tick the ‘include annotations’ box.
  3. Read the tree into dendropy as NEXUS. Code snippet to do this and access the node labels here.

Enjoy!
