Programmatically accessing internal nodes of a phylogeny

This post belongs in the ‘write it down before you forget’ category.

Edit – this approach is necessary for handling the trees produced by RAxML, but not those produced by IQ-TREE. 

Edit #2 – Ancestral state reconstruction in RAxML does not seem to work very well. I’m trying IQ-TREE (v1.6) instead.

I have a tree with labelled internal nodes (from RAxML ancestral state reconstruction), and I want to find the parent internal node of every leaf. I need to do this for hundreds of leaves, so doing it by hand is not feasible. An example section of the tree is pasted at the bottom of the post.

First of all, I looked at ete3, which is the go-to Python phylogenetics library. However, after parsing the tree with ete3, I couldn’t figure out which attribute ete3 had assigned the internal node labels to. The ‘name’ attribute of the internal nodes was blank. Link to the question I asked on the ete3 Google group here.

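Therefore, I had a go with Biopython’s Phylo module. Here, it was quite straightforward to determine that the internal node labels were being read in as node.confidence (Biopython’s Newick parser treats a purely numeric internal label as a support value rather than a name). A gist using this is here.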

Edit – For trees produced by IQ-TREE, the names of the internal nodes are much more sensibly assigned to node.name when read in with Biopython.

Example tree with labelled internal nodes
((14892_1#22,(((((20427_2#5,20427_2#32)1041,BMD942)1040,20427_2#52)1039,20427_2#20)1038,(20427_2#42,20427_2#9)1042)1037)1036,(04CN-63-036,(BMD2216,BMD915)1044)1043);
