This post belongs in the ‘write it down before you forget’ category.
Edit – this approach is necessary for handling the trees produced by RAxML, but not those produced by IQ-TREE.
Edit #2 – Ancestral state reconstruction in RAxML does not seem to work very well. I’m trying IQ-TREE (v1.6) instead.
I have a tree with labelled internal nodes (from RAxML ancestral state reconstruction), and I want to find the parent internal node of every leaf. I need to do this for hundreds of leaves, so it is not feasible to do by hand. I have pasted an example part of the tree at the bottom of the post.
First of all, I looked at ete3, which is the go-to Python phylogenetics library. However, after parsing the tree with ete3, I couldn’t figure out which attribute ete3 had assigned the internal node labels to. The ‘name’ attribute of the internal nodes was blank. Link to the question I asked on the ete3 Google group here.
Therefore, I had a go with Biopython’s Phylo module. Here, it was quite straightforward to determine that the internal node labels were being read in as node.confidence, presumably because the Newick parser treats numeric internal labels as support values. A gist using this is here.
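For reference, here is a minimal sketch of the idea (not the gist itself). Bio.Phylo clades do not store a pointer to their parent, so the sketch walks every clade and records the label of any clade that has a leaf as a direct child. The toy Newick string and the helper name leaf_parent_labels are made up for illustration, and the fall-back to node.name is an assumption to cover labels that are not numeric.

```python
from io import StringIO

from Bio import Phylo

# Toy tree, invented for illustration: internal nodes carry numeric labels.
newick = "((A:0.1,B:0.2)5:0.05,(C:0.3,D:0.4)6:0.1)4;"
tree = Phylo.read(StringIO(newick), "newick")


def leaf_parent_labels(tree):
    """Map each leaf name to the label of its parent internal node."""
    parents = {}
    for clade in tree.find_clades(order="level"):
        for child in clade.clades:
            if child.is_terminal():
                # Numeric labels end up in .confidence; other labels in .name.
                if clade.name is not None:
                    label = clade.name
                else:
                    label = clade.confidence
                parents[child.name] = label
    return parents


for leaf, parent in sorted(leaf_parent_labels(tree).items()):
    print(leaf, parent)
```

To run it on a real tree, swap the StringIO object for the path to the tree file in Phylo.read.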
Edit – For trees produced by IQ-TREE, the names of the internal nodes are much more sensibly assigned to node.name when read in with Biopython.