Basic trees

Let’s first import our tree data. We’re going to work with a made-up phylogeny with 13 samples (“tips”). Download the tree_newick.nwk data by clicking here or using the link above. Let’s load the libraries you’ll need if you haven’t already, and then import the tree using read.tree(). Displaying the object itself really isn’t useful. The output just tells you a little bit about the tree itself.

library(tidyverse)
library(ggtree)
library(phylobase)
library(ape)
tree <- read.tree("data/tree_newick.nwk")
tree
# build a ggplot with a geom_tree
ggplot(tree) + geom_tree() + theme_tree()

# This is convenient shorthand
ggtree(tree)
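
If you do not have the file to hand, read.tree() can also parse a Newick string directly. The sketch below uses a small made-up four-tip tree (the name toy, its tip names, and its branch lengths are assumptions for illustration, not the contents of tree_newick.nwk) to show what kind of object read.tree() returns and how to look inside it.

```r
library(ape)

# Hypothetical four-tip tree, written inline as a Newick string
toy <- read.tree(text = "((A:1,B:2):3,(C:4,D:5):6);")

class(toy)        # a "phylo" object
Ntip(toy)         # number of tips
Nnode(toy)        # number of internal nodes
toy$tip.label     # tip names, in the order they appear in the string
toy$edge.length   # one branch length per edge
```

The same accessors should work on the tree object loaded from the file above.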

Adding a Scale Bar

The geom_treescale() layer adds a scale bar. Alternatively, you can change the default ggtree() theme to theme_tree2(), which adds a scale on the x-axis.

  • The horizontal dimension in this plot shows the amount of genetic change, and the branches represent evolutionary lineages changing over time.
  • The longer the branch in the horizontal dimension, the larger the amount of change, and the scale tells you this. The units of branch length are usually nucleotide substitutions per site – that is, the number of changes or substitutions divided by the length of the sequence (alternatively, it could represent the percent change, i.e., the number of changes per 100 bases). See this article for more.
# add a scale
ggtree(tree) + geom_treescale()

# or add the entire scale to the x axis with theme_tree2()
ggtree(tree) + theme_tree2()
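
To make the branch-length units described above concrete, here is a small worked calculation. The numbers (12 substitutions along a 600-base alignment) are made up for illustration and do not come from the lesson's tree.

```r
# Hypothetical values, for illustration only
n_substitutions  <- 12    # changes observed along one branch
alignment_length <- 600   # number of sites (bases) compared

# Branch length in substitutions per site
subs_per_site <- n_substitutions / alignment_length
subs_per_site             # 0.02

# The same quantity expressed as changes per 100 bases (percent change)
percent_change <- 100 * subs_per_site
percent_change            # 2
```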

Removing the Scale and Converting to a Cladogram

The default is to plot a phylogram, where the x-axis shows the genetic change / evolutionary distance. If you want to disable scaling and produce a cladogram instead, set the branch.length="none" option inside the ggtree() call. See ?ggtree for more.

ggtree(tree, branch.length="none")
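
One way to see the difference between the two plot types is to compare the x coordinates that ggtree computes for the tips. The sketch below assumes a hypothetical three-tip tree (toy) with deliberately unequal branch lengths, and relies on the coordinates being stored in the plot's data element.

```r
library(ape)
library(ggtree)

# Hypothetical three-tip tree with very unequal branch lengths
toy <- read.tree(text = "((A:1,B:2):3,C:10);")

phylo_plot <- ggtree(toy)                          # phylogram (default)
clado_plot <- ggtree(toy, branch.length = "none")  # cladogram

# Pull out the tip rows and their x positions from each plot
phylo_tips <- as.data.frame(phylo_plot$data)
phylo_tips <- phylo_tips[phylo_tips$isTip, c("label", "x")]
clado_tips <- as.data.frame(clado_plot$data)
clado_tips <- clado_tips[clado_tips$isTip, c("label", "x")]

phylo_tips   # x is the root-to-tip distance: 4, 5 and 10
clado_tips   # x no longer reflects the branch lengths
```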

The ... option in the help for ?ggtree represents additional options that are further passed to ggplot(). You can use this to change aesthetics of the plot. Let’s draw a cladogram (no branch scaling) using thick blue dotted lines (note that I’m not mapping these aesthetics to features of the data with aes() – we’ll get to that later).

ggtree(tree, branch.length="none", color="blue", size=2, linetype=3)


More Features in Tree Shapes

Look at the help again for ?ggtree, specifically at the layout= option. By default, it produces a rectangular layout.

  1. Create a slanted phylogenetic tree.
  2. Create a circular phylogenetic tree.
  3. Create a circular unscaled cladogram with thick red lines.
suppressWarnings(suppressPackageStartupMessages(library(ggtree)))
tree <- read.tree(file.path(rprojroot::find_rstudio_root_file(), "data", "tree_newick.nwk"))
ggtree(tree, layout="slanted")

ggtree(tree, layout="circular")

ggtree(tree, layout="circular", branch.length="none", color="red", size=3)


Other tree geoms

Let’s add additional layers. As we did in the ggplot2 lesson, we can create a plot object, e.g., p, to store the basic layout of a ggplot, and add more layers to it as we desire. Let’s add node and tip points. Let’s finally label the tips.

Create the basic plot

# create the basic plot
p <- ggtree(tree)
p

Creating Node Points from the Basic Plot

p + geom_nodepoint()

Creating Tip Points from the Basic Plot

p + geom_tippoint()

Labeling the Tips from the Basic Plot

p + geom_tiplab()

#tree <- read.tree(file.path(rprojroot::find_rstudio_root_file(), "data", "tree_newick.nwk"))
p <- ggtree(tree) 
p + 
geom_tiplab(color="darkorchid", size=5) + 
geom_tippoint(color="darkorchid", size=2, shape=18) + 
geom_nodepoint(color="goldenrod", size=4, alpha=1/2) + 
ggtitle("Not the prettiest phylogenetic aesthetics, but it'll do.")

Tree annotation

The geom_tiplab() function adds some very rudimentary annotation. Let’s take annotation a bit further. See the tree annotation and advanced tree annotation vignettes for more.

Internal node number

Before we can go further, we need to understand how ggtree handles the tree structure internally. Some of the functions in ggtree for annotating clades need a parameter specifying the internal node number. To get the internal node number, you can use geom_text to display it, where the label is an aesthetic mapping to the “node” variable stored inside the tree object (think of this like the continent variable inside the gapminder object). We also supply the hjust option so that the labels aren’t sitting right on top of the nodes. Read more about this process in the ggtree manipulation vignette.

ggtree(tree) + geom_text(aes(label=node), hjust=-.3)
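
The numbers shown on the plot follow the numbering of the underlying phylo object. For trees read with read.tree(), tips are numbered 1 to the number of tips, and internal nodes continue from there, starting with the root. A minimal sketch on a hypothetical four-tip tree (toy is an assumption for illustration, not the lesson's tree):

```r
library(ape)

# Hypothetical four-tip tree
toy <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")

n_tips         <- Ntip(toy)
tip_nodes      <- seq_len(n_tips)                # tips: 1, 2, 3, 4
internal_nodes <- n_tips + seq_len(Nnode(toy))   # internal nodes: 5, 6, 7
root_node      <- n_tips + 1                     # the root: 5

# Each row of the edge matrix is one branch: parent node -> child node
toy$edge
```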

Another way to get the internal node number is to use the MRCA() function, providing a vector of taxa names (created using c("taxon1", "taxon2")). The function returns the node number of the input taxa’s most recent common ancestor (MRCA). First, re-create the plot so you can choose which taxa you want to grab the MRCA from.

ggtree(tree) + geom_tiplab()

Let’s grab the most recent common ancestor for taxa C+E, taxa G+H, and taxa L+I. We can use MRCA() to get the internal node numbers. Go back to the node-labeled plot from before to confirm this.

MRCA(tree, tip=c("C", "E"))
## [1] 17
MRCA(tree, tip=c("G", "H"))
## [1] 21
MRCA(tree, tip=c("L", "I"))
## [1] 23
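
As a cross-check, the ape package offers getMRCA(), which answers the same question on a phylo object. This is an alternative to the MRCA() call used above, shown here on a hypothetical four-tip tree (toy is an assumption for illustration) where the answers are easy to verify by eye.

```r
library(ape)

# Hypothetical four-tip tree: tips A-D are nodes 1-4, the root is node 5
toy <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")

getMRCA(toy, c("A", "B"))   # 6: the internal node joining A and B
getMRCA(toy, c("A", "C"))   # 5: the root, since A and C sit on opposite sides
```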

Labeling clades

We can use geom_cladelabel() to add another geom layer that marks a selected clade with a bar and a corresponding label. You select the clade using the internal node number for the node that connects all the taxa in that clade. See the tree annotation vignette for more.

Let’s annotate the clade with the most recent common ancestor between taxa C and E (internal node 17). Let’s make the annotation red. See ?geom_cladelabel help for more.

ggtree(tree) + 
  geom_cladelabel(node=17, label="Some random clade", color="red")

Let’s add back in the tip labels. Notice how now the clade label is too close to the tip labels. Let’s add an offset to adjust the position. You might have to fiddle with this number to get it looking right.

ggtree(tree) + 
  geom_tiplab() + 
  geom_cladelabel(node=17, label="Some random clade", 
                  color="red2", offset=.8)

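Now let’s add more labels: one for the clade connecting taxa G and H (internal node 21), one for the clade connecting taxa L and I (internal node 23), and one marking node 13 as the outgroup.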

ggtree(tree) + 
  geom_tiplab() + 
  geom_cladelabel(node=17, label="Some random clade", 
                  color="red2", offset=.8) + 
  geom_cladelabel(node=21, label="A different clade 1", 
                  color="blue", offset=.8)+
  geom_cladelabel(node=23, label="A different clade 2", 
                  color="green", offset=.8) +
  geom_cladelabel(node=13, label= "Outgroup", 
                  color="black", offset=.8) 

Uh oh. Now we have two problems. First, the labels would look better if they were aligned. That’s simple. Pass align=TRUE to geom_cladelabel() (see ?geom_cladelabel help for more). But now, the labels are falling off the edge of the plot. That’s because geom_cladelabel() is just adding this layer onto the end of the existing canvas that was originally laid out in the ggtree call. This default layout tried to optimize by plotting the entire tree over the entire region of the plot. Here’s how we’ll fix this.

  1. First create the generic layout of the plot with ggtree(tree).
  2. Add some tip labels.
  3. Add each clade label.
  4. Remember theme_tree2()? We used it way back to add a scale to the x-axis showing the genetic distance. This is the unit of the x-axis. We need to set the limits on the x-axis. Google around for something like “ggplot2 x axis limits” and you’ll wind up on this StackOverflow page that tells you exactly how to solve it – just add on a + xlim(..., ...) layer. Here let’s extend out the axis a bit further to the right.
  5. Finally, if we want, we can either comment out the theme_tree2() segment of the code, or we could just add another theme layer on top of the plot altogether, which will override the theme that was set before. theme_tree() doesn’t have the scale.
ggtree(tree) + 
  geom_tiplab() + 
  geom_cladelabel(node=17, label="Some random clade", 
                  color="red2", offset=.8, align=TRUE) + 
  geom_cladelabel(node=21, label="A different clade 1", 
                  color="blue", offset=.8, align=TRUE) + 
  geom_cladelabel(node=23, label="A different clade 2", 
                  color="green", offset=.8, align=TRUE) +
  geom_cladelabel(node=13, label= "Outgroup", 
                  color="black", offset=.8, align=TRUE) +
  theme_tree2() + 
  xlim(0,70) + 
  theme_tree()
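
Rather than guessing the upper x-axis limit, one option is to derive it from the tree itself. The sketch below is an assumption-laden illustration, not the lesson's method: it uses a hypothetical three-tip tree (toy), takes the largest root-to-tip distance from ape's node.depth.edgelength(), and pads it by an arbitrary 50% to leave room for labels.

```r
library(ape)
library(ggplot2)
library(ggtree)

# Hypothetical three-tip tree: tips A, B, C are nodes 1-3, the A+B clade is node 5
toy <- read.tree(text = "((A:1,B:2):3,C:10);")

# Distance from the root to every node; the maximum is the right edge of the tree
tree_depth <- max(node.depth.edgelength(toy))

p_toy <- ggtree(toy) +
  geom_tiplab() +
  geom_cladelabel(node = 5, label = "A+B clade", offset = 1, align = TRUE) +
  xlim(0, tree_depth * 1.5)   # padding factor chosen arbitrarily
p_toy
```

You will probably still need to adjust the padding factor by eye, since label width depends on the text and font size.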

Alternatively, we could highlight the entire clade with geom_hilight(). See the help for options to tweak.

ggtree(tree) + 
  geom_tiplab() + 
  geom_hilight(node=17, fill="gold") + 
  geom_hilight(node=21, fill="purple")

Or we could collapse the entire clade with the collapse() function. See the help for options to tweak.

p2 <- p + geom_tiplab()
collapse(p2, 19, 'min', color= "red", fill='steelblue', alpha=.4) %>% 
collapse(24, 'max', fill='firebrick', color='blue')

Exercise 1

  • Change the collapsed nodes to 18 and 23.
  • Change the outline colors of the collapsed shapes to orange and yellow.

Connecting taxa

Some evolutionary events (e.g. reassortment, horizontal gene transfer) can be visualized with some simple annotations on a tree. The geom_taxalink() layer draws straight or curved lines between any two nodes in the tree, allowing it to show evolutionary events by connecting taxa. Take a look at the tree annotation vignette and ?geom_taxalink for more.

ggtree(tree) + 
  geom_tiplab() + 
  geom_taxalink("E", "H", color="blue3") +
  geom_taxalink("C", "G", color="orange2", curvature=-.9)

Try different values of curvature

ggtree(tree) + 
  geom_tiplab() + 
  geom_taxalink("E", "H", color="blue3") +
  geom_taxalink("C", "G", color="orange2", curvature=-.2)
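
To compare several curvature values without retyping the whole plot, you could build one plot per value in a loop. This is only a sketch: the tree (toy), the chosen taxa, and the three curvature values are assumptions for illustration.

```r
library(ape)
library(ggplot2)
library(ggtree)

# Hypothetical four-tip tree
toy <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")
base_plot <- ggtree(toy) + geom_tiplab()

# One plot per curvature value, each titled with the value used
curvatures <- c(-0.9, -0.2, 0.5)
plots <- lapply(curvatures, function(cv) {
  base_plot +
    geom_taxalink(taxa1 = "A", taxa2 = "C", color = "blue3", curvature = cv) +
    ggtitle(paste("curvature =", cv))
})

plots[[1]]   # view the first plot; index 2 and 3 for the others
```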

Exercise 2

Produce the figure below.

  1. First, find what the MRCA is for taxa B+C, and taxa L+J. You can do this in one of two ways:
    1. Easiest: use MRCA(tree, tip=c("taxon1", "taxon2")) for B/C and L/J separately.
    2. Alternatively: use ggtree(tree) + geom_text(aes(label=node), hjust=-.3) to see what the node labels are on the plot. You might also add tip labels here too.
  2. Draw the tree with ggtree(tree).
  3. Add tip labels.
  4. Highlight these clades with separate colors.
  5. Add a clade label to the larger superclade (node=17) that we saw before that includes A, B, C, D, and E. You’ll probably need an offset to get this looking right.
  6. Link taxa C to E, and G to J with a dashed gray line (hint: get the geom working first, then try changing the aesthetics. You’ll need linetype=2 somewhere in the geom_taxalink()).
  7. Add a scale bar to the bottom by changing the theme.
  8. Add a title.
  9. Optionally, go back to the original ggtree(tree, ...) call and change the layout to "circular".