Let’s first import our tree data. We’re going to work with a made-up phylogeny with 13 samples (“tips”). Download the tree_newick.nwk data by clicking here or using the link above. Load the libraries you’ll need if you haven’t already, and then import the tree using read.tree(). Displaying the object itself isn’t very useful: the output just tells you a little bit about the tree.
library(tidyverse)
library(ggtree)
library(phylobase)
library(ape)
tree <- read.tree("data/tree_newick.nwk")
tree
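Although the printed summary is terse, the object can be queried directly. The sketch below uses a small made-up Newick string (an assumption for illustration, not the workshop's tree_newick.nwk file) and a few ape helpers to pull out the same facts the printout reports.

```r
library(ape)

# A toy tree written inline; this is NOT the workshop data file
toy <- read.tree(text = "((A:1,B:2):1,(C:3,D:1):2);")

n_tips  <- Ntip(toy)        # number of tips
n_nodes <- Nnode(toy)       # number of internal nodes
labels  <- toy$tip.label    # tip names, in the order they appear in the string
rooted  <- is.rooted(toy)   # TRUE when the tree has a root

n_tips
n_nodes
labels
rooted
```

The same calls should work on the tree object loaded above, since read.tree() returns an object of class "phylo" in both cases.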
# build a ggplot with a geom_tree
ggplot(tree) + geom_tree() + theme_tree()
# This is convenient shorthand
ggtree(tree)
Adding a Scale Bar
There’s also the treescale geom, geom_treescale(), which adds a scale bar. Alternatively, you can change the default ggtree() theme to theme_tree2(), which adds a scale on the x-axis.
# add a scale
ggtree(tree) + geom_treescale()
# or add the entire scale to the x axis with theme_tree2()
ggtree(tree) + theme_tree2()
Removing Branch Scaling to Produce a Cladogram
The default is to plot a phylogram, where the x-axis shows the genetic change / evolutionary distance. If you want to disable scaling and produce a cladogram instead, set the branch.length="none" option inside the ggtree() call. See ?ggtree for more.
ggtree(tree, branch.length="none")
The ... option in the help for ?ggtree represents additional options that are passed on to ggplot(). You can use this to change the aesthetics of the plot. Let’s draw a cladogram (no branch scaling) using blue dashed lines; linetype=6 is ggplot2’s "twodash" pattern. Note that I’m not mapping these aesthetics to features of the data with aes(); we’ll get to that later.
ggtree(tree, branch.length="none", color="blue", size=1, linetype=6)
More Features in Tree Shapes
Look at the help again for ?ggtree, specifically at the layout= option. By default, it produces a rectangular layout.
suppressWarnings(suppressPackageStartupMessages(library(ggtree)))
tree <- read.tree(file.path(rprojroot::find_rstudio_root_file(), "data", "tree_newick.nwk"))
ggtree(tree, layout="slanted")
ggtree(tree, layout="circular")
ggtree(tree, layout="circular", branch.length="none", color="red", size=3)
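To compare several layouts without retyping the call, one option is to loop over layout names. This is a minimal sketch on a made-up toy tree (an assumption, not the workshop data); the four layout names used are ones ggtree accepts, and the list name `plots` is only illustrative.

```r
library(ape)
library(ggtree)

# Toy tree for illustration only
toy <- read.tree(text = "((A:1,B:2):1,(C:3,D:1):2);")

# A few of the layout names accepted by ggtree's layout= argument
layouts <- c("rectangular", "slanted", "circular", "fan")

# Build one plot per layout and title each with its layout name
plots <- lapply(layouts, function(l) {
  ggtree(toy, layout = l) + ggplot2::ggtitle(l)
})
names(plots) <- layouts
```

Printing an element, for example plots[["slanted"]], draws that version of the tree.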
Let’s add additional layers. As we did in the ggplot2 lesson, we can create a plot object, e.g., p, to store the basic layout of a ggplot, and add more layers to it as we desire. Let’s add node and tip points, and finally label the tips.
Create the basic plot
# create the basic plot
p <- ggtree(tree)
p
Creating Node Points from the Basic Plot
p + geom_nodepoint()
Creating Tip Points from the Basic Plot
p + geom_tippoint()
Labeling the Tips from the Basic Plot
p + geom_tiplab()
#tree <- read.tree(file.path(rprojroot::find_rstudio_root_file(), "data", "tree_newick.nwk"))
p <- ggtree(tree)
p +
geom_tiplab(color="darkorchid", size=5) +
geom_tippoint(color="darkorchid", size=2, shape=18) +
geom_nodepoint(color="goldenrod", size=4, alpha=1/2) +
ggtitle("Not the prettiest phylogenetic aesthetics, but it'll do.")
The geom_tiplab() function adds some very rudimentary annotation. Let’s take annotation a bit further. See the tree annotation and advanced tree annotation vignettes for more.
Before we can go further, we need to understand how ggtree handles the tree structure internally. Some of the ggtree functions for annotating clades need a parameter specifying the internal node number. To get the internal node number, you can use geom_text() to display it, where the label is an aesthetic mapping to the “node” variable stored inside the tree object (think of this like the continent variable inside the gapminder object). We also supply the hjust option so that the labels aren’t sitting right on top of the nodes. Read more about this process in the ggtree manipulation vignette.
ggtree(tree) + geom_text(aes(label=node), hjust=-.3)
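The node numbers shown on the plot follow the convention of ape's "phylo" objects: tips are numbered 1 through the number of tips, and internal nodes continue from there, starting with the root. The sketch below illustrates this on a made-up toy tree (an assumption, not the workshop data); the variable names are only illustrative.

```r
library(ape)

# Toy tree for illustration only
toy <- read.tree(text = "((A:1,B:2):1,(C:3,D:1):2);")

n_tips <- Ntip(toy)

# Tips occupy node numbers 1..n_tips
tip_nodes <- seq_len(n_tips)

# Internal nodes occupy the next Nnode(toy) numbers
internal_nodes <- n_tips + seq_len(Nnode(toy))

# The root is the first internal node
root_node <- n_tips + 1

# Each row of the edge matrix is one branch: parent node, then child node
toy$edge
```

This is why the tree in this lesson, with 13 tips, has internal node numbers that start at 14.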
Another way to get the internal node number is the MRCA() function: provide a vector of taxa names (created using c("taxon1", "taxon2")), and it returns the node number of those taxa’s most recent common ancestor (MRCA). First, re-create the plot so you can choose which taxa you want to grab the MRCA from.
ggtree(tree) + geom_tiplab()
Let’s grab the most recent common ancestor for taxa C+E, taxa G+H, and taxa L+I. We can use MRCA() to get the internal node numbers. Go back to the node-labeled plot from before to confirm this.
MRCA(tree, tip=c("C", "E"))
## [1] 17
MRCA(tree, tip=c("G", "H"))
## [1] 21
MRCA(tree, tip=c("L", "I"))
## [1] 23
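When several pairs need looking up, the calls can be collected in one step. The sketch below swaps in ape's getMRCA(), which performs the same lookup on a "phylo" object, and runs it on a made-up toy tree (an assumption, not the workshop data); the names `pairs` and `mrca_nodes` are only illustrative.

```r
library(ape)

# Toy tree for illustration only
toy <- read.tree(text = "((A:1,B:2):1,(C:3,D:1):2);")

# Named list of taxon pairs to look up
pairs <- list(
  AB = c("A", "B"),
  CD = c("C", "D"),
  AC = c("A", "C")
)

# getMRCA() returns the node number of the most recent common ancestor
mrca_nodes <- vapply(pairs, function(tips) getMRCA(toy, tips), numeric(1))
mrca_nodes
```

A and C sit on opposite sides of the root in this toy tree, so their MRCA is the root node itself.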
We can use geom_cladelabel() to add another geom layer that annotates a selected clade with a bar and a corresponding label. You select the clade using the internal node number for the node that connects all the taxa in that clade. See the tree annotation vignette for more.
Let’s annotate the clade with the most recent common ancestor between taxa C and E (internal node 17). Let’s make the annotation red. See the ?geom_cladelabel help for more.
ggtree(tree) +
geom_cladelabel(node=17, label="Some random clade", color="red")
Let’s add back in the tip labels. Notice how now the clade label is too close to the tip labels. Let’s add an offset to adjust the position. You might have to fiddle with this number to get it looking right.
ggtree(tree) +
geom_tiplab() +
geom_cladelabel(node=17, label="Some random clade",
color="red2", offset=.8)
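Now let’s add more labels: one for the clade connecting taxa G and H (internal node 21), one for the clade connecting taxa L and I (internal node 23), and an "Outgroup" label on node 13 (a tip, since nodes 1 through 13 are the tips of this 13-tip tree).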
ggtree(tree) +
geom_tiplab() +
geom_cladelabel(node=17, label="Some random clade",
color="red2", offset=.8) +
geom_cladelabel(node=21, label="A different clade 1",
color="blue", offset=.8)+
geom_cladelabel(node=23, label="A different clade 2",
color="green", offset=.8) +
geom_cladelabel(node=13, label= "Outgroup",
color="black", offset=.8)
Uh oh. Now we have two problems. First, the labels would look better if they were aligned. That’s simple: pass align=TRUE to geom_cladelabel() (see the ?geom_cladelabel help for more). But now the labels are falling off the edge of the plot. That’s because geom_cladelabel() just adds this layer onto the end of the existing canvas that was originally laid out in the ggtree call. The default layout tries to optimize by plotting the entire tree over the entire region of the plot. Here’s how we’ll fix this.
First, create the generic layout of the plot with ggtree(tree), then add the tip labels and each clade label.
Remember theme_tree2()? We used it way back to add a scale to the x-axis showing the genetic distance. This is the unit of the x-axis. We need to set the limits on the x-axis. Google around for something like “ggplot2 x axis limits” and you’ll wind up on a StackOverflow page that tells you exactly how to solve it: just add on a + xlim(..., ...) layer. Here let’s extend the axis a bit further to the right.
Finally, if we want, we can either comment out the theme_tree2() segment of the code, or just add another theme layer on top of the plot altogether, which will override the theme that was set before. theme_tree() doesn’t have the scale.
ggtree(tree) +
geom_tiplab() +
geom_cladelabel(node=17, label="Some random clade",
color="red2", offset=.8, align=TRUE) +
geom_cladelabel(node=21, label="A different clade 1",
color="blue", offset=.8, align=TRUE) +
geom_cladelabel(node=23, label="A different clade 2",
color="green", offset=.8) +
geom_cladelabel(node=13, label= "Outgroup",
color="black", offset=.8) +
theme_tree2() +
xlim(0,70) +
theme_tree()
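Rather than guessing the upper x limit, it can be derived from the tree. The sketch below, on a made-up toy tree (an assumption, not the workshop data), uses ape's node.depth.edgelength() to find the largest root-to-tip distance and pads it; the factor 1.4 and the name `p_padded` are arbitrary choices for illustration.

```r
library(ape)
library(ggplot2)
library(ggtree)

# Toy tree for illustration only
toy <- read.tree(text = "((A:1,B:2):1,(C:3,D:1):2);")

# Distance from the root to every node (tips first, then internal nodes)
depths <- node.depth.edgelength(toy)
max_depth <- max(depths)

# Pad the right-hand side to leave room for labels;
# 1.4 is an arbitrary factor to tune by eye
p_padded <- ggtree(toy) + geom_tiplab() + xlim(0, max_depth * 1.4)
```

The padding still needs adjusting by eye, because label width depends on text size and the offset used.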
Alternatively, we could highlight the entire clade with geom_hilight(). See the help for options to tweak.
ggtree(tree) +
geom_tiplab() +
geom_hilight(node=17, fill="gold") +
geom_hilight(node=21, fill="purple")
Or we could collapse the entire clade with the collapse() command. See the help for options to tweak.
p2 <- p + geom_tiplab()
collapse(p2, 19, 'min', color= "red", fill='steelblue', alpha=.4) %>%
collapse(24, 'max', fill='firebrick', color='blue')
Exercise 1
Some evolutionary events (e.g. reassortment, horizontal gene transfer) can be visualized with some simple annotations on a tree. The geom_taxalink() layer draws straight or curved lines between any two nodes in the tree, which lets it show evolutionary events by connecting taxa. Take a look at the tree annotation vignette and ?geom_taxalink for more.
ggtree(tree) +
geom_tiplab() +
geom_taxalink("E", "H", color="blue3") +
geom_taxalink("C", "G", color="orange2", curvature=-.9)
Try different values of curvature
ggtree(tree) +
geom_tiplab() +
geom_taxalink("E", "H", color="blue3") +
geom_taxalink("C", "G", color="orange2", curvature=-.2)
Produce the figure below.
Find the MRCA node numbers using MRCA(tree, tip=c("taxon1", "taxon2")) for B/C and L/J separately.
Alternatively, use ggtree(tree) + geom_text(aes(label=node), hjust=-.3) to see what the node labels are on the plot. You might also add tip labels here too.
Draw the tree with ggtree(tree).
You’ll need linetype=2 somewhere in the geom_taxalink() call.
Optionally, go back to the original ggtree(tree, ...) call and change the layout to "circular".