
Drawing an alignment with the tree

Giorgio Bianchini edited this page Jul 13, 2021 · 6 revisions

This guide explains how to draw a phylogenetic tree together with an alignment of the sequences that were used to build it. Combining an alignment and a phylogenetic tree in a single plot can highlight sequence features that the tree alone cannot convey effectively, such as the amino acids in the active site of an enzyme, or particularly relevant indels.

Consider, for example, the sepJ gene, which encodes a protein that is important for multicellularity in cyanobacteria. This protein contains three main domains: a coiled-coil motif at the N-terminus (CC), a permease domain at the C-terminus (P) and a central linker domain (L) between the two. However, not all cyanobacteria are multicellular: accordingly, some of them possess the full gene, others possess a "shortened" version without the L domain, and others only have a distant homolog, an uncharacterised drug/metabolite exporter (DME) that only contains the P domain.

In this tutorial, we will show a sequence alignment together with a phylogenetic tree of sepJ, which makes the differences between the various kinds of sepJ homologs immediately visible.

The file sepJ.tre contains a rooted phylogenetic tree of 44 sepJ homologs (adapted from Urrejola et al., 2020). When the tree is opened in TreeViewer, it should look similar to the following figure:

Cleaning up the tree

First, we will remove the branch length labels: in this case they do not provide much information, and some branches are so short that their labels overlap each other. To do this, remove the second Labels module that TreeViewer added to the Plot elements when it opened the tree file. This deletes the branch length labels from the plot.

We can also make the tree more compact by opening the options for the Coordinates module and setting the Width to 400. The tree should now look similar to the following figure:

Adding the alignment to the tree

The sepJ.fas file contains an alignment in FASTA format of the protein sequences that were used to build the tree. This alignment can be embedded in the tree file by loading it as an attachment: click on the + button next to Attachments in the top-left of the TreeViewer window, then select the file and confirm. An attachment called sepJ should then appear, with a paperclip icon, under the "Attachments" header.
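
Because each sequence in the alignment will later be drawn next to one tip of the tree, it can be worth checking the FASTA file outside TreeViewer before attaching it: all sequences should have the same length, and there should be one sequence per tip (44 in this example). The following is a minimal Python sketch of such a check. The function names are hypothetical and are not part of TreeViewer, and the sketch assumes a plain FASTA file with unique sequence names.

```python
from pathlib import Path


def parse_fasta(text):
    """Parse FASTA text into a dict mapping sequence names to sequences."""
    records = {}
    name = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].strip()
            chunks = []
        else:
            chunks.append(line)  # a sequence may span several lines
    if name is not None:
        records[name] = "".join(chunks)
    return records


def check_alignment(records, expected_count):
    """Return the alignment length; raise ValueError if the records do not
    look like an alignment with one sequence per tip."""
    if len(records) != expected_count:
        raise ValueError(f"expected {expected_count} sequences, found {len(records)}")
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"sequences have different lengths: {sorted(lengths)}")
    return lengths.pop()


def load_alignment(path, expected_count):
    """Read a FASTA file from disk and check it (hypothetical helper)."""
    return check_alignment(parse_fasta(Path(path).read_text()), expected_count)
```

For example, `load_alignment("sepJ.fas", 44)` should return the number of alignment columns if the file is consistent with the 44-tip tree, and raise an error otherwise.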

This step adds the alignment file to the tree, but does not actually plot the alignment. To plot it, we need to use the Plot alignment Plot action module. To enable this module, click on the + under Plot elements and select it from the list. Once the module has been added (do not worry if a warning sign appears: this is only because no alignment file has been selected yet), expand its options and set the FASTA alignment parameter to the sepJ attachment.

This will cause an alignment plot to be drawn just below the phylogenetic tree; the tree should look similar to the following figure:

We now need to position the alignment plot so that it is to the right of the tree plot, and resize it appropriately so that each sequence is right next to the tip of the tree to which it refers.

First of all, set the Anchor parameter to Top-right and the Alignment to Top-left; this aligns the top-left corner of the alignment with the top-right corner of the tree plot. Once the alignment has been positioned accurately, each sequence will sit right next to the tip label to which it refers, which makes the sequence labels on the alignment plot redundant. You can hide them by setting the Label position parameter to Neither. Then, to move the alignment to the right so that it does not overlap the tree labels, set the X component of the Position parameter to 200.

The tree plot should now look similar to the following figure:

To properly align the sequences with the tree labels, we need to make sure that each sequence is allocated the same vertical space as each tip of the tree. We can determine how much vertical space is available for each tip of the tree by looking at the Height parameter of the Coordinates module: in this case, it should have been set by default to 616. Since the tree contains 44 tips, this means that each tip is allocated 616 / 44 = 14 plot units. To apply this to the alignment, we need to make sure that Sequence height + Margin (in the parameters for the Plot alignment module) is equal to 14; since the Margin is equal to 2, this means that we need to set the value of the Sequence height to 12.
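
The calculation above can be written out as a small helper, which is convenient if you later change the Height or work with a tree that has a different number of tips. This is a Python sketch that assumes, as described above, that the tips are spaced evenly so that each one gets Height / (number of tips) plot units; the function name is hypothetical.

```python
def sequence_height(coordinates_height, n_tips, margin):
    """Return (space per tip, Sequence height) so that Sequence height + Margin
    equals the vertical space allocated to each tip of the tree."""
    if n_tips <= 0:
        raise ValueError("the tree must have at least one tip")
    per_tip = coordinates_height / n_tips
    height = per_tip - margin
    if height <= 0:
        raise ValueError("the Margin is larger than the space available for each tip")
    return per_tip, height


# Values from this tutorial: Height = 616, 44 tips, Margin = 2.
per_tip, height = sequence_height(616, 44, 2)
print(per_tip, height)  # 14.0 12.0
```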

This causes the alignment to expand so that each sequence occupies 12 units plus the Margin (i.e. 14 units in total, just like each tip of the tree). The tree should now look similar to the following figure:

However, the sequences are still not perfectly aligned with the labels: the top edge of each sequence is aligned with the middle of the corresponding tip label. To address this, we need to move the alignment up by half of the vertical space allocated to each sequence (Sequence height + Margin), so that the middle of each sequence is aligned with the middle of its label. This means that the Y component of the Position parameter of the Plot alignment module needs to be set to -14 / 2 = -7. After doing this, the alignment should be perfectly aligned with the tree, and the plot should look like the following figure:
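
The vertical offset follows directly from the per-tip spacing: the alignment has to move up by half of the space allocated to each tip. A minimal Python sketch, again assuming evenly spaced tips (the function name is hypothetical):

```python
def alignment_y_offset(coordinates_height, n_tips):
    """Y component of the Plot alignment Position that centres each sequence
    on its tip label: minus half of the space allocated to each tip."""
    per_tip = coordinates_height / n_tips
    return -per_tip / 2


print(alignment_y_offset(616, 44))  # -7.0
```

If you later increase the Height of the Coordinates module (for example to 880, i.e. 20 units per tip), the offset becomes -10.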

Highlighting the sequence groups

This plot makes it easy to see which sequences only have the P domain (the DME sequences), which ones have the CC and P domains (the Pseudanabaena sequences and the group immediately after them), and which ones have all three domains (all the others). However, it would be nice to highlight these differences on the tree so that they are even clearer.

The first step to highlight these groups is to change their colour. Start by clicking on the branch corresponding to the last common ancestor of all the DME sequences to select it. This opens the Selection panel on the right, where you can click on the + to add a new attribute. Set the Attribute name to Color and leave the Attribute value empty. This adds a new Add attribute module to the Further transformations. Expand the options for this new module, check the Apply recursively to all children check box, and then click on the New value text box (as if you wanted to type some text in it). Now press CTRL+SHIFT+C (on Windows and Linux) or CMD+SHIFT+C (on macOS); this opens a colour picker window in which you can select a new colour for these sequences. Select an orange hue, click on OK to close the colour picker, and click on Apply to apply the new colour.

This should change the colour of the selected sequences, and the tree plot should now look like the following figure:

Now, select the branch corresponding to the last common ancestor of the Pseudanabaena sepJ sequences and repeat the same steps to assign a green colour to it. The tree should now look similar to the following figure:

Then, select the next group to diverge (i.e. the last common ancestor of the sequences from sepJ Geitlerinema sp. FC II to sepJ Spirulina subsalsa PCC 9445 2) and repeat the steps above to assign a light blue colour to it. The tree should look similar to the following figure:

Finally, click on the last common ancestor of the group of "full" sepJ sequences (i.e. the last common ancestor of the sequences from sepJ Nodularia sp. NIES-3585 to sepJ cyanobacterium PCC 7702 genomic). Again, follow the steps above to assign a dark blue colour to it:

To make these colours show up in the sequence alignment as well, expand the options for the Plot alignment module, go to the Colours section, and click on the three vertical dots next to the Colour parameter. In the Colour formatter window that opens, set the Attribute name to Color and click on OK. The colours should now be applied to the alignment as well, and the plot should look similar to the following figure:

Displaying group names

The final step is to display the names of the groups that we have just highlighted. First, we need to create new attributes to store the group names. To do this, select the branch corresponding to the common ancestor of the orange sequences and add a new attribute to it, called Domains, with value P only. Then, add a Domains attribute with value CC+P to both the green and the light blue groups. Finally, add a Domains attribute with value CC+L+P to the ancestor of the dark blue group.

To display these labels, we need to use the Group labels Plot action module. To enable this module, click on the + button under the Plot elements and select the module. Then, open its options and set the Attribute to Domains. This should cause the labels to appear on the tree (although in the wrong position):

To position the labels appropriately, set the Distance to 820; then, click on the button to change the Font and increase the font size to 18:

Finally, we can add another set of group labels to highlight which sequences are actual sepJ homologs and which ones belong to the uncharacterised DME. To do this, select the ancestor of all the DME sequences again and add an attribute called Group with value DME. Then, select the last common ancestor of all the sepJ sequences (i.e. the sister group of the one you just selected) and add another attribute called Group with value sepJ.

Now, add another Group labels Plot action module by clicking on the + button under the Plot elements. In the options for this module, set the Attribute to Group and the Distance to 850. The tree should now look similar to the following figure:

To make these labels look nicer, first click on the Font button, select the Helvetica-BoldOblique font, and increase the font size to 24. Then, increase the Height to 40 and set the Fill colour to a light grey (e.g. #B4B4B4). Finally, click on the three dots next to the Colour in the Text options and, in the new window, set the Default colour to white and the Attribute name to N/A. This makes the text white. The final tree plot should look similar to the following figure:

You can now save the tree file, or export the plot as a PDF or SVG file, using the items in the File menu. You can also download the sepJ.tbi tree file, which contains the tree along with all the modules.

Tips

  • If you wish, you can remove the % identity and % gaps plots (i.e. the green and pink plots at the bottom of the alignment) by setting their respective colours to transparent in the options for the Plot alignment module.

  • If you want to use larger fonts, you can increase the vertical spacing of the tips of the tree by increasing the Height in the options for the Coordinates module. Note that you will then have to adjust the Sequence height (and the Y component of the Position) in the options for the Plot alignment module as well, following the same calculations as above.

  • Instead of drawing each sequence with a single colour, you can choose to colour each position based on the nucleotide/amino acid that is in it. To do this, change the Colour mode to By residue and then click on the button corresponding to the kind of sequence data that is included in the alignment (DNA/RNA or Protein). Note that this will be much slower! You can also define custom colours by clicking on the three dots next to the Residue colours parameter.

  • You can also draw the letters of the alignment by checking the Draw residue letters check box. Again, this will be much slower; you also need to make sure that the Residue width is large enough, otherwise there will not be enough space to read the letters.

  • Colouring each residue or displaying the alignment letters is particularly useful for short sequence stretches (e.g. the active site, or other conserved positions in which you want to highlight a synapomorphy or an autapomorphy). You can select which part of the sequence is shown (i.e. the first and last nucleotide/amino acid) by changing the Start and End parameters. If you wish to show two disjoint ranges of residues, you will need to use two instances of the Plot alignment module.
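
Positions of interest (such as active-site residues) are usually reported in the numbering of the ungapped protein sequence, whereas the alignment also contains gap columns, so the two numberings differ. Assuming that the Start and End parameters refer to 1-based alignment columns (this is an assumption; check it against the plot in TreeViewer), the following Python sketch converts residue positions in one aligned sequence into alignment columns. The function names and the set of gap characters are hypothetical choices.

```python
def residue_to_column(aligned_seq, residue_pos, gap_chars="-."):
    """Return the 1-based alignment column holding the residue_pos-th
    (1-based) non-gap residue of aligned_seq."""
    if residue_pos < 1:
        raise ValueError("residue positions start at 1")
    count = 0
    for column, char in enumerate(aligned_seq, start=1):
        if char not in gap_chars:
            count += 1
            if count == residue_pos:
                return column
    raise ValueError(f"the sequence only has {count} residues")


def column_range(aligned_seq, first_residue, last_residue):
    """Alignment columns spanning residues first_residue..last_residue."""
    return (residue_to_column(aligned_seq, first_residue),
            residue_to_column(aligned_seq, last_residue))


print(column_range("MK--LV-E", 2, 4))  # (2, 6)
```

The resulting pair could then be used as the Start and End values; for two disjoint regions, compute one pair for each Plot alignment module.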

References

Catalina Urrejola, Peter von Dassow, Ger van den Engh, Loreto Salas, Conrad W. Mullineaux, Rafael Vicuña, Patricia Sánchez-Baracaldo, Loss of Filamentous Multicellularity in Cyanobacteria: the Extremophile Gloeocapsopsis sp. Strain UTEX B3054 Retained Multicellular Features at the Genomic and Behavioral Levels, Journal of Bacteriology, 202(12), 2020. https://doi.org/10.1128/JB.00514-19.
