run PhyloNet on Philippines Dicaeum RADseq

0.1 PhyloNet approach

  • I am going to be following the tutorial and info outlined in this link: https://phylogenomics.rice.edu/media/bookChapter.pdf
  • To start, I will set up the input for the maximum-pseudo-likelihood approach, the details of which can be found here: https://phylogenomics.rice.edu/html/commands/InferNetwork_MPL.html

0.2 set up InferNetwork_MPL input

  • The PhyloNet command ‘InferNetwork_MPL’ takes as input a set of gene trees and uses a maximum pseudo-likelihood approach to infer a species tree given number of reticulations with inferred branch lengths and inheritance probabilities. Despite the fact that the full likelihood of each proposed tree is not computed, it has been shown to perform quite reliably, and is computationally tractable.
  • I will begin by setting up a nexus full of the 2590 ML gene trees generated as input for ASTRAL, for input to PhyloNet. Branch lengths must be remmoved before input, so we will do that here.
Code
#make infer_MPL input

#load libraries
library(phytools)
Loading required package: ape
Loading required package: maps
Code
library(ape)
library(vcfR)

   *****       ***   vcfR   ***       *****
   This is vcfR 1.14.0 
     browseVignettes('vcfR') # Documentation
     citation('vcfR') # Citation
   *****       *****      *****       *****
Code
#read in gene trees inferred by iqtree2
x<-phytools::read.newick("~/Desktop/phil.dicaeum/astral/ml_best.trees")

#remove branch lengths
for (i in 1:length(x)){
  x[[i]]$edge.length<-NULL
}

#remove outgroup
for (i in 1:length(x)){
  x[[i]]<-drop.tip(x[[i]],c("niKU28413","niKU28414"))
}

#write to file
#write.tree(x, file="~/Desktop/phil.dicaeum/astral/mlbesttrees.nobrlens.nooutgroup.nex")
  • I then edited this output nexus file called ‘mlbesttrees.nobrlens.nooutgroup.nex’ in a text editor, to create an input file that looks like this:
Code
#NEXUS

BEGIN TREES;

Tree gt1=((C,((B,D),A)),E);
Tree gt2=(B,(D,(C,(A,E))));
......
Tree gt2590=(D,((B,E),(C,A)));

END;

BEGIN PHYLONET;

InferNetwork_MPL (all) 1 -a <luzon:hyp18070,hyp20218,hyp20213,hypFMNH454,hyp25637,hyp25921,hyp26984,hyp462070,hyp25675,hyp454950,hyp25880,hyp19638,hyp25868,hyp17969,hyp25672,hyp26975,hyp25670,hyp20193; zamboanga:hyp18191,hyp18159,hyp19177,hyp19178,hyp18193,hyp29945,hyp29951; mindanao:hyp14037,hyp14079,hyp14065,hyp14120,hyp14075,hyp1271,hyp1273,hyp1275,hyp3274,hyp3208,hyp3158,hyp2253,hyp3095,hyp3314,hyp357608,hyp357615,hyp357612,hyp2229,hyp2067,hyp28329,hyp28361,hyp28376,hyp2208,hyp28294,hyp1956,hyp20921,hyp28416,hyp27450,hyp19046,hyp19136,hyp28676,hyp31636,hyp31644> -pl 15;

END;
  • Where all gene trees are specified with a unique name, and the command at the bottom instructs the program to use (all) input gene trees, infer 1 reticulation edge, -a provides a ‘taxa map’ of individuals into species, and -pl tells the program the number of cores to use for the analysis. I will save this nexus file as ‘mpl.1retic.nex’, and move it to the KU cluster.

0.3 run InferNetwork_MPL

I can now run this analysis on the KU cluster using the following bash script:

Code
#!/bin/sh
#
#SBATCH --job-name=phylonet             # Job Name
#SBATCH --nodes=1             # 40 nodes
#SBATCH --ntasks-per-node=15               # 40 CPU allocation per Task
#SBATCH --partition=bi            # Name of the Slurm partition used
#SBATCH --chdir=/home/d669d153/work/phil.dicaeum/phylonet/inferMPL  # Set working d$
#SBATCH --mem-per-cpu=5gb            # memory requested
#SBATCH --time=5000

module load java
java -jar /home/d669d153/work/PhyloNet_3.8.0.jar mpl.1retic.nex

0.4 InferNetwork_MPL output

Code
library(tanggle)
Loading required package: ggplot2
Loading required package: ggtree
Registered S3 methods overwritten by 'treeio':
  method              from    
  MRCA.phylo          tidytree
  MRCA.treedata       tidytree
  Nnode.treedata      tidytree
  Ntip.treedata       tidytree
  ancestor.phylo      tidytree
  ancestor.treedata   tidytree
  child.phylo         tidytree
  child.treedata      tidytree
  full_join.phylo     tidytree
  full_join.treedata  tidytree
  groupClade.phylo    tidytree
  groupClade.treedata tidytree
  groupOTU.phylo      tidytree
  groupOTU.treedata   tidytree
  is.rooted.treedata  tidytree
  nodeid.phylo        tidytree
  nodeid.treedata     tidytree
  nodelab.phylo       tidytree
  nodelab.treedata    tidytree
  offspring.phylo     tidytree
  offspring.treedata  tidytree
  parent.phylo        tidytree
  parent.treedata     tidytree
  root.treedata       tidytree
  rootnode.phylo      tidytree
  sibling.phylo       tidytree
ggtree v3.6.2 For help: https://yulab-smu.top/treedata-book/

If you use the ggtree package suite in published research, please cite
the appropriate paper(s):

Guangchuang Yu, David Smith, Huachen Zhu, Yi Guan, Tommy Tsan-Yuk Lam.
ggtree: an R package for visualization and annotation of phylogenetic
trees with their covariates and other associated data. Methods in
Ecology and Evolution. 2017, 8(1):28-36. doi:10.1111/2041-210X.12628

Guangchuang Yu.  Data Integration, Manipulation and Visualization of
Phylogenetic Trees (1st edition). Chapman and Hall/CRC. 2022,
doi:10.1201/9781003279242

Shuangbin Xu, Lin Li, Xiao Luo, Meijun Chen, Wenli Tang, Li Zhan, Zehan
Dai, Tommy T. Lam, Yi Guan, Guangchuang Yu. Ggtree: A serialized data
object for visualization of a phylogenetic tree and annotation data.
iMeta 2022, 4(1):e56. doi:10.1002/imt2.56

Attaching package: 'ggtree'
The following object is masked from 'package:ape':

    rotate
Code
z <- read.evonet(text = "((((mindanao:1.0)#H1:1.0::0.494,zamboanga:1.0):0.3244889262249451,luzon:1.0):0.035718031146395546,#H1:1.0::0.506);")
#Plot an explicit network:
ggevonet(z, layout = "rectangular") + geom_tiplab() + geom_nodelab()

Code
ggevonet(z, layout = "slanted") + geom_tiplab() + geom_nodelab()