Grosbeak.snp.filtering

1 Load packages and read in unfiltered vcf

This vcf comes straight out of mapping raw RADseq reads to a publicly available Ceyx reference genome and calling SNPs using Stacks.

Code
library(vcfR)
library(SNPfiltR)
library(StAMPP)
library(adegenet)
library(ggplot2)
#read vcf
v<-read.vcfR("~/Downloads/populations.snps.vcf")

I will now follow the SNP filtering pipeline outlined in detail in the documents of the SNPfiltR R package (see SNPfiltR vignettes).

2 Implement quality filters that don’t involve missing data

This is because removing low data samples will alter percentage/quantile based missing data cutoffs, so we wait to implement those until after deciding on our final set of samples for downstream analysis

Code
#check the details of the unfiltered dataset
v
#visualize distributions
hard_filter(v)
no depth cutoff provided, exploratory visualization will be generated.
no genotype quality cutoff provided, exploratory visualization will be generated.
Code
#hard filter to minimum depth of 3, and minimum genotype quality of 30
v<-hard_filter(vcfR=v, depth = 3, gq = 25)
18.16% of genotypes fall below a read depth of 3 and were converted to NA
2.78% of genotypes fall below a genotype quality of 25 and were converted to NA
***** Object of Class vcfR *****
156 samples
6192 CHROMs
407,520 variants
Object size: 1417.2 Mb
67.7 percent missing data
*****        *****         *****

***** Object of Class vcfR *****
156 samples
6192 CHROMs
407,520 variants
Object size: 1417.2 Mb
67.7 percent missing data
*****        *****         *****

Use the function allele_balance() to filter for allele balance from Puritz SNP filtering tutorial “Allele balance: a number between 0 and 1 representing the ratio of reads showing the reference allele to all reads, considering only reads from individuals called as heterozygous, we expect that the allele balance in our data (for real loci) should be close to 0.5”

Code
#test
filter_allele_balance(v)
10.03% of het genotypes (0.54% of all genotypes) fall outside of 0.25 - 0.75 allele balance ratio and were converted to NA

***** Object of Class vcfR *****
156 samples
6192 CHROMs
407,520 variants
Object size: 1380.1 Mb
74.44 percent missing data
*****        *****         *****
Code
#execute allele balance filter
v<-filter_allele_balance(v, 0.15, 0.85)
1.88% of het genotypes (0.1% of all genotypes) fall outside of 0.15 - 0.85 allele balance ratio and were converted to NA

We now want to implement a max depth filter (super high depth loci are likely multiple loci stuck together into a single paralogous locus, which we want to remove).

Code
#visualize and pick appropriate max depth cutoff
max_depth(v)
cutoff is not specified, exploratory visualization will be generated.
dashed line indicates a mean depth across all SNPs of 27.8
Code
#filter vcf by the max depth cutoff you chose
v<-max_depth(v, maxdepth = 250)
maxdepth cutoff is specified, filtered vcfR object will be returned
0.19% of SNPs were above a mean depth of 250 and were removed from the vcf

Remove SNPs from the vcfR that have become invariant following the removal of questionable genotypes above, and see how many SNPs we have left after this initial set of filters

Code
v<-min_mac(v, min.mac = 1)
24.69% of SNPs fell below a minor allele count of 1 and were removed from the VCF

Code
v
***** Object of Class vcfR *****
156 samples
5345 CHROMs
306,316 variants
Object size: 1217.8 Mb
68.02 percent missing data
*****        *****         *****

3 Setting the missing data by sample threshold

Check out the exploratory visualizations and make decisions about which samples to keep for downstream analysis.

Code
#run function to visualize per sample missingness
miss<-missing_by_sample(v)
No popmap provided

Code
#run function to visualize per SNP missingness
by.snp<-missing_by_snp(v)
cutoff is not specified, exploratory visualizations will be generated
Picking joint bandwidth of 0.0569

Based on these visualizations, we will want to drop the worst sequenced samples that are dragging down the rest of the dataset. Drop those samples based on an approximate missing data proportion cutoff here (this can always be revised if we end up feeling like this is too lenient or stringent later):

Code
#run function to drop samples above the threshold we want from the vcf
vcfR.trim<-missing_by_sample(v, cutoff = .82)
17 samples are above a 0.82 missing data cutoff, and were removed from VCF

Code
#get a list of samples that got trimmed
colnames(v@gt)[!colnames(v@gt) %in% colnames(vcfR.trim@gt)]
 [1] "P_ludovicianus_44717"   "P_ludovicianus_44718"   "P_ludovicianus_44721"  
 [4] "P_ludovicianus_44735"   "P_melanocephalus_44661" "P_melanocephalus_44674"
 [7] "P_melanocephalus_44675" "P_melanocephalus_44700" "P_melanocephalus_44702"
[10] "P_melanocephalus_44767" "P_melanocephalus_44784" "P_melanocephalus_44786"
[13] "P_melanocephalus_44796" "P_melanocephalus_44797" "P_melanocephalus_44836"
[16] "P_melanocephalus_44842" "P_melanocephalus_44844"
Code
#remove invariant sites generated by sample trimming
vcfR.trim<-min_mac(vcfR.trim, min.mac = 1)
1.67% of SNPs fell below a minor allele count of 1 and were removed from the VCF

Code
#see what effect trimming samples had on missing data across the dataset
by.snp<-missing_by_snp(vcfR.trim)
cutoff is not specified, exploratory visualizations will be generated
Picking joint bandwidth of 0.0517

4 Remove lagging low-quality samples

Code
#identify the lagging sample
test<-missing_by_snp(vcfR.trim, cutoff = .9)
cutoff is specified, filtered vcfR object will be returned
83.06% of SNPs fell below a completeness cutoff of 0.9 and were removed from the VCF

Code
test.miss<-missing_by_sample(test)
No popmap provided

Code
#manually remove this sample
vcfR.trim@gt<-vcfR.trim@gt[,colnames(vcfR.trim@gt) != "P_melanocephalus_39985"]
vcfR.trim
***** Object of Class vcfR *****
138 samples
5269 CHROMs
301,203 variants
Object size: 1157.4 Mb
65.04 percent missing data
*****        *****         *****
Code
#double check that this worked
by.snp<-missing_by_snp(vcfR.trim)
cutoff is not specified, exploratory visualizations will be generated
Picking joint bandwidth of 0.0516

Code
#remove SNPs that have become invariant because of sample removal
vcfR.trim<-min_mac(vcfR.trim, min.mac = 1)
2.43% of SNPs fell below a minor allele count of 1 and were removed from the VCF

Code
vcfR.trim
***** Object of Class vcfR *****
138 samples
5136 CHROMs
293,890 variants
Object size: 1148 Mb
64.22 percent missing data
*****        *****         *****

5 Remove overlapping SNPs

It is a known thing (see this) that Stacks will not merge SNPs if they are sequenced by separate (but physically overlapping) loci, and will instead retain the same SNP twice. To account for this, we will simply remove a SNP every time its chromosome and position have already been seen in the dataset, using the following code:

Code
#check # of SNPs
vcfR.trim
***** Object of Class vcfR *****
138 samples
5136 CHROMs
293,890 variants
Object size: 1148 Mb
64.22 percent missing data
*****        *****         *****
Code
#generate dataframe containing information for chromosome and bp locality of each SNP
df<-as.data.frame(vcfR.trim@fix[,1:2])
#calc number of duplicated SNPs to remove
nrow(df) - length(unique(paste(df$CHROM,df$POS)))
[1] 2369
Code
#remove duplicated SNPs
vcfR.trim<-vcfR.trim[!duplicated(paste(df$CHROM,df$POS)),]
#check # of SNPs
vcfR.trim
***** Object of Class vcfR *****
138 samples
5136 CHROMs
291,521 variants
Object size: 1139.1 Mb
64.2 percent missing data
*****        *****         *****

6 Setting the missing data by SNP threshold

Now we will visualize different per SNP missing data thresholds and identify a value that optimizes the trade-off between amount of missing data and the total number of SNPs retained.

Code
#see what effect trimming samples has had on missing data across the dataset
by.snp<-missing_by_snp(vcfR.trim)
cutoff is not specified, exploratory visualizations will be generated
Picking joint bandwidth of 0.0516

Test various SNP completeness thresholds to see if proportion missing data drives sample placement in an unrooted neighbor-joining tree

Code
#test 70% completeness threshold
filt<-missing_by_snp(vcfR.trim, cutoff=.7)
cutoff is specified, filtered vcfR object will be returned
72.97% of SNPs fell below a completeness cutoff of 0.7 and were removed from the VCF

Code
filt
***** Object of Class vcfR *****
138 samples
1243 CHROMs
78,785 variants
Object size: 712.5 Mb
8.494 percent missing data
*****        *****         *****
Code
#convert to genlight
gen<-vcfR2genlight(filt)
#fix sample names to fit in <= 10 characters
gen@ind.names
  [1] "P_hybrid_44696"         "P_hybrid_44703"         "P_hybrid_44707"        
  [4] "P_hybrid_44708"         "P_hybrid_44709"         "P_hybrid_44712"        
  [7] "P_hybrid_44762"         "P_hybrid_44771"         "P_hybrid_44781"        
 [10] "P_hybrid_45171"         "P_hybrid_45173"         "P_hybrid_45174"        
 [13] "P_ludovicianus_11998"   "P_ludovicianus_21721"   "P_ludovicianus_25286"  
 [16] "P_ludovicianus_26595"   "P_ludovicianus_33988"   "P_ludovicianus_34776"  
 [19] "P_ludovicianus_34779"   "P_ludovicianus_34782"   "P_ludovicianus_34830"  
 [22] "P_ludovicianus_44704"   "P_ludovicianus_44705"   "P_ludovicianus_44706"  
 [25] "P_ludovicianus_44710"   "P_ludovicianus_44711"   "P_ludovicianus_44713"  
 [28] "P_ludovicianus_44714"   "P_ludovicianus_44715"   "P_ludovicianus_44716"  
 [31] "P_ludovicianus_44719"   "P_ludovicianus_44720"   "P_ludovicianus_44722"  
 [34] "P_ludovicianus_44723"   "P_ludovicianus_44726"   "P_ludovicianus_44727"  
 [37] "P_ludovicianus_44729"   "P_ludovicianus_44730"   "P_ludovicianus_44731"  
 [40] "P_ludovicianus_44732"   "P_ludovicianus_44733"   "P_ludovicianus_44734"  
 [43] "P_ludovicianus_44737"   "P_ludovicianus_44738"   "P_ludovicianus_44739"  
 [46] "P_ludovicianus_44740"   "P_ludovicianus_44741"   "P_ludovicianus_44742"  
 [49] "P_ludovicianus_44743"   "P_ludovicianus_44744"   "P_ludovicianus_44745"  
 [52] "P_ludovicianus_44746"   "P_ludovicianus_44747"   "P_ludovicianus_44748"  
 [55] "P_ludovicianus_44749"   "P_ludovicianus_44753"   "P_ludovicianus_44754"  
 [58] "P_ludovicianus_44761"   "P_ludovicianus_44775"   "P_melanocephalus_34890"
 [61] "P_melanocephalus_43110" "P_melanocephalus_43276" "P_melanocephalus_44346"
 [64] "P_melanocephalus_44347" "P_melanocephalus_44471" "P_melanocephalus_44651"
 [67] "P_melanocephalus_44660" "P_melanocephalus_44666" "P_melanocephalus_44676"
 [70] "P_melanocephalus_44677" "P_melanocephalus_44678" "P_melanocephalus_44679"
 [73] "P_melanocephalus_44680" "P_melanocephalus_44681" "P_melanocephalus_44683"
 [76] "P_melanocephalus_44684" "P_melanocephalus_44685" "P_melanocephalus_44686"
 [79] "P_melanocephalus_44687" "P_melanocephalus_44688" "P_melanocephalus_44689"
 [82] "P_melanocephalus_44692" "P_melanocephalus_44693" "P_melanocephalus_44694"
 [85] "P_melanocephalus_44695" "P_melanocephalus_44697" "P_melanocephalus_44699"
 [88] "P_melanocephalus_44752" "P_melanocephalus_44760" "P_melanocephalus_44763"
 [91] "P_melanocephalus_44764" "P_melanocephalus_44765" "P_melanocephalus_44766"
 [94] "P_melanocephalus_44769" "P_melanocephalus_44770" "P_melanocephalus_44772"
 [97] "P_melanocephalus_44773" "P_melanocephalus_44774" "P_melanocephalus_44776"
[100] "P_melanocephalus_44777" "P_melanocephalus_44778" "P_melanocephalus_44779"
[103] "P_melanocephalus_44780" "P_melanocephalus_44782" "P_melanocephalus_44787"
[106] "P_melanocephalus_44788" "P_melanocephalus_44789" "P_melanocephalus_44790"
[109] "P_melanocephalus_44791" "P_melanocephalus_44792" "P_melanocephalus_44793"
[112] "P_melanocephalus_44794" "P_melanocephalus_44795" "P_melanocephalus_44798"
[115] "P_melanocephalus_44799" "P_melanocephalus_44801" "P_melanocephalus_44802"
[118] "P_melanocephalus_44803" "P_melanocephalus_44804" "P_melanocephalus_44805"
[121] "P_melanocephalus_44806" "P_melanocephalus_44808" "P_melanocephalus_44809"
[124] "P_melanocephalus_44810" "P_melanocephalus_44837" "P_melanocephalus_44840"
[127] "P_melanocephalus_44843" "P_melanocephalus_44845" "P_melanocephalus_44846"
[130] "P_melanocephalus_44847" "P_melanocephalus_44848" "P_melanocephalus_44853"
[133] "P_melanocephalus_44854" "P_melanocephalus_45175" "P_melanocephalus_45200"
[136] "P_melanocephalus_45232" "P_melanocephalus_45709" "P_melanocephalus_45926"
Code
gen@ind.names<-gsub("P_hybrid_","hyb", gen@ind.names)
gen@ind.names<-gsub("P_ludovicianus_","lud", gen@ind.names)
gen@ind.names<-gsub("P_melanocephalus_","mel", gen@ind.names)
gen@ind.names
  [1] "hyb44696" "hyb44703" "hyb44707" "hyb44708" "hyb44709" "hyb44712"
  [7] "hyb44762" "hyb44771" "hyb44781" "hyb45171" "hyb45173" "hyb45174"
 [13] "lud11998" "lud21721" "lud25286" "lud26595" "lud33988" "lud34776"
 [19] "lud34779" "lud34782" "lud34830" "lud44704" "lud44705" "lud44706"
 [25] "lud44710" "lud44711" "lud44713" "lud44714" "lud44715" "lud44716"
 [31] "lud44719" "lud44720" "lud44722" "lud44723" "lud44726" "lud44727"
 [37] "lud44729" "lud44730" "lud44731" "lud44732" "lud44733" "lud44734"
 [43] "lud44737" "lud44738" "lud44739" "lud44740" "lud44741" "lud44742"
 [49] "lud44743" "lud44744" "lud44745" "lud44746" "lud44747" "lud44748"
 [55] "lud44749" "lud44753" "lud44754" "lud44761" "lud44775" "mel34890"
 [61] "mel43110" "mel43276" "mel44346" "mel44347" "mel44471" "mel44651"
 [67] "mel44660" "mel44666" "mel44676" "mel44677" "mel44678" "mel44679"
 [73] "mel44680" "mel44681" "mel44683" "mel44684" "mel44685" "mel44686"
 [79] "mel44687" "mel44688" "mel44689" "mel44692" "mel44693" "mel44694"
 [85] "mel44695" "mel44697" "mel44699" "mel44752" "mel44760" "mel44763"
 [91] "mel44764" "mel44765" "mel44766" "mel44769" "mel44770" "mel44772"
 [97] "mel44773" "mel44774" "mel44776" "mel44777" "mel44778" "mel44779"
[103] "mel44780" "mel44782" "mel44787" "mel44788" "mel44789" "mel44790"
[109] "mel44791" "mel44792" "mel44793" "mel44794" "mel44795" "mel44798"
[115] "mel44799" "mel44801" "mel44802" "mel44803" "mel44804" "mel44805"
[121] "mel44806" "mel44808" "mel44809" "mel44810" "mel44837" "mel44840"
[127] "mel44843" "mel44845" "mel44846" "mel44847" "mel44848" "mel44853"
[133] "mel44854" "mel45175" "mel45200" "mel45232" "mel45709" "mel45926"
Code
pop(gen)<-gen@ind.names
#assign populations (a StaMPP requirement)
gen@pop<-as.factor(gen@ind.names)
#generate pairwise divergence matrix
sample.div <- stamppNeisD(gen, pop = FALSE)
#export for splitstree
stamppPhylip(distance.mat=sample.div, file="~/Desktop/grosbeak.rad/grosbeak.70.splits.txt")
knitr::include_graphics(c("/Users/devonderaad/Desktop/grosbeak.rad/70.splits.png"))

Code
#test 80% completeness threshold
filt<-missing_by_snp(vcfR.trim, cutoff=.8)
cutoff is specified, filtered vcfR object will be returned
76.99% of SNPs fell below a completeness cutoff of 0.8 and were removed from the VCF

Code
filt
***** Object of Class vcfR *****
138 samples
1159 CHROMs
67,091 variants
Object size: 644.8 Mb
5.64 percent missing data
*****        *****         *****
Code
#convert to genlight
gen<-vcfR2genlight(filt)
#fix sample names to fit in <= 10 characters
gen@ind.names
  [1] "P_hybrid_44696"         "P_hybrid_44703"         "P_hybrid_44707"        
  [4] "P_hybrid_44708"         "P_hybrid_44709"         "P_hybrid_44712"        
  [7] "P_hybrid_44762"         "P_hybrid_44771"         "P_hybrid_44781"        
 [10] "P_hybrid_45171"         "P_hybrid_45173"         "P_hybrid_45174"        
 [13] "P_ludovicianus_11998"   "P_ludovicianus_21721"   "P_ludovicianus_25286"  
 [16] "P_ludovicianus_26595"   "P_ludovicianus_33988"   "P_ludovicianus_34776"  
 [19] "P_ludovicianus_34779"   "P_ludovicianus_34782"   "P_ludovicianus_34830"  
 [22] "P_ludovicianus_44704"   "P_ludovicianus_44705"   "P_ludovicianus_44706"  
 [25] "P_ludovicianus_44710"   "P_ludovicianus_44711"   "P_ludovicianus_44713"  
 [28] "P_ludovicianus_44714"   "P_ludovicianus_44715"   "P_ludovicianus_44716"  
 [31] "P_ludovicianus_44719"   "P_ludovicianus_44720"   "P_ludovicianus_44722"  
 [34] "P_ludovicianus_44723"   "P_ludovicianus_44726"   "P_ludovicianus_44727"  
 [37] "P_ludovicianus_44729"   "P_ludovicianus_44730"   "P_ludovicianus_44731"  
 [40] "P_ludovicianus_44732"   "P_ludovicianus_44733"   "P_ludovicianus_44734"  
 [43] "P_ludovicianus_44737"   "P_ludovicianus_44738"   "P_ludovicianus_44739"  
 [46] "P_ludovicianus_44740"   "P_ludovicianus_44741"   "P_ludovicianus_44742"  
 [49] "P_ludovicianus_44743"   "P_ludovicianus_44744"   "P_ludovicianus_44745"  
 [52] "P_ludovicianus_44746"   "P_ludovicianus_44747"   "P_ludovicianus_44748"  
 [55] "P_ludovicianus_44749"   "P_ludovicianus_44753"   "P_ludovicianus_44754"  
 [58] "P_ludovicianus_44761"   "P_ludovicianus_44775"   "P_melanocephalus_34890"
 [61] "P_melanocephalus_43110" "P_melanocephalus_43276" "P_melanocephalus_44346"
 [64] "P_melanocephalus_44347" "P_melanocephalus_44471" "P_melanocephalus_44651"
 [67] "P_melanocephalus_44660" "P_melanocephalus_44666" "P_melanocephalus_44676"
 [70] "P_melanocephalus_44677" "P_melanocephalus_44678" "P_melanocephalus_44679"
 [73] "P_melanocephalus_44680" "P_melanocephalus_44681" "P_melanocephalus_44683"
 [76] "P_melanocephalus_44684" "P_melanocephalus_44685" "P_melanocephalus_44686"
 [79] "P_melanocephalus_44687" "P_melanocephalus_44688" "P_melanocephalus_44689"
 [82] "P_melanocephalus_44692" "P_melanocephalus_44693" "P_melanocephalus_44694"
 [85] "P_melanocephalus_44695" "P_melanocephalus_44697" "P_melanocephalus_44699"
 [88] "P_melanocephalus_44752" "P_melanocephalus_44760" "P_melanocephalus_44763"
 [91] "P_melanocephalus_44764" "P_melanocephalus_44765" "P_melanocephalus_44766"
 [94] "P_melanocephalus_44769" "P_melanocephalus_44770" "P_melanocephalus_44772"
 [97] "P_melanocephalus_44773" "P_melanocephalus_44774" "P_melanocephalus_44776"
[100] "P_melanocephalus_44777" "P_melanocephalus_44778" "P_melanocephalus_44779"
[103] "P_melanocephalus_44780" "P_melanocephalus_44782" "P_melanocephalus_44787"
[106] "P_melanocephalus_44788" "P_melanocephalus_44789" "P_melanocephalus_44790"
[109] "P_melanocephalus_44791" "P_melanocephalus_44792" "P_melanocephalus_44793"
[112] "P_melanocephalus_44794" "P_melanocephalus_44795" "P_melanocephalus_44798"
[115] "P_melanocephalus_44799" "P_melanocephalus_44801" "P_melanocephalus_44802"
[118] "P_melanocephalus_44803" "P_melanocephalus_44804" "P_melanocephalus_44805"
[121] "P_melanocephalus_44806" "P_melanocephalus_44808" "P_melanocephalus_44809"
[124] "P_melanocephalus_44810" "P_melanocephalus_44837" "P_melanocephalus_44840"
[127] "P_melanocephalus_44843" "P_melanocephalus_44845" "P_melanocephalus_44846"
[130] "P_melanocephalus_44847" "P_melanocephalus_44848" "P_melanocephalus_44853"
[133] "P_melanocephalus_44854" "P_melanocephalus_45175" "P_melanocephalus_45200"
[136] "P_melanocephalus_45232" "P_melanocephalus_45709" "P_melanocephalus_45926"
Code
gen@ind.names<-gsub("P_hybrid_","hyb", gen@ind.names)
gen@ind.names<-gsub("P_ludovicianus_","lud", gen@ind.names)
gen@ind.names<-gsub("P_melanocephalus_","mel", gen@ind.names)
gen@ind.names
  [1] "hyb44696" "hyb44703" "hyb44707" "hyb44708" "hyb44709" "hyb44712"
  [7] "hyb44762" "hyb44771" "hyb44781" "hyb45171" "hyb45173" "hyb45174"
 [13] "lud11998" "lud21721" "lud25286" "lud26595" "lud33988" "lud34776"
 [19] "lud34779" "lud34782" "lud34830" "lud44704" "lud44705" "lud44706"
 [25] "lud44710" "lud44711" "lud44713" "lud44714" "lud44715" "lud44716"
 [31] "lud44719" "lud44720" "lud44722" "lud44723" "lud44726" "lud44727"
 [37] "lud44729" "lud44730" "lud44731" "lud44732" "lud44733" "lud44734"
 [43] "lud44737" "lud44738" "lud44739" "lud44740" "lud44741" "lud44742"
 [49] "lud44743" "lud44744" "lud44745" "lud44746" "lud44747" "lud44748"
 [55] "lud44749" "lud44753" "lud44754" "lud44761" "lud44775" "mel34890"
 [61] "mel43110" "mel43276" "mel44346" "mel44347" "mel44471" "mel44651"
 [67] "mel44660" "mel44666" "mel44676" "mel44677" "mel44678" "mel44679"
 [73] "mel44680" "mel44681" "mel44683" "mel44684" "mel44685" "mel44686"
 [79] "mel44687" "mel44688" "mel44689" "mel44692" "mel44693" "mel44694"
 [85] "mel44695" "mel44697" "mel44699" "mel44752" "mel44760" "mel44763"
 [91] "mel44764" "mel44765" "mel44766" "mel44769" "mel44770" "mel44772"
 [97] "mel44773" "mel44774" "mel44776" "mel44777" "mel44778" "mel44779"
[103] "mel44780" "mel44782" "mel44787" "mel44788" "mel44789" "mel44790"
[109] "mel44791" "mel44792" "mel44793" "mel44794" "mel44795" "mel44798"
[115] "mel44799" "mel44801" "mel44802" "mel44803" "mel44804" "mel44805"
[121] "mel44806" "mel44808" "mel44809" "mel44810" "mel44837" "mel44840"
[127] "mel44843" "mel44845" "mel44846" "mel44847" "mel44848" "mel44853"
[133] "mel44854" "mel45175" "mel45200" "mel45232" "mel45709" "mel45926"
Code
pop(gen)<-gen@ind.names
#assign populations (a StaMPP requirement)
gen@pop<-as.factor(gen@ind.names)
#generate pairwise divergence matrix
sample.div <- stamppNeisD(gen, pop = FALSE)
#export for splitstree
stamppPhylip(distance.mat=sample.div, file="~/Desktop/grosbeak.rad/grosbeak.80.splits.txt")
knitr::include_graphics(c("/Users/devonderaad/Desktop/grosbeak.rad/80.splits.png"))

Code
#repeat with 90% completeness threshold
filt<-missing_by_snp(vcfR.trim, cutoff=.9)
cutoff is specified, filtered vcfR object will be returned
82.3% of SNPs fell below a completeness cutoff of 0.9 and were removed from the VCF

Code
filt
***** Object of Class vcfR *****
138 samples
1027 CHROMs
51,595 variants
Object size: 532.6 Mb
2.987 percent missing data
*****        *****         *****
Code
#convert to genlight
gen<-vcfR2genlight(filt)
#fix sample names to fit in <= 10 characters
gen@ind.names
  [1] "P_hybrid_44696"         "P_hybrid_44703"         "P_hybrid_44707"        
  [4] "P_hybrid_44708"         "P_hybrid_44709"         "P_hybrid_44712"        
  [7] "P_hybrid_44762"         "P_hybrid_44771"         "P_hybrid_44781"        
 [10] "P_hybrid_45171"         "P_hybrid_45173"         "P_hybrid_45174"        
 [13] "P_ludovicianus_11998"   "P_ludovicianus_21721"   "P_ludovicianus_25286"  
 [16] "P_ludovicianus_26595"   "P_ludovicianus_33988"   "P_ludovicianus_34776"  
 [19] "P_ludovicianus_34779"   "P_ludovicianus_34782"   "P_ludovicianus_34830"  
 [22] "P_ludovicianus_44704"   "P_ludovicianus_44705"   "P_ludovicianus_44706"  
 [25] "P_ludovicianus_44710"   "P_ludovicianus_44711"   "P_ludovicianus_44713"  
 [28] "P_ludovicianus_44714"   "P_ludovicianus_44715"   "P_ludovicianus_44716"  
 [31] "P_ludovicianus_44719"   "P_ludovicianus_44720"   "P_ludovicianus_44722"  
 [34] "P_ludovicianus_44723"   "P_ludovicianus_44726"   "P_ludovicianus_44727"  
 [37] "P_ludovicianus_44729"   "P_ludovicianus_44730"   "P_ludovicianus_44731"  
 [40] "P_ludovicianus_44732"   "P_ludovicianus_44733"   "P_ludovicianus_44734"  
 [43] "P_ludovicianus_44737"   "P_ludovicianus_44738"   "P_ludovicianus_44739"  
 [46] "P_ludovicianus_44740"   "P_ludovicianus_44741"   "P_ludovicianus_44742"  
 [49] "P_ludovicianus_44743"   "P_ludovicianus_44744"   "P_ludovicianus_44745"  
 [52] "P_ludovicianus_44746"   "P_ludovicianus_44747"   "P_ludovicianus_44748"  
 [55] "P_ludovicianus_44749"   "P_ludovicianus_44753"   "P_ludovicianus_44754"  
 [58] "P_ludovicianus_44761"   "P_ludovicianus_44775"   "P_melanocephalus_34890"
 [61] "P_melanocephalus_43110" "P_melanocephalus_43276" "P_melanocephalus_44346"
 [64] "P_melanocephalus_44347" "P_melanocephalus_44471" "P_melanocephalus_44651"
 [67] "P_melanocephalus_44660" "P_melanocephalus_44666" "P_melanocephalus_44676"
 [70] "P_melanocephalus_44677" "P_melanocephalus_44678" "P_melanocephalus_44679"
 [73] "P_melanocephalus_44680" "P_melanocephalus_44681" "P_melanocephalus_44683"
 [76] "P_melanocephalus_44684" "P_melanocephalus_44685" "P_melanocephalus_44686"
 [79] "P_melanocephalus_44687" "P_melanocephalus_44688" "P_melanocephalus_44689"
 [82] "P_melanocephalus_44692" "P_melanocephalus_44693" "P_melanocephalus_44694"
 [85] "P_melanocephalus_44695" "P_melanocephalus_44697" "P_melanocephalus_44699"
 [88] "P_melanocephalus_44752" "P_melanocephalus_44760" "P_melanocephalus_44763"
 [91] "P_melanocephalus_44764" "P_melanocephalus_44765" "P_melanocephalus_44766"
 [94] "P_melanocephalus_44769" "P_melanocephalus_44770" "P_melanocephalus_44772"
 [97] "P_melanocephalus_44773" "P_melanocephalus_44774" "P_melanocephalus_44776"
[100] "P_melanocephalus_44777" "P_melanocephalus_44778" "P_melanocephalus_44779"
[103] "P_melanocephalus_44780" "P_melanocephalus_44782" "P_melanocephalus_44787"
[106] "P_melanocephalus_44788" "P_melanocephalus_44789" "P_melanocephalus_44790"
[109] "P_melanocephalus_44791" "P_melanocephalus_44792" "P_melanocephalus_44793"
[112] "P_melanocephalus_44794" "P_melanocephalus_44795" "P_melanocephalus_44798"
[115] "P_melanocephalus_44799" "P_melanocephalus_44801" "P_melanocephalus_44802"
[118] "P_melanocephalus_44803" "P_melanocephalus_44804" "P_melanocephalus_44805"
[121] "P_melanocephalus_44806" "P_melanocephalus_44808" "P_melanocephalus_44809"
[124] "P_melanocephalus_44810" "P_melanocephalus_44837" "P_melanocephalus_44840"
[127] "P_melanocephalus_44843" "P_melanocephalus_44845" "P_melanocephalus_44846"
[130] "P_melanocephalus_44847" "P_melanocephalus_44848" "P_melanocephalus_44853"
[133] "P_melanocephalus_44854" "P_melanocephalus_45175" "P_melanocephalus_45200"
[136] "P_melanocephalus_45232" "P_melanocephalus_45709" "P_melanocephalus_45926"
Code
gen@ind.names<-gsub("P_hybrid_","hyb", gen@ind.names)
gen@ind.names<-gsub("P_ludovicianus_","lud", gen@ind.names)
gen@ind.names<-gsub("P_melanocephalus_","mel", gen@ind.names)
gen@ind.names
  [1] "hyb44696" "hyb44703" "hyb44707" "hyb44708" "hyb44709" "hyb44712"
  [7] "hyb44762" "hyb44771" "hyb44781" "hyb45171" "hyb45173" "hyb45174"
 [13] "lud11998" "lud21721" "lud25286" "lud26595" "lud33988" "lud34776"
 [19] "lud34779" "lud34782" "lud34830" "lud44704" "lud44705" "lud44706"
 [25] "lud44710" "lud44711" "lud44713" "lud44714" "lud44715" "lud44716"
 [31] "lud44719" "lud44720" "lud44722" "lud44723" "lud44726" "lud44727"
 [37] "lud44729" "lud44730" "lud44731" "lud44732" "lud44733" "lud44734"
 [43] "lud44737" "lud44738" "lud44739" "lud44740" "lud44741" "lud44742"
 [49] "lud44743" "lud44744" "lud44745" "lud44746" "lud44747" "lud44748"
 [55] "lud44749" "lud44753" "lud44754" "lud44761" "lud44775" "mel34890"
 [61] "mel43110" "mel43276" "mel44346" "mel44347" "mel44471" "mel44651"
 [67] "mel44660" "mel44666" "mel44676" "mel44677" "mel44678" "mel44679"
 [73] "mel44680" "mel44681" "mel44683" "mel44684" "mel44685" "mel44686"
 [79] "mel44687" "mel44688" "mel44689" "mel44692" "mel44693" "mel44694"
 [85] "mel44695" "mel44697" "mel44699" "mel44752" "mel44760" "mel44763"
 [91] "mel44764" "mel44765" "mel44766" "mel44769" "mel44770" "mel44772"
 [97] "mel44773" "mel44774" "mel44776" "mel44777" "mel44778" "mel44779"
[103] "mel44780" "mel44782" "mel44787" "mel44788" "mel44789" "mel44790"
[109] "mel44791" "mel44792" "mel44793" "mel44794" "mel44795" "mel44798"
[115] "mel44799" "mel44801" "mel44802" "mel44803" "mel44804" "mel44805"
[121] "mel44806" "mel44808" "mel44809" "mel44810" "mel44837" "mel44840"
[127] "mel44843" "mel44845" "mel44846" "mel44847" "mel44848" "mel44853"
[133] "mel44854" "mel45175" "mel45200" "mel45232" "mel45709" "mel45926"
Code
pop(gen)<-gen@ind.names
#assign populations (a StaMPP requirement)
gen@pop<-as.factor(gen@ind.names)
#generate pairwise divergence matrix
sample.div <- stamppNeisD(gen, pop = FALSE)
#export for splitstree
stamppPhylip(distance.mat=sample.div, file="~/Desktop/grosbeak.rad/grosbeak.90.splits.txt")
knitr::include_graphics(c("/Users/devonderaad/Desktop/grosbeak.rad/90.splits.png"))

Code
#repeat with 100% completeness threshold
filt<-missing_by_snp(vcfR.trim, cutoff=1)
cutoff is specified, filtered vcfR object will be returned
96.77% of SNPs fell below a completeness cutoff of 1 and were removed from the VCF

Code
filt
***** Object of Class vcfR *****
138 samples
574 CHROMs
9,417 variants
Object size: 112.3 Mb
0 percent missing data
*****        *****         *****
Code
#convert to genlight
gen<-vcfR2genlight(filt)
#fix sample names to fit in <= 10 characters
gen@ind.names
  [1] "P_hybrid_44696"         "P_hybrid_44703"         "P_hybrid_44707"        
  [4] "P_hybrid_44708"         "P_hybrid_44709"         "P_hybrid_44712"        
  [7] "P_hybrid_44762"         "P_hybrid_44771"         "P_hybrid_44781"        
 [10] "P_hybrid_45171"         "P_hybrid_45173"         "P_hybrid_45174"        
 [13] "P_ludovicianus_11998"   "P_ludovicianus_21721"   "P_ludovicianus_25286"  
 [16] "P_ludovicianus_26595"   "P_ludovicianus_33988"   "P_ludovicianus_34776"  
 [19] "P_ludovicianus_34779"   "P_ludovicianus_34782"   "P_ludovicianus_34830"  
 [22] "P_ludovicianus_44704"   "P_ludovicianus_44705"   "P_ludovicianus_44706"  
 [25] "P_ludovicianus_44710"   "P_ludovicianus_44711"   "P_ludovicianus_44713"  
 [28] "P_ludovicianus_44714"   "P_ludovicianus_44715"   "P_ludovicianus_44716"  
 [31] "P_ludovicianus_44719"   "P_ludovicianus_44720"   "P_ludovicianus_44722"  
 [34] "P_ludovicianus_44723"   "P_ludovicianus_44726"   "P_ludovicianus_44727"  
 [37] "P_ludovicianus_44729"   "P_ludovicianus_44730"   "P_ludovicianus_44731"  
 [40] "P_ludovicianus_44732"   "P_ludovicianus_44733"   "P_ludovicianus_44734"  
 [43] "P_ludovicianus_44737"   "P_ludovicianus_44738"   "P_ludovicianus_44739"  
 [46] "P_ludovicianus_44740"   "P_ludovicianus_44741"   "P_ludovicianus_44742"  
 [49] "P_ludovicianus_44743"   "P_ludovicianus_44744"   "P_ludovicianus_44745"  
 [52] "P_ludovicianus_44746"   "P_ludovicianus_44747"   "P_ludovicianus_44748"  
 [55] "P_ludovicianus_44749"   "P_ludovicianus_44753"   "P_ludovicianus_44754"  
 [58] "P_ludovicianus_44761"   "P_ludovicianus_44775"   "P_melanocephalus_34890"
 [61] "P_melanocephalus_43110" "P_melanocephalus_43276" "P_melanocephalus_44346"
 [64] "P_melanocephalus_44347" "P_melanocephalus_44471" "P_melanocephalus_44651"
 [67] "P_melanocephalus_44660" "P_melanocephalus_44666" "P_melanocephalus_44676"
 [70] "P_melanocephalus_44677" "P_melanocephalus_44678" "P_melanocephalus_44679"
 [73] "P_melanocephalus_44680" "P_melanocephalus_44681" "P_melanocephalus_44683"
 [76] "P_melanocephalus_44684" "P_melanocephalus_44685" "P_melanocephalus_44686"
 [79] "P_melanocephalus_44687" "P_melanocephalus_44688" "P_melanocephalus_44689"
 [82] "P_melanocephalus_44692" "P_melanocephalus_44693" "P_melanocephalus_44694"
 [85] "P_melanocephalus_44695" "P_melanocephalus_44697" "P_melanocephalus_44699"
 [88] "P_melanocephalus_44752" "P_melanocephalus_44760" "P_melanocephalus_44763"
 [91] "P_melanocephalus_44764" "P_melanocephalus_44765" "P_melanocephalus_44766"
 [94] "P_melanocephalus_44769" "P_melanocephalus_44770" "P_melanocephalus_44772"
 [97] "P_melanocephalus_44773" "P_melanocephalus_44774" "P_melanocephalus_44776"
[100] "P_melanocephalus_44777" "P_melanocephalus_44778" "P_melanocephalus_44779"
[103] "P_melanocephalus_44780" "P_melanocephalus_44782" "P_melanocephalus_44787"
[106] "P_melanocephalus_44788" "P_melanocephalus_44789" "P_melanocephalus_44790"
[109] "P_melanocephalus_44791" "P_melanocephalus_44792" "P_melanocephalus_44793"
[112] "P_melanocephalus_44794" "P_melanocephalus_44795" "P_melanocephalus_44798"
[115] "P_melanocephalus_44799" "P_melanocephalus_44801" "P_melanocephalus_44802"
[118] "P_melanocephalus_44803" "P_melanocephalus_44804" "P_melanocephalus_44805"
[121] "P_melanocephalus_44806" "P_melanocephalus_44808" "P_melanocephalus_44809"
[124] "P_melanocephalus_44810" "P_melanocephalus_44837" "P_melanocephalus_44840"
[127] "P_melanocephalus_44843" "P_melanocephalus_44845" "P_melanocephalus_44846"
[130] "P_melanocephalus_44847" "P_melanocephalus_44848" "P_melanocephalus_44853"
[133] "P_melanocephalus_44854" "P_melanocephalus_45175" "P_melanocephalus_45200"
[136] "P_melanocephalus_45232" "P_melanocephalus_45709" "P_melanocephalus_45926"
Code
gen@ind.names<-gsub("P_hybrid_","hyb", gen@ind.names)
gen@ind.names<-gsub("P_ludovicianus_","lud", gen@ind.names)
gen@ind.names<-gsub("P_melanocephalus_","mel", gen@ind.names)
gen@ind.names
  [1] "hyb44696" "hyb44703" "hyb44707" "hyb44708" "hyb44709" "hyb44712"
  [7] "hyb44762" "hyb44771" "hyb44781" "hyb45171" "hyb45173" "hyb45174"
 [13] "lud11998" "lud21721" "lud25286" "lud26595" "lud33988" "lud34776"
 [19] "lud34779" "lud34782" "lud34830" "lud44704" "lud44705" "lud44706"
 [25] "lud44710" "lud44711" "lud44713" "lud44714" "lud44715" "lud44716"
 [31] "lud44719" "lud44720" "lud44722" "lud44723" "lud44726" "lud44727"
 [37] "lud44729" "lud44730" "lud44731" "lud44732" "lud44733" "lud44734"
 [43] "lud44737" "lud44738" "lud44739" "lud44740" "lud44741" "lud44742"
 [49] "lud44743" "lud44744" "lud44745" "lud44746" "lud44747" "lud44748"
 [55] "lud44749" "lud44753" "lud44754" "lud44761" "lud44775" "mel34890"
 [61] "mel43110" "mel43276" "mel44346" "mel44347" "mel44471" "mel44651"
 [67] "mel44660" "mel44666" "mel44676" "mel44677" "mel44678" "mel44679"
 [73] "mel44680" "mel44681" "mel44683" "mel44684" "mel44685" "mel44686"
 [79] "mel44687" "mel44688" "mel44689" "mel44692" "mel44693" "mel44694"
 [85] "mel44695" "mel44697" "mel44699" "mel44752" "mel44760" "mel44763"
 [91] "mel44764" "mel44765" "mel44766" "mel44769" "mel44770" "mel44772"
 [97] "mel44773" "mel44774" "mel44776" "mel44777" "mel44778" "mel44779"
[103] "mel44780" "mel44782" "mel44787" "mel44788" "mel44789" "mel44790"
[109] "mel44791" "mel44792" "mel44793" "mel44794" "mel44795" "mel44798"
[115] "mel44799" "mel44801" "mel44802" "mel44803" "mel44804" "mel44805"
[121] "mel44806" "mel44808" "mel44809" "mel44810" "mel44837" "mel44840"
[127] "mel44843" "mel44845" "mel44846" "mel44847" "mel44848" "mel44853"
[133] "mel44854" "mel45175" "mel45200" "mel45232" "mel45709" "mel45926"
Code
pop(gen)<-gen@ind.names
#assign populations (a StaMPP requirement)
gen@pop<-as.factor(gen@ind.names)
#generate pairwise divergence matrix
sample.div <- stamppNeisD(gen, pop = FALSE)
#export for splitstree
stamppPhylip(distance.mat=sample.div, file="~/Desktop/grosbeak.rad/grosbeak.100.splits.txt")
knitr::include_graphics(c("/Users/devonderaad/Desktop/grosbeak.rad/100.splits.png"))

Missing data does not appear to be driving sample placement, so I’m going to implement a 90% per-SNP completeness cutoff, which seems like a good trade-off.

Code
#set 90% completeness per SNP threshold
filt<-missing_by_snp(vcfR.trim, cutoff=.90)
cutoff is specified, filtered vcfR object will be returned
82.3% of SNPs fell below a completeness cutoff of 0.9 and were removed from the VCF

Code
#print info on the resulting dataset
filt
***** Object of Class vcfR *****
138 samples
1027 CHROMs
51,595 variants
Object size: 532.6 Mb
2.987 percent missing data
*****        *****         *****
Code
#get per sample missing data info
bysamp<-missing_by_sample(filt)
No popmap provided

Code
#print in descending order by missing data
bysamp$unfiltered.stats[order(bysamp$unfiltered.stats[,2]),]
                                      samples prop.missing mean.depth
P_melanocephalus_43110 P_melanocephalus_43110  0.002597151 204.393463
P_melanocephalus_44792 P_melanocephalus_44792  0.003391802 226.417075
P_melanocephalus_44840 P_melanocephalus_44840  0.003469328  87.621382
P_melanocephalus_44808 P_melanocephalus_44808  0.003740673  68.118750
P_melanocephalus_43276 P_melanocephalus_43276  0.004108925  70.563474
P_melanocephalus_44695 P_melanocephalus_44695  0.004108925 290.195571
P_melanocephalus_44853 P_melanocephalus_44853  0.004457796 114.973776
P_melanocephalus_44680 P_melanocephalus_44680  0.004477178  94.089985
P_melanocephalus_44809 P_melanocephalus_44809  0.004554705 129.607009
P_melanocephalus_44778 P_melanocephalus_44778  0.004593468  47.364695
P_melanocephalus_44685 P_melanocephalus_44685  0.004632232 172.368370
P_melanocephalus_44791 P_melanocephalus_44791  0.004729140  95.780822
P_melanocephalus_44788 P_melanocephalus_44788  0.004806667 107.910978
P_melanocephalus_44795 P_melanocephalus_44795  0.004826049  49.157072
P_melanocephalus_44764 P_melanocephalus_44764  0.004922958  98.850704
P_melanocephalus_44804 P_melanocephalus_44804  0.005058630 167.689562
P_melanocephalus_44782 P_melanocephalus_44782  0.005271829 374.030415
P_melanocephalus_44805 P_melanocephalus_44805  0.005310592 166.366458
P_melanocephalus_44847 P_melanocephalus_44847  0.005329974  92.838504
P_hybrid_45174                 P_hybrid_45174  0.005349356  83.837896
P_melanocephalus_44787 P_melanocephalus_44787  0.005504409 172.190271
P_melanocephalus_44773 P_melanocephalus_44773  0.005543173  94.823579
P_hybrid_44771                 P_hybrid_44771  0.005562555 103.525279
P_melanocephalus_44803 P_melanocephalus_44803  0.005620700  82.220836
P_melanocephalus_44810 P_melanocephalus_44810  0.005698227  77.205025
P_melanocephalus_44770 P_melanocephalus_44770  0.005736990 151.095051
P_melanocephalus_44845 P_melanocephalus_44845  0.005795135 517.445961
P_melanocephalus_44677 P_melanocephalus_44677  0.005833899 154.985905
P_melanocephalus_44766 P_melanocephalus_44766  0.005950189 179.453732
P_melanocephalus_44794 P_melanocephalus_44794  0.006027716  61.180115
P_melanocephalus_44765 P_melanocephalus_44765  0.006144006  50.390304
P_melanocephalus_44793 P_melanocephalus_44793  0.006279678  67.063740
P_melanocephalus_44683 P_melanocephalus_44683  0.006492877  53.292860
P_melanocephalus_44684 P_melanocephalus_44684  0.006531641  66.356003
P_melanocephalus_44776 P_melanocephalus_44776  0.006725458 109.958203
P_hybrid_44696                 P_hybrid_44696  0.006861130 175.130481
P_melanocephalus_45200 P_melanocephalus_45200  0.006938657 110.175674
P_melanocephalus_44774 P_melanocephalus_44774  0.006977420  73.555831
P_melanocephalus_44660 P_melanocephalus_44660  0.007054947 158.128399
P_melanocephalus_44780 P_melanocephalus_44780  0.007054947  55.545510
P_hybrid_45171                 P_hybrid_45171  0.007190619 387.530572
P_melanocephalus_44681 P_melanocephalus_44681  0.007229383 647.700656
P_melanocephalus_44692 P_melanocephalus_44692  0.007403818  56.540371
P_melanocephalus_44763 P_melanocephalus_44763  0.007520109 567.633038
P_melanocephalus_44679 P_melanocephalus_44679  0.007539490  75.670019
P_melanocephalus_44799 P_melanocephalus_44799  0.007558872 178.310048
P_hybrid_45173                 P_hybrid_45173  0.007810834  47.576262
P_melanocephalus_45926 P_melanocephalus_45926  0.007849598  86.966029
P_melanocephalus_44752 P_melanocephalus_44752  0.007868980 524.343941
P_melanocephalus_44843 P_melanocephalus_44843  0.008004652 104.359638
P_melanocephalus_44798 P_melanocephalus_44798  0.008043415  56.271356
P_melanocephalus_44769 P_melanocephalus_44769  0.008198469 590.913038
P_melanocephalus_44848 P_melanocephalus_44848  0.008275996 348.373886
P_melanocephalus_44678 P_melanocephalus_44678  0.008295377  43.497586
P_melanocephalus_44789 P_melanocephalus_44789  0.008334141  42.518186
P_melanocephalus_44846 P_melanocephalus_44846  0.008334141  52.243819
P_melanocephalus_44837 P_melanocephalus_44837  0.008353523 147.221425
P_melanocephalus_44666 P_melanocephalus_44666  0.008372904 162.687528
P_melanocephalus_44854 P_melanocephalus_44854  0.008489195 123.896808
P_ludovicianus_34830     P_ludovicianus_34830  0.008915593 105.055402
P_melanocephalus_45709 P_melanocephalus_45709  0.009051265  33.473068
P_melanocephalus_44790 P_melanocephalus_44790  0.009477663  56.371620
P_melanocephalus_44688 P_melanocephalus_44688  0.009574571 456.008884
P_ludovicianus_44741     P_ludovicianus_44741  0.009632716 243.295824
P_melanocephalus_44777 P_melanocephalus_44777  0.009865297 512.092472
P_hybrid_44708                 P_hybrid_44708  0.009962206 125.645661
P_ludovicianus_44705     P_ludovicianus_44705  0.010039733 204.137224
P_ludovicianus_44746     P_ludovicianus_44746  0.010117259  69.433086
P_melanocephalus_44689 P_melanocephalus_44689  0.010407985  40.462905
P_melanocephalus_44772 P_melanocephalus_44772  0.010873147  42.071011
P_ludovicianus_44710     P_ludovicianus_44710  0.011435217  71.856936
P_ludovicianus_44727     P_ludovicianus_44727  0.011629034 125.305599
P_ludovicianus_44753     P_ludovicianus_44753  0.011880996  85.525774
P_ludovicianus_44775     P_ludovicianus_44775  0.011900378 123.049842
P_ludovicianus_44738     P_ludovicianus_44738  0.012249249  75.030689
P_ludovicianus_44732     P_ludovicianus_44732  0.012404303  95.953057
P_ludovicianus_44726     P_ludovicianus_44726  0.012675647 360.327339
P_hybrid_44709                 P_hybrid_44709  0.012753174  80.046881
P_ludovicianus_44745     P_ludovicianus_44745  0.012753174 433.571412
P_ludovicianus_44734     P_ludovicianus_44734  0.012869464 463.937288
P_ludovicianus_44743     P_ludovicianus_44743  0.012888846 134.471844
P_hybrid_44707                 P_hybrid_44707  0.013140808 402.861755
P_ludovicianus_21721     P_ludovicianus_21721  0.013276480  96.160322
P_ludovicianus_44722     P_ludovicianus_44722  0.013373389  57.636912
P_ludovicianus_34782     P_ludovicianus_34782  0.013509061 106.722641
P_ludovicianus_11998     P_ludovicianus_11998  0.013567206  48.439768
P_ludovicianus_44706     P_ludovicianus_44706  0.013605970  74.769143
P_melanocephalus_44760 P_melanocephalus_44760  0.014885163  28.435280
P_ludovicianus_44740     P_ludovicianus_44740  0.014904545 184.978023
P_ludovicianus_44754     P_ludovicianus_44754  0.015137126 267.301236
P_ludovicianus_44761     P_ludovicianus_44761  0.015214653  51.548849
P_ludovicianus_44719     P_ludovicianus_44719  0.015311561  91.095935
P_ludovicianus_44715     P_ludovicianus_44715  0.015330943  91.889300
P_ludovicianus_44730     P_ludovicianus_44730  0.016009303  72.246706
P_melanocephalus_44694 P_melanocephalus_44694  0.016416319  28.608221
P_melanocephalus_44651 P_melanocephalus_44651  0.016435701  25.720417
P_ludovicianus_44714     P_ludovicianus_44714  0.016687664 141.513186
P_ludovicianus_44747     P_ludovicianus_44747  0.017559841 812.964253
P_melanocephalus_44699 P_melanocephalus_44699  0.020040702  37.005498
P_ludovicianus_44739     P_ludovicianus_44739  0.023238686  47.610941
P_melanocephalus_44347 P_melanocephalus_44347  0.023858901  24.680923
P_ludovicianus_44729     P_ludovicianus_44729  0.025235003  39.517070
P_ludovicianus_44713     P_ludovicianus_44713  0.028084117  29.779304
P_hybrid_44781                 P_hybrid_44781  0.029634655  27.190029
P_melanocephalus_44801 P_melanocephalus_44801  0.031495300  23.168841
P_hybrid_44703                 P_hybrid_44703  0.031650354  24.484168
P_melanocephalus_34890 P_melanocephalus_34890  0.032890784  22.654816
P_ludovicianus_33988     P_ludovicianus_33988  0.033045838  29.079234
P_ludovicianus_44749     P_ludovicianus_44749  0.033452854  44.434839
P_melanocephalus_44687 P_melanocephalus_44687  0.033666053  22.789141
P_ludovicianus_44711     P_ludovicianus_44711  0.035953096  23.613430
P_ludovicianus_44731     P_ludovicianus_44731  0.037019091  25.810104
P_ludovicianus_44744     P_ludovicianus_44744  0.037348580  26.594407
P_ludovicianus_44716     P_ludovicianus_44716  0.045391995  19.060098
P_melanocephalus_44686 P_melanocephalus_44686  0.052912104  20.753689
P_melanocephalus_44693 P_melanocephalus_44693  0.053338502  16.082919
P_melanocephalus_44346 P_melanocephalus_44346  0.055315438  21.470815
P_melanocephalus_44697 P_melanocephalus_44697  0.057350518  16.636874
P_melanocephalus_44676 P_melanocephalus_44676  0.063455761  16.074522
P_ludovicianus_34776     P_ludovicianus_34776  0.070801434  19.425493
P_melanocephalus_45232 P_melanocephalus_45232  0.071227832  18.169887
P_ludovicianus_26595     P_ludovicianus_26595  0.076596569  17.785656
P_ludovicianus_44737     P_ludovicianus_44737  0.081034984  16.831632
P_melanocephalus_45175 P_melanocephalus_45175  0.093284233  15.251079
P_ludovicianus_44723     P_ludovicianus_44723  0.106948348  13.635827
P_ludovicianus_44720     P_ludovicianus_44720  0.117046225  16.037097
P_ludovicianus_44733     P_ludovicianus_44733  0.121193914  13.505867
P_melanocephalus_44471 P_melanocephalus_44471  0.126872759  13.385802
P_melanocephalus_44779 P_melanocephalus_44779  0.131660045  13.197982
P_ludovicianus_44748     P_ludovicianus_44748  0.137222599  13.018848
P_melanocephalus_44806 P_melanocephalus_44806  0.141312143  12.202623
P_ludovicianus_44704     P_ludovicianus_44704  0.147553057  11.792029
P_ludovicianus_25286     P_ludovicianus_25286  0.148909778  11.464588
P_ludovicianus_34779     P_ludovicianus_34779  0.149665665  13.078271
P_hybrid_44762                 P_hybrid_44762  0.157844752  10.886056
P_hybrid_44712                 P_hybrid_44712  0.175598411  13.123052
P_melanocephalus_44802 P_melanocephalus_44802  0.205736990   9.568838
P_ludovicianus_44742     P_ludovicianus_44742  0.214536292  11.198712

7 Visualize depth and quality across all retained genotypes

Code
#plot depth per snp and per sample
dp <- extract.gt(filt, element = "DP", as.numeric=TRUE)
heatmap.bp(dp, rlabels = FALSE)

Code
#plot genotype quality per snp and per sample
gq <- extract.gt(filt, element = "GQ", as.numeric=TRUE)
heatmap.bp(gq, rlabels = FALSE)

8 Separate sex-linked SNPs from autosomal SNPs

This chunk is not run (eval=FALSE is set in the chunk header) because I ended up deciding that the cutoff is too arbitrary, and reviewers would likely balk at this approach to isolating Z-linked SNPs as untrustworthy. I think this is better left for a future project with a good reference genome and whole genome data. In the meantime, I’ve left this code here as an artifact in case I need to come back to it in the future.

Code
#extract depth matrix
dpmat<-extract.gt(filt, element = "DP")
dpmat[1:5,1:5] #check
dpmat<-apply(dpmat, 2, as.numeric) #convert matrix to numeric
dpmat[1:5,1:5] #check
#read in sample sheet with sample sex info
samps<-read.csv("~/Desktop/grosbeak.data.csv")
samps<-samps[samps$passed.genomic.filtering == "TRUE",] #retain only samples that passed filtering
samps$sample_id == colnames(dpmat) #check if the dataframe is in the same order as the matrix
samps<-samps[match(colnames(dpmat),samps$sample_id),] #reorder dataframe using the command 'match'
samps$sample_id == colnames(dpmat) #check if this worked
#compare sequencing depth between male and female samples
mean(dpmat[,samps$sex=="female"], na.rm = TRUE)
mean(dpmat[,samps$sex=="male"], na.rm = TRUE)
#because male and female sequencing depth is highly uneven, I will need to subsample males to get a set of males with appropriately matched (as close to even as possible) sequencing depth
#apply(dpmat, 2, mean, na.rm=TRUE) #you can use this code to get a list of depth per sample for the next step
#manually (arbitrarily) pick out a subset of male samples that have similar depth to the female samples we sequenced and check that their depth is close to ~68.7x, like the female group shown above
mean(dpmat[,c(4:12,16,50,39,117,83,84,26,137,135,106)], na.rm = TRUE)

#compare average depth between the subset of male and female samples
x<-c()
for (i in 1:nrow(dpmat)){
  x[i]<-mean(dpmat[i,samps$sex=="male"], na.rm = TRUE)/mean(dpmat[i,samps$sex=="female"], na.rm = TRUE)
}
#look at the distribution
hist(x, breaks = 200)
#isolate the peak of the normal distribution that corresponds to the autosomal loci
sort(table(round(x,2)),decreasing = TRUE)
#1.86 is the 'typical' autosomal loci, so we will be looking for SNPs with > 1.8x this ratio (we expect 2x but want to account for noise)
table(x > 3.348)

#isolate X-linked SNPs
xSNPs<-filt[c(x>3.348),]
xSNPs
#convert to genlight
gen<-vcfR2genlight(xSNPs)
#fix sample names to fit in <= 10 characters
gen@ind.names
gen@ind.names<-gsub("P_hybrid_","hyb", gen@ind.names)
gen@ind.names<-gsub("P_ludovicianus_","lud", gen@ind.names)
gen@ind.names<-gsub("P_melanocephalus_","mel", gen@ind.names)
gen@ind.names
pop(gen)<-gen@ind.names
#assign populations (a StaMPP requirement)
gen@pop<-as.factor(gen@ind.names)
#generate pairwise divergence matrix
sample.div <- stamppNeisD(gen, pop = FALSE)
#export for splitstree
stamppPhylip(distance.mat=sample.div, file="~/Desktop/grosbeak.xSNPs.splits.txt")
knitr::include_graphics(c("/Users/devonderaad/Desktop/z.splits.png"))

#isolate autosomal SNPs
autoSNPs<-filt[c(x<3.348)]
autoSNPs
#convert to genlight
gen<-vcfR2genlight(autoSNPs)
#fix sample names to fit in <= 10 characters
gen@ind.names
gen@ind.names<-gsub("P_hybrid_","h", gen@ind.names)
gen@ind.names<-gsub("P_ludovicianus_","l", gen@ind.names)
gen@ind.names<-gsub("P_melanocephalus_","m", gen@ind.names)
gen@ind.names
pop(gen)<-gen@ind.names
#assign populations (a StaMPP requirement)
gen@pop<-as.factor(gen@ind.names)
#generate pairwise divergence matrix
sample.div <- stamppNeisD(gen, pop = FALSE)
#export for splitstree
stamppPhylip(distance.mat=sample.div, file="~/Desktop/grosbeak.autoSNPs.splits.txt")
knitr::include_graphics(c("/Users/devonderaad/Desktop/autosomal.splits.png"))

9 linkage filter

Now filter to get unlinked SNP datasets

Code
#perform linkage filtering to get a reduced vcf with only unlinked SNPs
filt.thin<-distance_thin(filt, min.distance = 1000)

  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |                                                                      |   1%
  |                                                                            
  |=                                                                     |   1%
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |==                                                                    |   2%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |==                                                                    |   4%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |===                                                                   |   5%
  |                                                                            
  |====                                                                  |   5%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |=====                                                                 |   8%
  |                                                                            
  |======                                                                |   8%
  |                                                                            
  |======                                                                |   9%
  |                                                                            
  |=======                                                               |   9%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |=======                                                               |  11%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |========                                                              |  12%
  |                                                                            
  |=========                                                             |  12%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |=========                                                             |  14%
  |                                                                            
  |==========                                                            |  14%
  |                                                                            
  |==========                                                            |  15%
  |                                                                            
  |===========                                                           |  15%
  |                                                                            
  |===========                                                           |  16%
  |                                                                            
  |============                                                          |  16%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |============                                                          |  18%
  |                                                                            
  |=============                                                         |  18%
  |                                                                            
  |=============                                                         |  19%
  |                                                                            
  |==============                                                        |  19%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |==============                                                        |  21%
  |                                                                            
  |===============                                                       |  21%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |================                                                      |  22%
  |                                                                            
  |================                                                      |  23%
  |                                                                            
  |================                                                      |  24%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |=================                                                     |  25%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |==================                                                    |  26%
  |                                                                            
  |===================                                                   |  26%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |===================                                                   |  28%
  |                                                                            
  |====================                                                  |  28%
  |                                                                            
  |====================                                                  |  29%
  |                                                                            
  |=====================                                                 |  29%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |=====================                                                 |  31%
  |                                                                            
  |======================                                                |  31%
  |                                                                            
  |======================                                                |  32%
  |                                                                            
  |=======================                                               |  32%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |========================                                              |  34%
  |                                                                            
  |========================                                              |  35%
  |                                                                            
  |=========================                                             |  35%
  |                                                                            
  |=========================                                             |  36%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |==========================                                            |  38%
  |                                                                            
  |===========================                                           |  38%
  |                                                                            
  |===========================                                           |  39%
  |                                                                            
  |============================                                          |  39%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |============================                                          |  41%
  |                                                                            
  |=============================                                         |  41%
  |                                                                            
  |=============================                                         |  42%
  |                                                                            
  |==============================                                        |  42%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |==============================                                        |  44%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |===============================                                       |  45%
  |                                                                            
  |================================                                      |  45%
  |                                                                            
  |================================                                      |  46%
  |                                                                            
  |=================================                                     |  46%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |=================================                                     |  48%
  |                                                                            
  |==================================                                    |  48%
  |                                                                            
  |==================================                                    |  49%
  |                                                                            
  |===================================                                   |  49%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |===================================                                   |  51%
  |                                                                            
  |====================================                                  |  51%
  |                                                                            
  |====================================                                  |  52%
  |                                                                            
  |=====================================                                 |  52%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |=====================================                                 |  54%
  |                                                                            
  |======================================                                |  54%
  |                                                                            
  |======================================                                |  55%
  |                                                                            
  |=======================================                               |  55%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |========================================                              |  56%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |========================================                              |  58%
  |                                                                            
  |=========================================                             |  58%
  |                                                                            
  |=========================================                             |  59%
  |                                                                            
  |==========================================                            |  59%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |==========================================                            |  61%
  |                                                                            
  |===========================================                           |  61%
  |                                                                            
  |===========================================                           |  62%
  |                                                                            
  |============================================                          |  62%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |=============================================                         |  64%
  |                                                                            
  |=============================================                         |  65%
  |                                                                            
  |==============================================                        |  65%
  |                                                                            
  |==============================================                        |  66%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |===============================================                       |  68%
  |                                                                            
  |================================================                      |  68%
  |                                                                            
  |================================================                      |  69%
  |                                                                            
  |=================================================                     |  69%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |=================================================                     |  71%
  |                                                                            
  |==================================================                    |  71%
  |                                                                            
  |==================================================                    |  72%
  |                                                                            
  |===================================================                   |  72%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |===================================================                   |  74%
  |                                                                            
  |====================================================                  |  74%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |=====================================================                 |  75%
  |                                                                            
  |=====================================================                 |  76%
  |                                                                            
  |======================================================                |  76%
  |                                                                            
  |======================================================                |  77%
  |                                                                            
  |======================================================                |  78%
  |                                                                            
  |=======================================================               |  78%
  |                                                                            
  |=======================================================               |  79%
  |                                                                            
  |========================================================              |  79%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |========================================================              |  81%
  |                                                                            
  |=========================================================             |  81%
  |                                                                            
  |=========================================================             |  82%
  |                                                                            
  |==========================================================            |  82%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |==========================================================            |  84%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |===========================================================           |  85%
  |                                                                            
  |============================================================          |  85%
  |                                                                            
  |============================================================          |  86%
  |                                                                            
  |=============================================================         |  86%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |=============================================================         |  88%
  |                                                                            
  |==============================================================        |  88%
  |                                                                            
  |==============================================================        |  89%
  |                                                                            
  |===============================================================       |  89%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |===============================================================       |  91%
  |                                                                            
  |================================================================      |  91%
  |                                                                            
  |================================================================      |  92%
  |                                                                            
  |=================================================================     |  92%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |==================================================================    |  95%
  |                                                                            
  |===================================================================   |  95%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |====================================================================  |  96%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |====================================================================  |  98%
  |                                                                            
  |===================================================================== |  98%
  |                                                                            
  |===================================================================== |  99%
  |                                                                            
  |======================================================================|  99%
  |                                                                            
  |======================================================================| 100%
3584 out of 51595 input SNPs were not located within 1000 base-pairs of another SNP and were retained despite filtering

10 write vcf to disk for downstream analyses

Code
#get info for all SNPs passing filtering 
filt
***** Object of Class vcfR *****
138 samples
1027 CHROMs
51,595 variants
Object size: 532.6 Mb
2.987 percent missing data
*****        *****         *****
Code
#write to disk
vcfR::write.vcf(filt, file = "~/Desktop/grosbeak.rad/grosbeak.filtered.snps.vcf.gz")

#get info for the thinned SNP dataset
filt.thin
***** Object of Class vcfR *****
138 samples
1027 CHROMs
3,584 variants
Object size: 40.7 Mb
3.403 percent missing data
*****        *****         *****
Code
#write to disk
vcfR::write.vcf(filt.thin, file = "~/Desktop/grosbeak.rad/grosbeak.filtered.unlinked.snps.vcf.gz")