Code
library(vcfR)
library(SNPfiltR)
library(StAMPP)
library(adegenet)
library(ggplot2)
#read vcf
<-read.vcfR("~/Downloads/populations.snps.vcf") v
This vcf comes straight out of mapping raw RADseq reads to a publicly available Ceyx reference genome and calling SNPs using Stacks.
library(vcfR)
library(SNPfiltR)
library(StAMPP)
library(adegenet)
library(ggplot2)
#read vcf
<-read.vcfR("~/Downloads/populations.snps.vcf") v
I will now follow the SNP filtering pipeline outlined in detail in the documents of the SNPfiltR R package (see SNPfiltR vignettes).
This is because removing low data samples will alter percentage/quantile based missing data cutoffs, so we wait to implement those until after deciding on our final set of samples for downstream analysis
#check the details of the unfiltered dataset
v#visualize distributions
hard_filter(v)
no depth cutoff provided, exploratory visualization will be generated.
no genotype quality cutoff provided, exploratory visualization will be generated.
#hard filter to minimum depth of 3, and minimum genotype quality of 30
<-hard_filter(vcfR=v, depth = 3, gq = 25) v
18.16% of genotypes fall below a read depth of 3 and were converted to NA
2.78% of genotypes fall below a genotype quality of 25 and were converted to NA
***** Object of Class vcfR *****
156 samples
6192 CHROMs
407,520 variants
Object size: 1417.2 Mb
67.7 percent missing data
***** ***** *****
***** Object of Class vcfR *****
156 samples
6192 CHROMs
407,520 variants
Object size: 1417.2 Mb
67.7 percent missing data
***** ***** *****
Use the function allele_balance()
to filter for allele balance from Puritz SNP filtering tutorial “Allele balance: a number between 0 and 1 representing the ratio of reads showing the reference allele to all reads, considering only reads from individuals called as heterozygous, we expect that the allele balance in our data (for real loci) should be close to 0.5”
#test
filter_allele_balance(v)
10.03% of het genotypes (0.54% of all genotypes) fall outside of 0.25 - 0.75 allele balance ratio and were converted to NA
***** Object of Class vcfR *****
156 samples
6192 CHROMs
407,520 variants
Object size: 1380.1 Mb
74.44 percent missing data
***** ***** *****
#execute allele balance filter
<-filter_allele_balance(v, 0.15, 0.85) v
1.88% of het genotypes (0.1% of all genotypes) fall outside of 0.15 - 0.85 allele balance ratio and were converted to NA
We now want to implement a max depth filter (super high depth loci are likely multiple loci stuck together into a single paralogous locus, which we want to remove).
#visualize and pick appropriate max depth cutoff
max_depth(v)
cutoff is not specified, exploratory visualization will be generated.
dashed line indicates a mean depth across all SNPs of 27.8
#filter vcf by the max depth cutoff you chose
<-max_depth(v, maxdepth = 250) v
maxdepth cutoff is specified, filtered vcfR object will be returned
0.19% of SNPs were above a mean depth of 250 and were removed from the vcf
Remove SNPs from the vcfR that have become invariant following the removal of questionable genotypes above, and see how many SNPs we have left after this initial set of filters
<-min_mac(v, min.mac = 1) v
24.69% of SNPs fell below a minor allele count of 1 and were removed from the VCF
v
***** Object of Class vcfR *****
156 samples
5345 CHROMs
306,316 variants
Object size: 1217.8 Mb
68.02 percent missing data
***** ***** *****
Check out the exploratory visualizations and make decisions about which samples to keep for downstream analysis.
#run function to visualize per sample missingness
<-missing_by_sample(v) miss
No popmap provided
#run function to visualize per SNP missingness
<-missing_by_snp(v) by.snp
cutoff is not specified, exploratory visualizations will be generated
Picking joint bandwidth of 0.0569
Based on these visualizations, we will want to drop the worst sequenced samples that are dragging down the rest of the dataset. Drop those samples based on an approximate missing data proportion cutoff here (this can always be revised if we end up feeling like this is too lenient or stringent later):
#run function to drop samples above the threshold we want from the vcf
<-missing_by_sample(v, cutoff = .82) vcfR.trim
17 samples are above a 0.82 missing data cutoff, and were removed from VCF
#get a list of samples that got trimmed
colnames(v@gt)[!colnames(v@gt) %in% colnames(vcfR.trim@gt)]
[1] "P_ludovicianus_44717" "P_ludovicianus_44718" "P_ludovicianus_44721"
[4] "P_ludovicianus_44735" "P_melanocephalus_44661" "P_melanocephalus_44674"
[7] "P_melanocephalus_44675" "P_melanocephalus_44700" "P_melanocephalus_44702"
[10] "P_melanocephalus_44767" "P_melanocephalus_44784" "P_melanocephalus_44786"
[13] "P_melanocephalus_44796" "P_melanocephalus_44797" "P_melanocephalus_44836"
[16] "P_melanocephalus_44842" "P_melanocephalus_44844"
#remove invariant sites generated by sample trimming
<-min_mac(vcfR.trim, min.mac = 1) vcfR.trim
1.67% of SNPs fell below a minor allele count of 1 and were removed from the VCF
#see what effect trimming samples had on missing data across the dataset
<-missing_by_snp(vcfR.trim) by.snp
cutoff is not specified, exploratory visualizations will be generated
Picking joint bandwidth of 0.0517
#identify the lagging sample
<-missing_by_snp(vcfR.trim, cutoff = .9) test
cutoff is specified, filtered vcfR object will be returned
83.06% of SNPs fell below a completeness cutoff of 0.9 and were removed from the VCF
<-missing_by_sample(test) test.miss
No popmap provided
#manually remove this sample
@gt<-vcfR.trim@gt[,colnames(vcfR.trim@gt) != "P_melanocephalus_39985"]
vcfR.trim vcfR.trim
***** Object of Class vcfR *****
138 samples
5269 CHROMs
301,203 variants
Object size: 1157.4 Mb
65.04 percent missing data
***** ***** *****
#double check that this worked
<-missing_by_snp(vcfR.trim) by.snp
cutoff is not specified, exploratory visualizations will be generated
Picking joint bandwidth of 0.0516
#remove SNPs that have become invariant because of sample removal
<-min_mac(vcfR.trim, min.mac = 1) vcfR.trim
2.43% of SNPs fell below a minor allele count of 1 and were removed from the VCF
vcfR.trim
***** Object of Class vcfR *****
138 samples
5136 CHROMs
293,890 variants
Object size: 1148 Mb
64.22 percent missing data
***** ***** *****
It is a known thing (see this) that Stacks will not merge SNPs if they are sequenced by separate (but physically overlapping) loci, and will instead retain the same SNP twice. To account for this, we will simply remove a SNP every time its chromosome and position have already been seen in the dataset, using the following code:
#check # of SNPs
vcfR.trim
***** Object of Class vcfR *****
138 samples
5136 CHROMs
293,890 variants
Object size: 1148 Mb
64.22 percent missing data
***** ***** *****
#generate dataframe containing information for chromosome and bp locality of each SNP
<-as.data.frame(vcfR.trim@fix[,1:2])
df#calc number of duplicated SNPs to remove
nrow(df) - length(unique(paste(df$CHROM,df$POS)))
[1] 2369
#remove duplicated SNPs
<-vcfR.trim[!duplicated(paste(df$CHROM,df$POS)),]
vcfR.trim#check # of SNPs
vcfR.trim
***** Object of Class vcfR *****
138 samples
5136 CHROMs
291,521 variants
Object size: 1139.1 Mb
64.2 percent missing data
***** ***** *****
Now we will visualize different per SNP missing data thresholds and identify a value that optimizes the trade-off between amount of missing data and the total number of SNPs retained.
#see what effect trimming samples has had on missing data across the dataset
<-missing_by_snp(vcfR.trim) by.snp
cutoff is not specified, exploratory visualizations will be generated
Picking joint bandwidth of 0.0516
Test various SNP completeness thresholds to see if proportion missing data drives sample placement in an unrooted neighbor-joining tree
#test 70% completeness threshold
<-missing_by_snp(vcfR.trim, cutoff=.7) filt
cutoff is specified, filtered vcfR object will be returned
72.97% of SNPs fell below a completeness cutoff of 0.7 and were removed from the VCF
filt
***** Object of Class vcfR *****
138 samples
1243 CHROMs
78,785 variants
Object size: 712.5 Mb
8.494 percent missing data
***** ***** *****
#convert to genlight
<-vcfR2genlight(filt)
gen#fix sample names to fit in <= 10 characters
@ind.names gen
[1] "P_hybrid_44696" "P_hybrid_44703" "P_hybrid_44707"
[4] "P_hybrid_44708" "P_hybrid_44709" "P_hybrid_44712"
[7] "P_hybrid_44762" "P_hybrid_44771" "P_hybrid_44781"
[10] "P_hybrid_45171" "P_hybrid_45173" "P_hybrid_45174"
[13] "P_ludovicianus_11998" "P_ludovicianus_21721" "P_ludovicianus_25286"
[16] "P_ludovicianus_26595" "P_ludovicianus_33988" "P_ludovicianus_34776"
[19] "P_ludovicianus_34779" "P_ludovicianus_34782" "P_ludovicianus_34830"
[22] "P_ludovicianus_44704" "P_ludovicianus_44705" "P_ludovicianus_44706"
[25] "P_ludovicianus_44710" "P_ludovicianus_44711" "P_ludovicianus_44713"
[28] "P_ludovicianus_44714" "P_ludovicianus_44715" "P_ludovicianus_44716"
[31] "P_ludovicianus_44719" "P_ludovicianus_44720" "P_ludovicianus_44722"
[34] "P_ludovicianus_44723" "P_ludovicianus_44726" "P_ludovicianus_44727"
[37] "P_ludovicianus_44729" "P_ludovicianus_44730" "P_ludovicianus_44731"
[40] "P_ludovicianus_44732" "P_ludovicianus_44733" "P_ludovicianus_44734"
[43] "P_ludovicianus_44737" "P_ludovicianus_44738" "P_ludovicianus_44739"
[46] "P_ludovicianus_44740" "P_ludovicianus_44741" "P_ludovicianus_44742"
[49] "P_ludovicianus_44743" "P_ludovicianus_44744" "P_ludovicianus_44745"
[52] "P_ludovicianus_44746" "P_ludovicianus_44747" "P_ludovicianus_44748"
[55] "P_ludovicianus_44749" "P_ludovicianus_44753" "P_ludovicianus_44754"
[58] "P_ludovicianus_44761" "P_ludovicianus_44775" "P_melanocephalus_34890"
[61] "P_melanocephalus_43110" "P_melanocephalus_43276" "P_melanocephalus_44346"
[64] "P_melanocephalus_44347" "P_melanocephalus_44471" "P_melanocephalus_44651"
[67] "P_melanocephalus_44660" "P_melanocephalus_44666" "P_melanocephalus_44676"
[70] "P_melanocephalus_44677" "P_melanocephalus_44678" "P_melanocephalus_44679"
[73] "P_melanocephalus_44680" "P_melanocephalus_44681" "P_melanocephalus_44683"
[76] "P_melanocephalus_44684" "P_melanocephalus_44685" "P_melanocephalus_44686"
[79] "P_melanocephalus_44687" "P_melanocephalus_44688" "P_melanocephalus_44689"
[82] "P_melanocephalus_44692" "P_melanocephalus_44693" "P_melanocephalus_44694"
[85] "P_melanocephalus_44695" "P_melanocephalus_44697" "P_melanocephalus_44699"
[88] "P_melanocephalus_44752" "P_melanocephalus_44760" "P_melanocephalus_44763"
[91] "P_melanocephalus_44764" "P_melanocephalus_44765" "P_melanocephalus_44766"
[94] "P_melanocephalus_44769" "P_melanocephalus_44770" "P_melanocephalus_44772"
[97] "P_melanocephalus_44773" "P_melanocephalus_44774" "P_melanocephalus_44776"
[100] "P_melanocephalus_44777" "P_melanocephalus_44778" "P_melanocephalus_44779"
[103] "P_melanocephalus_44780" "P_melanocephalus_44782" "P_melanocephalus_44787"
[106] "P_melanocephalus_44788" "P_melanocephalus_44789" "P_melanocephalus_44790"
[109] "P_melanocephalus_44791" "P_melanocephalus_44792" "P_melanocephalus_44793"
[112] "P_melanocephalus_44794" "P_melanocephalus_44795" "P_melanocephalus_44798"
[115] "P_melanocephalus_44799" "P_melanocephalus_44801" "P_melanocephalus_44802"
[118] "P_melanocephalus_44803" "P_melanocephalus_44804" "P_melanocephalus_44805"
[121] "P_melanocephalus_44806" "P_melanocephalus_44808" "P_melanocephalus_44809"
[124] "P_melanocephalus_44810" "P_melanocephalus_44837" "P_melanocephalus_44840"
[127] "P_melanocephalus_44843" "P_melanocephalus_44845" "P_melanocephalus_44846"
[130] "P_melanocephalus_44847" "P_melanocephalus_44848" "P_melanocephalus_44853"
[133] "P_melanocephalus_44854" "P_melanocephalus_45175" "P_melanocephalus_45200"
[136] "P_melanocephalus_45232" "P_melanocephalus_45709" "P_melanocephalus_45926"
@ind.names<-gsub("P_hybrid_","hyb", gen@ind.names)
gen@ind.names<-gsub("P_ludovicianus_","lud", gen@ind.names)
gen@ind.names<-gsub("P_melanocephalus_","mel", gen@ind.names)
gen@ind.names gen
[1] "hyb44696" "hyb44703" "hyb44707" "hyb44708" "hyb44709" "hyb44712"
[7] "hyb44762" "hyb44771" "hyb44781" "hyb45171" "hyb45173" "hyb45174"
[13] "lud11998" "lud21721" "lud25286" "lud26595" "lud33988" "lud34776"
[19] "lud34779" "lud34782" "lud34830" "lud44704" "lud44705" "lud44706"
[25] "lud44710" "lud44711" "lud44713" "lud44714" "lud44715" "lud44716"
[31] "lud44719" "lud44720" "lud44722" "lud44723" "lud44726" "lud44727"
[37] "lud44729" "lud44730" "lud44731" "lud44732" "lud44733" "lud44734"
[43] "lud44737" "lud44738" "lud44739" "lud44740" "lud44741" "lud44742"
[49] "lud44743" "lud44744" "lud44745" "lud44746" "lud44747" "lud44748"
[55] "lud44749" "lud44753" "lud44754" "lud44761" "lud44775" "mel34890"
[61] "mel43110" "mel43276" "mel44346" "mel44347" "mel44471" "mel44651"
[67] "mel44660" "mel44666" "mel44676" "mel44677" "mel44678" "mel44679"
[73] "mel44680" "mel44681" "mel44683" "mel44684" "mel44685" "mel44686"
[79] "mel44687" "mel44688" "mel44689" "mel44692" "mel44693" "mel44694"
[85] "mel44695" "mel44697" "mel44699" "mel44752" "mel44760" "mel44763"
[91] "mel44764" "mel44765" "mel44766" "mel44769" "mel44770" "mel44772"
[97] "mel44773" "mel44774" "mel44776" "mel44777" "mel44778" "mel44779"
[103] "mel44780" "mel44782" "mel44787" "mel44788" "mel44789" "mel44790"
[109] "mel44791" "mel44792" "mel44793" "mel44794" "mel44795" "mel44798"
[115] "mel44799" "mel44801" "mel44802" "mel44803" "mel44804" "mel44805"
[121] "mel44806" "mel44808" "mel44809" "mel44810" "mel44837" "mel44840"
[127] "mel44843" "mel44845" "mel44846" "mel44847" "mel44848" "mel44853"
[133] "mel44854" "mel45175" "mel45200" "mel45232" "mel45709" "mel45926"
pop(gen)<-gen@ind.names
#assign populations (a StaMPP requirement)
@pop<-as.factor(gen@ind.names)
gen#generate pairwise divergence matrix
<- stamppNeisD(gen, pop = FALSE)
sample.div #export for splitstree
stamppPhylip(distance.mat=sample.div, file="~/Desktop/grosbeak.rad/grosbeak.70.splits.txt")
::include_graphics(c("/Users/devonderaad/Desktop/grosbeak.rad/70.splits.png")) knitr
#test 80% completeness threshold
<-missing_by_snp(vcfR.trim, cutoff=.8) filt
cutoff is specified, filtered vcfR object will be returned
76.99% of SNPs fell below a completeness cutoff of 0.8 and were removed from the VCF
filt
***** Object of Class vcfR *****
138 samples
1159 CHROMs
67,091 variants
Object size: 644.8 Mb
5.64 percent missing data
***** ***** *****
#convert to genlight
<-vcfR2genlight(filt)
gen#fix sample names to fit in <= 10 characters
@ind.names gen
[1] "P_hybrid_44696" "P_hybrid_44703" "P_hybrid_44707"
[4] "P_hybrid_44708" "P_hybrid_44709" "P_hybrid_44712"
[7] "P_hybrid_44762" "P_hybrid_44771" "P_hybrid_44781"
[10] "P_hybrid_45171" "P_hybrid_45173" "P_hybrid_45174"
[13] "P_ludovicianus_11998" "P_ludovicianus_21721" "P_ludovicianus_25286"
[16] "P_ludovicianus_26595" "P_ludovicianus_33988" "P_ludovicianus_34776"
[19] "P_ludovicianus_34779" "P_ludovicianus_34782" "P_ludovicianus_34830"
[22] "P_ludovicianus_44704" "P_ludovicianus_44705" "P_ludovicianus_44706"
[25] "P_ludovicianus_44710" "P_ludovicianus_44711" "P_ludovicianus_44713"
[28] "P_ludovicianus_44714" "P_ludovicianus_44715" "P_ludovicianus_44716"
[31] "P_ludovicianus_44719" "P_ludovicianus_44720" "P_ludovicianus_44722"
[34] "P_ludovicianus_44723" "P_ludovicianus_44726" "P_ludovicianus_44727"
[37] "P_ludovicianus_44729" "P_ludovicianus_44730" "P_ludovicianus_44731"
[40] "P_ludovicianus_44732" "P_ludovicianus_44733" "P_ludovicianus_44734"
[43] "P_ludovicianus_44737" "P_ludovicianus_44738" "P_ludovicianus_44739"
[46] "P_ludovicianus_44740" "P_ludovicianus_44741" "P_ludovicianus_44742"
[49] "P_ludovicianus_44743" "P_ludovicianus_44744" "P_ludovicianus_44745"
[52] "P_ludovicianus_44746" "P_ludovicianus_44747" "P_ludovicianus_44748"
[55] "P_ludovicianus_44749" "P_ludovicianus_44753" "P_ludovicianus_44754"
[58] "P_ludovicianus_44761" "P_ludovicianus_44775" "P_melanocephalus_34890"
[61] "P_melanocephalus_43110" "P_melanocephalus_43276" "P_melanocephalus_44346"
[64] "P_melanocephalus_44347" "P_melanocephalus_44471" "P_melanocephalus_44651"
[67] "P_melanocephalus_44660" "P_melanocephalus_44666" "P_melanocephalus_44676"
[70] "P_melanocephalus_44677" "P_melanocephalus_44678" "P_melanocephalus_44679"
[73] "P_melanocephalus_44680" "P_melanocephalus_44681" "P_melanocephalus_44683"
[76] "P_melanocephalus_44684" "P_melanocephalus_44685" "P_melanocephalus_44686"
[79] "P_melanocephalus_44687" "P_melanocephalus_44688" "P_melanocephalus_44689"
[82] "P_melanocephalus_44692" "P_melanocephalus_44693" "P_melanocephalus_44694"
[85] "P_melanocephalus_44695" "P_melanocephalus_44697" "P_melanocephalus_44699"
[88] "P_melanocephalus_44752" "P_melanocephalus_44760" "P_melanocephalus_44763"
[91] "P_melanocephalus_44764" "P_melanocephalus_44765" "P_melanocephalus_44766"
[94] "P_melanocephalus_44769" "P_melanocephalus_44770" "P_melanocephalus_44772"
[97] "P_melanocephalus_44773" "P_melanocephalus_44774" "P_melanocephalus_44776"
[100] "P_melanocephalus_44777" "P_melanocephalus_44778" "P_melanocephalus_44779"
[103] "P_melanocephalus_44780" "P_melanocephalus_44782" "P_melanocephalus_44787"
[106] "P_melanocephalus_44788" "P_melanocephalus_44789" "P_melanocephalus_44790"
[109] "P_melanocephalus_44791" "P_melanocephalus_44792" "P_melanocephalus_44793"
[112] "P_melanocephalus_44794" "P_melanocephalus_44795" "P_melanocephalus_44798"
[115] "P_melanocephalus_44799" "P_melanocephalus_44801" "P_melanocephalus_44802"
[118] "P_melanocephalus_44803" "P_melanocephalus_44804" "P_melanocephalus_44805"
[121] "P_melanocephalus_44806" "P_melanocephalus_44808" "P_melanocephalus_44809"
[124] "P_melanocephalus_44810" "P_melanocephalus_44837" "P_melanocephalus_44840"
[127] "P_melanocephalus_44843" "P_melanocephalus_44845" "P_melanocephalus_44846"
[130] "P_melanocephalus_44847" "P_melanocephalus_44848" "P_melanocephalus_44853"
[133] "P_melanocephalus_44854" "P_melanocephalus_45175" "P_melanocephalus_45200"
[136] "P_melanocephalus_45232" "P_melanocephalus_45709" "P_melanocephalus_45926"
@ind.names<-gsub("P_hybrid_","hyb", gen@ind.names)
gen@ind.names<-gsub("P_ludovicianus_","lud", gen@ind.names)
gen@ind.names<-gsub("P_melanocephalus_","mel", gen@ind.names)
gen@ind.names gen
[1] "hyb44696" "hyb44703" "hyb44707" "hyb44708" "hyb44709" "hyb44712"
[7] "hyb44762" "hyb44771" "hyb44781" "hyb45171" "hyb45173" "hyb45174"
[13] "lud11998" "lud21721" "lud25286" "lud26595" "lud33988" "lud34776"
[19] "lud34779" "lud34782" "lud34830" "lud44704" "lud44705" "lud44706"
[25] "lud44710" "lud44711" "lud44713" "lud44714" "lud44715" "lud44716"
[31] "lud44719" "lud44720" "lud44722" "lud44723" "lud44726" "lud44727"
[37] "lud44729" "lud44730" "lud44731" "lud44732" "lud44733" "lud44734"
[43] "lud44737" "lud44738" "lud44739" "lud44740" "lud44741" "lud44742"
[49] "lud44743" "lud44744" "lud44745" "lud44746" "lud44747" "lud44748"
[55] "lud44749" "lud44753" "lud44754" "lud44761" "lud44775" "mel34890"
[61] "mel43110" "mel43276" "mel44346" "mel44347" "mel44471" "mel44651"
[67] "mel44660" "mel44666" "mel44676" "mel44677" "mel44678" "mel44679"
[73] "mel44680" "mel44681" "mel44683" "mel44684" "mel44685" "mel44686"
[79] "mel44687" "mel44688" "mel44689" "mel44692" "mel44693" "mel44694"
[85] "mel44695" "mel44697" "mel44699" "mel44752" "mel44760" "mel44763"
[91] "mel44764" "mel44765" "mel44766" "mel44769" "mel44770" "mel44772"
[97] "mel44773" "mel44774" "mel44776" "mel44777" "mel44778" "mel44779"
[103] "mel44780" "mel44782" "mel44787" "mel44788" "mel44789" "mel44790"
[109] "mel44791" "mel44792" "mel44793" "mel44794" "mel44795" "mel44798"
[115] "mel44799" "mel44801" "mel44802" "mel44803" "mel44804" "mel44805"
[121] "mel44806" "mel44808" "mel44809" "mel44810" "mel44837" "mel44840"
[127] "mel44843" "mel44845" "mel44846" "mel44847" "mel44848" "mel44853"
[133] "mel44854" "mel45175" "mel45200" "mel45232" "mel45709" "mel45926"
pop(gen)<-gen@ind.names
#assign populations (a StaMPP requirement)
@pop<-as.factor(gen@ind.names)
gen#generate pairwise divergence matrix
<- stamppNeisD(gen, pop = FALSE)
sample.div #export for splitstree
stamppPhylip(distance.mat=sample.div, file="~/Desktop/grosbeak.rad/grosbeak.80.splits.txt")
::include_graphics(c("/Users/devonderaad/Desktop/grosbeak.rad/80.splits.png")) knitr
#repeat with 90% completeness threshold
<-missing_by_snp(vcfR.trim, cutoff=.9) filt
cutoff is specified, filtered vcfR object will be returned
82.3% of SNPs fell below a completeness cutoff of 0.9 and were removed from the VCF
filt
***** Object of Class vcfR *****
138 samples
1027 CHROMs
51,595 variants
Object size: 532.6 Mb
2.987 percent missing data
***** ***** *****
#convert to genlight
<-vcfR2genlight(filt)
gen#fix sample names to fit in <= 10 characters
@ind.names gen
[1] "P_hybrid_44696" "P_hybrid_44703" "P_hybrid_44707"
[4] "P_hybrid_44708" "P_hybrid_44709" "P_hybrid_44712"
[7] "P_hybrid_44762" "P_hybrid_44771" "P_hybrid_44781"
[10] "P_hybrid_45171" "P_hybrid_45173" "P_hybrid_45174"
[13] "P_ludovicianus_11998" "P_ludovicianus_21721" "P_ludovicianus_25286"
[16] "P_ludovicianus_26595" "P_ludovicianus_33988" "P_ludovicianus_34776"
[19] "P_ludovicianus_34779" "P_ludovicianus_34782" "P_ludovicianus_34830"
[22] "P_ludovicianus_44704" "P_ludovicianus_44705" "P_ludovicianus_44706"
[25] "P_ludovicianus_44710" "P_ludovicianus_44711" "P_ludovicianus_44713"
[28] "P_ludovicianus_44714" "P_ludovicianus_44715" "P_ludovicianus_44716"
[31] "P_ludovicianus_44719" "P_ludovicianus_44720" "P_ludovicianus_44722"
[34] "P_ludovicianus_44723" "P_ludovicianus_44726" "P_ludovicianus_44727"
[37] "P_ludovicianus_44729" "P_ludovicianus_44730" "P_ludovicianus_44731"
[40] "P_ludovicianus_44732" "P_ludovicianus_44733" "P_ludovicianus_44734"
[43] "P_ludovicianus_44737" "P_ludovicianus_44738" "P_ludovicianus_44739"
[46] "P_ludovicianus_44740" "P_ludovicianus_44741" "P_ludovicianus_44742"
[49] "P_ludovicianus_44743" "P_ludovicianus_44744" "P_ludovicianus_44745"
[52] "P_ludovicianus_44746" "P_ludovicianus_44747" "P_ludovicianus_44748"
[55] "P_ludovicianus_44749" "P_ludovicianus_44753" "P_ludovicianus_44754"
[58] "P_ludovicianus_44761" "P_ludovicianus_44775" "P_melanocephalus_34890"
[61] "P_melanocephalus_43110" "P_melanocephalus_43276" "P_melanocephalus_44346"
[64] "P_melanocephalus_44347" "P_melanocephalus_44471" "P_melanocephalus_44651"
[67] "P_melanocephalus_44660" "P_melanocephalus_44666" "P_melanocephalus_44676"
[70] "P_melanocephalus_44677" "P_melanocephalus_44678" "P_melanocephalus_44679"
[73] "P_melanocephalus_44680" "P_melanocephalus_44681" "P_melanocephalus_44683"
[76] "P_melanocephalus_44684" "P_melanocephalus_44685" "P_melanocephalus_44686"
[79] "P_melanocephalus_44687" "P_melanocephalus_44688" "P_melanocephalus_44689"
[82] "P_melanocephalus_44692" "P_melanocephalus_44693" "P_melanocephalus_44694"
[85] "P_melanocephalus_44695" "P_melanocephalus_44697" "P_melanocephalus_44699"
[88] "P_melanocephalus_44752" "P_melanocephalus_44760" "P_melanocephalus_44763"
[91] "P_melanocephalus_44764" "P_melanocephalus_44765" "P_melanocephalus_44766"
[94] "P_melanocephalus_44769" "P_melanocephalus_44770" "P_melanocephalus_44772"
[97] "P_melanocephalus_44773" "P_melanocephalus_44774" "P_melanocephalus_44776"
[100] "P_melanocephalus_44777" "P_melanocephalus_44778" "P_melanocephalus_44779"
[103] "P_melanocephalus_44780" "P_melanocephalus_44782" "P_melanocephalus_44787"
[106] "P_melanocephalus_44788" "P_melanocephalus_44789" "P_melanocephalus_44790"
[109] "P_melanocephalus_44791" "P_melanocephalus_44792" "P_melanocephalus_44793"
[112] "P_melanocephalus_44794" "P_melanocephalus_44795" "P_melanocephalus_44798"
[115] "P_melanocephalus_44799" "P_melanocephalus_44801" "P_melanocephalus_44802"
[118] "P_melanocephalus_44803" "P_melanocephalus_44804" "P_melanocephalus_44805"
[121] "P_melanocephalus_44806" "P_melanocephalus_44808" "P_melanocephalus_44809"
[124] "P_melanocephalus_44810" "P_melanocephalus_44837" "P_melanocephalus_44840"
[127] "P_melanocephalus_44843" "P_melanocephalus_44845" "P_melanocephalus_44846"
[130] "P_melanocephalus_44847" "P_melanocephalus_44848" "P_melanocephalus_44853"
[133] "P_melanocephalus_44854" "P_melanocephalus_45175" "P_melanocephalus_45200"
[136] "P_melanocephalus_45232" "P_melanocephalus_45709" "P_melanocephalus_45926"
@ind.names<-gsub("P_hybrid_","hyb", gen@ind.names)
gen@ind.names<-gsub("P_ludovicianus_","lud", gen@ind.names)
gen@ind.names<-gsub("P_melanocephalus_","mel", gen@ind.names)
gen@ind.names gen
[1] "hyb44696" "hyb44703" "hyb44707" "hyb44708" "hyb44709" "hyb44712"
[7] "hyb44762" "hyb44771" "hyb44781" "hyb45171" "hyb45173" "hyb45174"
[13] "lud11998" "lud21721" "lud25286" "lud26595" "lud33988" "lud34776"
[19] "lud34779" "lud34782" "lud34830" "lud44704" "lud44705" "lud44706"
[25] "lud44710" "lud44711" "lud44713" "lud44714" "lud44715" "lud44716"
[31] "lud44719" "lud44720" "lud44722" "lud44723" "lud44726" "lud44727"
[37] "lud44729" "lud44730" "lud44731" "lud44732" "lud44733" "lud44734"
[43] "lud44737" "lud44738" "lud44739" "lud44740" "lud44741" "lud44742"
[49] "lud44743" "lud44744" "lud44745" "lud44746" "lud44747" "lud44748"
[55] "lud44749" "lud44753" "lud44754" "lud44761" "lud44775" "mel34890"
[61] "mel43110" "mel43276" "mel44346" "mel44347" "mel44471" "mel44651"
[67] "mel44660" "mel44666" "mel44676" "mel44677" "mel44678" "mel44679"
[73] "mel44680" "mel44681" "mel44683" "mel44684" "mel44685" "mel44686"
[79] "mel44687" "mel44688" "mel44689" "mel44692" "mel44693" "mel44694"
[85] "mel44695" "mel44697" "mel44699" "mel44752" "mel44760" "mel44763"
[91] "mel44764" "mel44765" "mel44766" "mel44769" "mel44770" "mel44772"
[97] "mel44773" "mel44774" "mel44776" "mel44777" "mel44778" "mel44779"
[103] "mel44780" "mel44782" "mel44787" "mel44788" "mel44789" "mel44790"
[109] "mel44791" "mel44792" "mel44793" "mel44794" "mel44795" "mel44798"
[115] "mel44799" "mel44801" "mel44802" "mel44803" "mel44804" "mel44805"
[121] "mel44806" "mel44808" "mel44809" "mel44810" "mel44837" "mel44840"
[127] "mel44843" "mel44845" "mel44846" "mel44847" "mel44848" "mel44853"
[133] "mel44854" "mel45175" "mel45200" "mel45232" "mel45709" "mel45926"
pop(gen)<-gen@ind.names
#assign populations (a StaMPP requirement)
@pop<-as.factor(gen@ind.names)
gen#generate pairwise divergence matrix
<- stamppNeisD(gen, pop = FALSE)
sample.div #export for splitstree
stamppPhylip(distance.mat=sample.div, file="~/Desktop/grosbeak.rad/grosbeak.90.splits.txt")
::include_graphics(c("/Users/devonderaad/Desktop/grosbeak.rad/90.splits.png")) knitr
#repeat with 100% completeness threshold
<-missing_by_snp(vcfR.trim, cutoff=1) filt
cutoff is specified, filtered vcfR object will be returned
96.77% of SNPs fell below a completeness cutoff of 1 and were removed from the VCF
filt
***** Object of Class vcfR *****
138 samples
574 CHROMs
9,417 variants
Object size: 112.3 Mb
0 percent missing data
***** ***** *****
#convert to genlight
<-vcfR2genlight(filt)
gen#fix sample names to fit in <= 10 characters
@ind.names gen
[1] "P_hybrid_44696" "P_hybrid_44703" "P_hybrid_44707"
[4] "P_hybrid_44708" "P_hybrid_44709" "P_hybrid_44712"
[7] "P_hybrid_44762" "P_hybrid_44771" "P_hybrid_44781"
[10] "P_hybrid_45171" "P_hybrid_45173" "P_hybrid_45174"
[13] "P_ludovicianus_11998" "P_ludovicianus_21721" "P_ludovicianus_25286"
[16] "P_ludovicianus_26595" "P_ludovicianus_33988" "P_ludovicianus_34776"
[19] "P_ludovicianus_34779" "P_ludovicianus_34782" "P_ludovicianus_34830"
[22] "P_ludovicianus_44704" "P_ludovicianus_44705" "P_ludovicianus_44706"
[25] "P_ludovicianus_44710" "P_ludovicianus_44711" "P_ludovicianus_44713"
[28] "P_ludovicianus_44714" "P_ludovicianus_44715" "P_ludovicianus_44716"
[31] "P_ludovicianus_44719" "P_ludovicianus_44720" "P_ludovicianus_44722"
[34] "P_ludovicianus_44723" "P_ludovicianus_44726" "P_ludovicianus_44727"
[37] "P_ludovicianus_44729" "P_ludovicianus_44730" "P_ludovicianus_44731"
[40] "P_ludovicianus_44732" "P_ludovicianus_44733" "P_ludovicianus_44734"
[43] "P_ludovicianus_44737" "P_ludovicianus_44738" "P_ludovicianus_44739"
[46] "P_ludovicianus_44740" "P_ludovicianus_44741" "P_ludovicianus_44742"
[49] "P_ludovicianus_44743" "P_ludovicianus_44744" "P_ludovicianus_44745"
[52] "P_ludovicianus_44746" "P_ludovicianus_44747" "P_ludovicianus_44748"
[55] "P_ludovicianus_44749" "P_ludovicianus_44753" "P_ludovicianus_44754"
[58] "P_ludovicianus_44761" "P_ludovicianus_44775" "P_melanocephalus_34890"
[61] "P_melanocephalus_43110" "P_melanocephalus_43276" "P_melanocephalus_44346"
[64] "P_melanocephalus_44347" "P_melanocephalus_44471" "P_melanocephalus_44651"
[67] "P_melanocephalus_44660" "P_melanocephalus_44666" "P_melanocephalus_44676"
[70] "P_melanocephalus_44677" "P_melanocephalus_44678" "P_melanocephalus_44679"
[73] "P_melanocephalus_44680" "P_melanocephalus_44681" "P_melanocephalus_44683"
[76] "P_melanocephalus_44684" "P_melanocephalus_44685" "P_melanocephalus_44686"
[79] "P_melanocephalus_44687" "P_melanocephalus_44688" "P_melanocephalus_44689"
[82] "P_melanocephalus_44692" "P_melanocephalus_44693" "P_melanocephalus_44694"
[85] "P_melanocephalus_44695" "P_melanocephalus_44697" "P_melanocephalus_44699"
[88] "P_melanocephalus_44752" "P_melanocephalus_44760" "P_melanocephalus_44763"
[91] "P_melanocephalus_44764" "P_melanocephalus_44765" "P_melanocephalus_44766"
[94] "P_melanocephalus_44769" "P_melanocephalus_44770" "P_melanocephalus_44772"
[97] "P_melanocephalus_44773" "P_melanocephalus_44774" "P_melanocephalus_44776"
[100] "P_melanocephalus_44777" "P_melanocephalus_44778" "P_melanocephalus_44779"
[103] "P_melanocephalus_44780" "P_melanocephalus_44782" "P_melanocephalus_44787"
[106] "P_melanocephalus_44788" "P_melanocephalus_44789" "P_melanocephalus_44790"
[109] "P_melanocephalus_44791" "P_melanocephalus_44792" "P_melanocephalus_44793"
[112] "P_melanocephalus_44794" "P_melanocephalus_44795" "P_melanocephalus_44798"
[115] "P_melanocephalus_44799" "P_melanocephalus_44801" "P_melanocephalus_44802"
[118] "P_melanocephalus_44803" "P_melanocephalus_44804" "P_melanocephalus_44805"
[121] "P_melanocephalus_44806" "P_melanocephalus_44808" "P_melanocephalus_44809"
[124] "P_melanocephalus_44810" "P_melanocephalus_44837" "P_melanocephalus_44840"
[127] "P_melanocephalus_44843" "P_melanocephalus_44845" "P_melanocephalus_44846"
[130] "P_melanocephalus_44847" "P_melanocephalus_44848" "P_melanocephalus_44853"
[133] "P_melanocephalus_44854" "P_melanocephalus_45175" "P_melanocephalus_45200"
[136] "P_melanocephalus_45232" "P_melanocephalus_45709" "P_melanocephalus_45926"
@ind.names<-gsub("P_hybrid_","hyb", gen@ind.names)
gen@ind.names<-gsub("P_ludovicianus_","lud", gen@ind.names)
gen@ind.names<-gsub("P_melanocephalus_","mel", gen@ind.names)
gen@ind.names gen
[1] "hyb44696" "hyb44703" "hyb44707" "hyb44708" "hyb44709" "hyb44712"
[7] "hyb44762" "hyb44771" "hyb44781" "hyb45171" "hyb45173" "hyb45174"
[13] "lud11998" "lud21721" "lud25286" "lud26595" "lud33988" "lud34776"
[19] "lud34779" "lud34782" "lud34830" "lud44704" "lud44705" "lud44706"
[25] "lud44710" "lud44711" "lud44713" "lud44714" "lud44715" "lud44716"
[31] "lud44719" "lud44720" "lud44722" "lud44723" "lud44726" "lud44727"
[37] "lud44729" "lud44730" "lud44731" "lud44732" "lud44733" "lud44734"
[43] "lud44737" "lud44738" "lud44739" "lud44740" "lud44741" "lud44742"
[49] "lud44743" "lud44744" "lud44745" "lud44746" "lud44747" "lud44748"
[55] "lud44749" "lud44753" "lud44754" "lud44761" "lud44775" "mel34890"
[61] "mel43110" "mel43276" "mel44346" "mel44347" "mel44471" "mel44651"
[67] "mel44660" "mel44666" "mel44676" "mel44677" "mel44678" "mel44679"
[73] "mel44680" "mel44681" "mel44683" "mel44684" "mel44685" "mel44686"
[79] "mel44687" "mel44688" "mel44689" "mel44692" "mel44693" "mel44694"
[85] "mel44695" "mel44697" "mel44699" "mel44752" "mel44760" "mel44763"
[91] "mel44764" "mel44765" "mel44766" "mel44769" "mel44770" "mel44772"
[97] "mel44773" "mel44774" "mel44776" "mel44777" "mel44778" "mel44779"
[103] "mel44780" "mel44782" "mel44787" "mel44788" "mel44789" "mel44790"
[109] "mel44791" "mel44792" "mel44793" "mel44794" "mel44795" "mel44798"
[115] "mel44799" "mel44801" "mel44802" "mel44803" "mel44804" "mel44805"
[121] "mel44806" "mel44808" "mel44809" "mel44810" "mel44837" "mel44840"
[127] "mel44843" "mel44845" "mel44846" "mel44847" "mel44848" "mel44853"
[133] "mel44854" "mel45175" "mel45200" "mel45232" "mel45709" "mel45926"
pop(gen)<-gen@ind.names
#assign populations (a StaMPP requirement)
@pop<-as.factor(gen@ind.names)
gen#generate pairwise divergence matrix
<- stamppNeisD(gen, pop = FALSE)
sample.div #export for splitstree
stamppPhylip(distance.mat=sample.div, file="~/Desktop/grosbeak.rad/grosbeak.100.splits.txt")
::include_graphics(c("/Users/devonderaad/Desktop/grosbeak.rad/100.splits.png")) knitr
Missing data does not appear to be driving sample placement, so I’m going to implement a 90% per-SNP completeness cutoff, which seems like a good trade-off.
#set 90% completeness per SNP threshold
<-missing_by_snp(vcfR.trim, cutoff=.90) filt
cutoff is specified, filtered vcfR object will be returned
82.3% of SNPs fell below a completeness cutoff of 0.9 and were removed from the VCF
#print info on the resulting dataset
filt
***** Object of Class vcfR *****
138 samples
1027 CHROMs
51,595 variants
Object size: 532.6 Mb
2.987 percent missing data
***** ***** *****
#get per sample missing data info
<-missing_by_sample(filt) bysamp
No popmap provided
#print in descending order by missing data
$unfiltered.stats[order(bysamp$unfiltered.stats[,2]),] bysamp
samples prop.missing mean.depth
P_melanocephalus_43110 P_melanocephalus_43110 0.002597151 204.393463
P_melanocephalus_44792 P_melanocephalus_44792 0.003391802 226.417075
P_melanocephalus_44840 P_melanocephalus_44840 0.003469328 87.621382
P_melanocephalus_44808 P_melanocephalus_44808 0.003740673 68.118750
P_melanocephalus_43276 P_melanocephalus_43276 0.004108925 70.563474
P_melanocephalus_44695 P_melanocephalus_44695 0.004108925 290.195571
P_melanocephalus_44853 P_melanocephalus_44853 0.004457796 114.973776
P_melanocephalus_44680 P_melanocephalus_44680 0.004477178 94.089985
P_melanocephalus_44809 P_melanocephalus_44809 0.004554705 129.607009
P_melanocephalus_44778 P_melanocephalus_44778 0.004593468 47.364695
P_melanocephalus_44685 P_melanocephalus_44685 0.004632232 172.368370
P_melanocephalus_44791 P_melanocephalus_44791 0.004729140 95.780822
P_melanocephalus_44788 P_melanocephalus_44788 0.004806667 107.910978
P_melanocephalus_44795 P_melanocephalus_44795 0.004826049 49.157072
P_melanocephalus_44764 P_melanocephalus_44764 0.004922958 98.850704
P_melanocephalus_44804 P_melanocephalus_44804 0.005058630 167.689562
P_melanocephalus_44782 P_melanocephalus_44782 0.005271829 374.030415
P_melanocephalus_44805 P_melanocephalus_44805 0.005310592 166.366458
P_melanocephalus_44847 P_melanocephalus_44847 0.005329974 92.838504
P_hybrid_45174 P_hybrid_45174 0.005349356 83.837896
P_melanocephalus_44787 P_melanocephalus_44787 0.005504409 172.190271
P_melanocephalus_44773 P_melanocephalus_44773 0.005543173 94.823579
P_hybrid_44771 P_hybrid_44771 0.005562555 103.525279
P_melanocephalus_44803 P_melanocephalus_44803 0.005620700 82.220836
P_melanocephalus_44810 P_melanocephalus_44810 0.005698227 77.205025
P_melanocephalus_44770 P_melanocephalus_44770 0.005736990 151.095051
P_melanocephalus_44845 P_melanocephalus_44845 0.005795135 517.445961
P_melanocephalus_44677 P_melanocephalus_44677 0.005833899 154.985905
P_melanocephalus_44766 P_melanocephalus_44766 0.005950189 179.453732
P_melanocephalus_44794 P_melanocephalus_44794 0.006027716 61.180115
P_melanocephalus_44765 P_melanocephalus_44765 0.006144006 50.390304
P_melanocephalus_44793 P_melanocephalus_44793 0.006279678 67.063740
P_melanocephalus_44683 P_melanocephalus_44683 0.006492877 53.292860
P_melanocephalus_44684 P_melanocephalus_44684 0.006531641 66.356003
P_melanocephalus_44776 P_melanocephalus_44776 0.006725458 109.958203
P_hybrid_44696 P_hybrid_44696 0.006861130 175.130481
P_melanocephalus_45200 P_melanocephalus_45200 0.006938657 110.175674
P_melanocephalus_44774 P_melanocephalus_44774 0.006977420 73.555831
P_melanocephalus_44660 P_melanocephalus_44660 0.007054947 158.128399
P_melanocephalus_44780 P_melanocephalus_44780 0.007054947 55.545510
P_hybrid_45171 P_hybrid_45171 0.007190619 387.530572
P_melanocephalus_44681 P_melanocephalus_44681 0.007229383 647.700656
P_melanocephalus_44692 P_melanocephalus_44692 0.007403818 56.540371
P_melanocephalus_44763 P_melanocephalus_44763 0.007520109 567.633038
P_melanocephalus_44679 P_melanocephalus_44679 0.007539490 75.670019
P_melanocephalus_44799 P_melanocephalus_44799 0.007558872 178.310048
P_hybrid_45173 P_hybrid_45173 0.007810834 47.576262
P_melanocephalus_45926 P_melanocephalus_45926 0.007849598 86.966029
P_melanocephalus_44752 P_melanocephalus_44752 0.007868980 524.343941
P_melanocephalus_44843 P_melanocephalus_44843 0.008004652 104.359638
P_melanocephalus_44798 P_melanocephalus_44798 0.008043415 56.271356
P_melanocephalus_44769 P_melanocephalus_44769 0.008198469 590.913038
P_melanocephalus_44848 P_melanocephalus_44848 0.008275996 348.373886
P_melanocephalus_44678 P_melanocephalus_44678 0.008295377 43.497586
P_melanocephalus_44789 P_melanocephalus_44789 0.008334141 42.518186
P_melanocephalus_44846 P_melanocephalus_44846 0.008334141 52.243819
P_melanocephalus_44837 P_melanocephalus_44837 0.008353523 147.221425
P_melanocephalus_44666 P_melanocephalus_44666 0.008372904 162.687528
P_melanocephalus_44854 P_melanocephalus_44854 0.008489195 123.896808
P_ludovicianus_34830 P_ludovicianus_34830 0.008915593 105.055402
P_melanocephalus_45709 P_melanocephalus_45709 0.009051265 33.473068
P_melanocephalus_44790 P_melanocephalus_44790 0.009477663 56.371620
P_melanocephalus_44688 P_melanocephalus_44688 0.009574571 456.008884
P_ludovicianus_44741 P_ludovicianus_44741 0.009632716 243.295824
P_melanocephalus_44777 P_melanocephalus_44777 0.009865297 512.092472
P_hybrid_44708 P_hybrid_44708 0.009962206 125.645661
P_ludovicianus_44705 P_ludovicianus_44705 0.010039733 204.137224
P_ludovicianus_44746 P_ludovicianus_44746 0.010117259 69.433086
P_melanocephalus_44689 P_melanocephalus_44689 0.010407985 40.462905
P_melanocephalus_44772 P_melanocephalus_44772 0.010873147 42.071011
P_ludovicianus_44710 P_ludovicianus_44710 0.011435217 71.856936
P_ludovicianus_44727 P_ludovicianus_44727 0.011629034 125.305599
P_ludovicianus_44753 P_ludovicianus_44753 0.011880996 85.525774
P_ludovicianus_44775 P_ludovicianus_44775 0.011900378 123.049842
P_ludovicianus_44738 P_ludovicianus_44738 0.012249249 75.030689
P_ludovicianus_44732 P_ludovicianus_44732 0.012404303 95.953057
P_ludovicianus_44726 P_ludovicianus_44726 0.012675647 360.327339
P_hybrid_44709 P_hybrid_44709 0.012753174 80.046881
P_ludovicianus_44745 P_ludovicianus_44745 0.012753174 433.571412
P_ludovicianus_44734 P_ludovicianus_44734 0.012869464 463.937288
P_ludovicianus_44743 P_ludovicianus_44743 0.012888846 134.471844
P_hybrid_44707 P_hybrid_44707 0.013140808 402.861755
P_ludovicianus_21721 P_ludovicianus_21721 0.013276480 96.160322
P_ludovicianus_44722 P_ludovicianus_44722 0.013373389 57.636912
P_ludovicianus_34782 P_ludovicianus_34782 0.013509061 106.722641
P_ludovicianus_11998 P_ludovicianus_11998 0.013567206 48.439768
P_ludovicianus_44706 P_ludovicianus_44706 0.013605970 74.769143
P_melanocephalus_44760 P_melanocephalus_44760 0.014885163 28.435280
P_ludovicianus_44740 P_ludovicianus_44740 0.014904545 184.978023
P_ludovicianus_44754 P_ludovicianus_44754 0.015137126 267.301236
P_ludovicianus_44761 P_ludovicianus_44761 0.015214653 51.548849
P_ludovicianus_44719 P_ludovicianus_44719 0.015311561 91.095935
P_ludovicianus_44715 P_ludovicianus_44715 0.015330943 91.889300
P_ludovicianus_44730 P_ludovicianus_44730 0.016009303 72.246706
P_melanocephalus_44694 P_melanocephalus_44694 0.016416319 28.608221
P_melanocephalus_44651 P_melanocephalus_44651 0.016435701 25.720417
P_ludovicianus_44714 P_ludovicianus_44714 0.016687664 141.513186
P_ludovicianus_44747 P_ludovicianus_44747 0.017559841 812.964253
P_melanocephalus_44699 P_melanocephalus_44699 0.020040702 37.005498
P_ludovicianus_44739 P_ludovicianus_44739 0.023238686 47.610941
P_melanocephalus_44347 P_melanocephalus_44347 0.023858901 24.680923
P_ludovicianus_44729 P_ludovicianus_44729 0.025235003 39.517070
P_ludovicianus_44713 P_ludovicianus_44713 0.028084117 29.779304
P_hybrid_44781 P_hybrid_44781 0.029634655 27.190029
P_melanocephalus_44801 P_melanocephalus_44801 0.031495300 23.168841
P_hybrid_44703 P_hybrid_44703 0.031650354 24.484168
P_melanocephalus_34890 P_melanocephalus_34890 0.032890784 22.654816
P_ludovicianus_33988 P_ludovicianus_33988 0.033045838 29.079234
P_ludovicianus_44749 P_ludovicianus_44749 0.033452854 44.434839
P_melanocephalus_44687 P_melanocephalus_44687 0.033666053 22.789141
P_ludovicianus_44711 P_ludovicianus_44711 0.035953096 23.613430
P_ludovicianus_44731 P_ludovicianus_44731 0.037019091 25.810104
P_ludovicianus_44744 P_ludovicianus_44744 0.037348580 26.594407
P_ludovicianus_44716 P_ludovicianus_44716 0.045391995 19.060098
P_melanocephalus_44686 P_melanocephalus_44686 0.052912104 20.753689
P_melanocephalus_44693 P_melanocephalus_44693 0.053338502 16.082919
P_melanocephalus_44346 P_melanocephalus_44346 0.055315438 21.470815
P_melanocephalus_44697 P_melanocephalus_44697 0.057350518 16.636874
P_melanocephalus_44676 P_melanocephalus_44676 0.063455761 16.074522
P_ludovicianus_34776 P_ludovicianus_34776 0.070801434 19.425493
P_melanocephalus_45232 P_melanocephalus_45232 0.071227832 18.169887
P_ludovicianus_26595 P_ludovicianus_26595 0.076596569 17.785656
P_ludovicianus_44737 P_ludovicianus_44737 0.081034984 16.831632
P_melanocephalus_45175 P_melanocephalus_45175 0.093284233 15.251079
P_ludovicianus_44723 P_ludovicianus_44723 0.106948348 13.635827
P_ludovicianus_44720 P_ludovicianus_44720 0.117046225 16.037097
P_ludovicianus_44733 P_ludovicianus_44733 0.121193914 13.505867
P_melanocephalus_44471 P_melanocephalus_44471 0.126872759 13.385802
P_melanocephalus_44779 P_melanocephalus_44779 0.131660045 13.197982
P_ludovicianus_44748 P_ludovicianus_44748 0.137222599 13.018848
P_melanocephalus_44806 P_melanocephalus_44806 0.141312143 12.202623
P_ludovicianus_44704 P_ludovicianus_44704 0.147553057 11.792029
P_ludovicianus_25286 P_ludovicianus_25286 0.148909778 11.464588
P_ludovicianus_34779 P_ludovicianus_34779 0.149665665 13.078271
P_hybrid_44762 P_hybrid_44762 0.157844752 10.886056
P_hybrid_44712 P_hybrid_44712 0.175598411 13.123052
P_melanocephalus_44802 P_melanocephalus_44802 0.205736990 9.568838
P_ludovicianus_44742 P_ludovicianus_44742 0.214536292 11.198712
#plot depth per snp and per sample
<- extract.gt(filt, element = "DP", as.numeric=TRUE)
dp heatmap.bp(dp, rlabels = FALSE)
#plot genotype quality per snp and per sample
<- extract.gt(filt, element = "GQ", as.numeric=TRUE)
gq heatmap.bp(gq, rlabels = FALSE)
This chunk is not run (eval=FALSE is set in the chunk header) because I ended up deciding that the cutoff is too arbitrary, and reviewers would likely balk at this approach to isolating Z-linked SNPs as untrustworthy. I think this is better left for a future project with a good reference genome and whole genome data. In the meantime, I’ve left this code here as an artifact in case I need to come back to it in the future.
#extract depth matrix
<-extract.gt(filt, element = "DP")
dpmat1:5,1:5] #check
dpmat[<-apply(dpmat, 2, as.numeric) #convert matrix to numeric
dpmat1:5,1:5] #check
dpmat[#read in sample sheet with sample sex info
<-read.csv("~/Desktop/grosbeak.data.csv")
samps<-samps[samps$passed.genomic.filtering == "TRUE",] #retain only samples that passed filtering
samps$sample_id == colnames(dpmat) #check if the dataframe is in the same order as the matrix
samps<-samps[match(colnames(dpmat),samps$sample_id),] #reorder dataframe using the command 'match'
samps$sample_id == colnames(dpmat) #check if this worked
samps#compare sequencing depth between male and female samples
mean(dpmat[,samps$sex=="female"], na.rm = TRUE)
mean(dpmat[,samps$sex=="male"], na.rm = TRUE)
#because male and female sequencing depth is highly uneven, I will need to subsample males to get a set of males with appropriately matched (as close to even as possible) sequencing depth
#apply(dpmat, 2, mean, na.rm=TRUE) #you can use this code to get a list of depth per sample for the next step
#manually (arbitrarily) pick out a subset of male samples that have similar depth to the female samples we sequenced and check that their depth is close to ~68.7x, like the female group shown above
mean(dpmat[,c(4:12,16,50,39,117,83,84,26,137,135,106)], na.rm = TRUE)
#compare average depth between the subset of male and female samples
<-c()
xfor (i in 1:nrow(dpmat)){
<-mean(dpmat[i,samps$sex=="male"], na.rm = TRUE)/mean(dpmat[i,samps$sex=="female"], na.rm = TRUE)
x[i]
}#look at the distribution
hist(x, breaks = 200)
#isolate the peak of the normal distribution that corresponds to the autosomal loci
sort(table(round(x,2)),decreasing = TRUE)
#1.86 is the 'typical' autosomal loci, so we will be looking for SNPs with > 1.8x this ratio (we expect 2x but want to account for noise)
table(x > 3.348)
#isolate X-linked SNPs
<-filt[c(x>3.348),]
xSNPs
xSNPs#convert to genlight
<-vcfR2genlight(xSNPs)
gen#fix sample names to fit in <= 10 characters
@ind.names
gen@ind.names<-gsub("P_hybrid_","hyb", gen@ind.names)
gen@ind.names<-gsub("P_ludovicianus_","lud", gen@ind.names)
gen@ind.names<-gsub("P_melanocephalus_","mel", gen@ind.names)
gen@ind.names
genpop(gen)<-gen@ind.names
#assign populations (a StaMPP requirement)
@pop<-as.factor(gen@ind.names)
gen#generate pairwise divergence matrix
<- stamppNeisD(gen, pop = FALSE)
sample.div #export for splitstree
stamppPhylip(distance.mat=sample.div, file="~/Desktop/grosbeak.xSNPs.splits.txt")
::include_graphics(c("/Users/devonderaad/Desktop/z.splits.png"))
knitr
#isolate autosomal SNPs
<-filt[c(x<3.348)]
autoSNPs
autoSNPs#convert to genlight
<-vcfR2genlight(autoSNPs)
gen#fix sample names to fit in <= 10 characters
@ind.names
gen@ind.names<-gsub("P_hybrid_","h", gen@ind.names)
gen@ind.names<-gsub("P_ludovicianus_","l", gen@ind.names)
gen@ind.names<-gsub("P_melanocephalus_","m", gen@ind.names)
gen@ind.names
genpop(gen)<-gen@ind.names
#assign populations (a StaMPP requirement)
@pop<-as.factor(gen@ind.names)
gen#generate pairwise divergence matrix
<- stamppNeisD(gen, pop = FALSE)
sample.div #export for splitstree
stamppPhylip(distance.mat=sample.div, file="~/Desktop/grosbeak.autoSNPs.splits.txt")
::include_graphics(c("/Users/devonderaad/Desktop/autosomal.splits.png")) knitr
Now filter to get unlinked SNP datasets
#perform linkage filtering to get a reduced vcf with only unlinked SNPs
<-distance_thin(filt, min.distance = 1000) filt.thin
|
| | 0%
|
| | 1%
|
|= | 1%
|
|= | 2%
|
|== | 2%
|
|== | 3%
|
|== | 4%
|
|=== | 4%
|
|=== | 5%
|
|==== | 5%
|
|==== | 6%
|
|===== | 7%
|
|===== | 8%
|
|====== | 8%
|
|====== | 9%
|
|======= | 9%
|
|======= | 10%
|
|======= | 11%
|
|======== | 11%
|
|======== | 12%
|
|========= | 12%
|
|========= | 13%
|
|========= | 14%
|
|========== | 14%
|
|========== | 15%
|
|=========== | 15%
|
|=========== | 16%
|
|============ | 16%
|
|============ | 17%
|
|============ | 18%
|
|============= | 18%
|
|============= | 19%
|
|============== | 19%
|
|============== | 20%
|
|============== | 21%
|
|=============== | 21%
|
|=============== | 22%
|
|================ | 22%
|
|================ | 23%
|
|================ | 24%
|
|================= | 24%
|
|================= | 25%
|
|================== | 25%
|
|================== | 26%
|
|=================== | 26%
|
|=================== | 27%
|
|=================== | 28%
|
|==================== | 28%
|
|==================== | 29%
|
|===================== | 29%
|
|===================== | 30%
|
|===================== | 31%
|
|====================== | 31%
|
|====================== | 32%
|
|======================= | 32%
|
|======================= | 33%
|
|======================== | 34%
|
|======================== | 35%
|
|========================= | 35%
|
|========================= | 36%
|
|========================== | 37%
|
|========================== | 38%
|
|=========================== | 38%
|
|=========================== | 39%
|
|============================ | 39%
|
|============================ | 40%
|
|============================ | 41%
|
|============================= | 41%
|
|============================= | 42%
|
|============================== | 42%
|
|============================== | 43%
|
|============================== | 44%
|
|=============================== | 44%
|
|=============================== | 45%
|
|================================ | 45%
|
|================================ | 46%
|
|================================= | 46%
|
|================================= | 47%
|
|================================= | 48%
|
|================================== | 48%
|
|================================== | 49%
|
|=================================== | 49%
|
|=================================== | 50%
|
|=================================== | 51%
|
|==================================== | 51%
|
|==================================== | 52%
|
|===================================== | 52%
|
|===================================== | 53%
|
|===================================== | 54%
|
|====================================== | 54%
|
|====================================== | 55%
|
|======================================= | 55%
|
|======================================= | 56%
|
|======================================== | 56%
|
|======================================== | 57%
|
|======================================== | 58%
|
|========================================= | 58%
|
|========================================= | 59%
|
|========================================== | 59%
|
|========================================== | 60%
|
|========================================== | 61%
|
|=========================================== | 61%
|
|=========================================== | 62%
|
|============================================ | 62%
|
|============================================ | 63%
|
|============================================= | 64%
|
|============================================= | 65%
|
|============================================== | 65%
|
|============================================== | 66%
|
|=============================================== | 67%
|
|=============================================== | 68%
|
|================================================ | 68%
|
|================================================ | 69%
|
|================================================= | 69%
|
|================================================= | 70%
|
|================================================= | 71%
|
|================================================== | 71%
|
|================================================== | 72%
|
|=================================================== | 72%
|
|=================================================== | 73%
|
|=================================================== | 74%
|
|==================================================== | 74%
|
|==================================================== | 75%
|
|===================================================== | 75%
|
|===================================================== | 76%
|
|====================================================== | 76%
|
|====================================================== | 77%
|
|====================================================== | 78%
|
|======================================================= | 78%
|
|======================================================= | 79%
|
|======================================================== | 79%
|
|======================================================== | 80%
|
|======================================================== | 81%
|
|========================================================= | 81%
|
|========================================================= | 82%
|
|========================================================== | 82%
|
|========================================================== | 83%
|
|========================================================== | 84%
|
|=========================================================== | 84%
|
|=========================================================== | 85%
|
|============================================================ | 85%
|
|============================================================ | 86%
|
|============================================================= | 86%
|
|============================================================= | 87%
|
|============================================================= | 88%
|
|============================================================== | 88%
|
|============================================================== | 89%
|
|=============================================================== | 89%
|
|=============================================================== | 90%
|
|=============================================================== | 91%
|
|================================================================ | 91%
|
|================================================================ | 92%
|
|================================================================= | 92%
|
|================================================================= | 93%
|
|================================================================== | 94%
|
|================================================================== | 95%
|
|=================================================================== | 95%
|
|=================================================================== | 96%
|
|==================================================================== | 96%
|
|==================================================================== | 97%
|
|==================================================================== | 98%
|
|===================================================================== | 98%
|
|===================================================================== | 99%
|
|======================================================================| 99%
|
|======================================================================| 100%
3584 out of 51595 input SNPs were not located within 1000 base-pairs of another SNP and were retained despite filtering
#get info for all SNPs passing filtering
filt
***** Object of Class vcfR *****
138 samples
1027 CHROMs
51,595 variants
Object size: 532.6 Mb
2.987 percent missing data
***** ***** *****
#write to disk
::write.vcf(filt, file = "~/Desktop/grosbeak.rad/grosbeak.filtered.snps.vcf.gz")
vcfR
#get info for the thinned SNP dataset
filt.thin
***** Object of Class vcfR *****
138 samples
1027 CHROMs
3,584 variants
Object size: 40.7 Mb
3.403 percent missing data
***** ***** *****
#write to disk
::write.vcf(filt.thin, file = "~/Desktop/grosbeak.rad/grosbeak.filtered.unlinked.snps.vcf.gz") vcfR