Calculating the genetic differentiation index (Genetic differentiation)
Original tutorial:
Analysis of genome data
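Main topic: using a VCF file to calculate the genetic differentiation index between populations
Method: the genetic_diff() function from vcfR (an R package)
Data: the simplified version of the public 1000 Genomes dataset mentioned at https://hail.is/docs/0.2/getting_started.html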
Translation of the original tutorial
A fundamental question in most population studies is whether populations are diverse and whether this diversity is shared among the populations. To address the question of within-population diversity, geneticists typically report heterozygosity: the probability that two alleles randomly chosen from a population will be different. To address differentiation, population geneticists typically use Fst or one of its analogues. Population differentiation measured by Fst was originally proposed by Sewall Wright.
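To make the two quantities above concrete, here is a minimal sketch in base R for a single biallelic locus in two populations. The allele frequencies are made up for illustration, the helper name expected_het is hypothetical, and the calculation assumes equal sample sizes with no weighting or bias correction, so it will not necessarily reproduce the numbers that vcfR's genetic_diff() reports. It uses Nei's Gst, one of the analogues of Fst, defined as (Ht - Hs) / Ht.

```r
# Expected heterozygosity: probability that two randomly drawn alleles differ,
# i.e. 1 minus the sum of squared allele frequencies.
expected_het <- function(freqs) {
  stopifnot(abs(sum(freqs) - 1) < 1e-8)  # frequencies must sum to 1
  1 - sum(freqs^2)
}

# Made-up allele frequencies for two populations (illustration only)
p_pop1 <- c(A = 0.9, a = 0.1)
p_pop2 <- c(A = 0.1, a = 0.9)

# Within-population heterozygosity, averaged over populations (Hs)
hs_mean <- mean(c(expected_het(p_pop1), expected_het(p_pop2)))

# Total heterozygosity (Ht) from the pooled allele frequencies,
# assuming both populations have the same sample size
p_total <- (p_pop1 + p_pop2) / 2
ht <- expected_het(p_total)

# Nei's Gst: the share of total diversity that lies between populations
gst <- (ht - hs_mean) / ht

print(round(c(Hs = hs_mean, Ht = ht, Gst = gst), 3))
```

When the two populations have similar allele frequencies, Hs approaches Ht and Gst approaches 0; when they are fixed for different alleles, Hs is 0 and Gst is 1.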
(To be continued)
Code
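# Load vcfR and read the VCF file
library(vcfR)
vcfdata<-read.vcfR("GWAS_practice/1kg.vcf")
# Number of samples (the first column of @gt is FORMAT, so drop it)
length(colnames(vcfdata@gt)[-1])
# Read the tab-separated sample annotations
pop<-read.table("GWAS_practice/data/1kg_annotations.txt",header=T,sep="\t")
colnames(pop)
pop$Population
pop$SuperPopulation
dim(pop)
head(pop)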
# Pick the annotation row for each VCF sample, in the order of the genotype columns
# (assumes every VCF sample appears exactly once in the annotation table)
samples <- colnames(vcfdata@gt)[-1]
pops <- pop[match(samples, pop$Sample), ]
dim(pops)
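# Per-variant differentiation statistics using Nei's method
# genetic_diff() expects the populations as a factor; read.table() returns
# character columns by default in R >= 4.0, so convert explicitly
myDiff<-genetic_diff(vcfdata,
as.factor(pops$SuperPopulation),
method="nei")
dim(myDiff)
colnames(myDiff)
knitr::kable(head(myDiff[,1:13]))
knitr::kable(head(myDiff[,14:17]))
# Column means; the indices assume the 17-column layout obtained here
knitr::kable(round(colMeans(myDiff[,c(3:8,14,17)], na.rm = TRUE), digits = 3))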
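library(reshape2)
library(ggplot2)
# Reshape the selected columns to long format; for a data frame, melt() names
# the key column "variable". The value name 'Depth' is carried over from the
# original tutorial; the values here are differentiation statistics, not depths
dpf <- melt(myDiff[,c(3:7,17)],
value.name = 'Depth', na.rm=TRUE)
# One violin per statistic
ggplot(dpf, aes(x=variable, y=Depth)) +
geom_violin(fill="#2ca25f", adjust = 1.2)+
labs(x="",y="")+
theme_bw()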
[Figure: violin plot produced by the code above]
From this plot, there does not seem to be any obvious difference between the populations.
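Since a violin plot pools all variants together, one possible follow-up is to rank individual variants by their differentiation value instead. The sketch below uses a small mock data frame with made-up numbers. Its column names (CHROM, POS, Hs_AFR, Hs_EUR, Gprimest) are an assumption about what the genetic_diff() output looks like, so check colnames(myDiff) before adapting it.

```r
# Mock stand-in for a genetic_diff() result; all values are made up
mock_diff <- data.frame(
  CHROM    = "1",
  POS      = c(1000, 2000, 3000, 4000, 5000, 6000),
  Hs_AFR   = c(0.40, 0.10, 0.48, 0.00, 0.32, 0.18),
  Hs_EUR   = c(0.42, 0.12, 0.05, 0.00, 0.30, 0.50),
  Gprimest = c(0.01, 0.02, 0.65, NaN, 0.03, 0.40)
)

# Drop variants where the statistic is undefined (e.g. monomorphic sites)
defined <- mock_diff[!is.na(mock_diff$Gprimest), ]

# Sort from most to least differentiated and keep the top two variants
ranked <- defined[order(defined$Gprimest, decreasing = TRUE), ]
top_sites <- head(ranked, 2)

print(top_sites[, c("CHROM", "POS", "Gprimest")])
```

With real data, the same pattern could be applied to myDiff to see whether a handful of strongly differentiated variants are hidden behind a distribution that is concentrated near zero.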