XP-CLR选择信号

2021-01-11 本文已影响0人斩毛毛

检测基因组选择信号的方法有很多种，其中 XP-CLR 方法是常用的一种。XP-CLR 是陈华老师、Nick Patterson 和 David Reich 在 2010 年发表的方法，全称叫 the cross-population composite likelihood ratio test（跨群体复合似然比检验），是一种是基于选择扫荡（selective sweeep）的似然方法。
选择扫荡可以增加群体之间的遗传分化，导致等位基因频率偏离中性条件下的预期值。XP-CLR 利用了两个群体之间的多基因座等位基因频率差异（multilocus allele frequency differentiation）建立模型，使用布朗运动来模拟中性下的遗传漂移，并使用确定性模型来近似地对附近的单核苷酸多态性（SNPs）进行选择性扫描。

python版本XP-CLR

python版本XP-CLR，具体可看 https://github.com/hardingnj/xpclr

1 安装

lone this git repository into your working directory and cd.

python setup.py install
Also available via conda via the bioconda channel:

conda create -n xpclr   -c bioconda xpclr

2 输入文件

该软件可接受输入格式为:
(1) hdf5，通过scikit-allel 将vcf转换成该格式，
(2) vcf格式
(3) 或者像之前xpclr中的3种类型格式
具体可以看xpclr-master/fixture/下的示例文件

两个群体的gen 基因型文件，其组成如下,缺失可以用9代替

1 1 0 1 9 9
0 1 1 0 0 1
1 1 0 0 1 0

每2列为一个样本的基因型，以此类推。

snp, 位置信息，其格式如下

rs1081440   9   0.000109    36587 C   T
rs9408752   9   0.000938    91857  A   G
rs2810979   9   0.001323    152695  G   A

每一列以此为：SNP编号(自行定义即可)，染色体，遗传距离(可简单的物理距离/100000000)，SNP位置，Ref, Alt

3 运行

基本参数说明

xpclr  -h
usage: xpclr [-h] --out OUT [--format FORMAT] [--input INPUT]
             [--gdistkey GDISTKEY] [--samplesA SAMPLESA] [--samplesB SAMPLESB]
             [--rrate RRATE] [--map MAP] [--popA POPA] [--popB POPB] --chr
             CHROM [--ld LDCUTOFF] [--phased] [--verbose VERBOSE]
             [--maxsnps MAXSNPS] [--minsnps MINSNPS] [--size SIZE]
             [--start START] [--stop STOP] [--step STEP]

--out: 输出文件
--format： 输入格式，vcf，hdf5，zarr，txt（对应2种基因型，和一个snp位点文件）
--input：输入vcf，或者hdf5， 选择txt时，不选择，
--samplesA： 样本A名称文件； txt时，不选择
--samplesB：样本B名称文件； txt时，不选择
--map: 基因位点文件
--popA： 样本A基因型文件，txt时使用
--popB： 样本B基因型文件，txt时使用
--chr： 染色体，和输入文件一致即可，每次分析一个
--ld： LD值，进行权重分析
--maxsnps： 一个窗口最大的SNP
--minsnps： 一个窗口最小的SNP
--size： 窗口大小
--step： 步长

其中，群体A为reference，群体B为目标群体。

运行示例文件

## 输入VCF文件
xpclr --out test -Sa samplesA.txt -Sb samplesB.txt \
  -I small.vcf.gz -C 3L --ld 0.95 --phased --maxsnps 600 \
  --size 200000 --step 20000
### 输入txt文件
xpclr --format txt --out test --map mapfile.snp \
  --popA genotype1.geno --popB genotype2.geno \
  --chr 1 --ld 0.95 --phased --maxsnps 600 \
  --size 200000 --step 20000

结果
结果文件中倒数两列分别为xpclr标准化值，以及xpclr的值

原版XP-CLR

原版XP-CLR多年没有更新

1 安装

软件可以从此处下载https://reich.hms.harvard.edu/software

wget https://reich.hms.harvard.edu/sites/reich.hms.harvard.edu/files/inline-files/XPCLR.tar
tar XPCLR.tar

在bin目录下有执行程序，以及3个示例文件

如果 XPCLR 无法在你的系统中运行，则需要自己用 src 的源码编译：

cd src
make
make install

2 参数说明

./XPCLR -h
Usage:
 XPCLR -xpclr hapmapInput1 hapmapInput2 mapInput outFile \
  -w1 gWin(Morgan) snpWin gridSize(bp) chrN -p corrLevel

# -xpclr: 后面接是两个群体的 .geno 文件（genofile1 和 genofile2）、 .snp 文件（mapfile）、输出文件（outputFile）
# -w1: 后面接的参数依次为：gWin 是 Morgan 为单位的window size (一般可以设为 0.005)；snpWin 代表一个 window 中最大的 SNP 数量；gridSize 是 bp 为单位的两个 grid points 的间距；chrN 是染色体数
# p: -p1 代表 phased 的数据，-p0 代表未 phased; 如果存在2个SNP，A/G, G/C，不能明确A，G或者A，C为同一染色体，则是unphased
#corrLevel：加权复合似然比检验中的 criterion，一般可设为0.95

3 运行示例数据

/data/pub/shehb/soft/XPCLR/bin/XPCLR -xpclr CEU.9 YRI.9 9.xpclr.b36.snp new -w1 0.005 200 1000  9 -p0   0.95

结果文件没一列依次为：chr, grid, ofSNPs_in_window, physical_pos, genetic_pos, XPCLR_score max_s.

4 其它说明

如果在运行python版本的过程当中，出现如下错误

No permission to write in the specified directory: {0}".format(outdir

则找到对应xpclr文件，加入

fn = args.out
    fn = os.path.abspath(args.out)
    outdir = os.path.dirname(fn)

snp位置信息文件的制作
过滤后的vcf文件

zcat in.vcf.gz | awk '($1=="MC01"){print $1":"$2"\tMC01\t"$2/100000000"\t"$2"\t"$4"\t"$5}' |grep -v '#' >MC01.map

两个genotype文件的制作
首先准备群体名称文件，相同的两列

head genoe1.txt
MB  MB
MF  MF
S001    S001
S002    S002
S003    S003
S004    S004
S005    S005
S006    S006

然后利用plink进行调去相应序列

 plink --vcf in.vcf.gz --keep genoe1.txt --chr MC01 \
  --out MC01.out --recode 01 transpose \
  -output-missing-genotype 9 --allow-extra-chr \
  --set-missing-var-ids @:# --keep-allele-order

##  -output-missing-genotype； 不符合就定义为9

得到一个.tped文件，然后调取对应基因型即可

cut -d " " -f 5- MC01.out.tped >popA_MC01

XP-CLR选择信号

python版本XP-CLR

1 安装

2 输入文件

3 运行

原版XP-CLR

1 安装

2 参数说明

3 运行示例数据

4 其它说明

参考

猜你喜欢

热点阅读