GWAS走起~(5 bcftools)
这几天没写简书,因为送去测序的样品又不合格的,比较烦人,今天刚补了样。

O,忘了.还要召唤油盐酱醋(call various)

这个不用学习新软件samtools就可以了,samtools中的mpielup。该命令用于生成bcf文件,再使用bcftools进行SNP和Indel的分析。Bcftools不是新软件,bcftools是samtool中附带的软件,在samtools的安装文件夹中可以找到。其实关于软件及其选项的怎么用的问题,可以直接在shell中打上命令,然后按椅子啊enter键就可以很清楚的看到这些命令及其参数的用法和说明。
好的,开盘!
[hai@localhost data]$ samtools mpileup
Usage: samtools mpileup [options] in1.bam [in2.bam [...]]
Input options:
-6 assume the quality is in theIllumina-1.3+ encoding
-A count anomalous read pairs
-B disable BAQ computation
-b FILE list of input BAM filenames, one per line[null]
-C INT parameter for adjusting mapQ; 0 todisable [0]
-d INT max per-BAM depth to avoid excessivememory usage [250]
-E recalculate extended BAQ on the flythus ignoring existing BQs
-f FILE faidx indexed reference sequence file[null]
-G FILE exclude read groups listed in FILE [null]
-l FILE list of positions (chr pos) or regions(BED) [null]
-M INT cap mapping quality at INT [60]
-r STR region in which pileup is generated[null]
-R ignore RG tags
-q INT skip alignments with mapQ smaller thanINT [0]
-Q INT skip bases with baseQ/BAQ smaller thanINT [13]
--rf INT required flags: skip reads with mask bitsunset []
--ff INT filter flags: skip reads with mask bitsset []
Output options:
-D output per-sample DP in BCF (require-g/-u)
-g generate BCF output (genotypelikelihoods)
-O output base positions on reads(disabled by -g/-u)
-s output mapping quality (disabled by -g/-u)
-S output per-sample strand biasP-value in BCF (require -g/-u)
-u generate uncompress BCF output
SNP/INDEL genotype likelihoods options (effective with `-g' or `-u'):
-e INT Phred-scaled gap extension seq errorprobability [20]
-F FLOAT minimum fraction of gapped reads forcandidates [0.002]
-h INT coefficient for homopolymer errors [100]
-I do not perform indel calling
-L INT max per-sample depth for INDEL calling[250]
-m INT minimum gapped reads for indelcandidates [1]
-o INT Phred-scaled gap open sequencing errorprobability [40]
-p apply -m and -F per-sample toincrease sensitivity
-P STR comma separated list of platforms for indels[all]
Notes: Assuming diploid individuals.
这就是mpileup的参数及其用法,我怕解释不清楚,大家还是自己看着折磨吧!
再跟大家介绍个非常有用的Linux命令“|”这命令老厉害了,可以直接将上一步产生的内容放入下一个命令的输入中。
samtools mpileup -guSDf ref1.fa out.rmdup.bam | bcftools view -cvNg - >abc.vcf
-g: 输出到bcf格式。
-f :来输入有索引文件的fasta参考序列。
[if !supportLists]· [endif]-S: 在BCF中输出每个样本的偏置p值
-D :BCF中每个样本的DP输出(需要-g/-u)
-u: 生成解压BCF输出(这个可以缩短运行时间)
同样的做法,了解一下bcftools view
Usage: bcftools view [options] [reg]
Input/output options:
-A keep all possible alternate alleles atvariant sites
-b output BCF instead of VCF
-D FILE sequence dictionary for VCF->BCFconversion [null]
-F PL generated by r921 or before (whichgenerate old ordering)
-G suppress all individual genotypeinformation
-l FILE list of sites (chr pos) or regions (BED) tooutput [all sites]
-L calculate LD for adjacent sites
-N skip sites where REF is not A/C/G/T
-Q output the QCALL likelihood format
-s FILE list of samples to use [all samples]
-S input is VCF
-u uncompressed BCF output (force -b)
Consensus/variant calling options:
-c SNP calling (force -e)
-d FLOAT skip loci where less than FLOAT fraction ofsamples covered [0]
-e likelihood based analyses
-g call genotypes at variant sites (force-c)
-i FLOAT indel-to-substitution ratio [-1]
-I skip indels
-m FLOAT alternative model for multiallelic andrare-variant calling, include if P(chi^2)>=FLOAT
-p FLOAT variant if P(ref|D)
-P STR type of prior: full, cond2, flat [full]
-t FLOAT scaled substitution mutation rate [0.001]
-T STR constrained calling; STR can be: pair,trioauto, trioxd and trioxs (see manual) [null]
-v output potential variant sites only(force -c)
Contrast calling and association test options:
-1 INT number of group-1 samples [0]
-C FLOAT posterior constrast for LRT
-U INT number of permutations for associationtesting (effective with -1) [0]
-X FLOAT only perform permutations forP(chi^2)
-c:召唤snp;
-v:输出变异位点;
[if !supportLists]· [endif]-N:跳过REF不是A/C/G/T的站点
[if !supportLists]· [endif]-g:召唤出不同位点的基因型(可以看成是基因分型)
最后,咱们得到一个vcf文件。这个就是我们要的菜了,好不好吃不知道,好不好看不知道,反正菜出来了,下面改一下它的吃相,改一下他的看相。
vcftools --vcf abc.vcf --plink --out SNP
先将vcf 格式转换为plink需要的格式,记住这个操作哦!操作完毕会有map、ped、log三个文件格式产生,其实后期用map、ped这两个就可以了。

下面就是plink亮相的时候了。

神龙已被召唤,那就来一下真正的GWAS 吧!!哇卡卡!