群体遗传学GWAS重测序分析

GWAS走起~(5 bcftools)

2019-12-02  本文已影响0人  roguish读书人

这几天没写简书,因为送去测序的样品又不合格的,比较烦人,今天刚补了样。

O,忘了.还要召唤油盐酱醋(call various)

这个不用学习新软件samtools就可以了,samtools中的mpielup。该命令用于生成bcf文件,再使用bcftools进行SNPIndel的分析。Bcftools不是新软件,bcftoolssamtool中附带的软件,在samtools的安装文件夹中可以找到。其实关于软件及其选项的怎么用的问题,可以直接在shell中打上命令,然后按椅子啊enter键就可以很清楚的看到这些命令及其参数的用法和说明。

好的,开盘!

[hai@localhost data]$ samtools mpileup

 

Usage: samtools mpileup [options] in1.bam [in2.bam [...]]

 

Input options:

 

       -6           assume the quality is in theIllumina-1.3+ encoding

       -A           count anomalous read pairs

       -B           disable BAQ computation

       -b FILE      list of input BAM filenames, one per line[null]

       -C INT       parameter for adjusting mapQ; 0 todisable [0]

       -d INT       max per-BAM depth to avoid excessivememory usage [250]

       -E           recalculate extended BAQ on the flythus ignoring existing BQs

       -f FILE      faidx indexed reference sequence file[null]

       -G FILE      exclude read groups listed in FILE [null]

       -l FILE      list of positions (chr pos) or regions(BED) [null]

       -M INT       cap mapping quality at INT [60]

       -r STR       region in which pileup is generated[null]

       -R           ignore RG tags

       -q INT       skip alignments with mapQ smaller thanINT [0]

       -Q INT       skip bases with baseQ/BAQ smaller thanINT [13]

       --rf INT     required flags: skip reads with mask bitsunset []

       --ff INT     filter flags: skip reads with mask bitsset []

 

Output options:

 

       -D           output per-sample DP in BCF (require-g/-u)

       -g           generate BCF output (genotypelikelihoods)

       -O           output base positions on reads(disabled by -g/-u)

       -s           output mapping quality (disabled by -g/-u)

       -S           output per-sample strand biasP-value in BCF (require -g/-u)

       -u           generate uncompress BCF output

 

SNP/INDEL genotype likelihoods options (effective with `-g' or `-u'):

 

       -e INT       Phred-scaled gap extension seq errorprobability [20]

       -F FLOAT     minimum fraction of gapped reads forcandidates [0.002]

       -h INT       coefficient for homopolymer errors [100]

       -I           do not perform indel calling

       -L INT       max per-sample depth for INDEL calling[250]

       -m INT       minimum gapped reads for indelcandidates [1]

       -o INT       Phred-scaled gap open sequencing errorprobability [40]

       -p           apply -m and -F per-sample toincrease sensitivity

       -P STR       comma separated list of platforms for indels[all]

 

Notes: Assuming diploid individuals.

这就是mpileup的参数及其用法,我怕解释不清楚,大家还是自己看着折磨吧!

再跟大家介绍个非常有用的Linux命令“|”这命令老厉害了,可以直接将上一步产生的内容放入下一个命令的输入中。

samtools mpileup -guSDf ref1.fa out.rmdup.bam | bcftools view -cvNg - >abc.vcf

-g 输出到bcf格式。

-f :来输入有索引文件的fasta参考序列。

[if !supportLists]·        [endif]-S: 在BCF中输出每个样本的偏置p值

-D :BCF中每个样本的DP输出(需要-g/-u)

-u: 生成解压BCF输出(这个可以缩短运行时间)

同样的做法,了解一下bcftools view

Usage: bcftools view [options] [reg]

Input/output options:

       -A        keep all possible alternate alleles atvariant sites

       -b        output BCF instead of VCF

       -D FILE   sequence dictionary for VCF->BCFconversion [null]

       -F        PL generated by r921 or before (whichgenerate old ordering)

       -G        suppress all individual genotypeinformation

       -l FILE   list of sites (chr pos) or regions (BED) tooutput [all sites]

       -L        calculate LD for adjacent sites

       -N        skip sites where REF is not A/C/G/T

       -Q        output the QCALL likelihood format

       -s FILE   list of samples to use [all samples]

       -S        input is VCF

       -u        uncompressed BCF output (force -b)

Consensus/variant calling options:

       -c        SNP calling (force -e)

       -d FLOAT  skip loci where less than FLOAT fraction ofsamples covered [0]

       -e        likelihood based analyses

       -g        call genotypes at variant sites (force-c)

       -i FLOAT  indel-to-substitution ratio [-1]

       -I        skip indels

       -m FLOAT  alternative model for multiallelic andrare-variant calling, include if P(chi^2)>=FLOAT

       -p FLOAT  variant if P(ref|D)

       -P STR    type of prior: full, cond2, flat [full]

       -t FLOAT scaled substitution mutation rate [0.001]

       -T STR    constrained calling; STR can be: pair,trioauto, trioxd and trioxs (see manual) [null]

       -v        output potential variant sites only(force -c)

Contrast calling and association test options:

       -1 INT    number of group-1 samples [0]

       -C FLOAT  posterior constrast for LRT

       -U INT    number of permutations for associationtesting (effective with -1) [0]

       -X FLOAT  only perform permutations forP(chi^2)

-c召唤snp

-v:输出变异位点;

[if !supportLists]·        [endif]-N跳过REF不是A/C/G/T的站点

[if !supportLists]·        [endif]-g召唤出不同位点的基因型(可以看成是基因分型)

 

最后,咱们得到一个vcf文件。这个就是我们要的菜了,好不好吃不知道,好不好看不知道,反正菜出来了,下面改一下它的吃相,改一下他的看相。

vcftools --vcf abc.vcf --plink --out SNP

先将vcf 格式转换为plink需要的格式,记住这个操作哦!操作完毕会有mappedlog三个文件格式产生,其实后期用mapped这两个就可以了。

下面就是plink亮相的时候了。

神龙已被召唤,那就来一下真正的GWAS  吧!!哇卡卡!

上一篇 下一篇

猜你喜欢

热点阅读