[Somatic Mutations, Part 1] Strelka

2019-01-10  oddxix

Follow the WeChat official account: oddxix

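Somatic mutations


A somatic mutation is a variant acquired after birth in the cells of a particular tissue or organ. It is not inherited by the patient's offspring, but it is passed on to daughter cells through cell division. Somatic mutations play a key role in tumor initiation and progression, and they are central to designing targeted cancer therapies. NGS makes somatic variant detection more comprehensive and cheaper, and it has clear advantages for detecting many types of somatic variants, but challenges remain: sample degradation, insufficient coverage, genetic heterogeneity, and tissue contamination (impurity). To address these challenges and reduce error rates, researchers have developed a range of algorithms and statistical models for detecting somatic mutations.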


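Strelka

Strelka calls germline and somatic small variants from aligned sequencing data. It is optimized for germline variant analysis in small cohorts and for somatic variant analysis in tumor/normal sample pairs.

Installation

After downloading, unpack the archive and Strelka is ready to use; it is best to add its bin directory to your PATH as well. Note that the commands below fetch v2.8.2, while the help output shown later was produced by v2.9.6, so substitute the release you actually want.
Download commands: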

mkdir strelka &&  cd strelka
wget https://github.com/Illumina/strelka/releases/download/v2.8.2/strelka-2.8.2.centos5_x86_64.tar.bz2
tar xvfj strelka-2.8.2.centos5_x86_64.tar.bz2
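
As a minimal sketch of the environment-variable step mentioned above: the lines below assume the archive was unpacked in the current directory and that it extracts to a folder named after the tarball. The variable name STRELKA_INSTALL_PATH is the one used in the configuration examples later in this post.

```shell
# Assumption: the tarball was unpacked in the current directory and
# created a folder with the same name as the archive (minus .tar.bz2).
STRELKA_INSTALL_PATH="$PWD/strelka-2.8.2.centos5_x86_64"
export STRELKA_INSTALL_PATH

# Put the Strelka scripts on PATH for this shell session.
export PATH="$STRELKA_INSTALL_PATH/bin:$PATH"
```

To make the change permanent, the same two export lines can be appended to ~/.bashrc with the absolute install path written out.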

Parameters

Usage: configureStrelkaGermlineWorkflow.py [options]

Version: 2.9.6

This script configures Strelka germline small variant calling.
You must specify an alignment file (BAM or CRAM) for at least one sample.

Configuration will produce a workflow run script which
can execute the workflow on a single node or through
sge and resume any interrupted execution.

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  --config=FILE         provide a configuration file to override defaults in
                        global config file (/work/home/biostar/exome_software/
                        strelka-2.9.6.centos6_x86_64/bin/configureStrelkaGerml
                        ineWorkflow.py.ini)
  --allHelp             show all extended/hidden options

  Workflow options:
    --bam=FILE          Sample BAM or CRAM file. May be specified more than
                        once, multiple inputs will be treated as each BAM file
                        representing a different sample. [required] (no
                        default)
    --ploidy=FILE       Provide ploidy file in VCF. The VCF should include one
                        sample column per input sample labeled with the same
                        sample names found in the input BAM/CRAM RG header
                        sections. Ploidy should be provided in records using
                        the FORMAT/CN field, which are interpreted to span the
                        range [POS+1, INFO/END]. Any CN value besides 1 or 0
                        will be treated as 2. File must be tabix indexed. (no
                        default)
    --noCompress=FILE   Provide BED file of regions where gVCF block
                        compression is not allowed. File must be bgzip-
                        compressed/tabix-indexed. (no default)
    --callContinuousVf=CHROM
                        Call variants on CHROM without a ploidy prior
                        assumption, issuing calls with continuous variant
                        frequencies (no default)
    --rna               Set options for RNA-Seq input.
    --referenceFasta=FILE
                        samtools-indexed reference fasta file [required]
    --indelCandidates=FILE
                        Specify a VCF of candidate indel alleles. These
                        alleles are always evaluated but only reported in the
                        output when they are inferred to exist in the sample.
                        The VCF must be tabix indexed. All indel alleles must
                        be left-shifted/normalized, any unnormalized alleles
                        will be ignored. This option may be specified more
                        than once, multiple input VCFs will be merged.
                        (default: None)
    --forcedGT=FILE     Specify a VCF of candidate alleles. These alleles are
                        always evaluated and reported even if they are
                        unlikely to exist in the sample. The VCF must be tabix
                        indexed. All indel alleles must be left-
                        shifted/normalized, any unnormalized allele will
                        trigger a runtime error. This option may be specified
                        more than once, multiple input VCFs will be merged.
                        Note that for any SNVs provided in the VCF, the SNV
                        site will be reported (and for gVCF, excluded from
                        block compression), but the specific SNV alleles are
                        ignored. (default: None)
    --exome, --targeted
                        Set options for exome or other targeted input: note in
                        particular that this flag turns off high-depth filters
    --callRegions=FILE  Optionally provide a bgzip-compressed/tabix-indexed
                        BED file containing the set of regions to call. No VCF
                        output will be provided outside of these regions. The
                        full genome will still be used to estimate statistics
                        from the input (such as expected depth per
                        chromosome). Only one BED file may be specified.
                        (default: call the entire genome)
    --runDir=DIR        Name of directory to be created where all workflow
                        scripts and output will be written. Each analysis
                        requires a separate directory. (default:
                        StrelkaGermlineWorkflow)

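Running Strelka takes two steps: (1) configuration and (2) workflow execution. The configuration step specifies the input data and any options related to the variant calling method itself. The execution step specifies any parameters related to how Strelka is run, such as the total number of cores or SGE nodes over which jobs should be parallelized. The execution step can also be interrupted and restarted without changing the final result of the workflow.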

Somatic configuration example

${STRELKA_INSTALL_PATH}/bin/configureStrelkaSomaticWorkflow.py \
--normalBam HCC1187BL.bam \
--tumorBam HCC1187C.bam \
--referenceFasta hg19.fa \
--indelCandidates ${MANTA_ANALYSIS_PATH}/results/variants/candidateSmallIndels.vcf.gz \
--runDir ${STRELKA_ANALYSIS_PATH}

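After this runs, the ${STRELKA_ANALYSIS_PATH} directory is created. The command above only generates a Python script (runWorkflow.py) inside it; you still have to run that script yourself to get results.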

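Run it from inside that directory:

./runWorkflow.py -m local -j 1 -g 4

Here -m local runs the workflow on the local machine, -j sets the number of parallel jobs, and -g sets the memory available to the workflow in GB. When it finishes, the output is in the results directory.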
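
The two steps can also be chained in a single script. The following is only a sketch under stated assumptions: the BAM and reference names are the placeholders from the example above, the default paths are made up, and DRY_RUN and the run helper are hypothetical conveniences that print each command instead of executing it, so the script can be checked before any real data is touched.

```shell
# Placeholders: adjust to your own installation and analysis directory.
STRELKA_INSTALL_PATH="${STRELKA_INSTALL_PATH:-/path/to/strelka}"
STRELKA_ANALYSIS_PATH="${STRELKA_ANALYSIS_PATH:-strelka_somatic_run}"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 = only print the commands

# Print a command; execute it only when DRY_RUN is not 1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then "$@"; fi
}

# Step 1: configuration (input data and calling options).
# --exome is for exome/targeted data; drop it for WGS.
run "$STRELKA_INSTALL_PATH/bin/configureStrelkaSomaticWorkflow.py" \
    --normalBam HCC1187BL.bam \
    --tumorBam HCC1187C.bam \
    --referenceFasta hg19.fa \
    --exome \
    --runDir "$STRELKA_ANALYSIS_PATH"

# Step 2: execution (how the job runs): local mode, 4 parallel jobs.
run "$STRELKA_ANALYSIS_PATH/runWorkflow.py" -m local -j 4
```

Setting DRY_RUN=0 before running the script would execute both commands for real. Because the run directory is created by the configuration step, re-running only step 2 is enough to resume an interrupted analysis.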

Germline configuration example

${STRELKA_INSTALL_PATH}/bin/configureStrelkaGermlineWorkflow.py \
--bam NA12878.bam \
--referenceFasta hg19.fa \
--runDir ${STRELKA_ANALYSIS_PATH}

Optional parameters

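--indelCandidates supplies one or more candidate indel VCFs to any Strelka workflow. Any indel provided this way is given candidate status and considered during the realignment and genotyping steps, but it is not written to the output unless the indel allele is found in the input samples.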

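--callRegions takes a region file in BED format. By default Strelka calls the entire genome; with this option, variant calling can be restricted to an arbitrary subset of the genome. The BED file must be bgzip-compressed and tabix-indexed, and only one such file may be specified. When it is given, all VCF output is restricted to the supplied regions, which makes it suitable for calling exome regions.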
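
Preparing such a BED file could look like the sketch below. The file names and the three example intervals are invented for illustration; bgzip and tabix come from htslib, and the script only calls them if they are installed.

```shell
# Hypothetical example data: replace with your own capture-kit BED file.
workdir=$(mktemp -d)
printf 'chr2\t500\t900\nchr1\t2000\t2500\nchr1\t100\t400\n' > "$workdir/targets.bed"

# tabix expects position-sorted input: by chromosome, then numeric start.
sort -k1,1 -k2,2n "$workdir/targets.bed" > "$workdir/targets.sorted.bed"

# Compress and index only if the htslib tools are available.
if command -v bgzip >/dev/null 2>&1 && command -v tabix >/dev/null 2>&1; then
    bgzip -c "$workdir/targets.sorted.bed" > "$workdir/targets.sorted.bed.gz"
    tabix -p bed "$workdir/targets.sorted.bed.gz"   # writes the .tbi index
fi
```

The resulting targets.sorted.bed.gz (with its .tbi index next to it) is what would be passed to --callRegions, typically together with --exome for exome data.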

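Notes on Strelka

For WGS, WES, and ultra-deep ctDNA sequencing, edit the configuration file strelka_config_bwa_default.ini in the Strelka installation directory. As far as I know this file belongs to the older Strelka 1.x workflow; in Strelka 2 the --exome/--targeted flag shown in the help text above turns off the high-depth filters instead.

WGS: leave the default parameters unchanged.

WES: set the first parameter, isSkipDepthFilters=1 (default 0), which skips depth filtration.

ctDNA: set the first parameter, isSkipDepthFilters=1, and the second parameter, maxInputDepth = 30000 (default 10000).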
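
The ctDNA settings can be applied with a small helper rather than by hand. This is a sketch only: it works on a scratch file whose contents are a stand-in for the two relevant lines of strelka_config_bwa_default.ini, and set_ini_value is a hypothetical helper, not part of Strelka.

```shell
# Stand-in for the real config file; point cfg at a copy of
# strelka_config_bwa_default.ini when using this for real.
cfg=$(mktemp)
printf '[user]\nisSkipDepthFilters = 0\nmaxInputDepth = 10000\n' > "$cfg"

# Hypothetical helper. Usage: set_ini_value FILE KEY VALUE
# Rewrites the "KEY = ..." line in place, leaving other lines untouched.
set_ini_value() {
    sed "s/^\($2\)[[:space:]]*=.*/\1 = $3/" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# ctDNA settings from the notes above.
set_ini_value "$cfg" isSkipDepthFilters 1
set_ini_value "$cfg" maxInputDepth 30000

cat "$cfg"
```

For WES only the first call is needed. Editing a per-analysis copy rather than the file in the installation directory keeps the defaults intact for WGS runs.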
