[Somatic Mutations 1] Strelka
Follow the WeChat official account: oddxix
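Somatic mutations
A somatic mutation is a variant acquired after birth in the cells of a particular tissue or organ. It is not inherited by offspring, but it is passed on to daughter cells through cell division. Somatic mutations play a key role in the initiation and progression of tumors, and they are central to designing targeted cancer therapies. NGS makes somatic variant detection more comprehensive and cheaper, and it has clear advantages for detecting many types of somatic variants. Challenges remain in practice, however: sample degradation, insufficient coverage, genetic heterogeneity, and tissue contamination (impurity). To meet these challenges and lower the error rate, researchers have developed a range of algorithms and statistical models for detecting somatic mutations.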
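Strelka
Strelka calls germline and somatic small variants from aligned sequencing data. It is optimized for clinical analysis of germline variation in small cohorts and of somatic variation in tumor/normal sample pairs.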
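Installation
After downloading, just extract the archive and it is ready to use; it is also a good idea to add it to your environment variables.
Download commands: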
mkdir strelka && cd strelka
wget https://github.com/Illumina/strelka/releases/download/v2.8.2/strelka-2.8.2.centos5_x86_64.tar.bz2
tar xvfj strelka-2.8.2.centos5_x86_64.tar.bz2
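The "environment variable" step could look like the sketch below. It assumes the tarball was unpacked in the current directory and that it extracts to a folder named after the archive (strelka-2.8.2.centos5_x86_64); adjust the path if yours differs. STRELKA_INSTALL_PATH is the variable name used by the configuration examples later in this post.

```shell
# Assumption: run from the directory where the tarball was unpacked,
# and the archive extracted to a folder with the same base name.
STRELKA_INSTALL_PATH="${PWD}/strelka-2.8.2.centos5_x86_64"
export STRELKA_INSTALL_PATH

# Put the configure scripts on PATH for the current shell session.
export PATH="${STRELKA_INSTALL_PATH}/bin:${PATH}"
```

To keep the setting across sessions, the same two export lines can go into ~/.bashrc with an absolute path. Running configureStrelkaSomaticWorkflow.py --version afterwards is a quick way to confirm the scripts are found.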
Parameters:
Usage: configureStrelkaGermlineWorkflow.py [options]
Version: 2.9.6
This script configures Strelka germline small variant calling.
You must specify an alignment file (BAM or CRAM) for at least one sample.
Configuration will produce a workflow run script which
can execute the workflow on a single node or through
sge and resume any interrupted execution.
Options:
--version show program's version number and exit
-h, --help show this help message and exit
--config=FILE provide a configuration file to override defaults in
global config file (/work/home/biostar/exome_software/strelka-2.9.6.centos6_x86_64/bin/configureStrelkaGermlineWorkflow.py.ini)
--allHelp show all extended/hidden options
Workflow options:
--bam=FILE Sample BAM or CRAM file. May be specified more than
once, multiple inputs will be treated as each BAM file
representing a different sample. [required] (no default)
--ploidy=FILE Provide ploidy file in VCF. The VCF should include one
sample column per input sample labeled with the same
sample names found in the input BAM/CRAM RG header
sections. Ploidy should be provided in records using
the FORMAT/CN field, which are interpreted to span the
range [POS+1, INFO/END]. Any CN value besides 1 or 0
will be treated as 2. File must be tabix indexed. (no default)
--noCompress=FILE Provide BED file of regions where gVCF block
compression is not allowed. File must be bgzip-
compressed/tabix-indexed. (no default)
--callContinuousVf=CHROM
Call variants on CHROM without a ploidy prior
assumption, issuing calls with continuous variant frequencies (no default)
--rna Set options for RNA-Seq input.
--referenceFasta=FILE
samtools-indexed reference fasta file [required]
--indelCandidates=FILE
Specify a VCF of candidate indel alleles. These
alleles are always evaluated but only reported in the
output when they are inferred to exist in the sample.
The VCF must be tabix indexed. All indel alleles must
be left-shifted/normalized, any unnormalized alleles
will be ignored. This option may be specified more
than once, multiple input VCFs will be merged.
(default: None)
--forcedGT=FILE Specify a VCF of candidate alleles. These alleles are
always evaluated and reported even if they are
unlikely to exist in the sample. The VCF must be tabix
indexed. All indel alleles must be left-
shifted/normalized, any unnormalized allele will
trigger a runtime error. This option may be specified
more than once, multiple input VCFs will be merged.
Note that for any SNVs provided in the VCF, the SNV
site will be reported (and for gVCF, excluded from
block compression), but the specific SNV alleles are
ignored. (default: None)
--exome, --targeted
Set options for exome or other targeted input: note in
particular that this flag turns off high-depth filters
--callRegions=FILE Optionally provide a bgzip-compressed/tabix-indexed
BED file containing the set of regions to call. No VCF
output will be provided outside of these regions. The
full genome will still be used to estimate statistics
from the input (such as expected depth per
chromosome). Only one BED file may be specified. (default: call the entire genome)
--runDir=DIR Name of directory to be created where all workflow
scripts and output will be written. Each analysis requires a separate directory. (default:
StrelkaGermlineWorkflow)
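Running Strelka takes two steps: (1) configuration and (2) workflow execution. The configuration step specifies the input data and any options related to the variant calling method itself. The execution step specifies any parameters related to how Strelka is run (for example, the total number of cores or SGE nodes over which jobs should be parallelized). The execution step can also be interrupted and restarted without changing the final result of the workflow.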
Somatic configuration example
${STRELKA_INSTALL_PATH}/bin/configureStrelkaSomaticWorkflow.py \
--normalBam HCC1187BL.bam \
--tumorBam HCC1187C.bam \
--referenceFasta hg19.fa \
--indelCandidates ${MANTA_ANALYSIS_PATH}/results/variants/candidateSmallIndels.vcf.gz \
--runDir ${STRELKA_ANALYSIS_PATH}
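After this command runs, the ${STRELKA_ANALYSIS_PATH} directory is created. The command above only generates a Python script (runWorkflow.py); you still have to run that script yourself to get results.
Run command (from inside the run directory): ./runWorkflow.py -m local -j 1 -g 4
Here -m local runs the workflow on the local machine, -j sets the number of parallel jobs, and -g sets the memory available to the workflow in GB. When it finishes, the results directory is written inside the run directory.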
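Once the workflow finishes, a quick sanity check on the results directory can save some confusion later. The sketch below assumes the somatic workflow writes somatic.snvs.vcf.gz and somatic.indels.vcf.gz under results/variants, which is my understanding of the Strelka2 output layout; check_somatic_results is a hypothetical helper name, not part of Strelka.

```shell
# Hypothetical helper: verify that a finished somatic run left both VCFs behind.
# Assumption: outputs live in <run_dir>/results/variants/ under these names.
check_somatic_results() {
    run_dir="$1"
    variants_dir="${run_dir}/results/variants"
    missing=0
    for f in somatic.snvs.vcf.gz somatic.indels.vcf.gz; do
        # -s is true only if the file exists and is not empty
        if [ ! -s "${variants_dir}/${f}" ]; then
            echo "missing or empty: ${variants_dir}/${f}" >&2
            missing=1
        fi
    done
    return "${missing}"
}
```

It can be chained after the run, for example: ./runWorkflow.py -m local -j 1 -g 4 && check_somatic_results "${STRELKA_ANALYSIS_PATH}".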
Germline configuration example
${STRELKA_INSTALL_PATH}/bin/configureStrelkaGermlineWorkflow.py \
--bam NA12878.bam \
--referenceFasta hg19.fa \
--runDir ${STRELKA_ANALYSIS_PATH}
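Optional parameters
--indelCandidates
Provides one or more candidate indel VCFs to any Strelka workflow. Any indel supplied this way is given candidate status and is considered during the realignment and genotyping steps, but it is not written to the output unless the indel allele is found in the input samples.
--callRegions
This configuration option supplies a region file in BED format. By default Strelka calls the whole genome; with this option, variant calling can be restricted to an arbitrary subset of the genome. The BED file must be bgzip-compressed and tabix-indexed, and only one such BED file may be specified. When it is given, all VCF output is restricted to the supplied call regions, which makes it useful for calling exome regions.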
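Preparing the BED file for --callRegions might look like the sketch below. It assumes bgzip and tabix (from htslib) are installed and on PATH; sort_bed and prepare_call_regions are hypothetical helper names, and the BED is sorted first because tabix expects position-sorted input.

```shell
# Sort a BED file by chromosome, then numerically by start position.
sort_bed() {
    LC_ALL=C sort -k1,1 -k2,2n "$1"
}

# Hypothetical helper: produce FILE.gz plus its tabix index (FILE.gz.tbi).
# Assumption: bgzip and tabix from htslib are available on PATH.
prepare_call_regions() {
    bed="$1"
    sort_bed "${bed}" | bgzip -c > "${bed}.gz"
    tabix -p bed "${bed}.gz"
}
```

After prepare_call_regions exome_targets.bed (the file name is only an example), the resulting exome_targets.bed.gz is what gets passed to --callRegions in the configuration step.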
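Notes on Strelka
For WGS, WES, and ultra-deep ctDNA sequencing, edit the configuration file strelka_config_bwa_default.ini in the Strelka installation directory. This file name appears to come from the Strelka 1.x releases; in the 2.x help output shown above, the --exome/--targeted flag turns off the high-depth filters.
WGS: keep the default parameters.
WES: set the first parameter, isSkipDepthFilters=1 (default 0), so that the depth filters are skipped.
ctDNA: set the first parameter isSkipDepthFilters=1 and the second parameter maxInputDepth = 30000 (default 10000).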
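The edits above can be scripted rather than done by hand. This is a minimal sketch assuming the ini file stores settings as plain "key = value" lines; set_ini_value is a hypothetical helper, and the file paths in the comments are placeholders.

```shell
# Hypothetical helper: print FILE to stdout with KEY set to VALUE.
# Assumption: settings are stored one per line as "key = value".
# usage: set_ini_value FILE KEY VALUE
set_ini_value() {
    sed -E "s/^($2)[[:space:]]*=.*/\1 = $3/" "$1"
}

# Example for WES (paths are placeholders, work on a copy of the default file):
#   cp path/to/strelka_config_bwa_default.ini config.ini
#   set_ini_value config.ini isSkipDepthFilters 1 > config.wes.ini
```

Working on a copy keeps the installed defaults intact for WGS runs. For ctDNA, the same helper can be applied a second time to the WES copy to raise maxInputDepth to 30000.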