SnpEff: Annotating SNPs in a VCF File
2021-11-19
胡童远
The SnpEff paper
Title: A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff
Summary: predicts and annotates the effects of SNPs
Journal: Fly (Austin)
Year: 2012
Citations: 6162 (Google Scholar, 2021-11-19)
Links
Homepage: http://pcingola.github.io/SnpEff/
Installation
wget -c https://snpeff.blob.core.windows.net/versions/snpEff_latest_core.zip
unzip snpEff_latest_core.zip
route="/your_route/hutongyuan/softwares/snpEff"
java -jar $route/snpEff.jar -h

SnpEff databases
java -jar $route/snpEff.jar databases > snpEff.databases.list
The list covers 46,041 different genomes and includes a download URL for each genome database.
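With tens of thousands of entries, the list is easiest to use through a small search helper. The sketch below assumes the list is tab-separated, with the genome name in the first column and the organism in the second; that layout is an assumption to verify against your own file. `find_snpeff_db` is a hypothetical helper name, not part of SnpEff.

```shell
# Hypothetical helper: print "genome<TAB>organism" for every line of the
# database list that contains the pattern (case-insensitive substring match).
# Assumes tab-separated columns: genome name first, organism second.
find_snpeff_db() {
    pattern=$1
    list=$2
    awk -F '\t' -v pat="$pattern" '
        BEGIN { pat = tolower(pat) }
        index(tolower($0), pat) {
            # Columns may be padded with spaces, so trim them before printing.
            gsub(/^[ \t]+|[ \t]+$/, "", $1)
            gsub(/^[ \t]+|[ \t]+$/, "", $2)
            print $1 "\t" $2
        }' "$list"
}

# Example call (requires the list generated above):
#   find_snpeff_db escherichia snpEff.databases.list
```

A matching genome name can then be passed to SnpEff's download command, for example `java -jar $route/snpEff.jar download -v <genome_name>`, instead of building a database by hand.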
Building a custom database
Build instructions: http://pcingola.github.io/SnpEff/se_buildingdb/#option-2-building-a-database-from-gff-files
cd snpEff/ # enter the directory that contains snpEff.jar
mkdir -p ./data/bacteria/
cp my_ref_genes.gff ./data/bacteria/genes.gff # copy your genome's GFF (with the sequence embedded)
echo "bacteria.genome : a_name" >> snpEff.config # register the genome in the config file
java -jar $route/snpEff.jar build -gff3 -v bacteria # build the database
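Two things in the steps above are easy to get wrong: the GFF must carry its sequence, and re-running the `echo` line appends a duplicate config entry. The helpers below are a minimal sketch of checks for both. The function names are hypothetical, and the FASTA check assumes the sequence is embedded under the standard GFF3 `##FASTA` directive.

```shell
# Hypothetical helper: succeed (exit 0) if the GFF3 file has a ##FASTA section.
gff_has_fasta() {
    grep -q '^##FASTA' "$1"
}

# Hypothetical helper: append "<genome>.genome : <label>" to the config file
# only when no entry for that genome exists yet.
add_genome_to_config() {
    genome=$1
    label=$2
    config=$3
    if grep -q "^${genome}\.genome[[:space:]]*:" "$config"; then
        echo "entry for '${genome}' already present in ${config}"
    else
        echo "${genome}.genome : ${label}" >> "$config"
    fi
}

# Example calls, run from the snpEff directory:
#   gff_has_fasta ./data/bacteria/genes.gff || echo "no ##FASTA section found"
#   add_genome_to_config bacteria a_name snpEff.config
```

If the GFF has no embedded sequence, the build instructions linked above describe how to supply the sequence as a separate FASTA file.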
Partial log output
00:00:00 SnpEff version SnpEff 5.0e (build 2021-03-09 06:01), by Pablo Cingolani
00:00:00 Command: 'build'
00:00:00 Building database for 'bacteria'
00:00:00 Reading configuration file 'snpEff.config'. Genome: 'bacteria'
00:00:00 Reading config file: /hwfssz1/ST_HEALTH/P18Z10200N0423/hutongyuan/softwares/snpEff/snpEff.config
00:00:00 done
Reading GFF3 data file : '/hwfssz1/ST_HEALTH/P18Z10200N0423/hutongyuan/softwares/snpEff/./data/bacteria/genes.gff'
#-----------------------------------------------
# Genome name : 'a_name'
# Genome version : 'bacteria'
# Genome ID : 'bacteria[0]'
# Has protein coding info : true
# Has Tr. Support Level info : true
# Genes : 4196
# Protein coding genes : 4196
#-----------------------------------------------
00:00:03 Done
00:00:03 Logging
00:00:08 Checking for updates...
00:00:11 Done.
Result

Annotating SNPs
mkdir result/
java -jar $route/snpEff.jar -v bacteria \
parsnp.vcf > ./result/parsnp_anno.vcf
mv snpEff_genes.txt ./result/
mv snpEff_summary.html ./result/
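Options
-v verbose mode
-no-intergenic omit intergenic annotations
-no-downstream omit downstream annotations
-no-upstream omit upstream annotations
-no-intron omit intron annotations
-no-utr omit UTR annotations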
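The options above can be combined in a single run. As a sketch, the hypothetical helper below assembles the command line and prints it without running it, so the result can be inspected before launching SnpEff. The helper name and its argument order are illustrative; only the `-no-*` flags listed above are used.

```shell
# Hypothetical helper: print (do not run) an annotation command that omits
# the given region types.
#   $1 = genome name, $2 = input VCF, remaining args = region types to omit
snpeff_cmd() {
    genome=$1
    vcf=$2
    shift 2
    opts=""
    for region in "$@"; do
        case $region in
            intergenic|downstream|upstream|intron|utr)
                opts="$opts -no-$region" ;;
            *)
                echo "unknown region type: $region" >&2
                return 1 ;;
        esac
    done
    # \$route is printed literally so the shell expands it when the line is run.
    echo "java -jar \$route/snpEff.jar -v$opts $genome $vcf"
}

# Example:
#   snpeff_cmd bacteria parsnp.vcf upstream downstream
# prints:
#   java -jar $route/snpEff.jar -v -no-upstream -no-downstream bacteria parsnp.vcf
```

Omitting upstream and downstream annotations may be worth trying on a compact bacterial genome, where many variants sit close to a neighbouring gene and would otherwise collect several annotations each.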
Partial log output
00:00:00 SnpEff version SnpEff 5.0e (build 2021-03-09 06:01), by Pablo Cingolani
00:00:00 Command: 'ann'
00:00:00 Reading configuration file 'snpEff.config'. Genome: 'bacteria'
00:00:00 Reading config file: /hwfssz1/ST_HEALTH/P18Z10200N0423/hutongyuan/softwares/snpEff/snpEff.config
00:00:00 done
00:00:00 Reading database for genome version 'bacteria' from file '/hwfssz1/ST_HEALTH/P18Z10200N0423/hutongyuan/softwares/snpEff/./data/bacteria/snpEffectPredictor.bin' (this might take a while)
00:00:01 done
#-----------------------------------------------
# Genome name : 'a_name'
# Genome version : 'bacteria'
# Genome ID : 'bacteria[0]'
# Has protein coding info : true
# Has Tr. Support Level info : true
# Genes : 4196
# Protein coding genes : 4196
#-----------------------------------------------
00:00:15 Creating summary file: snpEff_summary.html
00:00:15 Creating genes file: snpEff_genes.txt
00:00:15 done.
00:00:15 Logging
00:00:20 Checking for updates...
00:00:23 Done.
Result

Result: the annotated VCF file

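1 Chromosome (CHROM)
2 Variant position (POS)
3 Flanking sequence around the variant (ID); the variant position is immediately to the right of the "."
4 Reference base (REF)
5 Variant base (ALT)
6 Whether the site passed filtering (FILTER)
7 Variant type, amino acid change, and upstream, downstream, intergenic, and intron effects (the ANN annotation in INFO)
8 The reference genome, which is always 0
9/10 Sample genotypes: 0 for no variant, 1 for variant
This numbering skips the QUAL and FORMAT columns; counting every column as the VCF specification does, FILTER is column 7, INFO is column 8, and the sample columns start at column 10.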
Details of column 7 (the annotation), for three arbitrarily chosen variants
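The annotation itself is stored under the `ANN` key of the INFO field, which is the 8th tab-separated field when QUAL is counted as well. Each annotation is a `|`-separated record whose leading subfields are allele, effect, impact, and gene name, with the protein change (HGVS.p) in position 11. The sketch below pulls those out for the first annotation of each variant; `ann_summary` is a hypothetical helper name.

```shell
# Hypothetical helper: for each variant print
#   CHROM, POS, effect, impact, gene name, protein change (HGVS.p)
# taken from the first ANN record in the INFO field.
ann_summary() {
    awk -F '\t' '
        /^#/ { next }                                  # skip header lines
        {
            n = split($8, info, ";")                   # INFO key=value pairs
            for (i = 1; i <= n; i++) {
                if (substr(info[i], 1, 4) == "ANN=") {
                    split(substr(info[i], 5), anns, ",")   # one record per annotation
                    split(anns[1], f, "[|]")               # subfields of the first record
                    print $1 "\t" $2 "\t" f[2] "\t" f[3] "\t" f[4] "\t" f[11]
                }
            }
        }' "$1"
}

# Example call:
#   ann_summary ./result/parsnp_anno.vcf | head
```

Only the first record is reported here to keep the output to one line per variant; a variant can carry several annotations, and the others are ignored by this sketch.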

Result: the HTML file
Report sections
Summary
Variant rate by chromosome
Variants by type
Number of variants by impact
Number of variants by functional class
Number of variants by effect
Quality histogram
InDel length histogram
Base variant table
Transition vs transversions (ts/tv)
Allele frequency
Allele Count
Codon change table
Amino acid change table
Chromosome variants plots
Details by gene
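As a worked example of one of these sections, the transition/transversion ratio can be approximated directly from the VCF. The sketch below counts each single-base variant site once, which is an assumption and not necessarily how the SnpEff report tallies them, so the numbers may differ from the HTML summary. `tstv` is a hypothetical helper name.

```shell
# Hypothetical helper: count transitions and transversions among
# single-base substitutions (REF and ALT both one of A, C, G, T).
tstv() {
    awk -F '\t' '
        /^#/ { next }                                   # skip header lines
        $4 ~ /^[ACGTacgt]$/ && $5 ~ /^[ACGTacgt]$/ && toupper($4) != toupper($5) {
            pair = toupper($4 $5)
            # Transitions are purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T).
            if (pair == "AG" || pair == "GA" || pair == "CT" || pair == "TC") ts++
            else tv++
        }
        END {
            printf "transitions=%d transversions=%d", ts, tv
            if (tv > 0) printf " ts/tv=%.2f", ts / tv
            printf "\n"
        }' "$1"
}

# Example call:
#   tstv parsnp.vcf
```

Indels and multi-allelic sites are skipped by the single-base filter.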
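Variant counts by effect category

Variant counts by type and region


Base change counts

Amino acid change counts

Variant position distribution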
