
SnpEff: annotating SNPs in a VCF file

2021-11-19  Author: 胡童远

The SnpEff paper

Title: A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff
In short: predicts and annotates the effects of SNPs
Journal: Fly (Austin)
Year: 2012
Citations: 6162 (Google Scholar, as of 2021-11-19)

Links

Home page: http://pcingola.github.io/SnpEff/

Installation

wget -c https://snpeff.blob.core.windows.net/versions/snpEff_latest_core.zip
unzip snpEff_latest_core.zip

route="/your_route/hutongyuan/softwares/snpEff"
java -jar $route/snpEff.jar -h

SnpEff databases

java -jar $route/snpEff.jar databases > snpEff.databases.list

The list covers 46,041 genomes and includes a download URL for each.
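
With this many entries, searching the saved list is usually easier than reading it. A minimal sketch, assuming the list was written to snpEff.databases.list as in the command above; `find_db` is a hypothetical helper name and not part of SnpEff:

```shell
# find_db is a hypothetical helper (not part of SnpEff).
# Case-insensitive, fixed-string search of the saved database list.
# $1 = search term, $2 = list file (defaults to the file written above)
find_db() {
    grep -i -F -- "$1" "${2:-snpEff.databases.list}"
}

# Example: find_db "Escherichia_coli"
```

The first column of each matching line is the genome name that can be passed to snpEff in place of a custom database name.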

Building a custom database

Build instructions: http://pcingola.github.io/SnpEff/se_buildingdb/#option-2-building-a-database-from-gff-files

cd snpEff/  # go to the directory that contains the jar file
mkdir -p ./data/bacteria/
cp my_ref_genes.gff ./data/bacteria/genes.gff  # copy the GFF of your genome (sequence included)
echo "bacteria.genome : a_name" >> snpEff.config  # register the genome in the config
java -jar $route/snpEff.jar build -gff3 -v bacteria  # build the database
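
Two things commonly go wrong in these steps: the GFF may lack the sequence, and re-running the `echo ... >>` line appends a duplicate config entry. The sketch below guards against both. It assumes the sequence is embedded in the GFF after a `##FASTA` line (the usual GFF3 convention) and that the genome ID contains only letters, digits, and underscores; both function names are hypothetical and not part of SnpEff:

```shell
# Both helpers are hypothetical (not part of SnpEff).

# Succeeds if the GFF carries an embedded "##FASTA" section,
# i.e. the sequence is included in the file.
gff_has_sequence() {
    grep -q '^##FASTA' "$1"
}

# Append "<genome>.genome : <name>" to the config only if no entry for
# that genome exists yet, so repeated runs do not duplicate the line.
# $1 = config file, $2 = genome ID (e.g. bacteria), $3 = display name
add_genome_entry() {
    if ! grep -q "^[[:space:]]*$2\.genome[[:space:]]*:" "$1"; then
        echo "$2.genome : $3" >> "$1"
    fi
}

# Example:
#   gff_has_sequence ./data/bacteria/genes.gff || echo "no ##FASTA section" >&2
#   add_genome_entry snpEff.config bacteria a_name
```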

Partial log output

00:00:00        SnpEff version SnpEff 5.0e (build 2021-03-09 06:01), by Pablo Cingolani
00:00:00        Command: 'build'
00:00:00        Building database for 'bacteria'
00:00:00        Reading configuration file 'snpEff.config'. Genome: 'bacteria'
00:00:00        Reading config file: /hwfssz1/ST_HEALTH/P18Z10200N0423/hutongyuan/softwares/snpEff/snpEff.config
00:00:00        done
Reading GFF3 data file  : '/hwfssz1/ST_HEALTH/P18Z10200N0423/hutongyuan/softwares/snpEff/./data/bacteria/genes.gff'
#-----------------------------------------------
# Genome name                : 'a_name'
# Genome version             : 'bacteria'
# Genome ID                  : 'bacteria[0]'
# Has protein coding info    : true
# Has Tr. Support Level info : true
# Genes                      : 4196
# Protein coding genes       : 4196
#-----------------------------------------------
00:00:03        Done
00:00:03        Logging
00:00:08        Checking for updates...
00:00:11        Done.

Result

Annotating SNPs

mkdir result/
java -jar $route/snpEff.jar -v bacteria \
parsnp.vcf > ./result/parsnp_anno.vcf
mv snpEff_genes.txt ./result/
mv snpEff_summary.html ./result/
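
As an alternative to moving the report files afterwards, SnpEff's `-stats` option sets the path of the HTML summary directly. A sketch, assuming `$route` is set as in the installation step; `annotate_to_dir` is a hypothetical wrapper name, and whether the genes table lands next to the summary is worth checking on your version:

```shell
# annotate_to_dir is a hypothetical wrapper (not part of SnpEff).
# $1 = genome ID, $2 = input VCF, $3 = output directory
annotate_to_dir() {
    mkdir -p "$3" || return 1
    java -jar "$route/snpEff.jar" -v \
        -stats "$3/snpEff_summary.html" \
        "$1" "$2" > "$3/$(basename "$2" .vcf)_anno.vcf"
}

# Example: annotate_to_dir bacteria parsnp.vcf ./result
```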

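Options

-v  verbose mode
-no-intergenic   omit intergenic annotations
-no-downstream  omit downstream annotations
-no-upstream  omit upstream annotations
-no-intron  omit intron annotations
-no-utr  omit UTR annotations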
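
These options can be combined in one call. The sketch below keeps only annotations that fall inside genes by dropping the intergenic, upstream, and downstream ones. It assumes `$route` is set as in the installation step; `annotate_genic_only` is a hypothetical wrapper name, not part of SnpEff:

```shell
# annotate_genic_only is a hypothetical wrapper (not part of SnpEff).
# It drops intergenic, upstream and downstream annotations and keeps the rest.
# $1 = genome ID, $2 = input VCF; the annotated VCF goes to stdout.
annotate_genic_only() {
    java -jar "$route/snpEff.jar" \
        -no-intergenic -no-upstream -no-downstream \
        "$1" "$2"
}

# Example: annotate_genic_only bacteria parsnp.vcf > ./result/parsnp_genic.vcf
```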

Partial log output

00:00:00        SnpEff version SnpEff 5.0e (build 2021-03-09 06:01), by Pablo Cingolani
00:00:00        Command: 'ann'
00:00:00        Reading configuration file 'snpEff.config'. Genome: 'bacteria'
00:00:00        Reading config file: /hwfssz1/ST_HEALTH/P18Z10200N0423/hutongyuan/softwares/snpEff/snpEff.config
00:00:00        done
00:00:00        Reading database for genome version 'bacteria' from file '/hwfssz1/ST_HEALTH/P18Z10200N0423/hutongyuan/softwares/snpEff/./data/bacteria/snpEffectPredictor.bin' (this might take a while)
00:00:01        done
#-----------------------------------------------
# Genome name                : 'a_name'
# Genome version             : 'bacteria'
# Genome ID                  : 'bacteria[0]'
# Has protein coding info    : true
# Has Tr. Support Level info : true
# Genes                      : 4196
# Protein coding genes       : 4196
#-----------------------------------------------
00:00:15        Creating summary file: snpEff_summary.html
00:00:15        Creating genes file: snpEff_genes.txt
00:00:15        done.
00:00:15        Logging
00:00:20        Checking for updates...
00:00:23        Done.

Results

Result: the annotated VCF file

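1 chromosome (CHROM)
2 variant position (POS)
3 sequence flanking the variant; the variant site is immediately to the right of the "." (ID)
4 reference base (REF)
5 variant base (ALT)
6 whether the variant passed filtering (FILTER)
7 variant type, amino acid change, and upstream, downstream, intergenic, or intronic effects (the ANN entry in INFO)
8 the reference genome, always 0
9/10 samples: 0 = not mutated, 1 = mutated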
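
The annotation string is dense, so it helps to flatten it into a table. In a standard VCF the annotation sits in the INFO field, which is the eighth tab-separated column once QUAL is counted, as an `ANN=` entry whose subfields are separated by `|` (allele, effect, impact, gene name, and so on, with the protein change in the eleventh subfield). A minimal sketch under those assumptions; `ann_table` is a hypothetical helper name, and it reports only the first annotation of each variant:

```shell
# ann_table is a hypothetical helper (not part of SnpEff).
# Prints: CHROM POS REF ALT effect impact gene protein_change
# for the first ANN annotation of every variant in the VCF given as $1.
ann_table() {
    awk -F'\t' '
        /^#/ { next }                          # skip header lines
        {
            n = split($8, info, ";")           # INFO entries
            for (i = 1; i <= n; i++) {
                if (info[i] ~ /^ANN=/) {
                    sub(/^ANN=/, "", info[i])
                    split(info[i], anns, ",")  # one annotation per comma
                    split(anns[1], f, "[|]")   # subfields of the first one
                    print $1, $2, $4, $5, f[2], f[3], f[4], f[11]
                }
            }
        }
    ' OFS='\t' "$1"
}

# Example: ann_table ./result/parsnp_anno.vcf | head
```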

Details of column 7 for three arbitrarily chosen variants

Result: the HTML file

Report sections

Summary
Variant rate by chromosome
Variants by type
Number of variants by impact
Number of variants by functional class
Number of variants by effect
Quality histogram
InDel length histogram
Base variant table
Transition vs transversions (ts/tv)
Allele frequency
Allele Count
Codon change table
Amino acid change table
Chromosome variants plots
Details by gene

Variant counts by effect category

Variant counts by type and region

Base change counts

Amino acid change counts

Variant position counts

