VEP: a VCF annotation tool
I. Introduction
Variant Effect Predictor
The VEP is a software suite that performs annotation and analysis of most types of genomic variation in coding and noncoding regions of the genome. From disease investigation to population studies, it is a critical tool to annotate variants and prioritize a subset for further analysis.
Tutorial overview: https://asia.ensembl.org/info/docs/tools/vep/script/vep_tutorial.html
Details: http://asia.ensembl.org/info/docs/tools/vep/script/vep_download.html
II. Download and installation
1. Download
git clone https://github.com/Ensembl/ensembl-vep.git
2. Installation
cd ensembl-vep
perl INSTALL.pl
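The installer asks which cache databases and plugins to install, then downloads and unpacks them. Enter 0 to select none, all to select everything, or a number from the list to download that specific version.
It does not matter if there is no matching cache file: one can be built from a GTF file and a FASTA file with a script. The VEP package also includes a script, gtf2vep.pl, to build custom cache files. This requires a local GFF or general transfer format (GTF) file that describes transcript structures and a FASTA file of the genomic sequence.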
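To check which caches the installer has already unpacked, a small helper along these lines may be handy. It is only a sketch: it assumes the default cache location $HOME/.vep and the usual <species>/<version>_<assembly> directory layout, and list_vep_caches is a hypothetical name, not something shipped with VEP.

```shell
# Hypothetical helper, not part of VEP: list unpacked caches as "species<TAB>version_assembly".
# Assumes the layout <cache_root>/<species>/<version>_<assembly>.
list_vep_caches() {
    local root="${1:-$HOME/.vep}"
    local dir
    for dir in "$root"/*/*_*/; do
        [ -d "$dir" ] || continue      # the glob matched nothing
        dir="${dir%/}"                 # drop the trailing slash
        printf '%s\t%s\n' "$(basename "$(dirname "$dir")")" "$(basename "$dir")"
    done
}
```

Calling list_vep_caches with no argument inspects $HOME/.vep; pass another directory if the cache was installed elsewhere.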
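3. Test
If the database matching the assembly of the VCF file has not been downloaded, add the option --port 3337. The example VCF is GRCh37, and 3337 is the port on which the public Ensembl database server hosts GRCh37 data.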
/home/shaoyu/software/ensembl-vep/vep -i /home/shaoyu/software/ensembl-vep/examples/homo_sapiens_GRCh37.vcf --cache --port 3337
Output files: variant_effect_output.txt and variant_effect_output.txt_summary.html
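For a quick look at the text output, the consequence terms can be tallied with awk. This is a rough sketch, assuming the default tab-delimited format in which lines starting with # are meta-information or the column header and Consequence is the 7th column; count_consequences is a hypothetical helper name.

```shell
# Hypothetical helper: tally the Consequence column (7th field) of VEP's default text output.
# Lines starting with "#" (meta-information and the column header) are skipped.
count_consequences() {
    awk -F'\t' '!/^#/ { n[$7]++ } END { for (c in n) printf "%d\t%s\n", n[c], c }' "$1" |
        sort -k1,1nr -k2,2
}
```

For example, count_consequences variant_effect_output.txt prints one line per consequence string, most frequent first. Rows carrying several comma-separated terms are counted under the combined string.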
III. Usage
/home/shaoyu/software/ensembl-vep/vep -i /home/shaoyu/software/ensembl-vep/examples/homo_sapiens_GRCh37.vcf --cache --port 3337 # basic
/home/shaoyu/software/ensembl-vep/vep -i /home/shaoyu/software/ensembl-vep/examples/homo_sapiens_GRCh37.vcf --cache --port 3337 --sift b -o test2.sift.txt # SIFT is an algorithm for predicting whether a given change in a protein sequence will be deleterious to the function of that protein. The b means we want both the prediction and the score.
/home/shaoyu/software/ensembl-vep/filter_vep -i test2.sift.txt -filter "SIFT is deleterious" -o test2.sift.filter.txt # keep only the deleterious variants
/home/shaoyu/software/ensembl-vep/vep -i /home/shaoyu/software/ensembl-vep/examples/homo_sapiens_GRCh37.vcf --cache --port 3337 --everything -o test3.everything.txt # --everything adds all annotations
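To inspect the SIFT calls in test2.sift.txt without filtering, the values can be pulled out of the Extra column. The sketch below assumes the default text output, where Extra is the last column and holds ;-separated key=value pairs such as SIFT=deleterious(0.01); sift_calls is a hypothetical helper name.

```shell
# Hypothetical helper: print variant ID, transcript (Feature) and SIFT call for every row
# whose Extra column (last field) carries a SIFT=... entry.
sift_calls() {
    awk -F'\t' '
        /^#/ { next }                   # skip meta-information and the column header
        {
            n = split($NF, kv, ";")     # Extra is a ";"-separated list of key=value pairs
            for (i = 1; i <= n; i++)
                if (kv[i] ~ /^SIFT=/)
                    printf "%s\t%s\t%s\n", $1, $5, substr(kv[i], 6)
        }' "$1"
}
```

Running sift_calls test2.sift.txt lists every transcript-level call, tolerated ones included, which is useful for judging how much the filter_vep step removes.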
--everything
Shortcut flag to switch on all of the following:
--sift b, --polyphen b, --ccds, --uniprot, --hgvs, --symbol, --numbers, --domains, --regulatory, --canonical, --protein, --biotype, --tsl, --appris, --gene_phenotype, --af, --af_1kg, --af_esp, --af_gnomad, --max_af, --pubmed, --variant_class
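The commands in this section all repeat the same executable, input file and connection options. One way to cut the repetition is a small wrapper function. This is a sketch only: run_vep and DRY_RUN are hypothetical names, and the paths are copied from the commands in this section.

```shell
# Paths copied from the commands in this section; adjust to your own installation.
VEP_DIR=/home/shaoyu/software/ensembl-vep
INPUT="$VEP_DIR/examples/homo_sapiens_GRCh37.vcf"

# Hypothetical wrapper: prepend the shared options, then pass any extra flags through.
# With DRY_RUN=1 the command line is printed (for display only) instead of executed.
run_vep() {
    set -- "$VEP_DIR/vep" -i "$INPUT" --cache --port 3337 "$@"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

With this in place, the SIFT run becomes run_vep --sift b -o test2.sift.txt, and the full annotation run becomes run_vep --everything -o test3.everything.txt.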
More options:
https://asia.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_species
Filter options:
IV. Interpretation
1. Transcript annotation
2. Protein annotation
V. Output files
The output format (txt, vcf, json) can be set with an option such as --vcf or --json; the default is txt.
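In a script that takes the desired format as a parameter, the name can be mapped to the corresponding flag. A minimal sketch, assuming the VEP flags --tab, --vcf and --json; vep_format_flag is a hypothetical helper, and the default text format needs no flag.

```shell
# Hypothetical helper: map a format name to the VEP flag that selects it.
# The default text format needs no flag, so "txt" prints nothing.
vep_format_flag() {
    case "$1" in
        txt|"") : ;;
        tab)    printf '%s\n' '--tab' ;;
        vcf)    printf '%s\n' '--vcf' ;;
        json)   printf '%s\n' '--json' ;;
        *)      printf 'unknown format: %s\n' "$1" >&2; return 1 ;;
    esac
}
```

It can then be spliced into a command unquoted, for example vep -i input.vcf --cache $(vep_format_flag vcf) -o output.vcf, so that an empty result adds no argument.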