2020-06-28 Learning the WES Data Analysis Workflow, Part 5

2020-06-28  程凉皮儿

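Annotating and filtering variants with ANNOVAR

I had already studied some of this in an earlier note ("Exome Data Processing Notes 3: ANNOVAR | Annotation"), but I had no usable server at the time and could not finish many of the steps.
Today I am going through it again.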

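Reference material:

ANNOVAR is an annotation tool written by Kai Wang (王凯). It annotates SNPs and indels and can also filter variants.

ANNOVAR can use up-to-date data to analyze genetic variants in a variety of genomes. It provides three kinds of annotation: gene-based annotation, region-based annotation, and filter-based annotation.

ANNOVAR is written in Perl.

Pros: many databases are available for direct download, multiple input formats are supported, and the annotations are easy to read;
Cons: species without a prebuilt database cannot be annotated out of the box.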

Downloading ANNOVAR databases

Example command

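[kaiwang@biocluster ~/]$ perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar refGene humandb/
# -buildver         genome build version (hg19 here)
# -downdb           tells the script to download a database
# -webfrom annovar  download from the ANNOVAR mirror; without it the database is fetched from its original source (UCSC by default)
# humandb/          directory in which the database is stored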

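The official ANNOVAR documentation lists the downloadable databases together with their versions and update dates; the same list can be retrieved with -downdb avdblist.

With that as a guide, I started exploring.
First, download and install the software and the required database files: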

cd ~/wes_cancer/biosoft
# wget <download URL>
tar -zxvf annovar.latest.tar.gz
cd annovar
nohup perl annotate_variation.pl -downdb -webfrom annovar gnomad_genome --buildver hg38 humandb/ >down.log 2>&1 &
nohup perl annotate_variation.pl -buildver hg38 -downdb -webfrom annovar ensGene humandb/ &
nohup perl annotate_variation.pl -buildver hg38 -downdb -webfrom annovar cytoBand humandb/ &
nohup perl annotate_variation.pl -buildver hg38 -downdb -webfrom annovar avsnp147 humandb/ &
nohup perl annotate_variation.pl -buildver hg38 -downdb -webfrom annovar exac03 humandb/ &
nohup perl annotate_variation.pl -buildver hg38 -downdb -webfrom annovar 1000g2015aug humandb/ &
nohup perl annotate_variation.pl -buildver hg38 -downdb -webfrom annovar clinvar_20200316 humandb/ &
nohup perl annotate_variation.pl -buildver hg38 -downdb -webfrom annovar dbnsfp30a humandb/ &
nohup perl annotate_variation.pl -buildver hg38 -downdb -webfrom annovar knownGene humandb/ &
nohup perl annotate_variation.pl -buildver hg38 -downdb -webfrom annovar refGene humandb/  
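The download commands above differ only in the database name, so they can be generated in a loop. The following is a minimal sketch rather than what I actually ran: the function name build_downdb_cmds and the per-database log names down.<db>.log are hypothetical choices of mine. The function only prints the commands, so they can be reviewed before being executed.

```shell
#!/bin/sh
# Print (do not run) one ANNOVAR download command per database name.
# build_downdb_cmds and the down.<db>.log naming are illustrative assumptions.
build_downdb_cmds() {
    buildver=$1
    shift
    for db in "$@"; do
        printf 'perl annotate_variation.pl -buildver %s -downdb -webfrom annovar %s humandb/ > down.%s.log 2>&1\n' \
            "$buildver" "$db" "$db"
    done
}

# Review the generated commands; append "| sh" to run them one after another.
build_downdb_cmds hg38 refGene knownGene avsnp147 exac03
```

Giving each database its own log file makes a failed download easier to spot than when several background jobs all write to the same nohup.out.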

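The downloaded database files are listed below. The hg38 files are the ones used in this post; the directory also contains hg19 files. Note that no hg38 cytoBand, ensGene, or gnomad_genome files appear in this listing.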

(base) root@1100150:~# tree wes_cancer/biosoft/annovar/humandb/
wes_cancer/biosoft/annovar/humandb/
├── GRCh37_MT_ensGene.txt
├── GRCh37_MT_ensGeneMrna.fa
├── annovar_downdb.log
├── genometrax-sample-files-gff
│   ├── list
│   ├── sample_chip_featuretype_hg19.gff
│   ├── sample_common_snp_featuretype_hg19.gff
│   ├── sample_cosmic_featuretype_hg19.gff
│   ├── sample_cpg_islands_featuretype_hg19.gff
│   ├── sample_dbnsfp_featuretype_hg19.gff
│   ├── sample_disease_featuretype_hg19.gff
│   ├── sample_dnase_featuretype_hg19.gff
│   ├── sample_drug_featuretype_hg19.gff
│   ├── sample_evs_featuretype_hg19.gff
│   ├── sample_gwas_featuretype_hg19.gff
│   ├── sample_hgmd_common_snp_featuretype_hg19.gff
│   ├── sample_hgmd_disease_genes_featuretype_hg19.gff
│   ├── sample_hgmd_featuretype_hg19.gff
│   ├── sample_hgmdimputed_featuretype_hg19.gff
│   ├── sample_miRNA_featuretype_hg19.gff
│   ├── sample_microsatellites_featuretype_hg19.gff
│   ├── sample_omim_featuretype_hg19.gff
│   ├── sample_pathway_featuretype_hg19.gff
│   ├── sample_pgx_featuretype_hg19.gff
│   ├── sample_ptms_featuretype_hg19.gff
│   ├── sample_snps_dbsnp_featuretype_hg19.gff
│   ├── sample_snps_ensembl_featuretype_hg19.gff
│   ├── sample_transfac_sites_featuretype_hg19.gff
│   └── sample_tss_featuretype_hg19.gff
├── hg19_MT_ensGene.txt
├── hg19_MT_ensGeneMrna.fa
├── hg19_avsnp147.txt
├── hg19_avsnp147.txt.idx
├── hg19_cytoBand.txt
├── hg19_dbnsfp30a.txt
├── hg19_dbnsfp30a.txt.idx
├── hg19_exac03.txt
├── hg19_exac03.txt.idx
├── hg19_example_db_generic.txt
├── hg19_example_db_gff3.txt
├── hg19_refGene.txt
├── hg19_refGeneMrna.fa
├── hg19_refGeneVersion.txt
├── hg19_refGeneWithVer.txt
├── hg19_refGeneWithVerMrna.fa
├── hg38_AFR.sites.2015_08.txt
├── hg38_AFR.sites.2015_08.txt.idx
├── hg38_ALL.sites.2015_08.txt
├── hg38_ALL.sites.2015_08.txt.idx
├── hg38_AMR.sites.2015_08.txt
├── hg38_AMR.sites.2015_08.txt.idx
├── hg38_EAS.sites.2015_08.txt
├── hg38_EAS.sites.2015_08.txt.idx
├── hg38_EUR.sites.2015_08.txt
├── hg38_EUR.sites.2015_08.txt.idx
├── hg38_SAS.sites.2015_08.txt
├── hg38_SAS.sites.2015_08.txt.idx
├── hg38_avsnp147.txt
├── hg38_avsnp147.txt.idx
├── hg38_clinvar_20200316.txt
├── hg38_clinvar_20200316.txt.idx
├── hg38_dbnsfp30a.txt
├── hg38_dbnsfp30a.txt.idx
├── hg38_exac03.txt
├── hg38_exac03.txt.idx
├── hg38_kgXref.txt
├── hg38_knownGene.txt
├── hg38_knownGeneMrna.fa
├── hg38_refGene.txt
├── hg38_refGeneMrna.fa
└── hg38_refGeneVersion.txt

1 directory, 70 files

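The 1000 Genomes data are split by population into six datasets (AFR, ALL, AMR, EAS, EUR, SAS), each with a .txt file and a .txt.idx index.

Running annotation and filtering

Usage

table_annovar.pl
Usage:
     table_annovar.pl [arguments] <query-file> <database-location>

Example: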

perl ~/wes_cancer/biosoft/annovar/table_annovar.pl ~/wes_cancer/project/5.gatk/9Y1640WES.indel.VQSR.vcf ~/wes_cancer/biosoft/annovar/humandb/ \
-buildver hg38 \
-out ~/wes_cancer/project/7.annotation/annovar/hg38_anno \
-remove \
-protocol refGene,knownGene,clinvar_20200316,avsnp147,dbnsfp30a,exac03,EAS.sites.2015_08,ALL.sites.2015_08 \
-operation g,g,f,f,f,f,f,f \
-nastring . \
-csvout -polish --thread 8

The result:

(base) root@1100150:~/wes_cancer/project/7.annotation/annovar# du -h hg38_anno.hg38_multianno.csv
2.4M    hg38_anno.hg38_multianno.csv
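In the table_annovar.pl command above, each -protocol entry is paired with the -operation entry at the same position (g for gene-based, r for region-based, f for filter-based), so the two lists must have the same length. A small sanity check can be sketched as follows; the helper name count_items is hypothetical, and the two lists are copied from the command above.

```shell
#!/bin/sh
# Count the comma-separated items in a list such as "refGene,knownGene".
count_items() {
    printf '%s\n' "$1" | awk -F, '{ print NF }'
}

# Lists copied from the table_annovar.pl command in this post.
protocol='refGene,knownGene,clinvar_20200316,avsnp147,dbnsfp30a,exac03,EAS.sites.2015_08,ALL.sites.2015_08'
operation='g,g,f,f,f,f,f,f'

if [ "$(count_items "$protocol")" -eq "$(count_items "$operation")" ]; then
    echo "OK: $(count_items "$protocol") protocols, $(count_items "$operation") operations"
else
    echo "Mismatch between -protocol and -operation" >&2
fi
```

Here both lists contain 8 entries, so this particular command passes the check.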

Look at the first few lines:

(base) root@1100150:~/wes_cancer/project/7.annotation/annovar# grep -v "#" hg38_anno.hg38_multianno.csv | head
Chr,Start,End,Ref,Alt,Func.refGene,Gene.refGene,GeneDetail.refGene,ExonicFunc.refGene,AAChange.refGene,Func.knownGene,Gene.knownGene,GeneDetail.knownGene,ExonicFunc.knownGene,AAChange.knownGene,CLNALLELEID,CLNDN,CLNDISDB,CLNREVSTAT,CLNSIG,avsnp147,SIFT_score,SIFT_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_pred,LRT_score,LRT_pred,MutationTaster_score,MutationTaster_pred,MutationAssessor_score,MutationAssessor_pred,FATHMM_score,FATHMM_pred,PROVEAN_score,PROVEAN_pred,VEST3_score,CADD_raw,CADD_phred,DANN_score,fathmm-MKL_coding_score,fathmm-MKL_coding_pred,MetaSVM_score,MetaSVM_pred,MetaLR_score,MetaLR_pred,integrated_fitCons_score,integrated_confidence_value,GERP++_RS,phyloP7way_vertebrate,phyloP20way_mammalian,phastCons7way_vertebrate,phastCons20way_mammalian,SiPhy_29way_logOdds,ExAC_ALL,ExAC_AFR,ExAC_AMR,ExAC_EAS,ExAC_FIN,ExAC_NFE,ExAC_OTH,ExAC_SAS,EAS.sites.2015_08,ALL.sites.2015_08
chr1,14574,.,A,G,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.
chr1,14590,.,G,A,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.
chr1,14599,.,T,A,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.
chr1,14604,.,A,G,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.
chr1,14610,.,T,C,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.
chr1,14653,.,C,T,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.
chr1,14932,.,G,T,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.
chr1,14937,.,T,C,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.
chr1,15903,.,G,GC,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.

The pipeline ran to completion without reporting any errors, yet every annotation column is empty, which is very strange.
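One possible explanation, offered as a guess that I have not verified here: the query file is a VCF, but the command neither passes -vcfinput to table_annovar.pl nor converts the file first with convert2annovar.pl -format vcf4. ANNOVAR's native input columns are chr, start, end, ref, alt, so a raw VCF line (CHROM, POS, ID, REF, ALT) would be read with the ID value "." as End, which is exactly what the rows above look like. A quick way to test this symptom is to count rows whose End column is not a number. The sketch below uses a hypothetical helper, count_bad_end, and a tiny stand-in file shaped like the real output.

```shell
#!/bin/sh
# Count data rows whose End column (field 3) is not an integer.
# Assumes the first three CSV fields (Chr, Start, End) are unquoted.
count_bad_end() {
    awk -F, 'NR > 1 && $3 !~ /^[0-9]+$/ { n++ } END { print n + 0 }' "$1"
}

# Tiny stand-in for hg38_anno.hg38_multianno.csv, with rows shaped like those above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Chr,Start,End,Ref,Alt,Func.refGene
chr1,14574,.,A,G,.
chr1,14590,.,G,A,.
EOF
count_bad_end "$sample"
rm -f "$sample"
```

If the count on the real file is close to the total number of rows, the input format is the first thing to check before suspecting the databases.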
