非模式生物基因GO富集分析:以水稻为例
2017-11-02 本文已影响690人
sober01
模式生物做什么都简单,非模式生物则很多缺少注释,没有注释你就没法做,只能是借助于各种软件比如blastgo,自己跑电子注释。但今天要讲的不是这种情况,很多物种还是有注释的,只是你有时候不知道该去那里下载,或者你有数据,却不知道该怎么用!很多的软件都是针对模式生物的,或者针对某一些类型的非模式生物,能够支持多种非模式生物,能够支持用户自己的注释文件的软件相对来讲,就非常少有了,然而clusterProfiler就是这类少有的软件之一。
获得OrgDb
- 今天要讲的是通过OrgDb来做GO分析,这是clusterProfiler的enrichGO函数所支持的背景注释,Bioconductor自带20个OrgDb可供使用,多半是模式生物,难道我们要做的物种不在这20个里面就不行了吗?显然不是的,clusterProfiler能支持的物种我自己都数不过来。
我们可以通过AnnotationHub在线检索并抓取OrgDb,比如这里以rice为例:
> require(AnnotationHub)
> hub <- AnnotationHub()
> query(hub, "oryza sativa")
AnnotationHub with 2 records
# snapshotDate(): 2017-04-25
# $dataprovider: Inparanoid8, ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Oryza sativa, Oryza sativa_Japonica_Group
# $rdataclass: Inparanoid8Db, OrgDb
# additional mcols(): taxonomyid, genome, description,
# coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
# rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH10561"]]'
title
AH10561 | hom.Oryza_sativa.inp8.sqlite
AH55775 | org.Oryza_sativa_Japonica_Group.eg.sqlite
> rice <- hub[['AH55775']]
downloading from ‘https://annotationhub.bioconductor.org/fetch/62513’
retrieving 1 resource
|======================================================================| 100%
Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Attaching package: ‘Biobase’
The following object is masked from ‘package:AnnotationHub’:
cache
Loading required package: IRanges
Loading required package: S4Vectors
Attaching package: ‘S4Vectors’
The following object is masked from ‘package:base’:
expand.grid
> length(keys(rice))[1]
[1] 32639
#通过检索,org.Oryza_sativa_Japonica_Group.eg.sqlite就是我们所要的OrgDb,可以通过相应的accession number, AH55775 抓取文件,并存入了rice对象中,它包含了32639个基因的注释
这个OrgDb,包含有以下一些注释信息:
> columns(rice)
[1] "ACCNUM" "ALIAS" "CHR" "ENTREZID" "EVIDENCE"
[6] "EVIDENCEALL" "GENENAME" "GID" "GO" "GOALL"
[11] "ONTOLOGY" "ONTOLOGYALL" "PMID" "REFSEQ" "SYMBOL"
##我们可以使用bitr来转换ID,甚至于直接检索GO注释:
> require(clusterProfiler)
> bitr(keys(rice)[1], 'ENTREZID', c("REFSEQ", "GO", "ONTOLOGY"), rice)
'select()' returned 1:many mapping between keys and columns
ENTREZID REFSEQ GO ONTOLOGY
1 3131385 NP_039457.2 GO:0005739 CC
2 3131385 NP_039457.2 GO:0005763 CC
##GO富集分析
> sample_genes <- keys(rice)[1:100]
> head(sample_genes)
[1] "3131385" "3131390" "3131391" "3131392" "3131393" "3131394"
##这里只是简单地使用ID列表中前100个ENTREZ基因ID,也可以使用其它的ID,通过借助于bitr进行转换,或者通过给enrichGO指定ID类型(keyType参数)。
> 有了OrgDb,使用起来,就跟文档中使用人类基因做为例子一样,用法一致,并且也可以通过clusterProfiler所提供的各种可视化函数对结果进行展示:
> require(clusterProfiler)
> res = enrichGO(sample_genes, OrgDb=rice, pvalueCutoff=1, qvalueCutoff=1)
> res
#
# over-representation test
#
#...@organism Oryza sativa_Japonica_Group
#...@ontology MF
#...@keytype ENTREZID
#...@gene chr [1:100] "3131385" "3131390" "3131391" "3131392" "3131393" "3131394" ...
#...pvalues adjusted by 'BH' with cutoff <1
#...28 enriched terms found
'data.frame': 28 obs. of 9 variables:
$ ID : chr "GO:0003735" "GO:0005198" "GO:0003723" "GO:0016830" ...
$ Description: chr "structural constituent of ribosome" "structural molecule activity" "RNA binding" "carbon-carbon lyase activity" ...
$ GeneRatio : chr "7/12" "7/12" "2/12" "1/12" ...
$ BgRatio : chr "22/478" "22/478" "14/478" "10/478" ...
$ pvalue : num 1.08e-07 1.08e-07 4.45e-02 2.26e-01 3.24e-01 ...
$ p.adjust : num 1.52e-06 1.52e-06 4.15e-01 1.00 1.00 ...
$ qvalue : num 1.42e-06 1.42e-06 3.90e-01 9.40e-01 9.40e-01 ...
$ geneID : chr "3131425/3131435/3131436/3131439/3131441/3131442/3131457" "3131425/3131435/3131436/3131439/3131441/3131442/3131457" "3131425/3131457" "3131463" ...
$ Count : int 7 7 2 1 3 1 2 4 4 1 ...
#...Citation
Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He.
clusterProfiler: an R package for comparing biological themes among
gene clusters. OMICS: A Journal of Integrative Biology
2012, 16(5):284-287
如果你有表达谱数据,你也可以使用gseGO进行GSEA分析
KEGG Organisms: Complete Genomes:
http://www.genome.jp/kegg/catalog/org_list.html
ref:Y叔
http://guangchuangyu.github.io/2016/01/go-analysis-using-clusterprofiler/