Using AnnotationDbi (with org.Hs.eg.db as the example)
2019-11-26
BeeBee生信
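Anyone who has analyzed microarray data has probably come across the annotation packages on Bioconductor, such as org.Hs.eg.db for human, org.Mm.eg.db for mouse, and org.Rn.eg.db for rat. AnnotationDbi provides the methods for accessing the annotation data in these packages. This post uses the most common one, org.Hs.eg.db for human, to show briefly how it works.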
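First load the packages. There is no need to load AnnotationDbi explicitly; loading org.Hs.eg.db is enough, because it depends on AnnotationDbi. tidyverse is loaded only to get the %>% operator.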
library(org.Hs.eg.db)
library(tidyverse)
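Use the keytypes and columns functions to see which annotation fields the package contains: keytypes lists the fields that can be used as lookup keys, and columns lists the fields that can be retrieved.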
> keytypes(org.Hs.eg.db)
[1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS"
[6] "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME"
[11] "GO" "GOALL" "IPI" "MAP" "OMIM"
[16] "ONTOLOGY" "ONTOLOGYALL" "PATH" "PFAM" "PMID"
[21] "PROSITE" "REFSEQ" "SYMBOL" "UCSCKG" "UNIGENE"
[26] "UNIPROT"
> columns(org.Hs.eg.db)
[1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS"
[6] "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME"
[11] "GO" "GOALL" "IPI" "MAP" "OMIM"
[16] "ONTOLOGY" "ONTOLOGYALL" "PATH" "PFAM" "PMID"
[21] "PROSITE" "REFSEQ" "SYMBOL" "UCSCKG" "UNIGENE"
[26] "UNIPROT"
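Before querying, it can help to confirm that the fields you plan to request actually exist in the package. The sketch below does this with a plain set difference on the field names. To stay runnable without the package installed, it uses a vector copied from the columns output above; the helper name missing_fields and the field name "KEGG" are made up for illustration. In a live session one would presumably pass columns(org.Hs.eg.db) instead of the copied vector.

```r
# Field names copied from the columns(org.Hs.eg.db) output shown above.
available <- c("ACCNUM", "ALIAS", "ENSEMBL", "ENSEMBLPROT", "ENSEMBLTRANS",
               "ENTREZID", "ENZYME", "EVIDENCE", "EVIDENCEALL", "GENENAME",
               "GO", "GOALL", "IPI", "MAP", "OMIM",
               "ONTOLOGY", "ONTOLOGYALL", "PATH", "PFAM", "PMID",
               "PROSITE", "REFSEQ", "SYMBOL", "UCSCKG", "UNIGENE",
               "UNIPROT")

# Hypothetical helper: return the requested fields that are not available.
missing_fields <- function(wanted, available) {
  setdiff(wanted, available)
}

# All three fields exist, so nothing is reported as missing.
ok <- missing_fields(c("SYMBOL", "ENTREZID", "UNIPROT"), available)

# "KEGG" is a made-up field name used only to show a failing check.
bad <- missing_fields(c("SYMBOL", "KEGG"), available)
```

An empty result means every requested field can be used; anything else names the fields to fix before querying.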
Use the keys function to list the keys available for a given annotation field.
> keys(org.Hs.eg.db, keytype="PATH") %>% head()
[1] "04610" "00232" "00983" "01100" "00380" "00970"
> keys(org.Hs.eg.db, keytype="SYMBOL") %>% head()
[1] "A1BG" "A2M" "A2MP1" "NAT1" "NAT2" "NATP"
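One practical use of the key list is to check your own identifiers against it before looking anything up. The following is a minimal sketch under stated assumptions: the known symbols are just the six values from the head() output above (a real session would use the full result of keys(org.Hs.eg.db, keytype = "SYMBOL")), and the query vector, including "NOT_A_GENE", is hypothetical.

```r
# Symbols copied from the head() output above; a real session would use
# keys(org.Hs.eg.db, keytype = "SYMBOL") to get the complete set.
known_symbols <- c("A1BG", "A2M", "A2MP1", "NAT1", "NAT2", "NATP")

# Hypothetical user input; "NOT_A_GENE" is deliberately invalid.
query <- c("NAT1", "NAT2", "NOT_A_GENE")

# Split the query into identifiers that are valid keys and those that are not.
is_valid      <- query %in% known_symbols
valid_query   <- query[is_valid]
invalid_query <- query[!is_valid]

# Keys can also be filtered with a regular expression,
# here every symbol that starts with "NAT".
nat_prefixed <- grep("^NAT", known_symbols, value = TRUE)
```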
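Use the select function to return the annotation data you need. The example looks up the ENTREZID and UNIPROT ID for a set of gene symbols. The call is written as AnnotationDbi::select because tidyverse attaches dplyr, whose own select function would otherwise mask it.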
> symbols <- keys(org.Hs.eg.db, keytype="SYMBOL")[1:10]
> symbols
[1] "A1BG" "A2M" "A2MP1" "NAT1" "NAT2" "NATP"
[7] "SERPINA3" "AADAC" "AAMP" "AANAT"
> AnnotationDbi::select(org.Hs.eg.db, keys=symbols, columns=c("ENTREZID", "UNIPROT"), keytype="SYMBOL")
'select()' returned 1:many mapping between keys and columns
SYMBOL ENTREZID UNIPROT
1 A1BG 1 P04217
2 A1BG 1 V9HWD8
3 A2M 2 P01023
4 A2MP1 3 <NA>
5 NAT1 9 P18440
6 NAT1 9 Q400J6
7 NAT1 9 F5H5R8
8 NAT2 10 A4Z6T7
9 NAT2 10 P11245
10 NATP 11 <NA>
11 SERPINA3 12 A0A024R6P0
12 SERPINA3 12 P01011
13 AADAC 13 P22760
14 AAMP 14 A0A024R410
15 AAMP 14 Q13685
16 AAMP 14 C9JEH3
17 AANAT 15 F1T0I5
18 AANAT 15 Q16613
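As the message in the output says, the mapping is 1:many, so one symbol can occupy several rows. If one row per symbol is needed downstream, the result has to be reduced somehow. The sketch below shows two base R options on a small data frame whose rows are copied from the output above (first three symbols only); the helper name collapse_ids and the ";" separator are choices made for this illustration, not something the package prescribes.

```r
# Rows copied from the select() output above (first three symbols only).
res <- data.frame(
  SYMBOL   = c("A1BG", "A1BG", "A2M", "A2MP1"),
  ENTREZID = c("1", "1", "2", "3"),
  UNIPROT  = c("P04217", "V9HWD8", "P01023", NA),
  stringsAsFactors = FALSE
)

# Option 1: keep only the first row returned for each symbol.
first_hit <- res[!duplicated(res$SYMBOL), ]

# Option 2: join all UniProt IDs of a symbol into one string.
# Hypothetical helper: drops NA values, returns NA if nothing is left.
collapse_ids <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_character_ else paste(x, collapse = ";")
}
collapsed <- vapply(split(res$UNIPROT, res$SYMBOL), collapse_ids, character(1))
```

Option 1 silently discards the extra IDs, while option 2 keeps them at the cost of a compound value. AnnotationDbi also provides mapIds(), whose multiVals argument controls how multiple matches are handled, which may be the more direct route in a live session.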