实验记录6:富集分析
梗概
- GO富集分析相关知识
- 利用clusterProfiler进行,需要基因列表的Entrez.ID以及FC值(fold change倍数变化)
- 参考做法
①Statistical analysis and visualization of functional profiles for genes and gene clusters(含clusterProfiler包引用文献)
或在R中执行命令:
browseVignettes("clusterProfiler")
②GEO数据挖掘小尝试:(三)利用clusterProfiler进行富集分析
什么是GO富集分析?
GeneOntology富集分析是高通量数据分析的标配,不管是转录组、甲基化、ChIP-seq还是重测序,都会用到对一个或多个集合的基因进行功能富集分析。分析结果可以指示这个集合的基因具有什么样的功能偏好性,进而据此判断相应的生物学意义。
GO富集分析是先筛选差异基因,再判断差异基因在哪些注释的通路存在富集。
GO富集分析中的超几何分布:https://www.jianshu.com/p/13f46bebebd4
进行富集分析
1. 安装r包clusterProfiler
在本地:
BiocManager::install("clusterProfiler", version = "3.8")
2.基因列表Entrez ID与FC值的准备
利用一些差异表达的基因作为输入的基因,之前Seurat的FindAllMarkers
有一批这些基因,但是没有EntrezID,需要进行一些处理。
先查看一下:
spleen.markers <- FindAllMarkers(object = spleen, only.pos = TRUE, min.pct = 0.25, thresh.use = 0.25)
print(x= head(x=spleen.markers,n = 10))
p_val avg_logFC pct.1 pct.2 p_val_adj cluster gene
MS4A1 2.410289e-189 1.592266 0.974 0.258 3.773307e-185 0 MS4A1
CD79A 1.245084e-177 1.485304 0.971 0.270 1.949180e-173 0 CD79A
HLA-DRA 1.792014e-151 1.370242 1.000 0.556 2.805397e-147 0 HLA-DRA
CD79B 2.280181e-149 1.265725 0.938 0.315 3.569624e-145 0 CD79B
CD74 3.217361e-144 1.256090 0.997 0.871 5.036778e-140 0 CD74
HLA-DPB1 4.238869e-137 1.154889 0.997 0.616 6.635949e-133 0 HLA-DPB1
HLA-DRB5 6.911780e-136 1.123850 0.990 0.437 1.082039e-131 0 HLA-DRB5
HLA-DQB1 4.004476e-134 1.121013 0.940 0.366 6.269007e-130 0 HLA-DQB1
HLA-DRB1 2.026996e-133 1.115709 0.992 0.452 3.173262e-129 0 HLA-DRB1
HLA-DPA1 3.492170e-130 1.099676 0.995 0.570 5.466992e-126 0 HLA-DPA1
spleen.markers一共有3,108个基因marker,保存为csv文件
(后来发现这一步是多余的,无需保存为文件,可直接提取基因列表)
write.csv(spleen.markers, file = "/Users/shinianyike/Desktop/zll/data/spleen/spleen_markers.csv")
获取基因列表
genelist <- spleen.markers$gene
head(genelist)
[1] "MS4A1" "CD79A" "HLA-DRA" "CD79B" "CD74" "HLA-DPB1"
转化为Entrez ID
library(clusterProfiler)
s.EntrezID <- bitr(genelist, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
Loading required package: org.Hs.eg.db
Error in eval(parse(text = OrgDb)) : 找不到对象'org.Hs.eg.db'
需要下载人类基因组注释的数据org.Hs.eg.db
安装与细节可在此网站查看Genome wide annotation for Human(含引用文献):
install.packages("org.Hs.eg.db")
Warning in install.packages :
package ‘org.Hs.eg.db’ is not available (for R version 3.4.4)
查看是否为R版本不够,当前版本为3.4.4。下为官方网站截图:
只需要版本2.7及以上,R版本是足够的
因此应该是安装命令的问题。
因此R < 3.5.0的版本应该用其他命令进行包的安装:
更换命令安装
BiocInstaller::biocLite("org.Hs.eg.db")
顺利完成。
再运行一次之前的命令。
s.EntrezID <- bitr(genelist, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
head(s.EntrezID)
SYMBOL ENTREZID
1 MS4A1 931
2 CD79A 973
3 HLA-DRA 3122
4 CD79B 974
5 CD74 972
6 HLA-DPB1 3115
查看行数:
dim(s.EntrezID)
[1] 2043 2
原本是3,108个,现在剩下2043个基因,原因是去除了重复的个数。
3. 富集分析
接下来进行富集分析:
s.ego <- enrichGO(gene = s.EntrezID.df$ENTREZID, OrgDb = 'org.Hs.eg.db', ont = 'ALL', pAdjustMethod = 'BH',pvalueCutoff = 0.05, qvalueCutoff = 0.2, keyType = 'ENTREZID')
head(s.ego)
筛选的p值为0.05,q值为0.2
ONTOLOGY ID Description GeneRatio
GO:0042119 BP GO:0042119 neutrophil activation 211/1931
GO:0002283 BP GO:0002283 neutrophil activation involved in immune response 208/1931
GO:0043312 BP GO:0043312 neutrophil degranulation 207/1931
GO:0002446 BP GO:0002446 neutrophil mediated immunity 208/1931
GO:0006613 BP GO:0006613 cotranslational protein targeting to membrane 77/1931
GO:0045047 BP GO:0045047 protein targeting to ER 78/1931
BgRatio pvalue p.adjust qvalue
GO:0042119 496/17381 2.402001e-74 1.352326e-70 9.264138e-71
GO:0002283 486/17381 7.667076e-74 2.158282e-70 1.478535e-70
GO:0043312 485/17381 3.218981e-73 6.040954e-70 4.138367e-70
GO:0002446 499/17381 2.372136e-71 3.338782e-68 2.287239e-68
GO:0006613 96/17381 5.653664e-56 6.366026e-53 4.361058e-53
GO:0045047 100/17381 5.659096e-55 5.310118e-52 3.637706e-52
geneID
GO:0042119 HVCN1/EEF2/SELL/IMPDH2/PTPN6/CTSH/CCL5/B2M/HSPA8/PTPRC/RAB27A/STOM/SURF4/HSP90AA1/BIN2/HSPA1A/HSPA6/HSPA1B/ATP6V0A1/CTSZ/RNASET2/HSP90AB1/SERPINA1/FCN1/CST3/CD68/CFP/FGL2/S100A12/FPR1/LYZ/CD14/CD36/MNDA/CFD/FCGR2A/MGST1/TLR2/CD93/C5AR1/TYROBP/FCER1G/LGALS3/
………………
………………
Count
GO:0042119 211
GO:0002283 208
GO:0043312 207
GO:0002446 208
GO:0006613 77
GO:0045047 78
剩下1931个基因:
GO:0042119 211/1931 neutrophil activation 中性粒细胞活化
GO:0002283 208/1931 neutrophil activation involved in immune response 参与免疫应答的中性粒细胞活化
GO:0043312 207/1931 neutrophil degranulation 中性粒细胞脱颗粒
GO:0002446 208/1931 neutrophil mediated immunity 中性粒细胞介导的免疫效应
GO:0006613 77/1931 cotranslational protein targeting to membrane 共翻译蛋白靶向膜
GO:0045047 78/1931 protein targeting to ER 蛋白质靶向内质网
讨论:有很多的中性粒细胞,而中性粒细胞一般在发生炎症的组织中才会出现。有点怀疑这个数据不是正常人的数据,而是疾病组织的数据。具体还要查看下HCA网站。
补充:后发现,确实这不是一个正常的数据,而是经过缺血处理的。
dim(s.ego)
[1] 1884 10
> dim(s.ego[s.ego$ONTOLOGY=='BP',])
[1] 1507 10
> dim(s.ego[s.ego$ONTOLOGY=='CC',])
[1] 215 10
> dim(s.ego[s.ego$ONTOLOGY=='MF',])
[1] 162 10
看来大部分的基因富集在BP中
4. 数据可视化
s.barplot <- barplot(s.ego, showCategory = 20, drop = T)
s.barplot
spleen_GObarplot.jpeg
s.barplot <- barplot(s.ego, showCategory = 25, drop = T)
s.barplot
spleen_GObarplot2.jpeg
dotplot(s.ego)
spleen_GOdotplot.jpeg
enrichMap(s.ego)
spleen_goMap.jpeg
感觉有点崩
2018-11-20
一些想法:
应该将每个cluster的基因都提取出来进行GO富集分析
得到每个cluster的功能主题
2018-11-23
以下部分为将每个cluster的top10基因提取出来做富集分析,举例第一个cluster的基因提取步骤,其他cluster的步骤相同:
spleen.mc0 <- filter(spleen.markers, cluster == 0)
genelist_spleen_c0 <- spleen.mc0$gene
head(spleen.mc0)
p_val avg_logFC pct.1 pct.2 p_val_adj cluster gene
1 2.410289e-189 1.592266 0.974 0.258 3.773307e-185 0 MS4A1
2 1.245084e-177 1.485304 0.971 0.270 1.949180e-173 0 CD79A
3 1.792014e-151 1.370242 1.000 0.556 2.805397e-147 0 HLA-DRA
4 2.280181e-149 1.265725 0.938 0.315 3.569624e-145 0 CD79B
5 3.217361e-144 1.256090 0.997 0.871 5.036778e-140 0 CD74
6 4.238869e-137 1.154889 0.997 0.616 6.635949e-133 0 HLA-DPB1
dim(spleen.mc0)
[1] 177 7
其他cluster的操作与cluster0相同。
genelist_spleen_c1 <- spleen.mc1$gene
spleen.mc2 <- filter(spleen.markers, cluster == 2)
genelist_spleen_c2 <- spleen.mc2$gene
spleen.mc3 <- filter(spleen.markers, cluster == 3)
genelist_spleen_c3 <- spleen.mc3$gene
spleen.mc4 <- filter(spleen.markers, cluster == 4)
genelist_spleen_c4 <- spleen.mc4$gene
spleen.mc5 <- filter(spleen.markers, cluster == 5)
genelist_spleen_c5 <- spleen.mc5$gene
spleen.mc6 <- filter(spleen.markers, cluster == 6)
genelist_spleen_c6 <- spleen.mc6$gene
spleen.mc7 <- filter(spleen.markers, cluster == 7)
genelist_spleen_c7 <- spleen.mc7$gene
spleen.mc8 <- filter(spleen.markers, cluster == 8)
genelist_spleen_c8 <- spleen.mc8$gene
每个cluster所对应基因数目.png
作柱形图
barplot(sc0_ego_BP, showCategory=12,font.size=7,title="EnrichmentGO_BP")
sc0_ego_BP <- enrichGO(gene = genelist_spleen_c0,OrgDb = org.Hg.eg.db,keytype = "SYMBOL",ont = "BP", pAdjustMethod = "BH", pvalueCutoff = 0.05)
sc1_ego_BP <- enrichGO(gene = genelist_spleen_c1,OrgDb = org.Hg.eg.db,keyType = "SYMBOL",ont = "BP", pAdjustMethod = "BH", pvalueCutoff = 0.05)
sc2_ego_BP <- enrichGO(gene = genelist_spleen_c2,OrgDb = org.Hs.eg.db,keyType = "SYMBOL",ont = "BP", pAdjustMethod = "BH", pvalueCutoff = 0.05)
sc3_ego_BP <- enrichGO(gene = genelist_spleen_c3,OrgDb = org.Hs.eg.db,keyType = "SYMBOL",ont = "BP", pAdjustMethod = "BH", pvalueCutoff = 0.05)
sc4_ego_BP <- enrichGO(gene = genelist_spleen_c4,OrgDb = org.Hs.eg.db,keyType = "SYMBOL",ont = "BP", pAdjustMethod = "BH", pvalueCutoff = 0.05)
sc5_ego_BP <- enrichGO(gene = genelist_spleen_c5,OrgDb = org.Hs.eg.db,keyType = "SYMBOL",ont = "BP", pAdjustMethod = "BH", pvalueCutoff = 0.05)
sc6_ego_BP <- enrichGO(gene = genelist_spleen_c6,OrgDb = org.Hs.eg.db,keyType = "SYMBOL",ont = "BP", pAdjustMethod = "BH", pvalueCutoff = 0.05)
sc7_ego_BP <- enrichGO(gene = genelist_spleen_c7,OrgDb = org.Hs.eg.db,keyType = "SYMBOL",ont = "BP", pAdjustMethod = "BH", pvalueCutoff = 0.05)
sc8_ego_BP <- enrichGO(gene = genelist_spleen_c8,OrgDb = org.Hs.eg.db,keyType = "SYMBOL",ont = "BP", pAdjustMethod = "BH", pvalueCutoff = 0.05)
sc0.EntrezID <- bitr(genelist_spleen_c0, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
sc1.EntrezID <- bitr(genelist_spleen_c1, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
sc2.EntrezID <- bitr(genelist_spleen_c2, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
sc3.EntrezID <- bitr(genelist_spleen_c3, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
sc4.EntrezID <- bitr(genelist_spleen_c4, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
sc5.EntrezID <- bitr(genelist_spleen_c5, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
sc6.EntrezID <- bitr(genelist_spleen_c6, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
sc7.EntrezID <- bitr(genelist_spleen_c7, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
sc8.EntrezID <- bitr(genelist_spleen_c8, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
> sc0_ggo <- groupGO(gene = genelist_spleen_c0,OrgDb = 'org.Hs.eg.db',keytype = "SYMBOL", ont = "CC", level= 3)
> head(sc0_ggo)
ID Description Count GeneRatio
GO:0005886 GO:0005886 plasma membrane 60 60/177
GO:0005628 GO:0005628 prospore membrane 0 0/177
GO:0005789 GO:0005789 endoplasmic reticulum membrane 13 13/177
GO:0019867 GO:0019867 outer membrane 0 0/177
GO:0031090 GO:0031090 organelle membrane 31 31/177
GO:0034357 GO:0034357 photosynthetic membrane 0 0/177
geneID
GO:0005886 MS4A1/CD79A/HLA-DRA/CD79B/CD74/HLA-DPB1/HLA-DRB5/HLA-DQB1/HLA-DRB1/HLA-DPA1/HLA-DQA2/CD37/HLA-DQA1/IGHD/FCER2/HLA-DMB/CD22/HLA-DMA/CD52/HLA-DOB/ADAM28/CD19/CD24/LY86/RPSA/TNFRSF13C/CD72/MARCH1/LTB/BLK/BLNK/GNG7/RALGPS2/SWAP70/SYPL1/NCF1/LAPTM5/HVCN1/BTK/EEF2/TSPAN13/STX7/CD40/FCRL2/SELL/TNFRSF13B/FCRL1/DRAM2/PLPP5/P2RX5/TSPAN3/PTPN6/RGS19/CD82/DAPP1/LAT2/INPP5D/RASGRP2/MARCKSL1/PNN
GO:0005628
GO:0005789 HLA-DRA/CD74/HLA-DPB1/HLA-DRB5/HLA-DQB1/HLA-DRB1/HLA-DPA1/HLA-DQA2/HLA-DQA1/MARCH1/SMIM14/RIC3/SEC62
GO:0019867
GO:0031090 HLA-DRA/CD74/HLA-DPB1/HLA-DRB5/HLA-DQB1/HLA-DRB1/HLA-DPA1/HLA-DQA2/HLA-DQA1/HLA-DMB/HLA-DMA/HLA-DOB/MARCH1/SYPL1/LAPTM5/HVCN1/SYNGR2/CYB561A3/STX7/SNX2/SELL/DRAM2/P2RX5/IMPDH2/RIC3/SLC25A6/GGA2/PLEKHF2/SMDT1/BLOC1S2/CHPT1
GO:0034357
write.table(sc0_ego_BP, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/pictures/12-6/sc0_ego_BP.csv", sep=",", row.names = FALSE)
sc0_ego_BP.png
作富集图
enrichMap(sc0_ego_BP)
enrichMap(sc1_ego_BP)
enrichMap(sc2_ego_BP)
enrichMap(sc3_ego_BP)
enrichMap(sc4_ego_BP)
enrichMap(sc5_ego_BP)
enrichMap(sc6_ego_BP)
enrichMap(sc7_ego_BP)
enrichMap(sc8_ego_BP)
12月15日:
library(dplyr)
library(clusterProfiler)
library(org.Hs.eg.db)
keytypes(org.Hs.eg.db)
[1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS" "ENTREZID"
[7] "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME" "GO" "GOALL"
[13] "IPI" "MAP" "OMIM" "ONTOLOGY" "ONTOLOGYALL" "PATH"
[19] "PFAM" "PMID" "PROSITE" "REFSEQ" "SYMBOL" "UCSCKG"
[25] "UNIGENE" "UNIPROT"
spleen_genelist = bitr(spleen_genelist, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
head(spleen_genelist)
SYMBOL ENTREZID
1 MS4A1 931
2 CD79A 973
3 HLA-DRA 3122
4 CD79B 974
5 CD74 972
6 HLA-DPB1 3115
以上不保存,重做:
library(dplyr)
library(clusterProfiler)
library(org.Hs.eg.db)
keytypes(org.Hs.eg.db)
[1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS" "ENTREZID"
[7] "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME" "GO" "GOALL"
[13] "IPI" "MAP" "OMIM" "ONTOLOGY" "ONTOLOGYALL" "PATH"
[19] "PFAM" "PMID" "PROSITE" "REFSEQ" "SYMBOL" "UCSCKG"
[25] "UNIGENE" "UNIPROT"
head(spleen.markers)
p_val avg_logFC pct.1 pct.2 p_val_adj cluster gene
MS4A1 2.410289e-189 1.592266 0.974 0.258 3.773307e-185 0 MS4A1
CD79A 1.245084e-177 1.485304 0.971 0.270 1.949180e-173 0 CD79A
HLA-DRA 1.792014e-151 1.370242 1.000 0.556 2.805397e-147 0 HLA-DRA
CD79B 2.280181e-149 1.265725 0.938 0.315 3.569624e-145 0 CD79B
CD74 3.217361e-144 1.256090 0.997 0.871 5.036778e-140 0 CD74
HLA-DPB1 4.238869e-137 1.154889 0.997 0.616 6.635949e-133 0 HLA-DPB1
spleen_genelist = spleen.markers[,2]
names(spleen_genelist) = as.character(spleen.markers[,7])
spleen_genelist = sort(spleen_genelist, decreasing = TRUE)
head(spleen_genelist)
IGHG3 S100A8 IGHG1 S100A9 IGHA1 IGHG4
5.564483 5.244824 5.221544 5.142906 5.027861 4.971419
spleen_ENTREZ <- bitr(names(spleen_genelist), fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
head(spleen_ENTREZ)
SYMBOL ENTREZID
1 IGHG3 3502
2 S100A8 6279
3 IGHG1 3500
4 S100A9 6280
5 IGHA1 3493
6 IGHG4 3503
spleen_ggo <- groupGO(gene = spleen_ENTREZ$ENTREZID,
OrgDb = org.Hs.eg.db,
ont = "BP",
level = 3,
readable = TRUE)
head(spleen_ggo)
Description Count
GO:0019953 sexual reproduction 67
GO:0019954 asexual reproduction 0
GO:0022414 reproductive process 149
GO:0032504 multicellular organism reproduction 70
GO:0032505 reproduction of a single-celled organism 0
GO:0061887 reproduction of symbiont in host 0
spleen_ego <- enrichGO(gene=spleen_ENTREZ$ENTREZID,
OrgDb = org.Hs.eg.db,
ont = "BP",
pAdjustMethod = "BH",
pvalueCutoff = 0.01,
qvalueCutoff = 0.05,
readable = TRUE)
head(spleen_ego)
ID Description GeneRatio
GO:0042119 GO:0042119 neutrophil activation 211/1931
GO:0002283 GO:0002283 neutrophil activation involved in immune response 208/1931
GO:0043312 GO:0043312 neutrophil degranulation 207/1931
GO:0002446 GO:0002446 neutrophil mediated immunity 208/1931
GO:0006613 GO:0006613 cotranslational protein targeting to membrane 77/1931
GO:0045047 GO:0045047 protein targeting to ER 78/1931
KEGG分析:
search_kegg_organism('hsa', by='kegg_code')
kegg_code scientific_name common_name
1 hsa Homo sapiens human
human <- search_kegg_organism('Homo sapiens', by='scientific_name')
dim(human)
[1] 1 3
kk <- enrichKEGG(gene = spleen_ENTREZ$ENTREZID,
organism = 'hsa',
pvalueCutoff = 0.05)
head(kk)
ID Description GeneRatio BgRatio
hsa03010 hsa03010 Ribosome 81/1163 153/7470
hsa00190 hsa00190 Oxidative phosphorylation 67/1163 133/7470
hsa05012 hsa05012 Parkinson disease 67/1163 142/7470
hsa04141 hsa04141 Protein processing in endoplasmic reticulum 73/1163 165/7470
hsa04145 hsa04145 Phagosome 67/1163 152/7470
hsa05169 hsa05169 Epstein-Barr virus infection 80/1163 201/7470
mkk <- enrichMKEGG(gene = spleen_ENTREZ$ENTREZID,organism = 'hsa')
head(mkk)
ID Description GeneRatio BgRatio
hsa03010 hsa03010 Ribosome 81/1163 153/7470
hsa00190 hsa00190 Oxidative phosphorylation 67/1163 133/7470
hsa05012 hsa05012 Parkinson disease 67/1163 142/7470
hsa04141 hsa04141 Protein processing in endoplasmic reticulum 73/1163 165/7470
hsa04145 hsa04145 Phagosome 67/1163 152/7470
hsa05169 hsa05169 Epstein-Barr virus infection 80/1163 201/7470
enrichMap(spleen_ego, n=15)
cnetplot(spleen_ego, showCategory= 2, categorySize="pvalue", foldChange=spleen_genelist)
同上述操作步骤,将脾脏的每一个cluster做KEGG分析:
sc0_kk <- enrichKEGG(gene = sc0.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
sc1_kk <- enrichKEGG(gene = sc1.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
sc2_kk <- enrichKEGG(gene = sc2.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
sc3_kk <- enrichKEGG(gene = sc3.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
sc4_kk <- enrichKEGG(gene = sc4.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
sc5_kk <- enrichKEGG(gene = sc5.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
sc6_kk <- enrichKEGG(gene = sc6.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
sc7_kk <- enrichKEGG(gene = sc7.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
sc8_kk <- enrichKEGG(gene = sc8.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
sc0_mkk <- enrichMKEGG(gene = sc0.EntrezID$ENTREZID,organism = 'hsa')
sc1_mkk <- enrichMKEGG(gene = sc1.EntrezID$ENTREZID,organism = 'hsa')
sc2_mkk <- enrichMKEGG(gene = sc2.EntrezID$ENTREZID,organism = 'hsa')
sc3_mkk <- enrichMKEGG(gene = sc3.EntrezID$ENTREZID,organism = 'hsa')
sc4_mkk <- enrichMKEGG(gene = sc4.EntrezID$ENTREZID,organism = 'hsa')
sc5_mkk <- enrichMKEGG(gene = sc5.EntrezID$ENTREZID,organism = 'hsa')
sc6_mkk <- enrichMKEGG(gene = sc6.EntrezID$ENTREZID,organism = 'hsa')
sc7_mkk <- enrichMKEGG(gene = sc7.EntrezID$ENTREZID,organism = 'hsa')
sc8_mkk <- enrichMKEGG(gene = sc8.EntrezID$ENTREZID,organism = 'hsa')
enrichMap(sc3_kk, n = 8)
enrichMap(sc4_kk, n = 8)
enrichMap(sc5_kk, n = 8)
enrichMap(sc6_kk, n = 8)
enrichMap(sc7_kk, n = 8)
enrichMap(sc8_kk, n = 8)
write.csv(sc0_ego_BP, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_22/sc0_ego_BP.csv")
write.csv(sc1_ego_BP, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_22/sc1_ego_BP.csv")
write.csv(sc2_ego_BP, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_22/sc2_ego_BP.csv")
write.csv(sc3_ego_BP, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_22/sc3_ego_BP.csv")
write.csv(sc4_ego_BP, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_22/sc4_ego_BP.csv")
write.csv(sc5_ego_BP, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_22/sc5_ego_BP.csv")
write.csv(sc6_ego_BP, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_22/sc6_ego_BP.csv")
write.csv(sc7_ego_BP, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_22/sc7_ego_BP.csv")
write.csv(sc8_ego_BP, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_22/sc8_ego_BP.csv")
#导出KEGG的富集信息
write.csv(sc0_kk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc0_kk.csv")
write.csv(sc1_kk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc1_kk.csv")
write.csv(sc2_kk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc2_kk.csv")
write.csv(sc3_kk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc3_kk.csv")
write.csv(sc4_kk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc4_kk.csv")
write.csv(sc5_kk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc5_kk.csv")
write.csv(sc6_kk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc6_kk.csv")
write.csv(sc7_kk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc7_kk.csv")
write.csv(sc8_kk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc8_kk.csv")
#导出MKEGG的富集信息
write.csv(sc0_mkk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc0_mkk.csv")
write.csv(sc1_mkk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc1_mkk.csv")
write.csv(sc2_mkk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc2_mkk.csv")
write.csv(sc3_mkk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc3_mkk.csv")
write.csv(sc4_mkk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc4_mkk.csv")
write.csv(sc5_mkk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc5_mkk.csv")
write.csv(sc6_mkk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc6_mkk.csv")
write.csv(sc7_mkk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc7_mkk.csv")
write.csv(sc8_mkk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc8_mkk.csv")
12月25日
因需要simplify进行GO精简,之前的不太规范,重新做。
library(clusterProfiler)
library(org.Hs.eg.db)
head(spleen.markers)
p_val avg_logFC pct.1 pct.2 p_val_adj cluster gene
MS4A1 2.410289e-189 1.592266 0.974 0.258 3.773307e-185 0 MS4A1
CD79A 1.245084e-177 1.485304 0.971 0.270 1.949180e-173 0 CD79A
HLA-DRA 1.792014e-151 1.370242 1.000 0.556 2.805397e-147 0 HLA-DRA
CD79B 2.280181e-149 1.265725 0.938 0.315 3.569624e-145 0 CD79B
CD74 3.217361e-144 1.256090 0.997 0.871 5.036778e-140 0 CD74
HLA-DPB1 4.238869e-137 1.154889 0.997 0.616 6.635949e-133 0 HLA-DPB1
spleen_genelist = spleen.markers[,2]
names(spleen_genelist) = as.character(spleen.markers[,7])
spleen_genelist = sort(spleen_genelist, decreasing = TRUE)
head(spleen_genelist)
IGHG3 S100A8 IGHG1 S100A9 IGHA1 IGHG4
5.564483 5.244824 5.221544 5.142906 5.027861 4.971419
spleen_ENTREZ <- bitr(names(spleen_genelist), fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
head(spleen_ENTREZ)
SYMBOL ENTREZID
1 IGHG3 3502
2 S100A8 6279
3 IGHG1 3500
4 S100A9 6280
5 IGHA1 3493
6 IGHG4 3503
spleen_ego <- enrichGO(gene=spleen_ENTREZ$ENTREZID,
OrgDb = org.Hs.eg.db,
ont = "BP",
pAdjustMethod = "BH",
pvalueCutoff = 0.05,
qvalueCutoff = 0.2,
readable = TRUE)
spleen.simp<- simplify(spleen_ego) # 去除重复GO
dotplot(spleen.simp, showCategory=10, title="GO_biological process")
每个cluster的也一样,去除重复GO:
sc0_ego_BP_simp <- simplify(sc0_ego_BP)
sc1_ego_BP_simp <- simplify(sc1_ego_BP)
sc2_ego_BP_simp <- simplify(sc2_ego_BP)
sc3_ego_BP_simp <- simplify(sc3_ego_BP)
sc4_ego_BP_simp <- simplify(sc4_ego_BP)
sc5_ego_BP_simp <- simplify(sc5_ego_BP)
sc6_ego_BP_simp <- simplify(sc6_ego_BP)
sc7_ego_BP_simp <- simplify(sc7_ego_BP)
sc8_ego_BP_simp <- simplify(sc8_ego_BP)
dotplot(sc0_ego_BP_simp, showCategory=10, title="GO biological process of cluster 0")
dotplot(sc1_ego_BP_simp, showCategory=10, title="GO biological process of cluster 1")
dotplot(sc2_ego_BP_simp, showCategory=10, title="GO biological process of cluster 2")
dotplot(sc3_ego_BP_simp, showCategory=10, title="GO biological process of cluster 3")
dotplot(sc4_ego_BP_simp, showCategory=10, title="GO biological process of cluster 4")
dotplot(sc5_ego_BP_simp, showCategory=10, title="GO biological process of cluster 5")
dotplot(sc6_ego_BP_simp, showCategory=10, title="GO biological process of cluster 6")
dotplot(sc7_ego_BP_simp, showCategory=10, title="GO biological process of cluster 7")
dotplot(sc8_ego_BP_simp, showCategory=10, title="GO biological process of cluster 8")
spleen.simp_ego.jpeg
ego_sc0_simp.jpeg
ego_sc1_simp.jpeg
ego_sc2_simp.jpeg
ego_sc3_simp.jpeg
ego_sc4_simp.jpeg
ego_sc5_simp.jpeg
ego_sc6_simp.jpeg
ego_sc7_simp.jpeg
ego_sc8_simp.jpeg
KEGG
sc0_kk <- enrichKEGG(gene = sc0.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
sc1_kk <- enrichKEGG(gene = sc1.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
sc2_kk <- enrichKEGG(gene = sc2.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
sc3_kk <- enrichKEGG(gene = sc3.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
sc4_kk <- enrichKEGG(gene = sc4.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
sc5_kk <- enrichKEGG(gene = sc5.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
sc6_kk <- enrichKEGG(gene = sc6.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
sc7_kk <- enrichKEGG(gene = sc7.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
sc8_kk <- enrichKEGG(gene = sc8.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
dotplot(sc0_kk, showCategory=10, title="KEGG enrichment of cluster 0")
dotplot(sc1_kk, showCategory=10, title="KEGG enrichment of cluster 1")
dotplot(sc2_kk, showCategory=10, title="KEGG enrichment of cluster 2")
dotplot(sc3_kk, showCategory=10, title="KEGG enrichment of cluster 3")
dotplot(sc4_kk, showCategory=10, title="KEGG enrichment of cluster 4")
dotplot(sc5_kk, showCategory=10, title="KEGG enrichment of cluster 5")
dotplot(sc6_kk, showCategory=10, title="KEGG enrichment of cluster 6")
dotplot(sc7_kk, showCategory=10, title="KEGG enrichment of cluster 7")
dotplot(sc8_kk, showCategory=10, title="KEGG enrichment of cluster 8")