实验记录14: clusterProfiler富集分析

2019-04-08 本文已影响167人 MC学公卫

之前的实验记录写的比较乱，这几天又开始做富集分析了，重写一次。这次写清楚一些！

Y叔的document镇楼：
http://www.bioconductor.org/packages/release/bioc/vignettes/clusterProfiler/inst/doc/clusterProfiler.html

1. 准备工作：

安装clusterProfiler，下载注释数据包

if (!requireNamespace("BiocManager", quietly = TRUE))
       install.packages("BiocManager")
BiocManager::install("clusterProfiler", version = "3.8")
# 下载人类的注释数据库，若物种是其他的请下载其他的！
BiocManager::install("org.Hs.eg.db", version = "3.8")

# 设置工作路径
setwd("/Users/mingchen/Desktop/thesis/data/immune_census/bone_marrow/scanpy/MKgene&CellType/3_29_MKgenes")
# 加载包
library(clusterProfiler)
library(org.Hs.eg.db)

在开始前，你要准备一串你要研究的基因，比如像这样：

BM-7_PC16.txt

2. GO（基因功能）富集分析

处理基因列表：

gs = read.table("./BM-7_PC16.txt")
gs = as.list(gs)
gs = gs[[1]]

将SYMBOL形式的基因名转化为ENTREZ ID。
这一步其实是不一定需要的，GO富集分析可以输入SYMBOL形式的基因列表，但是KEGG只能输入ENTREZ ID。

gs = bitr(gs, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
head(gs)
  SYMBOL ENTREZID
1   NKG7     4818
2   GNLY    10578
3   GZMB     3002
4    B2M      567
5   CST7     8530
6   GZMA     3001

GO分析：

ego.bp = enrichGO(  gene         =  gs$ENTREZID,
                    OrgDb         = org.Hs.eg.db,
                    ont           = "BP",
                    pAdjustMethod = "BH",
                    pvalueCutoff  = 0.05,
                    qvalueCutoff  = 0.2,
                    readable      = TRUE)

ont = "BP"的意思是biological process，出了BP外还有CC和MF可以选择，它们分别是：
BP，Biological Process 生物学过程
MF，Molecular Function 分子功能
CC，Cellular Component 细胞组成

去除重复GO：

bp.simp<- simplify(ego.bp) 
# 将信息导出为csv文件：
write.csv(bp.simp, file = "./GO/ego/table/ego_7.csv")

可以作点图：

dotplot(bp.simp, showCategory=10,title="GO_biological process_7")

或者条形图：

barplot(bp.simp,showCategory=10)

富集网络图：

emapplot(bp.simp)

3. KEGG（生物学通路）富集分析

kk <- enrichKEGG(gene         = gs$ENTREZID,
                 organism     = 'hsa',
                 pvalueCutoff = 0.05)
write.csv(kk,file="./GO/kegg/table/kegg7.csv")
dotplot(kk, showCategory=10,title="")

如何给过长的注释换行：

问题：

解决方法（摘自y叔公众号biobabble）：

如果使用google搜索”ggplot2 word wrap”的话，第一条链接里就有答案。

这个问题其实很简单，用stringr包的str_wrap来完成文本自动换行就行了。这里使用clusterProfiler的barplot来演示一下：

library(stringr)
library(ggplot2)
library(clusterProfiler)
data(geneList)
de <- names(geneList)[1:100]
x <- enrichKEGG(de)
p <- barplot(x) 
p + scale_x_discrete(labels=function(x) str_wrap(x, width=10))