一个R函数完成所有的富集分析

2022-07-31 本文已影响0人 Z_bioinfo

仅仅一个函数，就可以完成差异表达基因的GO，GSEA，KEGG 分析与可视化，下面来看一看

01 需要的R包

library(AnnotationDbi)
library(org.Hs.eg.db)  基因注释需要
library(clusterProfiler)
library(stringr)
library(enrichplot)#画图需要
library(stats)
library(dplyr)
library(msigdbr)

02 定义R函数

首先定义一个R函数，目的是：读入数据；进行富集分析。

#定义R函数
allEnrich <- function(df) {
df = read.csv(df, stringsAsFactors = F,
                header = T)
names(df) = c("gene", "avg_logFC")
#msigdbr 包提供多个物种的MSigDB数据
b = msigdbr(species = "Homo sapiens", category = "C2") %>%
      select(gs_name, entrez_gene)
a1 = df$avg_logFC
a2 = easyConvert(
species = "HUMAN",
queryList = df$gene,
queryType = "SYMBOL"
    ) 
names(a1) = a2$ENTREZID
b1 = a1[order(a1, decreasing = T)]
GSEA分析
gsearesults = GSEA(b1, TERM2GENE = b)
#进行CC分析
ego1 <- enrichGO(
gene         = df$gene,
OrgDb         = org.Hs.eg.db,
keyType       = 'SYMBOL',
ont           = "CC",
pAdjustMethod = "BH",
pvalueCutoff  = 0.01,
qvalueCutoff  = 0.05
)
#绘图
p1 = dotplot(ego1, showCategory = 15) 
ggplot2::ggsave("CC.pdf", p1, width = 7, height = 8)
#输出结果，保存到本地
write.csv(ego1@result, "Cell_component.csv")
=========================================
#进行BP分析
ego2 <- enrichGO(
gene         = df$gene,
OrgDb         = org.Hs.eg.db,
keyType       = 'SYMBOL',
ont           = "BP",
pAdjustMethod = "BH",
pvalueCutoff  = 0.01,
qvalueCutoff  = 0.05
)
#绘图
p2 = dotplot(ego2, showCategory = 15) 
ggplot2::ggsave("BP.pdf", p2, width = 7, height = 8)
#输出结果，保存到本地
write.csv(ego2@result, "Biological_process.csv")
=======================================
#进行MF分析
ego3 <- enrichGO(
gene         = df$gene,
OrgDb         = org.Hs.eg.db,
keyType       = 'SYMBOL',
ont           = "MF",
pAdjustMethod = "BH",
pvalueCutoff  = 0.01,
qvalueCutoff  = 0.05
)
#绘图
p3 = dotplot(ego3, showCategory = 15) 
ggplot2::ggsave("MF.pdf", p3, width = 7, height = 8)
#输出结果，保存到本地
write.csv(ego3@result, "Molecular_function.csv")
=========================================
GSEA可视化
p4 = gsearesults@result %>%
mutate(Description = gsub("_", " ", Description)) %>%
mutate(Description = reorder(Description, setSize)) %>% head(15) %>%
ggplot(aes(
x = enrichmentScore,
y = Description,
size = setSize,
color = pvalue
)) +
geom_point() +
scale_size(range = c(1, 5), name = "Size") +
scale_color_gradient(low = "blue", high = "red") +
xlab("GeneRatio") + ylab("") +
theme(axis.text.y = element_text(size = 10, color = "black")) 

ggplot2::ggsave("GSEA.pdf", p4, width = 7, height = 8)
write.csv(gsearesults@result, "GSEA_Results.csv")
==================================
KEGG分析
ego4 = enrichKEGG(
gene = names(a1),
organism = "hsa",
keyType = "kegg",
pvalueCutoff = 0.05,
pAdjustMethod = "BH",
minGSSize = 10,
maxGSSize = 500,
qvalueCutoff = 0.2,
use_internal_data = FALSE
)
p5 = dotplot(ego4, showCategory = 15) 
ggplot2::ggsave("KEGG.pdf", p5, width = 7, height = 8)
write.csv(ego4@result, "KEGG_pathway.csv")
    cat("Successfully~") 
}

03 读入数据分析

使用定义好的R函数：allEnrich，读入数据进行分析，数据格式必须为2列：

第一列 geneSymbol

第二类 Log2FoldChange

#给出包含geneSymbol，Log2FoldChange所在的csv文件
 allEnrich("significant_degs.csv")

 preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
wrong orderBy parameter; set to default `orderBy = "x"`
Scale for 'y' is already present. Adding another scale for 'y', which will replace the existing scale.
wrong orderBy parameter; set to default `orderBy = "x"`
Scale for 'y' is already present. Adding another scale for 'y', which will replace the existing scale.
wrong orderBy parameter; set to default `orderBy = "x"`
Scale for 'y' is already present. Adding another scale for 'y', which will replace the existing scale.
Reading KEGG annotation online:

Reading KEGG annotation online:

wrong orderBy parameter; set to default `orderBy = "x"`
Scale for 'y' is already present. Adding another scale for 'y', which will replace the existing scale.
Successfully~

成功完成分析后，这是富集分析输出的所有结果：

image.png

查看结果文件：

image.png

一个R函数完成所有的富集分析

01 需要的R包

02 定义R函数

03 读入数据分析

猜你喜欢

热点阅读