clusterProfiler优雅对非模式物种进行KEGG富集分
2021-09-11 本文已影响0人
R语言数据分析指南
最近在写进行转录组数据的分析文档,前面介绍了如何根据差异基因绘制火山图,今天来介绍如何通过clusterProfiler包对非模式物种进行KEGG富集分析
加载R包
library(stringr)
library(clusterProfiler)
library(magrittr)
library(tidyverse)
- 提取出差异基因
pull函数将数据转换成字符型向量
gene <- read.delim("genes.counts.DESeq2.xls") %>%
filter(abs(log2FoldChange)>1 & padj < 0.05) %>%
pull(id)
[1] "TF21248" "TF06839" "TF12888" "TF15759" "TF00332" "TF01354" "TF37902"
[8] "TF07750" "TF24638" "TF15758" "TF26429" "TF29285" "TF42188" "TF31923"
[15] "TF02479" "TF20321" "TF01786" "TF35196" "TF14446" "TF38125" "TF38352"
2.导入基因功能注释信息
功能注释可以通过eggnog网站进行注释,由于做KEGG富集分析只需要KEGG_Pathway编号和基因信息,因此从功能注释表提取出相关信息即可
gene_pathway <- read.delim('query_seqs.fa.emapper.annotations.xls',
sep="\t",check.names=F) %>%
dplyr::select(1,KEGG_ko,KEGG_Pathway) %>%
separate_rows(KEGG_Pathway,sep=',',convert = F) %>%
filter(str_detect(KEGG_Pathway,'ko')) %>%
mutate(KEGG_ko=str_remove(KEGG_ko,"ko:")) %>%
select(3,1)
KEGG_Pathway query_name
<chr> <chr>
1 ko00230 TF01677
2 ko00240 TF01677
3 ko01100 TF01677
4 ko03020 TF01677
5 ko00230 TF01678
3.对Ko进行翻译
pathway_name <- clusterProfiler:::kegg_list("pathway") %>%
mutate(across("from",str_replace,"path:map","ko")) %>%
set_colnames(c("path_id","path_name"))
可以通过以下代码对单个Ko号进行注释
bitr_kegg("K03043","kegg", "Path", "ko") -> x
ko2name(x$Path) -> y
merge(x, y, by.x='Path', by.y='ko')
4.enricher进富集分析
dd <- enricher(gene,
TERM2GENE = gene_pathway,
TERM2NAME = pathway_name,
pvalueCutoff = 0.05,
qvalueCutoff = 0.05)
dd %>% as.data.frame() %>% view()
喜欢的小伙伴欢迎关注我的公众号 ,下回更新不迷路
R语言数据分析指南,持续分享数据可视化的经典案例及一些生信知识,希望对大家