R与DAVID注释的区别2020-01-04
library(ggplot2)
library(clusterProfiler)
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
library(org.Hs.eg.db)
df <- bitr(unique(DEG$symbol), fromType = "SYMBOL",
toType = c( "ENTREZID"),
OrgDb = org.Hs.eg.db)
head(df)
DEG=DEG[rownames(DEG)%in%gene,]
head(DEG)
DEG=merge(DEG,df,by.y='SYMBOL',by.x='symbol')
head(DEG)
save(DEG,file = 'anno_DEG_SM.Rdata')
gene_up= DEG[DEGg == 'DOWN','ENTREZID']
gene_diff=c(gene_up,gene_down)
gene_all=as.character(DEG[ ,'ENTREZID'] )
data(geneList, package="DOSE")
head(geneList)
boxplot(geneList)
boxplot(DEG$logFC)
geneList=DEGENTREZID
geneList=sort(geneList,decreasing = T)
以上是geneList 的做取,共得到659个差异基因与其ENTREZID
head(geneList)
3952 6781 1356 29923 8516 6581
6.729912 6.439553 5.081165 5.007642 4.877560 4.814260
接下来对基因进行KEGG分析
kk.diff <- enrichKEGG(gene = gene_diff,
organism = 'hsa',
pvalueCutoff = 0.05)
head(kk.diff)[,1:6]
head(kk.diff)[,1:6]
[1] ID Description GeneRatio BgRatio
[5] pvalue p.adjust
<0 行> (或0-长度的row.names)
当P值选取0.05的时候没有有价值的结果。
kk.diff <- enrichKEGG(gene = gene_diff,
organism = 'hsa',
pvalueCutoff = 0.99)
head(kk.diff)[,1:6]
ID Description
hsa04514 hsa04514 Cell adhesion molecules (CAMs)
hsa04151 hsa04151 PI3K-Akt signaling pathway
hsa04510 hsa04510 Focal adhesion
hsa05418 hsa05418 Fluid shear stress and atherosclerosis
hsa04015 hsa04015 Rap1 signaling pathway
GeneRatio BgRatio pvalue p.adjust
hsa04514 12/237 147/7946 0.001431242 0.1867090
hsa04151 21/237 354/7946 0.001988208 0.1867090
hsa04510 14/237 199/7946 0.002454145 0.1867090
hsa05418 11/237 139/7946 0.002861440 0.1867090
hsa04015 14/237 210/7946 0.004001455 0.2088759
barplot(kk.diff,showCategory = 20,title="Pathway enrichment")
当P值选为0.99是出来的结果也较少,且矫正P值颇高。 kegg_jianshu.png
GO注释的话情况类似。
接下来用同样的一批基因去网页在线注释。
用R做的结果显示的是矫正P值,数值基本都大于0.05。
且通路太少,仅有五个,我已经将P值的最低阈值设定为0.99了还是很少。
以上R与DAVID、KOBAS的注释对比,我觉得KOBAS的结果更好一些,但是绘图什么的可能不如R方便。