FELLA: an introduction to an enrichment analysis package for metabolomics

2018-12-26  Dayueban

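Overview

Pathway enrichment analysis is very useful for understanding the biology behind metabolomics data. Its purpose is to place the affected metabolites in context, using the prior knowledge contained in metabolic pathways. Interpreting the resulting pathway lists is still challenging, however, because pathways overlap and cross-talk with one another.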

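About the paper

Main contribution

The paper introduces FELLA, an R package that performs network-based enrichment analysis starting from the differential metabolites identified in an earlier analysis. The results cover pathways, modules, enzymes, reactions, and compounds. Besides a pathway list, FELLA also reports the intermediate entities (modules, enzymes, reactions) that connect the input metabolites to those pathways. This can reveal how pathways intersect under the condition being studied and point to candidate enzymes and metabolites.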

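Workflow

The figure below summarizes how the package is used.

Figure 1. Design of the FELLA R package. (I) Choose the organism and build the database; (II) supply the metabolite list and choose the algorithm; (III) generate and export the results.
  1. Block I: local database
  2. Block II: enrichment analysis
  3. Block III: exporting results

FELLA also offers an interactive mode built on the shiny package.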

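Installation and demo

Installation


# The package is hosted on Bioconductor

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# Install from the Bioconductor release that matches your R version;
# pinning version = "3.8" fails on newer R installations
BiocManager::install("FELLA")

library(FELLA)  # load the package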

Loading the database

# Block I builds the database; here we load a prebuilt sample instead

data("FELLA.sample")
class(FELLA.sample)
## [1] "FELLA.DATA"
## attr(,"package")
## [1] "FELLA"
show(FELLA.sample)
## General data:
## - KEGG graph:
##   * Nodes:  670 
##   * Edges:  1677 
##   * Density:  0.003741383 
##   * Categories:
##     + pathway [2]
##     + module [6]
##     + enzyme [58]
##     + reaction [279]
##     + compound [325]
##   * Size:  366.9 Kb 
## - KEGG names are ready.
## -----------------------------
## Hypergeometric test:
## - Matrix is ready
##   * Dim:  325 x 2 
##   * Size:  25 Kb
## -----------------------------
## Heat diffusion:
## - Matrix not loaded.
## - RowSums are ready.
## -----------------------------
## PageRank:
## - Matrix not loaded.
## - RowSums are ready.
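
The sample object above is prebuilt. For a real analysis, Block I means building a database for your own organism. The sketch below follows the pattern of the FELLA quickstart vignette as I recall it; the wrapper name `build_fella_database`, the default organism code, and the directory are my own assumptions for illustration. The wrapper is only defined here, not run.

```r
# Hypothetical wrapper around FELLA's three database functions.
# Needs the FELLA package and network access to KEGG when called.
build_fella_database <- function(organism = "hsa",
                                 database_dir = file.path(tempdir(), "fella_db")) {
  # Download the KEGG graph for the chosen organism
  graph <- FELLA::buildGraphFromKEGGREST(organism = organism)

  # Write the matrices and vectors needed for the diffusion method to disk
  FELLA::buildDataFromGraph(
    keggdata.graph = graph,
    databaseDir = database_dir,
    internalDir = FALSE,
    matrices = "diffusion",
    normality = "diffusion",
    niter = 50)

  # Load the database back as a FELLA.DATA object
  FELLA::loadKEGGdata(
    databaseDir = database_dir,
    internalDir = FALSE,
    loadMatrix = "diffusion")
}
```

Calling `build_fella_database()` would return an object that can be passed as `data` in place of `FELLA.sample`. Building a full organism graph can take a while.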

Loading the demo data

# Block II starts from the input: the list of metabolites that the earlier analysis flagged as affected

data("input.sample")
input.full <- c(input.sample, paste0("intruder", 1:10))

show(input.full)
##  [1] "C00143"     "C00546"     "C04225"     "C16328"     "C00091"    
##  [6] "C15979"     "C16333"     "C05264"     "C05258"     "C00011"    
## [11] "C00083"     "C00044"     "C05266"     "C00479"     "C05280"    
## [16] "C01352"     "C05268"     "C16329"     "C00334"     "C05275"    
## [21] "C14145"     "C00081"     "C04253"     "C00027"     "C00111"    
## [26] "C00332"     "C00003"     "C00288"     "C05467"     "C00164"    
## [31] "intruder1"  "intruder2"  "intruder3"  "intruder4"  "intruder5" 
## [36] "intruder6"  "intruder7"  "intruder8"  "intruder9"  "intruder10"

# Use `defineCompounds` to check which compounds match the database

myAnalysis <- defineCompounds(
    compounds = input.full, 
    data = FELLA.sample)

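# Some compounds identified in your earlier analysis may not map to any compound in the KEGG database.
# `defineCompounds` drops those automatically: `getExcluded` lists them, and `getInput` lists the compounds that matched.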

getInput(myAnalysis)
##  [1] "C00003" "C00011" "C00027" "C00044" "C00081" "C00083" "C00091" "C00111"
##  [9] "C00143" "C00164" "C00288" "C00332" "C00334" "C00479" "C00546" "C01352"
## [17] "C04225" "C04253" "C05258" "C05264" "C05266" "C05268" "C05275" "C05280"
## [25] "C05467" "C14145" "C15979" "C16328" "C16329" "C16333"

getExcluded(myAnalysis)
##  [1] "intruder1"  "intruder2"  "intruder3"  "intruder4"  "intruder5" 
##  [6] "intruder6"  "intruder7"  "intruder8"  "intruder9"  "intruder10"
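
Conceptually, the split reported by `getInput` and `getExcluded` is a set intersection and a set difference against the compound IDs in the database. A minimal base-R sketch with a made-up three-compound background (an illustration of the idea, not FELLA's internal code):

```r
# Hypothetical background and candidate list
background <- c("C00003", "C00011", "C00027")
candidates <- c("C00011", "intruder1", "C00003")

# Compounds found in the background are kept as input
matched <- intersect(candidates, background)

# Everything else is excluded from the analysis
excluded <- setdiff(candidates, background)

print(matched)
print(excluded)
```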

Enrichment analysis

myAnalysis <- enrich(
    compounds = input.full, 
    method = listMethods(), 
    approx = "normality", 
    data = FELLA.sample)
#No background compounds specified. Default background will be used.
#Running hypergeom...
#Starting hypergeometric p-values calculation...
#Done.
#Running diffusion...
#Computing p-scores through the specified distribution.
#Done.
#Running PageRank...
#Computing p-scores through the specified distribution.
#Using provided damping factor...
#Done.
#Warning message:
#In defineCompounds(compounds = compounds, compoundsBackground = compoundsBackground,  :
#  Some compounds were introduced as affected but they do not belong to the background. These compounds will be excluded from the analysis. Use 'getExcluded' to see them.
show(myAnalysis)
## Compounds in the input: 30
##  [1] "C00003" "C00011" "C00027" "C00044" "C00081" "C00083" "C00091" "C00111"
##  [9] "C00143" "C00164" "C00288" "C00332" "C00334" "C00479" "C00546" "C01352"
## [17] "C04225" "C04253" "C05258" "C05264" "C05266" "C05268" "C05275" "C05280"
## [25] "C05467" "C14145" "C15979" "C16328" "C16329" "C16333"
## Background compounds: all available compounds (default)
## -----------------------------
## Hypergeometric test: ready.
## Top 2 p-values:
##     hsa00640     hsa00010 
## 8.540386e-09 9.999888e-01 
## 
## -----------------------------
## Heat diffusion: ready.
## P-scores under 0.05:  86
## -----------------------------
## PageRank: ready.
## P-scores under 0.05:  70
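
The hypergeometric p-values in the output above come from a classical over-representation test: how likely is it to see at least this many input compounds in a pathway if the input were drawn at random from the background? A minimal base-R sketch follows. The background size (325) and input size (30) match the sample data, but the pathway size and hit counts are invented for illustration, and the function name `ora_pvalue` is my own.

```r
# Over-representation p-value: P(X >= hits) when input_size compounds
# are drawn without replacement from the background
ora_pvalue <- function(hits, pathway_size, background_size, input_size) {
  phyper(
    q = hits - 1,
    m = pathway_size,
    n = background_size - pathway_size,
    k = input_size,
    lower.tail = FALSE)
}

# Hypothetical pathway with 20 compounds, 12 of them in the input
p_enriched <- ora_pvalue(hits = 12, pathway_size = 20,
                         background_size = 325, input_size = 30)

# Hypothetical pathway with 20 compounds, none in the input
p_none <- ora_pvalue(hits = 0, pathway_size = 20,
                     background_size = 325, input_size = 30)

print(c(enriched = p_enriched, none = p_none))
```

This test only scores pathways, which is why the sample output reports just two p-values; the diffusion and PageRank methods score every node in the graph.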

Visualization

plot(
    x = myAnalysis, 
    method = "hypergeom", 
    main = "My first enrichment using the hypergeometric test in FELLA", 
    threshold = 1, 
    data = FELLA.sample)
Figure 2. Hypergeometric test plot
plot(
    x = myAnalysis, 
    method = "diffusion", 
    main = "My first enrichment using the diffusion analysis in FELLA", 
    threshold = 0.1, 
    data = FELLA.sample)
Figure 3. Diffusion plot
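
To give a feel for what the diffusion scores mean, here is a toy heat-diffusion sketch in base R. It assumes a simplified model: input compounds inject one unit of heat each, only pathway nodes are linked to a cold boundary, and the stationary temperatures are compared with a null model in which the same number of compounds is picked at random (a z-score, in the spirit of `approx = "normality"`). The seven-node graph is invented, and this is not FELLA's actual implementation.

```r
# Toy graph (hypothetical): pathways P*, reactions R*, compounds C*
nodes <- c("P1", "P2", "R1", "R2", "C1", "C2", "C3")
edges <- rbind(
  c("P1", "R1"), c("P2", "R2"),
  c("R1", "C1"), c("R1", "C2"),
  c("R2", "C2"), c("R2", "C3"))

# Symmetric adjacency matrix of the undirected graph
A <- matrix(0, length(nodes), length(nodes), dimnames = list(nodes, nodes))
A[edges] <- 1
A[edges[, 2:1]] <- 1

# Conductance matrix: graph Laplacian plus a unit link from each pathway
# node to the cold boundary, so heat can leave only through pathways
K <- diag(rowSums(A)) - A + diag(as.numeric(startsWith(nodes, "P")))
R <- solve(K)  # column j = temperatures when compound j injects unit heat
dimnames(R) <- list(nodes, nodes)

compounds <- c("C1", "C2", "C3")
input <- "C1"  # hypothetical affected compound
Rc <- R[, compounds]
temperature <- rowSums(Rc[, input, drop = FALSE])

# Null model: k of the n compounds drawn uniformly without replacement
n <- length(compounds)
k <- length(input)
null_mean <- k / n * rowSums(Rc)
null_var <- k * (n - k) / (n * (n - 1)) * (rowSums(Rc^2) - rowSums(Rc)^2 / n)

# Small p-score = node is hotter than expected by chance
z <- (temperature - null_mean) / sqrt(null_var)
p_score <- pnorm(z, lower.tail = FALSE)
print(round(p_score, 3))
```

In this toy graph P1 sits closer to the input compound than P2, so it ends up warmer and receives the smaller p-score. Every node gets a score, which is how a network-based method can highlight reactions and enzymes as well as pathways.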
plot(
    x = myAnalysis, 
    method = "pagerank", 
    main = "My first enrichment using the PageRank analysis in FELLA", 
    threshold = 0.1, 
    data = FELLA.sample)
Figure 4. PageRank plot
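
The PageRank method can be pictured as a random walk that keeps restarting from the input compounds. The base-R sketch below runs personalized PageRank by power iteration on an invented directed graph. The edge direction (compound to reaction to pathway), the damping factor of 0.85, and the function name `personalized_pagerank` are assumptions for illustration, not FELLA's actual implementation.

```r
# Toy directed graph (hypothetical): compounds -> reactions -> pathways
nodes <- c("P1", "P2", "R1", "R2", "C1", "C2", "C3")
edges <- rbind(
  c("C1", "R1"), c("C2", "R1"), c("C2", "R2"), c("C3", "R2"),
  c("R1", "P1"), c("R2", "P2"))
A <- matrix(0, length(nodes), length(nodes), dimnames = list(nodes, nodes))
A[edges] <- 1  # A[from, to] = 1

personalized_pagerank <- function(A, start, damping = 0.85,
                                  tol = 1e-12, max_iter = 1000) {
  out_degree <- rowSums(A)
  # Row-normalize so each row with outgoing edges sums to 1
  P <- A / ifelse(out_degree > 0, out_degree, 1)
  dangling <- out_degree == 0
  score <- start
  for (i in seq_len(max_iter)) {
    # Follow an edge; walkers on nodes without outgoing edges restart
    walked <- as.vector(score %*% P) + sum(score[dangling]) * start
    new_score <- damping * walked + (1 - damping) * start
    converged <- sum(abs(new_score - score)) < tol
    score <- new_score
    if (converged) break
  }
  setNames(score, rownames(A))
}

# Restart distribution: uniform over the hypothetical input compounds
input <- c("C1", "C2")
start <- as.numeric(nodes %in% input) / length(input)
pr <- personalized_pagerank(A, start)
print(round(pr, 3))
```

Here P1 is reachable from both input compounds and collects more score than P2, while C3 is never visited and stays at zero.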

Exporting results

myTempDir <- getwd()
myExp_csv <- paste0(myTempDir, "/table.csv")
exportResults(
    format = "csv", 
    file = myExp_csv, 
    method = "pagerank", 
    threshold = 0.1, 
    object = myAnalysis, 
    data = FELLA.sample)
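
If you would rather inspect the results inside R than write a file, FELLA also provides `generateResultsTable`, which returns the table as a data frame. The sketch below follows the call pattern shown in the package vignette as I recall it; it is guarded so that it only runs when FELLA is installed, and the object names `analysis` and `results_table` are my own.

```r
# Runs only if the FELLA package is available
if (requireNamespace("FELLA", quietly = TRUE)) {
  library(FELLA)
  data("FELLA.sample")
  data("input.sample")

  # Enrichment with the diffusion method on the sample data
  analysis <- enrich(
    compounds = input.sample,
    method = "diffusion",
    approx = "normality",
    data = FELLA.sample)

  # Nodes with a p-score below the threshold, as a data frame
  results_table <- generateResultsTable(
    object = analysis,
    data = FELLA.sample,
    method = "diffusion",
    threshold = 0.1)

  print(head(results_table))
}
```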

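Summary

That covers the general usage of FELLA. The algorithms behind the package deserve closer study. Unlike the web-based tool MetaboAnalyst, FELLA runs locally, so the analysis can be faster and does not depend on a network connection, which is why I prefer working with local software.

References

[1] Paper: FELLA: an R package to enrich metabolomics data
[2] FELLA package page: FELLA
[3] FELLA GitHub repository: github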
