FELLA: An Introduction to a Metabolomics Enrichment Analysis Tool

2018-12-26, by Dayueban


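Overview

Pathway enrichment analysis is a very useful way to understand the biological meaning behind metabolomics data. Using the prior knowledge encoded in metabolic pathways, it places the affected metabolites in context by relating them to the entities upstream and downstream of them. Interpreting generic metabolic pathways remains difficult, however, because pathways overlap and cross-talk with one another.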

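About the paper

Title: FELLA: an R package to enrich metabolomics data
Journal: BMC Bioinformatics
Authors: Sergio Picart-Armada (first author), Alexandre Perera Lluna (corresponding author)
Lab homepage: B2SLab
Affiliation: Universitat Politècnica de Catalunya (UPC) and others
Research areas: bioinformatics and bioengineering, cardiovascular disease, metabolomics data processing, software development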

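Main contributions

The paper introduces FELLA, an R package that performs network-based enrichment analysis starting from a list of differential metabolites obtained in a prior analysis. Its results cover pathways, modules, enzymes, reactions and metabolites. Beyond a list of pathways, FELLA therefore also reports the intermediate entities (modules, enzymes, reactions) that link the input metabolites to those pathways. This shows how pathways intersect under the specific study condition and points to potential target enzymes and metabolites.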

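Workflow

The figure below summarizes how the package is used.

Figure 1. Design of the FELLA R package: (I) choosing the organism and database; (II) inputting the metabolite list and choosing the algorithm; (III) generating and exporting the results.

Block I: local database
Block II: enrichment analysis
Block III: exporting results

FELLA also provides an interactive mode built on the shiny package.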

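Installation and demo

Installation

```r
# FELLA is distributed through Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# Without `version`, BiocManager installs the Bioconductor release
# that matches the running version of R
BiocManager::install("FELLA")

library(FELLA)  # load the package
```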

Loading the database

```r
# Block I is building the database; here we load the small prebuilt
# sample database that ships with the package
data("FELLA.sample")
class(FELLA.sample)
## [1] "FELLA.DATA"
## attr(,"package")
## [1] "FELLA"
show(FELLA.sample)
## General data:
## - KEGG graph:
##   * Nodes:  670 
##   * Edges:  1677 
##   * Density:  0.003741383 
##   * Categories:
##     + pathway [2]
##     + module [6]
##     + enzyme [58]
##     + reaction [279]
##     + compound [325]
##   * Size:  366.9 Kb 
## - KEGG names are ready.
## -----------------------------
## Hypergeometric test:
## - Matrix is ready
##   * Dim:  325 x 2 
##   * Size:  25 Kb
## -----------------------------
## Heat diffusion:
## - Matrix not loaded.
## - RowSums are ready.
## -----------------------------
## PageRank:
## - Matrix not loaded.
## - RowSums are ready.
```

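Note that a FELLA.DATA object only needs to be built once, with buildGraphFromKEGGREST (which retrieves the KEGG data through the KEGGREST API) followed by buildDataFromGraph; after that it should not be modified by hand.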

Loading the demo data

```r
# Block II starts with the input: the list of affected metabolites
# (KEGG compound IDs) obtained in the earlier analysis
data("input.sample")
# Append 10 made-up IDs to show how unmatched compounds are handled
input.full <- c(input.sample, paste0("intruder", 1:10))

show(input.full)
##  [1] "C00143"     "C00546"     "C04225"     "C16328"     "C00091"    
##  [6] "C15979"     "C16333"     "C05264"     "C05258"     "C00011"    
## [11] "C00083"     "C00044"     "C05266"     "C00479"     "C05280"    
## [16] "C01352"     "C05268"     "C16329"     "C00334"     "C05275"    
## [21] "C14145"     "C00081"     "C04253"     "C00027"     "C00111"    
## [26] "C00332"     "C00003"     "C00288"     "C05467"     "C00164"    
## [31] "intruder1"  "intruder2"  "intruder3"  "intruder4"  "intruder5" 
## [36] "intruder6"  "intruder7"  "intruder8"  "intruder9"  "intruder10"

# defineCompounds() maps the input onto the compounds in the database
myAnalysis <- defineCompounds(
    compounds = input.full, 
    data = FELLA.sample)

# Some compounds identified earlier may not be present in the KEGG database.
# Unmatched compounds are excluded automatically: getExcluded() lists them,
# and getInput() returns the compounds that were matched
getInput(myAnalysis)
##  [1] "C00003" "C00011" "C00027" "C00044" "C00081" "C00083" "C00091" "C00111"
##  [9] "C00143" "C00164" "C00288" "C00332" "C00334" "C00479" "C00546" "C01352"
## [17] "C04225" "C04253" "C05258" "C05264" "C05266" "C05268" "C05275" "C05280"
## [25] "C05467" "C14145" "C15979" "C16328" "C16329" "C16333"

getExcluded(myAnalysis)
##  [1] "intruder1"  "intruder2"  "intruder3"  "intruder4"  "intruder5" 
##  [6] "intruder6"  "intruder7"  "intruder8"  "intruder9"  "intruder10"
```

Note that matching is exact, so watch out for stray spaces or tab characters in the IDs.
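Because matching is exact, it can help to normalize IDs before calling defineCompounds(). The following is a minimal base-R sketch; cleanKeggIds() and looksLikeKeggCompound() are hypothetical helpers, not FELLA functions, and the check assumes the usual KEGG COMPOUND ID shape of `C` followed by five digits.

```r
# Hypothetical helper (not part of FELLA): tidy IDs before defineCompounds()
cleanKeggIds <- function(ids) {
  ids <- trimws(ids)         # strip leading/trailing spaces, tabs, newlines
  ids <- toupper(ids)        # "c00143" -> "C00143"
  unique(ids[nzchar(ids)])   # drop empty strings and duplicates
}

# Hypothetical helper: TRUE for strings shaped like KEGG compound IDs
looksLikeKeggCompound <- function(ids) grepl("^C[0-9]{5}$", ids)

raw <- c(" C00143", "C00546\t", "c04225", "C00143", "", "intruder1")
ids <- cleanKeggIds(raw)
ids                               # -> "C00143" "C00546" "C04225" "INTRUDER1"
ids[!looksLikeKeggCompound(ids)]  # entries worth checking by hand
```

The cleaned vector can then be passed as `compounds =` to defineCompounds() or enrich(); anything flagged by the pattern check is likely to end up in getExcluded().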

Enrichment analysis

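Enrichment: once the FELLA.DATA and FELLA.USER objects are in place, running the enrichment is straightforward. Three methods are available:
- Hypergeometric test (method = "hypergeom")
- Diffusion (method = "diffusion"), which extracts a relevant subnetwork
- PageRank (method = "pagerank"), similar to diffusion, but it scores (ranks) the nodes of the network with the PageRank algorithm
Statistics: the diffusion and PageRank methods offer two ways of computing the statistical scores:
- Normal approximation (approx = "normality"): z-scores computed from the expected value and covariance matrix of the scores under the null hypothesis
- Monte Carlo trials (approx = "simulation"): scores computed from Monte Carlo trials with randomized inputs
Enrichment: methods, approximations and the all-in-one wrapper
- The enrich function combines defineCompounds, runHypergeom, runDiffusion and runPagerank, so the whole analysis runs in a single step.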
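The hypergeometric method is classic over-representation analysis. As a rough sketch of the idea (not FELLA's internal code), the p-value for one pathway is the probability of drawing at least k pathway compounds when n input compounds are sampled from a background of N compounds, m of which belong to the pathway. hypergeomPvalue() and pathwayEnrichment() are hypothetical helpers and the data are made up; in FELLA the default background is all compounds available in the database.

```r
# Hypothetical helper: upper-tail hypergeometric p-value P(X >= k).
# N = background size, m = background compounds in the pathway,
# n = input compounds, k = input compounds that fall in the pathway.
hypergeomPvalue <- function(k, m, n, N) {
  # phyper(q, ..., lower.tail = FALSE) gives P(X > q), so use q = k - 1
  phyper(k - 1, m, N - m, n, lower.tail = FALSE)
}

# Hypothetical helper working on plain vectors of IDs
pathwayEnrichment <- function(input, pathwayCompounds, background) {
  input <- intersect(input, background)  # unmatched IDs are excluded
  pathwayCompounds <- intersect(pathwayCompounds, background)
  k <- length(intersect(input, pathwayCompounds))
  hypergeomPvalue(k, m = length(pathwayCompounds), n = length(input),
                  N = length(background))
}

# Toy data (made up): 10 background compounds, a 4-compound pathway
background <- sprintf("C%05d", 1:10)
pathway <- background[1:4]
input <- c("C00001", "C00002", "C00003", "intruder1")
# All 3 matched inputs hit the pathway: p = choose(4, 3) / choose(10, 3) = 4/120
pathwayEnrichment(input, pathway, background)
```

With a real database the same test is repeated for every pathway, which is why a small p-value such as the one for hsa00640 in the output below marks a pathway that collects more input compounds than chance would suggest.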

```r
# method = listMethods() runs every available method (hypergeom, diffusion, pagerank)
myAnalysis <- enrich(
    compounds = input.full, 
    method = listMethods(), 
    approx = "normality", 
    data = FELLA.sample)
#No background compounds specified. Default background will be used.
#Running hypergeom...
#Starting hypergeometric p-values calculation...
#Done.
#Running diffusion...
#Computing p-scores through the specified distribution.
#Done.
#Running PageRank...
#Computing p-scores through the specified distribution.
#Using provided damping factor...
#Done.
#Warning message:
#In defineCompounds(compounds = compounds, compoundsBackground = compoundsBackground,  :
#  Some compounds were introduced as affected but they do not belong to the background. These compounds will be excluded from the analysis. Use 'getExcluded' to see them.
show(myAnalysis)
## Compounds in the input: 30
##  [1] "C00003" "C00011" "C00027" "C00044" "C00081" "C00083" "C00091" "C00111"
##  [9] "C00143" "C00164" "C00288" "C00332" "C00334" "C00479" "C00546" "C01352"
## [17] "C04225" "C04253" "C05258" "C05264" "C05266" "C05268" "C05275" "C05280"
## [25] "C05467" "C14145" "C15979" "C16328" "C16329" "C16333"
## Background compounds: all available compounds (default)
## -----------------------------
## Hypergeometric test: ready.
## Top 2 p-values:
##     hsa00640     hsa00010 
## 8.540386e-09 9.999888e-01 
## 
## -----------------------------
## Heat diffusion: ready.
## P-scores under 0.05:  86
## -----------------------------
## PageRank: ready.
## P-scores under 0.05:  70
```
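The p-scores reported for diffusion and PageRank come from one of the two approximations listed earlier. The sketch below contrasts them on a toy score. It assumes, for illustration only, that a node's score is a weighted sum of the 0/1 input indicator vector, s = sum(w * x), and that the null hypothesis draws input sets of the same size uniformly from the background. The weights `w` and the helpers nullMoments(), pNormality() and pSimulation() are hypothetical, not FELLA's API.

```r
# Exact null mean and variance of s = sum(w * x) when x marks a uniformly
# random subset of size n out of N compounds (sampling without replacement)
nullMoments <- function(w, n) {
  N <- length(w)
  p <- n / N
  list(
    mean = p * sum(w),
    # Var(x_i) = p(1 - p) and Cov(x_i, x_j) = -p(1 - p) / (N - 1)
    var = p * (1 - p) / (N - 1) * (N * sum(w^2) - sum(w)^2)
  )
}

# Analogue of approx = "normality": z-score, then upper-tail normal p-score
pNormality <- function(w, inputIdx) {
  mom <- nullMoments(w, length(inputIdx))
  z <- (sum(w[inputIdx]) - mom$mean) / sqrt(mom$var)
  pnorm(z, lower.tail = FALSE)
}

# Analogue of approx = "simulation": empirical p-score from random input sets
pSimulation <- function(w, inputIdx, nsim = 10000) {
  observed <- sum(w[inputIdx])
  nullScores <- replicate(nsim, sum(w[sample.int(length(w), length(inputIdx))]))
  # +1 in numerator and denominator keeps the estimate away from exactly 0
  (sum(nullScores >= observed) + 1) / (nsim + 1)
}

# Toy example (made-up weights): the input hits the 5 highest-weight compounds
set.seed(1)
w <- 1:20
inputIdx <- 16:20
pNormality(w, inputIdx)
pSimulation(w, inputIdx)
```

The normal approximation needs only the closed-form mean and variance, so it is fast; the simulation makes no distributional assumption, but its resolution is limited to about 1 / nsim.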

Visualization

With method = "hypergeom", the plot shows the top pathways and the input metabolites associated with them.

```r
plot(
    x = myAnalysis, 
    method = "hypergeom", 
    main = "My first enrichment using the hypergeometric test in FELLA", 
    threshold = 1, 
    data = FELLA.sample)
```

Figure 2. Hypergeometric test plot

With method = "diffusion", the plot shows the relevant subnetwork, including modules, enzymes and reactions.

```r
plot(
    x = myAnalysis, 
    method = "diffusion", 
    main = "My first enrichment using the diffusion analysis in FELLA", 
    threshold = 0.1, 
    data = FELLA.sample)
```

Figure 3. Diffusion plot
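Diffusion treats the input compounds as heat sources and lets heat spread through the KEGG graph; roughly, the nodes that end up unusually warm form the reported subnetwork. The toy sketch below computes steady-state temperatures on a four-node chain. It is a generic formulation chosen for illustration, not FELLA's exact model: every node is assumed to leak heat at a constant rate k so that the linear system (L + kI)T = G has a unique solution, and heatDiffusion() is a hypothetical helper.

```r
# Hypothetical helper: steady-state heat diffusion on an undirected graph.
# A: symmetric adjacency matrix with row names; sources: names of input nodes;
# k: heat lost per unit temperature at every node (an assumption of this sketch)
heatDiffusion <- function(A, sources, k = 0.1) {
  stopifnot(isSymmetric(A), !is.null(rownames(A)))
  L <- diag(rowSums(A)) - A                  # graph Laplacian L = D - A
  G <- as.numeric(rownames(A) %in% sources)  # one unit of heat per input node
  # Steady state: net flow through edges plus leakage balances the input
  temp <- solve(L + diag(k, nrow(A)), G)
  setNames(as.numeric(temp), rownames(A))
}

# Toy undirected chain mirroring the KEGG levels (made-up graph)
nodes <- c("compound", "reaction", "enzyme", "pathway")
A <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
A[cbind(1:3, 2:4)] <- 1
A <- A + t(A)
temp <- heatDiffusion(A, sources = "compound")
temp
```

At steady state the heat leaving the graph equals the heat injected, so with one source and k = 0.1 the temperatures sum to 1 / 0.1 = 10, and they decrease with distance from the source.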

With method = "pagerank", the plot is similar to the diffusion one.

```r
plot(
    x = myAnalysis, 
    method = "pagerank", 
    main = "My first enrichment using the PageRank analysis in FELLA", 
    threshold = 0.1, 
    data = FELLA.sample)
```

Figure 4. PageRank plot
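PageRank scores nodes by a random walk that follows edges and, with probability 1 - d, jumps back to the input compounds (personalized PageRank, also called random walk with restart). Here is a generic power-iteration sketch, assuming a directed adjacency matrix and the common textbook damping factor d = 0.85. personalizedPageRank() is a hypothetical helper; FELLA's own implementation may differ in details such as edge direction and the handling of nodes without outgoing edges.

```r
# Hypothetical helper: personalized PageRank by power iteration.
# A[i, j] = 1 for a directed edge i -> j; restart: weights of the input nodes
personalizedPageRank <- function(A, restart, d = 0.85, tol = 1e-10, maxit = 1000) {
  n <- nrow(A)
  outdeg <- rowSums(A)
  # Row-stochastic transition matrix (rows of dangling nodes stay all zero)
  P <- A / ifelse(outdeg > 0, outdeg, 1)
  r <- restart / sum(restart)
  pr <- rep(1 / n, n)
  for (i in seq_len(maxit)) {
    dangling <- sum(pr[outdeg == 0])  # mass stuck in nodes without out-edges
    # Follow an edge with probability d (dangling mass restarts), else restart
    nextPr <- d * (as.vector(pr %*% P) + dangling * r) + (1 - d) * r
    converged <- sum(abs(nextPr - pr)) < tol
    pr <- nextPr
    if (converged) break
  }
  setNames(pr, rownames(A))
}

# Toy directed chain compound -> reaction -> enzyme -> pathway (made up)
nodes <- c("compound", "reaction", "enzyme", "pathway")
A <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
A[cbind(1:3, 2:4)] <- 1
pr <- personalizedPageRank(A, restart = c(1, 0, 0, 0))
pr
```

On this chain the walker restarting at "compound" gives that node the score (1 - d) / (1 - d^4), about 0.314, and each step downstream multiplies the score by d; the scores always sum to 1.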

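Exporting results

Export the data (the pathway annotation results):

```r
# Write the PageRank results (threshold 0.1) to table.csv in the working directory
myTempDir <- getwd()
myExp_csv <- file.path(myTempDir, "table.csv")
exportResults(
    format = "csv", 
    file = myExp_csv, 
    method = "pagerank", 
    threshold = 0.1, 
    object = myAnalysis, 
    data = FELLA.sample)
```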

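Summary

That concludes this overview of FELLA's basic usage; the computational methods behind the software deserve more careful study. Unlike the web-based MetaboAnalyst, FELLA runs locally in R, so it can be faster and, once the KEGG database has been built (which requires access to KEGG), analyses run without a network connection. That is why I prefer it.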

References

[1] Paper: FELLA: an R package to enrich metabolomics data
[2] FELLA package page: FELLA
[3] FELLA GitHub repository: github