【DoRothEA】单细胞转录因子分析
今天学习另外一个转录因子活性预测工具:DoRothEA。很多文章用的也是这个工具,看起来比pySCENIC好像快很多。其实,单细胞转录因子分析本质上是计算一种specific的得分,DoRothEA计算的是 Viper 得分。
======安装=====
官网地址是:
https://saezlab.github.io/dorothea/
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("dorothea")
或者用devtools安装:
devtools::install_github("saezlab/dorothea")
还需要一个依赖包:
BiocManager::install("viper")
=====例子测试====
还是用经典的pbmc的例子。
加载需要的库:
library(Seurat)
library(dorothea)
library(tidyverse)
library(viper)
把pbmc的数据load进来,分析和前面的很多例子一样
pbmc <- readRDS("pbmc.rds")
str(pbmc@meta.data)
下面,加载这个包自带的human的数据库
## We read Dorothea Regulons for Human:
dorothea_regulon_human <- get(data("dorothea_hs", package = "dorothea"))
## We obtain the regulons based on interactions with confidence level A, B and C
regulon <- dorothea_regulon_human %>% dplyr::filter(confidence %in% c("A","B","C"))
然后计算TF的活性,通过使用函数 run_viper() 在 DoRothEA 的regulons上运行 VIPER 以获得 TFs activity。 在 seurat 对象的情况下,该函数返回相同的 seurat 对象,其中包含一个名为 dorothea 的assay,其中包含slot数据中的 TFs activity。
pbmc <- run_viper(pbmc, regulon,options = list(method = "scale", minsize = 4,
eset.filter = FALSE, cores = 1, verbose = FALSE))
Assays(pbmc)
image.png
可以看到,这个pbmc数据集里面的约 三千个细胞都有自己的266个转录因子的活性得分。
image.png下面,我们试试看FindAllMarkers函数获取各个单细胞亚群特异性的转录因子。
DefaultAssay(object = pbmc) <- "dorothea"
table(Idents(pbmc))
image.png
pbmc.markers <- FindAllMarkers(object = pbmc,
only.pos = TRUE,
min.pct = 0.25,
thresh.use = 0.25)
DT::datatable(pbmc.markers)
write.csv(pbmc.markers,file='pbmc.markers.csv')
下面,我们尝试热图展示:
library(dplyr)
pbmc.markers$fc = pbmc.markers$pct.1 - pbmc.markers$pct.2
top10 <- pbmc.markers %>% group_by(cluster) %>% top_n(10, fc)
pbmc@assays$dorothea@data[1:4,1:4]
top10 =top10[top10$fc > 0.3,]
pbmc <- ScaleData(pbmc)
DoHeatmap(pbmc,top10$gene,size=3,slot = 'scale.data')
从热图可以看出,其实还是有一大部分TF在不同的细胞类型中特异性表达的。
image.png然后,我们尝试利用这个包来找下特异性比较好的TF regulons。官网有详细的文档介绍,比如如何选择TF activity per cell population,根据 previously computed VIPER scores on DoRothEA’s regulons.
首先也是获取Viper得分矩阵 ,根据不同细胞亚群进行归纳汇总 :
viper_scores_df <- GetAssayData(pbmc, slot = "scale.data", assay = "dorothea") %>%
data.frame(check.names = F) %>% t()
viper_scores_df[1:4,1:4]
image.png
## We create a data frame containing the cells and their clusters
CellsClusters <- data.frame(cell = names(Idents(pbmc)),
cell_type = as.character(Idents(pbmc)),
check.names = F)
head(CellsClusters)
## We create a data frame with the Viper score per cell and its clusters
viper_scores_clusters <- viper_scores_df %>%
data.frame() %>%
rownames_to_column("cell") %>%
gather(tf, activity, -cell) %>%
inner_join(CellsClusters)
## We summarize the Viper scores by cellpopulation
summarized_viper_scores <- viper_scores_clusters %>%
group_by(tf, cell_type) %>%
summarise(avg = mean(activity),
std = sd(activity))
# For visualization purposes, we select the 20 most variable TFs across clusters according to our scores.
head(summarized_viper_scores)
## We select the 20 most variable TFs. (20*9 populations = 180)
highly_variable_tfs <- summarized_viper_scores %>%
group_by(tf) %>%
mutate(var = var(avg)) %>%
ungroup() %>%
top_n(180, var) %>%
distinct(tf)
highly_variable_tfs
## We prepare the data for the plot
summarized_viper_scores_df <- summarized_viper_scores %>%
semi_join(highly_variable_tfs, by = "tf") %>%
dplyr::select(-std) %>%
spread(tf, avg) %>%
data.frame(row.names = 1, check.names = FALSE)
最后,整理好的数据如下所示:
summarized_viper_scores_df[1:4,1:4]
image.png
最终数据是挑选的top 20个TF,它们在不同的单细胞亚群里面的变化最大。这样就可以可视化这些细胞类型特异表达的TF regulon了。
有了数据,就很容易可视化:
palette_length = 100
my_color = colorRampPalette(c("Darkblue", "white","red"))(palette_length)
my_breaks <- c(seq(min(summarized_viper_scores_df), 0,
length.out=ceiling(palette_length/2) + 1),
seq(max(summarized_viper_scores_df)/palette_length,
max(summarized_viper_scores_df),
length.out=floor(palette_length/2)))
library(pheatmap)
viper_hmap <- pheatmap(t(summarized_viper_scores_df),fontsize=14,
fontsize_row = 10,
color=my_color, breaks = my_breaks,
main = "DoRothEA (ABC)", angle_col = 45,
treeheight_col = 0, border_color = NA)
image.png