单细胞挖掘思路单细胞流程

基于单细胞测序的蛋白活性推断(PISCES)分析流程笔记,凎!

2022-05-05  本文已影响0人  喂鸡组的阿翔

写在前面

What is PISCES?

PISCES workflow

1.加载PISCES 需要用到的功能包:
rm(list = ls())
options(warn = -1)
options(stringsAsFactors = F)

library(viper)
library(Seurat)
library(ggplot2)
library(ggpubr)
library(pheatmap)
library(RColorBrewer)
library(MUDAN)
library(umap)
library(igraph)
library(biomaRt)
library(uwot)
library(Matrix)
library(PISCES)
2.加载单细胞数据。
#加载质控并SCTransform整合后的Seurat Object
load(file = "sce_af_integrate.Rdata")
3.对细胞进行分群
# Seurat clustering
sce.combined.sct <- RunPCA(sce.combined.sct, verbose = FALSE)
sce.combined.sct <- RunUMAP(sce.combined.sct, dims = 1:30, verbose = FALSE)
sce.combined.sct <- FindNeighbors(sce.combined.sct, dims = 1:30, verbose = FALSE)
sce.combined.sct <- FindClusters(sce.combined.sct, method = "igraph" ,verbose = FALSE)
# PISCES clustering
sce.combined.sct <- CorDist(sce.combined.sct, pca.feats = 10)#运行时间长
sce.combined.sct <- LouvainKRange(sce.combined.sct, kmax = 100)#运行时间长

*我自己使用的单细胞数据集有60000+个细胞,因此上述代码中LouvainKRange()函数运行过程中会报错,简单检索后猜测是计算内容的矢量超出了R语言的上限(2^31-1),尝试减少细胞数量可以顺利运行,但咱不能无缘无故把质控后保留的细胞给删了;由于报错的内容和igraph这个包相关,进一步检索后发现有可能是LouvainKRange()这个函数在运行过程中把Seurat Object中的稀疏矩阵转换为了正常矩阵交给了igraph包中的graph_from_adjacency_matrix()函数。尝试过后发现将交给graph_from_adjacency_matrix()函数的对象转换为稀疏矩阵即可解决,代码如下很简单:

trace('LouvainClust', edit = T, where = asNamespace("PISCES"))
#在弹出的对话框中将这一段代码:
“graph.obj <- igraph::graph_from_adjacency_matrix(adj.mat)”
#改为
“graph.obj <- igraph::graph_from_adjacency_matrix(as(adj.mat, "sparseMatrix"))”
#保存后再次运行即可

这里我们就得到了基于PISCES的cell clusters,下一步使用MetaCells()函数为每一个cluster生成一个具有250个cells的表达矩阵用于后续ARACNe-network的构建;

4.构建ARACNe-network需要使用的cluster表达矩阵,下面是官方给的代码:
gexp.dist <- sce.combined.sct@assays$SCT@misc$dist.mat
pisces.metacell.mats <- MetaCells(as.matrix(sce.combined.sct@assays$RNA@counts),
                                  dist.mat = gexp.dist,
                                  clust.vect = sce.combined.sct@assays$SCT@misc$pisces.cluster)

如果你的数据是通过SCT转换后并使用Seurat包中的IntegrateData()函数整合的,那么上述代码就需要修改,因为你的pisces.cluster数据在Seurat Object中的存储对象不在SCT这个位置,integrated这个位置(具体需要你对Seurat Object这个对象的存储方式进行学习,网上资料很多,例如“生信技能树=。=!”),代码如下:

gexp.dist <- sce.combined.sct@assays$SCT@misc$dist.mat
pisces.metacell.mats <- MetaCells(as.matrix(sce.combined.sct@assays$RNA@counts),
                                  dist.mat = gexp.dist,
                                  clust.vect = sce.combined.sct@assays$integrated@misc$pisces.cluster)
5.保存Pisces-cluster结果并写出用于ARACNe-network构建
dir.create('pisces-clust_aracne-mats/')#当前工作路径下创建pisces-clust_aracne-mats文件夹
#将cluster表达矩阵以RDS的格式存储到pisces-clust_aracne-mats文件夹中
for (mcn in names(pisces.metacell.mats)) {
  saveRDS(pisces.metacell.mats[[mcn]], 
          file = paste('pisces-clust_aracne-mats/pisces-', 
                       mcn, '-arac.rds', sep = ''))
}
#你的PISCES clustering得到了几群细胞,这里你就会得到几个表达矩阵,每个表达矩阵中含有250个细胞样本

#再次读取表达矩阵,这里演示假如我得到了3群细胞的表达矩阵
pisces1 <- readRDS('pisces-clust_aracne-mats/pisces-1-arac.rds')
pisces2 <- readRDS('pisces-clust_aracne-mats/pisces-2-arac.rds')
pisces3 <- readRDS('pisces-clust_aracne-mats/pisces-3-arac.rds')
#存储为CSV格式
write.csv(pisces1,file = "pisces-1-arac.csv")
write.csv(pisces2,file = "pisces-2-arac.csv")
write.csv(pisces3,file = "pisces-3-arac.csv")

*将保存好的csv格式的文件修改为txt格式的文件用于后续ARACNe-network构建

6.ARACNe-network generation

从这里开始followARACNe-AP的tutorial,这部分内容可以基于windows平台的Terminal或者linux平台完成,我这里使用的是linux平台(租用的服务器),方法大同小异。以下使用的linux命令及其基础有不会的检索后都能查到。

mkdir JAVA_JDK#创建JAVA安装位置
mkdir ANTs    #创建ANTs安装位置
tar -zxvf /your/pathway/to/apache-ant-1.9.14-bin.tar.gz
tar -zxvf /your/pathway/to/jdk-8u131-linux-x64.tar.gz
ls #查看是否有“.profile”文件
vim .profile #使用文本编辑器打开并编辑“.profile”文件
export JAVA_HOME=/YOUR/PATHWAY/TO/jdk1.8.0_211 
export CLASSPATH=$:CLASSPATH:$JAVA_HOME/lib/ 
export PATH=$PATH:$JAVA_HOME/bin

export ANT_HOME=/YOUR/PATHWAY/TO/apache-ant-1.9.15/
export PATH=$ANT_HOME/bin:$PATH
source .profile
java -version
ant -version
ant main
#Step1 Calculate threshold with a fixed seed
java -Xmx5G -jar dist/aracne.jar -e test/pisces-1-arac.txt  -o outputFolder --tfs test/tfs.txt --pvalue 1E-8 --seed 1 \--calculateThreshold
#Step2: Run ARACNe on a single bootstrap
java -Xmx5G -jar dist/aracne.jar -e test/pisces-1-arac.txt  -o outputFolder --tfs test/tfs.txt --pvalue 1E-8 --seed 1
#Step3: Run 100 reproducible bootstraps
for i in {1..100}
do
java -Xmx5G -jar dist/aracne.jar -e test/pisces-1-arac.txt  -o outputFolder --tfs test/tfs.txt --pvalue 1E-8 --seed $i
done
#Step4: Consolidate bootstraps in the output folder
java -Xmx5G -jar dist/aracne.jar -o outputFolder --consolidate
#Step5: Run a single ARACNE with no bootstrap and no DPI
java -Xmx5G -jar dist/aracne.jar -e test/pisces-1-arac.txt  -o outputFolder --tfs test/tfs.txt --pvalue 1E-8 --seed 1 \ --nobootstrap --nodpi
#Step6: Consolidate bootstraps without Bonferroni correction
java -Xmx5G -jar dist/aracne.jar -o outputFolder --consolidate --nobonferroni
#!/bin/bash
for j in {1..14}
do
java -Xmx5G -jar dist/aracne.jar -e test/pisces-$j-arac.txt  -o outputFolder_$j --tfs test/sig-hugo.txt --pvalue 1E-8 --seed 1 \--calculateThreshold

java -Xmx5G -jar dist/aracne.jar -e test/pisces-$j-arac.txt  -o outputFolder_$j --tfs test/sig-hugo.txt --pvalue 1E-8 --seed 1

for i in {1..100} 
do 
java -Xmx5G -jar dist/aracne.jar -e test/pisces-$j-arac.txt  -o outputFolder_$j --tfs test/sig-hugo.txt --pvalue 1E-8 --seed $i 
done

java -Xmx5G -jar dist/aracne.jar -o outputFolder_$j --consolidate

java -Xmx5G -jar dist/aracne.jar -e test/pisces-$j-arac.txt  -o outputFolder_$j --tfs test/sig-hugo.txt --pvalue 1E-8 --seed 1 \ --nobootstrap --nodpi

java -Xmx5G -jar dist/aracne.jar -o outputFolder_$j --consolidate --nobonferroni
done
chmod +x aracen3.sh #给脚本一个执行权限
nohup ./aracen3.sh & #挂后台运行
jobs -l #查看脚本运行情况及进程号,关闭你的Xshell或MAC-Terminal后再次登录服务器该命令就看不到此进程了
pgrep java #查看脚本是否还在运行,关闭你的Xshell或MAC-Terminal后再次登录服务器该命令依旧可以查看进程
cat nohups.out #查看aracen3.sh运行过程中返回的信息
7.Protein activity based clustering analysis
#这里演示还是假如我们有三个cluster
RegProcess('pisces-1-net-final.tsv', mat1, out.dir = 'tutorial/', out.name = 'pisces-1-net-')
RegProcess('pisces-2-net-final.tsv', mat2, out.dir = 'tutorial/', out.name = 'pisces-2-net-')
RegProcess('pisces-3-net-final.tsv', mat3, out.dir = 'tutorial/', out.name = 'pisces-3-net-')

#RegProcess()函数生成的文件保存为rds格式,先使用readRDS()函数读进来。
c1.net <- readRDS('pisces-1-net-pruned.rds')
c2.net <- readRDS('pisces-2-net-pruned.rds')
c3.net <- readRDS('pisces-3-net-pruned.rds')
#使用list()函数构建net.list文件
net.list<-list(c1.net,c2.net,c3.net)

## viper and protein-activity based clustering
## net.list here would be a list of networks generated from ARACNe
sce.combined.sct <- AddPISCESAssay(sce.combined.sct)#将Seurat对象里的count矩阵拿出来命名为PISCES,然后放回到Seurat对象里去,并设置为当前默认的active.assay
sce.combined.sct <- CPMTransform(sce.combined.sct)#针对PISCESassay进行CPMTransform()、GESTransform()标准化。
sce.combined.sct <- GESTransform(sce.combined.sct)
sce.combined.sct <- PISCESViper(sce.combined.sct, net.list)#蛋白活性推断
sce.combined.sct <- CorDist(sce.combined.sct, pca.feats = 10)#根据蛋白活性矩阵计算细胞间距离
# 降维
sce.combined.sct <- MakeUMAP(sce.combined.sct)
sce.combined.sct <- MakeMDS(sce.combined.sct)
sce.combined.sct <- LouvainKRange(sce.combined.sct, kmin = 5, kmax = 100)
sce.combined.sct <- MWUMrs(sce.combined.sct)#找每个分群的master regulator

#作图
plot.df <- data.frame('MDS1' = sce.combined.sct@assays$PISCES@misc$mds[,1],
                      'MDS2' = sce.combined.sct@assays$PISCES@misc$mds[,2],
                      'UMAP1' = sce.combined.sct@assays$PISCES@misc$umap[,1],
                      'UMAP2' = sce.combined.sct@assays$PISCES@misc$umap[,2],
                      'Cluster' = as.factor(sce.combined.sct@assays$PISCES@misc$pisces.cluster))

umap.plot <- ggplot(plot.df, aes(UMAP1, UMAP2)) + geom_point(aes(color = Cluster)) +
  ggtitle(paste(tissue.title, ' (UMAP)', sep = '')) + plot.theme
print(umap.plot)

mds.plot <- ggplot(plot.df, aes(MDS1, MDS2)) + geom_point(aes(color = Cluster)) +
  ggtitle(paste(tissue.title, ' (MDS)', sep = '')) + plot.theme
print(mds.plot)

参考资料:

[1].https://github.com/califano-lab/PISCES
[2].Ding H, Douglass EF Jr, Sonabend AM, Mela A, Bose S, Gonzalez C, Canoll PD, Sims PA, Alvarez MJ, Califano A. Quantitative assessment of protein activity in orphan tissues and single cells using the metaVIPER algorithm. Nat Commun. 2018 Apr 16;9(1):1471. doi: 10.1038/s41467-018-03843-3. PMID: 29662057; PMCID: PMC5902599.
[3].PISCES: A pipeline for the Systematic, Protein Activity-based Analysis of Single Cell RNA Sequencing Data | bioRxiv
[4].Obradovic A, Chowdhury N, Haake SM, Ager C, Wang V, Vlahos L, Guo XV, Aggen DH, Rathmell WK, Jonasch E, Johnson JE, Roth M, Beckermann KE, Rini BI, McKiernan J, Califano A, Drake CG. Single-cell protein activity analysis identifies recurrence-associated renal tumor macrophages. Cell. 2021 May 27;184(11):2988-3005.e16. doi: 10.1016/j.cell.2021.04.038. Epub 2021 May 20. PMID: 34019793; PMCID: PMC8479759.
[5].https://github.com/califano-lab/ARACNe-AP

上一篇 下一篇

猜你喜欢

热点阅读