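CytoTRACE2: inferring cellular differentiation potency from single-cell transcriptomes (a reference for the pseudotime starting point)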
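In an earlier post we used CytoTRACE to infer cellular developmental potential ("CytoTRACE: helping infer the starting point of pseudotime (with a bonus at the end)"). Its output can also serve as a reference for choosing the root of a pseudotime trajectory. CytoTRACE2 came out earlier this year, and it is not an upgraded version of the CytoTRACE package. We are covering it for two reasons. First, to keep up to date: CytoTRACE2 uses an improved algorithm that is worth applying. Second, some readers ran into errors when using CytoTRACE2, so we wanted to see whether we hit the same errors and, if so, how to fix them. CytoTRACE2 comes in separate R and Python versions. The R version mostly works without problems, but some readers also need Python, so this post covers both. The method supports human and mouse data. For details, see the paper: Minji Kang, Jose Juan Almagro Armenteros, Gunsagar S. Gulati*, Rachel Gleyzer, Susanna Avagyan, Erin L. Brown, Wubing Zhang, Abul Usmani, Noah Earland, Zhenqin Wu, James Zou, Ryan C. Fields, David Y. Chen, Aadel A. Chaudhuri, Aaron M. Newman. bioRxiv 2024.03.19.585637; doi: https://doi.org/10.1101/2024.03.19.585637 (preprint)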
- The predicted potency scores additionally provide a continuous measure of developmental potential, ranging from 0 (differentiated) to 1 (totipotent).
- Underlying this method is a novel, interpretable deep learning framework trained and validated across 31 human and mouse scRNA-seq datasets encompassing 28 tissue types, collectively spanning the developmental spectrum.
- This framework learns multivariate gene expression programs for each potency category and calibrates outputs across the full range of cellular ontogeny, facilitating direct cross-dataset comparison of developmental potential in an absolute space.
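To make the "absolute space" point concrete, here is a toy illustration with made-up scores; nothing in it calls CytoTRACE2. Per-dataset min-max rescaling, used here only as a stand-in for any purely relative ordering, maps two datasets onto the same 0 to 1 range and hides the fact that every cell in one dataset is less differentiated than every cell in the other. Absolute scores keep that difference visible.

```python
def min_max(scores):
    """Rescale scores to [0, 1] within a single dataset (relative ordering only)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


# Made-up absolute potency scores on the 0 (differentiated) to 1 (totipotent) scale
dataset_a = [0.10, 0.25, 0.40]  # hypothetical mostly mature sample
dataset_b = [0.55, 0.70, 0.85]  # hypothetical progenitor-rich sample

rel_a = min_max(dataset_a)
rel_b = min_max(dataset_b)

# Relative scores look the same for both datasets ...
print([round(x, 6) for x in rel_a], [round(x, 6) for x in rel_b])
# ... while the absolute scores still show that every cell in B is more potent than every cell in A
print(min(dataset_b) > max(dataset_a))
```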
R version (official CytoTRACE2 repository: https://github.com/digitalcytometry/cytotrace2)
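Load the data and install the R package. As before, we use the data from a Nature paper that we analyzed in an earlier post, for reference: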
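### Load data and install packages
library(Seurat)
# sce1: the annotated Seurat object from the earlier post, loaded beforehand
DimPlot(sce1, label = TRUE)
# Keep only the four clusters analyzed here
sce_sub <- sce1[, sce1$cluster %in% c("YSMP", "GMP", "Myeloblast", "Monocyte")]
# Install CytoTRACE2 from the R subdirectory of the GitHub repository
devtools::install_github("digitalcytometry/cytotrace2", subdir = "cytotrace2_r")
library(CytoTRACE2)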
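Running CytoTRACE2 is simple: its input can be either an expression matrix or a Seurat object. As a check, we compared runs on the counts and on the data slot, and both gave the same results.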
# Run the main function cytotrace2()
cytotrace2_sce <- cytotrace2(sce_sub,              # Seurat object
                             is_seurat = TRUE,
                             slot_type = "counts", # "counts" or "data" both work
                             species = "human")    # set the species; the default is mouse
class(cytotrace2_sce)
# [1] "Seurat"
# attr(,"package")
# [1] "SeuratObject"
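# Alternative: pass an expression matrix (genes x cells) instead of a Seurat object
# cytotrace2_res <- cytotrace2(sce_sub@assays$RNA$data,  # expression matrix
#                              species = 'human')         # set the species; the default is mouse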
#
# class(cytotrace2_res)
# [1] "data.frame"
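Why would counts and data give the same result? One plausible explanation, which we have not verified against the CytoTRACE2 source, is that the model depends on how genes are ranked within each cell. Seurat's LogNormalize, log1p(count / total × 10000), is strictly increasing within a cell, so it leaves within-cell gene ranks unchanged. The toy check below uses hypothetical counts for a single cell and illustrates only that property.

```python
import math


def log_normalize(counts, scale_factor=10000):
    """Seurat-style LogNormalize for one cell: log1p(count / total * scale_factor)."""
    total = sum(counts)
    return [math.log1p(c / total * scale_factor) for c in counts]


def within_cell_ranks(values):
    """Rank genes within one cell (0 = lowest); stable sort keeps ties in gene order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, gene_idx in enumerate(order):
        ranks[gene_idx] = rank
    return ranks


# Hypothetical raw counts for five genes in a single cell
cell_counts = [0, 3, 12, 1, 7]
cell_data = log_normalize(cell_counts)

# The normalized values differ from the counts, but the gene ranking does not
print(within_cell_ranks(cell_counts) == within_cell_ranks(cell_data))
```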
Visualizing the results:
# Cell-type annotation: one row per cell, row names = cell IDs
annotation <- data.frame(phenotype = sce_sub@meta.data$cluster,
                         row.names = colnames(sce_sub))
# Plotting: plotData() generates several plots at once and stores them in a list; access each one with $
plots <- plotData(cytotrace2_result = cytotrace2_sce,
annotation = annotation,
is_seurat = TRUE)
# If these plots are going into a paper, they can be customized
# Because they are ggplot objects, customization is easy
# For example, change the theme
library(ggplot2)
for(i in 1:(length(plots)-1)) {
plots[[i]] <- plots[[i]]+theme_bw()
}
# View (and save) the plots one at a time
# #p1
# plots$CytoTRACE2_UMAP
# #p2
# plots$CytoTRACE2_Potency_UMAP
# #p3
# plots$CytoTRACE2_Relative_UMAP
# #p4
# plots$Phenotype_UMAP
# #p5
# plots$CytoTRACE2_Boxplot_byPheno
# For convenience, we combine them into a single figure here
library(cowplot)
plot_grid(plots[[1]], plots[[3]], plots[[4]],
          plots[[5]], ncol = 2) # ncol sets the number of columns in the grid
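The conclusions agree with those from the original CytoTRACE. Plots 1 to 5 show the differentiation potential of each cell type; overall, both the analysis and the visualization are very convenient. Next, let's look at the Python version.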
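Since the title promises a reference for the pseudotime starting point, one simple heuristic (our own suggestion, not a CytoTRACE2 function) is to treat the cluster with the highest median potency score as a candidate root. The sketch assumes you have exported per-cell scores (for example the `CytoTRACE2_Score` metadata column, assuming that is its name in your version) together with cluster labels; the scores and cluster names below are made up.

```python
from collections import defaultdict
from statistics import median


def rank_clusters_by_potency(scores, clusters):
    """Return (cluster, median score) pairs, most potent cluster first."""
    by_cluster = defaultdict(list)
    for score, cluster in zip(scores, clusters):
        by_cluster[cluster].append(score)
    summary = [(c, median(v)) for c, v in by_cluster.items()]
    return sorted(summary, key=lambda item: item[1], reverse=True)


# Made-up per-cell potency scores and hypothetical cluster labels
scores = [0.62, 0.58, 0.49, 0.47, 0.31, 0.28, 0.12, 0.15]
clusters = ["c1", "c1", "c2", "c2", "c3", "c3", "c4", "c4"]

ranking = rank_clusters_by_potency(scores, clusters)
root_candidate = ranking[0][0]  # cluster to consider as the pseudotime starting point
print(ranking, root_candidate)
```

The median is used instead of the mean so that a few outlier cells do not decide the root. Treat the result as a hint to check against marker genes and known biology, not as a final answer.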
Python version (official repository: https://github.com/digitalcytometry/cytotrace2/tree/main/cytotrace2_python)
First, install the CytoTRACE2 package from the terminal. Installation is slow; it took us about 30 minutes.
cd data_analysis/cytotrace2_py/
git clone https://github.com/digitalcytometry/cytotrace2
cd cytotrace2/cytotrace2_python
conda env create -f environment_py.yml
conda activate cytotrace2-py
pip install .
The Python version of CytoTRACE2 takes a gene expression matrix and a cell-type annotation file as input. If your data are in a Seurat object, prepare these files in R:
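# Genes x cells count matrix; col.names = NA writes an empty header cell above the gene names
gene_exp <- as.matrix(GetAssayData(sce_sub, layer = "counts"))
write.table(gene_exp, file = "gene_exp.txt", sep = "\t", quote = FALSE, col.names = NA)
# Two columns: cell ID and cell type
cell_anno <- data.frame(cellid = rownames(sce_sub@meta.data),
                        celltype = sce_sub@meta.data$cluster)
write.table(cell_anno, file = "cell_anno.txt", sep = "\t", quote = FALSE, row.names = FALSE)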
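Before running the Python version, it can help to sanity-check the two text files. The sketch below is a hypothetical helper (`check_cytotrace2_inputs` is our own name, not part of CytoTRACE2). It assumes a tab-delimited genes × cells matrix with gene names in the first column and cell IDs in the header row, plus an annotation table whose first column holds cell IDs after a header line. It also flags the header that R's `write.table()` produces when row names are written without `col.names = NA`, which is one field shorter than the data rows.

```python
import csv
import io


def check_cytotrace2_inputs(matrix_text, anno_text):
    """Hypothetical sanity check for a genes x cells TSV matrix and a cell annotation TSV."""
    rows = list(csv.reader(io.StringIO(matrix_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    n_fields = len(body[0])  # gene name + one value per cell
    # write.table() with row names but without col.names = NA leaves out the
    # header cell above the gene names, so the header is one field short
    header_short = len(header) == n_fields - 1
    cell_ids = header if header_short else header[1:]
    anno_rows = list(csv.reader(io.StringIO(anno_text), delimiter="\t"))
    anno_ids = [row[0] for row in anno_rows[1:]]  # skip the annotation header line
    return {
        "n_genes": len(body),
        "n_cells": len(cell_ids),
        "header_short_by_one": header_short,
        "missing_in_annotation": sorted(set(cell_ids) - set(anno_ids)),
        "missing_in_matrix": sorted(set(anno_ids) - set(cell_ids)),
    }


# Tiny made-up example: matrix header written without col.names = NA
matrix_text = "cellA\tcellB\nGene1\t0\t5\nGene2\t3\t1\n"
anno_text = "cellid\tcelltype\ncellA\tGMP\ncellB\tMonocyte\n"
report = check_cytotrace2_inputs(matrix_text, anno_text)
print(report)
```

On real data you would pass the file contents, e.g. `open("gene_exp.txt").read()`; for very large matrices, reading only the first few lines of the matrix is enough to check the header and cell IDs.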
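If your single-cell data come from a Python workflow, prepare these files with scanpy. Since we don't have such data, we convert the Seurat object used above into an h5ad file for the demonstration. Preparing the demo data: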
getwd()
setwd("/home/tq_ziv/data_analysis/cytotrace2_py/")
# sce_sub <- sce1[,sce1$cluster %in% c("YSMP","GMP","Myeloblast","Monocyte")]
# save(sce_sub, file = "sce_sub.RData")
library(sceasy)
library(reticulate)
use_condaenv('sceasy')
loompy <- reticulate::import('loompy')
sceasy::convertFormat(sce_sub, from="seurat", to="anndata", outFile='sce_sub.h5ad')
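import scanpy as sc
import pandas as pd

adata = sc.read_h5ad("./sce_sub.h5ad")
# AnnData stores cells x genes; CytoTRACE2 expects genes x cells, so transpose.
# to_df() uses adata.X, which after sceasy conversion is typically the Seurat "data" slot
expression_matrix = adata.to_df().T
expression_matrix.head()
expression_matrix.to_csv("expression_matrix.txt", sep="\t")
# One column of cell-type labels, indexed by cell ID
cell_annotations = adata.obs[["cluster"]]
cell_annotations
cell_annotations.to_csv("cell_annotations.txt", sep="\t")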
There are also two ways to run it. The first is from the terminal, similar to pySCENIC:
# Run directly from the terminal
cytotrace2 --input-path gene_exp.txt --annotation-path cell_anno.txt --species human
The second is to call the function from within Python:
# Run inside Python
from cytotrace2_py.cytotrace2_py import *
exp_path = "./expression_matrix.txt"
annotation_path = "./cell_annotations.txt"
species = "human"
results = cytotrace2(exp_path,
annotation_path=annotation_path,
species=species)
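The output is the same as in R, again five plots. Overall, we still find the R version more comfortable to use. If the Python version feels too cumbersome or throws unexpected errors, we recommend converting your data to a Seurat object (or exporting the matrix and annotation files) and running the R version instead.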
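To check that the R and Python runs agree on your own data, a rank correlation of the per-cell scores is more informative than comparing plots by eye. The sketch below implements Spearman's correlation (the Pearson correlation of average ranks) with the standard library. It assumes both score lists are already aligned by cell ID; the scores are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation (assumes neither input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)


# Hypothetical per-cell scores from the R and Python runs, aligned by cell ID
r_scores = [0.61, 0.48, 0.30, 0.12, 0.30]
py_scores = [0.60, 0.49, 0.29, 0.13, 0.31]
print(round(spearman(r_scores, py_scores), 3))
```

A value close to 1 means both versions order the cells the same way even if individual scores differ slightly. If SciPy is available, `scipy.stats.spearmanr` computes the same statistic.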
If you found this post useful, please give it a like before you go!