Seurat分析10x Visium空间转录组数据
本教程使用Seurat包进行10x Visium单细胞空间转录组数据分析。
这个教程涉及:
-
标准化
-
降维和聚类
-
检测空间差异表达基因
-
交互可视化
-
与单细胞转录组整合分析
-
整合切片信息
#1. R环境
## 检查Seurat版本
本教程:Seurat (>=3.2)
help(Seurat)
## 安装包:
# Enter commands in R (or R studio, if installed)
install.packages('Seurat')
- SeuratData是数据分发Seurat对象的包。使用它可以直接访问Seurat 教程中使用的数据集。
devtools:: install_github ('satijalab/seurat-data')
- patchwork 用于组合图
install.packages('patchwork')
## 导入包
library(Seurat)
library(SeuratData)
library(ggplot2)
library(patchwork)
library(dplyr)
# 数据
本教程使用的空间转录组数据来由Visium v1产生 ;细胞来自于老鼠两个脑前部切片和对应的两个脑后部切片。
数据来自于10X官网,见10X Datasets; 在本教程中可以使用 Load10X_Spatial()
函数直接导入,返回Seurat 对象,其中reads数据由 spaceranger分析流程产生,包含spot水平的表达数据以及对应的组织切片的图像数据。
InstallData("stxBrain")
brain <- LoadData("stxBrain", type = "anterior1")
# 数据预处理
## 数据标准化
plot1 <- VlnPlot(brain, features = "nCount_Spatial", pt.size = 0.1) + NoLegend()
plot2 <- SpatialFeaturePlot(brain, features = "nCount_Spatial") + theme(legend.position = "right")
wrap_plots(plot1, plot2)
image.png
结果表明不同spots之间的变化不仅受技术的影响,也收到了组织切片的影响。在缺少神经元位置(such as the cortical white matter)单细胞测序得到的数值比较小;当使用LogNormalize()时,会首先将这些spots size标准化到同一水平,这种处理可能存在问题。
在此,作者建议使用sctransform (Hafemeister and Satija, Genome Biology 2019)。sctransform 通过建立基因表达的regularized negative binomial models来解释技术误差同时保留生物学误差。
brain <- SCTransform(brain, assay = "Spatial", verbose = FALSE)
## 比较sctransform 和log-normalization 的结果;
# rerun normalization to store sctransform residuals for all genes
brain <- SCTransform(brain, assay = "Spatial", return.only.var.genes = FALSE, verbose = FALSE)
# also run standard log normalization for comparison
brain <- NormalizeData(brain, verbose = FALSE, assay = "Spatial")
# Computes the correlation of the log normalized data and sctransform residuals with the number of UMIs
brain <- GroupCorrelation(brain, group.assay = "Spatial", assay = "Spatial", slot = "data", do.plot = FALSE)
brain <- GroupCorrelation(brain, group.assay = "Spatial", assay = "SCT", slot = "scale.data", do.plot = FALSE)
p1 <- GroupCorrelationPlot(brain, assay = "Spatial", cor = "nCount_Spatial_cor") + ggtitle("Log Normalization") +
theme(plot.title = element_text(hjust = 0.5))
p2 <- GroupCorrelationPlot(brain, assay = "SCT", cor = "nCount_Spatial_cor") + ggtitle("SCTransform Normalization") +
theme(plot.title = element_text(hjust = 0.5))
p1 + p2
image.png
在这儿,计算了每个基因UMI值和标准化之后结果的相关性。并且,根据基因表达的均值将其划分为6个组。在log-normalization 的结果中,前三个组(基因表达均值大)存在明显的偏差,这也表明技术偏差还是在log-normalization的结果中存在。相比之下,sctransform标准化大大减小了这种影响。
# 基因表达可视化
Seurat中的SpatialFeaturePlot()函数可以将分子数据叠加到组织组织学数据上进行展示。例如,在这组小鼠大脑数据中,Hpca基因是一个强大的海马标记,Ttr是脉络丛的标记。
SpatialFeaturePlot(brain, features = c("Hpca", "Ttr"))
image.png
Seurat中的默认参数强调分子数据的可视化。但是,也可以调整斑点的大小(及其透明度)以改善组织学图像的可视化效果:
- pt.size.factor: 缩放斑点的大小。默认值为1.6
- alpha-最小和最大透明度。默认值是c(1,1)
- 尝试将alpha c(0.1,1)设置为较低,以降低具有较低表达式的点的透明度
通过alpha
c(0.1, 1)降低地表达点的透明度
p1 <- SpatialFeaturePlot(brain, features = "Ttr", pt.size.factor = 1)
p2 <- SpatialFeaturePlot(brain, features = "Ttr", alpha = c(0.1, 1))
p1 + p2
image.png
# 降维、聚类和可视化
brain <- RunPCA(brain, assay = "SCT", verbose = FALSE)
brain <- FindNeighbors(brain, reduction = "pca", dims = 1:30)
brain <- FindClusters(brain, verbose = FALSE)
brain <- RunUMAP(brain, reduction = "pca", dims = 1:30)
dimPlot()可视化UMAP结果;SpatialDimPlot()联合映像结果进行可视化;
p1 <- DimPlot(brain, reduction = "umap", label = TRUE)
p2 <- SpatialDimPlot(brain, label = TRUE, label.size = 3)
p1 + p2
image.png
SpatialDimPlot中的cells.highlight可以通过颜色区分细胞,并且可以标记不同类群细胞的空间位置。
SpatialDimPlot(brain, cells.highlight = CellsByIdentities(object = brain, idents = c(2, 1, 4, 3,
5, 8)), facet.highlight = TRUE, ncol = 3)
image.png
# 交互式可视化
SpatialDimPlot()
和SpatialFeaturePlot()
可以进行可视化;只需要设置interactive为TRUE就可以在Rstudio viewer面板展示结果。
- SpatialDimPlot
SpatialDimPlot(brain, interactive = TRUE)
image.png
- SpatialFeaturePlot
SpatialFeaturePlot(brain, features = "Ttr", interactive = TRUE)
image.png
- LinkedDimPlot
- LinkedDimPlot()可以将UMAP 和组织影像的结果联合展示。
LinkedDimPlot(brain)
image.png
# 差异表达基因鉴定
Seurat 提供了两种流程来鉴定表达与位置相关的基因。第一种是基于组织中已知区域进行基因差异表达分析。
de_markers <- FindMarkers(brain, ident.1 = 5, ident.2 = 6)
SpatialFeaturePlot(object = brain, features = rownames(de_markers)[1:3], alpha = c(0.1, 1), ncol = 3)
image.png
另一种方法是FindSpatiallyVariables()函数在不存在预注释的情况下寻找基因空间表达模式。FindSpatiallyVariables()函数可以调用markvariogram方法( Trendsceek),模拟空间转录组数据为标记点从而计算variogram,variogram可以表征基因表达水平受其空间位置的影响。具体点就是,计算一定距离(r,默认为5)两点之间的依赖性。
brain <- FindSpatiallyVariableFeatures(brain, assay = "SCT", features = VariableFeatures(brain)[1:1000],
selection.method = "markvariogram")
top.features <- head(SpatiallyVariableFeatures(brain, selection.method = "markvariogram"), 6)
SpatialFeaturePlot(brain, features = top.features, ncol = 3, alpha = c(0.1, 1))
image.png
还有更多方法可以用于鉴定差异表达基因,例如 SpatialDE, 和Splotch。
可视化鉴定的最显著
brain <- FindSpatiallyVariableFeatures(brain, assay = "SCT", features = VariableFeatures(brain)[1:1000],
selection.method = "markvariogram")
# 根据切片区域提取单细胞数据
这儿提取前额皮质的数据进行进一步分析;提取数个细胞类群的数据,然后在切片数据上进行可视化。
cortex <- subset(brain, idents = c(1, 2, 3, 4, 6, 7))
# now remove additional cells, use SpatialDimPlots to visualize what to remove
# SpatialDimPlot(cortex,cells.highlight = WhichCells(cortex, expression = image_imagerow > 400
# | image_imagecol < 150))
cortex <- subset(cortex, anterior1_imagerow > 400 | anterior1_imagecol < 150, invert = TRUE)
cortex <- subset(cortex, anterior1_imagerow > 275 & anterior1_imagecol > 370, invert = TRUE)
cortex <- subset(cortex, anterior1_imagerow > 250 & anterior1_imagecol > 440, invert = TRUE)
p1 <- SpatialDimPlot(cortex, crop = TRUE, label = TRUE)
p2 <- SpatialDimPlot(cortex, crop = FALSE, label = TRUE, pt.size.factor = 1, label.size = 3)
p1 + p2
image.png
# 单细胞数据整合
10x Visium中每个spot长宽约50um,因此每个spot测序数据包含多个细胞。更多的研究者希望反卷积去研究每个空间点的数据从而去预测细胞类型的构成。Seurat作者使用公开的SMART-Seq2数据(a reference scRNA-seq dataset of ~14,000 adult mouse cortical cell taxonomy from the Allen Institute)测试一些列反卷积和整合方法。相比于反卷积,整合的方法能够取得更好的结果。这可能是空间转录组数据和普通单细胞转录组数据的噪音模型存在巨大的差异造成的。在此,使用 Seurat v3中基于anchor的整合方法,能够给予参考数据对新的数据集进行注释。
数据下载:data
allen_reference <- readRDS("./allen_cortex.rds")
# note that setting ncells=3000 normalizes the full dataset but learns noise models on 3k
# cells this speeds up SCTransform dramatically with no loss in performance
library(dplyr)
allen_reference <- SCTransform(allen_reference, ncells = 3000, verbose = FALSE) %>%
RunPCA(verbose = FALSE) %>%
RunUMAP(dims = 1:30)
# After subsetting, we renormalize cortex
cortex <- SCTransform(cortex, assay = "Spatial", verbose = FALSE) %>%
RunPCA(verbose = FALSE)
# the annotation is stored in the 'subclass' column of object metadata
DimPlot(allen_reference, group.by = "subclass", label = TRUE)
image.png
anchors <- FindTransferAnchors(reference = allen_reference, query = cortex, normalization.method = "SCT")
predictions.assay <- TransferData(anchorset = anchors, refdata = allen_reference$subclass, prediction.assay = TRUE,
weight.reduction = cortex[["pca"]], dims = 1:30)
cortex[["predictions"]] <- predictions.assay
对每个spots打分;特别是前额叶皮质区是层级兴奋性神经元。在这里,可以区分这些神经元亚型的不同顺序层:
DefaultAssay(cortex) <- "predictions"
SpatialFeaturePlot(cortex, features = c("L2/3 IT", "L4"), pt.size.factor = 1.6, ncol = 2, crop = TRUE)
image.png
基于预测分数也可以预测不同细胞类型在空间上的位置。使用同样的方法也可以鉴定空间差异表达基因,使用细胞类型预测分数打分基因代替基因表达。
cortex <- FindSpatiallyVariableFeatures(cortex, assay = "predictions", selection.method = "markvariogram",
features = rownames(cortex), r.metric = 5, slot = "data")
top.clusters <- head(SpatiallyVariableFeatures(cortex), 4)
SpatialPlot(object = cortex, features = top.clusters, ncol = 2)
image.png
最终,结果表明整合方法能够发现已知神经元或非神经元( including laminar excitatory, layer-1 astrocytes, and the cortical grey matter.)的空间定位模式。
SpatialFeaturePlot(cortex, features = c("Astro", "L2/3 IT", "L4", "L5 PT", "L5 IT", "L6 CT", "L6 IT",
"L6b", "Oligo"), pt.size.factor = 1, ncol = 2, crop = FALSE, alpha = c(0.1, 1))
image.png
# 整合多个切片数据
前面处理的数据都是anterior,现在读入对应的posterior数据,然后进行一样的厨师标准化过程。
brain2 <- LoadData("stxBrain", type = "posterior1")
brain2 <- SCTransform(brain2, assay = "Spatial", verbose = FALSE)
同时处理多个切片Seurat 对象数据;使用merge函数整合数据。
brain.merge <- merge(brain, brain2)
然后进行联合降维和聚类
DefaultAssay(brain.merge) <- "SCT"
VariableFeatures(brain.merge) <- c(VariableFeatures(brain), VariableFeatures(brain2))
brain.merge <- RunPCA(brain.merge, verbose = FALSE)
brain.merge <- FindNeighbors(brain.merge, dims = 1:30)
brain.merge <- FindClusters(brain.merge, verbose = FALSE)
brain.merge <- RunUMAP(brain.merge, dims = 1:30)
基于UMAP 展示,SpatialDimPlot()和SpatialFeaturePlot();
DimPlot(brain.merge, reduction = "umap", group.by = c("ident", "orig.ident"))
image.png
SpatialDimPlot(brain.merge)
image.png
SpatialFeaturePlot(brain.merge, features = c("Hpca", "Plp1"))
image.png
# 原文
Analysis, visualization, and integration of spatial datasets with Seurat