SeuratV3学习单细胞文献解读复现分子生物学

Seurat包学习笔记(三):Analysis of spati

2020-04-10  本文已影响0人  Davey1220

本次教程中,我们将学习如何使用Seurat3处理空间转录组数据。整体的分析流程类似于Seurat的单细胞RNA-seq分析流程,同时我们还引入了一些交互可视化的分析工具,将细胞所处的空间信息和分子表达信息进行整合。特别强调了空间和分子信息的集成。

image.png

主要的分析流程:

在本教程中,我们使用来自10x Genomics的Visium技术生成的数据集进行示例分析。同时,我们今后还会将Seurat扩展到处理其他空间转录组技术产生的数据类型,如SLIDE-Seq、STARmap和MERFISH等。

使用Seurat3处理10x Visium空间转录组数据

安装并加载所需的R包和数据集

# 使用devtools进行安装
devtools::install_github("satijalab/seurat", ref = "spatial")
library(Seurat)
library(ggplot2)
library(patchwork)
library(dplyr)

library(SeuratData)
# 使用SeuratData包安装小鼠大脑的空间转录组数据
InstallData("stxBrain")
library(stxBrain.SeuratData)
# 使用LoadData函数加载数据集
brain <- LoadData("stxBrain", type = "anterior1")
brain
An object of class Seurat 
31053 features across 2696 samples within 1 assay 
Active assay: Spatial (31053 features, 0 variable features)

# 查看Seurat对象包含的信息
View(brain)
image.png

10x genomics Visium技术产生的数据主要包括以下数据类型:

数据预处理

# 数据质控,查看原始数据的基本特征
plot1 <- VlnPlot(brain, features = "nCount_Spatial", pt.size = 0.1) + NoLegend()
plot2 <- SpatialFeaturePlot(brain, features = "nCount_Spatial") + theme(legend.position = "right")
wrap_plots(plot1, plot2)
image.png
# 数据标准化,使用SCTransform方法进行标准化
brain <- SCTransform(brain, assay = "Spatial", verbose = FALSE)

比较SCTransform方法和常规log-normalization的效果

# rerun normalization to store sctransform residuals for all genes
# SCTransform
brain <- SCTransform(brain, assay = "Spatial", return.only.var.genes = FALSE, verbose = FALSE)

# also run standard log normalization for comparison
# NormalizeData
brain <- NormalizeData(brain, verbose = FALSE, assay = "Spatial")

# Computes the correlation of the log normalized data and sctransform residuals with the number of UMIs
brain <- GroupCorrelation(brain, group.assay = "Spatial", assay = "Spatial", slot = "data", do.plot = FALSE)
brain <- GroupCorrelation(brain, group.assay = "Spatial", assay = "SCT", slot = "scale.data", do.plot = FALSE)

p1 <- GroupCorrelationPlot(brain, assay = "Spatial", cor = "nCount_Spatial_cor") + ggtitle("Log Normalization") + 
    theme(plot.title = element_text(hjust = 0.5))
p2 <- GroupCorrelationPlot(brain, assay = "SCT", cor = "nCount_Spatial_cor") + ggtitle("SCTransform Normalization") + 
    theme(plot.title = element_text(hjust = 0.5))
p1 + p2
image.png

在上面的箱线图中,我们计算了每个特征(基因)与UMI数量(此处为变量nCount_Spatial)之间的相关性。然后,根据基因的平均表达量将它们进行了分组,最终生成了这些相关系数的箱线图。可以看到,使用常规的NormalizeData进行log-normalization未能使前三组的基因充分归一化,这表明技术因素继续影响高表达的基因的归一化表达估计值。相反,sctransform方法进行归一化可以有效的降低这种影响。

基因表达的可视化

在Seurat v3.2中,我们加入了新的功能函数来对空间转录组的数据进行可视化的探索。我们将Seurat的SpatialFeaturePlot函数扩展了FeaturePlot, 可以将基因的表达数据还原在样本组织的空间位置上。例如,在这个小鼠大脑切片数据中,Hpca基因是一个强烈的海马区(hippocampal )marker ,Ttr是一个脉络丛(choroid plexus)marker。

SpatialFeaturePlot(brain, features = c("Hpca", "Ttr"))
image.png

Seurat的默认参数强调了分子数据的可视化。我们也可以通过调整其他参数,如调整斑点的大小(和它们的透明度),来改善组织学图像的可视化效果。通过改变以下参数:

p1 <- SpatialFeaturePlot(brain, features = "Ttr", pt.size.factor = 1)
p2 <- SpatialFeaturePlot(brain, features = "Ttr", alpha = c(0.1, 1))
p1 + p2
image.png

数据降维、聚类与可视化

# PCA降维
brain <- RunPCA(brain, assay = "SCT", verbose = FALSE)
# 图聚类分群
brain <- FindNeighbors(brain, reduction = "pca", dims = 1:30)
brain <- FindClusters(brain, verbose = FALSE)
# UMAP降维可视化
brain <- RunUMAP(brain, reduction = "pca", dims = 1:30)
p1 <- DimPlot(brain, reduction = "umap", label = TRUE)
# 使用SpatialDimPlot函数进行可视化
p2 <- SpatialDimPlot(brain, label = TRUE, label.size = 3)
p1 + p2
image.png
# 使用cells.highlight参数高亮感兴趣的一些细胞
SpatialDimPlot(brain, cells.highlight = CellsByIdentities(object = brain, idents = c(1, 2, 5, 3, 4, 8)), facet.highlight = TRUE, ncol = 3)
image.png

交互式可视化

We have also built in a number of interactive plotting capabilities. Both SpatialDimPlot and SpatialFeaturePlot now have an interactive parameter, that when set to TRUE, will open up the Rstudio viewer pane with an interactive Shiny plot.

SpatialDimPlot(brain, interactive = TRUE)
image.png
LinkedDimPlot(brain)
image.png

鉴定空间可变基因(Spatially Variable Features)

Seurat提供了两种工作流程来识别与组织内空间位置相关的分子特征。

de_markers <- FindMarkers(brain, ident.1 = 4, ident.2 = 6)
SpatialFeaturePlot(object = brain, features = rownames(de_markers)[1:3], alpha = c(0.1, 1), ncol = 3)
image.png
brain <- FindSpatiallyVariableFeatures(brain, assay = "SCT", features = VariableFeatures(brain)[1:1000], selection.method = "markvariogram")
top.features <- head(SpatiallyVariableFeatures(brain, selection.method = "markvariogram"), 6)
SpatialFeaturePlot(brain, features = top.features, ncol = 3, alpha = c(0.1, 1))
image.png

可视化解剖区域的子集

# 提取子集
cortex <- subset(brain, idents = c(1, 2, 3, 5, 6, 7))
# now remove additional cells, use SpatialDimPlots to visualize what to remove
# SpatialDimPlot(cortex,cells.highlight = WhichCells(cortex, expression = #image_imagerow > 400 | image_imagecol < 150))
# 根据不同的条件进行筛选提取子集
cortex <- subset(cortex, anterior1_imagerow > 400 | anterior1_imagecol < 150, invert = TRUE)
cortex <- subset(cortex, anterior1_imagerow > 275 & anterior1_imagecol > 370, invert = TRUE)
cortex <- subset(cortex, anterior1_imagerow > 250 & anterior1_imagecol > 440, invert = TRUE)
# 数据可视化
p1 <- SpatialDimPlot(cortex, crop = TRUE, label = TRUE)
p2 <- SpatialDimPlot(cortex, crop = FALSE, label = TRUE, pt.size.factor = 1, label.size = 3)
p1 + p2
image.png

整合单细胞RNA-seq测序数据

# 加载单细胞RNA-seq数据
allen_reference <- readRDS("~/Downloads/allen_cortex.rds")
# note that setting ncells=3000 normalizes the full dataset but learns noise models on 3k cells this speeds up SCTransform dramatically with no loss in performance
library(dplyr)

# 进行数据标准化结合PCA降维
allen_reference <- SCTransform(allen_reference, ncells = 3000, verbose = FALSE) %>% RunPCA(verbose = FALSE) %>% RunUMAP(dims = 1:30)

# After subsetting, we renormalize cortex
cortex <- SCTransform(cortex, assay = "Spatial", verbose = FALSE) %>% RunPCA(verbose = FALSE)

# the annotation is stored in the 'subclass' column of object metadata
DimPlot(allen_reference, group.by = "subclass", label = TRUE)
image.png
# 识别整合的anchors
anchors <- FindTransferAnchors(reference = allen_reference, query = cortex, normalization.method = "SCT")
# 进行数据映射整合
predictions.assay <- TransferData(anchorset = anchors, refdata = allen_reference$subclass, prediction.assay = TRUE, weight.reduction = cortex[["pca"]])
# 添加预测信息
cortex[["predictions"]] <- predictions.assay

DefaultAssay(cortex) <- "predictions"
# 数据可视化
SpatialFeaturePlot(cortex, features = c("L2/3 IT", "L4"), pt.size.factor = 1.6, ncol = 2, crop = TRUE)
image.png
# 鉴定空间可变基因
cortex <- FindSpatiallyVariableFeatures(cortex, assay = "predictions", selection.method = "markvariogram", features = rownames(cortex), r.metric = 5, slot = "data")
top.clusters <- head(SpatiallyVariableFeatures(cortex), 4)
# 数据可视化
SpatialPlot(object = cortex, features = top.clusters, ncol = 2)
image.png
# marker基因可视化
SpatialFeaturePlot(cortex, features = c("Astro", "L2/3 IT", "L4", "L5 PT", "L5 IT", "L6 CT", "L6 IT", "L6b", "Oligo"), pt.size.factor = 1, ncol = 2, crop = FALSE, alpha = c(0.1, 1))
image.png

使用Seurat处理多个切片数据

brain2 <- LoadData("stxBrain", type = "posterior1")
brain2 <- SCTransform(brain2, assay = "Spatial", verbose = FALSE)
# 使用merge函数合并多个slices
brain.merge <- merge(brain, brain2)

DefaultAssay(brain.merge) <- "SCT"
VariableFeatures(brain.merge) <- c(VariableFeatures(brain), VariableFeatures(brain2))
# 数据降维,聚类与可视化
brain.merge <- RunPCA(brain.merge, verbose = FALSE)
brain.merge <- FindNeighbors(brain.merge, dims = 1:30)
brain.merge <- FindClusters(brain.merge, verbose = FALSE)
brain.merge <- RunUMAP(brain.merge, dims = 1:30)
DimPlot(brain.merge, reduction = "umap", group.by = c("ident", "orig.ident"))
image.png
SpatialDimPlot(brain.merge)
image.png
SpatialFeaturePlot(brain.merge, features = c("Hpca", "Plp1"))
image.png

使用Seurat3处理Slide-seq空间转录组数据

Here, we will be analyzing a dataset generated using Slide-seq v2of the mouse hippocampus.

安装并加载所需的数据集

library(SeuratData)
InstallData("ssHippo")
slide.seq <- LoadData("ssHippo")
View(slide.seq)

数据预处理

# 数据质控,查看原始数据的分布特征
plot1 <- VlnPlot(slide.seq, features = "nCount_Spatial", pt.size = 0, log = TRUE) + NoLegend()
# 对原始count进行log标准化处理
slide.seq$log_nCount_Spatial <- log(slide.seq$nCount_Spatial)
plot2 <- SpatialFeaturePlot(slide.seq, features = "log_nCount_Spatial") + theme(legend.position = "right")
wrap_plots(plot1, plot2)
image.png
# 使用SCTransform进行标准化处理
slide.seq <- SCTransform(slide.seq, assay = "Spatial", ncells = 3000, verbose = FALSE)
# PCA降维
slide.seq <- RunPCA(slide.seq)
slide.seq <- RunUMAP(slide.seq, dims = 1:30)
# 数据聚类分群
slide.seq <- FindNeighbors(slide.seq, dims = 1:30)
slide.seq <- FindClusters(slide.seq, resolution = 0.3, verbose = FALSE)
# 数据可视化
plot1 <- DimPlot(slide.seq, reduction = "umap", label = TRUE)
plot2 <- SpatialDimPlot(slide.seq, stroke = 0)
plot1 + plot2
image.png
# 使用cells.highlight参数高亮特定细胞
SpatialDimPlot(slide.seq, cells.highlight = CellsByIdentities(object = slide.seq, idents = c(0, 7, 13)), facet.highlight = TRUE)
image.png

整合单细胞RNA-seq参考数据

# 加载单细胞RNA-seq数据
ref <- readRDS("~/Downloads/mouse_hippocampus_reference.rds")

# 使用FindTransferAnchors函数识别参考的anchors
anchors <- FindTransferAnchors(reference = ref, query = slide.seq, normalization.method = "SCT", npcs = 50)
# 数据映射进行整合
predictions.assay <- TransferData(anchorset = anchors, refdata = ref$celltype, prediction.assay = TRUE, weight.reduction = slide.seq[["pca"]], dims = 1:50)
# 添加预测信息
slide.seq[["predictions"]] <- predictions.assay

DefaultAssay(slide.seq) <- "predictions"
# 数据可视化
SpatialFeaturePlot(slide.seq, features = c("Dentate Principal cells", "CA3 Principal cells", "Entorhinal cortex", "Endothelial tip", "Ependymal", "Oligodendrocyte"), alpha = c(0.1, 1))
image.png
slide.seq$predicted.id <- GetTransferPredictions(slide.seq)
Idents(slide.seq) <- "predicted.id"
SpatialDimPlot(slide.seq, cells.highlight = CellsByIdentities(object = slide.seq, idents = c("CA3 Principal cells", "Dentate Principal cells", "Endothelial tip")), facet.highlight = TRUE)
image.png

鉴定空间可变基因

Here, we demonstrate the latter with an implementation of Moran’s I available via FindSpatiallyVariableFeatures by setting method = 'moransi'.
Moran’s I computes an overall spatial autocorrelation and gives a statistic (similar to a correlation coefficient) that measures the dependence of a feature on spatial location.
This allows us to rank features based on how spatially variable their expression is.
In order to facilitate quick estimation of this statistic, we implemented a basic binning strategy that will draw a rectangular grid over Slide-seq puck and average the feature and location within each bin. The number of bins in the x and y direction are controlled by the x.cuts and y.cuts parameters respectively. Additionally, while not required, installing the optional Rfast2 package(install.packages('Rfast2')), will significantly decrease the runtime via a more efficient implementation.

DefaultAssay(slide.seq) <- "SCT"
# 识别空间可变基因,使用selection.method = "moransi"方法
slide.seq <- FindSpatiallyVariableFeatures(slide.seq, assay = "SCT", slot = "scale.data", features = VariableFeatures(slide.seq)[1:1000], selection.method = "moransi", x.cuts = 100, y.cuts = 100)
# 数据可视化
SpatialFeaturePlot(slide.seq, features = head(SpatiallyVariableFeatures(slide.seq, selection.method = "moransi"), 6), ncol = 3, alpha = c(0.1, 1), max.cutoff = "q95")
image.png

参考来源:
https://satijalab.org/seurat/v3.1/spatial_vignette.html

上一篇下一篇

猜你喜欢

热点阅读