Seurat: Fast integration using reciprocal PCA (RPCA)
A modified workflow for the integration of scRNA-seq datasets.
Identify anchors: reciprocal PCA ('RPCA') is used in place of canonical correlation analysis ('CCA').
RPCA: when determining anchors between any two datasets, project each dataset into the other's PCA space and constrain the anchors by the same mutual neighborhood requirement.
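To make the idea concrete, here is a minimal base-R sketch of reciprocal projection followed by a mutual-nearest-neighbor check on simulated data. It is not Seurat's implementation: the helper names (project_into, nearest_k, mutual_anchors), the toy matrices, and the choice of 5 PCs are all assumptions made for illustration.

```r
# Toy illustration of reciprocal PCA anchoring (NOT Seurat's actual code).
# All helper names are hypothetical and exist only in this sketch.
set.seed(1)

# Two toy "datasets": rows are cells, columns are genes.
n_genes <- 20
a <- matrix(rnorm(50 * n_genes), nrow = 50)
b <- matrix(rnorm(60 * n_genes), nrow = 60)

# PCA is fitted on each dataset separately.
pca_a <- prcomp(a, center = TRUE, scale. = FALSE)
pca_b <- prcomp(b, center = TRUE, scale. = FALSE)

# Project the cells of `x` into the PCA space learned from another dataset.
project_into <- function(x, pca, npcs = 5) {
  centered <- sweep(x, 2, pca$center, "-")
  centered %*% pca$rotation[, seq_len(npcs), drop = FALSE]
}

# For each row of `query`, return the indices of its k nearest rows in `ref`.
nearest_k <- function(query, ref, k) {
  lapply(seq_len(nrow(query)), function(i) {
    d <- colSums((t(ref) - query[i, ])^2)  # squared Euclidean distances
    order(d)[seq_len(k)]
  })
}

# Keep only pairs (i, j) where j is among i's neighbors AND i is among j's.
mutual_anchors <- function(a, b, pca_a, pca_b, k = 5, npcs = 5) {
  a_in_b <- project_into(a, pca_b, npcs)            # A cells in B's PCA space
  b_in_b <- pca_b$x[, seq_len(npcs), drop = FALSE]  # B cells in their own space
  b_in_a <- project_into(b, pca_a, npcs)            # B cells in A's PCA space
  a_in_a <- pca_a$x[, seq_len(npcs), drop = FALSE]  # A cells in their own space

  nn_ab <- nearest_k(a_in_b, b_in_b, k)  # for each A cell: k nearest B cells
  nn_ba <- nearest_k(b_in_a, a_in_a, k)  # for each B cell: k nearest A cells

  pairs <- do.call(rbind, lapply(seq_along(nn_ab), function(i) {
    js <- nn_ab[[i]]
    keep <- js[vapply(js, function(j) i %in% nn_ba[[j]], logical(1))]
    if (length(keep) == 0) return(NULL)
    cbind(cell_a = i, cell_b = keep)
  }))
  if (is.null(pairs)) {
    pairs <- matrix(integer(0), ncol = 2,
                    dimnames = list(NULL, c("cell_a", "cell_b")))
  }
  pairs
}

anchors_k5  <- mutual_anchors(a, b, pca_a, pca_b, k = 5)
anchors_k20 <- mutual_anchors(a, b, pca_a, pca_b, k = 20)
nrow(anchors_k5)   # number of mutual pairs with a strict neighborhood
nrow(anchors_k20)  # a larger k can only keep or add pairs
```

In this sketch a larger k relaxes the mutual-neighbor requirement, so the set of candidate pairs can only grow; a small k keeps the pairing conservative.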
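When to use each method
CCA: by identifying shared sources of variation between datasets, CCA is well suited to finding anchors when cell types are conserved but gene expression differs substantially across experiments. CCA-based integration therefore enables integrative analysis when experimental conditions or disease states introduce very strong expression shifts, or when integrating datasets across modalities and species. However, CCA-based integration may also lead to overcorrection, especially when a large proportion of cells are non-overlapping across datasets.
RPCA: RPCA-based integration runs significantly faster and is also a more conservative approach, in which cells in different biological states are less likely to "align" after integration. We therefore recommend RPCA when:
1. A substantial fraction of cells in one dataset have no matching type in the other datasets
2. Datasets originate from the same platform (i.e. multiple lanes of 10x Genomics)
3. There are a large number of datasets or cells to integrate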
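The workflow is similar to the introduction to scRNA-seq integration, with two differences: users must run PCA on each dataset individually before integration, and the "reduction" parameter of FindIntegrationAnchors() should be set to "rpca" (see the per-dataset ScaleData()/RunPCA() step and the FindIntegrationAnchors() call in the code below).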
library(Seurat)
library(SeuratData)
# install dataset
InstallData("ifnb")
# load dataset and keep it in a variable for the steps below
ifnb <- LoadData("ifnb")
# split the dataset into a list of two seurat objects (stim and CTRL)
ifnb.list <- SplitObject(ifnb, split.by = "stim")
# normalize and identify variable features for each dataset independently
ifnb.list <- lapply(X = ifnb.list, FUN = function(x) {
    x <- NormalizeData(x)
    x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
})
# select features that are repeatedly variable across datasets for integration,
# then run PCA on each dataset using these features
features <- SelectIntegrationFeatures(object.list = ifnb.list)
ifnb.list <- lapply(X = ifnb.list, FUN = function(x) {
    x <- ScaleData(x, features = features, verbose = FALSE)
    x <- RunPCA(x, features = features, verbose = FALSE)
})
# find anchors with reciprocal PCA instead of the default CCA
immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, anchor.features = features,
    reduction = "rpca")
# this command creates an 'integrated' data assay
immune.combined <- IntegrateData(anchorset = immune.anchors)
# specify that we will perform downstream analysis on the corrected data; note that the
# original unmodified data still resides in the 'RNA' assay
DefaultAssay(immune.combined) <- "integrated"
# Run the standard workflow for visualization and clustering
immune.combined <- ScaleData(immune.combined, verbose = FALSE)
immune.combined <- RunPCA(immune.combined, npcs = 30, verbose = FALSE)
immune.combined <- RunUMAP(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindNeighbors(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindClusters(immune.combined, resolution = 0.5)
# Visualization
p1 <- DimPlot(immune.combined, reduction = "umap", group.by = "stim")
p2 <- DimPlot(immune.combined, reduction = "umap", group.by = "seurat_annotations", label = TRUE,
repel = TRUE)
p1 + p2
[Figure: UMAP of the RPCA-integrated data, colored by stim (left) and by seurat_annotations (right)]
Modifying the strength of integration
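The results show that RPCA-based integration is more conservative: in this example, a subset of cells (naive and memory T cells) do not perfectly align across experiments. You can increase the strength of alignment by increasing the k.anchor parameter, which defaults to 5. Increasing this parameter to 20 helps align these populations.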
immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, anchor.features = features, reduction = "rpca",
k.anchor = 20)
immune.combined <- IntegrateData(anchorset = immune.anchors)
immune.combined <- ScaleData(immune.combined, verbose = FALSE)
immune.combined <- RunPCA(immune.combined, npcs = 30, verbose = FALSE)
immune.combined <- RunUMAP(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindNeighbors(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindClusters(immune.combined, resolution = 0.5)
# Visualization
p1 <- DimPlot(immune.combined, reduction = "umap", group.by = "stim")
p2 <- DimPlot(immune.combined, reduction = "umap", label = TRUE, repel = TRUE)
p1 + p2
[Figure: UMAP after integration with k.anchor = 20, colored by stim (left) and by cluster (right)]
Performing integration on datasets normalized with SCTransform
Optionally, set the method parameter to "glmGamPoi" (this requires the glmGamPoi package to be installed) in order to enable faster estimation of regression parameters in SCTransform().
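Since glmGamPoi is an optional dependency, one way to keep a script portable is to pick the method at run time. The snippet below is a small sketch under assumptions: the variable name sct_method is made up for illustration, and "poisson" is used as the fallback on the assumption that it is the default method of the underlying sctransform::vst() in your installed version.

```r
# Choose the SCTransform `method` depending on whether glmGamPoi is installed.
# `sct_method` is a hypothetical variable name used only in this sketch.
# Assumption: "poisson" is the default method when none is supplied.
sct_method <- if (requireNamespace("glmGamPoi", quietly = TRUE)) {
  "glmGamPoi"
} else {
  "poisson"
}
message("SCTransform will use method = ", sct_method)

# The value can then be passed through lapply(), for example:
# ifnb.list <- lapply(X = ifnb.list, FUN = SCTransform, method = sct_method)
```

The SCTransform() call is left commented out so the snippet runs without Seurat or the ifnb data loaded.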
ifnb <- LoadData("ifnb")
ifnb.list <- SplitObject(ifnb, split.by = "stim")
ifnb.list <- lapply(X = ifnb.list, FUN = SCTransform, method = "glmGamPoi")
features <- SelectIntegrationFeatures(object.list = ifnb.list, nfeatures = 3000)
ifnb.list <- PrepSCTIntegration(object.list = ifnb.list, anchor.features = features)
ifnb.list <- lapply(X = ifnb.list, FUN = RunPCA, features = features)
immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, normalization.method = "SCT",
anchor.features = features, dims = 1:30, reduction = "rpca", k.anchor = 20)
immune.combined.sct <- IntegrateData(anchorset = immune.anchors, normalization.method = "SCT", dims = 1:30)
immune.combined.sct <- RunPCA(immune.combined.sct, verbose = FALSE)
immune.combined.sct <- RunUMAP(immune.combined.sct, reduction = "pca", dims = 1:30)
# Visualization
p1 <- DimPlot(immune.combined.sct, reduction = "umap", group.by = "stim")
p2 <- DimPlot(immune.combined.sct, reduction = "umap", group.by = "seurat_annotations", label = TRUE,
repel = TRUE)
p1 + p2
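Result with method = "glmGamPoi":
ifnb.list <- lapply(X = ifnb.list, FUN = SCTransform, method = "glmGamPoi")
[Figure: UMAP of the SCTransform-based integration with method = "glmGamPoi"]
Result without the method parameter:
ifnb.list <- lapply(X = ifnb.list, FUN = SCTransform)
[Figure: UMAP of the SCTransform-based integration with the default method]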