Seurat: Fast integration using reciprocal PCA (RPCA)

2022-04-15  重拾生活信心

A modified workflow for the integration of scRNA-seq datasets.

Identifying anchors: this workflow replaces canonical correlation analysis (CCA) with reciprocal PCA (RPCA).

RPCA: each dataset is projected into the other datasets' PCA space, and anchors are constrained by the same mutual-neighborhood requirement.
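The base-R toy sketch below shows the reciprocal-projection idea in action. It is not Seurat's implementation, which also normalizes the embeddings and then filters (k.filter) and scores (k.score) the anchors. It uses two small simulated datasets (cells × genes, same genes) and follows these steps:

- Center each dataset on its own gene means.
- Run a separate PCA on each with prcomp().
- Project each dataset onto the other's PCA loadings.
- Keep only cell pairs that are mutual nearest neighbors.

knn_index() and rpca_anchors() are hypothetical helpers written for this illustration.

```r
# Toy sketch of reciprocal-PCA anchor finding (not Seurat's implementation).
# Rows are cells, columns are the same genes in both datasets.

# Indices of the k nearest rows of `ref` for each row of `query` (Euclidean distance).
knn_index <- function(query, ref, k) {
  d2 <- outer(rowSums(query^2), rowSums(ref^2), "+") - 2 * tcrossprod(query, ref)
  idx <- apply(d2, 1, function(row) order(row)[seq_len(k)])
  matrix(idx, nrow = nrow(query), ncol = k, byrow = TRUE)
}

rpca_anchors <- function(X1, X2, npcs = 5, k = 5) {
  # Center each dataset on its own gene means (stand-in for per-dataset ScaleData)
  X1c <- scale(X1, center = TRUE, scale = FALSE)
  X2c <- scale(X2, center = TRUE, scale = FALSE)
  # Separate PCA for each dataset
  pca1 <- prcomp(X1c, center = FALSE, rank. = npcs)
  pca2 <- prcomp(X2c, center = FALSE, rank. = npcs)
  # Dataset 1's PCA space: neighbors of each dataset-1 cell among projected dataset-2 cells
  nn_1to2 <- knn_index(pca1$x, X2c %*% pca1$rotation, k)
  # Dataset 2's PCA space: neighbors of each dataset-2 cell among projected dataset-1 cells
  nn_2to1 <- knn_index(pca2$x, X1c %*% pca2$rotation, k)
  # Keep mutual pairs only: j is a neighbor of i AND i is a neighbor of j
  pairs <- lapply(seq_len(nrow(X1c)), function(i) {
    js <- nn_1to2[i, ]
    js <- js[vapply(js, function(j) i %in% nn_2to1[j, ], logical(1))]
    if (length(js) > 0) cbind(cell1 = i, cell2 = js) else NULL
  })
  do.call(rbind, pairs)
}

# Toy data: two shared cell types (20 cells each); dataset 2 has a per-gene offset
set.seed(1)
n_genes <- 50
labels <- rep(1:2, each = 20)
type_means <- matrix(rnorm(2 * n_genes, sd = 3), nrow = 2)
X1 <- type_means[labels, ] + matrix(rnorm(40 * n_genes), ncol = n_genes)
X2 <- sweep(type_means[labels, ] + matrix(rnorm(40 * n_genes), ncol = n_genes),
            2, rnorm(n_genes, sd = 2), "+")

anchors <- rpca_anchors(X1, X2)
# Fraction of anchors that pair cells of the same type
mean(labels[anchors[, "cell1"]] == labels[anchors[, "cell2"]])
```

In this toy example the two cell types are very well separated, so every anchor should pair cells of the same type. With real data some anchors can link different cell types, and Seurat's filtering and scoring steps are meant to down-weight those.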

When to use each method
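CCA: CCA identifies shared sources of variation across datasets. This makes it well suited to finding anchors when cell types are conserved but gene expression differs substantially between experiments. CCA-based integration therefore supports integrative analysis in two kinds of settings:

- experimental conditions or disease states introduce very strong expression shifts;
- datasets from different modalities or species are being integrated.

However, CCA-based integration can also overcorrect, especially when a large proportion of cells do not overlap across datasets.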
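The base-R toy sketch below illustrates what "shared sources of variation" means. It assumes that the core linear-algebra step of Seurat's CCA reduces to the following:

- Scale each gene within each dataset (genes × cells).
- Take the singular value decomposition of the cells-by-cells cross-product.
- Use the leading singular vectors as coordinates for both datasets' cells along directions of variation they share.

This is a simplified illustration on simulated data, not Seurat's code. Seurat adds further steps, such as L2-normalizing the embeddings (the l2.norm argument of FindIntegrationAnchors()). scale_genes() is a hypothetical helper.

```r
# Simplified CCA-style embedding (sketch, not Seurat's implementation)
set.seed(7)
n_genes <- 50
labels <- rep(1:2, each = 20)                       # two cell types, 20 cells each
type_means <- matrix(rnorm(2 * n_genes, sd = 3), nrow = 2)

# genes x cells matrices; dataset 2 gets a constant offset as a crude batch effect
X1 <- t(type_means[labels, ] + matrix(rnorm(40 * n_genes), ncol = n_genes))
X2 <- t(type_means[labels, ] + matrix(rnorm(40 * n_genes), ncol = n_genes) + 2)

# Hypothetical helper: center and scale each gene (row) within one dataset
scale_genes <- function(X) t(scale(t(X)))
Z1 <- scale_genes(X1)
Z2 <- scale_genes(X2)

# SVD of the cells1 x cells2 cross-product; u and v embed the two datasets' cells
cca <- svd(crossprod(Z1, Z2), nu = 2, nv = 2)
emb1 <- cca$u   # 40 x 2, dataset 1 cells
emb2 <- cca$v   # 40 x 2, dataset 2 cells

# The first canonical vector separates the shared cell types in both datasets
tapply(emb1[, 1], labels, mean)
tapply(emb2[, 1], labels, mean)
```

The offset added to dataset 2 is removed by per-gene scaling. The shared type difference then dominates the cross-product, so the first canonical vector places same-type cells from both datasets on the same side.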

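RPCA: RPCA-based integration runs significantly faster and is also more conservative, since cells in different biological states are less likely to "align" after integration. We therefore recommend RPCA when: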

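1. A substantial fraction of cells in one dataset have no matching type in the other datasets
2. The datasets come from the same platform (for example, multiple lanes of a 10x Genomics run)
3. There are a large number of datasets or cells to integrate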

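The workflow is similar to the Seurat vignette "Introduction to scRNA-seq integration", with two differences. First, it requires running PCA on each dataset separately before integration. Second, the "reduction" parameter of FindIntegrationAnchors() should be set to "rpca". Both changes appear in the code below: the ScaleData()/RunPCA() loop and the FindIntegrationAnchors() call.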

library(Seurat)
library(SeuratData)
library(patchwork)
# install dataset
InstallData("ifnb")
# load dataset
ifnb <- LoadData("ifnb")

# split the dataset into a list of two seurat objects (stim and CTRL)
ifnb.list <- SplitObject(ifnb, split.by = "stim")

# normalize and identify variable features for each dataset independently
ifnb.list <- lapply(X = ifnb.list, FUN = function(x) {
    x <- NormalizeData(x)
    x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
})

# select features that are repeatedly variable across datasets for integration,
# then scale the data and run PCA on each dataset using these features
features <- SelectIntegrationFeatures(object.list = ifnb.list)
ifnb.list <- lapply(X = ifnb.list, FUN = function(x) {
    x <- ScaleData(x, features = features, verbose = FALSE)
    x <- RunPCA(x, features = features, verbose = FALSE)
})

immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, anchor.features = features,
    reduction = "rpca")

# this command creates an 'integrated' data assay
immune.combined <- IntegrateData(anchorset = immune.anchors)
# specify that we will perform downstream analysis on the corrected data; note that the
# original unmodified data still resides in the 'RNA' assay
DefaultAssay(immune.combined) <- "integrated"

# Run the standard workflow for visualization and clustering
immune.combined <- ScaleData(immune.combined, verbose = FALSE)
immune.combined <- RunPCA(immune.combined, npcs = 30, verbose = FALSE)
immune.combined <- RunUMAP(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindNeighbors(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindClusters(immune.combined, resolution = 0.5)
# Visualization
p1 <- DimPlot(immune.combined, reduction = "umap", group.by = "stim")
p2 <- DimPlot(immune.combined, reduction = "umap", group.by = "seurat_annotations", label = TRUE,
    repel = TRUE)
p1 + p2

Modifying the strength of integration

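The results show that RPCA-based integration is more conservative. In this example, a subset of cells (naive and memory T cells) does not align perfectly across experiments. You can strengthen the alignment by increasing the k.anchor parameter, which defaults to 5; setting it to 20 helps align these populations.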
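Seurat's documentation describes k.anchor as the number of neighbors (k) used when picking anchors. A larger k relaxes the mutual-neighbor requirement: each cell's neighbor list grows, so more pairs appear in each other's lists, which yields more anchors and stronger alignment. The toy sketch below is not Seurat code. It counts mutual k-nearest-neighbor pairs between two random point sets for k = 5 and k = 20. knn_index() and count_mnn() are hypothetical helpers.

```r
# Toy illustration (not Seurat code): more neighbors per cell -> more mutual-neighbor pairs.

# Indices of the k nearest rows of `ref` for each row of `query` (Euclidean distance).
knn_index <- function(query, ref, k) {
  d2 <- outer(rowSums(query^2), rowSums(ref^2), "+") - 2 * tcrossprod(query, ref)
  idx <- apply(d2, 1, function(row) order(row)[seq_len(k)])
  matrix(idx, nrow = nrow(query), ncol = k, byrow = TRUE)
}

# Hypothetical helper: number of mutual k-nearest-neighbor pairs between A and B
count_mnn <- function(A, B, k) {
  nn_ab <- knn_index(A, B, k)   # for each row of A, its k nearest rows of B
  nn_ba <- knn_index(B, A, k)   # for each row of B, its k nearest rows of A
  sum(vapply(seq_len(nrow(A)), function(i) {
    # how many of i's neighbors in B also list i among their neighbors in A
    sum(nn_ba[nn_ab[i, ], , drop = FALSE] == i)
  }, integer(1)))
}

set.seed(42)
A <- matrix(rnorm(200 * 10), ncol = 10)        # "dataset 1" in a shared 10-D space
B <- matrix(rnorm(200 * 10), ncol = 10) + 0.5  # "dataset 2", shifted to mimic a residual batch effect
counts <- sapply(c(k5 = 5, k20 = 20), function(k) count_mnn(A, B, k))
counts
```

Each k = 20 neighbor list contains the k = 5 list, so the count for k = 20 can never be smaller than the count for k = 5.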

immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, anchor.features = features, reduction = "rpca",
  k.anchor = 20)
immune.combined <- IntegrateData(anchorset = immune.anchors)

immune.combined <- ScaleData(immune.combined, verbose = FALSE)
immune.combined <- RunPCA(immune.combined, npcs = 30, verbose = FALSE)
immune.combined <- RunUMAP(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindNeighbors(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindClusters(immune.combined, resolution = 0.5)


# Visualization
p1 <- DimPlot(immune.combined, reduction = "umap", group.by = "stim")
p2 <- DimPlot(immune.combined, reduction = "umap", label = TRUE, repel = TRUE)
p1 + p2

Performing integration on datasets normalized with SCTransform

Set the method parameter to "glmGamPoi" to enable faster estimation of the regression parameters in SCTransform(). This requires the glmGamPoi package, which is distributed through Bioconductor.
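A small guard lets the same script run whether or not glmGamPoi is installed. This is a sketch. sct_extra_args() is a hypothetical helper, and it relies on SCTransform() accepting method = "glmGamPoi", exactly as the calls in this post use it.

```r
# Hypothetical helper: choose extra SCTransform() arguments depending on whether
# the glmGamPoi package is installed.
sct_extra_args <- function() {
  if (requireNamespace("glmGamPoi", quietly = TRUE)) {
    list(method = "glmGamPoi")   # faster regression-parameter estimation
  } else {
    message("glmGamPoi is not installed; falling back to SCTransform()'s default. ",
            "Install it with BiocManager::install(\"glmGamPoi\").")
    list()                       # no extra arguments
  }
}

sct_args <- sct_extra_args()
str(sct_args)

# Usage with Seurat (not run here; assumes ifnb.list from this post):
# ifnb.list <- lapply(X = ifnb.list, FUN = function(x) {
#   do.call(SCTransform, c(list(x), sct_args))
# })
```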

ifnb <- LoadData("ifnb")
ifnb.list <- SplitObject(ifnb, split.by = "stim")
ifnb.list <- lapply(X = ifnb.list, FUN = SCTransform, method = "glmGamPoi")
features <- SelectIntegrationFeatures(object.list = ifnb.list, nfeatures = 3000)
ifnb.list <- PrepSCTIntegration(object.list = ifnb.list, anchor.features = features)
ifnb.list <- lapply(X = ifnb.list, FUN = RunPCA, features = features)


immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, normalization.method = "SCT",
    anchor.features = features, dims = 1:30, reduction = "rpca", k.anchor = 20)
immune.combined.sct <- IntegrateData(anchorset = immune.anchors, normalization.method = "SCT", dims = 1:30)



immune.combined.sct <- RunPCA(immune.combined.sct, verbose = FALSE)
immune.combined.sct <- RunUMAP(immune.combined.sct, reduction = "pca", dims = 1:30)

# Visualization
p1 <- DimPlot(immune.combined.sct, reduction = "umap", group.by = "stim")
p2 <- DimPlot(immune.combined.sct, reduction = "umap", group.by = "seurat_annotations", label = TRUE,
    repel = TRUE)
p1 + p2

With method = "glmGamPoi":

ifnb.list <- lapply(X = ifnb.list, FUN = SCTransform, method = "glmGamPoi")

Without the method parameter:

ifnb.list <- lapply(X = ifnb.list, FUN = SCTransform)