Seurat: Fast integration using reciprocal PCA (RPCA)

2022-04-15 重拾生活信心

A modified workflow for the integration of scRNA-seq datasets.

Identify anchors: canonical correlation analysis ('CCA') → reciprocal PCA ('RPCA')

RPCA: each dataset is projected into the others' PCA space, and the anchors are constrained by the same mutual neighborhood requirement.
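The idea of projecting into another dataset's PCA space and keeping only mutual neighbors can be sketched in base R. This is a toy illustration under stated assumptions, not Seurat's implementation: the simulated data, the sizes, the choice of which space each neighbor search runs in, and the helper names (make_cells, project, knn_index) are all hypothetical.

```r
# Toy sketch of reciprocal-PCA anchoring in base R. This is NOT Seurat's code;
# the data, sizes and helper names below are made up for illustration.
set.seed(1)
n_genes <- 20
n_pcs   <- 5
k       <- 5   # plays the role of a neighbourhood size such as k.anchor

# Three shared "cell types", each with its own mean expression profile.
centers <- matrix(rnorm(3 * n_genes, sd = 3), nrow = 3)
make_cells <- function(n) {
  type <- sample(3, n, replace = TRUE)
  centers[type, , drop = FALSE] + matrix(rnorm(n * n_genes), nrow = n)
}
a <- make_cells(60)  # dataset A: 60 cells x 20 genes
b <- make_cells(50)  # dataset B: 50 cells x 20 genes

# 1. Run PCA on each dataset separately.
pca_a <- prcomp(a, center = TRUE, scale. = FALSE)
pca_b <- prcomp(b, center = TRUE, scale. = FALSE)

# 2. Project each dataset into the other's PCA space
#    (centre with the other dataset's means, then apply its loadings).
project <- function(x, pca) {
  scale(x, center = pca$center, scale = FALSE) %*% pca$rotation[, seq_len(n_pcs)]
}
b_in_a <- project(b, pca_a)
a_in_b <- project(a, pca_b)
a_own  <- pca_a$x[, seq_len(n_pcs)]
b_own  <- pca_b$x[, seq_len(n_pcs)]

# 3. For each row of `query`, indices of the k nearest rows of `ref` (Euclidean).
knn_index <- function(query, ref, k) {
  idx <- lapply(seq_len(nrow(query)), function(i) {
    d <- colSums((t(ref) - query[i, ])^2)
    order(d)[seq_len(k)]
  })
  matrix(unlist(idx), ncol = k, byrow = TRUE)
}
nn_a_to_b <- knn_index(a_in_b, b_own, k)  # A cells -> B neighbours, in B's space
nn_b_to_a <- knn_index(b_in_a, a_own, k)  # B cells -> A neighbours, in A's space

# 4. Keep only mutual pairs: j is a neighbour of i AND i is a neighbour of j.
candidates <- cbind(cell_a = rep(seq_len(nrow(a)), times = k),
                    cell_b = as.vector(nn_a_to_b))
keep <- mapply(function(i, j) i %in% nn_b_to_a[j, ],
               candidates[, "cell_a"], candidates[, "cell_b"])
anchors <- candidates[keep, , drop = FALSE]
cat("mutual anchor pairs:", nrow(anchors), "\n")
```

Raising k in this sketch widens both neighbor lists, so more candidate pairs can pass the mutual check; that is the intuition behind tuning the neighborhood size when anchors are too sparse.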

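When to use each method
CCA: By identifying shared sources of variation between datasets, CCA is well suited to identifying anchors when cell types are conserved but gene expression differs substantially across experiments. CCA-based integration therefore enables integrative analysis when experimental conditions or disease states introduce very strong expression shifts, or when integrating datasets across modalities and species. However, CCA-based integration can also lead to overcorrection, especially when a large proportion of cells do not overlap across datasets.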

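RPCA: RPCA-based integration runs significantly faster and is also a more conservative approach, in which cells in different biological states are less likely to "align" after integration. We therefore recommend RPCA when:

1. A substantial fraction of cells in one dataset have no matching type in the other datasets
2. The datasets come from the same platform (e.g. multiple lanes of 10x Genomics)
3. There are a large number of datasets or cells to integrate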

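The workflow is similar to the "Introduction to scRNA-seq integration" vignette, except that users must run PCA on each dataset individually before integration. When running FindIntegrationAnchors(), users should also set the 'reduction' argument to "rpca" (see the per-dataset ScaleData()/RunPCA() loop and the FindIntegrationAnchors() call in the code below).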

library(Seurat)
library(SeuratData)
# install dataset
InstallData("ifnb")
# load dataset (assign the result so that 'ifnb' exists in the workspace)
ifnb <- LoadData("ifnb")

# split the dataset into a list of two seurat objects (stim and CTRL)
ifnb.list <- SplitObject(ifnb, split.by = "stim")

# normalize and identify variable features for each dataset independently
ifnb.list <- lapply(X = ifnb.list, FUN = function(x) {
    x <- NormalizeData(x)
    x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
})

# select features that are repeatedly variable across datasets for integration,
# then run PCA on each dataset using these features
features <- SelectIntegrationFeatures(object.list = ifnb.list)
ifnb.list <- lapply(X = ifnb.list, FUN = function(x) {
    x <- ScaleData(x, features = features, verbose = FALSE)
    x <- RunPCA(x, features = features, verbose = FALSE)
})

immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, anchor.features = features, 
reduction = "rpca")

# this command creates an 'integrated' data assay
immune.combined <- IntegrateData(anchorset = immune.anchors)

# specify that we will perform downstream analysis on the corrected data;
# note that the original unmodified data still resides in the 'RNA' assay
DefaultAssay(immune.combined) <- "integrated"

# Run the standard workflow for visualization and clustering
immune.combined <- ScaleData(immune.combined, verbose = FALSE)
immune.combined <- RunPCA(immune.combined, npcs = 30, verbose = FALSE)
immune.combined <- RunUMAP(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindNeighbors(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindClusters(immune.combined, resolution = 0.5)
# Visualization
p1 <- DimPlot(immune.combined, reduction = "umap", group.by = "stim")
p2 <- DimPlot(immune.combined, reduction = "umap", group.by = "seurat_annotations", label = TRUE,
    repel = TRUE)
p1 + p2

[Figure: UMAP of the RPCA-integrated cells, colored by stim (left) and by seurat_annotations (right)]

Modifying the strength of integration

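The results show that RPCA-based integration is more conservative: in this example, a subset of cells (naive and memory T cells) do not perfectly align across experiments. The strength of alignment can be increased by raising the k.anchor parameter, which defaults to 5. Increasing it to 20 helps align these populations.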

immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, anchor.features = features, reduction = "rpca",
  k.anchor = 20)
immune.combined <- IntegrateData(anchorset = immune.anchors)

immune.combined <- ScaleData(immune.combined, verbose = FALSE)
immune.combined <- RunPCA(immune.combined, npcs = 30, verbose = FALSE)
immune.combined <- RunUMAP(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindNeighbors(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindClusters(immune.combined, resolution = 0.5)


# Visualization
p1 <- DimPlot(immune.combined, reduction = "umap", group.by = "stim")
p2 <- DimPlot(immune.combined, reduction = "umap", label = TRUE, repel = TRUE)
p1 + p2
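
Beyond inspecting the UMAP by eye, one rough way to judge alignment is to tabulate, per cluster, the share of cells from each condition. The sketch below uses base R on made-up labels; condition_mix is a hypothetical helper, and the toy vectors merely stand in for Idents(immune.combined) and immune.combined$stim.

```r
# Hypothetical helper: fraction of each cluster's cells that come from each condition.
condition_mix <- function(clusters, condition) {
  prop.table(table(cluster = clusters, condition = condition), margin = 1)
}

# Toy labels (NOT real results) standing in for cluster identities and 'stim'.
clusters  <- factor(c(0, 0, 0, 0, 1, 1, 1, 2, 2, 2))
condition <- c("CTRL", "STIM", "CTRL", "STIM", "CTRL",
               "CTRL", "CTRL", "STIM", "STIM", "CTRL")

mix <- condition_mix(clusters, condition)
print(round(mix, 2))

# With a Seurat object, the analogous call would be:
# condition_mix(Idents(immune.combined), immune.combined$stim)
```

A cluster made up almost entirely of one condition may reflect weak alignment, or a genuinely condition-specific population, so the table is a prompt for inspection rather than a verdict.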

[Figure: UMAP after integration with k.anchor = 20, colored by stim (left) and by cluster (right)]

Performing integration on datasets normalized with SCTransform

Set the method argument to "glmGamPoi" (the glmGamPoi package is available from Bioconductor) to enable faster estimation of regression parameters in SCTransform().
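
Because glmGamPoi is an optional dependency, a script can choose the method at run time instead of failing when the package is missing. The following is a minimal sketch, assuming that "poisson" is the fallback you want (it is the default in Seurat v4's SCTransform(); check your installed version); sct_method is a hypothetical variable name.

```r
# Pick the SCTransform method based on whether glmGamPoi is installed.
# Assumption: "poisson" is an acceptable fallback for your Seurat version.
sct_method <- if (requireNamespace("glmGamPoi", quietly = TRUE)) {
  "glmGamPoi"
} else {
  "poisson"
}
message("SCTransform will use method = '", sct_method, "'")

# To install glmGamPoi from Bioconductor (run once, needs network access):
# if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
# BiocManager::install("glmGamPoi")

# Usage with the list of Seurat objects from this workflow:
# ifnb.list <- lapply(X = ifnb.list, FUN = SCTransform, method = sct_method)
```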

ifnb <- LoadData("ifnb")
ifnb.list <- SplitObject(ifnb, split.by = "stim")
ifnb.list <- lapply(X = ifnb.list, FUN = SCTransform, method = "glmGamPoi")
features <- SelectIntegrationFeatures(object.list = ifnb.list, nfeatures = 3000)
ifnb.list <- PrepSCTIntegration(object.list = ifnb.list, anchor.features = features)
ifnb.list <- lapply(X = ifnb.list, FUN = RunPCA, features = features)


immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, normalization.method = "SCT",
    anchor.features = features, dims = 1:30, reduction = "rpca", k.anchor = 20)
immune.combined.sct <- IntegrateData(anchorset = immune.anchors, normalization.method = "SCT", dims = 1:30)



immune.combined.sct <- RunPCA(immune.combined.sct, verbose = FALSE)
immune.combined.sct <- RunUMAP(immune.combined.sct, reduction = "pca", dims = 1:30)

# Visualization
p1 <- DimPlot(immune.combined.sct, reduction = "umap", group.by = "stim")
p2 <- DimPlot(immune.combined.sct, reduction = "umap", group.by = "seurat_annotations", label = TRUE,
    repel = TRUE)
p1 + p2

method = "glmGamPoi"

ifnb.list <- lapply(X = ifnb.list, FUN = SCTransform, method = "glmGamPoi")

[Figure: UMAP of the SCT-integrated cells with method = "glmGamPoi"]

Without the method argument

ifnb.list <- lapply(X = ifnb.list, FUN = SCTransform)

[Figure: UMAP of the SCT-integrated cells without the method argument]
