Comparing batch-correction methods for single-cell & spatial integration (2)
2022-09-15
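单细胞空间交响乐 (Single-Cell Spatial Symphony)
Author: 追风少年i
To start, here is a marker table for everyone's reference.
[Figure: marker list]
This post is fairly simple.
It continues from the previous post, "Comparing batch-correction methods for single-cell & spatial integration", which gave code for the following integration and batch-correction methods: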
- CCA
- merge
- SCT
- merge + SCT
- merge + harmony
- SCT + harmony
About these methods I want to stress just one point: the role of vars.to.regress when calling ScaleData. It deserves everyone's attention.
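To make the vars.to.regress point concrete, here is a minimal Python sketch of what "regressing out" a covariate means: fit a linear model of one gene's expression on the covariate, keep the residuals, then center and scale them. The toy values and the names `regress_out`, `scale`, and `percent_mt` are my own assumptions for illustration. This is not Seurat's code, and it uses the population standard deviation for simplicity.

```python
from statistics import mean, pstdev


def regress_out(expression, covariate):
    """Return the residuals of a least-squares fit expression ~ covariate."""
    x_bar, y_bar = mean(covariate), mean(expression)
    sxx = sum((x - x_bar) ** 2 for x in covariate)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(covariate, expression))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(covariate, expression)]


def scale(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


# Hypothetical toy data: one gene across six cells, plus a per-cell covariate
percent_mt = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

residuals = regress_out(gene, percent_mt)  # variation not explained by percent_mt
scaled = scale(residuals)                  # what would go on to PCA
print(scaled)
```

The takeaway is that whatever you list in vars.to.regress is removed from the scaled values before PCA, so regressing on a variable that carries real biology removes that biology too.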
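This post adds to that list, because the methods above still have limits. On large datasets, with hundreds of thousands to millions of cells, they become impractical: they are R-based and run very slowly at that scale.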
- The rpca method:
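library(Seurat)
# ifnb is assumed to be an already-loaded Seurat object with a "stim" metadata column
ifnb.list <- SplitObject(ifnb, split.by = "stim")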
# normalize and identify variable features for each dataset independently
ifnb.list <- lapply(X = ifnb.list, FUN = function(x) {
x <- NormalizeData(x)
x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
})
# select features that are repeatedly variable across datasets for integration, then run PCA on each
# dataset using these features
features <- SelectIntegrationFeatures(object.list = ifnb.list)
ifnb.list <- lapply(X = ifnb.list, FUN = function(x) {
x <- ScaleData(x, features = features, verbose = FALSE)
x <- RunPCA(x, features = features, verbose = FALSE)
})
#### rpca: find anchors with reciprocal PCA
immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, anchor.features = features, reduction = "rpca")
# this command creates an 'integrated' data assay
immune.combined <- IntegrateData(anchorset = immune.anchors)
Pay attention to the rpca method here. For background, see my earlier post "10X单细胞数据整合分析Seurat之rpca(large data,细胞量超过20万)", where I explained the principle in detail.
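As a rough illustration of the anchor idea behind this step, the sketch below finds mutual nearest neighbors between two batches: a pair of cells counts as an anchor only if each one is among the other's nearest neighbors. It assumes both batches already sit in a shared low-dimensional space; Seurat's rpca additionally projects each dataset into the other's PCA space and then filters and scores the anchors, none of which is shown here. The coordinates and the function names are hypothetical.

```python
import math


def nearest(query, targets, k):
    """Indices of the k targets closest to query (Euclidean distance)."""
    order = sorted(range(len(targets)), key=lambda j: math.dist(query, targets[j]))
    return set(order[:k])


def mutual_nearest_pairs(a, b, k=1):
    """Pairs (i, j) where b[j] is a neighbor of a[i] AND a[i] is a neighbor of b[j]."""
    a_to_b = [nearest(p, b, k) for p in a]
    b_to_a = [nearest(q, a, k) for q in b]
    return [(i, j) for i in range(len(a)) for j in sorted(a_to_b[i]) if i in b_to_a[j]]


# Hypothetical 2-D embeddings of two batches
batch_a = [(0.0, 0.0), (5.0, 5.0), (9.0, 0.0)]
batch_b = [(0.2, 0.1), (5.1, 4.8), (20.0, 20.0)]

anchors = mutual_nearest_pairs(batch_a, batch_b, k=1)
print(anchors)  # [(0, 0), (1, 1)]
```

The third cell of each batch has no mutual partner, so it yields no anchor. That is the behavior you want for cell states that exist in only one dataset.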
- rpca + reference:
bm280k.list <- SplitObject(bm280k, split.by = "orig.ident")
bm280k.list <- lapply(X = bm280k.list, FUN = function(x) {
x <- NormalizeData(x, verbose = FALSE)
x <- FindVariableFeatures(x, verbose = FALSE)
})
features <- SelectIntegrationFeatures(object.list = bm280k.list)
bm280k.list <- lapply(X = bm280k.list, FUN = function(x) {
x <- ScaleData(x, features = features, verbose = FALSE)
x <- RunPCA(x, features = features, verbose = FALSE)
})
anchors <- FindIntegrationAnchors(object.list = bm280k.list, reference = c(1, 2), reduction = "rpca",
dims = 1:50)
bm280k.integrated <- IntegrateData(anchorset = anchors, dims = 1:50)
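Note that reference takes indices into the object list (here the first two samples in bm280k.list). In practice the reference is usually set to the control samples, or to samples with a good prognosis.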
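One reason a reference helps on large datasets is that it cuts the number of dataset pairs that need anchor finding. The sketch below is a back-of-the-envelope count, not Seurat code. It assumes that without a reference every pair of datasets is compared, and that with references the comparisons are reference vs. reference plus every reference vs. every non-reference. The multi-reference rule is my reading of the Seurat documentation, so treat it as an assumption. `anchor_pairs` is a hypothetical helper.

```python
from itertools import combinations


def anchor_pairs(n_datasets, reference=None):
    """List the dataset pairs that would need anchor finding (1-based, as in R)."""
    idx = range(1, n_datasets + 1)
    if reference is None:
        # No reference: all pairwise comparisons
        return list(combinations(idx, 2))
    refs = sorted(reference)
    others = [i for i in idx if i not in refs]
    # Assumed rule: references among themselves, then each reference vs. each non-reference
    return list(combinations(refs, 2)) + [(r, o) for r in refs for o in others]


all_pairs = anchor_pairs(10)
one_ref = anchor_pairs(10, reference=[1])
two_refs = anchor_pairs(10, reference=[1, 2])
print(len(all_pairs), len(one_ref), len(two_refs))  # 45 9 17
```

With ten samples the count drops from 45 pairs to 9 with a single reference, and the gap widens quickly as more samples are added.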
- bbknn (scanpy, Python): the integration method that many high-impact papers use for large cell counts:
import scanpy as sc
import pandas as pd
import seaborn as sns
adata_ref = sc.datasets.pbmc3k_processed() # this is an earlier version of the dataset from the pbmc3k tutorial
adata = sc.datasets.pbmc68k_reduced()
#### Note the difference here: keep only the intersection of the (highly variable) genes in the two datasets
var_names = adata_ref.var_names.intersection(adata.var_names)
adata_ref = adata_ref[:, var_names]
adata = adata[:, var_names]
#### Preprocess the reference data
sc.pp.pca(adata_ref)
sc.pp.neighbors(adata_ref)
sc.tl.umap(adata_ref)
sc.tl.ingest(adata, adata_ref, obs='louvain')
adata.uns['louvain_colors'] = adata_ref.uns['louvain_colors'] # fix colors
adata_concat = adata_ref.concatenate(adata, batch_categories=['ref', 'new'])
adata_concat.obs.louvain = adata_concat.obs.louvain.astype('category')
adata_concat.obs['louvain'] = adata_concat.obs['louvain'].cat.reorder_categories(adata_ref.obs['louvain'].cat.categories) # fix category ordering (inplace=True was removed in pandas 2.0)
adata_concat.uns['louvain_colors'] = adata_ref.uns['louvain_colors'] # fix category colors
#### Correct batch effects with bbknn
sc.tl.pca(adata_concat)
sc.external.pp.bbknn(adata_concat, batch_key='batch')
#### Then run the standard downstream workflow; with more samples, just keep repeating this process.
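The idea behind bbknn can be sketched in a few lines: instead of taking each cell's k nearest neighbors from the pooled data, take its nearest neighbors within every batch separately, so each cell is connected to all batches. The code below is a toy illustration under that assumption, with brute-force distances on made-up 2-D points. The real bbknn package works on the PCA embedding and uses much faster neighbor search. `batch_balanced_neighbors` is a hypothetical name.

```python
import math


def batch_balanced_neighbors(points, batches, k_per_batch=1):
    """For each cell, pick its k nearest neighbors within every batch separately."""
    by_batch = {}
    for idx, b in enumerate(batches):
        by_batch.setdefault(b, []).append(idx)

    neighbors = []
    for i, p in enumerate(points):
        picked = []
        for members in by_batch.values():
            # Exclude the cell itself, then rank the rest of this batch by distance
            candidates = [j for j in members if j != i]
            candidates.sort(key=lambda j: math.dist(p, points[j]))
            picked.extend(candidates[:k_per_batch])
        neighbors.append(picked)
    return neighbors


# Hypothetical embedding: the "new" batch is shifted along x by a batch effect
points = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0), (10, 1)]
batches = ["ref", "ref", "ref", "new", "new", "new"]

graph = batch_balanced_neighbors(points, batches, k_per_batch=1)
print(graph[0])  # [1, 3]: one neighbor from "ref", one from "new"
```

A plain kNN graph on these points would connect each cell only to its own batch. The per-batch search forces cross-batch edges, and those edges are what pull the batches together in the later UMAP and clustering.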
- harmony correction in Python for large cell counts:
import scanpy as sc
import scanpy.external as sce
#### Load your own data here; pbmc3k is only a placeholder and has no 'sample' column in adata.obs
adata = sc.datasets.pbmc3k()
sc.pp.recipe_zheng17(adata)
sc.tl.pca(adata)
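#### Correct batch effects; 'sample' must be a column in adata.obs
sce.pp.harmony_integrate(adata, 'sample')
#### The corrected embedding is stored in adata.obsm['X_pca_harmony'], so build the neighbor graph on it
sc.pp.neighbors(adata, use_rep='X_pca_harmony')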
sc.tl.umap(adata)
The call sce.pp.harmony_integrate(adata, 'sample') is essentially a thin wrapper around the following statements:
import harmonypy
harmony_out = harmonypy.run_harmony(adata.obsm["X_pca"], adata.obs, 'sample')
adata.obsm["X_pca_harmony"] = harmony_out.Z_corr.T ### obsm holds the reduced-dimension embeddings
For large cell counts I still recommend doing the analysis in scanpy, with bbknn or harmony for batch correction.
Life is good, and even better with you.