Single-cell sequencing analysis: integrating multiple single-cell datasets with the R package harmony
2020-08-19
JeremyL
Overview of Harmony algorithm
Fast, sensitive and accurate integration of single-cell data with Harmony
#System requirements
- Linux, OS X, and Windows are all supported
- R version 3.4 or later
- Python users: see harmonypy
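The R version requirement above can be checked before installing anything. This is a minimal sketch in base R; the `min_version` variable is just an illustrative name, and the threshold is taken from the list above.

```r
# Pre-flight check for the R version requirement listed above (3.4 or later).
min_version <- "3.4.0"
r_ok <- getRversion() >= min_version

if (!r_ok) {
  stop("Harmony needs R >= ", min_version,
       "; found ", as.character(getRversion()))
}
message("R ", as.character(getRversion()), " satisfies the requirement")
```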
#Installation
library(devtools)
install_github("immunogenomics/harmony")
#Examples
##PCA matrix
Harmony iteratively corrects PCA embeddings. To pass it a PCA matrix directly, set do_pca=FALSE.
library(harmony)
data(cell_lines_small)
pca_matrix <- cell_lines_small$scaled_pcs
meta_data <- cell_lines_small$meta_data
harmony_embeddings <- HarmonyMatrix(pca_matrix, meta_data, 'dataset',
do_pca=FALSE)
## Output is a matrix of corrected PC embeddings
dim(harmony_embeddings)
harmony_embeddings[seq_len(5), seq_len(5)]
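Before calling HarmonyMatrix() on your own embeddings, it can help to confirm that the PCA matrix and the metadata describe the same cells. The sketch below uses made-up toy data in base R. `check_inputs()` is a hypothetical helper and not part of harmony, and the cells-in-rows layout is an assumption you should verify against your own matrix.

```r
# Toy stand-ins for a PCA matrix (cells x PCs) and its metadata; not real data.
set.seed(1)
n_cells <- 6
toy_pcs <- matrix(
  rnorm(n_cells * 3), nrow = n_cells,
  dimnames = list(paste0("cell", 1:n_cells), paste0("PC", 1:3))
)
toy_meta <- data.frame(
  dataset = rep(c("A", "B"), each = 3),
  row.names = rownames(toy_pcs)
)

# Hypothetical helper: cheap sanity checks to run before integration.
check_inputs <- function(pcs, meta, vars) {
  stopifnot(
    is.matrix(pcs), is.numeric(pcs),
    nrow(pcs) == nrow(meta),            # one metadata row per cell
    all(vars %in% colnames(meta)),      # grouping columns must exist
    !anyNA(meta[, vars, drop = FALSE])  # no missing batch labels
  )
  invisible(TRUE)
}

check_inputs(toy_pcs, toy_meta, "dataset")
```

If any check fails, stopifnot() reports the first condition that was not met, which is usually enough to locate the mismatch.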
## Finally, we can return an object with all the underlying data structures
harmony_object <- HarmonyMatrix(pca_matrix, meta_data, 'dataset',
do_pca=FALSE, return_object=TRUE)
dim(harmony_object$Y) ## cluster centroids
dim(harmony_object$R) ## soft cluster assignment
dim(harmony_object$Z_corr) ## corrected PCA embeddings
head(harmony_object$O) ## batch by cluster co-occurrence matrix
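A soft cluster assignment gives each cell a weight for every cluster instead of a single label. The sketch below shows, on a made-up matrix, how such weights could be turned into hard labels. The clusters-in-rows, cells-in-columns orientation is an assumption, so check dim(harmony_object$R) on your own object first.

```r
# Toy soft-assignment matrix: 3 clusters (rows) x 4 cells (columns).
# Values are invented; each column is one cell's weights over the clusters.
R_toy <- matrix(
  c(0.70, 0.20, 0.10,
    0.10, 0.80, 0.10,
    0.30, 0.30, 0.40,
    0.05, 0.05, 0.90),
  nrow = 3
)

# Each cell's weights should sum to 1.
print(colSums(R_toy))

# Hard label = cluster with the largest weight; confidence = that weight.
hard_label <- apply(R_toy, 2, which.max)
confidence <- apply(R_toy, 2, max)
print(data.frame(hard_label, confidence))
```

Cells with a low maximum weight, like the third one here, sit between clusters and may deserve a closer look.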
##Normalized gene matrix
Harmony expects the input to be already normalized. It then scales the data, runs PCA, and finally integrates the data.
library(harmony)
my_harmony_embeddings <- HarmonyMatrix(normalized_counts, meta_data, "dataset")
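If you start from raw counts, one common way to produce a normalized matrix is library-size scaling followed by a log transform. The sketch below does this in base R on made-up counts. The genes-in-rows layout and the scale factor of 10,000 are assumptions borrowed from common practice, not something the text above prescribes.

```r
# Toy raw count matrix: 3 genes (rows) x 4 cells (columns); values are made up.
raw_counts <- matrix(
  c(10, 0, 5,
     2, 3, 0,
     0, 8, 8,
     4, 4, 2),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)

# Scale each cell to 10,000 total counts, then apply log(1 + x).
lib_size <- colSums(raw_counts)
normalized_counts <- log1p(sweep(raw_counts, 2, lib_size, "/") * 1e4)

print(round(normalized_counts, 2))
```

Zero counts stay at zero after log1p, so sparsity is preserved.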
##Seurat
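To use Harmony in a Seurat workflow (Seurat V2, Seurat V3): run RunHarmony() after PCA, then use the harmony embeddings in place of PCA in downstream steps such as RunUMAP().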
seuratObj <- RunHarmony(seuratObj, "dataset")
seuratObj <- RunUMAP(seuratObj, reduction = "harmony")
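For context, the two calls above usually sit inside a longer Seurat pipeline. The function below is a sketch of where they might go, assuming Seurat V3 or later and a metadata column named "dataset". `run_harmony_workflow()` is a hypothetical wrapper, and `dims = 1:20` is an arbitrary choice rather than a recommended value.

```r
# Sketch of a Seurat workflow with Harmony. Defining the function loads nothing;
# Seurat and harmony only need to be installed when it is actually called.
run_harmony_workflow <- function(seurat_obj, batch_var = "dataset", dims = 1:20) {
  for (pkg in c("Seurat", "harmony")) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      stop("Package '", pkg, "' is required but not installed")
    }
  }
  seurat_obj <- Seurat::NormalizeData(seurat_obj)
  seurat_obj <- Seurat::FindVariableFeatures(seurat_obj)
  seurat_obj <- Seurat::ScaleData(seurat_obj)
  seurat_obj <- Seurat::RunPCA(seurat_obj)                  # PCs that Harmony corrects
  seurat_obj <- harmony::RunHarmony(seurat_obj, batch_var)  # adds a "harmony" reduction
  # Downstream steps read the corrected embeddings instead of "pca".
  seurat_obj <- Seurat::RunUMAP(seurat_obj, reduction = "harmony", dims = dims)
  seurat_obj <- Seurat::FindNeighbors(seurat_obj, reduction = "harmony", dims = dims)
  Seurat::FindClusters(seurat_obj)
}
```

The key point is that every step after RunHarmony() names reduction = "harmony"; leaving the default would silently fall back to the uncorrected PCA.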
##Harmony with two or more covariates
Harmony can integrate data over several covariates at once. Pass the covariate names as a character vector.
my_harmony_embeddings <- HarmonyMatrix(
my_pca_embeddings, meta_data, c("dataset", "donor", "batch_id"),
do_pca = FALSE
)
In a Seurat workflow:
seuratObject <- RunHarmony(seuratObject, c("dataset", "donor", "batch_id"))
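When correcting for several covariates, it is worth looking at how they overlap before integrating, since a covariate nested inside another adds little independent information. The sketch below inspects made-up metadata in base R. The column names match the example above, but the values are invented for illustration.

```r
# Toy metadata with the three covariates used above; values are made up.
toy_meta <- data.frame(
  dataset  = c("A", "A", "A", "B", "B", "B"),
  donor    = c("d1", "d1", "d2", "d2", "d3", "d3"),
  batch_id = c("b1", "b2", "b1", "b2", "b1", "b2")
)
vars <- c("dataset", "donor", "batch_id")

# Levels per covariate; a covariate with a single level carries no batch signal.
n_levels <- vapply(toy_meta[vars], function(x) length(unique(x)), integer(1))
print(n_levels)

# Cross-tabulate two covariates to see how they overlap.
overlap <- table(toy_meta$dataset, toy_meta$donor)
print(overlap)

# Donors that appear in only one dataset are nested within that dataset.
nested_donors <- colnames(overlap)[colSums(overlap > 0) == 1]
print(nested_donors)
```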
For detailed usage, see the advanced tutorial.
Code to reproduce the analyses in the paper "Fast, sensitive and accurate integration of single-cell data with Harmony" is available in harmony2019.
#References
Harmony
Fast, sensitive and accurate integration of single-cell data with Harmony