动态观察Harmony整合单细胞转录组
2021-04-17 本文已影响0人
Ace423
今天整个好玩的,动态观察Harmony整合单细胞转录组数据,可以直观感受下Harmony到底做了啥。
首先从10X网站下载三个donor的PBMC数据(https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.1.0/frozen_pbmc_donor_a),donor A, donor B, donor C。一系列操作包括标准化,高变基因,PCA之后,可以观察到在多个PC维度三组数据都有区别。
# Select only 2000 cells each dataset to speed up
pbmcA <- pbmcA[, 1:2000]
pbmcB <- pbmcB[, 1:2000]
pbmcC <- pbmcC[, 1:2000]
genes.int <- intersect(rownames(pbmcA), rownames(pbmcB))
genes.int <- intersect(genes.int, rownames(pbmcC))
counts <- cbind(pbmcA[genes.int,], pbmcB[genes.int,], pbmcC[genes.int,])
# CPM normalization
counts_to_cpm <- function(A) {
A@x <- A@x / rep.int(Matrix::colSums(A), diff(A@p))
A@x <- 1e6 * A@x
return(A)
}
cpm <- counts_to_cpm(counts)
log10cpm <- cpm
log10cpm@x <- log10(1 + log10cpm@x)
# variance normalize, identify overdispersed genes
cpm_info <- MUDAN::normalizeVariance(cpm, details = TRUE, verbose = FALSE)
# 30 PCs on overdispersed genes
set.seed(42)
pcs <- getPcs(
mat = log10cpm[cpm_info$ods,],
nGenes = length(cpm_info$ods),
nPcs = 30,
verbose = FALSE
)
donor_colors = c("dodgerblue", "orange", "firebrick")
p <- ggplot(dat_pca[sample(nrow(dat_pca)),]) +
scale_size_manual(values = c(0.9, 0.3, 0.5)) +
scale_color_manual(values = donor_colors) +
theme(legend.position = "none")
p1 <- p + geom_point(aes(PC1, PC2, color = donor, size = donor))
p2 <- p + geom_point(aes(PC3, PC4, color = donor, size = donor))
p3 <- p + geom_point(aes(PC5, PC6, color = donor, size = donor))
p4 <- p + geom_point(aes(PC7, PC8, color = donor, size = donor))
ggarrange(p1, p2, p3, p4, ncol=4, nrow=1)
主要有区别的PC是PC4, PC6和PC7。可以进一步画每个PC的密度图,这里略过。
![](https://img.haomeiwen.com/i24927444/0e87a3a7ca3b2ac3.png)
现在对于PCA矩阵进行Harmony操作,这里iterate 6次。
harmony_iters <- c(0, 1, 2, 3, 4, 5)
res <- lapply(harmony_iters, function(i) {
set.seed(42)
HarmonyMatrix(
theta = 0.15,
data_mat = pcs,
meta_data = meta$donor,
do_pca = FALSE,
verbose = FALSE,
max.iter.harmony = i
)
})
对于每次修正后的PC矩阵都进行UMAP投射,并画成动图。
animate_donor <- function(
filename, x, y,
nframes = 15, width = 250, height = 250 / 1.41, res = 150,
iters = 0:5, hide_legend = FALSE
) {
this_title <- "Harmony Iteration {closest_state}"
# p <- plot_donor(subset(d, iter %in% iters), {{x}}, {{y}})
x <- parse_expr(x)
y <- parse_expr(y)
p <- plot_donor(subset(d, iter %in% iters), !!x, !!y)
if (hide_legend) {
p <- p + theme(legend.position = "none")
} else {
p <- p + ggtitle(this_title)
}
p <- p + transition_states(iter, transition_length = 4, state_length = 1) +
ease_aes("cubic-in-out")
animation <- animate(
plot = p,
nframes = nframes, width = width, height = height, res = res
)
dir.create("animation", showWarnings = FALSE)
print(filename)
anim_save(filename, animation)
}
nframes <- 200
iters <- 0:3
animate_donor(
filename = "./donor-umap.gif",
nframes = nframes, iters = iters,
x = "UMAP1", y = "UMAP2", width = 800, height = 900 / 1.41
)
最终的结果如下图,可以非常直观的看到三组数据是如何融合在一起的,是不是很有趣~
![](https://img.haomeiwen.com/i24927444/e43aa9c7ed495fcb.gif)
Reference:
https://slowkow.com/notes/harmony-animation/