动态观察Harmony整合单细胞转录组

2021-04-17 本文已影响0人 Ace423

今天整个好玩的，动态观察Harmony整合单细胞转录组数据，可以直观感受下Harmony到底做了啥。

首先从10X网站下载三个donor的PBMC数据(https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.1.0/frozen_pbmc_donor_a)，donor A, donor B, donor C。一系列操作包括标准化，高变基因，PCA之后，可以观察到在多个PC维度三组数据都有区别。

# Select only 2000 cells each dataset to speed up
pbmcA <- pbmcA[, 1:2000]
pbmcB <- pbmcB[, 1:2000]
pbmcC <- pbmcC[, 1:2000]

genes.int <- intersect(rownames(pbmcA), rownames(pbmcB))
genes.int <- intersect(genes.int, rownames(pbmcC))
counts <- cbind(pbmcA[genes.int,], pbmcB[genes.int,], pbmcC[genes.int,])

# CPM normalization
counts_to_cpm <- function(A) {
  A@x <- A@x / rep.int(Matrix::colSums(A), diff(A@p))
  A@x <- 1e6 * A@x
  return(A)
}

cpm <- counts_to_cpm(counts)
log10cpm <- cpm
log10cpm@x <- log10(1 + log10cpm@x)

# variance normalize, identify overdispersed genes
cpm_info <- MUDAN::normalizeVariance(cpm, details = TRUE, verbose = FALSE) 

# 30 PCs on overdispersed genes
set.seed(42)
pcs <- getPcs(
  mat     = log10cpm[cpm_info$ods,],
  nGenes  = length(cpm_info$ods),
  nPcs    = 30,
  verbose = FALSE
)

donor_colors = c("dodgerblue", "orange", "firebrick")

p <- ggplot(dat_pca[sample(nrow(dat_pca)),]) +
  scale_size_manual(values = c(0.9, 0.3, 0.5)) +
  scale_color_manual(values = donor_colors) +
  theme(legend.position = "none")

p1 <- p + geom_point(aes(PC1, PC2, color = donor, size = donor))
p2 <- p + geom_point(aes(PC3, PC4, color = donor, size = donor))
p3 <- p + geom_point(aes(PC5, PC6, color = donor, size = donor))
p4 <- p + geom_point(aes(PC7, PC8, color = donor, size = donor))

ggarrange(p1, p2, p3, p4, ncol=4, nrow=1)

主要有区别的PC是PC4， PC6和PC7。可以进一步画每个PC的密度图，这里略过。

plot_zoom_png.png

现在对于PCA矩阵进行Harmony操作，这里iterate 6次。

harmony_iters <- c(0, 1, 2, 3, 4, 5)

res <- lapply(harmony_iters, function(i) {
  set.seed(42)
  HarmonyMatrix(
    theta            = 0.15,
    data_mat         = pcs,
    meta_data        = meta$donor,
    do_pca           = FALSE,
    verbose          = FALSE,
    max.iter.harmony = i
  )
})

对于每次修正后的PC矩阵都进行UMAP投射，并画成动图。

animate_donor <- function(
  filename, x, y,
  nframes = 15, width = 250, height = 250 / 1.41, res = 150,
  iters = 0:5, hide_legend = FALSE
) {
  this_title <- "Harmony Iteration {closest_state}"
  # p <- plot_donor(subset(d, iter %in% iters), {{x}}, {{y}})
  x <- parse_expr(x)
  y <- parse_expr(y)
  p <- plot_donor(subset(d, iter %in% iters), !!x, !!y)
  if (hide_legend) {
    p <- p + theme(legend.position = "none")
  } else {
    p <- p + ggtitle(this_title)
  }
  p <- p + transition_states(iter, transition_length = 4, state_length = 1) +
    ease_aes("cubic-in-out")
  animation <- animate(
    plot = p,
    nframes = nframes, width = width, height = height, res = res
  )
  dir.create("animation", showWarnings = FALSE)
  print(filename)
  anim_save(filename, animation)
}

nframes <- 200
iters <- 0:3
animate_donor(
  filename = "./donor-umap.gif",
  nframes = nframes, iters = iters,
  x = "UMAP1", y = "UMAP2", width = 800, height = 900 / 1.41
)

最终的结果如下图，可以非常直观的看到三组数据是如何融合在一起的，是不是很有趣～

donor-umap.gif

Reference:
https://slowkow.com/notes/harmony-animation/

动态观察Harmony整合单细胞转录组

猜你喜欢

热点阅读