课前准备---空间基因梯度(STG)

2024-07-31  本文已影响0人  单细胞空间交响乐

作者,Evil Genius

可以看基因、细胞、通路的空间梯度

细胞组成和信号传导在不同的生态位中有所不同,这可以诱导细胞亚群中基因表达的梯度。这种空间转录组梯度(STG)是肿瘤内异质性的重要来源,可以影响肿瘤的侵袭、进展和对治疗的反应。
肿瘤组织包含异质性细胞群,在复杂的细胞微环境中具有不同的转录、遗传和表观遗传特征。解剖这种多因素的肿瘤内异质性(ITH)是了解肿瘤发生、转移和治疗耐药性的基础。细胞中转录变异的一个来源是它们的微环境,微环境通过不同的方式塑造基因表达,如细胞间通讯(如配体受体信号)或局部信号提示(如pH值、氧、代谢物)。因此,一些细胞会随着它们的空间定位而表现出渐变的转录变异,这被称为“空间转录组梯度”(STG)。

实现的目标:同时检测到STGs的存在和方向

方法原理:应用NMF从ST数据的基因表达矩阵中获得定量的、可解释的细胞表型,同时检测每个生态位中线性空间梯度的存在和方向。

The LSGI framework and downstream analysis
三个需要回答的生物学问题

1、空间基因梯度的位置
2、空间基因梯度的方向性
3、空间基因梯度的生物学功能

为了实现目标,利用NMF将ST数据中所有细胞或SPOT的基因表达谱分解成多个因子,包括描述细胞组成和调节细胞表型。通过这一步,计算 cell loadings and gene loadings ,分别表明program在细胞/spot水平上的活性和program在基因水平上的属性。
关于空间的数据分析采用slide-window strategy ,在此基础上,cells/spots在overlapping windows中按空间定位分组,然后,使用空间坐标作为预测因子,并将细胞NMF loadings作为目标,对每个NMF program和每组细胞拟合线性模型。使用r平方来评价拟合优度,较大的值表示存在STG。梯度的方向由相应的回归系数决定。这些步骤创建了一个map,其中包含STG的定位和方向,以及它们在一个或多个NMF program中的分配。然后,利用精选的功能基因集,通过统计方法(例如,超几何测试)对program进行功能性注释。并研究在肿瘤ST数据集中,分配给不同程序的梯度的空间关系,或梯度到肿瘤-TME边界的空间关系。

R语言实现,以10X数据为例

library(Seurat)
library(Matrix)
library(RcppML)  
library(ggplot2)
library(dplyr)
library(LSGI)

img <- Read10X_Image(image.dir = "C:/Users/liang/work/42_LSGI/LSGI.test/Visium_FFPE_Human_Breast_Cancer_spatial/",
    image.name = "tissue_lowres_image.png", filter.matrix = TRUE)
data <- Load10X_Spatial(data.dir = "C:/Users/liang/work/42_LSGI/LSGI.test/",
    filename = "Visium_FFPE_Human_Breast_Cancer_filtered_feature_bc_matrix.h5",
    assay = "RNA", slice = "slice1", filter.matrix = TRUE, to.upper = FALSE,
    image = img)

data <- NormalizeData(data)
使用NMF将ST数据中所有细胞或SPOT的基因表达谱汇总到多个program中。
Run NMF(类似PCA)
# define functions as below some code adapted from this
# preprint:
# https://www.biorxiv.org/content/10.1101/2021.09.01.458620v1.full

scan.nmf.mse <- function(obj, ranks = seq(1, 30, 2), tol = 1e-04) {
    # users can customize the scan by changing 'ranks'
    dat <- obj@assays$RNA@data
    errors <- c()
    ranks <- seq(1, 30, 2)
    for (i in ranks) {
        # cat('rank: ', i, '\n')
        mod <- RcppML::nmf(dat, i, tol = 1e-04, verbose = F)
        mse_i <- mse(dat, mod$w, mod$d, mod$h)
        errors <- c(errors, mse_i)
    }
    results <- data.frame(rank = ranks, MSE = errors)
    return(results)
}

sr.nmf <- function(obj, k = 10, tol = 1e-06, assay = "RNA") {
    dat <- obj@assays$RNA@data
    nmf_model <- RcppML::nmf(dat, k = k, tol = tol, verbose = F)
    embeddings <- t(nmf_model$h)
    rownames(embeddings) <- colnames(obj)
    colnames(embeddings) <- paste0("nmf_", 1:k)
    loadings <- nmf_model$w
    rownames(loadings) <- rownames(obj)
    obj@reductions$nmf <- CreateDimReducObject(embeddings = embeddings,
        loadings = loadings, key = "nmf_", assay = assay)
    return(obj)
}

scan.nmf.res <- scan.nmf.mse(obj = data)
ggplot(scan.nmf.res, aes(x = rank, y = MSE)) + geom_point(size = 0.7) +
    geom_smooth(method = "loess", span = 0.2, color = "black",
        linewidth = 1, se = F) + labs(x = "NMF rank", y = "MSE") +
    theme_classic() + scale_y_continuous(expand = c(0.01, 0)) +
    theme(aspect.ratio = 1)

image
Prepare input data for LSGI
# LSGI requires two inputs: spatial_coords and embeddings
# In the current version, we require the spatial_coords
# have colnames as 'X', and 'Y'.
spatial_coords <- data@images$slice1@coordinates[, c(4, 5)]
colnames(spatial_coords) <- c("X", "Y")
print(head(spatial_coords))
#>                        X     Y
#> AAACAAGTATCTCCCA-1 16265 19934
#> AAACACCAATAACTGC-1 18526  7893
#> AAACAGAGCGACTCCT-1  7178 18782
#> AAACAGCTTTCAGAAG-1 14487  6446
#> AAACAGGGTCTATATT-1 15497  7025
#> AAACCGGGTAGGTACC-1 14237  9202

# row names of embeddings are cell/spot names the row names
# of embeddings and spatial_coords should be the same (in
# the same order as well) here
embeddings <- data@reductions$nmf@cell.embeddings
print(embeddings[1:5, 1:5])
#>                           nmf_1        nmf_2        nmf_3        nmf_4
#> AAACAAGTATCTCCCA-1 1.796119e-03 7.531603e-05 5.608306e-05 4.609495e-04
#> AAACACCAATAACTGC-1 2.799021e-05 2.456455e-04 8.588333e-04 7.773138e-04
#> AAACAGAGCGACTCCT-1 4.259168e-04 4.443754e-04 1.912114e-04 0.000000e+00
#> AAACAGCTTTCAGAAG-1 0.000000e+00 1.527858e-04 2.645159e-04 1.147089e-03
#> AAACAGGGTCTATATT-1 6.634297e-05 0.000000e+00 9.323769e-04 3.925687e-05
#>                           nmf_5
#> AAACAAGTATCTCCCA-1 1.404469e-04
#> AAACACCAATAACTGC-1 0.000000e+00
#> AAACAGAGCGACTCCT-1 5.804265e-04
#> AAACAGCTTTCAGAAG-1 0.000000e+00
#> AAACAGGGTCTATATT-1 3.575715e-05

计算空间基因等级
# n.grids.scale: LSGI calculate spatial gradients in
# multiple small neighborhoods (centered in the 'grid
# point'), and this n.grid.scale decide the number of this
# type of neighborhood. The number of neighborhoods equals
# to (total number of cells)/n.grids.scales

# n.cells.per.meta: number of cells/spots for each
# neighborhood
lsgi.res <- local.traj.preprocessing(spatial_coords = spatial_coords,
    n.grids.scale = 5, embeddings = embeddings, n.cells.per.meta = 25)

可视化

# plot multiple factors
plt.factors.gradient.ind(info = lsgi.res, r_squared_thresh = 0.6,
    minimum.fctr = 10)  # plot gradient (had to appear in at least 10 grids)

image
plt.factors.gradient.ind(info = lsgi.res, r_squared_thresh = 0.6,
    sel.factors = c("nmf_5", "nmf_6", "nmf_8"), minimum.fctr = 10)  # plot selected gradients

image
Plot single gradient with NMF loadings
# plot individual factor together with the NMF loadings
plt.factor.gradient.ind(info = lsgi.res, fctr = "nmf_1", r_squared_thresh = 0.6)

image
Distance analysis
dist.mat <- avg.dist.calc(info = lsgi.res, minimum.fctr = 10)  # calculate average distance between NMF gradients
plt.dist.heat(dist.mat)  # plot distance heatmap

image
功能注释
# this can be done in the same way of NMF factor annotation
# there are different ways of doing this analysis, here we
# use hypergeometric test with top 50 genes in each NMF
# (top loadings) here we only use hallmark gene sets as a
# brief example the nmf information can be fetched from the
# Seurat object

get.nmf.info <- function(obj, top.n = 50) {
    feature.loadings <- as.data.frame(obj@reductions$nmf@feature.loadings)

    top.gene.list <- list()
    for (i in 1:ncol(feature.loadings)) {
        o <- order(feature.loadings[, i], decreasing = T)[1:top.n]
        features <- rownames(feature.loadings)[o]
        top.gene.list[[colnames(feature.loadings)[i]]] <- features
    }
    nmf.info <- list(feature.loadings = feature.loadings, top.genes = top.gene.list)
    return(nmf.info)
}

nmf_info <- get.nmf.info(data)
str(nmf_info)  # show the structure of nmf information extracted from the Seurat object after running NMF
#> List of 2
#>  $ feature.loadings:'data.frame':    17943 obs. of  10 variables:
#>   ..$ nmf_1 : num [1:17943] 0.00 5.40e-05 1.51e-05 2.56e-05 0.00 ...
#>   ..$ nmf_2 : num [1:17943] 0.00 9.84e-05 8.26e-06 1.75e-05 0.00 ...
#>   ..$ nmf_3 : num [1:17943] 7.65e-07 7.10e-05 2.20e-05 1.02e-05 0.00 ...
#>   ..$ nmf_4 : num [1:17943] 0.00 4.47e-05 1.29e-05 3.60e-06 0.00 ...
#>   ..$ nmf_5 : num [1:17943] 0.00 7.60e-05 2.17e-05 9.68e-06 0.00 ...
#>   ..$ nmf_6 : num [1:17943] 3.83e-05 1.42e-05 2.75e-05 1.44e-05 0.00 ...
#>   ..$ nmf_7 : num [1:17943] 1.75e-05 2.82e-05 1.85e-05 0.00 2.30e-06 ...
#>   ..$ nmf_8 : num [1:17943] 0.00 2.83e-05 0.00 1.61e-05 1.45e-06 ...
#>   ..$ nmf_9 : num [1:17943] 9.19e-06 3.53e-05 2.34e-05 0.00 0.00 ...
#>   ..$ nmf_10: num [1:17943] 0.00 6.52e-05 0.00 8.09e-06 0.00 ...
#>  $ top.genes       :List of 10
#>   ..$ nmf_1 : chr [1:50] "FGB" "FTH1" "LTF" "PABPC1" ...
#>   ..$ nmf_2 : chr [1:50] "MUCL1" "FTH1" "AZGP1" "TMSB4X" ...
#>   ..$ nmf_3 : chr [1:50] "CD74" "TMSB4X" "B2M" "HLA-DRA" ...
#>   ..$ nmf_4 : chr [1:50] "FTL" "APOE" "APOC1" "CTSD" ...
#>   ..$ nmf_5 : chr [1:50] "COL1A1" "POSTN" "COL1A2" "SPARC" ...
#>   ..$ nmf_6 : chr [1:50] "COL1A1" "COL3A1" "COL1A2" "SPARC" ...
#>   ..$ nmf_7 : chr [1:50] "IGKC" "IGHG2" "IGHA1" "IGLC1" ...
#>   ..$ nmf_8 : chr [1:50] "IFI6" "MUCL1" "ISG15" "IFITM3" ...
#>   ..$ nmf_9 : chr [1:50] "IGFBP7" "VWF" "AQP1" "HSPG2" ...
#>   ..$ nmf_10: chr [1:50] "FTH1" "FTL" "TMSB4X" "IGKC" ...

# obtain gene sets
library(msigdbr)
library(hypeR)

mdb_h <- msigdbr(species = "Homo sapiens", category = "H")

gene.set.list <- list()
for (gene.set.name in unique(mdb_h$gs_name)) {
    gene.set.list[[gene.set.name]] <- mdb_h[mdb_h$gs_name %in%
        gene.set.name, ]$gene_symbol
}

# run hypeR test
mhyp <- hypeR(signature = nmf_info$top.genes, genesets = gene.set.list,
    test = "hypergeometric", background = rownames(nmf_info[["feature.loadings"]]))
hyper.data <- mhyp$data
hyper.res.list <- list()
for (nmf.name in names(hyper.data)) {
    res <- as.data.frame(hyper.data[[nmf.name]]$data)
    hyper.res.list[[nmf.name]] <- res
}

print(head(hyper.res.list[[1]]))  # here we output part of the NMF_1 annotation result
#>                                                                 label    pval
#> HALLMARK_COMPLEMENT                               HALLMARK_COMPLEMENT 8.5e-07
#> HALLMARK_APOPTOSIS                                 HALLMARK_APOPTOSIS 7.0e-05
#> HALLMARK_INTERFERON_GAMMA_RESPONSE HALLMARK_INTERFERON_GAMMA_RESPONSE 2.0e-04
#> HALLMARK_COAGULATION                             HALLMARK_COAGULATION 4.7e-04
#> HALLMARK_ESTROGEN_RESPONSE_LATE       HALLMARK_ESTROGEN_RESPONSE_LATE 2.0e-03
#> HALLMARK_ANDROGEN_RESPONSE                 HALLMARK_ANDROGEN_RESPONSE 2.3e-03
#>                                        fdr signature geneset overlap background
#> HALLMARK_COMPLEMENT                4.2e-05        50     188       7      17943
#> HALLMARK_APOPTOSIS                 1.7e-03        50     155       5      17943
#> HALLMARK_INTERFERON_GAMMA_RESPONSE 3.3e-03        50     193       5      17943
#> HALLMARK_COAGULATION               5.9e-03        50     130       4      17943
#> HALLMARK_ESTROGEN_RESPONSE_LATE    1.9e-02        50     192       4      17943
#> HALLMARK_ANDROGEN_RESPONSE         1.9e-02        50      94       3      17943
#>                                                              hits
#> HALLMARK_COMPLEMENT                CFB,CLU,CP,CTSD,FN1,LTF,S100A9
#> HALLMARK_APOPTOSIS                      APP,CLU,ERBB2,SOD2,SQSTM1
#> HALLMARK_INTERFERON_GAMMA_RESPONSE       B2M,CFB,NAMPT,SOD2,TAPBP
#> HALLMARK_COAGULATION                              CFB,CLU,FGG,FN1
#> HALLMARK_ESTROGEN_RESPONSE_LATE               LSR,LTF,S100A9,XBP1
#> HALLMARK_ANDROGEN_RESPONSE                         AZGP1,B2M,KRT8

# Visualize annotation results
ggplot(hyper.res.list[[1]][1:5, ], aes(x = reorder(label, -log10(fdr)),
    y = overlap/signature, fill = -log10(fdr))) + geom_bar(stat = "identity",
    show.legend = T) + xlab("Gene Set") + ylab("Gene Ratio") +
    viridis::scale_fill_viridis() + theme_classic() + coord_flip() +
    theme(axis.text.x = element_text(color = "black", size = 12,
        angle = 45, hjust = 1), axis.text.y = element_text(color = "black",
        size = 8, angle = 0), panel.border = element_rect(colour = "black",
        fill = NA, size = 1))

image

加载底片

# Finally, for Visium data only, we have the following
# functions that can plot the gradient superimposed with HE
# image

library(magick)
plot.overlay.factor <- function(object, info, sel.factors = NULL,
    r_squared_thresh = 0.6, minimum.fctr = 20) {
    scf <- object@images[["slice1"]]@scale.factors[["lowres"]]
    object <- subset(object, cells = rownames(info$spatial_coords))
    print(identical(rownames(object@meta.data), rownames(info$spatial_coords)))
    object <- rotateSeuratImage(object, rotation = "L90")
    object@meta.data <- cbind(object@meta.data, info$embeddings)

    lin.res.df <- get.ind.rsqrs(info)
    lin.res.df <- na.omit(lin.res.df)
    lin.res.df <- lin.res.df[lin.res.df$rsquared > r_squared_thresh,
        ]
    if (!is.null(sel.factors)) {
        lin.res.df <- lin.res.df[lin.res.df$fctr %in% sel.factors,
            ]
    }
    lin.res.df <- lin.res.df %>%
        group_by(fctr) %>%
        filter(n() >= minimum.fctr) %>%
        ungroup()

    spatial_coords <- info$spatial_coords
    spatial_coords$X <- spatial_coords$X * scf
    spatial_coords$Y <- spatial_coords$Y * scf

    lin.res.df$Xend <- lin.res.df$X + lin.res.df$vx.u
    lin.res.df$Yend <- lin.res.df$Y + lin.res.df$vy.u

    lin.res.df$X <- lin.res.df$X * scf
    lin.res.df$Xend <- lin.res.df$Xend * scf
    lin.res.df$Y <- lin.res.df$Y * scf
    lin.res.df$Yend <- lin.res.df$Yend * scf

    p <- SpatialFeaturePlot(object, features = NULL, alpha = c(0)) +
        NoLegend() + geom_segment(data = as.data.frame(lin.res.df),
        aes(x = X, y = Y, xend = Xend, yend = Yend, color = fctr,
            fill = NULL), linewidth = 0.4, arrow = arrow(length = unit(0.1,
            "cm"))) + scale_color_brewer(palette = "Paired") +
        # scale_fill_gradient(low='lightgrey', high='navy')
        # +
    theme_classic() + theme(axis.text.x = element_text(face = "bold",
        color = "black", size = 12, angle = 0, hjust = 1), axis.text.y = element_text(face = "bold",
        color = "black", size = 12, angle = 0))

    return(p)
}

# Adapted from this link:
# https://github.com/satijalab/seurat/issues/2702#issuecomment-1626010475
rotateSeuratImage <- function(seuratVisumObject, slide = "slice1",
    rotation = "Vf") {
    if (!(rotation %in% c("180", "Hf", "Vf", "L90", "R90"))) {
        cat("Rotation should be either 180, L90, R90, Hf or Vf\n")
        return(NULL)
    } else {
        seurat.visium <- seuratVisumObject
        ori.array <- (seurat.visium@images)[[slide]]@image
        img.dim <- dim(ori.array)[1:2]/(seurat.visium@images)[[slide]]@scale.factors$lowres
        new.mx <- c()
        # transform the image array
        for (rgb_idx in 1:3) {
            each.mx <- ori.array[, , rgb_idx]
            each.mx.trans <- rotimat(each.mx, rotation)
            new.mx <- c(new.mx, list(each.mx.trans))
        }

        # construct new rgb image array
        new.X.dim <- dim(each.mx.trans)[1]
        new.Y.dim <- dim(each.mx.trans)[2]
        new.array <- array(c(new.mx[[1]], new.mx[[2]], new.mx[[3]]),
            dim = c(new.X.dim, new.Y.dim, 3))

        # swap old image with new image
        seurat.visium@images[[slide]]@image <- new.array

        ## step4: change the tissue pixel-spot index
        img.index <- (seurat.visium@images)[[slide]]@coordinates

        # swap index
        if (rotation == "Hf") {
            seurat.visium@images[[slide]]@coordinates$imagecol <- img.dim[2] -
                img.index$imagecol
        }

        if (rotation == "Vf") {
            seurat.visium@images[[slide]]@coordinates$imagerow <- img.dim[1] -
                img.index$imagerow
        }

        if (rotation == "180") {
            seurat.visium@images[[slide]]@coordinates$imagerow <- img.dim[1] -
                img.index$imagerow
            seurat.visium@images[[slide]]@coordinates$imagecol <- img.dim[2] -
                img.index$imagecol
        }

        if (rotation == "L90") {
            seurat.visium@images[[slide]]@coordinates$imagerow <- img.dim[2] -
                img.index$imagecol
            seurat.visium@images[[slide]]@coordinates$imagecol <- img.index$imagerow
        }

        if (rotation == "R90") {
            seurat.visium@images[[slide]]@coordinates$imagerow <- img.index$imagecol
            seurat.visium@images[[slide]]@coordinates$imagecol <- img.dim[1] -
                img.index$imagerow
        }

        return(seurat.visium)
    }
}

rotimat <- function(foo, rotation) {
    if (!is.matrix(foo)) {
        cat("Input is not a matrix")
        return(foo)
    }
    if (!(rotation %in% c("180", "Hf", "Vf", "R90", "L90"))) {
        cat("Rotation should be either L90, R90, 180, Hf or Vf\n")
        return(foo)
    }
    if (rotation == "180") {
        foo <- foo %>%
            .[, dim(.)[2]:1] %>%
            .[dim(.)[1]:1, ]
    }
    if (rotation == "Hf") {
        foo <- foo %>%
            .[, dim(.)[2]:1]
    }

    if (rotation == "Vf") {
        foo <- foo %>%
            .[dim(.)[1]:1, ]
    }
    if (rotation == "L90") {
        foo = t(foo)
        foo <- foo %>%
            .[dim(.)[1]:1, ]
    }
    if (rotation == "R90") {
        foo = t(foo)
        foo <- foo %>%
            .[, dim(.)[2]:1]
    }
    return(foo)
}

# then run
plot.overlay.factor(object = data, info = lsgi.res, sel.factors = NULL)
#> [1] TRUE

image
# or plot any selected factors
plot.overlay.factor(object = data, info = lsgi.res, sel.factors = c("nmf_3",
    "nmf_7"))
#> [1] TRUE

image

生活很好,有你更好

上一篇下一篇

猜你喜欢

热点阅读