10X Spatial Transcriptomics Data Analysis: A Summary of Approaches (for Tumor Samples)

2021-06-07 单细胞空间交响乐

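Hello everyone. Today we share some ideas for studying tumor tissue sections with spatial transcriptomics. The main reference is Comprehensive Analysis of Spatial Architecture in Primary Liver Cancer, whose methods are all classic and deserve close attention. If you are interested in the paper's findings, have a look at it yourself; here we only summarize the methodological approach, in the hope that it will be useful to you. Let's go through it step by step.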

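Part 1: Sampling. How a tumor sample is sectioned matters a great deal; see the figure below. (Clients' sections cannot be shown, so I use the sections from the paper here.)

[Figure: tissue sections from the paper]

The idea behind sectioning is to capture all three parts: (1) the normal region, (2) the tumor region, and (3) the boundary region; none can be omitted. Better still, cut at least three sections from each frozen block: one containing only normal tissue, one containing only tumor tissue, and one containing both normal and tumor tissue. This captures the most complete information. See the figure below:

[Figure: sectioning scheme]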

Part 2: Basic analysis of the spatial transcriptomics data. This part also deserves your attention.

For the gene-spot matrices generated by Space Ranger, some routine statistical analyses were performed first, including calculating the number of detected UMIs (nUMI) and genes (nGene) in each spot. Based on them, basic quality controls (QC) were applied to the data. In detail, the spots with extremely low nUMI or nGene (outliers) and the spots isolated from the main tissue sections were removed. The genes expressed in fewer than 3 spots, as well as mitochondrial and ribosomal genes, were filtered out.

(Personally, I would advise against removing these genes, but it has to be judged case by case; the point is that filtering must not distort the downstream analysis.)
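The QC rules above can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: the paper gives no numeric cutoffs for "extremely low", so `min_umi` and `min_genes` are hypothetical parameters; mitochondrial and ribosomal genes are recognized by the common human symbol prefixes `MT-`, `RPS` and `RPL`; and the removal of spots isolated from the main tissue is not covered.

```python
import re

# Assumed naming convention for human mitochondrial / ribosomal genes.
MITO_RIBO = re.compile(r"^(MT-|RPS|RPL)")


def qc_filter(counts, min_umi=500, min_genes=200, min_spots=3):
    """Filter a {spot: {gene: umi_count}} mapping.

    min_umi and min_genes are hypothetical cutoffs, not values from the paper.
    """
    # Spot-level QC: drop spots with too few UMIs or too few detected genes.
    kept_spots = {
        spot: genes
        for spot, genes in counts.items()
        if sum(genes.values()) >= min_umi
        and sum(1 for c in genes.values() if c > 0) >= min_genes
    }

    # Gene-level QC: count in how many retained spots each gene is detected.
    n_spots = {}
    for genes in kept_spots.values():
        for gene, c in genes.items():
            if c > 0:
                n_spots[gene] = n_spots.get(gene, 0) + 1

    kept_genes = {
        gene
        for gene, n in n_spots.items()
        if n >= min_spots and not MITO_RIBO.match(gene)
    }

    return {
        spot: {g: c for g, c in genes.items() if g in kept_genes}
        for spot, genes in kept_spots.items()
    }
```

If you prefer to keep mitochondrial and ribosomal genes, as suggested above, drop the `MITO_RIBO` condition and leave the rest unchanged.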

Part 3: Integrated analysis of the spatial transcriptomics data

After QC, we used the R package harmony (v1.0) (30) to integrate the expression data from different sections of each patient, and used the Seurat package (v3.1.5) to perform the basic downstream analysis and visualization. In detail, we first combined the expression matrices of all sections from each patient, and performed normalization, log-transformation, centering and scaling on them. Next, we identified 2,000 highly variable genes according to their expression means and variances. Based on them, principal component analysis (PCA) was performed to project the spots into a low-dimensional space, which was defined by the first 20 principal components (PCs). Then, by setting the section source as the batch factor and using the "RunHarmony" function, we iteratively corrected the spots' low-dimensional PC representation to reduce the impact of batch effects. After this step, the corrected PC matrices were used to perform unsupervised shared-nearest-neighbor-based clustering and UMAP (uniform manifold approximation and projection) visualization. To compare the clusters at the gene level, we identified differentially expressed genes of all or selected clusters by using fold-change analysis and the Wilcoxon rank-sum test with Bonferroni correction.

Part 3 should be familiar to everyone: it is the same Harmony batch correction used for single-cell data. If you have never done it, go stand in the corner and reflect.
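The paper does this in R with Seurat and harmony. For readers who work in Python, roughly the same sequence of steps can be sketched with scanpy. This is a swapped-in toolchain, not the authors' code: it assumes scanpy, harmonypy and leidenalg are installed, that `adata` holds raw counts, and that a column `adata.obs["section"]` (a name chosen here for illustration) records which section each spot came from. Leiden clustering stands in for Seurat's SNN-based clustering.

```python
def integrate_sections(adata, batch_key="section", n_pcs=20, n_hvg=2000):
    """Normalize, correct section batch effects with Harmony, cluster, find markers.

    adata: AnnData with raw counts; adata.obs[batch_key] names the section.
    The column name "section" is an assumption made for this sketch.
    """
    import scanpy as sc  # imported lazily so the sketch loads without scanpy

    sc.pp.normalize_total(adata, target_sum=1e4)  # library-size normalization
    sc.pp.log1p(adata)                            # log-transformation
    adata.raw = adata                             # keep log-normalized values for DE

    sc.pp.highly_variable_genes(adata, n_top_genes=n_hvg)
    adata = adata[:, adata.var["highly_variable"]].copy()
    sc.pp.scale(adata)                            # centering and scaling
    sc.tl.pca(adata, n_comps=n_pcs)               # first 20 PCs by default

    # Harmony corrects the PC embedding; result lands in obsm["X_pca_harmony"].
    sc.external.pp.harmony_integrate(adata, key=batch_key)

    sc.pp.neighbors(adata, use_rep="X_pca_harmony")
    sc.tl.leiden(adata)                           # graph-based clustering
    sc.tl.umap(adata)

    # Wilcoxon rank-sum test with Bonferroni correction, one cluster vs the rest.
    sc.tl.rank_genes_groups(
        adata, "leiden", method="wilcoxon", corr_method="bonferroni"
    )
    return adata
```

The essential point is the same as in the paper: the section is the batch factor, and everything after PCA (neighbors, clustering, UMAP) runs on the Harmony-corrected embedding rather than on the raw PCs.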

Part 4: Cluster similarity analysis. This is also fairly routine, though it is mostly used on 10X single-cell data. After integration, each cluster contains spots from different samples, and their spatial positions on each section vary widely; yet the fact that they cluster together shows that their expression is similar. The task here is to compare the correlation among these clusters.

For the clusters from different patients, we represented them by their spots' average expression profiles (the log-transformed normalization values). To reduce the impact of extreme values, we excluded in advance some outlier spots whose first three PC values fell outside the range of mean ± 3 standard deviations of the cluster they belonged to. Moreover, only the genes with a mean above 0.1 and a variance above 0.05 across all the cluster expression vectors were retained for the downstream comparison analyses. (This is well worth noting: outliers are removed when their first three PC values fall outside mean ± 3 standard deviations of their own cluster. I really like this rule.)
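The outlier rule described above is easy to write down. The following is a minimal sketch of my reading of it, not the authors' code: a spot is kept only if each of its first three PC values lies within mean ± 3 standard deviations of its own cluster. The input layout (`pcs`, `clusters`) is a hypothetical one chosen for illustration.

```python
from statistics import mean, stdev


def drop_pc_outliers(pcs, clusters, n_pcs=3, n_sd=3.0):
    """Return the set of spots that pass the mean ± n_sd * SD rule.

    pcs:      {spot: [PC1, PC2, ...]}
    clusters: {spot: cluster_label}
    """
    by_cluster = {}
    for spot, label in clusters.items():
        by_cluster.setdefault(label, []).append(spot)

    keep = set()
    for spots in by_cluster.values():
        # Allowed range of each of the first n_pcs PCs within this cluster.
        bounds = []
        for k in range(n_pcs):
            values = [pcs[s][k] for s in spots]
            m = mean(values)
            sd = stdev(values) if len(values) > 1 else 0.0
            bounds.append((m - n_sd * sd, m + n_sd * sd))

        for s in spots:
            if all(lo <= pcs[s][k] <= hi for k, (lo, hi) in enumerate(bounds)):
                keep.add(s)
    return keep
```

The cluster average profiles would then be computed over the retained spots only.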

To measure the clusters' similarities across patients, we performed two types of analyses: hierarchical clustering and low-dimensional projection. In detail, we first applied PCA to the centered and scaled cluster average expression profiles, and used the first five PCs to perform hierarchical clustering. (Note that the hierarchical clustering here uses the first five PCs; this way of doing it is worth learning.)

[Figure: hierarchical clustering of the clusters]

The hierarchical clustering plot is also rich in information; the colorbar shows mean values.

Besides, the diffusion map was used to project clusters of different patients into a two-dimensional space (the first two diffusion components) based on the package destiny (34) with default parameter settings.

[Figure: diffusion map of the clusters]

For convenience of comparison, we annotated each cluster with a region label (normal, stromal, or tumor), which was decided by integrating the information from the cluster's marker genes and H&E staining images. (Another very good point: the diffusion map result is clearly region-specific, with clusters from the same region generally grouping together.)

Part 5: Cell type scoring by a signature-based strategy

At the current Visium ST resolution, each spot may contain approximately 8-20 cells, so we could not assign a single cell type to each spot. (This is also the biggest factor limiting 10X spatial transcriptomics.) Considering this, to compare the distribution of cell types across the tissue sections, we proposed a signature-based strategy to score the cell type enrichments in each spot. (That is, marker gene enrichment. I mentioned this approach in my open course: use marker gene enrichment to see how enriched each cell type is at each location.)

Let's see how the paper carries out marker gene enrichment.

Step 1: we curated a set of gene signatures of common cell types in liver cancer based on the xCell signatures and prior biological knowledge. (Find the marker genes.)

[Figure: curated cell type gene signatures]

Step 2, the key one: Then, we defined the average log-transformed normalization expression values of the genes in the signature as the corresponding cell type scores. (This way of computing the enrichment score caught me off guard 😄.)

Step 3: Taking advantage of these scores, the relative enrichment degree of cell types across different tissue regions can be compared. (A gradient comparison, which is fairly standard.) By testing on some single-cell RNA-seq datasets of liver cancer, we proved that our curated gene signatures had high sensitivity and specificity. (Validating the marker genes is indeed important.)
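The score in step 2 is simply a mean, which the following sketch makes explicit. The data layout is hypothetical, genes missing from a spot are treated as 0 (my assumption), and the marker genes in the usage note are well-known textbook markers used for illustration only, not the paper's curated signatures.

```python
from statistics import mean


def signature_scores(log_expr, signatures):
    """Score every cell type in every spot.

    log_expr:   {spot: {gene: log-normalized expression}}
    signatures: {cell_type: [marker genes]}
    Returns     {spot: {cell_type: mean log-normalized expression of its markers}}
    """
    scores = {}
    for spot, expr in log_expr.items():
        scores[spot] = {
            # Genes absent from the spot count as 0 (assumption of this sketch).
            cell_type: mean(expr.get(gene, 0.0) for gene in genes)
            for cell_type, genes in signatures.items()
        }
    return scores
```

For example, with `signatures = {"T cell": ["CD3D", "CD3E"], "Hepatocyte": ["ALB", "APOA1"]}`, each spot gets one score per cell type, and the scores can then be averaged per region or per cluster to compare enrichment across the tissue.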

The authors then also ran MIA (multimodal intersection analysis). I won't go into MIA in detail here; see my article on using MIA for joint single-cell and spatial analysis (MIA用于单细胞和空间的联合分析). MIA determines the cell type enrichment degrees by performing a hypergeometric test on the overlap between the tissue region-specific genes of the ST data and the cell type-specific genes of the single-cell data.

Here, we took advantage of the cell type annotation and differential expression gene results of a liver cancer single-cell dataset and performed MIA on the clusters of our ST data, so that we can use the p-values of the hypergeometric test to measure the enrichment of different cell types in each cluster (panel C of the figure below).

[Figure: MIA results (panel C) and comparison with the signature-based scores (panel D)]

By comparing these enrichment degrees with the mean values of our signature-based cell type scores for all the ST clusters, we observed a generally high correlation (panel D of the figure above), which proved the reliability of our signature-based cell type scoring method. At the same time, the signature-based method had the advantage of not requiring single-cell data, which made it more flexible.
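The hypergeometric test behind MIA can be sketched with the standard library alone. This is a generic illustration, not the authors' implementation: the function and argument names are my own, and the choice of background gene set (all genes shared by the two datasets, say) is left to the reader.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_celltype, n_region, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    Drawing n_region genes from n_background, of which n_celltype are
    cell type-specific, how likely is an overlap at least this large?
    """
    upper = min(n_celltype, n_region)
    tail = sum(
        comb(n_celltype, k) * comb(n_background - n_celltype, n_region - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_background, n_region)


def mia(region_genes, celltype_genes, background):
    """p-value for every (ST cluster, cell type) pair.

    region_genes:   {cluster: set of cluster-specific genes from ST data}
    celltype_genes: {cell_type: set of cell type-specific genes from scRNA-seq}
    background:     set of genes considered in both datasets
    """
    result = {}
    for region, r_genes in region_genes.items():
        r = set(r_genes) & background
        for cell_type, c_genes in celltype_genes.items():
            c = set(c_genes) & background
            result[(region, cell_type)] = hypergeom_enrichment_p(
                len(background), len(c), len(r), len(r & c)
            )
    return result
```

A small p-value for a pair means the cluster's marker genes overlap the cell type's marker genes more than expected by chance, which is read as enrichment of that cell type in the cluster.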

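Part 6: Intratumor spatial heterogeneity measurement. There are two ideas here, the transcriptome diversity degree and the spatial continuity degree. Let's look at each in detail.

Transcriptome diversity degree. There is something to this one.

For the transcriptome diversity degree, we first calculated the Pearson correlation coefficients between each pair of tumor region spots based on the highly variable genes. (That is, start with the Pearson correlation of every pair of spots.) Then we defined the sample's transcriptome diversity degree as 1.4826 times the median absolute deviation (MAD) of these correlations, which approximates the standard deviation while avoiding the influence of outliers. A larger value of this metric means that the similarities between the sample's tumor spots have greater variance, so the sample has higher intratumor heterogeneity. Formulaically, it can be calculated as

[Formula image; as described in the text: diversity = 1.4826 × MAD({cor(e_i, e_j) : i < j})]

where e_i indicated the expression vector of the tumor region spot i, and the MAD was defined as

[Formula image; standard definition: MAD(x) = median(|x − median(x)|)]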
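To make the definition concrete, here is a minimal sketch of the diversity degree: all pairwise Pearson correlations between tumor spots, then 1.4826 times their MAD. It follows the description above rather than the authors' code, and it assumes each spot is given as a list of expression values over the same highly variable genes, with at least two spots and no constant vectors.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (neither constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mad(values):
    """Median absolute deviation: median(|x - median(x)|)."""
    values = list(values)
    med = median(values)
    return median(abs(v - med) for v in values)


def transcriptome_diversity(expr_vectors):
    """1.4826 * MAD of the pairwise correlations between tumor spots."""
    cors = [pearson(a, b) for a, b in combinations(expr_vectors, 2)]
    return 1.4826 * mad(cors)
```

Note that the number of pairs grows quadratically with the number of spots, so for a full section a vectorized correlation matrix would be the practical choice; the loop here is only meant to show the definition.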

Spatial continuity degree

We first compared the cluster identity of each tumor region spot with those of its six neighbor spots.

Then the total fraction of the neighbor spots with the same cluster identity was defined as the spatial continuity degree. This metric measures the spatial heterogeneity of the tumor region.

A larger value of this metric meant that the sample's tumor region tended to be more block-like (higher spatial continuity degree and lower spatial mixing degree). Formulaically, it can be calculated as

[Formula image; as described in the text: continuity = (number of (spot i, neighbor j) pairs with I(cluster of i = cluster of j) = 1) / (total number of (spot, neighbor) pairs)]

where i indicated a tumor region spot, and I() was the indicator function.
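The continuity degree can be sketched as follows. Two things here are my assumptions rather than statements from the paper: spots are addressed by Visium array coordinates (row, col), where the six hexagonal neighbors of a spot are two columns away in the same row or one column away in the adjacent rows; and neighbors that are not tumor spots (or fall outside the tissue) are simply skipped instead of counted in the denominator.

```python
# Offsets (d_row, d_col) to the six neighbors on the Visium hexagonal grid,
# assuming array_row / array_col style coordinates.
HEX_OFFSETS = [(0, -2), (0, 2), (-1, -1), (-1, 1), (1, -1), (1, 1)]


def spatial_continuity(clusters):
    """Fraction of neighbor pairs that share a cluster identity.

    clusters: {(row, col): cluster_label} for tumor region spots only.
    """
    same = 0
    total = 0
    for (row, col), label in clusters.items():
        for d_row, d_col in HEX_OFFSETS:
            neighbor = (row + d_row, col + d_col)
            if neighbor in clusters:  # skip missing / non-tumor neighbors
                total += 1
                same += clusters[neighbor] == label  # indicator I(.)
    return same / total if total else float("nan")
```

A value near 1 means the tumor clusters form large contiguous blocks; a value near 0 means neighboring spots almost always belong to different clusters, i.e. the clusters are spatially intermixed.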

Part 7: GSVA analysis. Everyone should already know this part.

In detail, the log-transformed normalization expression matrix of tumor spots was input into the "gsva" function with the default parameter settings.

To compare the tumor clusters across patients at the pathway level, we averaged the resulting GSVA score matrices over each cluster and performed hierarchical clustering on them with Ward's minimum variance method. (This part is relatively simple.)

[Figure: hierarchical clustering of cluster-level GSVA scores]

Part 8: Spatial gradient change analysis. This one is important.

The spatial gradient distributions of hallmark pathway activities were analyzed on our leading-edge samples (L-sections) and the intact HCC nodule (HCC-5).

For the leading-edge samples, we focused on analyzing the gradient changes from capsules or tumor-normal boundary lines toward both the tumor and normal sides. (That is, the transition from the normal region to the tumor region.)

We divided the normal and tumor regions into continuous zones parallel to the shape of the boundary lines at intervals of 5 spots. (Interesting.) The gradient changes along these zones were then analyzed.
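The paper does not say how the zones were constructed. One simple way to get zones that follow the shape of the boundary is a breadth-first search outward from the boundary spots on the hexagonal grid, binning spots by their hop distance in steps of 5. The sketch below is that idea only, under my own assumptions: Visium-style (row, col) array coordinates, a user-supplied set of boundary spots, and hop distance as the measure of "spots away from the boundary".

```python
from collections import deque

# Offsets to the six neighbors on the Visium hexagonal grid (assumed layout).
HEX_OFFSETS = [(0, -2), (0, 2), (-1, -1), (-1, 1), (1, -1), (1, 1)]


def boundary_zones(spots, boundary, width=5):
    """Assign each spot a zone index by its hop distance to the boundary.

    spots:    set of (row, col) spots in the region of interest
    boundary: set of (row, col) spots lying on the capsule / boundary line
    Returns   {spot: zone}, zone 0 being the `width` layers nearest the boundary.
    Spots not connected to the boundary are left out.
    """
    dist = {s: 0 for s in boundary if s in spots}
    queue = deque(dist)
    while queue:
        row, col = queue.popleft()
        for d_row, d_col in HEX_OFFSETS:
            neighbor = (row + d_row, col + d_col)
            if neighbor in spots and neighbor not in dist:
                dist[neighbor] = dist[(row, col)] + 1
                queue.append(neighbor)
    return {s: d // width for s, d in dist.items()}
```

Running this separately on the tumor-side and normal-side spots gives zones on both sides of the boundary, and the mean pathway score per zone then traces the gradient.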

[Figure: zones parallel to the tumor-normal boundary]

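Part 9: Spatial communication analysis (cluster interaction analysis)

Here the authors analyze communication only between neighboring clusters. For each pair of neighbor tumor clusters, we selected their interface regions, 4 spots wide (2 spots wide for each cluster), and excluded the spots identified as stromal clusters. (So the spots are not all selected blindly, which shows how much position matters when analyzing communication in spatial data.)

[Figure: interface regions between neighboring tumor clusters]

The method used is CellPhoneDB.

[Figure: CellPhoneDB interaction results]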

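Part 10: Copy number variation (CNV) comparison analysis

The authors ran inferCNV directly on the spatial data. As for the results, those reported in the paper agree well with reality.

[Figure: inferCNV results on the spatial data]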

Life is good, and it is waiting for you to go beyond it.
