
Single-cell transcriptome data analysis || Seurat3 tutorial: custom dimensional reduction with MDS

2020-05-08  周运来就是我

Seurat - Dimensional Reduction Vignette

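Single-cell transcriptome data is sparse and high-dimensional. For this reason Seurat provides several dimensional reduction methods, chiefly PCA, t-SNE and UMAP. Many more dimensional reduction methods exist beyond these three.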

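So what should we do if we want to apply a different dimensional reduction method to our data? Today we walk through applying multidimensional scaling (MDS) to a Seurat object. MDS seeks a low-dimensional representation in which the distances between samples in the original space are preserved as well as possible. We use it as an example to explore the general procedure for adding a dimensional reduction, and to learn more about Seurat's data structures along the way.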
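The following small base-R sketch makes "preserving distances" concrete. It uses random toy data, not single-cell data, and relies only on `stats::cmdscale` and `dist`. With Euclidean input and as many dimensions as the data has, classical MDS reproduces every pairwise distance. With fewer dimensions it behaves like a projection, so distances can only shrink.

```r
# Toy check of the MDS idea on Euclidean distances (random data, not scRNA-seq)
set.seed(42)
X <- matrix(rnorm(15 * 3), nrow = 15)     # 15 toy "cells" x 3 toy "genes"
d <- dist(X)                              # Euclidean distances between rows

full <- cmdscale(d, k = 3)                # keep as many dimensions as the data has
low  <- cmdscale(d, k = 2)                # a genuine reduction to 2 dimensions

# In 3 dimensions the pairwise distances are reproduced (up to floating-point error)
all.equal(as.vector(dist(full)), as.vector(d))
# In 2 dimensions no pairwise distance gets larger than the original
all(as.vector(dist(low)) <= as.vector(d) + 1e-10)
```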

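Still haven't got the hang of PCA, t-SNE and UMAP? Not sure what MDS is? Take a look at Yunlai's (运来哥) notes from a previous "relationship":

Numerical Ecology Notes || Unconstrained Ordination | NMDS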

Dimensional reduction structure in Seurat3

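In Seurat v3.0, storing and interacting with dimensional reduction information has been generalized and formalized into the DimReduc object. Each dimensional reduction procedure is stored as a DimReduc object in object@reductions, as an element of a named list. You access a reduction with the [[ operator and the name of the reduction you want. For example, after running a principal component analysis with RunPCA, object[['pca']] contains the PCA results. Users can add further, custom dimensional reductions by adding new elements to this list. Each stored dimensional reduction contains the following slots:

cell.embeddings: the coordinates of each cell in the low-dimensional space
feature.loadings: the weight of each feature (gene) along each dimension
feature.loadings.projected: loadings for all genes, obtained with ProjectDim by projecting a reduction computed on a subset of genes (for example, highly variable genes) onto the whole dataset
assay.used: the assay the reduction was computed from
stdev: the standard deviation of each dimension; for PCA these are the square roots of the covariance-matrix eigenvalues, useful for seeing how much variance each successive dimension explains
key: the prefix for the column names of the embedding and loading matrices (for example PC_ for PCA)
jackstraw: results of the JackStraw procedure run on this reduction (currently PCA only)
misc: a free slot for any other information you want to keep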

To access these slots, Seurat provides the Embeddings, Loadings and Stdev functions:

library(Seurat)
pbmc_small[["pca"]]

A dimensional reduction object with key PC_ 
 Number of dimensions: 19 
 Projected dimensional reduction calculated:  TRUE 
 Jackstraw run: TRUE 
 Computed using assay: RNA

Let's inspect the PCA results with these accessor functions:

> head(Embeddings(pbmc_small, reduction = "pca")[, 1:5])  # PCA coordinates of each cell
                      PC_1       PC_2       PC_3      PC_4       PC_5
ATGCCAGAACGACT -0.77403708 -0.8996461 -0.2493078 0.5585948  0.4650838
CATGGCCTGTGCAT -0.02602702 -0.3466795  0.6651668 0.4182900  0.5853204
GAACCTGATGAACC -0.45650250  0.1795811  1.3175907 2.0137210 -0.4818851
TGACTGGATTCTCA -0.81163243 -1.3795340 -1.0019320 0.1390503 -1.5982232
AGTCAGACTGCACA -0.77403708 -0.8996461 -0.2493078 0.5585948  0.4650838
TCTGATACACGTGT -0.77403708 -0.8996461 -0.2493078 0.5585948  0.4650838
> head(Loadings(pbmc_small, reduction = "pca")[, 1:5])  # loading of each gene on each principal component
              PC_1        PC_2        PC_3        PC_4         PC_5
PPBP    0.33832535  0.04095778  0.02926261  0.03111034 -0.090420744
IGLL5  -0.03504289  0.05815335 -0.29906272  0.54744454  0.214603428
VDAC3   0.11990482 -0.10994433 -0.02386025  0.06015126 -0.809207588
CD1C   -0.04690284  0.19835522 -0.35090617 -0.51112169 -0.130306281
AKR1C3 -0.03894635 -0.42880452  0.08845847 -0.27274386  0.087791646
PF4     0.34392057  0.02474860 -0.02519515 -0.01231411 -0.006725932
> head(Stdev(pbmc_small, reduction = "pca"))  # standard deviation of each PC
[1] 2.7868782 1.6145733 1.3162945 1.1241143 1.0347596 0.9876531
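Here is what a PCA "stdev" value means, shown with base R's prcomp rather than Seurat. Each value is the standard deviation of the cells along one component, which is the square root of an eigenvalue of the covariance matrix. We assume Seurat's RunPCA stores the same quantity for its (truncated) PCA, but this sketch does not call Seurat.

```r
# PCA standard deviations = sqrt(eigenvalues of the covariance matrix), in base R
set.seed(7)
X <- matrix(rnorm(30 * 4), nrow = 30)        # 30 toy cells x 4 toy genes
p <- prcomp(X, center = TRUE, scale. = FALSE)

eig <- eigen(cov(X), symmetric = TRUE)$values
all.equal(p$sdev, sqrt(eig))                 # same numbers
# ...and equal to the spread of the cell scores along each component
all.equal(p$sdev, apply(p$x, 2, sd), check.attributes = FALSE)
```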

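Seurat provides RunPCA (pca) and RunTSNE (tsne), which implement dimensional reduction techniques commonly applied to scRNA-seq data. When you use these functions, all of the slots are filled in automatically.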

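Users can also add the results of a custom dimensional reduction technique computed separately (for example, multidimensional scaling (MDS) or zero-inflated factor analysis). All you need is a matrix containing the coordinates of each cell in the low-dimensional space, as shown below.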
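The next sketch shows the shape of that matrix. It has one row per cell, with the cell barcodes as row names, and one column per dimension, with column names built from the reduction's key. The helper `make_embeddings` is hypothetical and is not part of Seurat. The three barcodes are taken from pbmc_small, but the coordinates are made up.

```r
# Hypothetical helper (not a Seurat function): label a coordinate matrix so it
# can be passed as `embeddings` to CreateDimReducObject
make_embeddings <- function(coords, cells, key) {
  coords <- as.matrix(coords)
  stopifnot(nrow(coords) == length(cells))                 # one row per cell
  rownames(coords) <- cells                                # rows = cell barcodes
  colnames(coords) <- paste0(key, seq_len(ncol(coords)))   # e.g. MDS_1, MDS_2
  coords
}

cells  <- c("ATGCCAGAACGACT", "CATGGCCTGTGCAT", "GAACCTGATGAACC")
coords <- matrix(c(0.1, -0.3, 0.5, 0.2, 0.0, -0.4), ncol = 2)  # made-up values
emb <- make_embeddings(coords, cells, key = "MDS_")
emb
```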

Storing a custom dimensional reduction calculation

Classical (Metric) Multidimensional Scaling
Classical multidimensional scaling (MDS) of a data matrix. Also known as principal coordinates analysis (Gower, 1966).
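Classical MDS can be written out in a few lines. Square the distances, double-centre them to get a Gram matrix, and scale the top eigenvectors by the square roots of their eigenvalues. The function `classical_mds` below is our own illustrative sketch (Torgerson's construction), compared against `stats::cmdscale` on toy data. Eigenvectors are only defined up to sign, so the comparison uses absolute values.

```r
# Illustrative re-implementation of classical (Torgerson) MDS; not part of any package
classical_mds <- function(d, k = 2) {
  D2 <- as.matrix(d)^2                           # squared distance matrix
  n  <- nrow(D2)
  J  <- diag(n) - matrix(1 / n, n, n)            # centring matrix
  B  <- -0.5 * J %*% D2 %*% J                    # double-centred Gram matrix
  e  <- eigen(B, symmetric = TRUE)
  idx <- seq_len(k)
  # coordinates = top-k eigenvectors scaled by sqrt(eigenvalues)
  sweep(e$vectors[, idx, drop = FALSE], 2, sqrt(e$values[idx]), "*")
}

set.seed(1)
X <- matrix(rnorm(20 * 5), nrow = 20)            # 20 toy cells x 5 toy genes
d <- dist(X)
mine <- classical_mds(d, k = 2)
ref  <- cmdscale(d, k = 2)
all.equal(abs(unname(mine)), abs(unname(ref)))   # agree up to the sign of each axis
```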

Although MDS is not part of the Seurat package, it is easy to run in R. If you want to run MDS and store the output in your Seurat object:

# Before running MDS, we first calculate a distance matrix between all pairs of cells.  Here we
# use a simple euclidean distance metric on all genes, using scale.data as input
d <- dist(t(GetAssayData(pbmc_small, slot = "scale.data")))
# Run the MDS procedure, k determines the number of dimensions
mds <- cmdscale(d = d, k = 2)

head(mds)
                     [,1]       [,2]
ATGCCAGAACGACT 0.77403708 -0.8996461
CATGGCCTGTGCAT 0.02602702 -0.3466795
GAACCTGATGAACC 0.45650250  0.1795811
TGACTGGATTCTCA 0.81163243 -1.3795340
AGTCAGACTGCACA 0.77403708 -0.8996461
TCTGATACACGTGT 0.77403708 -0.8996461
# cmdscale returns the cell embeddings, we first label the columns to ensure downstream
# consistency
colnames(mds) <- paste0("MDS_", 1:2)
# We will now store this as a custom dimensional reduction called 'mds'
pbmc_small[["mds"]] <- CreateDimReducObject(embeddings = mds, key = "MDS_", assay = DefaultAssay(pbmc_small))

pbmc_small
An object of class Seurat 
230 features across 80 samples within 1 assay 
Active assay: RNA (230 features)
 3 dimensional reductions calculated: pca, tsne, mds
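The code above transposes the data before calling dist() because dist() measures distances between the rows of a matrix. Seurat's assay matrices are genes x cells, so t() turns them into cells x genes and yields cell-to-cell distances. The toy matrix below is made up for illustration.

```r
# dist() works on ROWS: transpose a genes x cells matrix to get cell-cell distances
m <- matrix(c(1, 0, 2,
              3, 1, 0), nrow = 2, byrow = TRUE,
            dimnames = list(c("geneA", "geneB"), c("cell1", "cell2", "cell3")))

d_cells <- dist(t(m))                     # 3 cells -> 3 pairwise distances
as.matrix(d_cells)

# The same cell1 vs cell2 distance computed by hand: sqrt(1^2 + 2^2)
sqrt(sum((m[, "cell1"] - m[, "cell2"])^2))
```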

Our object now contains an mds reduction. Next we visualize it just as we would pca, tsne or umap:

# We can now use this as you would any other dimensional reduction in all downstream functions
DimPlot(pbmc_small, reduction = "mds", pt.size = 0.5)
pbmc_small <- ProjectDim(pbmc_small, reduction = "mds")
MDS_ 1 
Positive:  HLA-DPB1, HLA-DQA1, S100A9, S100A8, GNLY, RP11-290F20.3, CD1C, AKR1C3, IGLL5, VDAC3 
       PARVB, RUFY1, PGRMC1, MYL9, TREML1, CA2, TUBB1, PPBP, PF4, SDPR 
Negative:  SDPR, PF4, PPBP, TUBB1, CA2, TREML1, MYL9, PGRMC1, RUFY1, PARVB 
       VDAC3, IGLL5, AKR1C3, CD1C, RP11-290F20.3, GNLY, S100A8, S100A9, HLA-DQA1, HLA-DPB1 
MDS_ 2 
Positive:  HLA-DPB1, HLA-DQA1, S100A8, S100A9, CD1C, RP11-290F20.3, PARVB, IGLL5, MYL9, SDPR 
       PPBP, CA2, RUFY1, TREML1, PF4, TUBB1, PGRMC1, VDAC3, AKR1C3, GNLY 
Negative:  GNLY, AKR1C3, VDAC3, PGRMC1, TUBB1, PF4, TREML1, RUFY1, CA2, PPBP 
       SDPR, MYL9, IGLL5, PARVB, RP11-290F20.3, CD1C, S100A9, S100A8, HLA-DQA1, HLA-DPB1 
Warning message:
In print.DimReduc(x = redeuc, dims = dims.print, nfeatures = nfeatures.print,  :
  Only 2 dimensions have been computed.
# Display the results as a heatmap
DimHeatmap(pbmc_small, reduction = "mds", dims = 1, cells = 500, projected = TRUE, balanced = TRUE)
VlnPlot(pbmc_small, features = "MDS_1")
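ProjectDim produced the "Positive:" and "Negative:" gene lists above. As we understand it, ProjectDim computes gene loadings for a reduction by multiplying the scaled expression matrix (genes x cells) by the cell embeddings (cells x dims). That is an assumption about Seurat's internals, so check the source before relying on it. The base-R sketch below reproduces the idea on toy data and ranks genes by their loading on MDS_1.

```r
# Hedged sketch of the idea behind ProjectDim (assumed, not Seurat's actual code)
set.seed(3)
genes  <- paste0("gene", 1:6)
cells  <- paste0("cell", 1:10)
scaled <- matrix(rnorm(6 * 10), nrow = 6, dimnames = list(genes, cells))  # toy scale.data
emb <- cmdscale(dist(t(scaled)), k = 2)        # toy MDS embedding, cells x 2
colnames(emb) <- paste0("MDS_", 1:2)

loadings <- scaled %*% emb                      # genes x dims "projected loadings"

# Genes with the most positive / most negative loadings on MDS_1
top_n <- 2
names(sort(loadings[, "MDS_1"], decreasing = TRUE))[seq_len(top_n)]
names(sort(loadings[, "MDS_1"]))[seq_len(top_n)]
```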

Check how the MDS_1 dimension correlates with PC_1:

# See how the first MDS dimension is correlated with the first PC dimension
FeatureScatter(pbmc_small, feature1 = "MDS_1", feature2 = "PC_1")
FeatureScatter(pbmc_small, feature1 = "MDS_1", feature2 = "tSNE_1")
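Why might MDS_1 and PC_1 track each other closely? On Euclidean distances, classical MDS gives the same coordinates as PCA on the centred data, up to the sign of each axis. The sketch below uses toy data only. In this tutorial MDS was computed from scale.data while the PCA may have used a different gene set, so how closely they agree on pbmc_small depends on which genes each step used.

```r
# Classical MDS on Euclidean distances vs PCA on the same (toy) matrix
set.seed(11)
X <- matrix(rnorm(25 * 6), nrow = 25)                 # 25 toy cells x 6 toy genes
pcs <- prcomp(X, center = TRUE, scale. = FALSE)$x[, 1:2]
mds <- cmdscale(dist(X), k = 2)

cor(mds[, 1], pcs[, 1])                               # +1 or -1 (sign is arbitrary)
all.equal(abs(unname(mds)), abs(unname(pcs)))         # identical up to sign
```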


Dimensional Reduction Vignette
