sc-RAN-seq 数据分析||Seurat新版教程: Usi
2019-06-19 本文已影响56人
周运来就是我
本教程介绍了Seurat包与sctransform 包一起分析时候的一般方法。首先你也许会问:sctransform包是用来干嘛的?
这是一个值得关心和探讨的问题,我也将做一个简单的回答。最好的办法当然是到包所在的官网去读人家的自我介绍了sctransform: Variance Stabilizing Transformations for Single Cell UMI Data
A normalization method for single-cell UMI count data using a variance stabilizing transformation. The transformation is based on a negative binomial regression model with regularized parameters. As part of the same regression framework, this package also provides functions for batch correction, and data correction. See Hafemeister and Satija 2019 <doi:10.1101/576827> for more details.
有没有感觉很强大,在Seurat里面被封为SCTransform()函数可以:
- 代替 NormalizeData, ScaleData, and FindVariableFeatures.三个函数
- 转换完了在SCT assay 里
- 在均一化的同时可以移除线粒体细胞等的影响
library(Seurat)
packageVersion("Seurat")
# https://satijalab.org/seurat/mca.html
library(dplyr)
library(ggsci)
library(ggplot2)
library(sctransform)
# Load the PBMC dataset
list.files("D:\\Users\\Administrator\\Desktop\\RStudio\\single_cell\\filtered_gene_bc_matrices\\hg19")
?Read10X
pbmc.data <- Read10X(data.dir = "D:\\Users\\Administrator\\Desktop\\RStudio\\single_cell\\filtered_gene_bc_matrices\\hg19")
# Initialize the Seurat object with the raw (non-normalized data).
?CreateSeuratObject
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200)
pbmc
一步数据均一化,也是比较慢的。
# store mitochondrial percentage in object meta data
pbmc <- PercentageFeatureSet(pbmc, pattern = "^MT-", col.name = "percent.mt")
# run sctransform
pbmc <- SCTransform(pbmc, vars.to.regress = "percent.mt", verbose = FALSE)
pbmc
An object of class Seurat
26286 features across 2700 samples within 2 assays
Active assay: SCT (12572 features)
1 other assay present: RNA
然后是标准操作。
# These are now standard steps in the Seurat workflow for visualization and clustering
pbmc <- RunPCA(pbmc, verbose = FALSE)
pbmc <- RunUMAP(pbmc, dims = 1:30, verbose = FALSE)
pbmc <- FindNeighbors(pbmc, dims = 1:30, verbose = FALSE)
pbmc <- FindClusters(pbmc, verbose = FALSE)
DimPlot(pbmc, label = TRUE) + NoLegend()
为什么在转化之后PC轴要比之前我们用NormalizeData 的时候要多呢?
- 更有效地消除技术影响
- 更高的PC维度可能代表一些微妙的生物学差异
- In addition, sctransform returns 3,000 variable features by default, instead of 2,000. The rationale is similar, the additional variable features are less likely to be driven by technical differences across cells, and instead may represent more subtle biological fluctuations.
我们还可以像下面这样去运行:
pbmc <- CreateSeuratObject(pbmc_data) %>% PercentageFeatureSet(pattern = "^MT-", col.name = "percent.mt") %>%
SCTransform(vars.to.regress = "percent.mt") %>% RunPCA() %>% FindNeighbors(dims = 1:30) %>%
RunUMAP(dims = 1:30) %>% FindClusters()
Where are normalized values stored for sctransform?
- pbmc[["SCT"]]@scale.data
- pbmc[["SCT"]]@data 用于可视化的数据
- pbmc[["SCT"]]@counts 校正后的count
- You can use the corrected log-normalized counts for differential expression and integration. However, in principle, it would be most optimal to perform these calculations directly on the residuals (stored in the scale.data slot) themselves. This is not currently supported in Seurat v3, but will be soon.
# These are now standard steps in the Seurat workflow for visualization and clustering Visualize
# canonical marker genes as violin plots.
VlnPlot(pbmc, features = c("CD8A", "GZMK", "CCL5", "S100A4", "ANXA1", "CCR7", "ISG15", "CD3D"),
pt.size = 0.2, ncol = 4)
# Visualize canonical marker genes on the sctransform embedding.
FeaturePlot(pbmc, features = c("CD8A", "GZMK", "CCL5", "S100A4", "ANXA1", "CCR7"), pt.size = 0.2,
ncol = 3)
FeaturePlot(pbmc, features = c("CD3D", "ISG15", "TCL1A", "FCER2", "XCL1", "FCGR3A"), pt.size = 0.2,
ncol = 3)