单细胞数据挖掘(2)-Seurat对象构建、质控及绘图(生信技能

2021-01-13 本文已影响0人北欧森林

本笔记来源于B站@生信技能树-jimmy；学习视频链接: 「生信技能树」单细胞数据挖掘

构建seurat对象、质控、绘图

# 2.1 构建seurat对象，质控
# In total, 2,343 cells from tumor cores were included in this analysis.
# quality control standards: 
# 1) genes detected in < 3 cells were excluded; 筛选基因
# 2) cells with < 50 total detected genes were excluded; 筛选细胞 
# 3) cells with ≥ 5% of mitochondria-expressed genes were excluded. 筛选细胞
sessionInfo()
library("Seurat")
?CreateSeuratObject

sce.meta <- data.frame(Patient_ID=group$Patient_ID,
                       row.names = group$sample)
head(sce.meta)
table(sce.meta$Patient_ID)

SingleCell2_1.JPG

# 函数 CreateSeuratObject 有多种多样的执行方式
scRNA = CreateSeuratObject(counts=a.filt,
                           meta.data = sce.meta,
                           min.cells = 3, 
                           min.features = 50)
# counts:a matrix-like object with unnormalized data with cells as columns and features as rows 
# meta.data:Additional cell-level metadata to add to the Seurat object
# min.cells: features detected in at least this many cells. 
# min.features:cells where at least this many features are detected.

head(scRNA@meta.data)
# nCount_RNA：the number of cell total counts
# nFeature_RNA：the number of cell's detected gene

summary(scRNA@meta.data)
scRNA@assays$RNA@counts[1:4,1:4]
# 可以看到，之前的counts矩阵存储格式发生了变化：4 x 4 sparse Matrix of class "dgCMatrix"

SingleCell2_2.JPG

dim(scRNA)
#  20047  2342 -- 仅过滤掉1个细胞

# 接下来根据线粒体基因表达筛选低质量细胞
# Calculate the proportion of transcripts mapping to mitochondrial genes
table(grepl("^MT-",rownames(scRNA)))
# > FALSE 
# > 20047 没有线粒体基因
scRNA[["percent.mt"]] <- PercentageFeatureSet(scRNA, pattern = "^MT-")
# 在meta.data中增加了一列“percent.mt”
head(scRNA@meta.data) 
summary(scRNA@meta.data)
# 结果显示没有线粒体基因，因此这里过滤也就没有意义，但是代码留在这里
# 万一大家的数据里面有线粒体基因，就可以如此这般进行过滤啦。
pctMT=5  # ≥ 5% of mitochondria-expressed genes
scRNA <- subset(scRNA, subset = percent.mt < pctMT)
dim(scRNA)

table(grepl("^ERCC-",rownames(scRNA)))
# >FALSE  TRUE 
# >19961    86     #发现是有ERCC基因
# External RNA Control Consortium，是常见的已知浓度的外源RNA分子spike-in的一种
# 指标含义类似线粒体含量，ERCC含量大，则说明total sum变小
scRNA[["percent.ERCC"]] <- PercentageFeatureSet(scRNA, pattern = "^ERCC-")
head(scRNA@meta.data)
summary(scRNA@meta.data)
rownames(scRNA)[grep("^ERCC-",rownames(scRNA))]
# 可以看到有不少ERCC基因

sum(scRNA$percent.ERCC< 40)
# 较接近原文过滤后的数量2149，但感觉条件有点宽松了
# 网上看了相关教程，一般ERCC占比不高于10%！
sum(scRNA$percent.ERCC< 10)   #就只剩下460个cell，明显低于文献中的数量
pctERCC=40
scRNA <- subset(scRNA, subset = percent.ERCC < pctERCC)
dim(scRNA)
# >20047  2142   #原文为19752  2149
dim(a.filt)
# >23460  2343 未过滤前

# 2.2 可视化
# 图A：观察不同组cell的counts、feature分布
col.num <- length(unique(scRNA@meta.data$Patient_ID))
library(ggplot2)

p1_1.1 <- VlnPlot(scRNA,
                features = c("nFeature_RNA"),
                group.by = "Patient_ID",
                cols =rainbow(col.num)) +
  theme(legend.position = "none") +
  labs(tag = "A")
p1_1.1
p1_1.2 <- VlnPlot(scRNA,
                features = c("nCount_RNA"),
                group.by = "Patient_ID",
                cols =rainbow(col.num)) +
  theme(legend.position = "none") 
p1_1.2
p1_1 <- p1_1.1 | p1_1.2
p1_1

Fig1.A.jpeg

# Quality Control figure
VlnPlot(scRNA,
        features = c("nFeature_RNA","nCount_RNA","percent.ERCC"))

QualityControl.jpeg

#图B：nCount_RNA与对应的nFeature_RNA关系
p1_2 <- FeatureScatter(scRNA, feature1 = "nCount_RNA", feature2 = "nFeature_RNA",
                       group.by = "Patient_ID",pt.size = 1.3) +
  labs(tag = "B")
p1_2

Fig1.B.jpeg

FeatureScatter(scRNA, feature1 = "nCount_RNA", feature2 = "percent.ERCC")
sessionInfo()

FeatureScatter.jpeg

单细胞数据挖掘(2)-Seurat对象构建、质控及绘图(生信技能

猜你喜欢

热点阅读