Getting Started with Single-Cell RNA-seq Differential Expression Analysis

2020-09-11 · 小贝学生信

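Single-cell differential expression analysis comes down to three steps: build a Seurat object, look up or add the group labels, and run the differential expression test.
Differential expression analysis was also touched on in my earlier post on single-cell enrichment analysis.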

1. Download the example data

GSE81861
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE81861
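From this series we pick two groups of cells to compare: the normal (`NM`) and the tumor epithelial cells from colorectal cancer (CRC).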

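This GEO series provides processed CSV files of two types, FPKM and COUNTS; I may look into the difference between them later. Note that Seurat's `counts` slot and its `LogNormalize` method are designed for raw counts, so the COUNTS files are the more natural input. This walkthrough uses the FPKM files.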
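To make the FPKM/COUNTS distinction concrete, here is a minimal base-R sketch of the usual FPKM definition (fragments per kilobase of gene length per million mapped fragments), computed from a toy count matrix. The counts and gene lengths are made-up illustrative values, not taken from GSE81861, and real pipelines may define library size slightly differently.

```r
# Toy count matrix: 3 genes x 2 cells (made-up numbers)
counts <- matrix(c(100, 300, 600,
                   50, 150, 800),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("cell1", "cell2")))
# Hypothetical gene lengths in base pairs
gene_len_bp <- c(geneA = 1000, geneB = 2000, geneC = 3000)

# FPKM = count / (gene length in kb * library size in millions)
counts_to_fpkm <- function(counts, gene_len_bp) {
  lib_size_millions <- colSums(counts) / 1e6
  len_kb <- gene_len_bp[rownames(counts)] / 1e3
  # divide each row by its gene length, then each column by its library size
  sweep(counts / len_kb, 2, lib_size_millions, "/")
}

fpkm <- counts_to_fpkm(counts, gene_len_bp)
fpkm
```

Because FPKM values are already scaled by library size and gene length, running `LogNormalize` on them normalizes the data a second time, which is one reason raw counts are the preferred Seurat input.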

2. Build the Seurat object

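Unlike the earlier posts, which started from the three raw input files (barcodes, features, matrix), here we download an already assembled expression matrix, so the Seurat object is built a little differently.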
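The assembly steps in the code below (combine the two matrices column-wise, shorten the row names to gene symbols, and build a per-cell group table whose row names match the matrix column names) can be tried first on a toy example. The `locus_SYMBOL_ENSEMBL` row-name format used here is an assumption for illustration; check `rownames()` of your own file. `make.unique()` is included because different loci can share one symbol.

```r
# Two toy expression matrices (genes x cells); row names mimic an assumed
# "locus_SYMBOL_ENSEMBL" format
genes <- c("chr1:100-200_GENEA_ENSG0001.1",
           "chr2:300-400_GENEB_ENSG0002.1",
           "chr3:500-600_GENEA_ENSG0003.1")   # duplicated symbol on purpose
normal <- matrix(1:6, nrow = 3, dimnames = list(genes, c("n1", "n2")))
tumor  <- matrix(7:15, nrow = 3, dimnames = list(genes, c("t1", "t2", "t3")))

# 1) combine cells column-wise (both matrices must share the same gene order)
stopifnot(identical(rownames(normal), rownames(tumor)))
expr <- cbind(normal, tumor)

# 2) keep the second "_"-separated field (the symbol) and de-duplicate it
symbols <- vapply(strsplit(rownames(expr), "_"), `[`, character(1), 2)
rownames(expr) <- make.unique(symbols)

# 3) per-cell metadata; its row names must equal the matrix column names
meta <- data.frame(group = c(rep("normal", ncol(normal)),
                             rep("tumor",  ncol(tumor))),
                   row.names = colnames(expr))
expr
meta
```

The duplicated symbol becomes `GENEA.1`, so every row name stays unique and can be used safely for lookups later.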

# Expression matrices: the first column holds the gene IDs, so move it into the row names
rm(list = ls())
nm_e <- read.csv("GSE81861_CRC_NM_epithelial_cells_FPKM.csv/GSE81861_CRC_NM_epithelial_cells_FPKM.csv")
row.names(nm_e) <- nm_e[,1]
nm_e <- nm_e[,-1]
nm_e[1:4,1:4]

tm_e <- read.csv("GSE81861_CRC_tumor_epithelial_cells_FPKM.csv/GSE81861_CRC_tumor_epithelial_cells_FPKM.csv")
row.names(tm_e) <- tm_e[,1]
tm_e <- tm_e[,-1]
tm_e[1:4,1:4]

test <- cbind(nm_e,tm_e)
test <- as.matrix(test)
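# The original gene IDs are long; keep only the gene symbol (2nd "_"-separated field)
# and make duplicated symbols unique so every row name can be looked up safely
rownames(test) <- make.unique(sapply(strsplit(rownames(test),"_"),"[",2))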
test[1:4,1:4]

# Group labels: one row per cell, in the same order as the columns of test
group_dat <- data.frame(group=c(rep('normal',ncol(nm_e)),
                                    rep('tumor',ncol(tm_e))))

rownames(group_dat) <- colnames(test)

# Build the Seurat object
library(Seurat)
scRNA <- CreateSeuratObject(counts=test, 
                        meta.data=group_dat)
dim(scRNA) # 57241 genes x 432 cells
#[1] 57241   432
table(scRNA@meta.data$group)
#normal  tumor 
#   160    272 

# QC: keep cells with 500-4000 detected genes and less than 30% mitochondrial signal
minGene <- 500
maxGene <- 4000
scRNA[["percent.mt"]] <- PercentageFeatureSet(scRNA, pattern = "^MT-")
pctMT <- 30
scRNA <- subset(scRNA, subset = nFeature_RNA > minGene & nFeature_RNA < maxGene & percent.mt < pctMT)
dim(scRNA) # only 57 cells remain
table(scRNA@meta.data$group)
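For intuition about the QC step, here is a base-R sketch of the two metrics used in `subset()`: `nFeature_RNA` is the number of genes with a non-zero value in a cell, and `percent.mt` (from `PercentageFeatureSet(pattern = "^MT-")`) is the share of a cell's total signal coming from genes whose names start with `MT-`. The toy matrix and thresholds are illustrative only, and this is a re-implementation for explanation, not Seurat's code.

```r
# Toy matrix: 5 genes x 3 cells (made-up values)
m <- matrix(c(0, 5, 3, 1,  0,
              2, 0, 0, 40, 60,
              1, 1, 1, 1,  1),
            nrow = 5,
            dimnames = list(c("ACTB", "GAPDH", "CD3E", "MT-CO1", "MT-ND1"),
                            c("c1", "c2", "c3")))

# Number of detected genes per cell (Seurat's nFeature_RNA)
n_feature <- colSums(m > 0)

# Percentage of each cell's total coming from mitochondrial genes
is_mt <- grepl("^MT-", rownames(m))
percent_mt <- colSums(m[is_mt, , drop = FALSE]) / colSums(m) * 100

# Same style of filter as the subset() call, with toy thresholds
keep <- n_feature > 2 & n_feature < 5 & percent_mt < 30
colnames(m)[keep]
```

Here `c2` fails on its mitochondrial share and `c3` on its gene count, so only `c1` is kept.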

3. Differential expression analysis

scRNA <- NormalizeData(scRNA, normalization.method = "LogNormalize", scale.factor = 10000)

diff_dat <- FindMarkers(scRNA,ident.1="normal",ident.2="tumor",
                    group.by='group')

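# Keep genes that are significant after multiple-testing correction and have a sizeable
# fold change. Seurat v3 names the column avg_logFC; Seurat v4+ renames it avg_log2FC.
fc_col <- intersect(c("avg_log2FC", "avg_logFC"), colnames(diff_dat))[1]
diff_dat <- diff_dat[diff_dat$p_val_adj < 0.05 & abs(diff_dat[[fc_col]]) > 0.5,]
# Expression of the DE genes in the cells that passed QC, with matching group labels
test_diff <- as.matrix(GetAssayData(scRNA, slot = "counts")[rownames(diff_dat), ])
group_dat <- scRNA@meta.data[, "group", drop = FALSE]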
head(diff_dat)
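As a rough picture of what the two calls above compute, here is a base-R sketch: `LogNormalize` divides each cell by its total, multiplies by `scale.factor` and applies `log1p`; by default (`test.use = "wilcox"`) `FindMarkers` runs a Wilcoxon rank-sum test per gene and reports `p_val_adj` as a Bonferroni correction over the number of genes in the object. This is a simplified re-implementation on simulated data, not Seurat's code: it skips Seurat's gene pre-filtering (`logfc.threshold` and `min.pct`, 0.25 and 0.1 by default in Seurat v3) and its faster test implementation.

```r
set.seed(1)
# Simulated counts: 4 genes x 10 cells; first 5 cells "normal", last 5 "tumor"
counts <- matrix(rpois(40, lambda = 5), nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:10)))
counts["gene1", 6:10] <- counts["gene1", 6:10] + 100   # gene1 much higher in tumor
group <- rep(c("normal", "tumor"), each = 5)

# LogNormalize: divide by each cell's total, multiply by scale.factor, then log1p
log_normalize <- function(counts, scale.factor = 1e4) {
  log1p(sweep(counts, 2, colSums(counts), "/") * scale.factor)
}
norm <- log_normalize(counts)

# Per-gene Wilcoxon rank-sum test, normal (ident.1) vs tumor (ident.2);
# warnings about ties are silenced for this toy example
p_val <- apply(norm, 1, function(x) {
  suppressWarnings(wilcox.test(x[group == "normal"], x[group == "tumor"])$p.value)
})
# Bonferroni correction using the total number of genes
p_val_adj <- pmin(p_val * nrow(norm), 1)
data.frame(p_val, p_val_adj)
```

Because the correction multiplies by the number of genes in the whole object (57241 here), filtering on `p_val_adj` is much stricter than filtering on the raw `p_val`.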

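`head(diff_dat)` returns a table with the columns `p_val`, `avg_logFC`, `pct.1`, `pct.2` and `p_val_adj`. The help page (`?FindMarkers`) explains the three middle columns: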

avg_logFC: log fold-change of the average expression between the two groups. Positive values indicate that the gene is more highly expressed in the first group
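That is, the difference between the two groups' log-transformed average expression. In Seurat v3 (which names the column `avg_logFC`) the log is natural, not log2, and the averages come from the normalized data: for each group Seurat averages `expm1()` of the log-normalized values, adds 1 and takes the log. Seurat v4 and later use log2 and call the column `avg_log2FC`. A t-test on the raw FPKM values therefore only gives a rough check of the direction and size of the difference: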

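# Group means (raw FPKM) of the first DE gene; compare their direction with avg_logFC
t.test(test_diff[1,]~group_dat$group)
# The same check for a chosen gene, e.g. LYPD8, on the cells kept after QC
t.test(as.numeric(GetAssayData(scRNA, slot = "counts")["LYPD8", ]) ~ scRNA$group)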
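The exact value of the fold-change column comes from the log-normalized data, not from raw FPKM means. The sketch below follows the formula described above (back-transform with `expm1()`, average, add a pseudocount of 1, take the log; natural log for Seurat v3, log2 for v4 and later) on toy numbers. Treat it as an approximation of Seurat's internals, which can change between versions. With a real object, `x1` and `x2` would be one gene's row of `GetAssayData(scRNA, slot = "data")` split by group.

```r
# Log-normalized values of one gene in the two groups (toy numbers)
x_normal <- c(2.1, 1.8, 0.0, 2.5)
x_tumor  <- c(0.0, 0.7, 0.0, 0.4)

# Seurat-style average log fold change (ident.1 = normal, ident.2 = tumor):
# back-transform with expm1(), average, add a pseudocount, take the log
avg_logfc <- function(x1, x2, base = exp(1), pseudocount = 1) {
  log(mean(expm1(x1)) + pseudocount, base = base) -
    log(mean(expm1(x2)) + pseudocount, base = base)
}

avg_logfc(x_normal, x_tumor)            # natural log, like avg_logFC in Seurat v3
avg_logfc(x_normal, x_tumor, base = 2)  # log2, like avg_log2FC in Seurat v4+
```

Averaging after `expm1()` means the comparison is made on the linear scale; a positive result means higher expression in the first group (`normal` here).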

pct.1: The percentage of cells where the gene is detected in the first group
pct.2: The percentage of cells where the gene is detected in the second group
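`pct.1` and `pct.2` are detection rates: the fraction of cells in each group with a value above zero, which Seurat rounds to three decimals. A base-R sketch on a toy genes x cells matrix:

```r
# Toy log-normalized matrix: 2 genes x 6 cells; first 3 normal, last 3 tumor
x <- matrix(c(1.2, 0,
              0.8, 0,
              0,   0.5,
              0,   1.1,
              0,   0.9,
              0,   1.4),
            nrow = 2,
            dimnames = list(c("geneA", "geneB"), paste0("cell", 1:6)))
group <- rep(c("normal", "tumor"), each = 3)

# Fraction of cells in a group where the gene is detected (> 0)
detection_rate <- function(x, cells) {
  round(rowSums(x[, cells, drop = FALSE] > 0) / sum(cells), 3)
}

pct.1 <- detection_rate(x, group == "normal")
pct.2 <- detection_rate(x, group == "tumor")
data.frame(pct.1, pct.2)
```

`geneA` is detected in 2 of 3 normal cells and no tumor cells, so it has a large detection gap even before looking at fold change.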


Focus on DE genes with a large gap between pct.1 and pct.2 and a large fold change.
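One way to act on that advice is to rank the filtered table by the detection-rate gap and then by absolute fold change. The toy table below only mimics the `FindMarkers` column layout (gene names and numbers are invented), and the ordering rule is just one reasonable choice.

```r
# Invented example rows with FindMarkers-style columns
markers <- data.frame(
  p_val     = c(1e-4, 1e-10, 1e-6),
  avg_logFC = c(0.6, 2.0, -1.5),
  pct.1     = c(0.55, 0.90, 0.20),
  pct.2     = c(0.50, 0.10, 0.85),
  row.names = c("geneC", "geneA", "geneB")
)

# Detection-rate gap between the two groups
markers$pct.diff <- markers$pct.1 - markers$pct.2
# Largest |pct.diff| first; ties broken by largest |avg_logFC|
ranked <- markers[order(-abs(markers$pct.diff), -abs(markers$avg_logFC)), ]
rownames(ranked)
```

Using absolute values keeps genes that are higher in either group; drop `abs()` to rank only genes up in the first group.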

4. Visualization

# Boxplot of one DE gene by group
library(ggpubr)
bp <- function(i){
  # i: row index of the gene in test_diff
  df <- data.frame(gene=test_diff[i,], group=group_dat$group)
  ggboxplot(df,x="group",y="gene",
            color = "group",add = "jitter",
            ylab = rownames(test_diff)[i]) +
    theme_bw()
}
bp(2)


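# Heatmap of the DE genes
library(pheatmap)
# z-score each gene (row) across cells, then cap at +/-2 for the color scale
n <- t(scale(t(test_diff)))
n[n > 2] <- 2
n[n < -2] <- -2
pheatmap(n, show_rownames = FALSE,
         show_colnames = FALSE,
         annotation_col = group_dat)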
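The preprocessing before `pheatmap()` turns each gene (row) into z-scores across cells and then caps them at ±2 so that a few extreme cells do not dominate the color scale. A base-R sketch on a toy matrix showing what `t(scale(t(x)))` and the clipping do:

```r
# Toy matrix: 2 genes x 10 cells; geneA has one extreme cell
x <- matrix(c(rep(1, 9), 100,
              5, 5, 6, 6, 7, 5, 5, 6, 6, 7),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("geneA", "geneB"), paste0("cell", 1:10)))

# scale() works on columns, so transpose, scale, and transpose back
z <- t(scale(t(x)))
# Equivalent manual row z-score: (value - row mean) / row sd
z_manual <- (x - rowMeans(x)) / apply(x, 1, sd)

# Cap at [-2, 2], the same effect as the two assignment lines above
z_clipped <- pmin(pmax(z, -2), 2)
round(z_clipped, 2)
```

The extreme cell of `geneA` has a z-score of about 2.85 and is capped at 2. A gene with identical values in every cell has a standard deviation of 0 and would produce `NaN` here, which `pheatmap()` cannot cluster.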
