De novo construction for Monocle, based on a Seurat v3 object

2020-03-18 CuteCurse

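Reposted from
Author: 阿糖胞苷
Link: https://www.jianshu.com/p/41372e039194
Source: Jianshu (简书)
Copyright belongs to the author. For any form of reproduction, please contact the author for authorization and cite the source.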

Monocle 2 can take a Seurat object as direct input only if it is a Seurat v2.0 object, and Monocle 3 cannot convert a Seurat object into a cds.
So the CellDataSet is built by hand here: de novo construction of a Monocle v2 CellDataSet.

Although Monocle can be used with raw read counts, these are not directly proportional to expression values unless you normalize them by length, so some Monocle functions could produce nonsense results. If you don't have UMI counts, we recommend you load up FPKM or TPM values instead of raw read counts.

1. Generate the Required Format Files

a. expression matrix
(in order of preference: raw bulk read counts < TPM, RPKM/FPKM < UMI counts)
b. featureData (fd): gene annotation table
c. phenoData (pd): cell annotation table

library(monocle)
library(Seurat)
data<-readRDS("../myo_0509.rds")

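a. Construct the expression matrix (cell-gene expression matrix)

The @counts slot under @assays in a Seurat object stores the raw single-cell sequencing data (UMI counts), so @counts is the slot converted into the expression matrix.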

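1. data@assays$RNA@data

Holds relative expression values (comparable to TPM or FPKM/RPKM); in Seurat this slot typically contains the log-normalized data produced by NormalizeData().

2. data@assays$RNA@counts

Holds absolute transcript counts (raw UMI counts).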

# @counts is normally already sparse in Seurat v3, so skip the dense as.matrix() round trip
data_matrix<-as(data@assays$RNA@counts, 'sparseMatrix')

!# Storing UMI counts as a sparse matrix saves memory.
!# Most count matrices already come in a sparse format (e.g. MTX); do NOT convert them into a dense matrix.
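
To make the memory argument concrete, here is a minimal sketch comparing the footprint of the same toy count matrix in sparse and dense form. The matrix size, the 5% density and the Poisson values are made-up assumptions for illustration, not numbers from the dataset used in this post.

```r
library(Matrix)  # recommended package that ships with standard R installations

set.seed(1)

# Toy UMI-like matrix: 2000 genes x 500 cells, about 5% non-zero entries
# (all of these numbers are illustrative assumptions)
n_genes <- 2000
n_cells <- 500
toy_counts <- rsparsematrix(
  n_genes, n_cells, density = 0.05,
  rand.x = function(n) rpois(n, lambda = 3) + 1  # strictly positive counts
)

# The same values stored as an ordinary dense matrix
dense_counts <- as.matrix(toy_counts)

sparse_bytes <- as.numeric(object.size(toy_counts))
dense_bytes  <- as.numeric(object.size(dense_counts))

cat("sparse:", sparse_bytes, "bytes\n")
cat("dense: ", dense_bytes, "bytes\n")
```

The dense copy has to store every zero explicitly, so the gap widens as the matrix gets sparser and larger, which is the usual situation for droplet-based UMI data.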

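b. Construct featureData (gene annotation table)

featureData here has two columns, gene_id and gene_short_name (Monocle expects a column named gene_short_name).
Its rows correspond to the row names of the counts matrix.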

feature_ann<-data.frame(gene_id=rownames(data_matrix),gene_short_name=rownames(data_matrix))
rownames(feature_ann)<-rownames(data_matrix)
data_fd<-new("AnnotatedDataFrame", data = feature_ann)

c. Construct phenoData (cell annotation table)

The @meta.data slot of a Seurat object usually stores phenotype-related information such as cluster, sample origin and group, so the metadata is converted into phenoData.

sample_ann<-data@meta.data
rownames(sample_ann)<-colnames(data_matrix)
data_pd<-new("AnnotatedDataFrame", data =sample_ann)
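
Before building the cds it can help to confirm that the three inputs line up: genes in the matrix rows must match the featureData rows, and cells in the matrix columns must match the phenoData rows. The sketch below uses a hypothetical helper, check_cds_inputs(), and toy data; neither is part of Monocle or Seurat.

```r
# Hypothetical helper (not a Monocle or Seurat function): stops with an error
# if the expression matrix, gene annotation and cell annotation do not line up.
check_cds_inputs <- function(expr, feature_ann, sample_ann) {
  stopifnot(
    nrow(expr) == nrow(feature_ann),                   # one annotation row per gene
    ncol(expr) == nrow(sample_ann),                    # one annotation row per cell
    identical(rownames(expr), rownames(feature_ann)),  # same genes, same order
    identical(colnames(expr), rownames(sample_ann)),   # same cells, same order
    "gene_short_name" %in% colnames(feature_ann)       # column Monocle looks for
  )
  invisible(TRUE)
}

# Toy inputs: 3 genes x 2 cells (made-up values)
expr <- matrix(
  c(0, 3, 1, 0, 2, 5), nrow = 3,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), c("cell1", "cell2"))
)
feature_ann <- data.frame(
  gene_id = rownames(expr),
  gene_short_name = rownames(expr),
  row.names = rownames(expr)
)
sample_ann <- data.frame(cluster = c("0", "1"), row.names = colnames(expr))

check_cds_inputs(expr, feature_ann, sample_ann)  # passes silently
```

With the objects from this post, the equivalent call would be check_cds_inputs(data_matrix, feature_ann, sample_ann).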

2. Create the CDS object

Create the cds object. Use the right distribution: specify the expressionFamily that matches your data type.

data.cds<-newCellDataSet(data_matrix,phenoData =data_pd,featureData =data_fd,expressionFamily=negbinomial.size())

!# Alternative for TPM/FPKM data: convert the values into mRNA counts first.
!# negbinomial.size() can be used if you first convert your relative expression values to transcript counts using relative2abs().
!# This often leads to much more accurate results than using tobit().
!# For UMI or read counts, use negbinomial.size() directly.
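
The relative2abs() route mentioned in the notes above can be sketched roughly as follows, following the pattern shown in the Monocle 2 documentation. The function name fpkm_to_count_cds and its arguments (fpkm_matrix, pd, fd) are hypothetical, and the detection limits are placeholder values that should be checked against your own data. The function is only defined here; calling it requires the monocle and VGAM packages and a real FPKM/TPM matrix.

```r
# Sketch only. Assumed inputs (all hypothetical names):
#   fpkm_matrix : genes x cells matrix of FPKM or TPM values
#   pd, fd      : AnnotatedDataFrame objects matching its columns and rows
fpkm_to_count_cds <- function(fpkm_matrix, pd, fd) {
  # Step 1: temporary cds holding the relative values, modelled with tobit()
  relative_cds <- monocle::newCellDataSet(
    as.matrix(fpkm_matrix),
    phenoData = pd,
    featureData = fd,
    lowerDetectionLimit = 0.1,                  # placeholder threshold
    expressionFamily = VGAM::tobit(Lower = 0.1)
  )

  # Step 2: estimate per-cell transcript counts from the relative values
  rpc_matrix <- monocle::relative2abs(relative_cds, method = "num_genes")

  # Step 3: rebuild the cds from the estimated counts, now with negbinomial.size()
  monocle::newCellDataSet(
    as(as.matrix(rpc_matrix), "sparseMatrix"),
    phenoData = pd,
    featureData = fd,
    lowerDetectionLimit = 0.5,                  # placeholder threshold
    expressionFamily = VGAM::negbinomial.size()
  )
}
```

For UMI data such as the Seurat @counts matrix used in this post, none of this is needed; the single newCellDataSet() call with negbinomial.size() is enough.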

Inspect phenoData and featureData

head(pData(data.cds))

head(fData(data.cds))