Reading and converting scRNA matrix formats in one place (h5, h5ad, loom)

2023-09-01  生信云笔记

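  scRNA expression matrices are commonly stored as 10x Genomics output, h5, h5ad, or loom. After cellranger processes 10x data, the matrix is written as three files: matrix.mtx, barcodes.tsv, and genes.tsv (named features.tsv from Cell Ranger 3 onward, and usually gzip-compressed). h5 and h5ad are commonly used to store the expression matrix together with its annotations. loom is more common in RNA velocity (velocyto) and transcription factor (SCENIC) analyses.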
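  The four formats can usually be told apart from the path alone. The sketch below is only an illustration: guess_sc_format is a hypothetical helper, not part of Seurat, dior, or any other package, and it assumes that a directory means 10x output and that files carry their conventional extensions.

```r
# Hypothetical helper (not from any package): guess the storage format from a path.
guess_sc_format <- function(path) {
  if (dir.exists(path)) {
    # Assumption: a directory holds matrix.mtx, barcodes.tsv and genes.tsv/features.tsv
    return("10x_dir")
  }
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    h5   = "h5",
    h5ad = "h5ad",
    loom = "loom",
    "unknown"  # anything else is left for the caller to handle
  )
}

guess_sc_format("case1/filtered_feature_bc_matrix.h5")
guess_sc_format("global_raw.h5ad")
guess_sc_format("fibo_count.loom")
```

  Each result maps onto one of the readers shown in the sections that follow.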

1. 10x single-cell sequencing data

library(Seurat)

list.files('case1/filtered_feature_bc_matrix')
[1] "barcodes.tsv.gz" "features.tsv.gz" "matrix.mtx.gz"

count <- Read10X('case1/filtered_feature_bc_matrix')
obj <- CreateSeuratObject(counts = count, min.cells = 3, min.features = 100, project = "case1")
obj
An object of class Seurat
21966 features across 3267 samples within 1 assay
Active assay: RNA (21966 features, 0 variable features)

2. h5

# 10x HDF5 matrix produced by cellranger
count <- Read10X_h5('case1/filtered_feature_bc_matrix.h5')
obj <- CreateSeuratObject(counts = count, min.cells = 3, min.features = 100, project = "case1")

# h5 file read with dior, returned as a Seurat object
library(dior)

obj <- read_h5('fibo_rds.h5')
obj
An object of class Seurat
73202 features across 4257 samples within 2 assays
Active assay: RNA (36601 features, 0 variable features)
 1 other assay present: counts
 3 dimensional reductions calculated: pca, tsne, umap

3. h5ad
  The read_h5ad function depends on the Python packages scanpy and diopy. Make sure both are installed before using it; if they are not, install them first: pip install scanpy diopy
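  Before calling read_h5ad, it can help to confirm from R that both Python packages are importable. The check below is a minimal sketch in base R: py_has_module is a hypothetical helper, and it assumes that the python3 found on PATH is the same interpreter dior will end up using, which may not hold on every setup.

```r
# Hypothetical helper: TRUE if `<python> -c "import <module>"` exits with status 0.
py_has_module <- function(module, python = Sys.which("python3")) {
  python <- unname(python)
  if (!nzchar(python)) return(FALSE)  # no interpreter found on PATH
  status <- suppressWarnings(system2(
    python, c("-c", shQuote(paste("import", module))),
    stdout = FALSE, stderr = FALSE
  ))
  identical(as.integer(status), 0L)
}

# Report any missing dependency instead of failing later inside read_h5ad.
for (mod in c("scanpy", "diopy")) {
  if (!py_has_module(mod)) {
    message("Python module not found: ", mod, " (try: pip install ", mod, ")")
  }
}
```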

library(dior)

obj <- read_h5ad('global_raw.h5ad', target.object = "seurat", assay.name = "RNA")
obj
An object of class Seurat
33538 features across 486134 samples within 1 assay
Active assay: RNA (33538 features, 0 variable features)
 2 dimensional reductions calculated: pca, umap

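  diopy is the Python counterpart of dior. Once it is installed, it can be used directly from the command line: run scdior --help to list the parameters, then follow the prompts.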

4. loom

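library(SCopeLoomR)
library(Seurat)

# open_loom() returns an hdf5r file handle (read-only by default)
fibo_loom <- open_loom("fibo_count.loom")
# hdf5r returns the matrix as cells x genes, so transpose to genes x cells
count <- t(fibo_loom[['matrix']][,])
colnames(count) <- fibo_loom[['col_attrs']][['CellID']][]
rownames(count) <- fibo_loom[['row_attrs']][['Gene']][]
close_loom(fibo_loom)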

obj <- CreateSeuratObject(counts = count, min.cells = 3, min.features = 100, project = "case1")
obj
An object of class Seurat
21114 features across 4257 samples within 1 assay
Active assay: RNA (21114 features, 0 variable features)

  The R package loomR can also handle loom files. Install it with devtools::install_github("mojaveazure/loomR", ref="develop"); interested readers can try it themselves.
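  For readers who do want to try loomR, the sketch below shows roughly how the same count matrix could be pulled out. It has not been run against the file used in this post: read_loom_counts is a hypothetical wrapper, and the attribute names Gene and CellID are assumed to match the ones used in the example above.

```r
# Sketch only: requires loomR (and its dependency hdf5r) to be installed.
# read_loom_counts is a hypothetical wrapper, not a loomR function.
read_loom_counts <- function(path, gene_attr = "Gene", cell_attr = "CellID") {
  lfile <- loomR::connect(filename = path, mode = "r")
  on.exit(lfile$close_all(), add = TRUE)  # always release the HDF5 handle
  # loomR exposes the matrix as cells x genes, so transpose to genes x cells
  count <- t(lfile[["matrix"]][, ])
  rownames(count) <- lfile[[paste0("row_attrs/", gene_attr)]][]
  colnames(count) <- lfile[[paste0("col_attrs/", cell_attr)]][]
  count
}

# Usage (needs the loom file on disk):
# count <- read_loom_counts("fibo_count.loom")
# obj <- Seurat::CreateSeuratObject(counts = count, project = "case1")
```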

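5. dior
  Two functions of this R package were used above. Here is everything the package exports: each function does one job, and the names are mostly self-explanatory.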

library(dior)

ls('package:dior')
[1] "df_to_h5"        "h5_to_df"        "h5_to_matrix"    "matrix_to_h5"
[5] "read_h5"         "read_h5ad"       "read_h5part"     "seurat_write_h5"
[9] "write_h5"

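6. sceasy
  This R package can also convert between data formats. In practice only the convertFormat function is needed: the argument from = c("anndata", "seurat", "sce", "loom") gives the source format, and to = c("anndata", "loom", "sce", "seurat", "cds") gives the target format. The supported combinations are listed below.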

devtools::install_github("cellgeni/sceasy")
library(sceasy)

grep('2', ls(asNamespace('sceasy')), value = TRUE)
[1] "anndata2cds"    "anndata2seurat" "loom2anndata"   "loom2sce"
[5] "sce2anndata"    "sce2loom"       "seurat2anndata" "seurat2sce"

convertFormat(obj, from='seurat', to='anndata', outFile='fibo.h5ad')

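  The conversion can go from an in-memory object to a file, or from one file to another. That said, the package is not especially friendly to use: the Seurat-to-anndata conversion above did not succeed, the function has no help page, and the GitHub README gives only a brief introduction.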
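  The post does not say why the conversion failed. One thing worth ruling out, offered as an assumption rather than a diagnosis, is the Python side: sceasy reaches Python through reticulate, so the anndata module has to be importable from the environment reticulate selects. The sketch below wraps the call with that check; seurat_to_h5ad is a hypothetical helper, not part of sceasy.

```r
# Hypothetical wrapper: check the Python dependency before converting.
# Requires the R packages reticulate and sceasy to be installed.
seurat_to_h5ad <- function(obj, out_file) {
  if (!reticulate::py_module_available("anndata")) {
    stop("Python module 'anndata' is not visible to reticulate; ",
         "install it or select the right environment, ",
         "e.g. with reticulate::use_condaenv()")
  }
  sceasy::convertFormat(obj, from = "seurat", to = "anndata", outFile = out_file)
}

# Usage (needs a Seurat object named obj):
# seurat_to_h5ad(obj, "fibo.h5ad")
```

  If the check passes and the conversion still fails, the error message from convertFormat is the next place to look.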
