Computing transposable element expression in single cells: using scTE

2022-04-04 SCU十一

1. About the software

scTE quantifies the expression of transposable elements (TEs) in each cell of a single-cell RNA-seq dataset. GitHub repository: https://github.com/JiekaiLab/scTE

2. Installation

git clone https://github.com/JiekaiLab/scTE.git

cd scTE

python setup.py install

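Once installation finishes, add the installed scTE executables to your PATH environment variable.

3. Prepare the input files

① The BAM file produced by the Cell Ranger alignment, possorted_genome_bam.bam.

② A GTF file with the gene annotation of the genome (genes, not TEs). Note: in my tests the last column (the attributes) had to contain an entry of the form gene_biotype "protein_coding"; as in these lines: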

chr19 . exon 18372 19826 . + . transcript_id "pal_pou00001.t1"; gene_id "pal_pou00001"; gene_name "pal_pou00001";gene_biotype "protein_codin

chr19 . CDS 18372 19679 . + 0 transcript_id "pal_pou00001.t1"; gene_id "pal_pou00001"; gene_name "pal_pou00001";gene_biotype "protein_codin

chr19 . exon 37139 37785 . - . transcript_id "pal_pou00002.t1"; gene_id "pal_pou00002"; gene_name "pal_pou00002";gene_biotype "protein_codin

chr19 . exon 37871 37965 . - . transcript_id "pal_pou00002.t1"; gene_id "pal_pou00002"; gene_name "pal_pou00002";gene_biotype "protein_codin

chr19 . exon 38290 38355 . - . transcript_id "pal_pou00002.t1"; gene_id "pal_pou00002"; gene_name "pal_pou00002";gene_biotype "protein_codin
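If your GTF lacks the gene_biotype attribute, one way to add it is a small script like the sketch below. It assumes a standard 9-column, tab-separated GTF and, following the example above, labels every record "protein_coding". The function names are illustrative only and are not part of scTE.

```python
import re


def ensure_gene_biotype(gtf_line: str, biotype: str = "protein_coding") -> str:
    """Return the GTF line with a gene_biotype attribute appended if it is missing."""
    line = gtf_line.rstrip("\n")
    # Leave blank lines and comment/header lines untouched.
    if not line or line.startswith("#"):
        return line
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated GTF columns, got {len(fields)}")
    attributes = fields[8].strip()
    # Nothing to do when the attribute is already present.
    if re.search(r'\bgene_biotype\s+"', attributes):
        return line
    if attributes and not attributes.endswith(";"):
        attributes += ";"
    fields[8] = f'{attributes} gene_biotype "{biotype}";'.strip()
    return "\t".join(fields)


def add_biotype_to_file(src_path: str, dst_path: str, biotype: str = "protein_coding") -> None:
    """Copy a GTF file, adding gene_biotype to every record that lacks it."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for raw in src:
            dst.write(ensure_gene_biotype(raw, biotype) + "\n")
```

Labelling everything as protein_coding is a shortcut taken from the example. If your annotation distinguishes other gene types, pass the appropriate value per record instead.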

③ A TE annotation of the genome as a 6-column BED file, for example:

chr19 5537 6168 LTR1 LTR1 -

chr19 7746 8276 LTR2 LTR2 -

chr19 9013 9530 LTR3 LTR3 -

chr19 10083 11948 LTR4 LTR4 +

chr19 12075 13197 LTR5 LTR5 +

chr19 13198 14515 LTR6 LTR6 +

chr19 14545 15114 LTR7 LTR7 +
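Before building the index it can help to sanity-check the BED file. The sketch below checks only what the example shows: six whitespace-separated columns, integer coordinates with end greater than start, and a strand of + or -. The strand restriction is an assumption based on the example, not a documented scTE requirement. It also collects the unique names in column 4, which could be written to a file like the all_LTR_name.txt read in step 5, assuming column 4 holds the feature names that scTE reports.

```python
def parse_te_bed(lines):
    """Validate 6-column BED records and return them as tuples."""
    records = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 6:
            raise ValueError(f"line {number}: expected 6 columns, got {len(fields)}")
        chrom, start, end, name, col5, strand = fields
        start, end = int(start), int(end)
        if start < 0 or end <= start:
            raise ValueError(f"line {number}: invalid interval {start}-{end}")
        # Assumption: only + and - strands, as in the example above.
        if strand not in {"+", "-"}:
            raise ValueError(f"line {number}: strand must be + or -, got {strand!r}")
        records.append((chrom, start, end, name, col5, strand))
    return records


def te_names(records):
    """Return the unique TE names (column 4) in first-seen order."""
    return list(dict.fromkeys(record[3] for record in records))
```

A typical call would be `te_names(parse_te_bed(open("pal_chr_LTR.bed")))`, writing one name per line to a text file afterwards.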

4. Running scTE

① Build the index. Note: every chromosome name in both files must be one of chr1 to chr50; anything else causes an error.

scTE_build -te pal_chr_LTR.bed -gene pal_chr_gene.gtf -o pal

pal_chr_LTR.bed is the TE annotation file, pal_chr_gene.gtf is the gene annotation file, and pal is the prefix used to name the output.

This produces a file named "pal.exclusive.idx", which is used in the next step.
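The chromosome naming constraint noted in step ① can be checked before running scTE_build. The sketch below lists every name in column 1 that is not chr1 to chr50. The pattern encodes the constraint as I observed it rather than a documented rule, so adjust it if your scTE version accepts other names. The function name is illustrative.

```python
import re

# Matches exactly chr1 ... chr50 (no leading zeros, no chrX/chrY/scaffolds).
CHROM_PATTERN = re.compile(r"^chr([1-9]|[1-4][0-9]|50)$")


def unsupported_chromosomes(lines):
    """Return the sorted chromosome names (column 1) that are not chr1..chr50."""
    bad = set()
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        chrom = line.split()[0]
        if not CHROM_PATTERN.match(chrom):
            bad.add(chrom)
    return sorted(bad)
```

Because only the first column is read, the same function works on both the BED and the GTF file, for example `unsupported_chromosomes(open("pal_chr_LTR.bed"))`. An empty list means every name passed the check.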

② Quantify TE expression

scTE -i possorted_genome_bam.bam -o out -x pal.exclusive.idx --hdf5 True -CB CB -UMI UB

When the run finishes you get out.h5ad, a single file that holds the expression of both genes and TEs.
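When several samples need processing, the two commands above can be assembled from a script. The sketch below only builds the argument lists, using exactly the flags shown in this post. The helper name and its parameters are hypothetical, and the index name assumes the "<prefix>.exclusive.idx" pattern seen above.

```python
import shlex


def scte_commands(te_bed, gene_gtf, prefix, bam, out_name):
    """Return the scTE_build and scTE argument lists used in this post."""
    build = ["scTE_build", "-te", te_bed, "-gene", gene_gtf, "-o", prefix]
    quantify = [
        "scTE", "-i", bam, "-o", out_name,
        # Index file produced by scTE_build in the previous step.
        "-x", f"{prefix}.exclusive.idx",
        "--hdf5", "True", "-CB", "CB", "-UMI", "UB",
    ]
    return build, quantify


def as_shell(command):
    """Render an argument list as a shell-quoted command line for logging."""
    return shlex.join(command)
```

Each list can be passed to `subprocess.run(command, check=True)`, so the script stops at the first failing step.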

5. Load the TE expression data into Seurat. The R package SeuratDisk converts the h5ad file to the h5seurat format, and its function LoadH5Seurat then loads the result.

library("Seurat") ## provides CreateSeuratObject

library("SeuratDisk")

Convert('out.h5ad', "h5seurat",overwrite = TRUE,assay = "RNA")

te <- LoadH5Seurat("out.h5seurat")

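ltr_name=read.table("all_LTR_name.txt",sep="\t",header=F)

ltr_name=as.matrix(ltr_name)

ltr_name=as.character(ltr_name) ## read in the file listing all of your TE names, so the TEs can be pulled out of the full matrix without the gene expression

te=te[ltr_name,] ## keep only the TE rows, drop the genes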

te=te@assays$RNA

te <- CreateSeuratObject(counts = te, project ="te", min.cells =3, min.features =200)

The final te object is an ordinary Seurat object, so the usual downstream analysis can continue from here.
