inferCNV: Copy Number Variation in Single-Cell RNA Data

2022-12-30, 生命数据科学

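InferCNV is used to explore tumor single-cell RNA-Seq data for evidence of large-scale chromosomal copy number alterations, such as gains or deletions of entire chromosomes or of large chromosomal segments. It does this by examining the expression intensity of genes across positions of the genome in comparison with a set of reference "normal" cells, or with the average over all cells if no reference is given. A heatmap is generated that shows the relative expression intensity along each chromosome, and it is often easy to see which chromosomal regions are over- or under-represented compared with normal cells (or with the average, if no reference normal cells are provided).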
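To make the idea concrete, here is a minimal base-R sketch of the underlying principle. It is not inferCNV's actual implementation: expression is log-transformed, the per-gene mean of the reference cells is subtracted, and the result is smoothed with a moving average over genes sorted by genomic position. The toy matrix, the cell names, and the window size of 5 are all made up for illustration.

```r
# Toy example: 20 genes (assumed already sorted by genomic position) x 4 cells.
set.seed(1)
counts <- matrix(rpois(80, lambda = 10), nrow = 20,
                 dimnames = list(paste0("gene", 1:20),
                                 c("normal1", "normal2", "tumor1", "tumor2")))
# Simulate an amplified region in the tumor cells (genes 6 to 15).
counts[6:15, c("tumor1", "tumor2")] <- counts[6:15, c("tumor1", "tumor2")] * 3

ref_cells <- c("normal1", "normal2")   # hypothetical reference cells
log_expr  <- log2(counts + 1)

# Relative expression: subtract the per-gene mean of the reference cells.
rel_expr <- log_expr - rowMeans(log_expr[, ref_cells])

# Smooth along the genome with a centered moving average (window of 5 genes).
smooth_along_genome <- function(x, window = 5) {
  as.numeric(stats::filter(x, rep(1 / window, window), sides = 2))
}
smoothed <- apply(rel_expr, 2, smooth_along_genome)
rownames(smoothed) <- rownames(rel_expr)

# Mean smoothed signal inside the simulated amplified region, per cell.
region_means <- colMeans(smoothed[8:13, ])
print(round(region_means, 2))
```

In this toy setting the tumor cells should show a clearly positive signal in the amplified region, while the reference cells stay near zero. The real package adds many more steps (normalization, denoising, HMM prediction).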

1. Installation

Install in this order:

JAGS: https://sourceforge.net/projects/mcmc-jags/ (on Windows, installing to the default location on the C: drive is recommended)
R packages:

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") 
BiocManager::install("rjags")
BiocManager::install("infercnv")
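
After installation, a quick check like the following can confirm that both packages are visible to R. The helper name `check_installed` is hypothetical; it only wraps the base R function `requireNamespace`.

```r
# Hypothetical helper: report which of the given packages can be loaded.
check_installed <- function(pkgs) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

status <- check_installed(c("rjags", "infercnv"))
print(status)
if (!all(status)) {
  message("Missing packages: ", paste(names(status)[!status], collapse = ", "))
}
```

If `rjags` is reported as missing even though the R package was installed, the JAGS program itself is a likely cause, which is why it is installed first.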

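2. Preparing the input files

All files used below are the example files shipped with inferCNV.

raw_counts_matrix: the single-cell expression count matrix (genes as rows, cells as columns)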

> raw_counts_matrix = read.table(system.file("extdata", "oligodendroglioma_expression_downsampled.counts.matrix.gz", package = "infercnv"))
> raw_counts_matrix[1:5,1:5]
       MGH54_P16_F12 MGH54_P12_C10 MGH54_P11_C11 MGH54_P15_D06 MGH54_P16_A03
A2M                0         0.000         0.000         0.000        0.0000
A4GALT             0         0.000         0.000         0.000        0.0000
AAAS               0        37.008        30.935        21.011        0.0000
AACS               0         0.000         0.000         0.000        1.8049
AADAT              0         0.000         0.000         0.000        0.0000

annotations_file: the annotation for each cell, which can be a status such as disease/normal or a specific cell type name

> annotations_file = read.table(system.file("extdata", "oligodendroglioma_annotations_downsampled.txt", package = "infercnv"),sep = "\t")
> annotations_file[1:5,]
            V1                   V2
1 MGH54_P2_C12 Microglia/Macrophage
2 MGH36_P6_F03 Microglia/Macrophage
3 MGH53_P4_H08 Microglia/Macrophage
4 MGH53_P2_E09 Microglia/Macrophage
5 MGH36_P5_E12 Microglia/Macrophage
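
Because the annotation file refers to cells by the column names of the count matrix, it can help to check the two inputs against each other before running the analysis. The sketch below uses small made-up objects in place of the real files; the helper `check_annotations` is hypothetical and not part of infercnv.

```r
# Toy stand-ins for the objects read above (cell names are made up).
toy_counts <- matrix(0, nrow = 3, ncol = 3,
                     dimnames = list(c("A2M", "AAAS", "AACS"),
                                     c("cell_1", "cell_2", "cell_3")))
toy_annot <- data.frame(V1 = c("cell_1", "cell_2", "cell_4"),
                        V2 = c("Microglia/Macrophage", "malignant", "malignant"),
                        stringsAsFactors = FALSE)

# Hypothetical helper: compare annotated cell names with matrix columns.
check_annotations <- function(counts, annot) {
  list(
    missing_in_matrix = setdiff(annot$V1, colnames(counts)),
    missing_in_annot  = setdiff(colnames(counts), annot$V1),
    cells_per_group   = table(annot$V2)
  )
}

res <- check_annotations(toy_counts, toy_annot)
str(res)
```

With the real data, pass `raw_counts_matrix` and `annotations_file` instead of the toy objects; both mismatch lists should come back empty.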

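gene_order_file: the genomic position of each gene. This file rarely changes; it can be generated from a downloaded annotation file as described below, or requested from the author by messaging the official account.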

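3.1 Building the file yourself
- 1. Download the annotation file (http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_35/gencode.v35.annotation.gtf.gz) and decompress it, since the command below expects the uncompressed gencode.v35.annotation.gtf
- 2. Download the script (https://github.com/broadinstitute/infercnv/raw/master/scripts/gtf_to_position_file.py)
- 3. Run the script in the directory that contains both files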
```
python gtf_to_position_file.py --attribute_name "gene_name"  gencode.v35.annotation.gtf gene_pos.txt
```

gene_pos.txt is the gene order file required as input in step 3.

3.2 Example data (this file differs from the one produced in 3.1; to use the 3.1 result instead, read it in and assign it to gene_order_file)

> gene_order_file = read.table(system.file("extdata", "gencode_downsampled.EXAMPLE_ONLY_DONT_REUSE.txt", package = "infercnv"),sep = "\t")
> gene_order_file[1:5,]
         V1   V2      V3      V4
1    WASH7P chr1   14363   29806
2 LINC00115 chr1  761586  762902
3     NOC2L chr1  879584  894689
4   MIR200A chr1 1103243 1103332
5      SDF4 chr1 1152288 1167411
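
A position file produced in 3.1 can be read the same way. The sketch below writes a tiny made-up file to a temporary path so that it runs on its own; with real data, replace the path with `gene_pos.txt`. The column names assigned here are labels of my own choosing, since the file itself has no header.

```r
# Write a tiny stand-in position file (tab-separated, no header).
pos_path <- tempfile(fileext = ".txt")
writeLines(c("WASH7P\tchr1\t14363\t29806",
             "NOC2L\tchr1\t879584\t894689",
             "SDF4\tchr1\t1152288\t1167411"), pos_path)

gene_order_file <- read.table(pos_path, sep = "\t", header = FALSE,
                              stringsAsFactors = FALSE)

# The file is expected to have exactly four columns.
stopifnot(ncol(gene_order_file) == 4)
colnames(gene_order_file) <- c("gene", "chr", "start", "end")  # labels for readability only

# Basic sanity checks: unique gene names and start <= end.
stopifnot(!anyDuplicated(gene_order_file$gene),
          all(gene_order_file$start <= gene_order_file$end))
head(gene_order_file)
```

Duplicate gene names are worth checking for in particular, because several GTF records can share one gene name.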

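3. Analysis workflow

inferCNV can be run in several ways, from simple to complex. This post covers the simple one and reproduces the code from the official tutorial.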

library(infercnv)
raw_counts_matrix = read.table(system.file("extdata", "oligodendroglioma_expression_downsampled.counts.matrix.gz", package = "infercnv"))
annotations_file = read.table(system.file("extdata", "oligodendroglioma_annotations_downsampled.txt", package = "infercnv"),sep = "\t")
gene_order_file = read.table(system.file("extdata", "gencode_downsampled.EXAMPLE_ONLY_DONT_REUSE.txt", package = "infercnv"),sep = "\t")

infercnv_obj = CreateInfercnvObject(raw_counts_matrix=system.file("extdata", "oligodendroglioma_expression_downsampled.counts.matrix.gz", package = "infercnv"),
                                    annotations_file=system.file("extdata", "oligodendroglioma_annotations_downsampled.txt", package = "infercnv"),
                                    delim="\t",
                                    gene_order_file=system.file("extdata", "gencode_downsampled.EXAMPLE_ONLY_DONT_REUSE.txt", package = "infercnv"),
                                    ref_group_names=c("Microglia/Macrophage","Oligodendrocytes (non-malignant)")) 

infercnv_obj = infercnv::run(infercnv_obj,
                             cutoff=1, # cutoff=1 works well for Smart-seq2, and cutoff=0.1 works well for 10x Genomics
                             out_dir="./output", 
                             cluster_by_groups=TRUE, 
                             denoise=TRUE,
                             HMM=TRUE)

The results are written to the output folder. ref_group_names lists the annotation labels of the normal (reference) cells in the single-cell dataset.
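
The `cutoff` argument acts as a minimum average count per gene. The base-R sketch below imitates such a filter on a made-up matrix to show why a lower threshold suits sparser data. It illustrates the idea only and is not the package's code: the function name `filter_genes_by_cutoff` is hypothetical, and averaging over all cells is a simplifying assumption.

```r
# Made-up count matrix: 4 genes x 3 cells.
toy_counts <- matrix(c(0, 0, 1,
                       5, 8, 2,
                       0, 1, 0,
                       30, 12, 21),
                     nrow = 4, byrow = TRUE,
                     dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                                     c("cell_1", "cell_2", "cell_3")))

# Hypothetical helper: keep genes whose mean count reaches the cutoff.
filter_genes_by_cutoff <- function(counts, cutoff) {
  counts[rowMeans(counts) >= cutoff, , drop = FALSE]
}

kept_strict  <- rownames(filter_genes_by_cutoff(toy_counts, cutoff = 1))    # Smart-seq2-style threshold
kept_lenient <- rownames(filter_genes_by_cutoff(toy_counts, cutoff = 0.1))  # 10x-style threshold
print(kept_strict)
print(kept_lenient)
```

The stricter threshold drops the two genes that are barely detected, while the lenient one keeps all four.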

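4. Interpreting the results

Main result files:

infercnv.preliminary.png: the preliminary inferCNV view (before denoising or HMM prediction)
infercnv.png: the final heatmap generated by inferCNV, with denoising applied
infercnv.references.txt: matrix data values for the "normal" cells
infercnv.observations.txt: matrix data values for the tumor cells
infercnv.observation_groupings.txt: group memberships of the clustered tumor cells
infercnv.observations_dendrogram.txt: Newick-format dendrogram of the tumor cells, matching the heatmap
The file to focus on, and the one suitable for a publication figure, is infercnv.png.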
[Figure: inferCNV_result]
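In this heatmap, the x-axis is genomic position (genes ordered along the chromosomes) and the y-axis is the individual cells. If the inferCNV result is reasonably reliable, the upper panel (reference cells) is mostly white, while in the lower panel (tumor cells) blue bands mark regions of low expression (inferred loss) and red bands mark regions of high expression (inferred gain). The other files can be used for further analysis as needed.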
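
For follow-up analysis, the matrix files can be loaded back into R. The sketch below assumes that `infercnv.observations.txt` is a whitespace-delimited table with genes as rows and cells as columns, and that values are centered around 1 (no change); both are assumptions to verify against your own output. To stay runnable without a finished analysis, it first writes a tiny made-up file. The per-cell score (mean squared deviation from 1) is a simple heuristic chosen for illustration, not something inferCNV reports.

```r
# Stand-in for ./output/infercnv.observations.txt (values are made up).
obs_path <- tempfile(fileext = ".txt")
toy_obs <- data.frame(tumor_1 = c(1.00, 1.20, 1.20),
                      tumor_2 = c(1.00, 1.00, 0.80),
                      row.names = c("geneA", "geneB", "geneC"))
write.table(toy_obs, obs_path, quote = TRUE, sep = " ")

# The header has one field fewer than the data rows, so read.table
# uses the first column as row names automatically.
obs <- as.matrix(read.table(obs_path, header = TRUE, check.names = FALSE))

# Heuristic per-cell CNV score: mean squared deviation from the neutral value 1.
cnv_score <- colMeans((obs - 1)^2)
print(sort(cnv_score, decreasing = TRUE))
```

Cells with a higher score deviate more from the neutral level across the genome; the same calculation on `infercnv.references.txt` gives a baseline for comparison.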
