甲基化芯片分析（1）-conumee

2023-01-19 本文已影响0人 Insc

conumee是目前通过甲基化数据计算拷贝数变异（CNV）主流可视化方法之一，相较于ChAMP包内置的CNV可视化结果具有非常明显的优越性。但是目前对这个R包的教程较少，前期在参考官方指南的基础上，对R包的使用进行了一些探索。整理在此。

1.conumee包安装

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("conumee")

2.测试数据加载

library("minfi")
library("conumee")
#RGsetTCGA <- read.450k.exp(base = "~/conumee_analysis/jhu-usc.edu_BRCA.HumanMethylation450.Level_1.8.8.0")  # adjust path
RGsetTCGA <- read.450k.url()  # use default parameters for vignette examples
MsetTCGA <- preprocessIllumina(RGsetTCGA)
MsetTCGA

3.创建探针注释对象

data(exclude_regions)
data(detail_regions)
head(detail_regions, n = 2)
## GRanges object with 2 ranges and 5 metadata columns:
##       seqnames            ranges strand |        name             thick
##          <Rle>         <IRanges>  <Rle> | <character>         <IRanges>
##   [1]    chr11 69453874-69469242      + |       CCND1 68961558-69961557
##   [2]    chr19 30300902-30315215      + |       CCNE1 29808059-30808058
##           score probes_gene probes_promoter
##       <integer>   <integer>       <integer>
##   [1]        70          38              32
##   [2]        16          10               6
##   -------
##   seqinfo: 11 sequences from an unspecified genome; no seqlengths

anno <- CNV.create_anno(array_type = "450k", exclude_regions = exclude_regions, detail_regions = detail_regions)
## using genome annotations from UCSC
## getting 450k annotations
##  - 470870 probes used
## importing regions to exclude from analysis
## importing regions for detailed analysis
## creating bins
##  - 53891 bins created
## merging bins
##  - 15820 bins remaining

anno
## CNV annotation object
##    created  : Tue Nov  1 17:18:13 2022
##   @genome   : 22 chromosomes
##   @gap      : 313 regions
##   @probes   : 470870 probes
##   @exclude  : 3 regions (overlapping 570 probes)
##   @detail   : 20 regions (overlapping 768 probes)
##   @bins     : 15820 bins (min/avg/max size: 50/168.5/5000kb, probes: 15/29.7/478)

4.整合信号值矩阵

data(tcgaBRCA.sentrix2name)  # TCGA sample IDs are supplied with the conumee package
sampleNames(MsetTCGA) <- tcgaBRCA.sentrix2name[sampleNames(MsetTCGA)]
tcga.data <- CNV.load(MsetTCGA)
tcga.controls <- grep("-11A-", names(tcga.data))
names(tcga.data)
## [1] "TCGA-AR-A1AU-01A-11D-A12R-05" "TCGA-AR-A1AY-01A-21D-A12R-05"
tcga.data
## CNV data object
##    created   : 
##   @intensity : available (2 samples, 485512 probes)

minfi.data <- CNV.load(MsetEx)
minfi.controls <- pData(MsetEx)$status == "normal"

# controls.data <- CNV.load(MsetControls)

5.进行CNV分析

这部分是CNV分析的核心步骤，但是值得一提的是，目前CNV分析都是对单个样本进行展示

x <- CNV.fit(minfi.data["GroupB_1"], minfi.data[minfi.controls], anno)
# y <- CNV.fit(tcga.data["TCGA-AR-A1AU-01A-11D-A12R-05"], controls.data, anno)  # CopyNumber450kData package is deprecated
# z <- CNV.fit(tcga.data["TCGA-AR-A1AY-01A-21D-A12R-05"], controls.data, anno)
y <- CNV.fit(tcga.data["TCGA-AR-A1AU-01A-11D-A12R-05"], minfi.data[minfi.controls], anno)  # minfiData control samples
z <- CNV.fit(tcga.data["TCGA-AR-A1AY-01A-21D-A12R-05"], minfi.data[minfi.controls], anno)

x <- CNV.bin(x)
x <- CNV.detail(x)
x <- CNV.segment(x)

y <- CNV.segment(CNV.detail(CNV.bin(y)))
z <- CNV.segment(CNV.detail(CNV.bin(z)))

x
## CNV analysis object
##    created   : Tue Nov  1 17:18:41 2022
##   @name      : GroupB_1
##   @anno      : 22 chromosomes, 470870 probes, 15820 bins
##   @fit       : available (noise: 0.237)
##   @bin       : available (shift: -0.018)
##   @detail    : available (20 regions)
##   @seg       : available (48 segments)

6.结果可视化和文件保存

#pdf("~/conumee_analysis/CNVplots.pdf", height = 9, width = 18)
CNV.genomeplot(x)
###可以对染色体的CNV信息进行不同尺度的展示。
CNV.genomeplot(y)
CNV.genomeplot(y, chr = "chr6")

CNV.genomeplot(z)
CNV.genomeplot(z, chr = "chr10")
CNV.detailplot(z, name = "PTEN")
CNV.detailplot_wrap(z)
#dev.off()

head(CNV.write(y, what = "segments"), n = 5)
##                             ID chrom loc.start   loc.end num.mark    bstat
## 1 TCGA-AR-A1AU-01A-11D-A12R-05  chr1    635684  88100000      726 29.32946
## 2 TCGA-AR-A1AU-01A-11D-A12R-05  chr1  88675000 118475000      163 32.18477
## 3 TCGA-AR-A1AU-01A-11D-A12R-05  chr1 119000000 249195311      704       NA
## 4 TCGA-AR-A1AU-01A-11D-A12R-05 chr10    105000 135462374      840       NA
## 5 TCGA-AR-A1AU-01A-11D-A12R-05 chr11    130000  50516927      326 22.74454
##            pval seg.mean seg.median
## 1 2.906752e-186    0.009      0.015
## 2 1.991610e-224   -0.264     -0.254
## 3            NA    0.097      0.098
## 4            NA   -0.023     -0.021
## 5 5.343554e-112    0.141      0.136

#CNV.write(y, what = "segments", file = "~/conumee_analysis/TCGA-AR-A1AU.CNVsegments.seg")  # adjust path
#CNV.write(y, what = "bins", file = "~/conumee_analysis/TCGA-AR-A1AU.CNVbins.igv")
#CNV.write(y, what = "detail", file = "~/conumee_analysis/TCGA-AR-A1AU.CNVdetail.txt")
#CNV.write(y, what = "probes", file = "~/conumee_analysis/TCGA-AR-A1AU.CNVprobes.igv")

plot1.png

plot2.png

参考资料

The conumee vignette (bioconductor.org)
Bibikova M, Barnes B, Tsan C, Ho V, Klotzle B, Le JM, Delano D, Zhang L, Schroth GP, Gunderson KL, Fan JB, Shen R: High density DNA methylation array with single CpG site resolution. Genomics 2011, 98:288–295.
Sturm D, Witt H, Hovestadt V, Khuong-Quang D-A, Jones DT, Konermann C, Pfaff E, Toenjes M, Sill M, Bender S, Kool M, Zapatka M, Becker N, Zucknick M, Hielscher T, Liu XY, Fontebasso AM, Ryzhova M, Albrecht S, Jacob K, Wolter M, Ebinger M, Schuhmann MU, Meter T van, Fruehwald MC, Hauch H, Pekrun A, Radlwimmer B, Niehues T, Komorowski G von, et al.: Hotspot mutations in H3F3A and IDH1 define distinct epigenetic and biological subgroups of glioblastoma. Cancer Cell 2012, 22:425–437.
Feber A, Guilhamon P, Lechner M, Fenton T, Wilson GA, Thirlwell C, Morris TJ, Flanagan AM, Teschendorff AE, Kelly JD, Beck S: Using high-density DNA methylation arrays to profile copy number alterations. Genome Biology 2014, 15.
Aryee MJ, Jaffe AE, Hector C, Christine L, Feinberg AP, Hansen KD, Irizarry RA: Minfi: A flexible and comprehensive bioconductor package for the analysis of infinium DNA methylation microarrays. Bioinformatics 2014, 30:1363–1369.
Olshen AB, Venkatraman E, Lucito R, Wigler M: Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 2004, 5:557–572.

甲基化芯片分析（1）-conumee

1.conumee包安装

2.测试数据加载

3.创建探针注释对象

4.整合信号值矩阵

5.进行CNV分析

6.结果可视化和文件保存

参考资料

猜你喜欢

热点阅读