甲基化芯片分析(1)-conumee
2023-01-19 本文已影响0人
Insc
conumee是目前通过甲基化数据计算拷贝数变异(CNV)主流可视化方法之一,相较于ChAMP包内置的CNV可视化结果具有非常明显的优越性。但是目前对这个R包的教程较少,前期在参考官方指南的基础上,对R包的使用进行了一些探索。整理在此。
1.conumee包安装
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("conumee")
2.测试数据加载
library("minfi")
library("conumee")
#RGsetTCGA <- read.450k.exp(base = "~/conumee_analysis/jhu-usc.edu_BRCA.HumanMethylation450.Level_1.8.8.0") # adjust path
RGsetTCGA <- read.450k.url() # use default parameters for vignette examples
MsetTCGA <- preprocessIllumina(RGsetTCGA)
MsetTCGA
3.创建探针注释对象
data(exclude_regions)
data(detail_regions)
head(detail_regions, n = 2)
## GRanges object with 2 ranges and 5 metadata columns:
## seqnames ranges strand | name thick
## <Rle> <IRanges> <Rle> | <character> <IRanges>
## [1] chr11 69453874-69469242 + | CCND1 68961558-69961557
## [2] chr19 30300902-30315215 + | CCNE1 29808059-30808058
## score probes_gene probes_promoter
## <integer> <integer> <integer>
## [1] 70 38 32
## [2] 16 10 6
## -------
## seqinfo: 11 sequences from an unspecified genome; no seqlengths
anno <- CNV.create_anno(array_type = "450k", exclude_regions = exclude_regions, detail_regions = detail_regions)
## using genome annotations from UCSC
## getting 450k annotations
## - 470870 probes used
## importing regions to exclude from analysis
## importing regions for detailed analysis
## creating bins
## - 53891 bins created
## merging bins
## - 15820 bins remaining
anno
## CNV annotation object
## created : Tue Nov 1 17:18:13 2022
## @genome : 22 chromosomes
## @gap : 313 regions
## @probes : 470870 probes
## @exclude : 3 regions (overlapping 570 probes)
## @detail : 20 regions (overlapping 768 probes)
## @bins : 15820 bins (min/avg/max size: 50/168.5/5000kb, probes: 15/29.7/478)
4.整合信号值矩阵
data(tcgaBRCA.sentrix2name) # TCGA sample IDs are supplied with the conumee package
sampleNames(MsetTCGA) <- tcgaBRCA.sentrix2name[sampleNames(MsetTCGA)]
tcga.data <- CNV.load(MsetTCGA)
tcga.controls <- grep("-11A-", names(tcga.data))
names(tcga.data)
## [1] "TCGA-AR-A1AU-01A-11D-A12R-05" "TCGA-AR-A1AY-01A-21D-A12R-05"
tcga.data
## CNV data object
## created :
## @intensity : available (2 samples, 485512 probes)
minfi.data <- CNV.load(MsetEx)
minfi.controls <- pData(MsetEx)$status == "normal"
# controls.data <- CNV.load(MsetControls)
5.进行CNV分析
这部分是CNV分析的核心步骤,但是值得一提的是,目前CNV分析都是对单个样本进行展示
x <- CNV.fit(minfi.data["GroupB_1"], minfi.data[minfi.controls], anno)
# y <- CNV.fit(tcga.data["TCGA-AR-A1AU-01A-11D-A12R-05"], controls.data, anno) # CopyNumber450kData package is deprecated
# z <- CNV.fit(tcga.data["TCGA-AR-A1AY-01A-21D-A12R-05"], controls.data, anno)
y <- CNV.fit(tcga.data["TCGA-AR-A1AU-01A-11D-A12R-05"], minfi.data[minfi.controls], anno) # minfiData control samples
z <- CNV.fit(tcga.data["TCGA-AR-A1AY-01A-21D-A12R-05"], minfi.data[minfi.controls], anno)
x <- CNV.bin(x)
x <- CNV.detail(x)
x <- CNV.segment(x)
y <- CNV.segment(CNV.detail(CNV.bin(y)))
z <- CNV.segment(CNV.detail(CNV.bin(z)))
x
## CNV analysis object
## created : Tue Nov 1 17:18:41 2022
## @name : GroupB_1
## @anno : 22 chromosomes, 470870 probes, 15820 bins
## @fit : available (noise: 0.237)
## @bin : available (shift: -0.018)
## @detail : available (20 regions)
## @seg : available (48 segments)
6.结果可视化和文件保存
#pdf("~/conumee_analysis/CNVplots.pdf", height = 9, width = 18)
CNV.genomeplot(x)
###可以对染色体的CNV信息进行不同尺度的展示。
CNV.genomeplot(y)
CNV.genomeplot(y, chr = "chr6")
CNV.genomeplot(z)
CNV.genomeplot(z, chr = "chr10")
CNV.detailplot(z, name = "PTEN")
CNV.detailplot_wrap(z)
#dev.off()
head(CNV.write(y, what = "segments"), n = 5)
## ID chrom loc.start loc.end num.mark bstat
## 1 TCGA-AR-A1AU-01A-11D-A12R-05 chr1 635684 88100000 726 29.32946
## 2 TCGA-AR-A1AU-01A-11D-A12R-05 chr1 88675000 118475000 163 32.18477
## 3 TCGA-AR-A1AU-01A-11D-A12R-05 chr1 119000000 249195311 704 NA
## 4 TCGA-AR-A1AU-01A-11D-A12R-05 chr10 105000 135462374 840 NA
## 5 TCGA-AR-A1AU-01A-11D-A12R-05 chr11 130000 50516927 326 22.74454
## pval seg.mean seg.median
## 1 2.906752e-186 0.009 0.015
## 2 1.991610e-224 -0.264 -0.254
## 3 NA 0.097 0.098
## 4 NA -0.023 -0.021
## 5 5.343554e-112 0.141 0.136
#CNV.write(y, what = "segments", file = "~/conumee_analysis/TCGA-AR-A1AU.CNVsegments.seg") # adjust path
#CNV.write(y, what = "bins", file = "~/conumee_analysis/TCGA-AR-A1AU.CNVbins.igv")
#CNV.write(y, what = "detail", file = "~/conumee_analysis/TCGA-AR-A1AU.CNVdetail.txt")
#CNV.write(y, what = "probes", file = "~/conumee_analysis/TCGA-AR-A1AU.CNVprobes.igv")
plot1.png
plot2.png
参考资料
- The conumee vignette (bioconductor.org)
- Bibikova M, Barnes B, Tsan C, Ho V, Klotzle B, Le JM, Delano D, Zhang L, Schroth GP, Gunderson KL, Fan JB, Shen R: High density DNA methylation array with single CpG site resolution. Genomics 2011, 98:288–295.
- Sturm D, Witt H, Hovestadt V, Khuong-Quang D-A, Jones DT, Konermann C, Pfaff E, Toenjes M, Sill M, Bender S, Kool M, Zapatka M, Becker N, Zucknick M, Hielscher T, Liu XY, Fontebasso AM, Ryzhova M, Albrecht S, Jacob K, Wolter M, Ebinger M, Schuhmann MU, Meter T van, Fruehwald MC, Hauch H, Pekrun A, Radlwimmer B, Niehues T, Komorowski G von, et al.: Hotspot mutations in H3F3A and IDH1 define distinct epigenetic and biological subgroups of glioblastoma. Cancer Cell 2012, 22:425–437.
- Feber A, Guilhamon P, Lechner M, Fenton T, Wilson GA, Thirlwell C, Morris TJ, Flanagan AM, Teschendorff AE, Kelly JD, Beck S: Using high-density DNA methylation arrays to profile copy number alterations. Genome Biology 2014, 15.
- Aryee MJ, Jaffe AE, Hector C, Christine L, Feinberg AP, Hansen KD, Irizarry RA: Minfi: A flexible and comprehensive bioconductor package for the analysis of infinium DNA methylation microarrays. Bioinformatics 2014, 30:1363–1369.
- Olshen AB, Venkatraman E, Lucito R, Wigler M: Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 2004, 5:557–572.