甲基化

ChAMP 甲基化芯片数据分析(450 和 EPIC)

2020-03-02  本文已影响0人  我是爱哭虫小鱼

转载:https://biozx.top/ChAMP.html

ChAMP是用来分析illuminate甲基化数据的包,包括 HumanMethylation450 和 EPIC。

参考:
http://www.bioconductor.org/packages/release/bioc/vignettes/ChAMP/inst/doc/ChAMP.html

甲基化芯片简介

450K和850K采用了两种探针Infinium Ⅰ 和Infinium Ⅱ对甲基化进行测定,Infinium I采用了两种bead(甲基化M和非甲基化U,如图显示),而II只有一种bead(即甲基化和非甲基化在一起),这也导致了它们在后续荧光探测的不同,450K采用了两种荧光探测信号(红光和绿光)

[站外图片上传中...(image-c9437e-1583115824117)]

安装ChAMP

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("ChAMP")

#或者直接安装所有依赖包
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install(c("minfi","ChAMPdata","Illumina450ProbeVariants.db","sva","IlluminaHumanMethylation450kmanifest","limma","RPMM","DNAcopy","preprocessCore","impute","marray","wateRmelon","goseq","plyr","GenomicRanges","RefFreeEWAS","qvalue","isva","doParallel","bumphunter","quadprog","shiny","shinythemes","plotly","RColorBrewer","DMRcate","dendextend","IlluminaHumanMethylationEPICmanifest","FEM","matrixStats","missMethyl","combinat"))


#如果报错,试用
BiocManager::install("YourErrorPackage")

如果ChAMP 版本大于 2.8.3, 必须安装最新的 ChAMPdata package(version 2.8.1)。

如果版本R < 3.5.0,请用以下安装。

source("https://bioconductor.org/biocLite.R")
biocLite("ChAMP")

用自带的测试数据测试

ChAMP包自带两个测试数据集,HumanMethylation450 data (.idat) and simulated EPIC data。450k lung tumor data set包含8个样本(4个肿瘤样本和4个对照)。EPIC数据集包括16个样本,但是都是由一个样本修改DMP 和 DMR而来。

library("ChAMP")

#450测试数据
testDir=system.file("extdata",package="ChAMPdata")
myLoad <- champ.load(testDir,arraytype="450K")

#EPIC测试数据
data(EPICSimData)

ChAMP流程

image

一步跑完整个流程

champ.process(directory = testDir)

分步跑流程

450流程
myLoad <- cham.load(testDir)
# Or you may separate about code as champ.import(testDir) + champ.filter()
CpG.GUI()
champ.QC() # Alternatively: QC.GUI()
myNorm <- champ.norm()
champ.SVD()
# If Batch detected, run champ.runCombat() here.
myDMP <- champ.DMP()
DMP.GUI()
myDMR <- champ.DMR()
DMR.GUI()
myBlock <- champ.Block()
Block.GUI()
myGSEA <- champ.GSEA()
myEpiMod <- champ.EpiMod()
myCNA <- champ.CNA()

# If DataSet is Blood samples, run champ.refbase() here.
myRefbase <- champ.refbase()
EPIC流程

450K array and EPIC array的大多数function是一样的,不同的是EPIC要设置参数arraytype="EPIC"

# myLoad <- champ.load(directory = testDir,arraytype="EPIC")
# We simulated EPIC data from beta value instead of .idat file,
# but user may use above code to read .idat files directly.
# Here we we started with myLoad.

data(EPICSimData)
CpG.GUI(arraytype="EPIC")
champ.QC() # Alternatively QC.GUI(arraytype="EPIC")
myNorm <- champ.norm(arraytype="EPIC")
champ.SVD()
# If Batch detected, run champ.runCombat() here.This data is not suitable.
myDMP <- champ.DMP(arraytype="EPIC")
DMP.GUI()
myDMR <- champ.DMR()
DMR.GUI()
myDMR <- champ.DMR(arraytype="EPIC")
DMR.GUI(arraytype="EPIC")
myBlock <- champ.Block(arraytype="EPIC")
Block.GUI(arraytype="EPIC") # For this simulation data, not Differential Methylation Block is detected.
myGSEA <- champ.GSEA(arraytype="EPIC")
myEpiMod <- champ.EpiMod(arraytype="EPIC")

# champ.CNA(arraytype="EPIC")
# champ.CNA() function call for intensity data, which is not included in our Simulation data.

注意:8G内存的电脑最多可以跑200个样本

ChAMP Pipelines各步骤详细介绍

Loading Data

myLoad <- champ.load(testDir)
#或者从dataset对象导入数据
data(testDataSet)

#.idat files 为原始芯片文件,包括pd file (Sample_Sheet.csv)文件(表型,编号等)
myLoad$pd

##    Sample_Name Sample_Plate Sample_Group Pool_ID Project Sample_Well  Array
## C1          C1           NA            C      NA      NA         E09 R03C02
## C2          C2           NA            C      NA      NA         G09 R05C02
## C3          C3           NA            C      NA      NA         E02 R01C01
## C4          C4           NA            C      NA      NA         F02 R02C01
## T1          T1           NA            T      NA      NA         B09 R06C01
## T2          T2           NA            T      NA      NA         C09 R01C02
## T3          T3           NA            T      NA      NA         E08 R01C01
## T4          T4           NA            T      NA      NA         C09 R01C02
##         Slide                   Basename                  filenames
## C1 7990895118 Test450K/7990895118_R03C02 Test450K/7990895118_R03C02
## C2 7990895118 Test450K/7990895118_R05C02 Test450K/7990895118_R05C02
## C3 9247377086 Test450K/9247377086_R01C01 Test450K/9247377086_R01C01
## C4 9247377086 Test450K/9247377086_R02C01 Test450K/9247377086_R02C01
## T1 7766130112 Test450K/7766130112_R06C01 Test450K/7766130112_R06C01
## T2 7766130112 Test450K/7766130112_R01C02 Test450K/7766130112_R01C02
## T3 7990895118 Test450K/7990895118_R01C01 Test450K/7990895118_R01C01
## T4 7990895118 Test450K/7990895118_R01C02 Test450K/7990895118_R01C02

Filtering Data

新版本的ChAMP包中champ.load()函数已经包含了filter功能
ChAMP提供champ.filter()函数过滤低质量的探针数据等。

myimport <- champ.import(directory=system.file("extdata",package="ChAMPdata"))
myfilter <- champ.filter(beta=myImport$beta,pd=myImport$pd,detP=myImport$detP,beadcount=myImport$beadcount)
#参数介绍:
#beta:default = myImport$beta
#autoimpute参数:可以填补或保留由过滤导致的NA空缺值


#查看甲基化位点分布情况
CpG.GUI(CpG=rownames(myLoad$beta),arraytype="450K")

filter的过滤步骤如下

  1. First filter is for probes with detection p-value (default > 0.01)
  2. ChAMP will filter out probes with <3 beads in at least 5% of samples per probe.(参数filterBeads可修改 )
  3. ChAMP will by default filter out all non-CpG probes contained in your dataset.
  4. by default ChAMP will filter all SNP-related probes.
  5. by default setting, ChAMP will filter all multi-hit probes.
  6. ChAMP will filter out all probes located in chromosome X and Y.
image

Quality Control

champ.QC()函数会画3种图:mdsplot,densityPlot,dendrogram。使用QC.GUI()能画更多图,但耗内存。

image
image
image

Normalization

因为 Illumina beadarrays探针有两种设计type-I and type-II,两种探针化学反应不同,设计也不同,导致分布区域也不同。the type-II 分布 exhibit a reduced dynamic range,需要矫正。

champ.norm()提供四种方法做type-II Normalization。"PBC","BMIQ","SWAN" and "FunctionalNormalize"

myNorm <- champ.norm(beta=myLoad$beta,arraytype="450K",cores=5)

#标准化后可以再作图看看差异
QC.GUI(myNorm,arraytype="450K")

save(myNorm,file="myNorm.rda")

SVD Plot

SVD (singular value decomposition method)用来用于评估数据集中变量的主要成分。这些显著性位点可能与我们感兴趣的生物学现象,也可能与技术相关,如批次效应或群体效应。如果存在批次效应,就进行批次效应的矫正,矫正完之后可以再看看SVD plot。
champ.SVD()分析时会把协变量打印在屏幕上,结果是热图,保存为SVDsummary.pdf文件。黑色表示最显著的p值。

champ.SVD(beta=myNorm,pd=myLoad$pd)
image

差异甲基化分析(DMP & DMR & DMB)

差异分析是多数研究都要分析的,这里包括三种方法:DMP,DMR,DMB。DMP代表找出Differential Methylation Probe(差异化CpG位点),DMR代表找出Differential Methylation Region(差异化CpG区域),Block代表Differential Methylation Block(更大范围的差异化region区域

简单来说,DMP是找出一个一个的差异甲基化CpG位点,DMR就是一个连续不断都比较长的差异片段,科学家们觉得,这样的连续差异片段,对于基因的影响会更加明显,只找这样的片段,可以使得计算生物学的打击精度更为准确,也可以让最终找出来的结论数据更少,便于实验人员筛选。另外一个类似的东西就是DMB,那个东西出现的原因是,有的科学家觉得,DMR这样的区域还不够显著,DNA上的甲基化出现变化,可能是绵延几千位点的!而且只会在基因以外的区域,但是这些基因以外的区域发生变化,却会导致基因的表达发生变化。

DMP,DMR,DMB的结果都是基于的shiny的交互页面,左栏上方是 P-value 和 abs(logFC) ,可以选择想看的值,然后点submit, 右栏可以生成差异甲基化表,热图,feature&cgi, 左栏下方还有基因,CpG按钮,选择你想看的结果,submit, 右栏就会生成相应gene,CpG结果

#DMP差异化CpG位点
myDMP <- champ.DMP(beta = myNorm,pheno=myLoad$pd$Sample_Group)
save(myDMP,file="myDMP.rda")
DMP.GUI(DMP=myDMP[[1]],beta=myNorm,pheno=myLoad$pd$Sample_Group)

#DMR差异化CpG区域
myDMR <- champ.DMR(beta=myNorm,pheno=myLoad$pd$Sample_Group,method="Bumphunter")
save(myDMR,file="myDMR.rda")
DMR.GUI(DMR=myDMR)

#DMB
myBlock <- champ.Block(beta=myNorm,pheno=myLoad$pd$Sample_Group,arraytype="450K")
Block.GUI(Block=myBlock,beta=myNorm,pheno=myLoad$pd$Sample_Group,runDMP=TRUE,compare.group=NULL,arraytype="450K")

[站外图片上传中...(image-12947-1583115824117)]

Gene Set富集分析和网络分析

差异甲基化分析后,你可能想知道DMP,DMR中涉及到的基因是否可以富集到某个生物功能或通路,GSEA(Gene Set Enrichment Analysis)和EpiMod(Differential Methylated Interaction Hotspots)提供了可以寻找作用通路网络中的疾病关联小网络的功能。

#富集分析
myGSEA <- champ.GSEA(beta=myNorm,DMP=myDMP[[1]], DMR=myDMR, arraytype="450K",adjPval=0.05, method="fisher")
# myDMP and myDMR could (not must) be used directly.
head(myGSEA$DMP)
save(myGSEA,file="myGSEA.rda")


myEpiMod <- champ.EpiMod(beta=myNorm,pheno=myLoad$pd$Sample_Group)

save(myEpiMod,file="myEpiMod.rda")

[站外图片上传中...(image-45d3fe-1583115824117)]

拷贝数变异分析(CNA)

拷贝数变异就是有些基因片段被复制的此处过多或者过少,从而导致某些疾病。

myCNA <- champ.CNA(intensity=myLoad$intensity,pheno=myLoad$pd$Sample_Group)
save(myCNA,file=myCNA)

[站外图片上传中...(image-7a4570-1583115824117)]

上一篇下一篇

猜你喜欢

热点阅读