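WGBS downstream processing: the methylKit package
1.1 First, the methylation call files produced by upstream processing have to be converted into a format that the R package methylKit can read
Target format: the authors of the article wrote three commands for the conversion, one for each cytosine context.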
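This step is run on the server.
1.2 The next step is to import the data into R
This takes two steps.
#Paths
First, because this experiment has six groups, the six groups are collected into one list (each element of the list is the location of the corresponding file).
#Read files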
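The treatment argument records which group each sample belongs to: it is a vector of 0s and 1s, one entry per sample, with 0 marking control samples and 1 marking treated samples.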
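The two steps above (#Paths, then #Read files) can be sketched roughly as follows. This is not the original code from the article: the directory, file names, sample IDs, the 3 control + 3 treated split and the genome build are all placeholders I made up. methRead is the reader function in current methylKit releases (older releases called it read), and it is only called here if the package is installed and the files actually exist.

```r
# Hypothetical layout: six methylation call files, three control and three treated.
# Replace the directory, file names and IDs with the real ones.
data.dir   <- "path/to/methylation_calls"
sample.ids <- c("ctrl1", "ctrl2", "ctrl3", "trt1", "trt2", "trt3")

# Paths: one list element per sample, in the same order as sample.ids
file.list <- as.list(file.path(data.dir, paste0(sample.ids, "_CpG.txt")))

# treatment: group membership of each sample (0 = control, 1 = treated)
treatment <- c(0, 0, 0, 1, 1, 1)

# Read files: only attempted when methylKit is installed and the files exist
if (requireNamespace("methylKit", quietly = TRUE) &&
    all(file.exists(unlist(file.list)))) {
  mobj.cpg <- methylKit::methRead(
    location  = file.list,
    sample.id = as.list(sample.ids),
    assembly  = "hg38",        # assumed genome build, change as needed
    treatment = treatment,
    context   = "CpG"
  )
}
```

The important point is that file.list, sample.ids and treatment all have the same length and the same order.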
1.3 Remove low-count and high-count bases
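To see what this filtering does, here is a small base R sketch on made-up data. It is only an illustration of the idea, not methylKit's implementation; in methylKit the step is done with filterByCoverage, and the cutoffs below (at least 10 reads, at most the 99th percentile) are values I picked for the toy example.

```r
set.seed(1)

# Toy per-base table for one sample (made-up numbers)
calls <- data.frame(
  chr      = "chr1",
  start    = seq(100, by = 10, length.out = 1000),
  coverage = c(rpois(990, lambda = 20), rep(500L, 10))  # 10 suspiciously deep bases
)

lo.count <- 10                               # minimum number of reads per base
hi.cut   <- quantile(calls$coverage, 0.99)   # upper percentile cutoff

# Keep bases that are neither too shallow (unreliable) nor too deep (possible PCR bias)
filtered <- calls[calls$coverage >= lo.count & calls$coverage <= hi.cut, ]

nrow(calls) - nrow(filtered)   # number of bases removed
```

Low-coverage bases give noisy methylation percentages, while extremely high coverage often points to PCR duplication, which is why both tails are trimmed.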
1.4 The getMethylationStats and getCoverageStats functions
The getMethylationStats and getCoverageStats functions can print the percentage methylation distribution and read coverage per base statistics to the R console, as well as generate histograms to visualize these statistics.
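The plots produced by these two functions make this much easier to understand.
[Histogram of % CpG methylation] A typical histogram has a peak at each end and very little in the middle, meaning that a given site is either methylated or unmethylated.
Before low/high-count filtering | After low/high-count filtering
[Histogram of read coverage per CpG base] If PCR duplication has biased the data, a second peak appears on the right-hand side.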
1.5 Before comparing the methylation profiles between samples, the unite function is used to merge and retain only bases that are covered in all samples.
The unite function extracts the bases that are covered in all samples.
meth.cpg <- unite(filt.mobj.cpg) # CpG sites covered in all samples.
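Conceptually, uniting the samples is an inner join on genomic position. A minimal base R sketch with three made-up samples (the column names and values are invented for illustration, and this is not how methylKit implements it internally):

```r
# Three toy samples; each row is one CpG with its coverage and methylated-C count
s1 <- data.frame(chr = "chr1", start = c(100, 200, 300),
                 coverage1 = c(10, 12, 20), numCs1 = c(8, 0, 5))
s2 <- data.frame(chr = "chr1", start = c(100, 300, 400),
                 coverage2 = c(15, 11, 30), numCs2 = c(12, 2, 30))
s3 <- data.frame(chr = "chr1", start = c(100, 200, 300, 400),
                 coverage3 = c(9, 14, 18, 25), numCs3 = c(7, 1, 4, 20))

# Inner join on (chr, start): only positions present in every sample survive
united <- Reduce(function(x, y) merge(x, y, by = c("chr", "start")),
                 list(s1, s2, s3))

united$start   # positions 100 and 300 are the only ones shared by all three
```

Position 200 is missing from the second sample and position 400 from the first, so both drop out.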
1.6 The getCorrelation function prints the Pearson matrix of correlation coefficients
This computes the pairwise correlation between samples.
1.6 The clusterSamples function uses the hclust function to perform hierarchical clustering of samples based on the similarity of their methylation profiles and produces a dendrogram.
Samples are clustered by similarity.
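The correlation and clustering steps can be tried out in base R on a toy matrix. The data below are simulated, and the choice of 1 - Pearson correlation as the distance and Ward linkage is my assumption for the sketch rather than something taken from the article.

```r
set.seed(42)

# Toy % methylation matrix: rows = CpG sites, columns = samples (simulated values)
base.ctrl <- runif(200, 0, 100)
base.trt  <- runif(200, 0, 100)
noise <- function(x) pmin(pmax(x + rnorm(length(x), sd = 5), 0), 100)

perc.meth <- cbind(
  ctrl1 = noise(base.ctrl), ctrl2 = noise(base.ctrl), ctrl3 = noise(base.ctrl),
  trt1  = noise(base.trt),  trt2  = noise(base.trt),  trt3  = noise(base.trt)
)

# Pearson correlation matrix between samples (what getCorrelation reports)
cor.mat <- cor(perc.meth, method = "pearson")

# Hierarchical clustering on a correlation-based distance
d      <- as.dist(1 - cor.mat)
hc     <- hclust(d, method = "ward.D2")
groups <- cutree(hc, k = 2)   # cut the tree into two clusters

round(cor.mat, 2)
groups
# plot(hc) would draw the dendrogram
```

In this toy case the three control samples and the three treated samples end up in separate clusters, which is the pattern you hope to see in real data.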
Additionally, the PCASamples function performs principal component (PC) analysis using the prcomp function.
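PCA is another option.
The second plot is drawn with the screeplot function and is known as a scree plot.
This plot is used to decide how many principal components to keep: keep the components above the dashed line, those with an eigenvalue greater than 1, and those above the sharpest bend (the elbow) of the curve.
This article
http://www.360doc.com/content/19/0818/16/57890290_855677956.shtml
gives a concise and useful explanation of principal component analysis.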
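The "eigenvalue greater than 1" rule is easy to check by hand with prcomp. The sketch below uses simulated data with two underlying signals and treats the six samples as the (scaled) variables, which is an assumption of this toy example and not necessarily how PCASamples sets things up.

```r
set.seed(7)

n.sites <- 300
f1 <- rnorm(n.sites)   # signal shared by the control samples
f2 <- rnorm(n.sites)   # signal shared by the treated samples

# Rows = sites, columns = samples; each sample is its group signal plus noise
x <- cbind(
  ctrl1 = f1 + rnorm(n.sites, sd = 0.3),
  ctrl2 = f1 + rnorm(n.sites, sd = 0.3),
  ctrl3 = f1 + rnorm(n.sites, sd = 0.3),
  trt1  = f2 + rnorm(n.sites, sd = 0.3),
  trt2  = f2 + rnorm(n.sites, sd = 0.3),
  trt3  = f2 + rnorm(n.sites, sd = 0.3)
)

pca <- prcomp(x, center = TRUE, scale. = TRUE)

eig      <- pca$sdev^2        # eigenvalues = variance of each principal component
prop.var <- eig / sum(eig)    # proportion of variance explained
n.keep   <- sum(eig > 1)      # components passing the eigenvalue > 1 rule

round(eig, 2)
round(prop.var, 2)
n.keep
# screeplot(pca, type = "lines") would draw the scree plot itself
```

Because the variables are scaled, the eigenvalues add up to the number of variables (6 here), so an eigenvalue above 1 means the component explains more than a single original variable would.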
1.7 The calculateDiffMeth function is called to calculate differential methylation and it produces a methylDiff object.
Here calculateDiffMeth only computes the differential methylation statistics; the differentially methylated bases themselves are retrieved in a later step.
mDiff.cpg <- calculateDiffMeth(meth.cpg)
1.8 The get.methylDiff function can be used to retrieve differentially methylated bases that satisfy the Q-value and percent methylation difference cutoffs.
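What they mean: the P-value is the probability, assuming the null hypothesis is true, of seeing a result at least as extreme as the one observed. Using it as a cutoff controls the false positive rate = false positives / all true negatives, so it is a check on each individual test against the sample data. The Q-value is the smallest false discovery rate at which a test would still be called significant, where false discovery rate = false positives / all results called positive, so it is a check on the set of conclusions you draw. Q-values are computed from the P-values; you could say the Q-value is a second round of statistics on the P-values.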
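A quick way to get a feel for the relationship is to adjust a handful of P-values yourself. The sketch below uses the Benjamini-Hochberg method from base R as a stand-in; as far as I know methylKit's default adjustment is a different method (SLIM), so the numbers will not match its output exactly. The P-values are made up.

```r
# Ten made-up P-values, e.g. from ten CpG sites
p <- c(0.0001, 0.0004, 0.0019, 0.0095, 0.0201,
       0.0278, 0.0298, 0.0344, 0.0459, 0.3240)

# Benjamini-Hochberg adjusted values (used here as Q-value stand-ins)
q <- p.adjust(p, method = "BH")

# The same thing by hand: p * m / rank, then enforce monotonicity from the top
m      <- length(p)
o      <- order(p, decreasing = TRUE)
manual <- numeric(m)
manual[o] <- pmin(1, cummin(p[o] * m / (m:1)))

data.frame(p = p, q = round(q, 4))

sum(p < 0.01)   # sites passing a raw P-value cutoff of 0.01
sum(q < 0.01)   # sites passing the same cutoff on the adjusted values
```

The adjusted values are never smaller than the raw P-values, so fewer sites pass the same cutoff once multiple testing is taken into account.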
mDiff25p.cpg <- get.methylDiff(mDiff.cpg, difference = 25, qvalue = 0.01)
This step returns both the differentially hypomethylated and the differentially hypermethylated bases.
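What the two cutoffs do can be shown on a small made-up results table. This is a base R sketch of the filtering logic only (the column names and values are invented); in recent methylKit releases the function is named getMethylDiff, and I believe it also has a type argument for returning only the hyper or only the hypo set.

```r
# Toy differential methylation results (made-up values)
diff.tab <- data.frame(
  chr       = c("chr1", "chr1", "chr2", "chr2", "chr3"),
  start     = c(100, 250, 400, 900, 120),
  qvalue    = c(0.001, 0.2, 0.005, 0.0001, 0.009),
  meth.diff = c(40, 60, -30, 10, -55)   # % methylation difference, treated - control
)

difference <- 25     # minimum absolute % methylation difference
q.cutoff   <- 0.01   # maximum Q-value

# Keep rows that pass both cutoffs
sig <- diff.tab[diff.tab$qvalue < q.cutoff &
                abs(diff.tab$meth.diff) > difference, ]

hyper <- sig[sig$meth.diff > 0, ]   # more methylated in the treated group
hypo  <- sig[sig$meth.diff < 0, ]   # less methylated in the treated group

nrow(hyper)
nrow(hypo)
```

A site with a large difference but a poor Q-value (row 2) and a site with an excellent Q-value but a small difference (row 4) are both dropped.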
1.9 The diffMethPerChr function can be used to print the proportion of differentially methylated CpG bases
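Proportions of hypomethylated and hypermethylated bases
2.0 Prepare the annotation file
The gene annotation GRangesList object is built from a TxDb object.
This step is fairly systematic and is easiest to follow alongside the figure.
So, evidently, the promoter, exon, intron and intergenic information extracted from the TxDb file is all collected into the GRangesList object, which records which sequences correspond to promoters, which to exons, and so on.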
2.1
Several functions are used here, but they all express much the same thing.
The annotate.WithGenicParts function is called to calculate the percentage of the differentially methylated CpG bases in promoter, exon, intron and intergenic regions.
(percentage)
The getTargetAnnotationStats function returns the number or percentage of target features overlapping with gene annotations.
The target features in this case are the differentially methylated CpG bases.
(count)
The getFeatsWithTargetsStats function returns the number or percentage of promoters, exons and introns that overlap with differentially methylated CpG bases.
(count)
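The idea behind these annotation statistics is a lookup of each differentially methylated base against the annotated intervals. Here is a base R sketch with an invented single-chromosome annotation; the precedence order promoter > exon > intron (anything else counted as intergenic) is my assumption for the toy example, and real analyses would use the GRangesList object with the package functions named above instead.

```r
# Hypothetical annotation for one chromosome: each feature is a [start, end] interval
features <- data.frame(
  type  = c("promoter", "exon", "intron", "exon"),
  start = c(1000, 2000, 2501, 4001),
  end   = c(1999, 2500, 4000, 4500)
)

# Positions of made-up differentially methylated CpG bases
dm.sites <- c(1500, 2100, 3000, 4200, 6000, 2400)

# If a base overlaps several feature types, the first type in this order wins
precedence <- c("promoter", "exon", "intron")

annotate.site <- function(pos) {
  hit <- features$type[features$start <= pos & pos <= features$end]
  for (p in precedence) if (p %in% hit) return(p)
  "intergenic"
}

site.type   <- vapply(dm.sites, annotate.site, character(1))
counts      <- table(factor(site.type, levels = c(precedence, "intergenic")))
percentages <- 100 * counts / length(dm.sites)

counts
round(percentages, 1)
```

counts corresponds to the "(count)" style of output and percentages to the "(percentage)" style.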
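2.2 This step deals with differentially methylated regions
The tileMethylCounts function is used to summarize the methylation information over tiling windows of a specified size.
The reason is that
it is also useful to identify differentially methylated regions in the genome, where hypermethylated or hypomethylated cytosines occur consecutively.
tiles.cpg = tileMethylCounts(meth.cpg, win.size = 200, step.size = 100)
Once the tiles.cpg object has been obtained, repeat the differential methylation analysis steps above on it.
Summary: this appears to be essentially everything the methylKit package does. The main end result is the number of differentially methylated sites or regions in promoters, exons, introns and intergenic regions.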
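To make the window parameters concrete, here is a base R sketch of tiling with a 200 bp window and a 100 bp step on a handful of made-up CpG sites. It only illustrates the summing of counts per window and is not methylKit's implementation (window boundaries in the real function may be placed differently).

```r
# Toy CpG sites on one chromosome (made-up positions and counts)
sites <- data.frame(
  start    = c(10, 60, 120, 180, 230, 290, 350),
  coverage = c(10, 12, 8, 20, 15, 10, 9),
  numCs    = c(9, 10, 1, 2, 14, 9, 0)
)

win.size  <- 200
step.size <- 100

# Window start coordinates: 1, 101, 201, ... (consecutive windows overlap by 100 bp)
win.starts <- seq(1, max(sites$start), by = step.size)

# Sum coverage and methylated-C counts over the sites falling in each window
tiles <- do.call(rbind, lapply(win.starts, function(s) {
  e      <- s + win.size - 1
  inside <- sites$start >= s & sites$start <= e
  if (!any(inside)) return(NULL)   # skip windows with no covered CpG
  data.frame(start = s, end = e,
             n.sites  = sum(inside),
             coverage = sum(sites$coverage[inside]),
             numCs    = sum(sites$numCs[inside]))
}))

tiles$perc.meth <- 100 * tiles$numCs / tiles$coverage
tiles
```

Each window then behaves like one "site" with pooled counts, which is why the same differential methylation steps can be rerun on the tiled object.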