DNA Methylation Sequencing Data Processing (Part 2): Differential Analysis

2018-07-22  面面的徐爷

Preface

Picking up from the alignment results of the previous post, this post processes them with the R package DSS.

Preparing the input files

Let's first review the output produced in the previous post:
the *.bismark.cov.gz file

$ head test_data_bismark_bt2.deduplicated.bismark.cov
1   975476  975476  100 1   0
1   975488  975488  100 1   0
1   975490  975490  100 1   0
1   2224487 2224487 100 1   0
1   2224489 2224489 100 1   0
1   2224514 2224514 100 1   0
1   2224520 2224520 100 1   0
1   2313220 2313220 0   0   1
1   9313902 9313902 100 1   0
1   9313914 9313914 100 1   0

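# Column 1: chromosome
# Columns 2 and 3: start and end position of the site
# Column 4: methylation percentage
# Column 5: number of methylated reads
# Column 6: number of unmethylated reads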

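The R package used here is DSS, which is installed through Bioconductor.
Two features of this package are worth noting:

DSS detects differential methylation with a rigorous Wald test based on the beta-binomial distribution.

DSS can also handle data without replicates.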

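The input data has the following format, where N is the total number of reads covering the site and X is the number of those reads that are methylated: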

chr pos N X
chr18 3014904 26 2
chr18 3031032 33 12
chr18 3031044 33 13
chr18 3031065 48 24

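# Given this format, the coverage file shown above can be turned into the input file with some simple processing in Linux or R.
# The conversion is straightforward, so please work it out yourself; it pays to explore these things on your own.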
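
As one possible way to do this conversion in R, the sketch below maps the six Bismark coverage columns onto the four DSS columns: N is the sum of the methylated and unmethylated counts, and X is the methylated count. The function name bismark_cov_to_dss and the intermediate column names are made up for illustration; the two sample rows are copied from the coverage output above.

```r
# Hypothetical helper (not part of DSS or Bismark): convert a Bismark
# coverage table into the chr / pos / N / X layout that DSS expects.
bismark_cov_to_dss <- function(cov) {
  # Illustrative names for the six columns described above.
  names(cov) <- c("chr", "start", "end", "meth_pct",
                  "count_meth", "count_unmeth")
  data.frame(
    chr = cov$chr,
    pos = cov$start,
    N   = cov$count_meth + cov$count_unmeth,  # total reads at the site
    X   = cov$count_meth                      # methylated reads at the site
  )
}

# Two rows copied from the coverage output above, as tab-separated text.
# For a real file, read.table(gzfile("<your file>.bismark.cov.gz")) could be
# used instead of the inline text.
cov_text <- "1\t975476\t975476\t100\t1\t0\n1\t2313220\t2313220\t0\t0\t1"
cov <- read.table(text = cov_text, header = FALSE, sep = "\t")

dss_input <- bismark_cov_to_dss(cov)
print(dss_input)
```

One such data frame per sample can then be passed in a list to makeBSseqData, as in the demonstration that follows.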

Call DML or DMR

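DML stands for differentially methylated loci, and DMR for differentially methylated regions.
The demonstration below uses the example data bundled with the DSS package.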

1. Load the two packages DSS and bsseq (bsseq is needed to construct the BSseq object)

library(DSS)
require(bsseq)
path <- file.path(system.file(package="DSS"), "extdata")
dat1.1 <- read.table(file.path(path, "cond1_1.txt"), header=TRUE)
dat1.2 <- read.table(file.path(path, "cond1_2.txt"), header=TRUE)
dat2.1 <- read.table(file.path(path, "cond2_1.txt"), header=TRUE)
dat2.2 <- read.table(file.path(path, "cond2_2.txt"), header=TRUE)
BSobj <- makeBSseqData( list(dat1.1, dat1.2, dat2.1, dat2.2),
                           c("C1","C2", "N1", "N2") )[1:1000,]
# Inspect the BSobj object
BSobj 
An object of type 'BSseq' with
  1000 methylation loci
  4 samples
has not been smoothed
All assays are in-memory

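2. Call DML with the DMLtest function, which involves the following steps:

In the first step, where methylation levels are estimated, smoothing can improve the estimates (for whole-genome BS-seq).
RRBS data does not need smoothing.

Test each locus for differential methylation based on its methylation level: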

dmlTest <- DMLtest(BSobj, group1=c("C1", "C2"), group2=c("N1", "N2"))
head(dmlTest)
    chr     pos       mu1       mu2        diff    diff.se       stat        phi1       phi2       pval       fdr
1 chr18 3014904 0.3817233 0.4624549 -0.08073162 0.24997034 -0.3229648 0.300542998 0.01706260 0.74672190 0.9985094
2 chr18 3031032 0.3380579 0.1417008  0.19635711 0.11086362  1.7711592 0.008911745 0.04783892 0.07653423 0.6792127
3 chr18 3031044 0.3432172 0.3298853  0.01333190 0.12203116  0.1092500 0.010409029 0.01994821 0.91300423 0.9985094
4 chr18 3031065 0.4369377 0.3649218  0.07201587 0.10099395  0.7130711 0.010320888 0.01603200 0.47580174 0.9985094
5 chr18 3031069 0.2933572 0.5387464 -0.24538920 0.13178800 -1.8619996 0.012537553 0.02320887 0.06260315 0.6158797
6 chr18 3031082 0.3526311 0.3905718 -0.03794068 0.07847999 -0.4834440 0.007665696 0.01145531 0.62878051 0.9985094
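
To make the Wald test columns concrete, here is a small worked check using the first row of the output above. It assumes that stat is the estimated difference divided by its standard error and that pval is the two-sided p-value from a standard normal distribution. This is a reading of the printed table, not a copy of the DSS source code, and the variable names are illustrative.

```r
# Values copied from the first row of head(dmlTest) above.
mean_diff <- -0.08073162   # column "diff"
diff_se   <-  0.24997034   # column "diff.se"

# Wald statistic: estimated difference divided by its standard error.
stat <- mean_diff / diff_se

# Two-sided p-value under a standard normal reference distribution.
pval <- 2 * pnorm(-abs(stat))

print(c(stat = stat, pval = pval))
```

Up to rounding, the two values agree with the stat and pval columns printed for that row (-0.3229648 and 0.74672190).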

dmlTest.sm <- DMLtest(BSobj, group1=c("C1", "C2"), group2=c("N1", "N2"), smoothing=TRUE)
head(dmlTest.sm)
    chr     pos       mu1       mu2        diff    diff.se       stat       phi1       phi2      pval       fdr
1 chr18 3014904 0.3693669 0.4566563 -0.08728939 0.29967322 -0.2912819 0.30054300 0.01706260 0.7708357 0.9656515
2 chr18 3031032 0.3433882 0.3679732 -0.02458503 0.03970109 -0.6192533 0.03177894 0.28323422 0.5357495 0.8639036
3 chr18 3031044 0.3412867 0.3678807 -0.02659404 0.04032823 -0.6594397 0.02536938 0.02080295 0.5096134 0.8596522
4 chr18 3031065 0.3358830 0.3511983 -0.01531533 0.04799161 -0.3191252 0.01123412 0.01621926 0.7496316 0.9652417
5 chr18 3031069 0.3358830 0.3511983 -0.01531533 0.03205500 -0.4777830 0.02832889 0.05857316 0.6328047 0.8968029
6 chr18 3031082 0.3358830 0.3511983 -0.01531533 0.05846593 -0.2619531 0.01682981 0.01368466 0.7933576 0.9745116

3. Call DML from the DMLtest results

dmls <- callDML(dmlTest.sm, p.threshold=0.001)
dmls
      chr     pos       mu1       mu2      diff    diff.se     stat       phi1       phi2         pval          fdr postprob.overThreshold
447 chr18 3973699 0.8530694 0.3432547 0.5098147 0.05726793 8.902272 0.03662109 0.98947516 5.471481e-19 4.962634e-16              1.0000000
709 chr18 4564190 0.7773858 0.1977036 0.5796822 0.10294267 5.631116 0.21725415 0.02952656 1.790467e-08 5.413180e-06              0.9999984
710 chr18 4564237 0.7773858 0.1977036 0.5796822 0.15431085 3.756587 0.02931516 0.26238558 1.722462e-04 7.439396e-03              0.9990652
dmls <- callDML(dmlTest, p.threshold=0.001)
dmls
      chr     pos        mu1         mu2       diff    diff.se       stat        phi1       phi2         pval          fdr postprob.overThreshold
450 chr18 3976129 0.01027497 0.939033927 -0.9287590 0.06544340 -14.191789 0.052591567 0.02428826 1.029974e-45 2.499403e-43              1.0000000
451 chr18 3976138 0.01027497 0.939033927 -0.9287590 0.06544340 -14.191789 0.052591567 0.02428826 1.029974e-45 2.499403e-43              1.0000000
638 chr18 4431501 0.01331553 0.943056638 -0.9297411 0.09273779 -10.025483 0.053172411 0.07746835 1.177826e-23 1.429096e-21              1.0000000
639 chr18 4431511 0.01327049 0.943056638 -0.9297862 0.09270080 -10.029969 0.053121697 0.07746835 1.125518e-23 1.429096e-21              1.0000000
710 chr18 4564237 0.91454619 0.011930005  0.9026162 0.05260037  17.159883 0.009528898 0.04942849 5.302004e-66 3.859859e-63              1.0000000
782 chr18 4657576 0.98257334 0.067835497  0.9147378 0.06815000  13.422418 0.010424723 0.06755651 4.468885e-41 8.133371e-39              1.0000000
582 chr18 4340682 0.95398081 0.030390730  0.9235901 0.10935874   8.445508 0.085494283 0.04540643 3.027264e-17 2.754810e-15              1.0000000
583 chr18 4340709 0.95398081 0.030390730  0.9235901 0.10935874   8.445508 0.085494283 0.04540643 3.027264e-17 2.754810e-15              1.0000000
340 chr18 3542732 0.95023554 0.034383112  0.9158524 0.11937407   7.672122 0.089137013 0.04474741 1.691739e-14 1.368429e-12              1.0000000
395 chr18 3723448 0.06570765 0.751990744 -0.6862831 0.09825286  -6.984866 0.011958092 0.01646418 2.851278e-12 2.075730e-10              1.0000000
188 chr18 3370113 0.01488553 0.787980174 -0.7730946 0.11380172  -6.793347 0.054190769 0.02752024 1.095611e-11 7.250956e-10              1.0000000
400 chr18 3785543 0.79337804 0.118353679  0.6750244 0.12183251   5.540593 0.017720841 0.02442007 3.014493e-08 1.688116e-06              0.9999988
683 chr18 4494490 0.77615275 0.104235359  0.6719174 0.12407577   5.415380 0.009219023 0.08210742 6.115879e-08 3.180257e-06              0.9999980
783 chr18 4657592 0.96858371 0.266894590  0.7016891 0.13077255   5.365722 0.064228633 0.07721117 8.062618e-08 3.668491e-06              0.9999979
642 chr18 4431618 0.43034706 0.965539700 -0.5351926 0.09578178  -5.587625 0.012064922 0.07497569 2.301966e-08 1.396526e-06              0.9999972
189 chr18 3370141 0.01488553 0.676293642 -0.6614081 0.12554112  -5.268458 0.054190769 0.02138845 1.375746e-07 5.891428e-06              0.9999961
738 chr18 4635185 0.44671015 0.007342128  0.4393680 0.08148745   5.391849 0.010069744 0.05154034 6.973627e-08 3.384534e-06              0.9999844
330 chr18 3542403 0.02875655 0.714168320 -0.6854118 0.15294560  -4.481409 0.041892077 0.03368033 7.415192e-06 2.841190e-04              0.9999354
185 chr18 3347936 0.49560648 0.017291187  0.4783153 0.10184167   4.696656 0.012173893 0.04533502 2.644552e-06 1.069574e-04              0.9998983
92  chr18 3217241 0.03876653 0.767643956 -0.7288774 0.17547605  -4.153714 0.038095316 0.03753383 3.271214e-05 9.602621e-04              0.9998319
396 chr18 3723468 0.23043341 0.951779714 -0.7213463 0.18220738  -3.958930 0.173291593 0.07965750 7.528626e-05 1.838513e-03              0.9996786
40  chr18 3047980 0.07206183 0.658535570 -0.5864737 0.14396467  -4.073734 0.012517547 0.02615351 4.626536e-05 1.247451e-03              0.9996373
614 chr18 4353399 0.01355584 0.547298573 -0.5337427 0.12855458  -4.151876 0.053422013 0.02055735 3.297603e-05 9.602621e-04              0.9996300
99  chr18 3226251 0.97443783 0.217343279  0.7570945 0.19807591   3.822244 0.012911947 0.27481269 1.322425e-04 3.105566e-03              0.9995532
613 chr18 4353379 0.01355584 0.529341634 -0.5157858 0.13033377  -3.957422 0.053422013 0.02080043 7.576289e-05 1.838513e-03              0.9992902
894 chr18 4921874 0.35815864 0.009172558  0.3489861 0.07955244   4.386868 0.009774839 0.05010151 1.149944e-05 3.986472e-04              0.9991255
895 chr18 4921881 0.35815864 0.009172558  0.3489861 0.07955244   4.386868 0.009774839 0.05010151 1.149944e-05 3.986472e-04              0.9991255

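Note that the first call above uses the smoothed result. I do not actually know whether the example data is RRBS or WGBS, but calling DML from the unsmoothed dmlTest returns more loci (27 versus 3 in the output above).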

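Users can also set a threshold on the methylation difference with the delta parameter, so that only loci whose difference exceeds the threshold are called.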

dmls2 <- callDML(dmlTest, delta=0.1, p.threshold=0.001)
head(dmls2)
      chr     pos        mu1       mu2       diff    diff.se      stat        phi1       phi2         pval          fdr postprob.overThreshold
450 chr18 3976129 0.01027497 0.9390339 -0.9287590 0.06544340 -14.19179 0.052591567 0.02428826 1.029974e-45 2.499403e-43                      1
451 chr18 3976138 0.01027497 0.9390339 -0.9287590 0.06544340 -14.19179 0.052591567 0.02428826 1.029974e-45 2.499403e-43                      1
638 chr18 4431501 0.01331553 0.9430566 -0.9297411 0.09273779 -10.02548 0.053172411 0.07746835 1.177826e-23 1.429096e-21                      1
639 chr18 4431511 0.01327049 0.9430566 -0.9297862 0.09270080 -10.02997 0.053121697 0.07746835 1.125518e-23 1.429096e-21                      1
710 chr18 4564237 0.91454619 0.0119300  0.9026162 0.05260037  17.15988 0.009528898 0.04942849 5.302004e-66 3.859859e-63                      1
782 chr18 4657576 0.98257334 0.0678355  0.9147378 0.06815000  13.42242 0.010424723 0.06755651 4.468885e-41 8.133371e-39                      1

4. Call DMR from the DMLtest results

dmrs <- callDMR(dmlTest.sm, p.threshold=0.01)
dmrs
     chr   start     end length nCG meanMethy1 meanMethy2 diff.Methy areaStat
23 chr18 4921637 4922059    423  27 0.11002940 0.01809674 0.09193266 86.29588
7  chr18 3507919 3508022    104  10 0.07524915 0.03294316 0.04230600 30.55943
15 chr18 4340682 4340753     72   4 0.89237955 0.35052968 0.54184987 10.83526
dmrs <- callDMR(dmlTest, p.threshold=0.01)
dmrs
     chr   start     end length nCG meanMethy1 meanMethy2 diff.Methy areaStat
27 chr18 4657576 4657639     64   4   0.506453   0.318348   0.188105 14.34236

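As we can see, the results with and without smoothing differ considerably, so whether to use smoothing should be decided according to the type of experiment.
Likewise, the delta parameter and p.threshold can be adjusted to obtain suitable results.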

dmrs2 <- callDMR(dmlTest, delta=0.1, p.threshold=0.05)
dmrs2
     chr   start     end length nCG meanMethy1 meanMethy2 diff.Methy areaStat
31 chr18 4657576 4657639     64   4  0.5064530  0.3183480   0.188105 14.34236
19 chr18 4222533 4222608     76   4  0.7880276  0.3614195   0.426608 12.91667

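5. Visualization
DSS provides a visualization function whose output is not especially attractive; users can instead draw their own plots in R from the coverage results.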

showOneDMR(dmrs[1,], BSobj)
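
As a rough illustration of plotting directly from coverage-style counts, the sketch below computes the per-site methylation level as X / N and draws it against genomic position with base R graphics. The four rows are copied from the DSS input example earlier in this post; the point-size scaling and the column name meth_level are arbitrary choices made for this example.

```r
# Four rows copied from the DSS input format example earlier in this post.
dat <- data.frame(
  chr = "chr18",
  pos = c(3014904, 3031032, 3031044, 3031065),
  N   = c(26, 33, 33, 48),   # total reads per site
  X   = c(2, 12, 13, 24)     # methylated reads per site
)

# Per-site methylation level: methylated reads over total reads.
dat$meth_level <- dat$X / dat$N

# Scatter plot of methylation level against position; point size grows
# with coverage (the scaling factor is an arbitrary choice).
plot(dat$pos, dat$meth_level,
     ylim = c(0, 1), pch = 19, cex = sqrt(dat$N) / 4,
     xlab = "Position on chr18", ylab = "Methylation level (X / N)")
```

With several samples, the same plot could be drawn once per sample in different colors over a region reported by callDMR.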

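Conclusion

That wraps up the analysis for now. The next topics are annotating the differentially methylated regions and producing publication-style figures; the next tutorial will follow shortly.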
