ChIP-seq Analysis: Principles

2019-02-22 liu_ll

I: Background

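Up to now I have been wrestling with ChIP-seq wet-lab experiments. Next, I will gradually move on to hands-on ChIP-seq data analysis. Before starting the analysis, let's clarify a few questions: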

1.1: What is ChIP-seq?

Chromatin immunoprecipitation (ChIP) is a technique that enriches DNA fragments bound by a specific protein or by a certain class of nucleosomes [1]. ChIP-seq (ChIP followed by high-throughput sequencing) is mainly used to study three kinds of targets: binding sites of proteins such as CTCF and other transcription factors, histone modifications, and DNA bound by other specific proteins (or at specific motifs).

1.2: Why choose ChIP-seq?

ChIP-seq offers higher resolution, less noise, and greater coverage than its array-based predecessor, ChIP-chip, and with the decreasing cost of sequencing it has become increasingly affordable [2].

Figure: ChIP-seq vs. ChIP-chip

1.3: What is the ChIP-seq experimental workflow? Here is an expert's hand-drawn overview!

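A quick recap of the basic ChIP-seq steps:
1: Cross-link the DNA and proteins on chromatin with formaldehyde. (X-ChIP, i.e. cross-linked ChIP, uses formaldehyde to cross-link proteins to DNA. N-ChIP, i.e. native ChIP, skips cross-linking: chromatin is digested with micrococcal nuclease (MNase) into fragments of about 150 bp, roughly one nucleosome, and it is mainly used for histones.)
2: Fragment the chromatin by sonication or enzymatic digestion (200-600 bp is generally suitable).
3: Enrich the target regions with an antibody (immunoprecipitation).
4: Reverse the cross-links and purify the DNA.
5: Check by PCR or qPCR whether DNA from the target regions has been enriched by the target protein, then proceed to sequencing.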
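The qPCR check in step 5 is often reported as "percent input". The sketch below assumes the common percent-input method, in which a fixed fraction of the sheared chromatin (here 1%, a hypothetical choice) is set aside as input, and assumes roughly 100% PCR efficiency. The function names are illustrative only, not taken from any kit or tool.

```python
import math


def percent_input(ct_ip: float, ct_input: float, input_fraction: float = 0.01) -> float:
    """Percent of the input chromatin recovered by the IP at one qPCR amplicon.

    Assumes ~100% PCR efficiency (one Ct = a 2-fold difference) and that
    `input_fraction` of the chromatin (e.g. 0.01 for a 1% input) was saved.
    """
    # Rescale the input Ct so it represents 100% of the chromatin:
    # a 1% input is 100-fold diluted, i.e. log2(100) ~ 6.64 cycles.
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


def fold_over_igg(ct_ip: float, ct_igg: float, ct_input: float,
                  input_fraction: float = 0.01) -> float:
    """Enrichment of the specific antibody relative to an IgG control IP."""
    return (percent_input(ct_ip, ct_input, input_fraction)
            / percent_input(ct_igg, ct_input, input_fraction))


# Hypothetical Ct values for one target region:
print(round(percent_input(ct_ip=22.0, ct_input=25.0), 2))                # 8.0
print(round(fold_over_igg(ct_ip=22.0, ct_igg=27.0, ct_input=25.0), 2))   # 32.0
```

A region that is truly bound should show a clearly higher percent input than a negative-control region, and a clearly higher value for the specific antibody than for IgG.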

1.4: What are the weaknesses of ChIP-seq?

Bias toward GC-rich content in fragment selection, both in library preparation and in amplification before sequencing, although this has improved (GC-rich regions tend to be over-represented in the reads)
When too few reads are generated, sensitivity or specificity is lost (with too few reads, genuine IP enrichment may go undetected)
Too little sample results in too few tags, although improved low-cell-number ChIP-seq techniques can address this (little starting material means little immunoprecipitated DNA)
Experimental issues (wet-lab technical problems)
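One quick way to look for the GC bias from the first point is to compare the GC-content distribution of IP reads with that of input reads (or random genomic windows). This is a minimal sketch assuming reads are already available as plain sequence strings. The helper names and toy reads are hypothetical. An IP histogram shifted toward high GC relative to the input would hint at bias.

```python
from collections import Counter
from typing import Iterable, List


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (A/C/G/T); N's are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def gc_histogram(seqs: Iterable[str], n_bins: int = 10) -> List[int]:
    """Count reads per GC bin; bin i covers [i/n_bins, (i+1)/n_bins)."""
    counts: Counter = Counter()
    for seq in seqs:
        # A GC fraction of exactly 1.0 is placed in the last bin.
        b = min(int(gc_fraction(seq) * n_bins), n_bins - 1)
        counts[b] += 1
    return [counts[i] for i in range(n_bins)]


ip_reads = ["GGCGCCGA", "GCGCGGCC", "ATGCATGC"]      # toy reads
input_reads = ["ATATGCAT", "ATGCATGC", "AATTGGCA"]
print(gc_histogram(ip_reads, n_bins=4))     # [0, 0, 1, 2]
print(gc_histogram(input_reads, n_bins=4))  # [0, 2, 1, 0]
```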

1.5: Experimental issues

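Antibody quality (in my personal experience, Abcam antibodies work somewhat better)
Sample quantity (the amount of starting material, e.g. cell number: ideally 10^6-10^7 cells per experiment, and library construction ultimately needs 10-50 ng of DNA)
Control experiment
(A control is essential: to show that a peak in a region reflects IP enrichment, you need a control for comparison, such as input DNA, a mock IP (e.g. empty beads with no antibody), or IgG. Empty beads are generally not recommended, because very little DNA binds to them and library construction may fail. Input is the most commonly used control.)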
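A back-of-the-envelope calculation connects the cell numbers and DNA amounts in this list. The sketch assumes roughly 6.6 pg of genomic DNA per diploid human cell (an assumed round figure; other species and ploidies differ) and asks what fraction of the total chromatin the IP must recover to reach the 10-50 ng needed for library construction. The function names are hypothetical.

```python
# Assumption: ~6.6 pg of genomic DNA per diploid human cell.
PG_DNA_PER_CELL = 6.6


def total_dna_ng(n_cells: float, pg_per_cell: float = PG_DNA_PER_CELL) -> float:
    """Total genomic DNA in the starting material, in ng (1 ng = 1000 pg)."""
    return n_cells * pg_per_cell / 1000.0


def required_recovery(n_cells: float, target_ng: float,
                      pg_per_cell: float = PG_DNA_PER_CELL) -> float:
    """Fraction of the total DNA the IP must recover to yield `target_ng`."""
    return target_ng / total_dna_ng(n_cells, pg_per_cell)


for n_cells in (1e6, 1e7):
    for target in (10.0, 50.0):
        pct = 100 * required_recovery(n_cells, target)
        print(f"{n_cells:.0e} cells, {target:.0f} ng -> {pct:.3f}% recovery needed")
```

Under this assumption, 10^6 cells contain about 6.6 µg of DNA, so even 10 ng means recovering roughly 0.15% of it. With fewer cells the required recovery rises proportionally, which is one reason low-input ChIP is harder.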

1.6: About sequencing depth

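In this figure, the x-axis is the number of reads randomly sampled from the sequencing data for peak calling. The upper, humped curve counts peaks selected by fold enrichment alone, while the lower curve counts statistically significant peaks. How do you judge saturation? When the curve flattens out, the data have reached a critical point (MSER, Minimal Saturation Enrichment Ratio).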

Figure: sequencing depth vs. number of called peaks
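The saturation analysis behind this figure can be sketched like this: subsample a growing fraction of the reads, call peaks on each subsample, and check whether the peak count levels off. In this minimal sketch the "peak caller" is a deliberately simple stand-in, not MACS2. It counts fixed windows whose read count is at least `min_fold` times the genome-wide mean. Reads are given as 5' positions on a single chromosome, and all names, thresholds, and toy data are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def subsample(positions: Sequence[int], fraction: float, seed: int = 0) -> List[int]:
    """Keep each read independently with probability `fraction`.

    With a fixed seed, smaller fractions give subsets of larger ones.
    """
    rng = random.Random(seed)
    return [p for p in positions if rng.random() < fraction]


def count_enriched_windows(positions: Sequence[int], genome_length: int,
                           window: int = 200, min_fold: float = 5.0) -> int:
    """Toy peak caller: windows with >= min_fold x the mean reads per window."""
    if not positions:
        return 0
    n_windows = max(1, genome_length // window)
    expected = len(positions) / n_windows          # uniform background
    counts = Counter(p // window for p in positions)
    return sum(1 for c in counts.values() if c >= min_fold * expected)


def saturation_curve(positions: Sequence[int], genome_length: int,
                     fractions: Sequence[float] = (0.2, 0.4, 0.6, 0.8, 1.0),
                     **kwargs) -> List[Tuple[float, int]]:
    """(fraction of reads used, number of enriched windows) per fraction."""
    return [(f, count_enriched_windows(subsample(positions, f), genome_length, **kwargs))
            for f in fractions]


def is_saturated(curve: List[Tuple[float, int]], tol: float = 0.05) -> bool:
    """True if the last step added at most `tol` (relative) new peaks."""
    if len(curve) < 2 or curve[-2][1] == 0:
        return False
    prev, last = curve[-2][1], curve[-1][1]
    return (last - prev) / prev <= tol


# Toy data (hypothetical): uniform background plus 10 clustered "binding sites".
rng = random.Random(1)
genome_length = 1_000_000
reads = [rng.randrange(genome_length) for _ in range(20_000)]
for site in range(50_000, genome_length, 100_000):
    reads += [site + rng.randrange(200) for _ in range(300)]
curve = saturation_curve(reads, genome_length)
print(curve, "saturated:", is_saturated(curve))
```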

--------------------------------------------分割线---------------------------------------------------------

II: How do you analyze ChIP-seq data?

Without further ado, here is the workflow:

Figure: ChIP-seq analysis workflow

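This figure contains a lot of information. The main analysis steps are as follows (with a few additions of my own):

1) Obtain the sequencing data
2) Align the reads to the reference genome with BWA or Bowtie2 (Bowtie2 works somewhat better than Bowtie)
3) Filter the data, e.g. remove PCR duplicates (depending on the situation)
4) Call peaks with MACS2
5) Merge overlapping peak regions with bedtools merge
6) Plot with deepTools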
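The six steps above can be strung together in a small driver script. This is a sketch under several assumptions: single-end reads, a prebuilt Bowtie2 index, an input control, and file names invented for illustration. It uses only standard options of `bowtie2`, `samtools`, `macs2 callpeak`, `bedtools merge`, and deepTools' `bamCoverage`. By default it prints the commands (a dry run) instead of executing them. Step 3 is approximated by a MAPQ filter. PCR duplicates are left to MACS2, whose `--keep-dup` option keeps at most one read per position and strand by default.

```python
import shlex
import subprocess
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    name: str
    argv: List[str]
    stdout_path: Optional[str] = None  # if set, stdout is redirected to this file


def build_pipeline(name: str, ip_fastq: str, input_fastq: str, bowtie2_index: str,
                   genome_size: str = "hs", threads: int = 4, min_mapq: int = 30) -> List[Step]:
    """Hypothetical single-end ChIP-seq pipeline: align, filter, call, merge, bigWig."""
    steps: List[Step] = []
    bams = {}
    for label, fastq in (("ip", ip_fastq), ("input", input_fastq)):
        sam = f"{name}.{label}.sam"
        sorted_bam = f"{name}.{label}.sorted.bam"
        filt_bam = f"{name}.{label}.filt.bam"
        steps += [
            # 2) align single-end reads with Bowtie2
            Step(f"align {label}", ["bowtie2", "-p", str(threads), "-x", bowtie2_index,
                                    "-U", fastq, "-S", sam]),
            Step(f"sort {label}", ["samtools", "sort", "-o", sorted_bam, sam]),
            # 3) keep reads with MAPQ >= min_mapq (drops unmapped and most multi-mappers)
            Step(f"filter {label}", ["samtools", "view", "-b", "-q", str(min_mapq),
                                     "-o", filt_bam, sorted_bam]),
            Step(f"index {label}", ["samtools", "index", filt_bam]),
        ]
        bams[label] = filt_bam

    outdir = f"{name}_macs2"
    narrow = f"{outdir}/{name}_peaks.narrowPeak"   # MACS2 output naming
    sorted_bed = f"{outdir}/{name}_peaks.sorted.bed"
    merged_bed = f"{outdir}/{name}_peaks.merged.bed"
    steps += [
        # 4) call peaks against the input control
        Step("call peaks", ["macs2", "callpeak", "-t", bams["ip"], "-c", bams["input"],
                            "-f", "BAM", "-g", genome_size, "-n", name, "--outdir", outdir]),
        # 5) bedtools merge expects coordinate-sorted input
        Step("sort peaks", ["sort", "-k1,1", "-k2,2n", narrow], stdout_path=sorted_bed),
        Step("merge peaks", ["bedtools", "merge", "-i", sorted_bed], stdout_path=merged_bed),
        # 6) coverage track for deepTools plotting (e.g. computeMatrix, plotHeatmap)
        Step("bigWig", ["bamCoverage", "-b", bams["ip"], "-o", f"{name}.ip.bw"]),
    ]
    return steps


def run(steps: List[Step], dry_run: bool = True) -> None:
    for step in steps:
        redirect = f" > {step.stdout_path}" if step.stdout_path else ""
        print(f"# {step.name}\n{shlex.join(step.argv)}{redirect}")
        if dry_run:
            continue
        if step.stdout_path:
            with open(step.stdout_path, "w") as out:
                subprocess.run(step.argv, check=True, stdout=out)
        else:
            subprocess.run(step.argv, check=True)


run(build_pipeline("sample1", "ip.fastq.gz", "input.fastq.gz", "hg38_index"))
```

Keeping each command as an argument list, rather than a shell string, avoids quoting problems. The dry run lets you inspect the commands before spending hours of compute.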

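In addition, once we have the enriched regions, what can we do with them?

1 Visualize the enrichment results
2 Search for motifs
3 Explore the relationship with gene structure
4 Link the binding to gene expression
5 Gene set analysis
6 Other enrichment analyses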
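For item 3 (relating peaks to gene structure), a common first step is to assign each peak to its nearest transcription start site (TSS). This is a minimal sketch that assumes peaks and genes are already loaded as plain tuples. The gene names and coordinates are toy values, and the function names are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Gene = Tuple[str, int, str, str]  # (chrom, tss, strand, name)


def build_tss_index(genes: Iterable[Gene]) -> Dict[str, Tuple[List[int], List[Tuple[int, str, str]]]]:
    """Per chromosome: sorted TSS positions plus the matching (tss, strand, name)."""
    by_chrom: Dict[str, List[Tuple[int, str, str]]] = defaultdict(list)
    for chrom, tss, strand, name in genes:
        by_chrom[chrom].append((tss, strand, name))
    index = {}
    for chrom, entries in by_chrom.items():
        entries.sort()
        index[chrom] = ([e[0] for e in entries], entries)
    return index


def nearest_tss(index, chrom: str, start: int, end: int) -> Optional[Tuple[str, int]]:
    """Nearest gene to the peak center and the signed distance to its TSS.

    Distance is relative to the gene's direction of transcription:
    negative = upstream of the TSS, positive = downstream (into the gene).
    """
    if chrom not in index:
        return None
    positions, entries = index[chrom]
    center = (start + end) // 2
    i = bisect.bisect_left(positions, center)
    # The nearest TSS is just left or just right of the insertion point.
    candidates = [entries[j] for j in (i - 1, i) if 0 <= j < len(entries)]
    tss, strand, name = min(candidates, key=lambda e: abs(e[0] - center))
    distance = center - tss if strand == "+" else tss - center
    return name, distance


genes = [("chr1", 1000, "+", "GENE_A"), ("chr1", 5000, "-", "GENE_B")]  # toy
idx = build_tss_index(genes)
print(nearest_tss(idx, "chr1", 850, 950))    # ('GENE_A', -100): 100 bp upstream
print(nearest_tss(idx, "chr1", 4800, 5000))  # ('GENE_B', 100): inside GENE_B
```

The signed distances can then be binned (promoter, gene body, distal) or linked to expression tables for items 4 and 5.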

2.1 How to identify enriched regions

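This figure is a real classic!!!
From a sequencing standpoint, each read comes from the 5' end of a DNA fragment. Because a fragment can be read from either strand, reads map to both the forward and the reverse strand (the red and blue reads). Plotting the read positions for each strand gives a red peak and a blue peak. Forward-strand reads pile up just upstream of the binding site and reverse-strand reads just downstream, so the two peaks are offset by roughly the fragment length. Both peaks indicate the same thing: enrichment at this location. Merging the two strand peaks (shifting each toward the center) yields a single enriched region, the grey peak!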

Figure: enriched-site analysis
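The strand-shift idea in this figure can be sketched in a few lines. First, estimate the fragment length d as the shift that best aligns forward-strand and reverse-strand 5' ends. This is a simplified strand cross-correlation that counts exact position matches instead of correlating binned coverage. Then move each read d/2 toward the fragment center so that both strand peaks collapse onto the binding site. Read positions are assumed to be 5' end coordinates on one chromosome. The function names and toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def strand_shift_scores(fwd: Sequence[int], rev: Sequence[int], max_shift: int) -> Dict[int, int]:
    """For each shift s, count pairs where a forward 5' end + s hits a reverse 5' end."""
    fwd_counts, rev_counts = Counter(fwd), Counter(rev)
    return {
        s: sum(n * rev_counts.get(pos + s, 0) for pos, n in fwd_counts.items())
        for s in range(max_shift + 1)
    }


def estimate_fragment_length(fwd: Sequence[int], rev: Sequence[int], max_shift: int = 500) -> int:
    scores = strand_shift_scores(fwd, rev, max_shift)
    # Highest score wins; ties go to the smaller shift.
    return max(scores, key=lambda s: (scores[s], -s))


def shifted_centers(fwd: Sequence[int], rev: Sequence[int], d: int) -> List[int]:
    """Move forward reads +d/2 and reverse reads -d/2 (toward the fragment center)."""
    half = d // 2
    return [p + half for p in fwd] + [p - half for p in rev]


# Toy data: 10 fragments of length 100 starting at 450..459 (centers 500..509).
# Simplified convention: reverse-read 5' end = fragment start + 100.
fwd = list(range(450, 460))
rev = [s + 100 for s in range(450, 460)]
d = estimate_fragment_length(fwd, rev)
pileup = Counter(shifted_centers(fwd, rev, d))
print(d)                        # 100
print(min(pileup), max(pileup))  # 500 509
```

Before shifting, the two strands form separate piles around 450 and 550. After shifting, they overlap on 500-509, which is the single merged (grey) peak.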

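So what peak shapes does ChIP-seq produce?
Different proteins and histone modifications give different peak shapes. The main types of signal distribution are: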
sharp binding sites, CTCF (red);
a mixture of shapes, RNA Polymerase II (orange);
medium size, H3K36me3 (green);
large domains, H3K27me3 (blue)

Figure: different ChIP-seq peak shapes
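In practice, the expected peak shape affects how peaks are called. MACS2 calls narrow peaks by default and has a `--broad` mode (tuned with `--broad-cutoff`) for broad domains. The mapping below from the four examples above to a calling mode is an assumption for illustration, not a fixed rule. In particular, treating the medium-sized H3K36me3 signal as broad is a judgment call. The function and table names are hypothetical.

```python
from typing import List

# Assumption: shapes taken from the four examples above; extend for other targets.
SHAPE_BY_TARGET = {
    "CTCF": "sharp",
    "RNA Polymerase II": "mixed",
    "H3K36me3": "medium",
    "H3K27me3": "broad",
}


def macs2_mode_args(target: str, broad_cutoff: float = 0.1) -> List[List[str]]:
    """Extra `macs2 callpeak` flags, one list per peak-calling run for this target."""
    shape = SHAPE_BY_TARGET.get(target, "sharp")   # unknown targets: narrow peaks
    narrow: List[str] = []                          # MACS2 default (narrow) mode
    broad = ["--broad", "--broad-cutoff", str(broad_cutoff)]
    if shape == "sharp":
        return [narrow]
    if shape in ("medium", "broad"):                # judgment call for "medium"
        return [broad]
    return [narrow, broad]                          # mixed shapes: try both, compare


for target in SHAPE_BY_TARGET:
    print(target, macs2_mode_args(target))
```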

2.2 About the enrichment ratio

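As a rule of thumb, a fold enrichment above 5 is generally considered significant. How is the enrichment ratio calculated? See the figure~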

Figure: calculating the enrichment ratio
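Because the figure is not reproduced here, the sketch below uses one common definition of the enrichment ratio. This is an assumption and not necessarily exactly what the figure shows. The IP read count in a region is divided by the control read count in the same region, after normalizing each by its library's total read count. A pseudocount avoids division by zero. The threshold of 5 follows the rule of thumb above. The function names are hypothetical.

```python
def enrichment_ratio(ip_reads: int, ip_total: int,
                     ctrl_reads: int, ctrl_total: int,
                     pseudocount: float = 1.0) -> float:
    """(IP reads / total IP reads) / (control reads / total control reads)."""
    ip_rate = (ip_reads + pseudocount) / ip_total
    ctrl_rate = (ctrl_reads + pseudocount) / ctrl_total
    return ip_rate / ctrl_rate


def is_enriched(ratio: float, threshold: float = 5.0) -> bool:
    return ratio >= threshold


# Hypothetical region: 199 IP reads out of 2 million, 9 input reads out of 1 million.
r = enrichment_ratio(199, 2_000_000, 9, 1_000_000)
print(round(r, 2), is_enriched(r))  # 10.0 True
```

Normalizing by library size matters. In this example, the raw ratio (200 vs. 10 with pseudocounts) would be 20, but the IP library is twice as deep, so the depth-corrected enrichment is 10.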

Next post: notes on learning the analysis code!

References:
1: Solomon MJ, Larsen PL, Varshavsky A. Mapping protein-DNA interactions in vivo with formaldehyde: evidence that histone H4 is retained on a highly transcribed gene. Cell, 1988, 53(6): 937-947.
2: Park PJ. ChIP-seq: advantages and challenges of a maturing technology. Nature Reviews Genetics, 2009, 10(10): 669-680.