ChIP-seq project (EMBL-EBI NGS Practice)

2019-02-26 liu_ll

pre:

(I am preparing for the TOEFL exam, so there is a lot of English in this post, all typed by hand as practice. Wish me luck~)

------------------------------------------------------------------------------------------------------
This project is based on an NGS practical from EMBL-EBI:
https://www.ebi.ac.uk/training/online/course/ebi-next-generation-sequencing-practical-course/gene-regulation/chip-seq-analysis
The PDF for this ChIP-seq practical:
https://www.ebi.ac.uk/training/online/sites/ebi.ac.uk.training.online/files/user/18/private/chipseq_loos.pdf
------------------------------------------------------------------------------------------------------
The aim of this note is to show how to run a ChIP-seq analysis pipeline. The basic step is to get the whole pipeline running; once it runs, the next step is to reproduce the figures from the paper, one step at a time. First, here are the main steps of a ChIP-seq analysis:

Main analysis workflow (from the ChIP-seq PDF above)
The following software is needed for the analysis; please install it before starting.

Software required for this analysis
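
The commands later in this note call bowtie-build, bowtie, samtools, genomeCoverageBed, bedGraphToBigWig, macs14 and HOMER's annotatePeaks.pl and findMotifsGenome.pl. A minimal sketch for checking which of them are already on PATH before starting (the helper name check_tools is made up for this example and is not part of any of these tools):

```shell
#!/bin/sh
# Report which of the given commands can be found on PATH.
# check_tools is a hypothetical helper name used only in this sketch.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool"
            missing=$((missing + 1))
        fi
    done
    # Exit status is the number of missing tools (0 means everything was found).
    return "$missing"
}

# Tools used in this note; install whatever is reported as MISSING.
check_tools bowtie-build bowtie samtools genomeCoverageBed \
    bedGraphToBigWig macs14 annotatePeaks.pl findMotifsGenome.pl || true
```

The trailing `|| true` only keeps the script going when something is missing, so the full report is always printed.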

1: Data access

Click the link below to download the data:
https://www.ebi.ac.uk/~emily/Online%20courses/NGS/ChIP-seq.zip
(It seems the link is only reachable through a VPN from some networks; if you need the file, send me a private message and I will send it to you~)
The zip archive contains the raw FASTQ files, the files needed to build the bowtie index, and a GTF file for the later peak annotation.

Contents of the zip archive

The data for this analysis come from a 2008 Cell paper: Chen X, Xu H, Yuan P, et al. Integration of External Signaling Pathways with the Core Transcriptional Network in Embryonic Stem Cells. Cell, 2008, 133(6):1106-1117.

For how to download the raw data, see: How to download raw data by GSE/SRA/SRR accession

2: Sequence mapping

2.1 First build the mapping index with bowtie-build (note that the commands here use bowtie, not bowtie2)

# run from the project directory; assumes mm10.fa has been placed in bowtie_index/
mkdir -p bowtie_index
bowtie-build bowtie_index/mm10.fa bowtie_index/mm10

Usage of bowtie-build for building an index
2.2 Then map the FASTQ reads to the reference genome

# && makes samtools start only after bowtie has finished writing the SAM file
bowtie -m 1 -S bowtie_index/mm10 Oct4.fastq > Oct4.sam && samtools view -Sb Oct4.sam > Oct4.unsort.bam
bowtie -m 1 -S bowtie_index/mm10 gfp.fastq > gfp.sam && samtools view -Sb gfp.sam > gfp.unsort.bam
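
Before converting to BAM it can be worth checking how many reads actually aligned. `samtools flagstat` is the usual tool for this; the sketch below does a rough version with plain awk on a toy SAM file, assuming that unaligned reads are written with the 0x4 bit set in the FLAG column (column 2). The three reads are made up for illustration.

```shell
#!/bin/sh
# Toy SAM file: two header lines, two aligned reads, one unaligned read (FLAG 4).
sam=$(mktemp)
printf '@HD\tVN:1.0\tSO:unsorted\n'                           > "$sam"
printf '@SQ\tSN:chr1\tLN:1000\n'                             >> "$sam"
printf 'r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\n'    >> "$sam"
printf 'r2\t16\tchr1\t200\t255\t4M\t*\t0\t0\tACGT\tIIII\n'   >> "$sam"
printf 'r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n'            >> "$sam"

# Skip header lines (starting with @); test bit 0x4 of the FLAG with integer arithmetic.
summary=$(awk -F'\t' '
    !/^@/ { total++; if (int($2 / 4) % 2 == 1) unmapped++ }
    END   { printf "total=%d mapped=%d unmapped=%d\n", total, total - unmapped, unmapped }
' "$sam")
echo "$summary"
rm -f "$sam"
```

On the real data the same awk program would be pointed at Oct4.sam or gfp.sam instead of the temporary file.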

Then sort and index with samtools (note that the usage may differ between samtools versions; check the help text of your version)

samtools sort -o ./Oct4.bam ./Oct4.unsort.bam
samtools sort -o ./gfp.bam ./gfp.unsort.bam
samtools index ./Oct4.bam 
samtools index ./gfp.bam

3: Visualize the mapping results

Step 1: generate the bedGraph file

 genomeCoverageBed -bg -ibam /share_bio/unisvx1/sunyl_group/liull/test/embl-chip-practice/1_bowtie/Oct4.bam -g /share_bio/unisvx1/sunyl_group/liull/test/embl-chip-practice/mouse.mm10.genome > Oct4.bedgraph
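
The post does not describe the file mouse.mm10.genome. Assuming it is the usual two-column table of chromosome name and length in bp that bedtools and bedGraphToBigWig expect, one way to derive such a table from a FASTA file is sketched below on a toy sequence (the toy FASTA and its contents are made up):

```shell
#!/bin/sh
# Toy FASTA with two sequences; chr1 is split over two lines on purpose.
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>chr1 toy sequence
ACGTACGTAC
ACGTA
>chr2
GGGCCC
EOF

# For every record, print "<name><TAB><total sequence length>".
sizes=$(awk '
    /^>/ { if (name != "") print name "\t" len
           name = substr($1, 2); len = 0; next }
         { len += length($0) }
    END  { if (name != "") print name "\t" len }
' "$fasta")
printf '%s\n' "$sizes"
rm -f "$fasta"
```

If samtools is installed, `samtools faidx mm10.fa` followed by `cut -f1,2 mm10.fa.fai` gives the same kind of table.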

Step 2: generate the bigWig file

 bedGraphToBigWig Oct4.bedgraph /share_bio/unisvx1/sunyl_group/liull/test/embl-chip-practice/mouse.mm10.genome Oct4.bw
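
bedGraphToBigWig expects its input sorted by chromosome name and then by start position, compared byte-wise. If it complains about the sort order, sorting the bedGraph in the C locale first usually helps. A small sketch on made-up intervals:

```shell
#!/bin/sh
# Toy bedGraph with rows deliberately out of order.
bg=$(mktemp)
printf 'chr2\t0\t10\t1\n'   > "$bg"
printf 'chr10\t5\t15\t2\n' >> "$bg"
printf 'chr1\t20\t30\t3\n' >> "$bg"
printf 'chr1\t0\t10\t4\n'  >> "$bg"

# Sort by chromosome (byte order) and then numerically by start coordinate.
sorted=$(LC_ALL=C sort -k1,1 -k2,2n "$bg")
printf '%s\n' "$sorted"
rm -f "$bg"
```

For the real file this would be something like `LC_ALL=C sort -k1,1 -k2,2n Oct4.bedgraph > Oct4.sorted.bedgraph`, where Oct4.sorted.bedgraph is just an illustrative output name.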

The resulting file can now be loaded into IGV to look at the peaks.

4: Peak calling

macs14 -t /share_bio/unisvx1/sunyl_group/liull/test/embl-chip-practice/1_bowtie/Oct4.bam \
    -c /share_bio/unisvx1/sunyl_group/liull/test/embl-chip-practice/control/gfp.sort.bam \
    --format=BAM -g mm --name=Oct4  \
    --tsize=26 --diag --wig

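-t: treatment sample (Oct4)
-c: control sample (gfp)
--format: input format, BAM here (other formats such as SAM also work)
--name: name prefix for the output files
-g: genome size; mm stands for mouse and hs for human (the default is human)
--tsize: tag (read) size, 26 here; it can be omitted, in which case the default of 25 is used
--wig: if this flag is set, a folder Oct4_MACS_wiggle/ is created containing two subfolders, treat/ and control/, each holding the wiggle files for the individual chromosomes.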
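
The value given to --tsize should match the read length in the FASTQ files. A quick way to check it is to tabulate the lengths of the sequence lines (every fourth line, starting with line 2). The sketch below does this on three made-up reads; it assumes a standard four-line-per-record FASTQ.

```shell
#!/bin/sh
# Toy FASTQ: two reads of 26 bp and one of 20 bp.
fq=$(mktemp)
cat > "$fq" <<'EOF'
@read1
ACGTACGTACGTACGTACGTACGTAC
+
IIIIIIIIIIIIIIIIIIIIIIIIII
@read2
ACGTACGTACGTACGTACGTACGTAC
+
IIIIIIIIIIIIIIIIIIIIIIIIII
@read3
ACGTACGTACGTACGTACGT
+
IIIIIIIIIIIIIIIIIIII
EOF

# Column 1: read length, column 2: number of reads with that length.
lengths=$(awk 'NR % 4 == 2 { n[length($0)]++ }
               END { for (l in n) print l "\t" n[l] }' "$fq" | sort -n)
printf '%s\n' "$lengths"
rm -f "$fq"
```

Run against Oct4.fastq, the dominant length in this table is the number to pass to --tsize.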

outcome1

outcome2
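
Of the MACS output files, Oct4_peaks.bed is the one used in the following steps. Assuming it has the usual five-column layout (chromosome, start, end, peak name, score), the number of peaks per chromosome can be counted with awk. The peaks below are made up for illustration:

```shell
#!/bin/sh
# Toy peak file in the assumed five-column BED layout.
peaks=$(mktemp)
printf 'chr1\t100\t300\tMACS_peak_1\t55.20\n'     > "$peaks"
printf 'chr1\t1000\t1250\tMACS_peak_2\t120.75\n' >> "$peaks"
printf 'chr2\t50\t400\tMACS_peak_3\t80.10\n'     >> "$peaks"

# Count peaks per chromosome and sort the result by chromosome name.
per_chrom=$(awk -F'\t' '{ n[$1]++ }
                        END { for (c in n) print c "\t" n[c] }' "$peaks" | LC_ALL=C sort)
printf '%s\n' "$per_chrom"
rm -f "$peaks"
```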

5: Peak annotation

This step uses HOMER's annotatePeaks.pl
5.1 First download HOMER and add it to PATH in ~/.bashrc

Installing HOMER is simple: it is installed and managed mainly through the configureHomer.pl script.
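wget http://homer.salk.edu/homer/configureHomer.pl
# add the following export line to ~/.bashrc (for example with vi ~/.bashrc), then reload it
export PATH=/liull/software/homer/bin:$PATH
source ~/.bashrc
# download the mm10 package
perl configureHomer.pl -install mm10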

5.2 Then annotate the peaks with annotatePeaks.pl

annotatePeaks.pl Oct4_peaks.bed mm10 > ./peaks/Oct4.output.txt

Screenshot of the annotation result

6: Peak motif finding


The code below not only finds motifs but can also annotate the peaks~

awk '{print $4"\t"$1"\t"$2"\t"$3"\t+"}' /share_bio/unisvx1/sunyl_group/liull/test/embl-chip-practice/3_macs/Oct4_peaks.bed > O

findMotifsGenome.pl Oct4_homer_peaks.bed mm10 /share_bio/unisvx1/sunyl_group/liull/test/embl-chip-practice/3_macs/peak-2  -siz

annotatePeaks.pl Oct4_homer_peaks.bed  mm10 1>Oct4.peakAnn.xls 2>Oct4.annLog.txt
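
The awk command above reorders the columns of the MACS BED file into the peak-file layout that HOMER accepts: peak ID, chromosome, start, end, strand. Here is the same awk program applied to two made-up peaks, to show what the converted lines look like:

```shell
#!/bin/sh
# Toy MACS-style BED input: chromosome, start, end, peak name, score.
bed=$(mktemp)
printf 'chr1\t100\t300\tMACS_peak_1\t55.20\n'  > "$bed"
printf 'chr2\t50\t400\tMACS_peak_2\t80.10\n'  >> "$bed"

# Move the peak name to column 1 and append "+" as the strand.
homer_peaks=$(awk '{print $4"\t"$1"\t"$2"\t"$3"\t+"}' "$bed")
printf '%s\n' "$homer_peaks"
rm -f "$bed"
```

The score column is dropped in the conversion, and every peak is given the + strand because ChIP-seq peaks carry no strand information of their own.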

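For peak annotation and motif finding there are many different tools to choose from, for example CEAS, PeakAnnotator and MEME. Which one to use comes down to habit.

That is all for the analysis so far. I will post some results later. Here are some good notes and blog posts that helped me a lot~
1：https://ming-lian.github.io/2019/02/17/MeRIP-seq/ (many thanks to @UnderStorm for the guidance)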
2：http://www.biologie.ens.fr/~mthomas/other/chip-seq-training/#
3：https://www.ebi.ac.uk/training/online/course/ebi-next-generation-sequencing-practical-course/gene-regulation/chip-seq-analysis
4：http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003326

Reference:
1：http://homer.ucsd.edu/homer/index.html
2：https://www.ebi.ac.uk/training/online/course/ebi-next-generation-sequencing-practical-course/gene-regulation/chip-seq-analysis