ChIP-seq project (EMBL-EBI NGS practical)
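Preface:
(I am preparing for the TOEFL at the moment, so this note contains a fair amount of English, all typed by hand as practice. Wish me luck!)
--------------------------------------------------------------------------------------------------------
This project is based on a practical from the EMBL-EBI next-generation sequencing course: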
https://www.ebi.ac.uk/training/online/course/ebi-next-generation-sequencing-practical-course/gene-regulation/chip-seq-analysis
PDF handout for this ChIP-seq practical:
https://www.ebi.ac.uk/training/online/sites/ebi.ac.uk.training.online/files/user/18/private/chipseq_loos.pdf
--------------------------------------------------------------------------------------------------------
The aim of this note is to get a ChIP-seq pipeline running from start to finish. I think the first step is simply to run the whole pipeline; once that works, the next step is to reproduce the figures from the paper, one step at a time. The main stages of the analysis are walked through in the numbered sections below.
These are the programs the analysis needs; please install them before starting.
Software needed for this analysis (as used in the steps below): bowtie, samtools, bedtools (genomeCoverageBed), bedGraphToBigWig, MACS 1.4 (macs14) and HOMER.
1: Data access
Click the link below to download the data:
https://www.ebi.ac.uk/~emily/Online%20courses/NGS/ChIP-seq.zip
(The link may not be reachable without a proxy; if you need the files, send me a private message and I will share them.)
This zip file contains the raw FASTQ files, the files needed for the bowtie index, and a .gtf file used later for peak annotation.
- The data for this analysis come from a 2008 Cell paper: Chen X, Xu H, Yuan P, et al. Integration of External Signaling Pathways with the Core Transcriptional Network in Embryonic Stem Cells. Cell, 2008, 133(6):1106-1117.
- On downloading the raw data yourself, see the note "How to download raw data from a GSE/SRA/SRR accession".
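Before mapping, it can help to confirm that the downloaded FASTQ files look intact. The lines below are a minimal sketch, assuming uncompressed FASTQ with four lines per read. The demo file and its two reads are made up so that the example runs on its own; in practice the path would be Oct4.fastq or gfp.fastq.

```shell
# Hypothetical two-read FASTQ, created only so the example is self-contained.
tmpdir=$(mktemp -d)
printf '@read1\nACGTACGTACGTACGTACGTACGTAC\n+\nIIIIIIIIIIIIIIIIIIIIIIIIII\n@read2\nTTGCATTGCATTGCATTGCATTGCAT\n+\nIIIIIIIIIIIIIIIIIIIIIIIIII\n' > "$tmpdir/demo.fastq"

# Each record is four lines, so reads = lines / 4.
n_lines=$(wc -l < "$tmpdir/demo.fastq")
n_reads=$((n_lines / 4))

# Length of the first read (line 2 of the file); useful when choosing --tsize for MACS.
read_len=$(awk 'NR == 2 { print length($0); exit }' "$tmpdir/demo.fastq")

echo "reads: $n_reads, read length: $read_len"
```

A line count that is not a multiple of four usually points to a truncated download.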
2: Sequence mapping
2.1 First build the mapping index with bowtie-build (the commands in this note use bowtie, not bowtie2)
mkdir -p bowtie_index
# assumes the reference FASTA has been placed at bowtie_index/mm10.fa
bowtie-build bowtie_index/mm10.fa bowtie_index/mm10
[Figure: bowtie-build usage]
[Figure: building the index]
2.2 Then map the FASTQ reads to the reference genome
# run samtools only after bowtie has finished writing the SAM file
bowtie -m 1 -S bowtie_index/mm10 Oct4.fastq > Oct4.sam && samtools view -Sb Oct4.sam > Oct4.unsort.bam
bowtie -m 1 -S bowtie_index/mm10 gfp.fastq > gfp.sam && samtools view -Sb gfp.sam > gfp.unsort.bam
Then sort and index with samtools (note that usage differs between samtools versions, so check the help text for yours)
samtools sort -o ./Oct4.bam ./Oct4.unsort.bam
samtools sort -o ./gfp.bam ./gfp.unsort.bam
samtools index ./Oct4.bam
samtools index ./gfp.bam
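After mapping, it is worth checking how many reads actually aligned. On real data `samtools flagstat Oct4.bam` reports this directly. The sketch below shows the same idea with awk on a made-up three-read SAM file, so it runs without samtools; the file name and records are assumptions for illustration only.

```shell
# Hypothetical SAM file: one header line, two mapped reads and one unmapped read.
tmpdir=$(mktemp -d)
{
  printf '@SQ\tSN:chr1\tLN:1000\n'
  printf 'r1\t0\tchr1\t100\t255\t26M\t*\t0\t0\t*\t*\n'
  printf 'r2\t16\tchr1\t300\t255\t26M\t*\t0\t0\t*\t*\n'
  printf 'r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*\n'
} > "$tmpdir/demo.sam"

# Skip header lines (starting with @). Bit 0x4 of the FLAG column marks an unmapped read;
# int(flag / 4) % 2 tests that bit without relying on gawk-only bitwise functions.
mapped=$(awk '!/^@/ && int($2 / 4) % 2 == 0 { n++ } END { print n + 0 }' "$tmpdir/demo.sam")
total=$(awk '!/^@/ { n++ } END { print n + 0 }' "$tmpdir/demo.sam")

echo "mapped $mapped of $total reads"
```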
3: Visualize the mapping results
Step 1: generate a bedGraph file
genomeCoverageBed -bg -ibam /share_bio/unisvx1/sunyl_group/liull/test/embl-chip-practice/1_bowtie/Oct4.bam -g /share_bio/unisvx1/sunyl_group/liull/test/embl-chip-practice/mouse.mm10.genome > Oct4.bedgraph
Step 2: generate a bigWig file
bedGraphToBigWig Oct4.bedgraph /share_bio/unisvx1/sunyl_group/liull/test/embl-chip-practice/mouse.mm10.genome Oct4.bw
The resulting bigWig file can now be loaded into IGV to look at the peaks.
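The mouse.mm10.genome file passed to genomeCoverageBed and bedGraphToBigWig is a two-column table of chromosome names and lengths. The sketch below shows one way to produce such a file, assuming a FASTA index made with `samtools faidx` is available (the first two columns of a .fai file are the sequence name and length). bedGraphToBigWig also expects its input sorted by chromosome and start, hence the sort step. All file names and values here are made up for illustration.

```shell
tmpdir=$(mktemp -d)
# Hypothetical .fai content (name, length, offset, bases per line, bytes per line).
printf 'chr1\t1000\t6\t60\t61\nchr2\t500\t1030\t60\t61\n' > "$tmpdir/demo.fa.fai"

# Chromosome sizes file: keep only the name and length columns.
cut -f1,2 "$tmpdir/demo.fa.fai" > "$tmpdir/demo.genome"

# Hypothetical unsorted bedGraph (chrom, start, end, coverage).
printf 'chr2\t10\t36\t1\nchr1\t200\t226\t2\nchr1\t50\t76\t1\n' > "$tmpdir/demo.bedgraph"

# Sort by chromosome name, then numerically by start; LC_ALL=C gives byte-order sorting.
LC_ALL=C sort -k1,1 -k2,2n "$tmpdir/demo.bedgraph" > "$tmpdir/demo.sorted.bedgraph"

cat "$tmpdir/demo.genome"
cat "$tmpdir/demo.sorted.bedgraph"
```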
4: Peak calling
macs14 -t /share_bio/unisvx1/sunyl_group/liull/test/embl-chip-practice/1_bowtie/Oct4.bam \
-c /share_bio/unisvx1/sunyl_group/liull/test/embl-chip-practice/control/gfp.sort.bam \
--format=BAM -g mm --name=Oct4 \
--tsize=26 --diag --wig
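-t: treatment file (Oct4)
-c: control file (gfp)
--format: BAM (other formats such as SAM are also accepted)
--name: name prefix for the output files
-g mm: genome size (mm = mouse, hs = human; the default is human)
--tsize: tag size, i.e. the read length; setting it overrides the auto-detected value (default 25)
--wig: if this flag is given, MACS creates a folder Oct4_MACS_wiggle/ containing treat/ and control/ subfolders, each holding per-chromosome wiggle files.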
[Figure: outcome 1]
[Figure: outcome 2]
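Once MACS has finished, a quick summary of the peak list is a useful sanity check. The sketch below assumes the peaks BED file has five tab-separated columns (chrom, start, end, name, score). The demo rows and the score cutoff of 100 are made up for illustration and are not recommendations from the practical.

```shell
tmpdir=$(mktemp -d)
# Hypothetical peaks in the assumed layout: chrom, start, end, name, score.
printf 'chr1\t100\t400\tMACS_peak_1\t250.5\nchr1\t900\t1200\tMACS_peak_2\t60.1\nchr2\t50\t350\tMACS_peak_3\t120.0\n' > "$tmpdir/demo_peaks.bed"

# Total number of peaks.
n_peaks=$(awk 'END { print NR }' "$tmpdir/demo_peaks.bed")

# Number of peaks on each chromosome.
cut -f1 "$tmpdir/demo_peaks.bed" | sort | uniq -c

# Keep peaks whose score passes an arbitrary, illustrative cutoff.
awk -F'\t' -v min=100 '$5 >= min' "$tmpdir/demo_peaks.bed" > "$tmpdir/demo_peaks.filtered.bed"
n_kept=$(awk 'END { print NR }' "$tmpdir/demo_peaks.filtered.bed")

echo "kept $n_kept of $n_peaks peaks"
```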
5: Peak annotation
This step uses HOMER's annotatePeaks.pl.
5.1 First download HOMER and add it to PATH in ~/.bashrc
HOMER is easy to install; installation and package management are handled by the configureHomer.pl script.
wget http://homer.salk.edu/homer/configureHomer.pl
vi ~/.bashrc
# add the following line to ~/.bashrc:
export PATH=/liull/software/homer/bin:$PATH
source ~/.bashrc
# download the mm10 package
perl configureHomer.pl -install mm10
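5.2 Then annotate the peaks with annotatePeaks.pl
# the output directory must exist before redirecting into it
mkdir -p peaks
annotatePeaks.pl Oct4_peaks.bed mm10 > ./peaks/Oct4.output.txt
[Screenshot: annotation results]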
6: Peak motif finding
The code below not only finds motifs but also annotates the peaks.
awk '{print $4"\t"$1"\t"$2"\t"$3"\t+"}' /share_bio/unisvx1/sunyl_group/liull/test/embl-chip-practice/3_macs/Oct4_peaks.bed > O
findMotifsGenome.pl Oct4_homer_peaks.bed mm10 /share_bio/unisvx1/sunyl_group/liull/test/embl-chip-practice/3_macs/peak-2 -siz
annotatePeaks.pl Oct4_homer_peaks.bed mm10 1>Oct4.peakAnn.xls 2>Oct4.annLog.txt
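The awk step above reorders the MACS BED columns into the layout HOMER reads as a peak file: peak ID, chromosome, start, end, strand. The sketch below runs the same reordering on a made-up two-row BED file so the effect is easy to see; the file names and rows are assumptions for illustration, and "+" is only a placeholder strand.

```shell
tmpdir=$(mktemp -d)
# Hypothetical MACS-style BED rows: chrom, start, end, name, score.
printf 'chr1\t100\t400\tMACS_peak_1\t250.5\nchr2\t50\t350\tMACS_peak_2\t120.0\n' > "$tmpdir/demo_peaks.bed"

# Reorder to HOMER peak columns: ID, chrom, start, end, strand.
awk 'BEGIN { OFS = "\t" } { print $4, $1, $2, $3, "+" }' "$tmpdir/demo_peaks.bed" > "$tmpdir/demo_homer_peaks.bed"

cat "$tmpdir/demo_homer_peaks.bed"
```

Setting OFS once in a BEGIN block is equivalent to writing the "\t" separators by hand and is a little easier to read.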
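There are many other tools for analysing and annotating peaks, for example CEAS, PeakAnnotator and MEME. Which one to use is largely a matter of habit.
That is the analysis so far. I will post some results later. Here are some notes and blog posts I recommend; they helped me a lot.
1:https://ming-lian.github.io/2019/02/17/MeRIP-seq/ (many thanks to @UnderStorm for the guidance)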
2:http://www.biologie.ens.fr/~mthomas/other/chip-seq-training/#
3:https://www.ebi.ac.uk/training/online/course/ebi-next-generation-sequencing-practical-course/gene-regulation/chip-seq-analysis
4:http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003326
Reference:
1:http://homer.ucsd.edu/homer/index.html
2:https://www.ebi.ac.uk/training/online/course/ebi-next-generation-sequencing-practical-course/gene-regulation/chip-seq-analysis