
ChIP-seq project (EMBL-EBI NGS-Practice)

2019-02-26  liu_ll
Preface:

(I'm currently preparing for the TOEFL, so much of this note is written in English, all typed by hand for practice. Wish me luck~)

----------------------------------------------------------------------------------------------------
This project is based on an EMBL-EBI NGS practical course:
https://www.ebi.ac.uk/training/online/course/ebi-next-generation-sequencing-practical-course/gene-regulation/chip-seq-analysis
The ChIP-seq PDF for this practical:
https://www.ebi.ac.uk/training/online/sites/ebi.ac.uk.training.online/files/user/18/private/chipseq_loos.pdf
----------------------------------------------------------------------------------------------------
The aim of this note is to run the ChIP-seq analysis pipeline end to end. Getting the pipeline to work is the first step; once it runs, the next step is to reproduce the paper's figures, one step at a time. First, here are the main steps of a ChIP-seq analysis:

Main analysis workflow (from the ChIP-seq PDF above)
These are the tools needed for the analysis; install them before starting.
Software required for this analysis
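Before starting, it can help to confirm that every command used below is on your PATH. The sketch below is a hypothetical helper, `check_tools`, not part of any of these packages; the command names are the ones called later in this note (Bowtie 1, samtools, bedtools' genomeCoverageBed, UCSC's bedGraphToBigWig, MACS 1.4 and HOMER).

```shell
# check_tools NAME...
# Prints "ok" or "missing" for each command name and returns the number of
# missing tools (0 means everything needed is on PATH).
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok      $tool"
        else
            echo "missing $tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Commands called later in this note; a missing tool is reported, not fatal.
check_tools bowtie-build bowtie samtools genomeCoverageBed bedGraphToBigWig \
    macs14 annotatePeaks.pl findMotifsGenome.pl || echo "install the missing tools first"
```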

1: Data access

Download the data from:
https://www.ebi.ac.uk/~emily/Online%20courses/NGS/ChIP-seq.zip
(Accessing it may require a VPN; if you need the files, send me a private message and I'll send them to you~)
This zip file contains the raw FASTQ files, the prebuilt Bowtie index files, and the GTF file used later for peak annotation.

Contents of the zip archive
  1. The data for this analysis come from a 2008 Cell paper: Chen X, Xu H, Yuan P, et al. Integration of External Signaling Pathways with the Core Transcriptional Network in Embryonic Stem Cells. Cell, 2008, 133(6): 1106-1117.
  2. How to download the raw data by GSE/SRA/SRR accession number
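A quick sanity check on the raw data is to count the reads in each FASTQ file. A FASTQ record is four lines (header, sequence, `+` separator, qualities), so the read count is the line count divided by four. The helper name `count_fastq_reads` is hypothetical, and the sketch assumes uncompressed files without wrapped sequence lines, as is usual for Illumina output.

```shell
# count_fastq_reads FILE.fastq
# Prints the number of reads: total lines / 4 (one record = 4 lines).
# Assumes an uncompressed FASTQ file with unwrapped sequence lines.
count_fastq_reads() {
    # NR in the END block holds the total number of input lines
    awk 'END { print int(NR / 4) }' "$1"
}

# e.g.  count_fastq_reads Oct4.fastq
#       count_fastq_reads gfp.fastq
```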

2: Sequence mapping

2.1 First, build the mapping index with bowtie-build (Bowtie 1, not Bowtie 2; the zip file already ships a prebuilt index, so this step is optional)

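# build a Bowtie 1 index from the mouse genome FASTA (bowtie_index/mm10.fa);
# stay in the current directory so the bowtie_index/ paths below resolve
mkdir -p bowtie_index
bowtie-build bowtie_index/mm10.fa bowtie_index/mm10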

How to build an index with bowtie-build
2.2 Then map the FASTQ reads to the reference genome
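Whether you build the index yourself or use the prebuilt one from the zip file, bowtie only needs the index prefix (bowtie_index/mm10 here) to point at a complete set of files. Below is a minimal check, assuming a standard Bowtie 1 index, which consists of six `.ebwt` files. `check_bowtie_index` is a hypothetical helper.

```shell
# check_bowtie_index PREFIX
# A standard Bowtie 1 index is six files: PREFIX.1.ebwt ... PREFIX.4.ebwt,
# PREFIX.rev.1.ebwt and PREFIX.rev.2.ebwt. Large indexes (".ebwtl") are
# not covered by this sketch. Returns 1 if any file is missing or empty.
check_bowtie_index() {
    status=0
    for suffix in 1.ebwt 2.ebwt 3.ebwt 4.ebwt rev.1.ebwt rev.2.ebwt; do
        if [ ! -s "$1.$suffix" ]; then
            echo "missing: $1.$suffix"
            status=1
        fi
    done
    return "$status"
}

# e.g.  check_bowtie_index bowtie_index/mm10 && echo "index ready"
```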

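# -m 1: drop reads with more than one reportable alignment (keep unique hits); -S: write SAM
# write the SAM file first, then convert it to BAM (piping after ">" sends nothing to samtools)
bowtie -m 1 -S bowtie_index/mm10 Oct4.fastq > Oct4.sam && samtools view -bS Oct4.sam > Oct4.unsort.bam
bowtie -m 1 -S bowtie_index/mm10 gfp.fastq > gfp.sam && samtools view -bS gfp.sam > gfp.unsort.bam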

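Then sort and index the BAM files with samtools (note that samtools usage differs between versions; for example, older releases take an output prefix instead of -o, so check the help text for your version):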

# sort by genomic position, then build the .bai index
samtools sort -o ./Oct4.bam ./Oct4.unsort.bam
samtools sort -o ./gfp.bam ./gfp.unsort.bam
samtools index ./Oct4.bam
samtools index ./gfp.bam
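To see how many reads survived the `-m 1` filter, `samtools flagstat Oct4.bam` is the usual tool. As a dependency-free illustration of what it counts, the sketch below reads SAM text and tests FLAG bit 0x4 (read unmapped). It assumes one record per read, which fits single-end bowtie output with its default of at most one reported alignment per read. `sam_mapping_summary` is a hypothetical name.

```shell
# sam_mapping_summary [FILE.sam]
# Prints "<records> <mapped>" for SAM text read from FILE (or stdin),
# skipping "@" header lines. Assumes one record per read (single-end,
# no secondary alignments).
sam_mapping_summary() {
    awk -F'\t' '
        /^@/ { next }                          # header lines
        {
            total++
            # FLAG bit 0x4 means "segment unmapped"; integer arithmetic
            # instead of and() keeps this portable across awk versions
            if (int($2 / 4) % 2 == 1) unmapped++
        }
        END { printf "%d %d\n", total, total - unmapped }
    ' "${1:--}"
}

# e.g.  samtools view Oct4.bam | sam_mapping_summary
```

The output is two numbers: all alignment records, then the mapped ones; their ratio is the mapping rate.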

3: Visualizing the mapping results

Step 1: generate a bedGraph file

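# -bg: report read depth in bedGraph format; -ibam: position-sorted BAM input; -g: chromosome sizes file
genomeCoverageBed -bg -ibam /share_bio/unisvx1/sunyl_group/liull/test/embl-chip-practice/1_bowtie/Oct4.bam -g /share_bio/unisvx1/sunyl_group/liull/test/embl-chip-practice/mouse.mm10.genome > Oct4.bedgraph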

Step 2: generate a bigWig file

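# bedGraphToBigWig requires input sorted by chromosome, then start position
LC_COLLATE=C sort -k1,1 -k2,2n Oct4.bedgraph > Oct4.sorted.bedgraph
bedGraphToBigWig Oct4.sorted.bedgraph /share_bio/unisvx1/sunyl_group/liull/test/embl-chip-practice/mouse.mm10.genome Oct4.bw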

The resulting bigWig file can now be loaded into IGV to look at the peaks.
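Both genomeCoverageBed (`-g`) and bedGraphToBigWig need a genome file: two tab-separated columns giving each chromosome name and its length, as in mouse.mm10.genome. One common way to make it is `samtools faidx mm10.fa` followed by `cut -f1,2 mm10.fa.fai`. The awk sketch below computes the same two columns directly from a FASTA file. `fasta_chrom_sizes` is a hypothetical helper, and the chromosome names it prints must match those in the BAM header.

```shell
# fasta_chrom_sizes FILE.fa
# Prints "<chrom>\t<length>" for every sequence in a FASTA file, the same
# two-column layout as mouse.mm10.genome. The name is the header text up
# to the first whitespace.
fasta_chrom_sizes() {
    awk '
        /^>/ {
            if (name != "") printf "%s\t%d\n", name, len
            name = substr($1, 2)      # drop the leading ">"
            len = 0
            next
        }
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { if (name != "") printf "%s\t%d\n", name, len }
    ' "$1"
}

# e.g.  fasta_chrom_sizes bowtie_index/mm10.fa > mouse.mm10.genome
```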

4: Peak calling

macs14 -t /share_bio/unisvx1/sunyl_group/liull/test/embl-chip-practice/1_bowtie/Oct4.bam \
    -c /share_bio/unisvx1/sunyl_group/liull/test/embl-chip-practice/control/gfp.sort.bam \
    --format=BAM -g mm --name=Oct4  \
    --tsize=26 --diag --wig

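-t: treatment file (Oct4 ChIP)
-c: control file (GFP)
--format: input format, here BAM (SAM and other formats are also accepted)
--name: prefix for the output file names
-g mm: effective genome size shortcut; mm = mouse, hs = human (default: hs)
--tsize: read (tag) length, 26 bp here; if omitted, MACS14 estimates it from the first reads of the treatment file
--diag: write a diagnosis report (Oct4_diag.xls) that helps judge sequencing saturation
--wig: also write the pileup as wiggle files in a folder Oct4_MACS_wiggle/, whose treat/ and control/ subfolders hold one file per chromosome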


MACS14 output (1)
MACS14 output (2)
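Among the MACS14 outputs, Oct4_peaks.bed is the file used in the annotation and motif steps below. According to the MACS 1.4 documentation, its 5th column is the peak score, -10*log10(p-value). Here is a small sketch for keeping only the strongest peaks, for example to run motif finding on a top-scoring subset. `top_peaks` and the output file name are hypothetical.

```shell
# top_peaks FILE N
# Prints the N highest-scoring peaks of a MACS14 *_peaks.bed file,
# sorting numerically (descending) on column 5, the -10*log10(p-value).
top_peaks() {
    sort -k5,5nr "$1" | head -n "$2"
}

# e.g.  wc -l < Oct4_peaks.bed                          # number of peaks
#       top_peaks Oct4_peaks.bed 500 > Oct4_top500.bed  # hypothetical name
```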

5: Peak annotation

Here we use HOMER's annotatePeaks.pl.
5.1 First, download HOMER and add its bin/ directory to PATH in ~/.bashrc

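Installing HOMER is simple: it is installed and managed through the configureHomer.pl script.
# download the installer into the directory where HOMER should live
wget http://homer.ucsd.edu/homer/configureHomer.pl
# install the base HOMER package
perl configureHomer.pl -install
# put HOMER's bin/ on PATH (/liull/software/homer is the author's install location)
echo 'export PATH=/liull/software/homer/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
# download the mm10 genome package
perl configureHomer.pl -install mm10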

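5.2 Then annotate the peaks with annotatePeaks.pl

# annotatePeaks.pl is on PATH after step 5.1; create the output folder first
mkdir -p peaks
annotatePeaks.pl Oct4_peaks.bed mm10 > ./peaks/Oct4.output.txt
Annotation results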
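The annotatePeaks.pl output is a tab-delimited table with one row per peak. Its "Annotation" column names the genomic feature each peak falls in (for example promoter-TSS, intron, exon or Intergenic), usually followed by transcript details in parentheses. The sketch below tallies peaks per feature. It assumes the header cell is exactly `Annotation`, and `summarize_annotation` is a hypothetical helper.

```shell
# summarize_annotation FILE
# Counts peaks per genomic feature in an annotatePeaks.pl table, most
# frequent first. Finds the column whose header is "Annotation" and drops
# the parenthesised detail, e.g. "intron (NM_..., intron 1 of 3)" -> "intron".
summarize_annotation() {
    awk -F'\t' '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == "Annotation") col = i
            next
        }
        col {
            feature = $col
            sub(/ \(.*$/, "", feature)   # strip " (transcript, exon x of y)"
            if (feature == "") feature = "NA"
            count[feature]++
        }
        END {
            if (!col) exit 1             # no Annotation column found
            for (f in count) printf "%s\t%d\n", f, count[f]
        }
    ' "$1" | sort -t "$(printf '\t')" -k2,2nr -k1,1
}

# e.g.  summarize_annotation ./peaks/Oct4.output.txt
```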

6: Motif finding


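The commands below not only find motifs but also annotate the peaks~

# convert the MACS14 BED peaks to HOMER's peak format: ID, chr, start, end, strand
awk '{print $4"\t"$1"\t"$2"\t"$3"\t+"}' /share_bio/unisvx1/sunyl_group/liull/test/embl-chip-practice/3_macs/Oct4_peaks.bed > O

# usage: findMotifsGenome.pl <peak file> <genome> <output directory> -size <n>
findMotifsGenome.pl Oct4_homer_peaks.bed mm10 /share_bio/unisvx1/sunyl_group/liull/test/embl-chip-practice/3_macs/peak-2  -siz

# annotate the same peaks: the table goes to stdout, the log to stderr
annotatePeaks.pl Oct4_homer_peaks.bed  mm10 1>Oct4.peakAnn.xls 2>Oct4.annLog.txt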
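The awk one-liner above reorders MACS14's BED columns into HOMER's peak-file layout: a unique peak ID, chromosome, start, end and strand. Below is a slightly more defensive version as a function (hypothetical name `bed_to_homer`). It skips browser `track`/`#` lines and generates an ID when a BED line has no name column. HOMER's scripts also accept BED files directly, so this conversion is optional.

```shell
# bed_to_homer FILE.bed
# Rewrites BED peaks (chrom, start, end, name, ...) as HOMER peak lines:
# ID, chrom, start, end, strand. Strand is fixed to "+" as in the one-liner;
# lines without a name column get a generated "peak_<line number>" ID.
bed_to_homer() {
    awk 'BEGIN { OFS = "\t" }
         /^(track|#)/ { next }                   # skip header/comment lines
         {
             id = (NF >= 4 && $4 != "") ? $4 : "peak_" NR
             print id, $1, $2, $3, "+"
         }' "$1"
}

# e.g.  bed_to_homer Oct4_peaks.bed > Oct4_homer_peaks.bed
```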

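For peak annotation and motif analysis there are many other tools, such as CEAS, PeakAnnotator and MEME. There really are lots of options; use whichever you are comfortable with.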

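That's all for the analysis for now. I'll post some results later. Here are some notes and blogs I recommend; they helped me a lot:
1: https://ming-lian.github.io/2019/02/17/MeRIP-seq/ (many thanks to @UnderStorm for the guidance)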

2:http://www.biologie.ens.fr/~mthomas/other/chip-seq-training/#
3:https://www.ebi.ac.uk/training/online/course/ebi-next-generation-sequencing-practical-course/gene-regulation/chip-seq-analysis
4:http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003326

Reference:
1:http://homer.ucsd.edu/homer/index.html
2:https://www.ebi.ac.uk/training/online/course/ebi-next-generation-sequencing-practical-course/gene-regulation/chip-seq-analysis
