一篇PNAS Fig.1的复现
2019-09-27 本文已影响0人
大吉岭猹
- 文章链接:https://www.pnas.org/content/115/8/E1839.long
- GSE编号:GSE102339
- 特别提示:如果只是复现Fig1,不需要下载RNA-seq数据和非WT的ChIP-seq数据
- 注1:运行本文中的脚本时建议均采用
nohup bash name.sh &
的方式 - 注2:本文脚本默认存放在
epi/script
文件夹下 - 注3:所有循环脚本运行前一定要用echo检查一下变量
1. 安装软件和下载数据(sra文件)
略
2. 建文件夹
mkdir sra raw clean align bw mergeBam peaks script
3. 准备config
# SraRunTable.txt是下载的,file.list是自己复制的
awk '{print $11,$13}' SraRunTable.txt |grep -v 'Run'|sort -k2,2|paste - file.list |awk '{print $1,$4}' > config
-
config做好了长这样
config
4. sra转fq
- 脚本
analysis_dir=../raw
sra=.sra
cat config|while read id;
do echo $id
arr=($id)
srr=${arr[0]}$sra
sample=${arr[1]}
fastq-dump -A $sample -O $analysis_dir --gzip --split-3 /trainee/ZJU/epi/sra/$srr
done
5. ChIP-seq过滤fq
- 脚本
ls ../raw/ChIP*.gz | while read id;
do
trim_galore -q 25 --phred33 --length 25 -e 0.1 --stringency 4 -o ../clean $id
done
6. 准备trim_config_RNA
cd ../clean
ls ../raw/RNA*.fq.gz | paste - - > trim_config_RNA
7. RNA-seq过滤fq
- 脚本
cat trim_config_RNA | while read sample;
do
arr=$sample
fq1=${arr[0]}
fq2=${arr[1]}
trim_galore -q 25 --phred33 --length 35 -e 0.1 --stringency 4 --paired -o ../clean $fq1 $fq2
done
8. ChIP-seq比对
- 脚本
bowtie2_index=/trainee/ZJU/reference/index_bowtie2/dm6
ls ../clean/ChIP-Seq_*.gz | while read id;
do
file=$(basename $id)
sample=${file%%.*}
#echo $file $sample
bowtie2 -p 6 -x $bowtie2_index -U $id | samtools sort -O bam -@ 6 -o - > ../align/${sample}.bam
done
9. 准备RNA_bowtie2_config
cd ../align
ls ../clean/RNA*.fq.gz | paste - - > RNA_bowtie2_config
10. RNA-seq比对
- 脚本
bowtie2_index=/trainee/ZJU/reference/index_bowtie2/dm6
cat RNA_bowtie2_config | while read sample;
do
arr=($sample)
fq1=${arr[0]}
fq2=${arr[1]}
file=$(basename $sample )
tmp=$(basename $fq1)
sample=${tmp%%_1_trimmed.fq.gz}
#echo $file $sample
bowtie2 -p 6 -x $bowtie2_index -1 $fq1 -2 $fq2 | samtools sort -O bam -@ 6 -o - > ../align/${sample}.bam
done
11. ChIP-seq比对去重
- 脚本
ls ../align/ChIP*.bam | while read id
do
sample=${id%%.bam}
#echo $id $sample
samtools rmdup -s $id $sample.rmdup.bam
done
cd ../align
ls *.rmdup.bam | xargs -i samtools index {}
ls *.rmdup.bam | while read id ;do samtools flagstat $id > $(basename $id ".bam").stat ;done #basename的神奇用法
- 自此,RNA-seq数据被我抛弃了
12. bam merge
cd align
ls ChIP*.rmdup.bam|sed 's/_[0-9]_trimmed.rmdup.bam//g'|sort -u |while read id;do samtools merge ../mergeBam/$id.merge.bam $id*.rmdup.bam ;done
13. call peaks
cd align
ls *_WT.merge.bam |cut -d"." -f 1 |while read id;
do
if [ ! -s ${id}_summits.bed ];
then
echo $id
nohup macs2 callpeak -c ChIP-Seq_Input_WT.merge.bam -t $id.merge.bam -f BAM -B -g dm -n $id --outdir ../peaks 2> $id.log &
fi
done
14. peak转bed(for Fig.1A)
- H3K27me3_peakdomain.bed从文章补充材料中获得
- 脚本
intersectBed -a ChIP-Seq_Spps_WT_peaks.narrowPeak -b H3K27me3_peakdomain.bed -c|awk 'BEGIN{OFS="\t"}{if($NF>0){print $1,$2,$3}}' > Fig1/Spps_inH3K27.bed
intersectBed -a ChIP-Seq_Pho_WT_peaks.narrowPeak -b H3K27me3_peakdomain.bed -c|awk 'BEGIN{OFS="\t"}{if($NF>0){print $1,$2,$3}}' > Fig1/Pho_inH3K27.bed
intersectBed -a ChIP-Seq_Ph_WT_peaks.narrowPeak -b H3K27me3_peakdomain.bed -c|awk 'BEGIN{OFS="\t"}{if($NF>0){print $1,$2,$3}}' > Fig1/Ph_inH3K27.bed
intersectBed -a ChIP-Seq_Psc_WT_peaks.narrowPeak -b H3K27me3_peakdomain.bed -c|awk 'BEGIN{OFS="\t"}{if($NF>0){print $1,$2,$3}}' > Fig1/Psc_inH3K27.bed
intersectBed -a ChIP-Seq_Pc_WT_peaks.narrowPeak -b H3K27me3_peakdomain.bed -c|awk 'BEGIN{OFS="\t"}{if($NF>0){print $1,$2,$3}}' > Fig1/Pc_inH3K27.bed
intersectBed -a ChIP-Seq_Ez_WT_peaks.narrowPeak -b H3K27me3_peakdomain.bed -c|awk 'BEGIN{OFS="\t"}{if($NF>0){print $1,$2,$3}}' > Fig1/Ez_inH3K27.bed
- 下一步在R中完成
library(regioneR)
library(ChIPpeakAnno)
Psc <- toGRanges("Psc_inH3K27.bed")
Spps <- toGRanges("Spps_inH3K27.bed")
Pho <- toGRanges("Pho_inH3K27.bed")
ol <- findOverlapsOfPeaks(Psc, Spps, Pho, maxgap=1,connectedPeaks="min")
pdf("Psc_Spps_Pho_overlap.pdf",width=5,height=5)
makeVennDiagram(ol, totalTest=2500)
dev.off()
-
成品
Fig.1A
15. bam转bw(for Fig.1C)
- 脚本
ls ../mergeBam/*_WT.merge.bam | while read fq1;
do
sample=${fq1%%.bam}
samtools index $fq1
bamCoverage -b $fq1 -o $sample.bw --normalizeUsing RPKM -p 5
done
#第一次运行这个脚本没有在bw文件夹里看到预期的输出结果,其实是代码有一个小问题,一开始没有发现这个问题就是因为没有完全养成echo检查的习惯
-
载入IGV后查看基因Abd-B,成品
Fig.1B
16. deeptools可视化(for Fig.1B)
16.1. step1
- 按文章要求进行log2处理的脚本
ls ../mergeBam/*_WT.merge.bam | while read fq1;
do
tmp=$(basename $fq1)
sample=${tmp%%.bam}
samtools index $fq1
bamCompare -b1 $fq1 -b2 ChIP-Seq_Input_WT.merge.bam --operation log2 -of bigwig -o ../bw/$sample.log2.bw -p 2
done
16.2. step2
- 思路是把Spps,Cg,Pho的peak merge到一起,形成一个reference,然后分别将上面三个peak和这个reference做intersectBed,也就是把三个的peak对应成一个标准模式
mergePeaks ChIP-Seq_Spps_WT_peaks.narrowPeak ChIP-Seq_Pho_WT_peaks.narrowPeak ChIP-Seq_Psc_WT_peaks.narrowPeak | cut -f2- > reference.bed
intersectBed -a reference.bed -b ChIP-Seq_Spps_WT_peaks.narrowPeak -c|awk 'BEGIN{OFS="\t"}{if($NF>0){print $1,$2,$3}}' > Fig1/Spps_standard.bed
intersectBed -a reference.bed -b ChIP-Seq_Pho_WT_peaks.narrowPeak -c|awk 'BEGIN{OFS="\t"}{if($NF>0){print $1,$2,$3}}' > Fig1/Pho_standard.bed
intersectBed -a reference.bed -b ChIP-Seq_Psc_WT_peaks.narrowPeak -c|awk 'BEGIN{OFS="\t"}{if($NF>0){print $1,$2,$3}}' > Fig1/Psc_standard.bed
16.3. step3
- 使用网页工具把每一个区域的位点信息做成一个bed文件
16.4. step4
computeMatrix reference-point --referencePoint center -p 6 -b 5000 -a 5000 -R ../peaks/Fig1/veen_result/all_3.bed ../peaks/Fig1/veen_result/Pho_Spps.bed ../peaks/Fig1/veen_result/Spps_Psc.bed ../peaks/Fig1/veen_result/Pho_Psc.bed ../peaks/Fig1/veen_result/only_Spps.bed ../peaks/Fig1/veen_result/only_Pho.bed ../peaks/Fig1/veen_result/only_Psc.bed -S ChIP-Seq_Spps_WT.merge.log2.bw ChIP-Seq_Pho_WT.merge.log2.bw ChIP-Seq_Ez_WT.merge.log2.bw ChIP-Seq_Ph_WT.merge.log2.bw ChIP-Seq_Psc_WT.merge.log2.bw ChIP-Seq_Pc_WT.merge.log2.bw --skipZeros -o ../peaks/Fig1/b/all_center.gz
plotHeatmap -m all_center.gz -out all_Heatmap.png --colorList "blue,white,red" --zMin -2 --zMax 2 --whatToShow "heatmap only" --regionsLabel G1 G2 G3 G4 G5 G6 G7 --samplesLabel Spps Pho Ez Ph Psc Pc
-
成品
Fig.1B
最后,向大家隆重推荐生信技能树的一系列干货!
- 生信技能树全球公益巡讲:https://mp.weixin.qq.com/s/E9ykuIbc-2Ja9HOY0bn_6g
- B站公益74小时生信工程师教学视频合辑:https://mp.weixin.qq.com/s/IyFK7l_WBAiUgqQi8O7Hxw
- 招学徒:https://mp.weixin.qq.com/s/K8WiX5C7BYC3WonM2VcdHQ