用pheatmap画单细胞read 分布的track图
2020-06-14 本文已影响0人
caokai001
参考:
单细胞ChIP-seq 和单细胞ATAC-seq 常常出现这个图,展示单细胞信号和bulk ChIP-seq 相似性。之前一直不知道这图如何画,参考上面链接,大佬写的博文才知道了.
- 选取感兴趣区间,seq 函数创建区间
- 用bedtools 计算每个细胞里面 每一个bin的count 数目
- 将不同细胞的count 结果合并一起
- 按照细胞reads count 进行排序
- 使用pheatmap 进行画图展示
image.png整理的脚本如下:
- 数据:存放
单细胞bed
文件的目录 - 脚本:数据整理的
shell
脚本 + 画图的R脚本
- 运行:两个脚本进行画图
## Run1 :
bash run_signal_cell_tract.sh /home/kcao/work_dir/Cut_Tag/Fig6/test/sc_H3K27me3_rawdata chr7 132600000 133100000 1500
# 存放单细胞bed 文件的目录: /home/kcao/work_dir/Cut_Tag/Fig6/test/sc_H3K27me3_rawdata
# 感兴趣的区间染色体编号:chr7
# 感兴趣的区间start:132600000
# 感兴趣的区间end : 133100000
# 区间的binsize大小:1500
## Run2 :
Rscript run_sc_heatmap.R "cell_sort_count_matrix" "cell_sort_list.txt" "heatmap.pdf"
# 统计的cell_bin count matrix : "cell_sort_count_matrix"
# 每个细胞read 数目,排序后的文件 : "cell_sort_list.txt"
# 输出的热图文件名 : "heatmap.pdf"
image.png运行后产生的文件
image.png和IGV
bulk ChIP-seq
track 拼接 效果
代码如下:
- run_signal_cell_tract.sh
#!/usr/bin/env bash
set -ue
if [ $# -lt 5 ]
then
echo "Usage : bash run_signal_cell_tract.sh <path_to_signal_cell_bed_directory> <region_chr> <region_start> <region_end> <binsize>
Example: bash run_signal_cell_tract.sh /home/kcao/work_dir/Cut_Tag/Fig6/test/sc_H3K27me3_rawdata chr7 132600000 133100000 1500
Description of input fields:
Filed1: The directory where the single cell Chip-seq bed file is stored
Filed2: The interval of interest, the chromosome number
Filed3: The interval of interest, start
Filed4:The interval of interest, end
Filed5: Divide the interval into bin statistics and set specific binsize.
Select the corresponding region according to the width of the selected region.
Setting the appropriate window size is very important for the visual effect of the drawing.
For the 200KB area, you can choose 600-800bp window.
ref:https://liuyang0681.github.io/2019/04/12/paper0412/
"
exit 1
fi
sc_beddir=$1
region_chr=$2
region_start=$3
region_end=$4
binsize=$5
echo -e "\t\t" ;date
echo -e "run:bash run_signal_cell_tract.sh $1 $2 $3 $4 $5"
echo "----------------------------------------------------"
#######################
# 1.Generate regions of interest:
cd `dirname $sc_beddir`
seq -w $region_start $binsize $region_end |awk -v region_chr=$2 '{print region_chr"\t"$1"\t"$1+499"\t""bin_"NR"\t""500"}' > region_bin.bed
# 2. 对细胞进行排序,reads 数目越多,track 越高
## 2.1 统计每个细胞reads数目,并排序
if [ ! -d $sc_beddir'_count' ];then
mkdir $sc_beddir'_count'
echo "creat "$sc_beddir'_count'
cd $sc_beddir
wc -l *.bed |sed '$d' | sort -k 1,1n | sed '1!G;h;$!d' > ../cell_sort_list.txt
## 2.2 计算每个cell 的bin区域count数目
for id in *.bed ;
do
echo "$id" >$sc_beddir'_count'/${id/bed/count.bed};
bedtools intersect -a ../region_bin.bed -b $id -c -bed |awk '{print $4"\t"$5"\t"$6}' |cut -f 3 >>$sc_beddir'_count'/${id/bed/count.bed} ;echo "---$id is ok---";
done
else
echo $sc_beddir'_count' "已经存在"
fi
# 3.Merge multiple files
cd $sc_beddir'_count'
paste -d "\t" *.bed > ../cell_sort_count_matrix
echo -e "\t\t" ;date
echo -e "*** bin_cell_matrix finished ***"
echo "----------------------------------------------------"
- run_sc_heatmap.R
### Step0:解析参数
args <- commandArgs(trailingOnly = TRUE)
print(args)
bin_cell_matrix_file=args[1]
cell_sort_file=args[2]
output_file=args[3]
print(bin_cell_matrix_file)
### Step1:加载包
if (!requireNamespace("pheatmap", quietly = TRUE))
install.packages("pheatmap")
library(pheatmap)
### Syep2 : 读入文件
bin_cell_matrix <- read.table(bin_cell_matrix_file,header = T)
rownames(bin_cell_matrix)<- paste0("bin_",1:nrow(bin_cell_matrix) )
head(bin_cell_matrix[1:5,1:5])
cell_sort <- read.table(cell_sort_file,header = F,col.names = c("count","cell_name"))
head(cell_sort)
## Step3.sort matrix by sample reads count
cell_bin_matrix <- t( bin_cell_matrix[ ,cell_sort[,2]] )
table(rowSums(cell_bin_matrix))
## Step4: pheatmap画热图
cell_bin_matrix[cell_bin_matrix>2]=2
sum(cell_bin_matrix[2,])
bk <-seq(0,3,by=1)
pdf(file=output_file)
pheatmap(cell_bin_matrix,cluster_cols = FALSE,cluster_rows = FALSE,
scale = "none",
show_rownames=F,
show_colnames = F,
legend = FALSE,
color = colorRampPalette(colors = c("white","blue"))(length(bk)),
legend_breaks=seq(0,3,1),
breaks=bk)
dev.off()