重点

计算相关性,查看数据质量

2020-04-25  本文已影响0人  余绕

首先把全基因组分成1kb一个bins,然后计算每个bins的reads数目
具体命令如下:

#BSUB -J blast
#BSUB -n 10
#BSUB -R span[hosts=1]
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -q normal

cd /public/home/xxx/Practice/test1

bedtools multicov -bams  D1-xx_MSU7.0_rmp.bam D2-xx_MSU7.0_rmp.bam  -bed bins.txt >Counts.txt

获得counts文件导入R中,命令如下:

setwd('C:/Users/xx/Desktop')
data=read.csv('Counts.txt',sep = '\t')
data=as.data.frame(data)
colnames(data)=c("Chr","start","end","D1","D2")

ggplot(data=data,mapping = aes(x=log2(data$D1),y = log2(data$D2)))+geom_point(size=1.3)

cor(data$D1,data$D2,method = "pearson")

结果如下:


image.png

高级散点图

data=read.csv('C:/Users/TAO/Documents/ACL_K5_Counts_1kb.txt',sep = '\t')
data=as.data.frame(data)
colnames(data)=c("Chr","start","end","D1","D2")

df<-data.frame(data)
ggplot(df,mapping = aes(x=log2(data$D1),y = log2(data$D2)))+geom_hex(bins=100,na.rm=TRUE)+ #scale_fill_gradientn(colours=colormap)+theme_bw()
  scale_fill_gradient2(low="mediumturquoise",mid="LightGoldenrod1",high = "mediumorchid1",midpoint =1200)+theme_bw()+  #自定??????色,注意midpoint???要正???,否??????色???示不全。
  xlim(0,15)+ylim(0,15)+geom_smooth(method=lm,color="gray51")  

结果如下:


image.png
上一篇 下一篇

猜你喜欢

热点阅读