R plotting: quality check of a CRISPR sgRNA library
2024-06-20
火卫控
cumsum
Uniformity
Distribution plot
The data look like this:
> head(data)
                     sgRNA                  Gene Mock MEM1st
1                 44636017                 YIPF7    5      0
2                 37207246                 SSTR3   18      7
3                 19814803                RNF186   44      0
4                 30645429                SMIM18   69      0
5 Non-Targeting Control552 Non-Targeting Control  111      0
6                 67512306                  DSEL  113      0
The full code is as follows:
library(ggplot2)
#setwd("D:\\Coding\\R_gzlab_docu\\CrisprNGS")
# File path (earlier inputs are kept commented out)
# file = "E:\\big_data\\CRISPR-HHN\\test\\HHN-Mock_Crispr.count.csv"
# file = "E:\\big_data\\CRISPR-HHN\\2024.5.15-CRISPR-2nd\\00.CleanData\\toHHN-CRISPR-2024.5.15-2nd-R-\\mock_vs_top30.count.txt"
# file="E:\\big_data\\CRISPR-MKX\\2024.6.3-MKX-mem\\分析结果\\mock_vs_MEM2nd.count.txt"
file="E:\\big_data\\CRISPR-MKX\\2024.6.3-MKX-mem\\分析结果\\mock_vs_MEM1st.count.txt"
data <- read.csv(file, header = TRUE, sep = "\t")
head(data)
#file = "E:\\big_data\\CRISPR-HHN\\test\\count\\mock_vs_top5.count.txt"
#data <- read.table(file,header=T)
#data <- read.csv("aqyISG.count.csv",header = T)
# Select the count column to plot
col = data$Mock
# col = data$Top10
# col = data$Top30
# col = data$Mock
# col = data$MEM2nd
# col=data$MEM1st
# countsummary = read.delim(file,check.names = FALSE)
# head(countsummary)
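# Density of log2 read counts. The counts are used as read, without normalization.
# Zero counts give log2(0) = -Inf, which ggplot2 drops with a warning;
# use log2(col + 1) instead to keep them.
p1 <- ggplot(data.frame(x = log2(col)), aes(x = x)) +
  geom_density(fill = "#69b3a2", alpha = 0.8) +
  labs(title = "Read count distribution",
       x = "log2 sgRNA read counts", y = "Density") +
  theme_minimal()
# Empirical cumulative distribution of the same values
p2 <- ggplot(data.frame(x = log2(col)), aes(x = x)) +
  stat_ecdf(colour = "#13e3a2", linewidth = 1.2) +
  labs(title = "Read count distribution",
       x = "log2 sgRNA read counts", y = "Cumulative frequency") +
  theme_minimal()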
#ggThemeAssistGadget(p2)
p1
p2
# Combine p1 and p2 into one figure with two rows, labelled A and B
# (LETTERS[1:2] takes the first two capital letters of the alphabet).
p <- cowplot::plot_grid(p1, p2, nrow = 2, labels = LETTERS[1:2])
p
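The result is a two-panel figure: panel A shows the density of the log2 read counts and panel B their cumulative distribution.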
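The plots show uniformity only visually. As a rough complement, the sketch below puts a few numbers on it using cumsum. The helper name uniformity_summary, the choice of metrics (zero fraction, 90th/10th percentile skew ratio, Gini index, Lorenz curve) and the simulated counts are assumptions for illustration and are not part of the original analysis.

```r
# Hypothetical helper: summarise how evenly reads are spread across sgRNAs.
uniformity_summary <- function(counts) {
  sorted <- sort(as.numeric(counts))   # numeric avoids integer overflow in cumsum
  n <- length(sorted)
  q <- quantile(sorted, c(0.1, 0.9), names = FALSE)
  list(
    zero_fraction = mean(sorted == 0),            # share of sgRNAs with no reads
    skew_ratio    = q[2] / q[1],                  # 90th / 10th percentile (Inf if the 10th is 0)
    gini          = sum((2 * seq_len(n) - n - 1) * sorted) / (n * sum(sorted)),
    lorenz        = cumsum(sorted) / sum(sorted)  # cumulative share of reads
  )
}

# Simulated counts standing in for data$Mock (not real data)
set.seed(1)
toy_counts <- rnbinom(1000, mu = 200, size = 5)
s <- uniformity_summary(toy_counts)
s$zero_fraction
s$skew_ratio
s$gini

# Lorenz curve: the closer to the diagonal, the more uniform the library
plot(seq_along(s$lorenz) / length(s$lorenz), s$lorenz, type = "l",
     xlab = "Fraction of sgRNAs (sorted by count)",
     ylab = "Cumulative fraction of reads")
abline(0, 1, lty = 2)
```

On the real table, uniformity_summary(data$Mock) would be the equivalent call. A Gini index near 0 and a small skew ratio both point to an evenly represented library.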

read.delim
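read.delim is a thin wrapper around read.table that changes a few defaults (header = TRUE, sep = "\t", quote = "\"", fill = TRUE, comment.char = ""). With default arguments the two therefore differ in reading speed, in how strictly they treat rows of unequal length, and in how they handle empty fields.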
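1. Reading speed
(1) read.delim: with its defaults it can be faster than read.table, because comment.char = "" switches off comment scanning.
(2) read.table: its default comment.char = "#" makes it check for comments as well, which is slower. Called with the same arguments, the two functions run the same code and take the same time.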
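2. Requirements on row length
(1) read.delim: does not require every row to have the same number of fields. Because fill = TRUE, short rows are padded with blank fields up to the column count, which is taken from col.names if supplied and otherwise from the first five lines of input.
(2) read.table: fill is FALSE by default, so every row must have the same number of fields; otherwise reading stops with an error.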
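The row-length difference can be checked on a small temporary file. This is a minimal sketch: the file content and the sgRNA and gene names in it are invented, and only the column layout mirrors the count table.

```r
# Write a tiny tab-separated file; the second data row lacks its last field
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "sgRNA\tGene\tMock\tMEM1st",
  "sg1\tGENE1\t5\t0",
  "sg2\tGENE2\t18",
  "sg3\tGENE3\t44\t0"
), tmp)

# read.delim pads the short row (fill = TRUE), so MEM1st is NA for sg2
a <- read.delim(tmp)
a

# read.table with read.delim's defaults spelled out
b <- read.table(tmp, header = TRUE, sep = "\t", quote = "\"",
                dec = ".", fill = TRUE, comment.char = "")
identical(a, b)

# With fill = FALSE the short row is an error; capture the message
res <- tryCatch(
  read.table(tmp, header = TRUE, sep = "\t", fill = FALSE),
  error = function(e) conditionMessage(e)
)
res
```

If padded rows would hide a truncated count file, calling read.delim(file, fill = FALSE) makes the read fail instead of silently inserting NA.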
The count file is read with read.delim as follows:
countsummary = read.delim(file,check.names = FALSE)
head(countsummary)
Part of the output:
> head(countsummary)
                     sgRNA                  Gene Mock MEM1st
1                 44636017                 YIPF7    5      0
2                 37207246                 SSTR3   18      7
3                 19814803                RNF186   44      0
4                 30645429                SMIM18   69      0
5 Non-Targeting Control552 Non-Targeting Control  111      0
6                 67512306                  DSEL  113      0
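Once the table is loaded, all sample columns can be drawn in one plot instead of switching the column by hand. The sketch below assumes that every column other than sgRNA and Gene is a sample. The toy table is simulated, and the + 1 pseudocount is an added choice so that zero counts are not dropped.

```r
library(ggplot2)

# Simulated table with the same columns as the count file (not real data)
set.seed(1)
toy <- data.frame(
  sgRNA  = paste0("sg", 1:500),
  Gene   = paste0("GENE", rep(1:100, each = 5)),
  Mock   = rnbinom(500, mu = 200, size = 5),
  MEM1st = rnbinom(500, mu = 50, size = 0.5)
)

# Treat every column except the two ID columns as a sample
samples <- setdiff(names(toy), c("sgRNA", "Gene"))

# stack() reshapes the sample columns to long format: 'values' and 'ind'
long <- stack(toy[samples])
long$log2count <- log2(long$values + 1)  # pseudocount keeps zero-count sgRNAs

# One ECDF curve per sample
p_all <- ggplot(long, aes(x = log2count, colour = ind)) +
  stat_ecdf(linewidth = 1) +
  labs(title = "Read count distribution per sample",
       x = "log2(read count + 1)", y = "Cumulative frequency",
       colour = "Sample") +
  theme_minimal()
p_all
```

With the real data, replacing toy with countsummary should give one curve each for Mock and MEM1st.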