
Base R Graphics in Detail: Volcano Plot

2020-02-16  谢俊飞

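Preface:
This post follows the code from 孟浩巍's video series on Bilibili [1] and applies it to my own sequencing data. It is purely practice. There is a lot of code, so I explain it wherever I can.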

1. Loading and preprocessing the data

> rm(list=ls())
> setwd("c:/Users/Administrator/Documents/data_analysis/")
# Load the gene count CSV file
> sign.gene <- read.csv(file = "c:/Users/Administrator/Documents/data_analysis/2018-10-06J.F.XIE_sequencing_data/Sugar_A_vs_Yeast_A_diff_exp.csv", header = TRUE)
# Treatment group
> sugar_A_TPM <- sign.gene$Sugar_A_mean_TPM
# Control group
> Yeast_A_TPM <- sign.gene$Yeast_A_mean_TPM
# x-axis: compute the log2 fold change
> log2_foldchange <- log2(sugar_A_TPM / Yeast_A_TPM)
# Set the fold change to 0 wherever either group has TPM = 0 (the result is then Inf, -Inf or NaN)
> log2_foldchange[sugar_A_TPM == 0] <- 0
> log2_foldchange[Yeast_A_TPM == 0] <- 0
# y-axis: compute -log10(p-value)
> log10_p_value <- log10(sign.gene$pvalue) * (-1)

For how to handle invalid values, see the article "The special values NaN, Inf, NA and NULL in R" [2].
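To make the zero-handling step above concrete, here is a minimal sketch on made-up TPM values (the vectors `treat` and `ctrl` are hypothetical and not taken from the sequencing data). It shows which special values a ratio with zeros produces, and one alternative way to reset them in a single step with `is.finite()`, which is FALSE for `Inf`, `-Inf`, `NaN` and `NA` alike.

```r
# Toy TPM values (hypothetical, for illustration only)
treat <- c(8, 0, 5, 0)
ctrl  <- c(2, 4, 0, 0)

lfc <- log2(treat / ctrl)
print(lfc)                      # one finite value, then -Inf, Inf and NaN

# Replace every non-finite entry with 0 in one step
lfc_clean <- lfc
lfc_clean[!is.finite(lfc_clean)] <- 0
print(lfc_clean)
```

Note that this variant also turns `NA` into 0, which the two explicit `== 0` assignments in the post do not; whether that is desirable depends on the data.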

2. Drawing the plot

> plot(x = log2_foldchange, y = log10_p_value, xlim = c(-4, 4), ylim = c(0.01, 2))
20181005-1.png
> log10_p_value.filter <- log10_p_value[log10_p_value >= 0.001]
> log2_foldchange.filter <- log2_foldchange[log10_p_value >= 0.001]
> plot(x = log2_foldchange.filter, y = log10_p_value.filter,
+      xlim = c(-4, 4), ylim = c(0, 2))
20181005-2.png
# Semi-transparent blue, solid points (pch = 16), so overlapping points appear darker
> plot(x = log2_foldchange.filter, y = log10_p_value.filter,
+      xlim = c(-4, 4), ylim = c(0, 2),
+      col = rgb(0,0,1,0.1), pch = 16)
20181005-3.png
# Count the points
> length(log2_foldchange.filter)
[1] 21987
# Vector that stores one colour per point
> col_vector = rep(rgb(0,0,1,0.1), length(log2_foldchange.filter))
# Find the points that meet the condition (p <= 0.01) and colour them red
> col_vector[log10_p_value.filter >= -1*log10(0.01)] = rgb(1,0,0)
> plot(x = log2_foldchange.filter, y = log10_p_value.filter,
+      xlim = c(-4, 4), ylim = c(0.01, 2),
+      col = col_vector, pch = 16)
20181005-4.png
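The two-step colouring used above (fill a vector with the default colour, then overwrite the selected positions) can also be written in one step with `ifelse()`. The following is a minimal sketch on made-up values; `neg_log10_p` is a hypothetical stand-in for `log10_p_value.filter`.

```r
# Hypothetical -log10(p) values, for illustration only
neg_log10_p <- c(0.5, 2.5, 1.2, 3.0)

# Two-step version, as in the post
col_two_step <- rep(rgb(0, 0, 1, 0.1), length(neg_log10_p))
col_two_step[neg_log10_p >= -log10(0.01)] <- rgb(1, 0, 0)

# One-step version with ifelse()
col_one_step <- ifelse(neg_log10_p >= -log10(0.01),
                       rgb(1, 0, 0),        # red for p <= 0.01
                       rgb(0, 0, 1, 0.1))   # faint blue otherwise

print(identical(col_two_step, col_one_step))
```

Both versions yield the same character vector of colours when there are no missing values, so the choice is mostly a matter of taste.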

Selection criteria:

  1. p_value <= 0.05  # statistical significance
  2. sugar or yeast FPKM > 0  # normalized expression value
  3. foldchange > 2 or < 0.5  # fold difference
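# Build the selection mask used for colouring. Note that the expression thresholds here
# are stricter than in the list: both groups need TPM > 50 and at least one needs TPM >= 100
> select_sign_vector = (sign.gene$pvalue <= 0.05) &
+     (sign.gene$Sugar_A_mean_TPM > 50) & (sign.gene$Yeast_A_mean_TPM > 50) &
+     (sign.gene$Sugar_A_mean_TPM >= 100 | sign.gene$Yeast_A_mean_TPM >= 100) &
+     (abs(log2_foldchange) >= 1)
# Inspect the result of the selection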
> table(select_sign_vector)
select_sign_vector
FALSE  TRUE 
57862   244 
# Filter x, y and the selection mask with the same index
> log10_p_value.filter = log10_p_value[log10_p_value >= 0.001]
> log2_foldchange.filter = log2_foldchange[log10_p_value >= 0.001]
> select_sign_vector.filter = select_sign_vector[log10_p_value >= 0.001]
# Assign colours: the mask is already logical, so it is used directly as the index
> col_vector = rep(rgb(0,0,1,0.1), length(log2_foldchange.filter))
> col_vector[select_sign_vector.filter] = rgb(1,0,0)
# The mask and the colour vector must correspond one to one
> length(select_sign_vector.filter)
[1] 21987
> length(col_vector)
[1] 21987
# Draw the plot, then add a dotted horizontal line at p = 0.05
> plot(x = log2_foldchange.filter, y = log10_p_value.filter,
+      xlim = c(-4, 4), ylim = c(0.01, 2),
+      col = col_vector, pch =16)
> abline(h = -1*log10(0.05), lwd = 3, lty = 3, col = "#4C5B61")
20181005-5.png
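The selection criteria listed in this section can also be bundled into a small helper so that the thresholds are easy to change. This is only a sketch under stated assumptions: `select_significant` and its parameters are hypothetical names, the column names follow the CSV used in this post, the default thresholds follow the listed criteria (p <= 0.05, expression > 0, |log2FC| >= 1) rather than the stricter TPM cut-offs in the code, and rows with a missing p-value are treated as not selected.

```r
# Hypothetical helper: returns a logical mask of selected genes
select_significant <- function(df, p_cut = 0.05, lfc_cut = 1, min_tpm = 0) {
  lfc <- log2(df$Sugar_A_mean_TPM / df$Yeast_A_mean_TPM)
  keep <- df$pvalue <= p_cut &
    df$Sugar_A_mean_TPM > min_tpm &
    df$Yeast_A_mean_TPM > min_tpm &
    abs(lfc) >= lfc_cut
  keep[is.na(keep)] <- FALSE   # assumption: missing p-value means "not selected"
  keep
}

# Toy data (made up) with the same column names as the CSV
toy <- data.frame(
  Sugar_A_mean_TPM = c(200, 10, 0, 400),
  Yeast_A_mean_TPM = c(50, 12, 30, 100),
  pvalue           = c(0.001, 0.5, 0.01, NA)
)

sel <- select_significant(toy)
print(table(sel))
```

Calling `select_significant(sign.gene, min_tpm = 50)` would approximate the stricter filter used in the post, apart from the additional "at least one group >= 100" condition.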

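The figure supplied by the company was drawn with ggplot2, and the selection criteria they used are unknown.
By comparison our plot is a bit ugly, but the overall workflow is as shown above.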
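Some of the visual gap can be closed without leaving base graphics, for example by adding axis labels, guide lines for both thresholds and a legend. The sketch below uses simulated data only (`lfc`, `p` and `sig` are hypothetical stand-ins for the vectors built in this post), so the resulting picture says nothing about the real experiment.

```r
set.seed(1)                              # simulated data, purely illustrative
n   <- 500
lfc <- rnorm(n, mean = 0, sd = 1.5)      # stand-in for log2_foldchange
p   <- runif(n)                          # stand-in for sign.gene$pvalue
nlp <- -log10(p)
sig <- p <= 0.05 & abs(lfc) >= 1         # selection mask

cols <- ifelse(sig, rgb(1, 0, 0), rgb(0, 0, 1, 0.1))

plot(lfc, nlp, col = cols, pch = 16,
     xlab = "log2(fold change)", ylab = "-log10(p-value)",
     main = "Volcano plot (simulated data)")
# Dotted guide lines: p = 0.05 and a two-fold change in either direction
abline(h = -log10(0.05), v = c(-1, 1), lty = 3, col = "#4C5B61")
legend("topright", legend = c("selected", "other"),
       col = c(rgb(1, 0, 0), rgb(0, 0, 1, 0.5)), pch = 16, bty = "n")
```

The legend uses a less transparent blue than the points so that the key stays readable.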


Sugar_A_vs_Yeast_A_volcano.png
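  1. Base R graphics in detail: heatmap and volcano plot (R语言基础绘图详解heatmap与volcano plot)

  2. The special values NaN, Inf, NA and NULL in R (R语言中特殊值NaN、Inf 、NA、NULL)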
