A Complete Guide to TOmicsVis: An R Package for Transcriptome Visualization

2023-07-15, benben_miao

1. Introduction to the TOmicsVis R Package

TOmicsVis v1.2.2
Title: Transcriptomics Visualization
Website:
https://benben-miao.github.io/TOmicsVis/

GitHub:
https://github.com/benben-miao/TOmicsVis/
Latest version (recommended):
devtools::install_github("benben-miao/TOmicsVis")
CRAN:
https://cloud.r-project.org/web/packages/TOmicsVis/
Stable version:
install.packages("TOmicsVis")

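Package description:
TOmicsVis (Transcriptomics Visualization) integrates a complete visualization workflow for transcriptomics, from sample trait statistics to transcriptomic gene mining, and makes it available to biological researchers. It is organized into six progressive categories: ① Samples Statistics -> ② Traits Analysis -> ③ Differential Expression Analysis -> ④ Advanced Analysis -> ⑤ GO and KEGG Enrichment -> ⑥ Tables Operations.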

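TOmicsVis has been released on CRAN; to get the latest functions, installing it from GitHub is recommended. This long article walks through the package's 30+ visualization and table functions in detail and provides a standard example dataset for each one.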

Acknowledgements (R packages and their authors): ggplot2, ggpubr, ggcorrplot, venn, circlize, ComplexHeatmap, EnhancedVolcano, GGally, survminer, clusterProfiler, enrichplot, WGCNA, pheatmap.

Authors:
benben-miao (PhD, Xiamen University; developer of Hiplot (https://hiplot.cn) and BioSciTools)
https://github.com/benben-miao/
dongwei1220 (Postdoctoral researcher, Sun Yat-sen University; developer of Hiplot (https://hiplot.cn) and ShinySeurat)
https://github.com/dongwei1220/

2. The TOmicsVis R Package Ecosystem

2.1 GitHub Source Code Repository

https://github.com/benben-miao/TOmicsVis/

Figure 2.1 GitHub source code repository

2.2 Official CRAN Release

https://cloud.r-project.org/web/packages/TOmicsVis/

Figure 2.2 Official CRAN release

2.3 API Documentation Built with pkgdown

https://benben-miao.github.io/TOmicsVis/

Figure 2.3 API documentation built with pkgdown

3. TOmicsVis Visualization Functions

3.1 Samples Statistics

3.1.1 Quantile plot for visualizing data distribution.

# 1. Load quantile_data example dataset
data(quantile_data)

# 2. Run quantile_plot plot function
quantile_plot(
  quantile_data,
  my_shape = "fill_circle",
  point_size = 1.5,
  conf_int = TRUE,
  conf_level = 0.95,
  split_panel = "One_Panel",
  legend_pos = "right",
  legend_dir = "vertical",
  sci_fill_color = "Sci_AAAS",
  sci_color_alpha = 0.75,
  ggTheme = "theme_light"
)

Figure 3.1.1 Quantile plot visualizing data distribution
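For readers who want to see what a quantile plot actually compares, here is a minimal base-R sketch (an illustration, not the quantile_plot() implementation). It pairs the sorted observations of a simulated vector with theoretical normal quantiles, which are the same pairs that qqnorm() returns. The simulated vector x is a stand-in for one group of quantile_data, whose exact layout is not described here.

```r
# Simulated values standing in for one group of quantile_data (hypothetical)
set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)

# Sample quantiles are simply the sorted observations
sample_q <- sort(x)

# Theoretical standard-normal quantiles at the plotting positions ppoints(n)
theo_q <- qnorm(ppoints(length(x)))

# qqnorm() returns the same (theoretical, sample) pairs, in the original order of x
qq <- qqnorm(x, plot.it = FALSE)
stopifnot(isTRUE(all.equal(sort(qq$x), theo_q)),
          isTRUE(all.equal(sort(qq$y), sample_q)))

# Points close to a straight line suggest approximately normal data;
# the fitted intercept and slope estimate the mean and standard deviation
fit <- lm(sample_q ~ theo_q)
coef(fit)
```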

3.1.2 Correlation heatmap for samples/groups based on the Pearson algorithm.

# 1. Load gene_exp example dataset
data(gene_exp)

# 2. Run corr_heatmap plot function
corr_heatmap(
  gene_exp,
  corr_method = "pearson",
  cell_shape = "square",
  fill_type = "full",
  lable_size = 3,
  lable_digits = 3,
  color_low = "blue",
  color_mid = "white",
  color_high = "red",
  ggTheme = "theme_light"
)

Figure 3.1.2 Correlation heatmap of samples/groups based on the Pearson algorithm
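Each heatmap cell is a pairwise Pearson coefficient between two samples. The base-R sketch below assumes genes in rows and samples in columns (the layout of gene_exp is assumed, not confirmed here): cor() on the matrix gives the sample-by-sample matrix, and one entry is recomputed from the definition r = cov(x, y) / (sd(x) * sd(y)). Rounding to 3 digits is only meant to echo lable_digits = 3.

```r
# Hypothetical gene x sample matrix (rows = genes, columns = samples)
set.seed(42)
expr <- matrix(rexp(60, rate = 0.1), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("S", 1:3)))

# Pearson correlation between samples (columns)
r <- cor(expr, method = "pearson")

# The same coefficient for one pair, computed from the definition
s1 <- expr[, "S1"]
s2 <- expr[, "S2"]
r12 <- sum((s1 - mean(s1)) * (s2 - mean(s2))) /
  sqrt(sum((s1 - mean(s1))^2) * sum((s2 - mean(s2))^2))

# Symmetric 3 x 3 matrix with 1 on the diagonal
round(r, 3)
```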

3.1.3 PCA dimensionality reduction visualization for RNA-Seq.

# 1. Load pca_sample_gene and pca_group_sample example datasets
data(pca_sample_gene)
data(pca_group_sample)

# 2. Run pca_plot plot function
pca_plot(
  pca_sample_gene,
  pca_group_sample,
  point_size = 5,
  text_size = 5,
  ellipse_alpha = 0.3,
  legend_pos = "right",
  legend_dir = "vertical",
  ggTheme = "theme_light"
)

Figure 3.1.3 PCA dimensionality reduction visualization for RNA-Seq
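pca_plot() takes an expression table (pca_sample_gene) and a sample-to-group table (pca_group_sample). The following sketch uses base R's prcomp() on simulated data and is not the package's code. It shows the computation a PCA plot of this kind is typically built on. Samples become rows, and the percentage of variance captured by each component is what usually labels the PC1/PC2 axes.

```r
set.seed(7)
# Hypothetical log-expression matrix: 100 genes x 6 samples in two groups
expr <- matrix(rnorm(600, mean = 5), nrow = 100,
               dimnames = list(NULL, paste0(rep(c("A", "B"), each = 3), 1:3)))
expr[1:20, 4:6] <- expr[1:20, 4:6] + 3   # shift 20 genes in group B

# PCA on samples: transpose so rows = samples and columns = genes
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Percentage of total variance explained by each principal component
var_pct <- 100 * pca$sdev^2 / sum(pca$sdev^2)
round(var_pct[1:2], 1)

# Sample coordinates on PC1 and PC2 (what gets plotted and coloured by group)
scores <- pca$x[, 1:2]
scores
```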

3.1.4 Dendrogram for clustering multiple samples/groups.

# 1. Load gene_exp example dataset
data(gene_exp)

# 2. Run dendro_plot plot function
dendro_plot(
  gene_exp,
  dist_method = "euclidean",
  hc_method = "average",
  tree_type = "rectangle",
  k_num = 3,
  palette = "npg",
  color_labels_by_k = TRUE,
  horiz = TRUE,
  label_size = 0.8,
  line_width = 0.7,
  rect = TRUE,
  rect_fill = TRUE,
  title = "Cluster Dendrogram",
  xlab = "",
  ylab = "Height"
)

Figure 3.1.4 Dendrogram clustering multiple samples/groups

3.2 Traits Analysis
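Assuming dendro_plot() clusters the columns (samples) of gene_exp, its dist_method, hc_method and k_num arguments map naturally onto base R's dist(), hclust() and cutree(). This is a sketch on simulated data, not the package's code:

```r
set.seed(3)
# Hypothetical expression matrix: 40 genes x 6 samples
expr <- matrix(rnorm(40 * 6), nrow = 40,
               dimnames = list(NULL, paste0("S", 1:6)))

# Euclidean distances between samples, then average-linkage clustering
d  <- dist(t(expr), method = "euclidean")
hc <- hclust(d, method = "average")

# Cut the tree into 3 groups (cf. k_num = 3); the names are the sample IDs
groups <- cutree(hc, k = 3)
groups
# plot(hc) would draw the corresponding base-R dendrogram
```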

3.2.1 Box plot supporting two-level and multi-group comparisons with P values.

# 1. Load box_data example datasets
data(box_data)

# 2. Run box_plot plot function
box_plot(
  box_data,
  test_method = "t.test",
  test_label = "p.format",
  notch = TRUE,
  group_level = "Three_Column",
  add_element = "dotplot",
  my_shape = "fill_circle",
  sci_fill_color = "Sci_AAAS",
  sci_fill_alpha = 0.5,
  sci_color_alpha = 1,
  legend_pos = "right",
  legend_dir = "vertical",
  ggTheme = "theme_light"
)

Figure 3.2.1 Box plot with P values for two-level and multi-group comparisons
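test_method = "t.test" and test_label = "p.format" resemble the method and label options of ggpubr's stat_compare_means(), which is an acknowledged dependency. That mapping is an assumption. The statistic behind such a label, and the five numbers each box is drawn from, can be reproduced with base R on hypothetical data:

```r
set.seed(10)
# Hypothetical trait measurements for two groups
ctrl  <- rnorm(12, mean = 10, sd = 1.5)
treat <- rnorm(12, mean = 12, sd = 1.5)

# Two-sample t-test; R's default is Welch's test (var.equal = FALSE)
tt <- t.test(treat, ctrl)

# Formatted p-value text, similar in spirit to a "p.format" label
p_label <- paste0("p = ", format.pval(tt$p.value, digits = 2))
p_label

# Lower whisker, Q1, median, Q3, upper whisker of one box
boxplot.stats(ctrl)$stats
```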

3.2.2 Violin plot supporting two-level and multi-group comparisons with P values.

# 1. Load box_data example datasets
data(box_data)

# 2. Run violin_plot plot function
violin_plot(
  box_data,
  test_method = "t.test",
  test_label = "p.format",
  group_level = "Three_Column",
  violin_orientation = "vertical",
  add_element = "boxplot",
  element_alpha = 0.5,
  my_shape = "plus_times",
  sci_fill_color = "Sci_AAAS",
  sci_fill_alpha = 0.5,
  sci_color_alpha = 1,
  legend_pos = "right",
  legend_dir = "vertical",
  ggTheme = "theme_light"
)

Figure 3.2.2 Violin plot with P values for two-level and multi-group comparisons

3.2.3 Survival plot for analyzing and visualizing survival data.

# 1. Load survival_data example dataset
data(survival_data)

# 2. Run survival_plot plot function
survival_plot(
  survival_data,
  curve_function = "pct",
  conf_inter = TRUE,
  interval_style = "ribbon",
  risk_table = TRUE,
  num_censor = TRUE,
  sci_palette = "aaas",
  ggTheme = "theme_light",
  x_start = 0,
  y_start = 0,
  y_end = 100,
  x_break = 100,
  y_break = 25
)

Figure 3.2.3 Survival plot for analyzing and visualizing survival data
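A survival curve of this kind is a Kaplan-Meier estimate. survival_plot() presumably delegates the estimate to the survival and survminer packages. This is an assumption: curve_function = "pct" resembles the fun = "pct" option of survminer::ggsurvplot(), which plots survival as a percentage. The function below is a hypothetical, minimal re-implementation of the product-limit estimator on toy data and is included for illustration only.

```r
# Hypothetical follow-up times and event indicator (1 = event, 0 = censored)
time   <- c(5, 8, 8, 12, 15, 20, 22, 30)
status <- c(1, 1, 0,  1,  0,  1,  0,  1)

# Kaplan-Meier product-limit estimator evaluated at each distinct event time
km_estimate <- function(time, status) {
  event_times <- sort(unique(time[status == 1]))
  surv <- numeric(length(event_times))
  s <- 1
  for (i in seq_along(event_times)) {
    t_i <- event_times[i]
    n_risk  <- sum(time >= t_i)                 # subjects still at risk at t_i
    n_event <- sum(time == t_i & status == 1)   # events occurring at t_i
    s <- s * (1 - n_event / n_risk)             # multiply conditional survival
    surv[i] <- s
  }
  data.frame(time = event_times, surv = surv)
}

km <- km_estimate(time, status)
km$pct <- 100 * km$surv   # percentage scale, cf. curve_function = "pct"
km
```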

3.2.4 t-SNE plot for analyzing and visualizing data with the t-SNE algorithm.

# 1. Load tsne_data example dataset
data(tsne_data)

# 2. Run tsne_plot plot function
tsne_plot(
  tsne_data,
  seed = 5,
  point_size = 4,
  point_alpha = 0.8,
  text_size = 2,
  text_alpha = 0.8,
  ci_level = 0.95,
  ellipse_alpha = 0.3,
  sci_fill_color = "Sci_JAMA",
  sci_color_alpha = 0.9,
  legend_pos = "right",
  legend_dir = "vertical",
  ggTheme = "theme_light"
)

Figure 3.2.4 t-SNE plot for analyzing and visualizing data with the t-SNE algorithm

3.3 Differential Expression Analysis

3.3.1 Venn plot for counting common and unique genes among multiple sets.

# 1. Load venn_data example datasets
data(venn_data)

# 2. Run venn_plot plot function
venn_plot(
  venn_data,
  line_type = "blank",
  ellipse_shape = "circle",
  sci_fill_color = "Sci_AAAS",
  sci_fill_alpha = 0.65
)

Figure 3.3.1 Venn plot of common and unique genes among multiple sets
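The counts in a Venn (or flower) diagram come from plain set operations. The base-R sketch below uses three hypothetical gene sets. The structure of venn_data itself is not assumed. The sketch computes the genes shared by all sets and the genes unique to each set.

```r
# Hypothetical gene sets, e.g. DEGs from three comparisons
sets <- list(
  A = c("g1", "g2", "g3", "g4", "g5"),
  B = c("g3", "g4", "g5", "g6"),
  C = c("g5", "g6", "g7")
)

# Genes present in every set (the centre of the diagram)
common <- Reduce(intersect, sets)

# Genes present in one set and in none of the others
unique_genes <- lapply(names(sets), function(s) {
  setdiff(sets[[s]], unlist(sets[names(sets) != s]))
})
names(unique_genes) <- names(sets)

common                  # "g5"
lengths(unique_genes)   # A = 2, B = 0, C = 1
```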

3.3.2 Flower plot for counting common and unique genes among multiple sets.

# 1. Load venn_data example dataset
data(venn_data)

# 2. Run flower_plot plot function
flower_plot(
  venn_data,
  angle = 90,
  a = 0.5,
  b = 2,
  r = 1,
  ellipse_col_pal = "Spectral",
  circle_col = "white",
  label_text_cex = 1
)

Figure 3.3.2 Flower plot of common and unique genes among multiple sets

3.3.3 Circos heatmap for visualizing gene expression in multiple samples.

# 1. Load circos_heatmap_data example datasets
data(circos_heatmap_data)

# 2. Run circos_heatmap plot function
circos_heatmap(
  circos_heatmap_data,
  low_color = "#0000ff",
  mid_color = "#ffffff",
  high_color = "#ff0000",
  gap_size = 10,
  cluster_method = "complete",
  distance_method = "euclidean",
  dend_height = 0.2,
  rowname_size = 0.8
)

Figure 3.3.3 Circos heatmap visualizing gene expression in multiple samples

3.3.4 Volcano plot for visualizing differentially expressed genes.

# 1. Load deg_data example datasets
data(deg_data)

# 2. Run volcano_plot plot function
volcano_plot(
  deg_data,
  log2fc_cutoff = 1,
  pq_value = "pvalue",
  pq_cutoff = 0.005,
  cutoff_line = "longdash",
  point_shape = "large_circle",
  point_size = 1,
  point_alpha = 0.5,
  color_normal = "#888888",
  color_log2fc = "#008000",
  color_pvalue = "#0088ee",
  color_Log2fc_p = "#ff0000",
  label_size = 3,
  boxed_labels = FALSE,
  draw_connectors = FALSE,
  legend_pos = "right"
)

Figure 3.3.4 Volcano plot of differentially expressed genes
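volcano_plot() takes four colour arguments, which suggests four point classes: not significant, fold change only, p-value only, and both. The sketch below classifies a hypothetical DEG table in base R. It assumes the convention used by EnhancedVolcano (an acknowledged dependency), namely |log2FC| > cutoff and p < cutoff, with strict inequalities. That volcano_plot() uses exactly this rule is not confirmed here.

```r
# Hypothetical DEG table: log2 fold change and p-value for six genes
deg <- data.frame(
  gene   = paste0("g", 1:6),
  log2fc = c(2.5, -1.8, 0.3, 1.2, -0.2, -3.0),
  pvalue = c(1e-4, 0.2, 1e-5, 0.01, 0.5, 1e-6),
  stringsAsFactors = FALSE
)
log2fc_cutoff <- 1      # cf. log2fc_cutoff = 1
pq_cutoff     <- 0.005  # cf. pq_cutoff = 0.005

pass_fc <- abs(deg$log2fc) > log2fc_cutoff
pass_p  <- deg$pvalue < pq_cutoff

# One class per colour argument:
# NS -> color_normal, FC -> color_log2fc, P -> color_pvalue, FC_P -> color_Log2fc_p
deg$class <- ifelse(pass_fc & pass_p, "FC_P",
             ifelse(pass_fc, "FC",
             ifelse(pass_p, "P", "NS")))

# Volcano coordinates: x = log2 fold change, y = -log10(p-value)
deg$neg_log10_p <- -log10(deg$pvalue)
deg
```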

3.3.5 MA (M versus A) plot for visualizing differentially expressed genes.

# 1. Load deg_data2 example dataset
data(deg_data2)

# 2. Run ma_plot plot function
ma_plot(
  deg_data2,
  foldchange = 2,
  fdr_value = 0.05,
  point_size = 0.5,
  color_up = "#FF0000",
  color_down = "#008800",
  color_alpha = 0.5,
  top_method = "fc",
  top_num = 20,
  label_size = 8,
  label_box = TRUE,
  title = "Group1 -versus- Group2",
  xlab = "Log2 mean expression",
  ylab = "Log2 fold change",
  ggTheme = "theme_minimal"
)

Figure 3.3.5 MA plot of differentially expressed genes

3.3.6 Grouped heatmap for visualizing grouped gene expression data.
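An MA plot shows each gene's M = log2(group2 / group1) on the y-axis against A, the average of its two log2 expression levels, on the x-axis. The minimal base-R sketch below uses hypothetical normalized means and applies only the fold-change threshold, where foldchange = 2 corresponds to |M| >= 1. ma_plot() additionally filters on fdr_value, which would need per-gene FDR values that are not modelled here.

```r
# Hypothetical normalized mean expression of five genes in two groups
g1 <- c(100, 50,  8, 400, 30)
g2 <- c( 25, 55, 32, 410, 30)

M <- log2(g2 / g1)                # log2 fold change, group2 vs group1
A <- 0.5 * (log2(g1) + log2(g2))  # average log2 expression

foldchange <- 2
is_up   <- M >=  log2(foldchange)
is_down <- M <= -log2(foldchange)
data.frame(M = round(M, 3), A = round(A, 3), is_up, is_down)
```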

# 1. Load example datasets
data(heatmap_group_data)

# 2. Run heatmap_group plot function
heatmap_group(
  data = heatmap_group_data,
  scale_data = "none",
  clust_method = "complete",
  border_show = TRUE,
  value_show = TRUE,
  low_color = "#00880088",
  mid_color = "#ffffff",
  high_color = "#ff000088",
  na_color = "#ff8800",
  x_angle = 45
)

Figure 3.3.6 Grouped heatmap of gene expression data

3.4 Advanced Analysis

3.4.1 Trend plot for visualizing gene expression trend profiles across multiple traits.
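heatmap_group() is called here with scale_data = "none". Suppose it also accepts row scaling in the manner of pheatmap's scale = "row" (an assumption based on the argument name and the pheatmap dependency). Each gene would then be converted to a z-score before colouring. The base-R sketch below shows why this matters: genes with the same pattern but very different magnitudes end up with identical colours.

```r
# Hypothetical grouped expression matrix (genes x groups)
expr <- matrix(c( 1,  2,  3,
                 10, 20, 30,
                  5,  5,  8),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("geneA", "geneB", "geneC"), c("G1", "G2", "G3")))

# Row z-score: centre each gene on its mean and divide by its standard deviation
# (scale() works column-wise, hence the double transpose)
z <- t(scale(t(expr)))
round(z, 3)
# geneA and geneB share the pattern (-1, 0, 1) despite a 10-fold difference in level
```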

# 1. Load trend_data example dataset
data(trend_data)

# 2. Run trend_plot plot function
trend_plot(
  trend_data,
  scale_method = "globalminmax",
  miss_value = "exclude",
  line_alpha = 0.5,
  show_points = TRUE,
  show_boxplot = TRUE,
  num_column = 2,
  xlab = "Traits",
  ylab = "Genes Expression",
  sci_fill_color = "Sci_AAAS",
  sci_fill_alpha = 0.8,
  sci_color_alpha = 0.8,
  legend_pos = "right",
  legend_dir = "vertical",
  ggTheme = "theme_light"
)

Figure 3.4.1 Trend plot of gene expression profiles across multiple traits

3.4.2 Gene cluster trend plot for visualizing gene expression trend profiles across multiple samples.
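The scale_method and miss_value arguments appear to mirror the scale and missing options of GGally::ggparcoord(), since GGally is among the acknowledged packages. This is an assumption. Under that reading, "globalminmax" leaves the values unscaled, so the y-axis spans the global minimum and maximum. ggparcoord's alternative "uniminmax" instead rescales each variable to [0, 1]. The base-R sketch below contrasts the two on a toy table.

```r
# Hypothetical expression of three genes (rows) across four traits (columns)
expr <- data.frame(T1 = c(2, 8, 5), T2 = c(4, 6, 5),
                   T3 = c(6, 2, 9), T4 = c(10, 1, 7))

# "globalminmax"-style: values untouched; the axis covers the global range
range(unlist(expr))   # 1 10

# "uniminmax"-style: each trait (column) rescaled to [0, 1] independently
minmax <- function(v) (v - min(v)) / (max(v) - min(v))
uni <- as.data.frame(lapply(expr, minmax))
uni
```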

# 1. Load example datasets
data(gene_cluster_data)

# 2. Run plot function
gene_cluster_trend(
  gene_cluster_data,
  thres = 0.25,
  min_std = 0.2,
  palette = "PiYG",
  cluster_num = 4
)

Figure 3.4.2 Gene cluster trend plot of gene expression profiles across multiple samples

3.4.3 Gene ranking dot plot for visualizing differentially expressed genes.

# 1. Load example datasets
data(deg_data)

# 2. Run plot function
gene_rank_plot(
  data = deg_data,
  log2fc = 1,
  palette = "Spectral",
  top_n = 10,
  genes_to_label = NULL,
  label_size = 5,
  base_size = 12,
  title = "Gene ranking dotplot",
  xlab = "Ranking of differentially expressed genes",
  ylab = "Log2FoldChange"
)

Figure 3.4.3 Gene ranking dot plot of differentially expressed genes

3.4.4 WGCNA analysis pipeline for RNA-Seq.

# 1. Load wgcna_pipeline example datasets
data(wgcna_gene_exp)
data(wgcna_sample_group)

# 2. Run wgcna_pipeline plot function
# wgcna_pipeline(wgcna_gene_exp, wgcna_sample_group)

3.4.5 Network plot for analyzing and visualizing relationships between genes.

# 1. Load example datasets
data(network_data)
head(network_data)

# 2. Run network_plot plot function
network_plot(
  network_data,
  calcBy = "degree",
  degreeValue = 0.05,
  nodeColorNormal = "#00888888",
  nodeBorderColor = "#FFFFFF",
  nodeColorFrom = "#FF000088",
  nodeColorTo = "#00880088",
  nodeShapeNormal = "circle",
  nodeShapeSpatial = "csquare",
  nodeSize = 10,
  labelSize = 0.5,
  edgeCurved = TRUE,
  netLayout = "layout_on_sphere"
)

Figure 3.4.5 Network plot of relationships between genes

3.4.6 Heatmap cluster plot for visualizing clustered gene expression data.
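calcBy = "degree" suggests that nodes are ranked by degree. The values netLayout = "layout_on_sphere" and nodeShapeSpatial = "csquare" match igraph layout and vertex-shape names, so network_plot() is presumably built on igraph. How degreeValue = 0.05 selects nodes is not documented in this article. The sketch below therefore only computes node degree from a hypothetical edge list with base R, and uses "maximum degree" as a purely hypothetical hub rule.

```r
# Hypothetical undirected gene-gene edge list
edges <- data.frame(
  from = c("g1", "g1", "g1", "g2", "g3"),
  to   = c("g2", "g3", "g4", "g3", "g5"),
  stringsAsFactors = FALSE
)

# Degree = number of edges touching each node (each edge counts at both ends)
deg <- table(c(edges$from, edges$to))
deg

# Hypothetical hub rule: the node(s) with the highest degree
hubs <- names(deg)[as.vector(deg) == max(deg)]
hubs   # "g1" "g3"
```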

# 1. Load example datasets
data(gene_exp)
head(gene_exp)

# 2. Run heatmap_cluster plot function
heatmap_cluster(
  data = gene_exp,
  dist_method = "euclidean",
  hc_method = "average",
  k_num = 5,
  palette = "Spectral",
  cluster_pal = "Set1",
  gaps_col = NULL,
  angle_col = 45,
  label_size = 10,
  base_size = 12
)

Figure 3.4.6 Heatmap cluster plot of clustered gene expression data

3.5 GO & KEGG Enrichment

3.5.1 Chord plot for visualizing the relationships between pathways and genes.

# 1. Load chord_data example datasets
data(chord_data)

# 2. Run chord_plot plot function
chord_plot(
  chord_data,
  multi_colors = "RainbowColors",
  color_alpha = 0.5,
  link_visible = TRUE,
  link_dir = -1,
  link_type = "diffHeight",
  sector_scale = "Origin",
  width_circle = 3,
  dist_name = 3,
  label_dir = "Vertical",
  dist_label = 0.3
)

Figure 3.5.1 Chord plot of relationships between pathways and genes

3.5.2 GO enrichment analysis and statistics plot based on GO annotation results (with or without a reference genome).

# 1. Load example datasets
data(go_anno)
# head(go_anno)

data(go_deg_fc)
# head(go_deg_fc)

# 2. Run go_enrich_stat analysis function
go_enrich_stat(
  go_anno,
  go_deg_fc,
  padjust_method = "fdr",
  pvalue_cutoff = 0.5,
  qvalue_cutoff = 0.5,
  max_go_item = 15,
  strip_fill = "#CDCDCD",
  xtext_angle = 45,
  sci_fill_color = "Sci_AAAS",
  sci_fill_alpha = 0.8,
  ggTheme = "theme_light"
)

Figure 3.5.2 GO enrichment statistics plot based on GO annotation results

3.5.3 GO enrichment analysis and bar plot based on GO annotation results (with or without a reference genome).
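clusterProfiler, which the package acknowledges, tests each GO term for over-representation with a hypergeometric test and then adjusts the p-values across terms. The sketch below shows that calculation for a single term with made-up counts. It then applies p.adjust(), in which "fdr" is an alias for Benjamini-Hochberg. It illustrates the statistic only; which counts go_enrich_stat() actually passes on is an assumption. Note that the example above uses very permissive cutoffs (pvalue_cutoff = 0.5).

```r
# Hypothetical counts for a single GO term
N <- 1000  # annotated background genes
M <- 50    # background genes annotated to this term
n <- 100   # annotated differentially expressed genes (DEGs)
k <- 15    # DEGs annotated to this term

# Over-representation p-value: P(X >= k) with X ~ Hypergeometric(M, N - M, n)
p <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

# Overlap expected if the DEGs were a random draw, and the enrichment ratio
expected <- n * M / N     # 5
ratio    <- k / expected  # 3

# Multiple-testing correction across several terms
pvals <- c(term1 = p, term2 = 0.02, term3 = 0.3)
p.adjust(pvals, method = "fdr")
```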

# 1. Load example datasets
data(go_anno)
# head(go_anno)

data(go_deg_fc)
# head(go_deg_fc)

# 2. Run go_enrich_bar analysis function
go_enrich_bar(
  go_anno,
  go_deg_fc,
  padjust_method = "fdr",
  pvalue_cutoff = 0.5,
  qvalue_cutoff = 0.5,
  sign_by = "p.adjust",
  category_num = 30,
  font_size = 12,
  low_color = "#ff0000aa",
  high_color = "#008800aa",
  ggTheme = "theme_light"
)

Figure 3.5.3 GO enrichment bar plot based on GO annotation results

3.5.4 GO enrichment analysis and dot plot based on GO annotation results (with or without a reference genome).

# 1. Load example datasets
data(go_anno)
# head(go_anno)

data(go_deg_fc)
# head(go_deg_fc)

# 2. Run go_enrich_dot analysis function
go_enrich_dot(
  go_anno,
  go_deg_fc,
  padjust_method = "fdr",
  pvalue_cutoff = 0.5,
  qvalue_cutoff = 0.5,
  sign_by = "p.adjust",
  category_num = 30,
  font_size = 12,
  low_color = "#ff0000aa",
  high_color = "#008800aa",
  ggTheme = "theme_light"
)

Figure 3.5.4 GO enrichment dot plot based on GO annotation results

3.5.5 GO enrichment analysis and tree plot based on GO annotation results (with or without a reference genome).

# 1. Load example datasets
data(go_anno)
# head(go_anno)

data(go_deg_fc)
# head(go_deg_fc)

# 2. Run go_enrich_tree analysis function
go_enrich_tree(
  go_anno,
  go_deg_fc,
  padjust_method = "fdr",
  pvalue_cutoff = 0.5,
  qvalue_cutoff = 0.5,
  sign_by = "p.adjust",
  category_num = 20,
  font_size = 4,
  low_color = "#ff0000aa",
  high_color = "#008800aa",
  hclust_method = "complete",
  ggTheme = "theme_light"
)

Figure 3.5.5 GO enrichment tree plot based on GO annotation results

3.5.6 GO enrichment analysis and network plot based on GO annotation results (with or without a reference genome).

# 1. Load example datasets
data(go_anno)
# head(go_anno)

data(go_deg_fc)
# head(go_deg_fc)

# 2. Run go_enrich_net analysis function
go_enrich_net(
  go_anno,
  go_deg_fc,
  padjust_method = "fdr",
  pvalue_cutoff = 0.5,
  qvalue_cutoff = 0.5,
  category_num = 20,
  net_layout = "circle",
  net_circular = TRUE,
  low_color = "#ff0000aa",
  high_color = "#008800aa"
)

Figure 3.5.6 GO enrichment network plot based on GO annotation results

3.6 Tables Operations

3.6.1 Table split: split a grouped column into multiple columns.

# 1. Load table_split_data example datasets
data(table_split_data)
head(table_split_data)

# 2. Run table_split function
res <- table_split(
  table_split_data,
  grouped_var = "variable",
  miss_drop = TRUE
)
head(res)
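table_split() appears to turn a long table into a wide one, with one new column per level of grouped_var. Assuming that behaviour, base R's reshape() produces the equivalent result on a toy table. The column names id, variable and value are hypothetical, and na.omit() stands in for miss_drop = TRUE.

```r
# Hypothetical long table: one row per (id, variable) measurement
long <- data.frame(
  id       = c(1, 1, 2, 2, 3),
  variable = c("height", "weight", "height", "weight", "height"),
  value    = c(170, 65, 160, 55, 180),
  stringsAsFactors = FALSE
)

# Long -> wide: one column per level of "variable"
wide <- reshape(long, idvar = "id", timevar = "variable", direction = "wide")
wide   # columns id, value.height, value.weight; id 3 has NA weight

# Drop rows with missing values, in the spirit of miss_drop = TRUE
complete <- na.omit(wide)
```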

3.6.2 Table merge: merge multiple variables into one variable.

# 1. Load example datasets
data(table_merge_data)
head(table_merge_data)

# 2. Run function
res <- table_merge(
  table_merge_data,
  merge_vars = c("Ozone", "Solar.R", "Wind", "Temp"),
  new_var = "Variable",
  new_value = "Value",
  na_remove = FALSE
)
head(res)
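The merge_vars in this example (Ozone, Solar.R, Wind, Temp) are the columns of R's built-in airquality dataset. This suggests table_merge() is a wide-to-long conversion. Assuming that, base R's reshape() builds a comparable long table with Variable and Value columns. Keeping the NA rows corresponds to na_remove = FALSE. The exact output layout of table_merge() is not assumed.

```r
# First five rows of the built-in airquality data (row 5 contains two NAs)
aq <- head(airquality, 5)
vars <- c("Ozone", "Solar.R", "Wind", "Temp")

# Wide -> long: one row per (original row, variable) pair
long <- reshape(aq, direction = "long",
                varying = vars, v.names = "Value",
                timevar = "Variable", times = vars,
                idvar = "row_id")
long <- long[order(long$row_id), c("row_id", "Month", "Day", "Variable", "Value")]
head(long, 8)

# NAs are kept (cf. na_remove = FALSE); na.omit() would drop them
nrow(long)            # 20 = 5 rows x 4 variables
nrow(na.omit(long))   # 18
```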

3.6.3 Table filter: filter rows by column conditions.

# 1. Load example datasets
data(table_filter_data)
head(table_filter_data)

# 2. Run function
res <- table_filter(table_filter_data, 
                    height > 100 & eye_color == "black"
                    )
head(res)
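table_filter() takes a row condition written with bare column names, much like base R's subset(). The sketch below applies the same condition with subset() to a small hypothetical table whose columns mirror the example. One detail is worth knowing: subset() treats an NA condition as FALSE and drops that row, whereas plain logical indexing returns an all-NA row instead.

```r
# Hypothetical table with the same column names as the example
people <- data.frame(
  name      = c("A", "B", "C", "D", "E"),
  height    = c(172, 96, 202, 150, NA),
  eye_color = c("black", "black", "yellow", "black", "black"),
  stringsAsFactors = FALSE
)

# Same condition as in the table_filter() example above
res <- subset(people, height > 100 & eye_color == "black")
res$name   # "A" "D": E (missing height) is dropped

# Logical indexing keeps an all-NA row for E instead
nrow(people[people$height > 100 & people$eye_color == "black", ])   # 3
```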

3.6.4 Table cross: cross-search two tables and merge the results.

# 1. Load example datasets
data(table_cross_data1)
head(table_cross_data1)

data(table_cross_data2)
head(table_cross_data2)

# 2. Run function
res <- table_cross(
  table_cross_data1,
  table_cross_data2,
  inter_var = "geneID",
  left_index = TRUE,
  right_index = FALSE
)
head(res)
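table_cross() joins two tables on a shared key (inter_var = "geneID"). Suppose left_index = TRUE and right_index = FALSE mean "keep all rows of the first table and only the matched rows of the second", that is, a left join (an assumption the source does not confirm). Base R's merge() then expresses the same operation. The tables below are hypothetical.

```r
t1 <- data.frame(geneID = c("g1", "g2", "g3"),
                 log2fc = c(2.1, -1.5, 0.8),
                 stringsAsFactors = FALSE)
t2 <- data.frame(geneID = c("g1", "g3", "g4"),
                 annotation = c("desc_g1", "desc_g3", "desc_g4"),
                 stringsAsFactors = FALSE)

# Left join on geneID: every row of t1 is kept, g2 (no match) gets NA,
# and g4 (only in t2) is dropped
res <- merge(t1, t2, by = "geneID", all.x = TRUE, all.y = FALSE)
res
```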

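Acknowledgements: thanks to the many analysis and visualization R packages available from CRAN and Bioconductor. Feedback and contributions are welcome.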
