有些软件总能以某种方式劝退你 | SpectralTAD 之 T

2023-11-24  本文已影响0人  生信云笔记

序 言

  有些软件总能以某种形式劝退你,让你毫不犹豫的投入其他软件的怀抱,外加产生一种不吐不快的情愫,激发一种唯恐他人也会“遭此毒手”的正义感。软件之所以叫做软件,大抵是因为经过包装与测试,比较成熟且有很大的兼容性,至少按照文档来运行应该不会问题,如果动不动就出错那与脚本何异?
  生信分析软件很大一部都属于小众软件,有些软件也许本身的出发点就是科研与文章,而好不好用能不能让更多的人使用并不重要。遇到这样的软件,那就启用咱们的万能躺平招式:分析软件千千万,不行咱就换一换。同时,对此类软件应该保持敬畏之心,那句话怎么说来着:明知山有虎,猛敲退堂鼓。不然浪费的可是自己宝贵的时间,下面咱就一起来看看今天的主角。

缘 起

  为什么会用到SpectralTAD呢?这都源自于做TAD差异分析时使用的TADCompare软件,关于该软件也写过两篇帖子:[TADCompare:差异TAD分析],[TADCompare 差异分析要留心]。TADCompare分析时可以接受预定于的TAD文件,而软件文档中call TAD用的就是SpectralTAD,这便有了下面的事情。
  SpectralTAD一个专门用于从HiC接触矩阵中做TAD calling的R包,其使用一种基于滑动窗口的改进版光谱聚类方法来快速检测TAD。SpectralTAD将接触矩阵作为输入,并输出BED格式的数据框,其中包含TAD相对应的坐标位置。该工具包含SpectralTADSpectralTAD_Par两个函数,分别为单CPU和多CPU模式。另外,该包接受多种格式,如n × nn × ( n + 3 )、3列等。
  至此,感觉SpectralTAD一切还挺正常,跟着文档代码走一遭才知该软件非常人所能驾驭也,一起来看看怎么回事:

文档示例:


自己测试:

devtools::install_github("dozmorovlab/SpectralTAD")
library(SpectralTAD)

data("rao_chr20_25_rep")
tads = SpectralTAD(rao_chr20_25_rep, chr = "chr20", levels = 2, qual_filter = FALSE)
Converting to n x n matrix
Matrix dimensions: 2517x2517
Error in `vec_init()`:
! `x` must be a vector, not `NULL`.
Run `rlang::last_trace()` to see where the error occurred.
Warning messages:
1: Unknown or uninitialised column: `Group`.
2: Unknown or uninitialised column: `Group`.

  这错误来的触不及防,也让人有点摸不到头脑。咋滴,这软件还认人不成?当然,软件不可能因人而异,但会因环境而异。经过一番倒腾,发现下面这一段代码:

else {
    gap_order = order(-point_dist)
    sil_score = c()
    for (cluster in clusters) {
        k = 1
        partition_found = 0
        first_run = TRUE
        cutpoints = c()
        while (partition_found == 0) {
          new_gap = gap_order[k]
          cutpoints = c(cutpoints, new_gap)
          diff_points = which(abs(new_gap - cutpoints[-length(cutpoints)]) <=
            min_size)
          if (length(diff_points) > 0) {
            cutpoints = cutpoints[-length(cutpoints)]
          }
          if (length(cutpoints) == cluster) {
            partition_found = 1
          }
          else {
            k = k + 1
          }
        }
        if (any(is.na(cutpoints))) {
          next
        }
        cutpoints = cutpoints[order(cutpoints)]
        cutpoints = c(1, cutpoints, length(non_gaps_within) +
          1)
        group_size = diff(cutpoints)
        memberships = c()
        for (i in seq_len(length(group_size))) {
          memberships = c(memberships, rep(i, times = group_size[i]))
        }
        sil = summary(cluster::silhouette(memberships,
          dist_sub))
        sil_score = c(sil_score, sil$si.summary[4])
        Group_mem[[cluster]] = memberships
    }
    end_group = Group_mem[[which(diff(sil_score) < 0)[1]]]
    if (length(end_group) == 0) {
        end_group = dplyr::bind_rows()
    }
    else {
        end_group = data.frame(ID = as.numeric(colnames(sub_filt)),
          Group = end_group)
        end_group = end_group %>% dplyr::mutate(group_place = Group) %>%
          dplyr::group_by(group_place) %>% dplyr::mutate(Group = max(ID)) %>%
          ungroup() %>% dplyr::select(ID, Group)
    }
}
if (end == nrow(cont_mat)) {
    Group_over = dplyr::bind_rows(Group_over, end_group)
    end_loop = 1
}
else {
    end_IDs = which(end_group$Group == last(end_group$Group))
    start = end - length(end_IDs) + 1
    if (length(start) == 0) {
        start = end
    }
    end = start + window_size
    end_group = end_group[-end_IDs, ]
    Group_over = dplyr::bind_rows(Group_over, end_group)
    if ((end + (2e+06/resolution)) > nrow(cont_mat)) {
        end = nrow(cont_mat)
    }
}

  运行文档的代码之所以出错是由于end_group没有结果,dplyr::bind_rows(Group_over, end_group)合并数据时引发了错误,也由此找到了相关联的参数min_size

tads = SpectralTAD(rao_chr20_25_rep, chr = "chr20", levels = 2, qual_filter = FALSE, min_size=4)
Error in `vec_init()`:
! `x` must be a vector, not `NULL`.
Run `rlang::last_trace()` to see where the error occurred.
Warning messages:
1: Unknown or uninitialised column: `Group`.
2: Unknown or uninitialised column: `Group`.

tads = SpectralTAD(rao_chr20_25_rep, chr = "chr20", levels = 2, qual_filter = FALSE, min_size=3)
Error in `vec_init()`:
! `x` must be a vector, not `NULL`.
Run `rlang::last_trace()` to see where the error occurred.
Warning messages:
1: Unknown or uninitialised column: `Group`.
2: Unknown or uninitialised column: `Group`.

tads = SpectralTAD(rao_chr20_25_rep, chr = "chr20", levels = 2, qual_filter = FALSE, min_size=2)
 head(tads$Level_1)
# A tibble: 6 × 4
  chr     start     end Level
  <chr>   <dbl>   <dbl> <dbl>
1 chr20   50000  325000     1
2 chr20  325000  525000     1
3 chr20  525000  775000     1
4 chr20  775000 1200000     1
5 chr20 1200000 1450000     1
6 chr20 1450000 1775000     1

  当自以为可以正常运行时,软件又告诉你没那么简单。帮助信息中虽提示可以通过out_format参数修改输出格式:juiceboxbedpehicexplorerbed,但是一运行又掉入另外一个坑:

tads = SpectralTAD(rao_chr20_25_rep, chr = "chr20", levels = 2, qual_filter = FALSE, min_size=2, out_format='juicebox')
New names:
• `chr` -> `chr...1`
• `start` -> `start...2`
• `end` -> `end...3`
• `chr` -> `chr...4`
• `start` -> `start...5`
• `end` -> `end...6`
Error in `mutate()`:
ℹ In argument: `start = format(start, scientific = FALSE)`.
Caused by error:
! `start` must be size 1709 or 1, not 14.

  算了,真心累了,不再折腾了,这个时候只想对软件说一句:就此别过,后会无期!

结 语

  分析时还是挑知名度好,出场率高的软件吧,毕竟这样的软件更加成熟,即使出了问题也能快速找到相应的解决办法,用不着花费大量的时间在软件上面。除非没有更好的选择,否则还是不要啃硬骨头的好,不然时间浪费不说还可能消化不良,不如一开始就避而远之。
  当然,不可否认,造成软件无法正常运行的原因很可能是与开发者的环境不同。所以,分析时使用软件更多的时候都是在与软件的环境和参数作斗争,躺不平的时候还要支棱起来:生信路不平,还需要缝缝补补。

往期回顾

生信分析不只是跑个软件 | TADCompare 差异分析要留心
HiC术语图解与分析软件汇总
R语言揭秘 | $符鲜为人知的秘密,避坑预警
scRNA-seq稀疏矩阵图解,格式转换的核心
scRNAseq | h5文件转化为matrix表达矩阵

上一篇 下一篇

猜你喜欢

热点阅读