Three buffs and you still can't getGEO?

2023-02-16, by 小洁忘了怎么分身

The problem

A student hit an error when running getGEO: the download timed out after 120000 milliseconds.

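Buff 1: the timeout setting

One look and I could see it was a timeout. I quickly sent him a link and told him to get in the habit of searching for himself...

The core fix is a single line of code: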

options(timeout=100000)
getOption("timeout")
## [1] 1e+05

This lifts R's default 60-second limit on downloads. I had used it before to fix a similar error:

Timeout of 60 seconds was reached

He reported back: no effect, same error.

Buff 2: speeding things up
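For reference, here is a minimal sketch of setting the timeout for one session and putting the old value back afterwards. It uses only base R; the variable names `old`, `raised`, and `restored` are made up for illustration.

```r
# options() returns the previous values, so they can be restored later
old <- options(timeout = 100000)

# The raised limit is now what download code that honours this option will see
raised <- getOption("timeout")
print(raised)

# Restore whatever the session had before
options(old)
restored <- getOption("timeout")
```

Saving the return value of `options()` avoids leaving a very long timeout active for the rest of the session.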

options( 'download.file.method.GEOquery' = 'libcurl' )

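This tells GEOquery to download through libcurl, which should speed up access. If it is still slow after that, there is not much you can do except wait it out.

Again no effect; the error was still the same.

Buff 3: geoChina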
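A small guard can keep this setting from being applied on an R build without libcurl support. This is only a sketch: `capabilities("libcurl")` is base R, but whether a given GEOquery version still reads this option is not something this post establishes.

```r
# Only request libcurl when this R build actually supports it
if (capabilities("libcurl")) {
  options(download.file.method.GEOquery = "libcurl")
} else {
  message("libcurl is not available in this R build; leaving the option unset")
}

# Inspect what ended up being set (NULL if the guard skipped it)
getOption("download.file.method.GEOquery")
```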

Since this is GEO microarray data, Boss Zeng's (曾老板) mirror surely deserves some screen time.

#install.packages("AnnoProbe")
library(AnnoProbe)
a = geoChina("GSE148601",destdir = ".")

## Error in geoChina("GSE148601", destdir = "."): Your GSE may not be expression by array, or even not a GSE

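This time the error says the dataset is not an array.

The real reason is that the dataset is too new: it dates from September 2022, which makes it younger than my son.

So AnnoProbe has not indexed it yet! Don't worry, look, he is already working on it.

For a dataset that has been indexed, it works like this: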

b = geoChina("GSE42872",destdir = ".")
class(b[[1]])

## [1] "ExpressionSet"
## attr(,"package")
## [1] "Biobase"

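From there you can plug straight into the usual code for extracting the expression matrix and the clinical information.

Or you can use tinyarray to simplify things further: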
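A minimal sketch of that extraction step, assuming the Biobase package is installed. To stay runnable offline it builds a tiny made-up ExpressionSet (the probe IDs, sample IDs, and values are invented); with real data you would use `b[[1]]` from the geoChina call instead.

```r
library(Biobase)

# Toy stand-in for the ExpressionSet that geoChina() returns (invented values)
expr_mat <- matrix(
  c(5.1, 6.2, 7.3, 8.4, 9.5, 10.6),
  nrow = 3,
  dimnames = list(c("probe1", "probe2", "probe3"), c("GSM1", "GSM2"))
)
pheno <- data.frame(
  group = c("control", "treated"),
  row.names = c("GSM1", "GSM2")
)
eset <- ExpressionSet(
  assayData = expr_mat,
  phenoData = AnnotatedDataFrame(pheno)
)

# Expression matrix: probes in rows, samples in columns
exp <- exprs(eset)

# Clinical (phenotype) information: one row per sample
pd <- pData(eset)

# Sample IDs should line up between the two tables
stopifnot(identical(colnames(exp), rownames(pd)))
```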

#install.packages("tinyarray")
library(tinyarray)
geo = geo_download("GSE42872",by_annopbrobe = T)
names(geo)

## [1] "exp" "pd"  "gpl"

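One step gets you the expression matrix, the clinical information, and the GPL ID.

Buff 4: patch the package!

Since the first three buffs did not solve it, the amazing Teacher Xiaojie had to step in herself.

After some searching, it turned out that the downloadFile function was changed in a GEOquery update.

In version 2.62 it looks like this: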
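Because the mirror fails on datasets it has not indexed, one option is to try it first and fall back to another source. The helper below is a hypothetical sketch in base R (`first_success` is not part of AnnoProbe, tinyarray, or GEOquery), and the two attempts are stand-ins so the example runs without network access.

```r
# Try each downloader in order and return the first one that succeeds.
# `attempts` is a named list of zero-argument functions.
first_success <- function(attempts) {
  for (name in names(attempts)) {
    result <- tryCatch(
      attempts[[name]](),
      error = function(e) {
        message(name, " failed: ", conditionMessage(e))
        NULL
      }
    )
    if (!is.null(result)) {
      return(list(source = name, data = result))
    }
  }
  stop("All download attempts failed")
}

# Stand-ins for the real calls, so the sketch is self-contained
attempts <- list(
  mirror = function() stop("Your GSE may not be expression by array, or even not a GSE"),
  geoquery = function() "placeholder for an ExpressionSet"
)
res <- first_success(attempts)
res$source
```

In real use the stand-ins would be replaced by calls such as `function() AnnoProbe::geoChina("GSE148601", destdir = ".")` and `function() GEOquery::getGEO("GSE148601", destdir = ".")`.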

function (url, destfile, mode, quiet = TRUE) 
{
  h <- curl::new_handle()
  curl::handle_setheaders(h, `accept-encoding` = "gzip")
  result = tryCatch({
    curl::curl_download(url, destfile, mode = mode, quiet = quiet, 
      handle = h)
    return(TRUE)
  }, error = function(e) {
    message(e)
    return(FALSE)
  })
  message("File stored at:")
  message(destfile)
  if (!result) {
    if (file.exists(destfile)) {
      file.remove(destfile)
    }
    stop(sprintf("Failed to download %s!", destfile))
  }
  return(0)
}

In version 2.66 it became this:

function (url, destfile, mode, quiet = TRUE) 
{
    h <- curl::new_handle()
    curl::handle_setheaders(h, `accept-encoding` = "gzip")
    timeout_seconds <- 120
    curl::handle_setopt(h, timeout_ms = timeout_seconds * 1000)
    result = tryCatch({
        curl::curl_download(url, destfile, mode = mode, quiet = quiet, 
            handle = h)
        return(TRUE)
    }, error = function(e) {
        message(e)
        return(FALSE)
    })
    message("File stored at:")
    message(destfile)
    if (!result) {
        if (file.exists(destfile)) {
            file.remove(destfile)
        }
        stop(sprintf("Failed to download %s!", destfile))
    }
    return(0)
}

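In short, the timeout is now hard-coded to 120 s. That is why the student's original error mentioned 120000 milliseconds, which works out to 120 s, and why options(timeout=...) made no difference: the function never reads that option.

So the fix is to edit the R package. Someone has already submitted a fix on GitHub, but the maintainers have not accepted it yet.

Download the package zip from GitHub, unzip it, open R/getGEOfile.R, find the downloadFile function, and change its third statement to: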

timeout_seconds <- max(getOption("timeout"), 120)

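With that change, the timeout option takes effect again. Install the edited copy and retry (if you edited the unzipped folder, re-zip it first or pass the folder path to install_local instead):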

devtools::install_local("GEOquery-master.zip")
library(GEOquery)
options(timeout=100000)
getOption("timeout")
options( 'download.file.method.GEOquery' = 'libcurl' )
a = getGEO("GSE148601",destdir = ".")

No more errors!

https://github.com/seandavi/GEOquery/pull/139
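To see what the patched line does with different option values, here is a small base-R sketch. The function name `effective_timeout` is made up for illustration; only the `max(getOption("timeout"), 120)` expression comes from the fix above.

```r
# Same expression as the patched line: never go below 120 s,
# but honour a larger user-supplied timeout
effective_timeout <- function() max(getOption("timeout"), 120)

old <- options(timeout = 60)
low <- effective_timeout()        # option is below the floor, so 120 is used

options(timeout = 100000)
high <- effective_timeout()       # option is above the floor, so it is used

# downloadFile passes the value to curl in milliseconds
low_ms <- low * 1000

options(old)                      # restore the session's previous setting
print(c(low = low, high = high, low_ms = low_ms))
```

Using `max()` keeps the 120 s floor the maintainers introduced while letting a larger `options(timeout = ...)` value win.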

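Buff 5: download the file outside of R

It is only a network restriction, after all, and you can find a way around it. Or copy the link to the expression matrix file and ask someone for help, which is also quick.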


Click through, then right-click and copy the link:


https://ftp.ncbi.nlm.nih.gov/geo/series/GSE148nnn/GSE148601/matrix/GSE148601_series_matrix.txt.gz

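Download it in your browser, or post the link and ask for help. Once you have the file, put it in your working directory. The code will then use the local file instead of downloading it from the web, and everything runs normally!

Note that lowering the barrier for people to help you matters a lot: if you send a link, they can download the file directly, but if you send only an accession number, they have to search and click through step by step. See 《为回答你的人着想一下》 ("Be considerate of the people answering you").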
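If you would rather build the link than click through, the sketch below derives it from the accession number. The helper `series_matrix_url` is hypothetical, and the pattern is inferred from the single URL shown above, so treat it as an assumption; series with more than one platform use different file names. The final step uses getGEO's `filename` argument and only runs if the file and the GEOquery package are present.

```r
# Build the series matrix URL for a single-platform GSE accession.
# Assumption: the directory is the accession with its last (up to) three
# digits replaced by "nnn", as in the GSE148601 link above.
series_matrix_url <- function(gse) {
  stub <- sub("[0-9]{1,3}$", "nnn", gse)
  sprintf(
    "https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s_series_matrix.txt.gz",
    stub, gse, gse
  )
}

url <- series_matrix_url("GSE148601")
print(url)

# After downloading by hand into the working directory, read the local file.
# getGPL = FALSE skips the extra platform download on a restricted network.
local_file <- basename(url)
if (file.exists(local_file) && requireNamespace("GEOquery", quietly = TRUE)) {
  eset <- GEOquery::getGEO(filename = local_file, getGPL = FALSE)
}
```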