
grabseqs: batch-download SRA data and convert it directly to FASTQ files

2022-06-26  嘿嘿嘿嘿哈

Paper: grabseqs: simple downloading of reads and metadata from multiple next-generation sequencing data repositories | Bioinformatics | Oxford Academic (oup.com)
GitHub: louiejtaylor/grabseqs: A utility for easy downloading of reads from next-gen sequencing repositories like NCBI SRA (github.com)


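grabseqs is a tool for batch-downloading data from the NCBI SRA, MG-RAST and iMicrobe repositories. It was published in Bioinformatics in 2020. It downloads SRA data and converts it directly to FASTQ files.
The conversion relies on fasterq-dump or fastq-dump, so install sra-tools before installing grabseqs: conda install -c bioconda sra-tools
The other requirements are a Python 3 environment, an sra-tools version newer than 2.9, pigz, and wget.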
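Before installing, it can help to confirm that the required tools are actually on your PATH. The sketch below is one way to do that. The function name missing_tools is made up for illustration and is not part of grabseqs. It only checks that the commands exist and does not check the sra-tools version.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of grabseqs): print the name of every
# listed command that cannot be found on PATH, one per line.
missing_tools() {
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Example: check the dependencies named above.
# missing=$(missing_tools python3 fasterq-dump pigz wget)
# [ -z "$missing" ] || echo "Missing tools: $missing" >&2
```

If the function prints nothing, every listed command was found.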

1 Installation

Install with conda:

conda install grabseqs -c louiejtaylor -c bioconda -c conda-forge

Or install with pip:

pip install grabseqs

2 Usage

2.1 Full list of options:

grabseqs sra [-h] [-m METADATA] [-o OUTDIR] [-r RETRIES] [-t THREADS]
             [-f] [-l] [--no_parsing] [--parse_run_ids]
             [--use_fastq_dump]
             id [id ...]

positional arguments:
  id                One or more BioProject, ERR/SRR or ERP/SRP number(s)

optional arguments:
  -h, --help        show this help message and exit
  -m METADATA       filename in which to save SRA metadata (.csv format,
                    relative to OUTDIR)
  -o OUTDIR         directory in which to save output. created if it doesn't
                    exist
  -r RETRIES        number of times to retry download
  -t THREADS        threads to use (for fasterq-dump/pigz)
  -f                force re-download of files
  -l                list (but do not download) samples to be grabbed
  --parse_run_ids   parse SRR/ERR identifiers (do not pass straight to fasterq-
                    dump)
  --custom_fqdump_args CUSTOM_FQD_ARGS
                    "string" containing args to pass to fastq-dump
  --use_fastq_dump  use legacy fastq-dump instead of fasterq-dump (no
                    multithreaded downloading)

2.2 Examples:

# Use 10 threads, save metadata to proj/metadata.csv, download to the directory proj/, retry failed downloads 3 times, and get all samples from SRP*********
grabseqs sra -t 10 -m metadata.csv -o proj/ -r 3 SRP*********
# If you'd like to pass your own arguments to fasterq-dump to get data in a slightly different format, you can do so like this
grabseqs sra SRP*******  -r 0 --custom_fqdump_args="--split-spot --progress"
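After a download finishes, a quick integrity check of the output can catch truncated files. The sketch below assumes the reads end up as gzip-compressed *.fastq.gz files directly inside the output directory. That naming is an assumption here, so adjust the glob to match what you actually see in proj/. The function name check_fastq_gz is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run `gzip -t` on every *.fastq.gz file in a directory
# (assumed output naming) and print "<ok> <bad>" counts.
# Returns non-zero if any file fails the gzip integrity test.
check_fastq_gz() {
    local dir="$1"
    local ok=0
    local bad=0
    local f
    for f in "$dir"/*.fastq.gz; do
        [ -e "$f" ] || continue   # the glob matched nothing
        if gzip -t "$f" 2>/dev/null; then
            ok=$((ok + 1))
        else
            bad=$((bad + 1))
        fi
    done
    echo "$ok $bad"
    [ "$bad" -eq 0 ]
}

# Example: check_fastq_gz proj/
```

A non-zero bad count suggests re-running the affected accessions with -f to force a re-download.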

Simple examples of other common commands:

# Download all samples from a single SRA Project:
grabseqs sra SRP********
# Or any combination of projects (S/ERP), runs (S/ERR), BioProjects (PRJNA):
grabseqs sra SRR******** ERP******** PRJNA******** ERR********
# If you'd like to do a dry run and just get a list of samples that will be downloaded, pass -l:
grabseqs sra -l SRP********
# Similar syntax works for MG-RAST:
grabseqs mgrast mgp****** mgm*******

# And iMicrobe (prefixing the sample numbers with "s" and project numbers with "p"):
grabseqs imicrobe p4 s3
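
Because grabseqs accepts several IDs in one call, a long accession list can be kept in a text file and expanded on the command line. The sketch below is one possible way to do that. The function name read_ids and the file name accessions.txt are hypothetical. It expects one ID per line and does not strip trailing whitespace or Windows line endings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read accession IDs from a text file (one per line),
# skip blank lines and lines starting with '#', and print the IDs
# space-separated on a single line.
read_ids() {
    local file="$1"
    local line
    local ids=""
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            ''|'#'*) continue ;;
        esac
        ids="$ids${ids:+ }$line"
    done < "$file"
    printf '%s\n' "$ids"
}

# Example (accessions.txt is a hypothetical file name):
# grabseqs sra -l $(read_ids accessions.txt)                  # dry run first
# grabseqs sra -t 4 -o proj/ -m metadata.csv $(read_ids accessions.txt)
```

Running the -l dry run first shows which samples would be fetched before any download starts.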