Downloading data from public databases such as NCBI and ENA with Kingfisher
Kingfisher
Kingfisher is a fast and flexible program for procurement of sequence files (and their metadata annotations) from public data sources, including the European Nucleotide Archive (ENA), NCBI SRA, Amazon AWS and Google Cloud. Its input is one or more "Run" accessions, e.g. DRR001970, or BioProject accessions, e.g. PRJNA621514 or SRP260223.
- wwood/kingfisher-download: Easier download/extract of FASTA/Q read data and metadata from the ENA, NCBI, AWS or GCP. (github.com)
- documentation
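The two kinds of input described above can usually be told apart by their prefix. The following is a minimal shell sketch and is not part of Kingfisher itself: the function name accession_kind is hypothetical, and the prefix patterns are an assumption based on the usual INSDC accession naming (SRR/ERR/DRR for runs, PRJNA/PRJEB/PRJDB for BioProjects, SRP/ERP/DRP for studies).

```shell
# Classify an accession by its prefix.
# Assumption: patterns follow the common INSDC naming scheme; they are not
# taken from Kingfisher's source code.
accession_kind() {
    case $1 in
        [SED]RR[0-9]*)       echo run ;;
        PRJ[EDN][A-Z][0-9]*) echo bioproject ;;
        [SED]RP[0-9]*)       echo study ;;
        *)                   echo unknown ;;
    esac
}

# The three example accessions from the text above.
for acc in DRR001970 PRJNA621514 SRP260223; do
    printf '%s\t%s\n' "$acc" "$(accession_kind "$acc")"
done
```

A wrapper script could use a check like this to decide how to pass each accession to Kingfisher. Note that the text above groups SRP-style accessions together with BioProjects.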
Introduction
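Kingfisher has two main modes:
- get
  In the get subcommand, Kingfisher tries a series of download sources in order until one succeeds. The downloaded data is then converted, if necessary, to the requested output format (SRA, FASTQ, FASTA, optionally gzipped). Both the download and the extraction stage can be faster than with NCBI's SRA Toolkit. In particular, downloading from ENA fetches FASTQ files directly, so no extraction step is needed.
- annotate
  In the annotate subcommand, metadata about the runs is downloaded from NCBI and written in one of several formats: human-readable, CSV, TSV, JSON, feather or parquet. By default only a small set of metadata columns is reported; use --all-columns to output more detail.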
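The "try each source in order until one succeeds" behaviour of get can be illustrated with a small shell sketch. This is not Kingfisher's implementation: attempt_download is a hypothetical stand-in that only simulates success for the aws-http method, and try_methods_in_order is likewise an invented name.

```shell
# Hypothetical stand-in for a single download attempt ($1 = run, $2 = method).
# A real version might run: kingfisher get -r "$1" -m "$2"
# Here only the "aws-http" method is simulated as succeeding.
attempt_download() {
    [ "$2" = "aws-http" ]
}

# Try each method in the order given and stop at the first success.
# Prints the method that worked; returns 1 if every method failed.
try_methods_in_order() {
    run=$1
    shift
    for method in "$@"; do
        if attempt_download "$run" "$method"; then
            echo "$method"
            return 0
        fi
        echo "method $method failed for $run, trying next" >&2
    done
    return 1
}

try_methods_in_order ERR1739691 ena-ascp aws-http prefetch
```

In practice the loop is unnecessary, because get accepts several methods after -m and performs the fallback itself. The sketch only makes the ordering explicit.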
Installation
conda
conda create -n kingfisher -c conda-forge -c bioconda kingfisher
conda activate kingfisher
kingfisher get -r SRR12118866 -m ena-ftp
Optionally, to use the ena-ascp method, an Aspera connect client is also required. See https://www.ibm.com/aspera/connect/ or https://www.biostars.org/p/325010/.
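Before choosing download methods, it can help to check which programs are actually available in the active environment. The sketch below is only an illustration: have_tool is a hypothetical helper name, and it assumes that the Aspera client provides a binary called ascp on PATH.

```shell
# Return success if the named program is found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report which download helpers are available in the current environment.
# Assumption: the Aspera client is installed as "ascp".
for tool in kingfisher ascp; do
    if have_tool "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: not found"
    fi
done
```

If ascp is reported as missing, the ena-ascp method is unlikely to work, and a method such as ena-ftp may be the simpler choice.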
Usage
kingfisher get --full-help
kingfisher annotate --full-help
kingfisher get -r ERR1739691 -m ena-ascp aws-http prefetch
kingfisher extract --sra ERR1739691.sra -t 16 -f fastq.gz
kingfisher annotate -r ERR1739691
run | bioproject | Gbp | library_strategy | library_selection | model | sample_name | taxon_name
---------- | ---------- | ----- | ---------------- | ----------------- | ------------------- | ----------- | ----------
ERR1739691 | PRJEB15706 | 2.382 | WGS
![](https://img.haomeiwen.com/i27913461/09c8c1a76e6ddb65.png)
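If a single field is needed from the human-readable annotate table, it can be pulled out with awk. This is a rough sketch under stated assumptions: table_column is a hypothetical helper, and the sample text is abridged to the columns and values visible in the output above. For scripted use, the CSV or TSV output formats mentioned earlier are probably easier to parse.

```shell
# Sample text in the same " | "-separated layout as the table above,
# abridged to the columns whose values are visible there.
sample='run        | bioproject | Gbp   | library_strategy
---------- | ---------- | ----- | ----------------
ERR1739691 | PRJEB15706 | 2.382 | WGS'

# Print one named column from a " | "-separated table read on stdin.
# The header row selects the column; the dashed separator row is skipped.
table_column() {
    awk -F' *[|] *' -v want="$1" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == want) col = i; next }
        NR == 2 { next }
        col     { print $col }
    '
}

printf '%s\n' "$sample" | table_column bioproject
```

With real data, the sample variable would be replaced by the output of kingfisher annotate piped into table_column.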
kingfisher(get) kingfisher(get)
NAME
kingfisher get
SYNOPSIS
kingfisher get [-h] [-r RUN_IDENTIFIERS [RUN_IDENTIFIERS ...]] [--run-
identifiers-list RUN_IDENTIFIERS_LIST] [-p BIOPROJECTS [BIOPROJECTS
...]] -m {aws-http,prefetch,aws-cp,gcp-cp,ena-ascp,ena-ftp} [{aws-
http,prefetch,aws-cp,gcp-cp,ena-ascp,ena-ftp} ...] [--download-threads
DOWNLOAD_THREADS] [--hide-download-progress] [--ascp-ssh-key
ASCP_SSH_KEY] [--ascp-args ASCP_ARGS] [--allow-paid] [--allow-paid-
from-aws] [--aws-user-key-id AWS_USER_KEY_ID] [--aws-user-key-secret
AWS_USER_KEY_SECRET] [--guess-aws-location] [--allow-paid-from-gcp]
[--gcp-project GCP_PROJECT] [--gcp-user-key-file GCP_USER_KEY_FILE]
[--prefetch-max-size PREFETCH_MAX_SIZE] [--check-md5sums] [-f
{sra,fastq,fastq.gz,fasta,fasta.gz}
[{sra,fastq,fastq.gz,fasta,fasta.gz} ...]] [--force] [--unsorted]
[--stdout] [-t EXTRACTION_THREADS] [--debug] [--version] [--quiet]
[--full-help] [--full-help-roff]