Genomics in Practice 02: Software Installation and GATK Data Download

2024-05-16  生信探索

Download the GATK genomics resource data

FTP

https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle

Too slow from China (23.0K/s)

# install lftp

sudo apt -y install lftp

# log in to the FTP server; there is no password (just press Enter)

lftp ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/

# download the entire hg38 directory

mirror hg38

Use Google Cloud

Much faster (35M/s)

https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0/

micromamba create -n gsutil

micromamba activate gsutil

micromamba install -y -c conda-forge python=3.10 gsutil

mkdir -p ~/DataHub/Genomics/GATK

cd ~/DataHub/Genomics/GATK

gsutil -m cp -r \
  "gs://genomics-public-data/resources/broad/hg38/v0/1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg38.vcf" \
  "gs://genomics-public-data/resources/broad/hg38/v0/1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg38.vcf.idx" \
  "gs://genomics-public-data/resources/broad/hg38/v0/1000G_omni2.5.hg38.vcf.gz" \
  "gs://genomics-public-data/resources/broad/hg38/v0/1000G_omni2.5.hg38.vcf.gz.tbi" \
  "gs://genomics-public-data/resources/broad/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz" \
  "gs://genomics-public-data/resources/broad/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz.tbi" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.dict" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta.64.alt" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta.64.amb" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta.64.ann" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta.64.bwt" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta.64.pac" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta.64.sa" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta.fai" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi" \
  "gs://genomics-public-data/resources/broad/hg38/v0/hapmap_3.3.hg38.vcf.gz" \
  "gs://genomics-public-data/resources/broad/hg38/v0/hapmap_3.3.hg38.vcf.gz.tbi" \
  "gs://genomics-public-data/resources/broad/hg38/v0/scattered_calling_intervals" \
  "gs://genomics-public-data/resources/broad/hg38/v0/wgs_calling_regions.hg38.interval_list" \
  .
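After a large download it can be worth checking that every VCF arrived together with its index. The following is only a sketch: the function name missing_vcf_indexes is hypothetical, and it assumes the naming seen in the file list above (a .vcf.gz file is paired with .vcf.gz.tbi, a plain .vcf with .vcf.idx).

```shell
# List index files that are missing next to the VCFs in a directory.
# Assumed pairing (taken from the file names above):
#   *.vcf.gz -> *.vcf.gz.tbi    and    *.vcf -> *.vcf.idx
missing_vcf_indexes() {
  local dir="${1:-.}" vcf
  for vcf in "$dir"/*.vcf.gz; do
    [ -e "$vcf" ] || continue              # the glob matched nothing
    [ -e "$vcf.tbi" ] || echo "$vcf.tbi"
  done
  for vcf in "$dir"/*.vcf; do
    [ -e "$vcf" ] || continue
    [ -e "$vcf.idx" ] || echo "$vcf.idx"
  done
}

# Prints nothing when every index is present
missing_vcf_indexes ~/DataHub/Genomics/GATK
```

Any path it prints can be passed to gsutil cp again with the same bucket prefix.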

BWA index files

Homo_sapiens_assembly38.fasta

Homo_sapiens_assembly38.fasta.64.amb

Homo_sapiens_assembly38.fasta.64.ann

Homo_sapiens_assembly38.fasta.64.bwt

Homo_sapiens_assembly38.fasta.64.pac

Homo_sapiens_assembly38.fasta.64.sa

Homo_sapiens_assembly38.dict (sequence dictionary used by GATK/Picard, not part of the BWA index)
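Before aligning, a quick check that all five BWA index files are present and non-empty can save a failed run. This is a minimal sketch, not part of the original workflow: check_bwa_index is a hypothetical helper, and the ".64" infix is simply taken from the file names listed above.

```shell
# Report which BWA index files are missing (or empty) for a reference FASTA.
# Returns 0 when the index is complete, 1 otherwise.
check_bwa_index() {
  local fasta="$1" ext missing=0
  for ext in amb ann bwt pac sa; do
    if [ ! -s "$fasta.64.$ext" ]; then
      echo "missing: $fasta.64.$ext" >&2
      missing=1
    fi
  done
  return "$missing"
}

if check_bwa_index ~/DataHub/Genomics/GATK/Homo_sapiens_assembly38.fasta; then
  echo "BWA index complete"
fi
```

As far as I know, bwa looks for the .64 variants on its own when it is given the plain FASTA path, so the files should not need renaming.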

Prepare the environment

Python 2

micromamba create -n dna2 python=2

micromamba activate dna2

micromamba install -y -c conda-forge -c bioconda bwa samtools bcftools vcftools snpeff fastqc qualimap gatk4 tabix multiqc

Python 3

micromamba create -n dna3

micromamba activate dna3

micromamba install -y -c conda-forge python=3.10 python_abi xopen

micromamba install -y -c conda-forge -c bioconda cutadapt=4.3 trim-galore
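Once both environments are built, it may help to confirm that the tools actually resolve on PATH. The sketch below uses a hypothetical helper, missing_tools. The executable names are my assumption of what the packages install (snpeff as snpEff, gatk4 as gatk, trim-galore as trim_galore), so adjust them if your builds differ.

```shell
# Print every command that cannot be found on PATH.
# Returns 0 when all are found, 1 otherwise.
missing_tools() {
  local tool rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}

# Run inside the dna2 environment (executable names are assumptions)
missing_tools bwa samtools bcftools vcftools snpEff fastqc qualimap gatk tabix multiqc || true

# Run inside the dna3 environment
missing_tools cutadapt trim_galore || true
```

An empty output means every listed tool was found in the active environment.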
