NGSplot database 构建

2022-03-23  本文已影响0人  余绕

Add ./bin to $PATH, then makedir a new folder, then run:

genDB.sh db_list.txt [animal|plant|bacteria].json

1. Modify plant.json

First find the ftp links of ensemble plant for the gtf files, such as:

http://ftp.ebi.ac.uk/ensemblgenomes/pub/release-52/plants/gtf/zea_mays/Zea_mays.Zm-B73-REFERENCE-NAM-5.0.52.chr.gtf.gz

Then modify the <plant.json> file to as follows:

The provided ftp links in the file should be modified before use, cause the FTP links of ensemble has changed a little. But the provided link was old version.
Only modified the FTP links, keep other fields unchanged.

ftp://ftp.ebi.ac.uk/ensemblgenomes/pub/release-*VERSION*/plants/gtf/*SPECIES*/*SPECIES_NAME*.*GENOME*.*VERSION*.chr.gtf.gz

2. Modify the db_list_plant.txt (tab sperated)

B73   Zm-B73-REFERENCE-NAM-5.0     zea_mays    52
#B73 is the genome name used for data analysis. 
#zay_mays is  *SPECIES*/*SPECIES_NAME* in json file
#52 is *VERSION* in json file

3. Create the genome library

genDB.sh ./data/db_list_plant.txt   ./data/json/plant.json
### following was the the running code generated by above command:
B73
Processing ensembl annotation...
/public/home/qtxu/Practice/ngsplot_database/ngsplotdb-develop/bin/ensembl.db.gen.sh B73 Zm-B73-REFERENCE-NAM-5.0 zea_mays       52 ftp://ftp.ebi.ac.uk/ensemblgenomes/pub/release-*VERSION*/plants/gtf/*SPECIES*/*SPECIES_NAME*.*GENOME*.*VERSION*.chr.gtf.gz ./data/json/plant.json 0 ./data/db_list_plant.txt /public/home/qtxu/Practice/ngsplot_database/ngsplotdb-develop/bin ./output
Zea_mays
wget -q -O ./output/tmp/Zea_mays.Zm-B73-REFERENCE-NAM-5.0.52.gtf.gz ftp://ftp.ebi.ac.uk/ensemblgenomes/pub/release-52/plants/gtf/zea_mays/Zea_mays.Zm-B73-REFERENCE-NAM-5.0.52.chr.gtf.gz
perl -I /public/home/qtxu/Practice/ngsplot_database/ngsplotdb-develop/bin /public/home/qtxu/Practice/ngsplot_database/ngsplotdb-develop/bin/gtf2txt_plot.pl ensembl     ./output/tmp/B73.ensembl.gtf    ./output/tmp/B73.ensembl.txt
/public/home/qtxu/Practice/ngsplot_database/ngsplotdb-develop/bin/filter_ensembl.py ./output/tmp/B73.ensembl.gtf    ./output/tmp/B73.ensembl.txt  ./output/tmp/B73.ensembl.biotype.txt B73
Rscript /public/home/qtxu/Practice/ngsplot_database/ngsplotdb-develop/bin/genDB.R B73 ensembl   ./output/tmp/B73.ensembl.biotype.txt
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ?.iocGenerics?

The following objects are masked from ?.ackage:parallel?.

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ?.ackage:stats?.

    IQR, mad, sd, var, xtabs

The following objects are masked from ?.ackage:base?.

    anyDuplicated, append, as.data.frame, basename, cbind, colMeans,
    colnames, colSums, dirname, do.call, duplicated, eval, evalq,
    Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply,
    lengths, Map, mapply, match, mget, order, paste, pmax, pmax.int,
    pmin, pmin.int, Position, rank, rbind, Reduce, rowMeans, rownames,
    rowSums, sapply, setdiff, sort, table, tapply, union, unique,
    unsplit, which, which.max, which.min

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: ?.4Vectors?

The following object is masked from ?.ackage:base?.

    expand.grid

Loading required package: foreach
Processing refseq annotation...
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -e 'select * from refGene;' B73  > ./output/tmp/B73.refseq.gp
/public/home/qtxu/Practice/ngsplot_database/ngsplotdb-develop/bin/refseq.db.gen.sh: line 8: mysql: command not found
No B73 genome annotation in UCSC!
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N  -e select chrom, chromStart as start, chromEnd as end, name from cpgIslandExt; B73 > ./output/tmp/B73.cgi.gp
bash: line 34: mysql: command not found
No CGI annotation of B73 genome in UCSC! Skip the CGI annotation step!
B73/
B73/B73.ensembl.exon.variant.misc.RData
B73/B73.ensembl.exon.altBoth.misc.RData
B73/B73.ensembl.exon.altBoth.protein_coding.RData
B73/B73.ensembl.exon.variant.protein_coding.RData
B73/B73.ensembl.exon.promoter.misc.RData
B73/B73.ensembl.exon.polyA.protein_coding.RData
B73/B73.ensembl.exon.canonical.misc.RData
B73/B73.ensembl.exon.promoter.protein_coding.RData
B73/B73.ensembl.exon.altAcceptor.misc.RData
B73/.chrnames.ensembl
B73/B73.ensembl.exonmodel.RData
B73/B73.ensembl.exon.altDonor.protein_coding.RData
B73/B73.ensembl.genebody.misc.RData
B73/B73.ensembl.genebody.protein_coding.RData
B73/B73.ensembl.exon.polyA.misc.RData
B73/B73.ensembl.exon.altDonor.misc.RData
B73/B73.ensembl.exon.canonical.protein_coding.RData
B73/B73.ensembl.exon.altAcceptor.protein_coding.RData
B73/.metainfo

Install the genomes

The generated genome was in ./output/database.

ngsplotdb.py install  ngsplotdb_B73_52_3.00.tar.gz
list the installed genomes
ngsplotdb.py list 

ID       Assembly                 Species              EnsVer   NPVer    InstalledFeatures            
AGPv3    AGPv3                    zea_mays             21.0     3.0      exon,genebody,tss,tes        
B73      Zm-B73-REFERENCE-NAM-5.0 zea_mays             52.0     3.0      exon,genebody,tss,tes        
hg19     GRCh37                   homo_sapiens         75.0     3.0      cgi,exon,genebody,tss,tes    
IRGSP-1  IRGSP-1.0                oryza_sativa         21.0     3.0      exon,genebody,tss,tes        
MSUv7    MSUv7                    oryza_sativa         0.0      3.0      exon,genebody,tss,tes        
Tair10   TAIR10                   arabidopsis_thaliana 21.0     3.0      exon,genebody,tss,tes 

If you want to remove a genome just use the following command:

ngsplotdb.py    remove   hg19     
示例:安装RAPDB 基因组
  1. modify the db_list_plant.txt as follows:
Rapdb   IRGSP-1.0   oryza_sativa    52
  1. Create the genome files
genDB.sh ./data/db_list_plant.txt   ./data/json/plant.json
# This command should be run in non-conda environment, as the conda condition reportes errors.
  1. Install the genome files
ngsplotdb.py install  ngsplotdb_Rapdb_52_3.00.tar.gz
  1. Running example
#BSUB -J blast
#BSUB -n 10
#BSUB -R span[hosts=1]
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -q normal

cd /public/home/qtxu/Data/Chenxiaoyang/Ribo_seq_Redox/rRNA_tRNA_removed_clean_data/align/Metaplot_by_NGSplot

ngs.plot.r -G  Rapdb   -R genebody -C config.example.txt  -O  H4_K5_rep1_dup_rm_genebody  -T H4K5  -FL 300
  1. Results:
    This is a test example using ribo-seq data.
image.png

Ref to :GitHub - shenlab-sinai/ngsplotdb: Genome databases generation pipeline for ngs.plot and region_analysis.

上一篇 下一篇

猜你喜欢

热点阅读