Using the DIAMOND alignment software

2018-01-23  热爱大自然的小和尚

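DIAMOND is a tool for aligning query proteins against a protein database (blastp), or translated query nucleotide sequences against a protein database (blastx). According to the official benchmarks it runs 500 to 20,000 times faster than BLAST+, which sounds impressive: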

  • Pairwise alignment of proteins and translated DNA at 500x-20,000x speed of BLAST.
  • Frameshift alignments for long read analysis.
  • Low resource requirements and suitable for running on standard desktops or laptops.
  • Various output formats, including BLAST pairwise, tabular and XML, as well as taxonomic classification.

(DIAMOND is a new alignment tool for aligning short DNA sequencing reads to a protein reference database such as NCBI-NR. On Illumina reads of length 100-150bp, in fast mode, DIAMOND is about 20,000 times faster than BLASTX, while reporting about 80-90% of all matches that BLASTX finds, with an e-value of at most 1e-5. In sensitive mode, DIAMOND is about 2,500 times faster than BLASTX, finding more than 94% of all matches.)

So it is worth learning.

Below is a short summary of the official DIAMOND manual:

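  1. There are four main commands:
    • makedb
    • blastp
    • blastx
    • view
  2. -b block size in billions of sequence letters, default 2. According to the manual, memory use is roughly six times this value in GB, so the default needs about 12 GB; use a larger value if the machine has plenty of RAM
    -k maximum number of target sequences to report alignments for, per query
    -e E-value cutoff, default 0.001, which is stricter than the BLAST+ default
    -f output format; I prefer 6 (tabular), which, as in BLAST+, lets you choose which fields to print
    -o output file
    -p number of CPU threads to use
    --min-score minimum bit score for reported alignments; when this is set, -e is ignored
    --seg masking option; when set, low-complexity regions of the query are masked
    -c number of index chunks; larger values lower both memory use and performance, 1 is fastest, and values above 4 (the default) are not accepted
    -q query sequences, in either FASTA or FASTQ format
  3. Some usage notes from the DIAMOND GitHub page: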
  • The program may use quite a lot of memory and also temporary disk space. Should the program fail due to running out of either one, you need to set a lower value for the block size parameter -b (see the manual).
  • The default (fast) mode was mainly designed for short reads. For longer sequences, the sensitive modes (options --sensitive or --more-sensitive) are recommended.
  • The runtime of the program is not linear in the size of the query file and it is much more efficient for large query files (> 1 million sequences) than for smaller ones.
  • Low complexity masking is applied to the query and reference sequences by default. Masked residues appear in the output as X.
  • The default e-value cutoff of DIAMOND is 0.001 while that of BLAST is 10, so by default the program will search a lot more stringently than BLAST and not report weak hits.
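
The commands and options summarized above can be put together roughly as follows. This is only a minimal sketch: the helper names `diamond_makedb_cmd` and `diamond_search_cmd` and the file names (`nr.faa`, `nr`, `query.faa`, `hits.tsv`) are made up for illustration, and the script only builds and prints the command lines, it does not run DIAMOND. The flags are the ones listed above, plus `--in` and `-d`, which name the input FASTA and the database.

```python
import shlex


def diamond_makedb_cmd(fasta, db):
    """Build the argument list for `diamond makedb` (hypothetical helper)."""
    return ["diamond", "makedb", "--in", fasta, "-d", db]


def diamond_search_cmd(program, db, query, out, *, threads=4, evalue=1e-3,
                       max_targets=25, block_size=2.0, sensitive=False):
    """Build the argument list for `diamond blastp` or `diamond blastx`.

    Hypothetical helper; the keyword defaults here are illustrative choices,
    apart from the E-value 0.001 and block size 2 mentioned in the post.
    """
    if program not in ("blastp", "blastx"):
        raise ValueError("program must be 'blastp' or 'blastx'")
    cmd = [
        "diamond", program,
        "-d", db,                 # database built by makedb
        "-q", query,              # FASTA or FASTQ query file
        "-o", out,                # output file
        "-f", "6",                # tabular output, as in BLAST+
        "-p", str(threads),       # CPU threads
        "-e", str(evalue),        # E-value cutoff
        "-k", str(max_targets),   # max target sequences per query
        "-b", str(block_size),    # block size; lower it if memory runs out
    ]
    if sensitive:
        # recommended on the GitHub page for longer sequences
        cmd.append("--sensitive")
    return cmd


if __name__ == "__main__":
    # Print the command lines; pass the lists to subprocess.run to execute them.
    print(shlex.join(diamond_makedb_cmd("nr.faa", "nr")))
    print(shlex.join(diamond_search_cmd("blastp", "nr", "query.faa", "hits.tsv",
                                        threads=8, sensitive=True)))
```

To actually run a search, the returned list can be handed to `subprocess.run(cmd, check=True)` on a machine where `diamond` is installed. If it fails for lack of memory or temporary disk space, lowering `block_size` is the first thing to try, as the notes above suggest.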