
Local Installation and Use of NCBI BLAST+

2019-01-15  爱折腾的大懒猪

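NCBI BLAST is a widely used tool for searching both protein and nucleotide sequences. For most queries the web interface is enough, but development work sometimes needs a local database and local programs. NCBI provides the BLAST+ toolkit, which bundles a number of BLAST tools. For an introduction, see the two documents (books) NCBI provides: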

Download and Installation

Downloading BLAST+

Program            Function
blastdbcheck       Checks the integrity of a BLAST database
blastdbcmd         Retrieves sequences or other information from a BLAST database
blastdb_aliastool  Creates database aliases (for example, to tie volumes together)
blastn             Searches a nucleotide query against a nucleotide database
blastp             Searches a protein query against a protein database
blastx             Searches a nucleotide query, dynamically translated in all six frames, against a protein database
blast_formatter    Formats a BLAST result using its assigned request ID (RID) or its saved archive
convert2blastmask  Converts lowercase masking into data readable by makeblastdb
deltablast         Searches a protein query against a protein database, using a more sensitive algorithm
dustmasker         Masks the low-complexity regions in input nucleotide sequences
legacy_blast.pl    Converts a legacy BLAST command line into its BLAST+ counterpart and executes it
makeblastdb        Formats input FASTA file(s) into a BLAST database
makembindex        Indexes an existing nucleotide database for use with megablast
makeprofiledb      Creates a conserved domain database from a list of input position-specific scoring matrices (scoremats) generated by psiblast
psiblast           Finds members of a protein family, identifies proteins distantly related to the query, or builds a position-specific scoring matrix for the query
rpsblast           Searches a protein query against a conserved domain database to identify functional domains present in the query
rpstblastn         Searches a nucleotide query, by first dynamically translating it in all six frames, against a conserved domain database
segmasker          Masks the low-complexity regions in input protein sequences
tblastn            Searches a protein query against a nucleotide database dynamically translated in all six frames
tblastx            Searches a nucleotide query, dynamically translated in all six frames, against a nucleotide database similarly translated
update_blastdb.pl  Downloads preformatted BLAST databases from NCBI
windowmasker       Masks repeats found in input nucleotide sequences
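The four basic search programs in the table differ only in the molecule type on each side of the search. As a rough sketch, that choice can be written down as a small Python function. The function name and the "nucl"/"prot" labels (borrowed from makeblastdb's -dbtype values) are my own illustrative choices, not part of BLAST+.

```python
def choose_blast_program(query_type: str, db_type: str, translate_both: bool = False) -> str:
    """Return the BLAST+ search program for a query/database combination.

    query_type and db_type must be "nucl" or "prot" (hypothetical labels,
    borrowed from makeblastdb's -dbtype values).
    translate_both only matters for nucleotide-vs-nucleotide searches: it
    selects tblastx (both sides translated in six frames) instead of blastn.
    """
    valid = {"nucl", "prot"}
    if query_type not in valid or db_type not in valid:
        raise ValueError("query_type and db_type must be 'nucl' or 'prot'")
    if query_type == "nucl" and db_type == "nucl":
        return "tblastx" if translate_both else "blastn"
    if query_type == "prot" and db_type == "prot":
        return "blastp"
    if query_type == "nucl":
        # Nucleotide query translated in six frames, protein database.
        return "blastx"
    # Protein query against a nucleotide database translated in six frames.
    return "tblastn"


print(choose_blast_program("nucl", "prot"))  # blastx
```

The specialised programs (psiblast, deltablast, rpsblast, rpstblastn) are left out on purpose, since picking them depends on the goal of the search rather than on molecule types alone.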

Besides the BLAST+ programs, the executables folder also provides other tools:


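Configuration

  1. Add the bin directory of the BLAST installation to PATH, for example: export PATH=$PATH:$HOME/ncbi-blast-2.8.1+/bin. This lets you run the programs directly by name.
  2. Manage databases: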
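To confirm that the PATH change took effect, you can check whether the executables can be resolved by name. Below is a minimal Python sketch using the standard library's shutil.which; the helper name find_blast_tools and the chosen tool list are illustrative assumptions.

```python
import shutil

# A few executables from the BLAST+ program table; extend as needed.
BLAST_TOOLS = ["blastn", "blastp", "blastx", "tblastn", "tblastx",
               "makeblastdb", "blastdbcmd"]


def find_blast_tools(tools=BLAST_TOOLS, path=None):
    """Map each tool name to its resolved path, or None if it is not found.

    `path` overrides the search string (same format as the PATH variable);
    when it is None, the current PATH is searched.
    """
    return {tool: shutil.which(tool, path=path) for tool in tools}


if __name__ == "__main__":
    missing = [name for name, found in find_blast_tools().items() if found is None]
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All checked BLAST+ tools were found.")
```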

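Downloading databases

The NCBI FTP server has a dedicated BLAST folder, ftp://ftp.ncbi.nlm.nih.gov/blast/, which holds both the BLAST programs and the databases. It contains the following subfolders: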
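If you download preformatted database archives from this server by hand, it is worth checking them before unpacking. As far as I know, each archive is published next to a matching .md5 file. The Python sketch below assumes that companion file is named `<archive>.md5` and follows the usual md5sum layout "<hexdigest>  <filename>"; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(archive):
    """Compare `archive` with the digest stored in `<archive>.md5`.

    Assumes md5sum format: the first whitespace-separated field is the digest.
    """
    archive = Path(archive)
    md5_file = archive.with_name(archive.name + ".md5")
    expected = md5_file.read_text().split()[0].lower()
    return md5_of(archive) == expected
```

Reading in chunks keeps memory use flat, which matters because some database volumes are several gigabytes.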

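Configuration

Add the executables to PATH: append the path of the bin folder inside the BLAST installation to the PATH environment variable (look up the exact method for your shell if needed). For example, in Bash: export PATH=$PATH:/usr/local/ncbi/blast/bin

Another important setting is the BLASTDB environment variable, which tells BLAST where to find databases when searching. Set it to your database location, for example: export BLASTDB=$HOME/blastdb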
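When BLAST+ is called from a script instead of an interactive shell, the same two settings can be passed explicitly. Here is a minimal Python sketch, assuming a POSIX system (where subprocess looks up the program on the PATH of the environment it is given). The helper names and the bin_dir/db_dir arguments are placeholders for your own setup.

```python
import os
import subprocess


def blast_env(bin_dir, db_dir, base=None):
    """Return a copy of the environment with bin_dir on PATH and BLASTDB set.

    Mirrors the two `export` lines above; os.environ itself is not modified.
    """
    env = dict(os.environ if base is None else base)
    old_path = env.get("PATH", "")
    env["PATH"] = old_path + os.pathsep + bin_dir if old_path else bin_dir
    env["BLASTDB"] = db_dir
    return env


def run_blast(argv, env):
    """Run a BLAST+ command with the given environment and return its stdout."""
    result = subprocess.run(argv, env=env, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_blast(["blastn", "-version"], blast_env(os.path.expanduser("~/ncbi-blast-2.8.1+/bin"), os.path.expanduser("~/blastdb")))` should print the installed version if the paths are correct.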

Examples

Official simple example 1

The example from the Standalone BLAST Setup for Unix document:

$ blastdbcmd -db refseq_rna.00 -entry nm_000122 -out test_query.fa
$ blastn -query test_query.fa -db refseq_rna.00 -task blastn -dust no -outfmt "7 qseqid sseqid evalue bitscore" -max_target_seqs 2
# BLASTN 2.2.29+
# Query: gi|263191547|ref|NM_000122.3| Homo sapiens mutL homolog 1 (MLH1), transcript variant 1, mRNA
# Database: refseq_rna.00
# Fields: query id, subject id, evalue, bit score
# 2 hits found
gi|263191547|ref|NM_000122.3|   gi|263191547|ref|NM_000122.3|   0.0      4801
gi|263191547|ref|NM_000122.3|   gi|332816398|ref|XM_001170433.2|        0.0      4758
# BLAST processed 1 queries
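Output format 7 is tabular with "#" comment lines, so it is easy to load from a script. The Python sketch below rebuilds the blastn command above as an argument list and parses the output into dictionaries. The helper names are hypothetical, and converting evalue and bitscore to float is my own choice. The columns in the listing above may look space-aligned, but BLAST+ writes tab-separated columns, which is what the parser expects.

```python
FIELDS = ["qseqid", "sseqid", "evalue", "bitscore"]


def blastn_argv(query, db, fields=FIELDS, max_target_seqs=2):
    """Argument list equivalent to the blastn command in the example above."""
    return ["blastn", "-query", query, "-db", db, "-task", "blastn",
            "-dust", "no", "-outfmt", "7 " + " ".join(fields),
            "-max_target_seqs", str(max_target_seqs)]


def parse_outfmt7(text, fields=FIELDS):
    """Parse -outfmt 7 output into a list of dicts, one per hit line.

    Lines starting with '#' are comments and are skipped; hit lines are
    tab-separated in the order given by `fields`.
    """
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(fields, line.split("\t")))
        for key in ("evalue", "bitscore"):
            if key in row:
                row[key] = float(row[key])
        hits.append(row)
    return hits
```

For example: `hits = parse_outfmt7(subprocess.run(blastn_argv("test_query.fa", "refseq_rna.00"), capture_output=True, text=True, check=True).stdout)`, with `import subprocess` added.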