Installing TransDecoder, BLAST+, and HMMER & Pfam search

2022-02-23  vicLeo

Install TransDecoder, BLAST+, and HMMER

In the conda python3 environment, install TransDecoder:

conda activate python3
conda install -y -c bioconda/label/cf201901 transdecoder
The download is fairly slow. After installation, running TransDecoder.Predict directly reported command not found.

While troubleshooting with whereis transdecoder, the real problem turned out to be a missing Perl URI/Escape module; install it via CPAN:
### start the Perl CPAN shell

perl -MCPAN -e shell

### install the Perl URI::Escape module:

install URI::Escape

### quit:
q
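
Before retrying TransDecoder, you can confirm that Perl now sees the module with a quick one-liner (a minimal check, assuming the CPAN install above finished cleanly; the test string is arbitrary):

# should print "a%20b" if URI::Escape loads correctly
perl -MURI::Escape -e 'print uri_escape("a b"), "\n"'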

Once the Perl module is installed, running TransDecoder.Predict again succeeds:

(python3) [u20111230014@cpu10 ~]$ TransDecoder.Predict
########################################################################################
#             ______                 ___                  __
#            /_  __/______ ____ ___ / _ \___ _______  ___/ /__ ____
#             / / / __/ _ `/ _\(_-</ // / -_) __/ _ \/ _  / -_) __/
#            /_/ /_/ \_,_/_//_/___/____/\__/\__/\___/\_,_/\__/_/   .Predict
#
########################################################################################
#
#  Transdecoder.LongOrfs|http://transdecoder.github.io> - Transcriptome Protein Prediction
#
#
#  Required:
#
#   -t <string>                            transcripts.fasta
#
#  Common options:
#
#
#   --retain_long_orfs_mode <string>        'dynamic' or 'strict' (default: dynamic)
#                                        In dynamic mode, sets range according to 1%FDR in random sequence of same GC content.
#
# 
#   --retain_long_orfs_length <int>         under 'strict' mode, retain all ORFs found that are equal or longer than these many nucleotides even if no other evidence 
#                                         marks it as coding (default: 1000000) so essentially turned off by default.)
#
#   --retain_pfam_hits <string>            domain table output file from running hmmscan to search Pfam (see transdecoder.github.io for info)     
#                                        Any ORF with a pfam domain hit will be retained in the final output.
# 
#   --retain_blastp_hits <string>          blastp output in '-outfmt 6' format.
#                                        Any ORF with a blast match will be retained in the final output.
#
#   --single_best_only                     Retain only the single best orf per transcript (prioritized by homology then orf length)
#
#   --output_dir | -O  <string>            output directory from the TransDecoder.LongOrfs step (default: basename( -t val ) + ".transdecoder_dir")
#
#   -G <string>                            genetic code (default: universal; see PerlDoc; options: Euplotes, Tetrahymena, Candida, Acetabularia, ...)
#
#   --no_refine_starts                     start refinement identifies potential start codons for 5' partial ORFs using a PWM, process on by default.
#
##  Advanced options
#
#    -T <int>                            Top longest ORFs to train Markov Model (hexamer stats) (default: 500)
#                                        Note, 10x this value are first selected for removing redundancies,
#                                        and then this -T value of longest ORFs are selected from the non-redundant set.
#  Genetic Codes
#
#
#   --genetic_code <string>                Universal (default)
#
#        Genetic Codes (derived from: https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi)
#
#
Acetabularia
Candida
Ciliate
Dasycladacean
Euplotid
Hexamita
Mesodinium
Mitochondrial-Ascidian
Mitochondrial-Chlorophycean
Mitochondrial-Echinoderm
Mitochondrial-Flatworm
Mitochondrial-Invertebrates
Mitochondrial-Protozoan
Mitochondrial-Pterobranchia
Mitochondrial-Scenedesmus_obliquus
Mitochondrial-Thraustochytrium
Mitochondrial-Trematode
Mitochondrial-Vertebrates
Mitochondrial-Yeast
Pachysolen_tannophilus
Peritrich
SR1_Gracilibacteria
Tetrahymena
Universal
#
#  --version                           show version (5.5.0)
#
#########################################################################################
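
With the help screen printing, the usual two-step TransDecoder run can follow. A minimal sketch, assuming the assembled transcripts sit in a file called transcripts.fasta (the name is just a placeholder):

# step 1: extract candidate ORFs (creates transcripts.fasta.transdecoder_dir/)
TransDecoder.LongOrfs -t transcripts.fasta
# step 2: score the ORFs and report likely coding regions
TransDecoder.Predict -t transcripts.fasta --single_best_only

The --retain_pfam_hits and --retain_blastp_hits options listed above take the evidence files produced by the BLAST+ and Pfam searches set up below.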

## Install BLAST+
conda install -y blast
Run blastp -h:
(python3) [u20111230014@cpu10 ~]$ blastp -h
USAGE
  blastp [-h] [-help] [-import_search_strategy filename]
    [-export_search_strategy filename] [-task task_name] [-db database_name]
    [-dbsize num_letters] [-gilist filename] [-seqidlist filename]
    [-negative_gilist filename] [-negative_seqidlist filename]
    [-taxids taxids] [-negative_taxids taxids] [-taxidlist filename]
    [-negative_taxidlist filename] [-ipglist filename]
    [-negative_ipglist filename] [-entrez_query entrez_query]
    [-db_soft_mask filtering_algorithm] [-db_hard_mask filtering_algorithm]
    [-subject subject_input_file] [-subject_loc range] [-query input_file]
    [-out output_file] [-evalue evalue] [-word_size int_value]
    [-gapopen open_penalty] [-gapextend extend_penalty]
    [-qcov_hsp_perc float_value] [-max_hsps int_value]
    [-xdrop_ungap float_value] [-xdrop_gap float_value]
    [-xdrop_gap_final float_value] [-searchsp int_value] [-seg SEG_options]
    [-soft_masking soft_masking] [-matrix matrix_name]
    [-threshold float_value] [-culling_limit int_value]
    [-best_hit_overhang float_value] [-best_hit_score_edge float_value]
    [-subject_besthit] [-window_size int_value] [-lcase_masking]
    [-query_loc range] [-parse_deflines] [-outfmt format] [-show_gis]
    [-num_descriptions int_value] [-num_alignments int_value]
    [-line_length line_length] [-html] [-sorthits sort_hits]
    [-sorthsps sort_hsps] [-max_target_seqs num_sequences]
    [-num_threads int_value] [-mt_mode int_value] [-ungapped] [-remote]
    [-comp_based_stats compo] [-use_sw_tback] [-version]

DESCRIPTION
   Protein-Protein BLAST 2.12.0+

Use '-help' to print detailed descriptions of command line arguments
(python3) [u20111230014@cpu10 ~]$ whereis blastp -h
blastp: /home/u20111230014/miniconda3/envs/python3/bin/blastp /opt/app/anaconda3/bin/blastp

Usage:
 whereis [options] file

Options:
 -b         search only for binaries
 -B <dirs>  define binaries lookup path
 -m         search only for manuals
 -M <dirs>  define man lookup path
 -s         search only for sources
 -S <dirs>  define sources lookup path
 -f         terminate <dirs> argument list
 -u         search for unusual entries
 -l         output effective lookup paths

For more details see whereis(1).
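
The blastp evidence for --retain_blastp_hits is a plain '-outfmt 6' table. A sketch of the search against SwissProt, assuming a uniprot_sprot.fasta file and 8 threads (both placeholders), with the query being the longest_orfs.pep produced by TransDecoder.LongOrfs:

# build a protein BLAST database from the SwissProt fasta
makeblastdb -in uniprot_sprot.fasta -dbtype prot
# tabular blastp search of the TransDecoder ORFs
blastp -query transcripts.fasta.transdecoder_dir/longest_orfs.pep \
    -db uniprot_sprot.fasta -max_target_seqs 1 -outfmt 6 \
    -evalue 1e-5 -num_threads 8 > blastp.outfmt6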

## Install HMMER
conda install -y hmmer
Try hmmbuild --help:
(python3) [u20111230014@cpu10 ~]$ hmmer -h
-bash: hmmer: command not found
(python3) [u20111230014@cpu10 ~]$ hmmbuild --help
Failed to parse command line:
No such option "--help".
Usage: hmmbuild [-options] <hmmfile_out> <msafile>

where basic options are:
  -h     : show brief help on version and usage
  -n <s> : name the HMM <s>
  -o <f> : direct summary output to file <f>, not stdout
  -O <f> : resave annotated, possibly modified MSA to file <f>

To see more help on other available options, do:
  hmmbuild -h

Pfam search

Each accession in the Pfam database represents a protein family. Pfam is split into two parts, A and B: Pfam-A is a high-quality, manually curated database, while Pfam-B is lower quality but can still be used to find conserved sites of protein families.

Download the Pfam database (the latest release is 35; release 33.1 is used here):

ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam33.1/Pfam-A.hmm.gz 
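
For example, with wget (assuming the FTP mirror above is reachable from your machine):

wget ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam33.1/Pfam-A.hmm.gz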

Decompress it: gunzip Pfam-A.hmm.gz

This yields the Pfam HMM file. The HMM file is plain text; it needs to be converted to a compressed binary format to speed up searches:
# build the binary index database
hmmpress Pfam-A.hmm
[u20111230014@workstation Pfam-A]$ ll
total 3036508
-rw-r--r-- 1 u20111230014 u20111230014 1459135873 Feb 23 12:17 Pfam-A.hmm
-rw-rw-r-- 1 u20111230014 u20111230014  334380860 Feb 23 17:09 Pfam-A.hmm.h3f
-rw-rw-r-- 1 u20111230014 u20111230014    1259976 Feb 23 17:09 Pfam-A.hmm.h3i
-rw-rw-r-- 1 u20111230014 u20111230014  604042224 Feb 23 17:09 Pfam-A.hmm.h3m
-rw-rw-r-- 1 u20111230014 u20111230014  710553501 Feb 23 17:09 Pfam-A.hmm.h3p
lrwxrwxrwx 1 u20111230014 u20111230014         69 Feb 23 20:24 uniprot_sprot_index.fasta -> /home/u20111230014/workspace/genome/uniprot/uniprot_sprot_index.fasta
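
With the pressed Pfam-A.hmm in place, the ORFs can be scanned for Pfam domains, and the domain table plus the earlier blastp hits fed back into the final prediction. A sketch using the same placeholder file names as above:

# scan the TransDecoder ORFs against Pfam-A, writing a domain table
hmmscan --cpu 8 --domtblout pfam.domtblout Pfam-A.hmm \
    transcripts.fasta.transdecoder_dir/longest_orfs.pep
# final prediction, keeping any ORF with Pfam or blastp support
TransDecoder.Predict -t transcripts.fasta \
    --retain_pfam_hits pfam.domtblout \
    --retain_blastp_hits blastp.outfmt6 \
    --single_best_only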