Installing TransDecoder, blast+ and hmmer & Pf
2022-02-23
vicLeo
Installing TransDecoder, blast+ and hmmer
Install TransDecoder inside the conda environment named python3:
conda activate python3
conda install -y -c bioconda/label/cf201901 transdecoder
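The download is rather slow. Once it finished, running TransDecoder.Predict directly failed with "command not found".
Running whereis transdecoder then printed a warning that the Perl module URI::Escape was not installed, so install it with Perl's CPAN shell: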
### Start the Perl CPAN shell
perl -MCPAN -e shell
### Install the Perl URI::Escape module:
install URI::Escape
### Quit:
q
After the Perl module was installed, running TransDecoder.Predict again worked:
(python3) [u20111230014@cpu10 ~]$ TransDecoder.Predict
########################################################################################
# ______ ___ __
# /_ __/______ ____ ___ / _ \___ _______ ___/ /__ ____
# / / / __/ _ `/ _\(_-</ // / -_) __/ _ \/ _ / -_) __/
# /_/ /_/ \_,_/_//_/___/____/\__/\__/\___/\_,_/\__/_/ .Predict
#
########################################################################################
#
# Transdecoder.LongOrfs|http://transdecoder.github.io> - Transcriptome Protein Prediction
#
#
# Required:
#
# -t <string> transcripts.fasta
#
# Common options:
#
#
# --retain_long_orfs_mode <string> 'dynamic' or 'strict' (default: dynamic)
# In dynamic mode, sets range according to 1%FDR in random sequence of same GC content.
#
#
# --retain_long_orfs_length <int> under 'strict' mode, retain all ORFs found that are equal or longer than these many nucleotides even if no other evidence
# marks it as coding (default: 1000000) so essentially turned off by default.)
#
# --retain_pfam_hits <string> domain table output file from running hmmscan to search Pfam (see transdecoder.github.io for info)
# Any ORF with a pfam domain hit will be retained in the final output.
#
# --retain_blastp_hits <string> blastp output in '-outfmt 6' format.
# Any ORF with a blast match will be retained in the final output.
#
# --single_best_only Retain only the single best orf per transcript (prioritized by homology then orf length)
#
# --output_dir | -O <string> output directory from the TransDecoder.LongOrfs step (default: basename( -t val ) + ".transdecoder_dir")
#
# -G <string> genetic code (default: universal; see PerlDoc; options: Euplotes, Tetrahymena, Candida, Acetabularia, ...)
#
# --no_refine_starts start refinement identifies potential start codons for 5' partial ORFs using a PWM, process on by default.
#
## Advanced options
#
# -T <int> Top longest ORFs to train Markov Model (hexamer stats) (default: 500)
# Note, 10x this value are first selected for removing redundancies,
# and then this -T value of longest ORFs are selected from the non-redundant set.
# Genetic Codes
#
#
# --genetic_code <string> Universal (default)
#
# Genetic Codes (derived from: https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi)
#
#
Acetabularia
Candida
Ciliate
Dasycladacean
Euplotid
Hexamita
Mesodinium
Mitochondrial-Ascidian
Mitochondrial-Chlorophycean
Mitochondrial-Echinoderm
Mitochondrial-Flatworm
Mitochondrial-Invertebrates
Mitochondrial-Protozoan
Mitochondrial-Pterobranchia
Mitochondrial-Scenedesmus_obliquus
Mitochondrial-Thraustochytrium
Mitochondrial-Trematode
Mitochondrial-Vertebrates
Mitochondrial-Yeast
Pachysolen_tannophilus
Peritrich
SR1_Gracilibacteria
Tetrahymena
Universal
#
# --version show version (5.5.0)
#
#########################################################################################
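Because TransDecoder.Predict was not found at first, it helps to have a quick way to re-check the setup later. The bash sketch below reports whether the TransDecoder scripts are on PATH and whether URI::Escape can be loaded by perl. It assumes bash and that the python3 conda environment is active; the helper names check_cmd and check_perl_module are made up for illustration and are not part of TransDecoder.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of TransDecoder); assumes the conda env is active.

check_cmd() {
    # command -v prints the resolved path, or fails if the command is not on PATH
    local p
    if p=$(command -v "$1" 2>/dev/null); then
        echo "found   $1 -> $p"
        return 0
    fi
    echo "missing $1"
    return 1
}

check_perl_module() {
    # perl -M<Module> -e 1 exits non-zero when the module cannot be loaded
    if perl -M"$1" -e 1 2>/dev/null; then
        echo "found   Perl module $1"
        return 0
    fi
    echo "missing Perl module $1"
    return 1
}

for cmd in TransDecoder.LongOrfs TransDecoder.Predict perl; do
    check_cmd "$cmd"
done
check_perl_module URI::Escape
```

Each helper returns a non-zero status on failure, so they can also be chained with && in a setup script.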
## Install blast+
conda install -y blast
Check the installation with blastp -h:
(python3) [u20111230014@cpu10 ~]$ blastp -h
USAGE
blastp [-h] [-help] [-import_search_strategy filename]
[-export_search_strategy filename] [-task task_name] [-db database_name]
[-dbsize num_letters] [-gilist filename] [-seqidlist filename]
[-negative_gilist filename] [-negative_seqidlist filename]
[-taxids taxids] [-negative_taxids taxids] [-taxidlist filename]
[-negative_taxidlist filename] [-ipglist filename]
[-negative_ipglist filename] [-entrez_query entrez_query]
[-db_soft_mask filtering_algorithm] [-db_hard_mask filtering_algorithm]
[-subject subject_input_file] [-subject_loc range] [-query input_file]
[-out output_file] [-evalue evalue] [-word_size int_value]
[-gapopen open_penalty] [-gapextend extend_penalty]
[-qcov_hsp_perc float_value] [-max_hsps int_value]
[-xdrop_ungap float_value] [-xdrop_gap float_value]
[-xdrop_gap_final float_value] [-searchsp int_value] [-seg SEG_options]
[-soft_masking soft_masking] [-matrix matrix_name]
[-threshold float_value] [-culling_limit int_value]
[-best_hit_overhang float_value] [-best_hit_score_edge float_value]
[-subject_besthit] [-window_size int_value] [-lcase_masking]
[-query_loc range] [-parse_deflines] [-outfmt format] [-show_gis]
[-num_descriptions int_value] [-num_alignments int_value]
[-line_length line_length] [-html] [-sorthits sort_hits]
[-sorthsps sort_hsps] [-max_target_seqs num_sequences]
[-num_threads int_value] [-mt_mode int_value] [-ungapped] [-remote]
[-comp_based_stats compo] [-use_sw_tback] [-version]
DESCRIPTION
Protein-Protein BLAST 2.12.0+
Use '-help' to print detailed descriptions of command line arguments
(python3) [u20111230014@cpu10 ~]$ whereis blastp -h
blastp: /home/u20111230014/miniconda3/envs/python3/bin/blastp /opt/app/anaconda3/bin/blastp
Usage:
whereis [options] file
Options:
-b search only for binaries
-B <dirs> define binaries lookup path
-m search only for manuals
-M <dirs> define man lookup path
-s search only for sources
-S <dirs> define sources lookup path
-f terminate <dirs> argument list
-u search for unusual entries
-l output effective lookup paths
For more details see whereis(1).
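whereis found two blastp binaries: one in the python3 conda environment and one under /opt/app/anaconda3. The shell runs whichever comes first on PATH. A minimal bash sketch for checking which one that is (the function name list_candidates is hypothetical; it wraps the bash builtin type):

```shell
#!/usr/bin/env bash
# Hypothetical helper; relies on the bash builtin "type".

list_candidates() {
    # -a: report every match; -P: only executables found by searching PATH
    type -aP "$1" 2>/dev/null
}

# The first line is the binary the shell will run
# (bash may cache an older location; "hash -r" clears that cache).
first=$(list_candidates blastp | head -n 1)
if [ -n "$first" ]; then
    echo "blastp resolves to: $first"
    "$first" -version    # prints the BLAST+ version of that binary
else
    echo "blastp not found on PATH"
fi
```

If the conda copy is not listed first, re-running conda activate python3 normally puts that environment's bin directory at the front of PATH.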
## Install hmmer
conda install -y hmmer
Then check the installation (there is no command called hmmer, and hmmbuild takes -h, not --help):
(python3) [u20111230014@cpu10 ~]$ hmmer -h
-bash: hmmer: command not found
(python3) [u20111230014@cpu10 ~]$ hmmbuild --help
Failed to parse command line:
No such option "--help".
Usage: hmmbuild [-options] <hmmfile_out> <msafile>
where basic options are:
-h : show brief help on version and usage
-n <s> : name the HMM <s>
-o <f> : direct summary output to file <f>, not stdout
-O <f> : resave annotated, possibly modified MSA to file <f>
To see more help on other available options, do:
hmmbuild -h
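As the output shows, the package installs separate programs such as hmmbuild, hmmpress, hmmscan and hmmsearch rather than a single hmmer command. The sketch below checks those four and pulls the version out of their -h banner. It assumes the banner contains a line of the form "# HMMER 3.x.y (date); http://hmmer.org/", as HMMER 3.x help output usually does; treat the exact format as an assumption. The function name hmmer_version is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper; assumes HMMER 3.x help output contains a line like
# "# HMMER 3.x.y (<date>); http://hmmer.org/"

hmmer_version() {
    # The third whitespace-separated field of the "# HMMER" line is the version
    "$1" -h 2>/dev/null | grep -m 1 '^# HMMER' | awk '{print $3}'
}

for prog in hmmbuild hmmpress hmmscan hmmsearch; do
    if command -v "$prog" >/dev/null 2>&1; then
        echo "$prog: HMMER $(hmmer_version "$prog")"
    else
        echo "$prog: not found"
    fi
done
```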
## Pfam search
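Each accession in the Pfam database represents one protein family. Pfam was historically split into two parts, Pfam-A and Pfam-B: Pfam-A families are manually curated and of high quality, while Pfam-B families were generated automatically; they are of lower quality but can still help locate conserved sites in protein families. Recent Pfam releases no longer ship Pfam-B, so only Pfam-A is used here.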
Download the Pfam database (the latest release at the time was 35; release 33.1 is used here):
ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam33.1/Pfam-A.hmm.gz
Decompress it: gunzip Pfam-A.hmm.gz
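The download and decompression can be scripted, with a quick sanity check on the result. This is a sketch under a few assumptions: wget is available, the release directory follows the same pattern as the URL above, and HMMER3 flat files start with a format tag such as "HMMER3/f" and contain one NAME line per profile. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not part of HMMER or Pfam).

pfam_url() {
    # Build the EBI FTP URL for a Pfam release, following the path pattern above
    echo "ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam${1}/Pfam-A.hmm.gz"
}

looks_like_hmmer3() {
    # HMMER3 flat files begin with a format tag such as "HMMER3/f"
    head -n 1 "$1" | grep -q '^HMMER3/'
}

count_hmm_models() {
    # Each profile in the flat file has exactly one NAME line
    grep -c '^NAME ' "$1"
}

# Usage (commented out so the sketch does not start a large download by accident):
# wget -c "$(pfam_url 33.1)"     # -c resumes an interrupted download
# gunzip Pfam-A.hmm.gz
# looks_like_hmmer3 Pfam-A.hmm && count_hmm_models Pfam-A.hmm
```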
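The decompressed Pfam-A.hmm is the Pfam HMM file in plain-text format. hmmpress converts it into compressed binary files, which makes hmmscan searches against it much faster: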
# Build the binary (pressed) database files
hmmpress Pfam-A.hmm
[u20111230014@workstation Pfam-A]$ ll
total 3036508
-rw-r--r-- 1 u20111230014 u20111230014 1459135873 Feb 23 12:17 Pfam-A.hmm
-rw-rw-r-- 1 u20111230014 u20111230014 334380860 Feb 23 17:09 Pfam-A.hmm.h3f
-rw-rw-r-- 1 u20111230014 u20111230014 1259976 Feb 23 17:09 Pfam-A.hmm.h3i
-rw-rw-r-- 1 u20111230014 u20111230014 604042224 Feb 23 17:09 Pfam-A.hmm.h3m
-rw-rw-r-- 1 u20111230014 u20111230014 710553501 Feb 23 17:09 Pfam-A.hmm.h3p
lrwxrwxrwx 1 u20111230014 u20111230014 69 Feb 23 20:24 uniprot_sprot_index.fasta -> /home/u20111230014/workspace/genome/uniprot/uniprot_sprot_index.fasta
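The TransDecoder.Predict help above lists --retain_pfam_hits (the domain table from hmmscan against Pfam) and --retain_blastp_hits (blastp output in -outfmt 6), which is what ties the three tools together. The sketch below first checks that hmmpress wrote all four .h3f/.h3i/.h3m/.h3p files, then chains LongOrfs, hmmscan, blastp and Predict. Assumptions not taken from the post: the transcript file name, the thread count, the blastp settings (-max_target_seqs 1, -evalue 1e-5), and that uniprot_sprot_index.fasta has already been formatted with makeblastdb -dbtype prot. The function names and the DRY_RUN switch are hypothetical. Note that hmmpress refuses to overwrite existing index files unless run with -f.

```shell
#!/usr/bin/env bash
# Sketch only; file names, thread count and blastp settings are assumptions.

DRY_RUN=${DRY_RUN:-0}

run() {
    # Print the command instead of executing it when DRY_RUN=1
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

check_pressed() {
    # hmmscan needs all four binary files written by hmmpress
    local hmm=$1 ext missing=0
    for ext in h3f h3i h3m h3p; do
        if [ ! -s "$hmm.$ext" ]; then
            echo "missing or empty: $hmm.$ext" >&2
            missing=1
        fi
    done
    return "$missing"
}

transdecoder_pipeline() {
    local transcripts=$1 pfam_hmm=$2 blast_db=$3 threads=${4:-8}
    # Default LongOrfs output dir: basename of -t plus ".transdecoder_dir"
    local pep
    pep="$(basename "$transcripts").transdecoder_dir/longest_orfs.pep"

    run TransDecoder.LongOrfs -t "$transcripts" || return 1
    run hmmscan --cpu "$threads" --domtblout pfam.domtblout "$pfam_hmm" "$pep" || return 1
    run blastp -query "$pep" -db "$blast_db" -max_target_seqs 1 \
        -outfmt 6 -evalue 1e-5 -num_threads "$threads" -out blastp.outfmt6 || return 1
    run TransDecoder.Predict -t "$transcripts" \
        --retain_pfam_hits pfam.domtblout --retain_blastp_hits blastp.outfmt6
}

# Usage (hypothetical file names):
# check_pressed Pfam-A.hmm && transdecoder_pipeline Trinity.fasta Pfam-A.hmm uniprot_sprot_index.fasta 8
```

Running it once with DRY_RUN=1 prints the four commands without executing them, which is a cheap way to check the paths before a long hmmscan run.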