signalp工具+ 我的脚本
2017-04-13 本文已影响154人
keaidelele
之前出现了一个bug
http://stackoverflow.com/questions/38605052/signalp-error-message-cant-locate-fasta-pm-in-inc
使用环境:ubuntu
下载signalp:http://www.cbs.dtu.dk/download/EBC48D58-1F4D-11E7-A2FF-F5A645117D33/
接着解压之后,查看readme,使用说明如下:
- Test SignalP on the 10 eukaryotic sequences shipped with the package: go to
the signalp-4.1 directory on your system and issue the following commands:
> ./signalp -t euk -f short test/euk10.fsa > euk10.fsa.short_out
> ./signalp -t euk -f long test/euk10.fsa > euk10.fsa.long_out
> ./signalp -t euk -f all test/euk10.fsa > euk10.fsa.all_out
> ./signalp -t euk -f summary test/euk10.fsa > euk10.fsa.summary_out
然后我们要修改一下singalp的配置,打开signalp,编辑第14行,改成你自己文件的目录
BEGIN {
#$ENV{SIGNALP} = '/usr/cbs/bio/src/signalp-4.1';
$ENV{SIGNALP} = '/home/huangle/signalp-4.1';
}
我的脚本是批处理序列的脚本,统计cutoff:
import re
import os
filename = 'test.fas'
directory = '/home/huangle/signalp-4.1/'
filepath = directory + filename
wfilepath = directory + filename+'.out'
pfilepath =directory + filename + '.short_out'
seq ={}#key: 'cazyme_name' value: seq
cutoff ={}#key: 'cazyme_name' value: cutoff
#shell exec
os.system('./signalp -t euk -f short '+ filename +' > ' + pfilepath)
#calculate the length of cazyme
f = open(filepath)
line = f.readline()
lenn = 0
line1 = re.split('\s',line)
line1 = line1[0].split('>')
key = line1[1]
tag =0
while line:
line = f.readline()
if line and line[0] == '>':
seq[key] = lenn
lenn=0
line1 = re.split('\s',line)
line1 = line1[0].split('>')
key = line1[1]
else:
if line:
lenn += len(line)-1
seq[key] = lenn
#print seq
f.close()
#calculate cutoff
f = open(pfilepath)
line = f.readline()
line = f.readline()
line = f.readline()
while line:
line = re.split('\s\s+',line)
key = line[0]
n1 = int(line[2])
n2 = int(line[4])
n3 = int(line[6])
MAX = -1
if(n1>n2):
if(n1>n3):
MAX = n1
else:
MAX = n3
else:
if(n2>n3):
MAX = n2
else:
MAX = n3
cutoff[key] = MAX-1
line = f.readline()
#print cutoff
f.close()
#output
fw =open(wfilepath,'w')
title = "#cazyme_name cayzme_length Cleavage Site\n"
fw.write(title)
for (key,value) in seq.items():
fw.write(key + ' ' + str(value) + ' ' + str(cutoff[key]) + '\n')
fw.close()
print "success"
os.system('gedit ' + wfilepath)