一个提取序列的脚本
2016-12-09 本文已影响341人
正踪大米饭儿
由于过程比较复杂,先附上代码,希望有天能写出多个版本~~~
#!/usr/bin/perl -w
use strict;
use Getopt::Long;
my ($list,$indir,$outdir,$help);
GetOptions(
"list|l=s" => \$list,
"indir|i=s" => \$indir,
"outdir|o:s" => \$outdir,
"help|?" => \$help
);
die &usage if ( !defined $list || !defined $indir || defined $help);
$outdir ||= ".";
mkdir $outdir,0755 if (not -e $outdir);
open LST,$list or die "Can't open the filtered gene list file $! \n";
my %lst = map{chomp;(split /\t/,$_)[0],1}<LST>;
close LST;
my %genes;
opendir DIR,$indir or die "Can't open the outDIR of RSEM expression\n";
foreach my $file (sort grep(/^.*\.genes\.results$/,readdir(DIR))) {
open IN,"$indir/$file" or die $!;
while (<IN>) {
chomp;
next if /^gene/;
my @a = split /\t/,$_;
if (exists $lst{$a[0]}) {
my @b = split /,/,$a[1];
for my $b (@b){
$genes{$b} = $a[0];
}
}
}
close IN;
}
closedir DIR;
open OUT,">$outdir/transcript.list";
foreach my $k (sort keys %genes){
print OUT "$k\t$genes{$k}\n";
}
close OUT;
sub usage {
print <<USAGE;
Name:
$0
Description:
merge the out put file of the rsem < \$sample.genes.result >
Update:
2016-12-02 genebang
Uasge:
perl $0 [options]
Options:
-l, --list <string>* list of filtered gene
-i, --indir <string>* input files contain the output of the rsem-exprssion-calular result
-o, --output <string> output file
--help print this help information
e.g
perl $0 -l filter.gene.FPKM.xls -i Test/output -o ./
perl $0 -l filter.gene.FPKM.xls -i Test/output -o ./
perl $0 --list filter.gene.FPKM.xls --indir Test/output --outdir ./
USAGE
exit 0;
}
__END__
该脚本可以实现输入一个列表文件,提取某一目录下所有某一类型文件中的所有子集对应的名称。
示例文件如下:
列表文件:
ENSG00000000003.14
ENSG00000000005.5
ENSG00000000419.12
ENSG00000000457.13
ENSG00000000460.16
文件夹目录列表:
$ ls ./
BC486AA.genes.results
BC486BA.genes.results
BC499AA.genes.results
BC499BA.genes.results
BC505AA.genes.results
BC505BA.genes.results
文件内容:
ENSG00000000003.14 ENST00000373020.8,ENST00000494424.1,ENST00000496771.5,ENST00000612152.4
ENSG00000000005.5 ENST00000373031.4,ENST00000485971.1
ENSG00000000419.12 ENST00000371582.8,ENST00000371584.8,ENST00000371588.9,ENST00000413082.1
ENSG00000000457.13 ENST00000367770.5,ENST00000367771.10,ENST00000367772.8,ENST00000423670.
ENSG00000000460.16 ENST00000286031.10,ENST00000359326.8,ENST00000413811.3,ENST00000459772.
结果输出:
ENST00000000233.9 ENSG00000004059.10
ENST00000000412.7 ENSG00000003056.7
ENST00000000442.10 ENSG00000173153.13
ENST00000001008.5 ENSG00000004478.7
ENST00000001146.6 ENSG00000003137.8
该脚本使用了perl 程序中好多实用的功能。