生物信息学习生物信息学Perl

一个提取序列的脚本

2016-12-09  本文已影响341人  正踪大米饭儿

由于过程比较复杂,先附上代码,希望有天能写出多个版本~~~

#!/usr/bin/perl -w
use strict;
use Getopt::Long;

my ($list,$indir,$outdir,$help);

GetOptions(
    "list|l=s"   => \$list,
    "indir|i=s"  => \$indir,
    "outdir|o:s" => \$outdir,
    "help|?"     => \$help
);

die &usage if ( !defined $list || !defined $indir || defined $help);
$outdir ||= ".";
mkdir $outdir,0755 if (not -e $outdir);


open LST,$list or die "Can't open the filtered gene list file $! \n";
my %lst = map{chomp;(split /\t/,$_)[0],1}<LST>;
close LST;

my %genes;
opendir DIR,$indir or die "Can't open the outDIR of RSEM expression\n";
foreach my $file (sort grep(/^.*\.genes\.results$/,readdir(DIR))) {
    open IN,"$indir/$file" or die $!;
    while (<IN>) {
        chomp;
        next if /^gene/;

        my @a = split /\t/,$_;
        if (exists $lst{$a[0]}) {
            my @b = split /,/,$a[1];
            for my $b (@b){
                $genes{$b} = $a[0];
            }
        }
    }
    close IN;
}
closedir DIR;

open OUT,">$outdir/transcript.list";
foreach my $k (sort keys %genes){
    print OUT "$k\t$genes{$k}\n";
}
close OUT;


sub usage {
    print <<USAGE;
Name:
    $0
Description:
    merge the out put file of the rsem < \$sample.genes.result >
Update:
    2016-12-02  genebang
Uasge:
    perl $0 [options] 
Options:
    -l, --list    <string>*    list of filtered gene 
    -i, --indir   <string>*    input files contain the output of the rsem-exprssion-calular result
    -o, --output  <string>     output file
    --help                     print this help information
e.g
    perl $0 -l filter.gene.FPKM.xls -i Test/output -o ./ 
    perl $0 -l filter.gene.FPKM.xls -i Test/output -o ./ 
    perl $0 --list filter.gene.FPKM.xls --indir Test/output --outdir ./  
USAGE
    exit 0;
}

__END__

该脚本可以实现输入一个列表文件,提取某一目录下所有某一类型文件中的所有子集对应的名称。
示例文件如下:

列表文件:

ENSG00000000003.14   
ENSG00000000005.5      
ENSG00000000419.12    
ENSG00000000457.13    
ENSG00000000460.16    

文件夹目录列表:

$ ls ./
BC486AA.genes.results
BC486BA.genes.results
BC499AA.genes.results
BC499BA.genes.results
BC505AA.genes.results
BC505BA.genes.results

文件内容:

ENSG00000000003.14      ENST00000373020.8,ENST00000494424.1,ENST00000496771.5,ENST00000612152.4
ENSG00000000005.5       ENST00000373031.4,ENST00000485971.1 
ENSG00000000419.12      ENST00000371582.8,ENST00000371584.8,ENST00000371588.9,ENST00000413082.1
ENSG00000000457.13      ENST00000367770.5,ENST00000367771.10,ENST00000367772.8,ENST00000423670.
ENSG00000000460.16      ENST00000286031.10,ENST00000359326.8,ENST00000413811.3,ENST00000459772.

结果输出:

ENST00000000233.9       ENSG00000004059.10
ENST00000000412.7       ENSG00000003056.7
ENST00000000442.10      ENSG00000173153.13
ENST00000001008.5       ENSG00000004478.7
ENST00000001146.6       ENSG00000003137.8

该脚本使用了perl 程序中好多实用的功能。

上一篇下一篇

猜你喜欢

热点阅读