2023-07-18 Extracting FASTA sequence IDs and counting bases

2023-07-17  麦冬花儿
[train@MiWiFi-R3P-srv 16.scripts]$ head ~/04.genome_assembling/IDBA/illumina.fasta
>SRR2131197.2 2 length=100
GGCTCACACAGATATCGCAGAAAGCGCCCGGTGGTCACGTCCCATAACTTGACAAGGCCATCCGAGCCACCCGTGACCATGTAGCGGTCGTTCAGTTGC
>SRR2131197.2 2 length=100
CCATGTTCCAAGGCTATACGCATGTGGTTGCCCACTTGCAGCTCTTCGGCGATATGTTAGCAACGGGAAGCAGTGACGGCCGCGTGCTTGTGTATTCGCT
>SRR2131197.4 4 length=100
ACGAGTCACAATGCCCGTGCCACGCGGCAGAAAGTCGCGGCCGACAATGTTCTCCAGCACACTACTCTTGCCGGACGACTGGCTACCGAGCACCGTGAT
>SRR2131197.4 4 length=100
TAACTCTCCCCCTCCGGGGGCCTCAGAGCTTGTGAATAAGGTGCGTGCGATGTCGGCTAACAGCGCAGCTGCAGGACGCCTTCCATGACGTACGTGAGAG
>SRR2131197.6 6 length=96
GGTAGTCATAGTAGGAGTAGTAGTGATAGTAGGAGTCATATTGATAGTCATGGTATTAGTAATAATAATAATAGTAGTAATACTCATAGTGGAAG

The script is as follows:

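#!/usr/bin/perl
use strict;
use warnings;

# Print each FASTA record's ID and its sequence length, separated by a tab.
my ( $seq_name, $length );

open my $in, "<", $ARGV[0] or die "Cannot open the file $ARGV[0]: $!";
while ( my $line = <$in> ) {
    chomp $line;    # drop the trailing newline so it is not counted as a base
    if ( $line =~ s/^>// ) {
        # A new header: report the previous record first, if there is one.
        print "$seq_name\t$length\n" if defined $seq_name;
        $line =~ s/\s.*//;    # keep only the ID (text before the first whitespace)
        $seq_name = $line;
        $length   = 0;
    }
    else {
        # Sequence line: add its bases to the current record.
        $length += length $line;
    }
}
close $in;

# Report the last record, which has no following header to trigger the print.
print "$seq_name\t$length\n" if defined $seq_name;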
[train@MiWiFi-R3P-srv 16.scripts]$ perl c2.pl ~/04.genome_assembling/IDBA/illumina.fasta  | head
    
SRR2131197.2    99
SRR2131197.2    100
SRR2131197.4    99
SRR2131197.4    100
SRR2131197.6    95
SRR2131197.6    96
SRR2131197.7    99
SRR2131197.7    100
SRR2131197.8    99
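
The same idea can be wrapped in a subroutine so that the per-record lengths can also be summed into a total base count. This is only a minimal sketch: the helper name `fasta_lengths` and the small in-memory FASTA used for the demonstration are assumptions made for illustration and are not part of the original script or data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): read FASTA records from an
# open filehandle and return a list of [id, length] pairs.
sub fasta_lengths {
    my ($fh) = @_;
    my @records;
    while ( my $line = <$fh> ) {
        $line =~ s/\r?\n\z//;    # strip LF or CRLF line endings
        if ( $line =~ /^>(\S*)/ ) {
            # New record: the ID is the header text up to the first whitespace.
            push @records, [ $1, 0 ];
        }
        elsif (@records) {
            # Sequence line: a record may be wrapped over several lines.
            $records[-1][1] += length $line;
        }
    }
    return @records;
}

# Made-up demonstration data: seq1 is wrapped over two lines.
my $demo = ">seq1 first read\nACGT\nAC\n>seq2\nGGG\n";
open my $fh, "<", \$demo or die "Cannot open in-memory file: $!";
my @records = fasta_lengths($fh);
close $fh;

# Print each record and accumulate the total number of bases.
my $total = 0;
for my $rec (@records) {
    print "$rec->[0]\t$rec->[1]\n";
    $total += $rec->[1];
}
print "total\t$total\n";
```

On the demonstration data this prints `seq1` with 6 bases, `seq2` with 3 bases, and a total of 9. To use it on a real file, the in-memory handle would be replaced by one opened on the FASTA path.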