2023-07-18 Extracting FASTA record IDs and counting bases
2023-07-17
麦冬花儿
[train@MiWiFi-R3P-srv 16.scripts]$ head ~/04.genome_assembling/IDBA/illumina.fasta
>SRR2131197.2 2 length=100
GGCTCACACAGATATCGCAGAAAGCGCCCGGTGGTCACGTCCCATAACTTGACAAGGCCATCCGAGCCACCCGTGACCATGTAGCGGTCGTTCAGTTGC
>SRR2131197.2 2 length=100
CCATGTTCCAAGGCTATACGCATGTGGTTGCCCACTTGCAGCTCTTCGGCGATATGTTAGCAACGGGAAGCAGTGACGGCCGCGTGCTTGTGTATTCGCT
>SRR2131197.4 4 length=100
ACGAGTCACAATGCCCGTGCCACGCGGCAGAAAGTCGCGGCCGACAATGTTCTCCAGCACACTACTCTTGCCGGACGACTGGCTACCGAGCACCGTGAT
>SRR2131197.4 4 length=100
TAACTCTCCCCCTCCGGGGGCCTCAGAGCTTGTGAATAAGGTGCGTGCGATGTCGGCTAACAGCGCAGCTGCAGGACGCCTTCCATGACGTACGTGAGAG
>SRR2131197.6 6 length=96
GGTAGTCATAGTAGGAGTAGTAGTGATAGTAGGAGTCATATTGATAGTCATGGTATTAGTAATAATAATAATAGTAGTAATACTCATAGTGGAAG
The script (c2.pl) is as follows:
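#!/usr/bin/perl
use strict;
use warnings;

# Usage: perl c2.pl <input.fasta>
# Prints one "ID<TAB>base count" line per FASTA record.
open my $in, "<", $ARGV[0] or die "Cannot open the file $ARGV[0]: $!";
my ($seq_name, $length);
while ( my $line = <$in> ) {
    $line =~ s/\r?\n\z//;    # strip the line ending (LF or CRLF)
    if ( $line =~ s/^>// ) {
        # New header: report the previous record, if there is one
        print "$seq_name\t$length\n" if defined $seq_name;
        $line =~ s/\s.*//;   # keep only the ID (text before the first whitespace)
        ($seq_name, $length) = ($line, 0);
    }
    else {
        $length += length $line;
    }
}
# Report the last record
print "$seq_name\t$length\n" if defined $seq_name;
close $in;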
[train@MiWiFi-R3P-srv 16.scripts]$ perl c2.pl ~/04.genome_assembling/IDBA/illumina.fasta | head
SRR2131197.2 99
SRR2131197.2 100
SRR2131197.4 99
SRR2131197.4 100
SRR2131197.6 95
SRR2131197.6 96
SRR2131197.7 99
SRR2131197.7 100
SRR2131197.8 99