Parsing KEGG Files
2018-06-05
白云梦_7
https://www.genome.jp/kegg-bin/get_htext?hsa00001+3101
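Lines beginning with C carry the KEGG pathway ID; the D lines that follow each C line are the genes that belong to that pathway. The one-liner below writes one "pathway ID, gene ID" pair per line, separated by a tab: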
perl -alne '{if(/^C/){/PATH:hsa(\d+)/;$kegg=$1}else{print "$kegg\t$F[1]" if /^D/ and $kegg;}}' hsa00001.keg >kegg2gene.txt
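To make the logic of the one-liner easier to follow, here is a minimal sketch that applies the same rules to a few in-memory lines instead of hsa00001.keg. The sample lines are hand-written to mimic the C/D layout described above; the `99999` entry and the names `@lines`, `@pairs` are assumptions made for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-written sample lines modeled on the C/D layout (illustrative only;
# the 99999 entry is hypothetical and not taken from the real file).
my @lines = (
    'C    00010 Glycolysis / Gluconeogenesis [PATH:hsa00010]',
    'D      3101 HK3; hexokinase 3',
    'D      3098 HK1; hexokinase 1',
    'C    99999 Example entry without a PATH tag',
    'D      3099 HK2; hexokinase 2',
);

my $kegg;     # current pathway ID, undef until a PATH:hsa tag is seen
my @pairs;    # collected "pathway<TAB>gene" rows
for my $line (@lines) {
    my @F = split ' ', $line;    # whitespace split, as perl -a does
    if ($line =~ /^C/) {
        # In list context a failed match returns an empty list,
        # so a C line without PATH:hsaNNNNN resets $kegg to undef.
        ($kegg) = $line =~ /PATH:hsa(\d+)/;
    }
    elsif ($line =~ /^D/ and $kegg) {
        push @pairs, "$kegg\t$F[1]";    # $F[1] is the gene ID after "D"
    }
}
print "$_\n" for @pairs;
```

With these sample lines the sketch prints two rows, `00010` paired with `3101` and `00010` paired with `3098`; the last D line is skipped because no pathway ID is active at that point.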
++++++++++++++++++++++++++++++++
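#!/usr/bin/perl
use warnings;
use strict;

my ($path, $num);
open IN, '<', 'hsa00001.keg' or die "Cannot open hsa00001.keg: $!";
open OUT, '>', 'kegg_sorting' or die "Cannot write kegg_sorting: $!";
while (<IN>) {
    chomp;
    if (/^C/) {
        # C line: capture the pathway number that follows the leading C
        ($num) = $_ =~ /C\s*(\d+).*/;
        #print OUT "$num\n";
    }
    elsif (/^D/) {
        # D line: capture the gene ID that follows the leading D
        ($path) = $_ =~ /D\s+(\d+).*/;
        # Skip D lines seen before any C line, or without a numeric ID
        print OUT "$num\t$path\n" if defined $num and defined $path;
    }
}
close IN;
close OUT;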
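The script and the one-liner key their output differently: the script uses the number right after the leading C, while the one-liner uses the number inside the PATH:hsa tag. The two only disagree when a C line has no PATH tag. The following is a rough sketch of that difference on hand-written sample lines; the `99999` entry and the sub names `pairs_by_leading_number` and `pairs_by_path_tag` are hypothetical, introduced here for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-written sample lines (illustrative only; 99999 is a made-up entry).
my @lines = (
    'C    00010 Glycolysis / Gluconeogenesis [PATH:hsa00010]',
    'D      3101 HK3; hexokinase 3',
    'C    99999 Example entry without a PATH tag',
    'D      3099 HK2; hexokinase 2',
);

# Script-style parsing: key on the number that follows the leading C.
sub pairs_by_leading_number {
    my (@rows, $num);
    for (@_) {
        if (/^C/) { ($num) = /^C\s*(\d+)/; }
        elsif (/^D\s+(\d+)/ and defined $num) { push @rows, "$num\t$1"; }
    }
    return @rows;
}

# One-liner-style parsing: key on the number inside the PATH:hsa tag.
sub pairs_by_path_tag {
    my (@rows, $kegg);
    for (@_) {
        if (/^C/) { ($kegg) = /PATH:hsa(\d+)/; }
        elsif (/^D\s+(\d+)/ and $kegg) { push @rows, "$kegg\t$1"; }
    }
    return @rows;
}

my @by_number = pairs_by_leading_number(@lines);
my @by_tag    = pairs_by_path_tag(@lines);
printf "leading number: %d rows, PATH tag: %d rows\n",
    scalar @by_number, scalar @by_tag;
```

On this sample the leading-number variant keeps both genes, while the PATH-tag variant keeps only the gene under the tagged pathway. Which behavior is wanted depends on whether entries without a PATH tag should appear in the output.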