Re-annotating Rattus norvegicus gene expression probes to lncRNA

2019-05-02  dming1024

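The rat gene expression probe re-annotation described in the previous post ran into a small problem: the final step of the re-annotation printed this warning: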

intersectBed -a Rat230_2_probe.bed -b Rattus_norvegicus.Rnor_6.0.96.gtf -wa -wb > x.txt
***** WARNING: File Rat230_2_probe.bed has inconsistent naming convention for record:
chr6    108169080   108169105   Rat230_2:1367452_at;    1   -

***** WARNING: File Rat230_2_probe.bed has inconsistent naming convention for record:
chr6    108169080   108169105   Rat230_2:1367452_at;    1   -

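The warning says that the record at this chr6 position uses an inconsistent naming convention, so I checked what was going on at that position.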

cat Rat230_2_probe.bed|sed -n '/108169080/p'
chr6    108169080   108169105   Rat230_2:1367452_at;    1   -

Looking at the whole .bed file shows that the inconsistency starts at the very first line.

cat Rat230_2_probe.bed|less -SN
      1 chr6    108169080       108169105       Rat230_2:1367452_at;    1       -
      2 chr5    15325895        15325920        Rat230_2:1367452_at;    1       +
      3 chr10   105608591       105608616       Rat230_2:1367452_at;    1       -
      4 chr5    15325937        15325962        Rat230_2:1367452_at;    1       +
      5 chr5    15325986        15326011        Rat230_2:1367452_at;    1       +
      6 chr5    15326001        15326026        Rat230_2:1367452_at;    1       +
      7 chr6    108168877       108168902       Rat230_2:1367452_at;    1       -
      8 chr6    108168798       108168823       Rat230_2:1367452_at;    1       -
      9 chr6    108168774       108168799       Rat230_2:1367452_at;    1       -
     10 chr10   105608337       105608362       Rat230_2:1367452_at;    1       -
     11 chr11   81380587        81380612        Rat230_2:1367452_at;    1       +

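I had to Google it, and sure enough the cause is a difference in chromosome naming: in the .bed file every chromosome name starts with chr, while the .gtf file names them 1, 2, 3, and so on. That makes the fix easy: strip the chr prefix from every line of the .bed file.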

cat Rat230_2_probe.bed|sed 's/^chr//' > x_chr.bed
cat x_chr.bed |less -SN
      1 6       108169080       108169105       Rat230_2:1367452_at;    1       -
      2 5       15325895        15325920        Rat230_2:1367452_at;    1       +
      3 10      105608591       105608616       Rat230_2:1367452_at;    1       -
      4 5       15325937        15325962        Rat230_2:1367452_at;    1       +
      5 5       15325986        15326011        Rat230_2:1367452_at;    1       +
      6 5       15326001        15326026        Rat230_2:1367452_at;    1       +
      7 6       108168877       108168902       Rat230_2:1367452_at;    1       -
      8 6       108168798       108168823       Rat230_2:1367452_at;    1       -
      9 6       108168774       108168799       Rat230_2:1367452_at;    1       -
     10 10      105608337       105608362       Rat230_2:1367452_at;    1       -
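The naming mismatch can also be checked directly by comparing the first column of the two files, before and after stripping the prefix. The following is only a minimal sketch: it builds two tiny toy files in a temporary directory (their contents, the file names probe.bed and anno.gtf, and the variable names are all assumptions made for illustration) instead of touching the real Rat230_2_probe.bed or the Ensembl GTF.

```shell
# Toy stand-ins for the real files (hypothetical contents, tab-separated).
workdir=$(mktemp -d)
printf 'chr6\t108169080\t108169105\tRat230_2:1367452_at;\t1\t-\n' >  "$workdir/probe.bed"
printf 'chr5\t15325895\t15325920\tRat230_2:1367452_at;\t1\t+\n'  >> "$workdir/probe.bed"
printf '#!genome-build Rnor_6.0\n' > "$workdir/anno.gtf"
printf '6\tensembl\tgene\t108000000\t108200000\t.\t-\t.\tgene_id "toy1";\n' >> "$workdir/anno.gtf"
printf '5\tensembl\tgene\t15000000\t15400000\t.\t+\t.\tgene_id "toy2";\n'   >> "$workdir/anno.gtf"

# Unique chromosome names in the GTF (header lines start with '#').
grep -v '^#' "$workdir/anno.gtf" | cut -f1 | sort -u > "$workdir/gtf_chroms.txt"

# Names that occur in the BED file but not in the GTF, before the fix.
cut -f1 "$workdir/probe.bed" | sort -u > "$workdir/bed_chroms.txt"
missing_before=$(comm -23 "$workdir/bed_chroms.txt" "$workdir/gtf_chroms.txt")

# Strip the leading chr and compare again.
sed 's/^chr//' "$workdir/probe.bed" > "$workdir/x_chr.bed"
cut -f1 "$workdir/x_chr.bed" | sort -u > "$workdir/bed_chroms_fixed.txt"
missing_after=$(comm -23 "$workdir/bed_chroms_fixed.txt" "$workdir/gtf_chroms.txt")

echo "BED-only names before: ${missing_before:-none}"
echo "BED-only names after:  ${missing_after:-none}"
```

On the real files, an empty list after the fix would suggest that every chromosome name in x_chr.bed also exists in the GTF. Names that differ by more than the prefix, such as UCSC's chrM versus Ensembl's MT, would still show up in the list and would need separate handling.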

Running intersectBed again now works.

intersectBed -a x_chr.bed -b Rattus_norvegicus_lincRNA.gtf -wa -wb > Rattus_probe.txt
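With -wa -wb, each output line holds the original record from the -a file followed by the overlapping record from the -b file, so for a 6-column BED and a 9-column GTF the probe name sits in column 4. As a rough sketch of how the result could be summarised, the snippet below counts overlap rows per probe set. It runs on a toy file rather than the real Rattus_probe.txt; the rows, the gene_id values and the second probe-set ID are made up for illustration.

```shell
# Toy stand-in for Rattus_probe.txt (hypothetical rows):
# 6 BED columns followed by 9 GTF columns, tab-separated.
out=$(mktemp)
printf '6\t108169080\t108169105\tRat230_2:1367452_at;\t1\t-\t6\tensembl\texon\t108169000\t108169200\t.\t-\t.\tgene_id "toyA";\n' >  "$out"
printf '6\t108168877\t108168902\tRat230_2:1367452_at;\t1\t-\t6\tensembl\texon\t108168800\t108169000\t.\t-\t.\tgene_id "toyA";\n' >> "$out"
printf '5\t15325895\t15325920\tRat230_2:1367453_at;\t1\t+\t5\tensembl\texon\t15325000\t15326000\t.\t+\t.\tgene_id "toyB";\n'    >> "$out"

# Column 4 is the probe name carried over from the BED file.
# Print "probe-set<TAB>number of overlap rows".
probe_counts=$(cut -f4 "$out" | sort | uniq -c | awk '{print $2 "\t" $1}')
echo "$probe_counts"

# Number of distinct probe sets with at least one overlap.
n_probesets=$(cut -f4 "$out" | sort -u | wc -l | tr -d ' ')
echo "probe sets with at least one overlap: $n_probesets"
```

To try this on the real output, point `out` at Rattus_probe.txt instead of the temporary file.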