Renaming chromosomes in a BAM file
2021-09-16
生信菜菜鸟
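While processing ChIP-seq data I ran into the following problem: the chromosomes in my FASTA and GTF files were named 1, 2, 3, ..., 22, but some downstream analyses require chromosome names to start with chr. One example is ROSE, which is used to identify enhancers and super-enhancers. In that case the BAM file has to be modified, as follows: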
samtools view -H ${id}_input.deduplicate.bam | sed -e 's/SN:\([0-9XY]\)/SN:chr\1/' -e 's/SN:MT/SN:chrM/' | samtools reheader - ${id}_input.deduplicate.bam > ${id}_input.deduplicate.chr.bam
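Before rewriting a real file, it can help to dry-run the two sed expressions on a few header lines. The sketch below uses a made-up header (the `@SQ` lines and lengths are illustrative assumptions, not taken from my data) and does not need samtools at all:

```shell
# Hypothetical header lines mimicking `samtools view -H` output (tab-separated).
header=$(printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:1\tLN:248956422\n@SQ\tSN:X\tLN:156040895\n@SQ\tSN:MT\tLN:16569\n@SQ\tSN:KI270728.1\tLN:1872759\n')

# Same two sed expressions as in the command above:
#   1) prefix names that start with a digit, X or Y with "chr"
#   2) rename the mitochondrial contig MT to chrM
renamed=$(printf '%s\n' "$header" | sed -e 's/SN:\([0-9XY]\)/SN:chr\1/' -e 's/SN:MT/SN:chrM/')

printf '%s\n' "$renamed"
```

In this mock header, 1, X and MT become chr1, chrX and chrM, while the scaffold KI270728.1 keeps its name because it does not start with a digit, X or Y. Whether leftover contig names like that matter depends on the downstream tool.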
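Only the header of the BAM file needs to change, which is quick and convenient. As for why changing the header alone is enough, I came across an excellent answer on Biostars, quoted here: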
samtools reheader <in.header.sam> <in.bam>
Replace the header in in.bam with the header in in.header.sam. This command is much faster than replacing the header with a BAM→SAM→BAM conversion.
For those reading this and wondering, "but what about the chromosome names for each read?!?", the answer is that those names aren't actually stored in a BAM file. Rather, alignments have chromosome index number associated with them and the name you see when you use samtools view is taken from the header.
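To make the quoted explanation concrete, here is a toy model of the lookup, not real BAM parsing. Each record stores only a 0-based reference index, and the name is resolved from the header's list of names. The `resolve` helper and the two reads are hypothetical, invented for this illustration:

```shell
# Toy model: each "read" carries a 0-based reference index, never a name.
# $1 = space-separated reference names in header order (hypothetical input).
resolve() {
  printf 'read1 0\nread2 2\n' |
    awk -v names="$1" 'BEGIN { split(names, ref, " ") } { print $1, ref[$2 + 1] }'
}

# Identical records, resolved against the old and the new header.
before=$(resolve "1 2 MT")
after=$(resolve "chr1 chr2 chrM")

printf 'old header:\n%s\n' "$before"
printf 'new header:\n%s\n' "$after"
```

The records themselves never change; only the name table does, which is why swapping the header is sufficient. On a real file, comparing column 3 of the `samtools view` output for the original and the reheadered BAM should show the same pattern.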