Transcriptome Data Analysis

SAMtools: Sorting BAM Files

2022-02-04  Wei_Sun

BAM files need to be sorted before downstream analysis. For installing samtools, see the article:
Converting SAM files to BAM files: SAMtools - Jianshu (jianshu.com)

1. Basic usage of samtools sort:

$ samtools sort
Usage: samtools sort [options...] [in.bam]
Options:
  -l INT     Set compression level, from 0 (uncompressed) to 9 (best)
  -u         Output uncompressed data (equivalent to -l 0)
  -m INT     Set maximum memory per thread; suffix K/M/G recognized [768M]
  -M         Use minimiser for clustering unaligned/unplaced reads
  -K INT     Kmer size to use for minimiser [20]
  -n         Sort by read name (not compatible with samtools index command)
  -t TAG     Sort by value of TAG. Uses position as secondary index (or read name if -n is set)
  -o FILE    Write final output to FILE rather than standard output
  -T PREFIX  Write temporary files to PREFIX.nnnn.bam
  --no-PG    do not add a PG line
      --input-fmt-option OPT[=VAL]
               Specify a single input file format option in the form
               of OPTION or OPTION=VALUE
  -O, --output-fmt FORMAT[,OPT[=VAL]]...
               Specify output format (SAM, BAM, CRAM)
      --output-fmt-option OPT[=VAL]
               Specify a single output file format option in the form
               of OPTION or OPTION=VALUE
      --reference FILE
               Reference sequence FASTA FILE [null]
  -@, --threads INT
               Number of additional threads to use [0]
      --write-index
               Automatically index the output files [off]
      --verbosity INT
               Set level of verbosity
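
As a minimal sketch of how several of the options listed above might be combined, the helper below wraps a coordinate sort with an explicit thread count, per-thread memory limit, and temporary-file prefix. The function name `sort_bam` and its default values (4 threads, 1G) are assumptions made for illustration, not part of samtools.

```shell
# Hypothetical helper: coordinate-sort a BAM file with explicit resource settings.
# Arguments: input BAM, output BAM, additional threads (default 4), memory per thread (default 1G).
sort_bam() {
    local in_bam="$1" out_bam="$2" threads="${3:-4}" mem="${4:-1G}"
    # -@ : number of additional threads
    # -m : maximum memory per thread
    # -T : prefix for temporary files (here derived from the output name)
    # -o : final output file
    samtools sort -@ "$threads" -m "$mem" -T "${out_bam%.bam}.tmp" -o "$out_bam" "$in_bam"
}

# Example call, using the file names from this article:
# sort_bam LPF1_R1_MP.bam LPF1_R1_MP.sort.bam 8 2G
```

Because -m is a per-thread limit, total memory use grows roughly in proportion to the number of threads, so the two values are best chosen together.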

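2. Sorting:

By default, reads are sorted by the order of the reference sequences in the BAM header (which normally matches their order in the reference FASTA file), and then by leftmost mapping position.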

$ samtools sort -@8 LPF1_R1_MP.bam -o LPF1_R1_MP.sort.bam
# View the BAM file
$ samtools view LPF1_R1_MP.sort.bam

-@8: use 8 additional threads
-o: output file
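
One way to confirm that a file is coordinate-sorted is to read the SO field of the @HD line in its header, and to build an index only when that field says "coordinate". The following is a sketch under that assumption; the function names `bam_sort_order` and `index_if_sorted` are hypothetical.

```shell
# Print the sort order recorded in the @HD header line
# (for example "coordinate", "queryname" or "unsorted").
bam_sort_order() {
    samtools view -H "$1" |
        awk -F'\t' '$1 == "@HD" { for (i = 2; i <= NF; i++) if ($i ~ /^SO:/) print substr($i, 4) }'
}

# Index the BAM file only if its header reports coordinate order.
index_if_sorted() {
    if [ "$(bam_sort_order "$1")" = "coordinate" ]; then
        samtools index "$1"
    else
        echo "not coordinate-sorted: $1" >&2
        return 1
    fi
}

# Example call, using the file name from this article:
# index_if_sorted LPF1_R1_MP.sort.bam
```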

Sort by read name:

$ samtools sort -@8 -n LPF1_R1_MP.bam -o LPF1_R1_MP.name.sort.bam 
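
A quick way to see the effect of name sorting is to print the read names of the first few records; in a name-sorted file, records that share a read name (for example the two mates of a pair) should appear next to each other. This is only a sketch, and the helper name `first_read_names` is hypothetical.

```shell
# Print the read name (column 1) of the first N records of a BAM file (default 6).
first_read_names() {
    samtools view "$1" | head -n "${2:-6}" | cut -f 1
}

# Example call, using the file name from this article:
# first_read_names LPF1_R1_MP.name.sort.bam 10
```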

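At this point I noticed that the original .bam file, the .sort.bam file, and the .name.sort.bam file differ in size, with .sort.bam being much smaller. To check, count the records in the three files: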

$ samtools view -c LPF1_R1_MP.bam
44038570
$ samtools view -c LPF1_R1_MP.sort.bam
44038570
$ samtools view -c LPF1_R1_MP.name.sort.bam
44038570

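The record counts match, so no data was lost. The size difference most likely comes from compression: coordinate sorting places reads with similar sequences next to each other, which compresses better. The default sort, by chromosome and position, is the one most commonly used.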
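
The manual comparison above can be wrapped in a small check that counts the records in each file and fails if any count differs from the first. This is a sketch; the function name `same_record_count` is hypothetical.

```shell
# Print "file count" for each BAM file given and return non-zero
# as soon as a count differs from that of the first file.
same_record_count() {
    local ref_count count f
    ref_count=$(samtools view -c "$1") || return 1
    for f in "$@"; do
        count=$(samtools view -c "$f") || return 1
        echo "$f $count"
        [ "$count" = "$ref_count" ] || return 1
    done
}

# Example call, using the file names from this article:
# same_record_count LPF1_R1_MP.bam LPF1_R1_MP.sort.bam LPF1_R1_MP.name.sort.bam
```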

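If you are using SAMtools version 1.9, you can refer to this article:
https://www.jianshu.com/p/6b7a442d293f
Please credit the source when quoting or reposting. Corrections are welcome.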
