估计很多人都像我一样一直没弄懂关于samtools flag 0
Question: In Sam Format, Clarify The Meaning Of The "0" Flag.
@Jts, can you comment on Devon Ryan's comment in the answer to my question? I actually realize I agree with him on it that
FLAG 0 only mean it's not reverse complemented, not necessarily mapped to forward/+ strand.
A flag field of 0 may or may not mean that a read is actually mapped to the + strand. All it means is that it wasn't reverse complemented. Whether that means it should be assigned to one strand or the other or none at all is dependent on other factors and isn't encoded in BAM files. As an example,
most common single-end RNAseq experiments use a stranded protocol wherein alignments with a flag of 0 come from the "-" strand.
Link is here: SAM flag and select reads that map uniquely
Question: SAM flag and select reads that map uniquely
- 
Hi Devon, I've spend a lot of time this week trying to REALLY understand this strand, flag, mapping uniqueness issue. I think I'm finally getting my head around it and I have to say I disagree with your first point here on flag 0 meaning. I will outline my thoughts below and please feel free to comment on them. Thanks so much for all the discussions so far. 
- 
I think a flag field of 0 does mean a read is mapped to the +/forward strand. And flag field indicates where the read itself maps to, between + and - strand of the reference genome. 
- 
For dUTP-based stranded RNA-seqexperiments (Illumina TruSeq stranded), alignments withflag of 0does mean thetranscript/gene is on the -/reverse strandas you said, but at the same time,the read itself indeed maps to the +/forward strand, which is the antisense strand for a gene on the -/reverse strand. The"-"strand you mentioned is on theXS:A:-field (in Tophat or hisat2 output) and it's for whichstrand a transcript/gene(from which the read is derived) is on.It means the read is derived from a transcript/gene on the - strand (although it still maps to the + strand of the genome).
- 
Lastly, flag valuedoesn't tell if a read maps uniquely or not. I need to look forNH:i:field to determine if a read is uniquely mapped.
samtools view bamfile.bam | awk '$2==0'  #will output all the primary alignments mapped to the + strand 
samtools view bamfile.bam | awk '$2==16'  #will output all the primary alignments mapped to the - strand
- 
all the primary alignments contain uniquely mapped and primary alignments from multi-mapping reads. 
- 
uniquely mapped in the above sentence is determined by the aligner and I'm not sure what MAPQ value they have yet. 
- 
I think I did here in the end of the 3rd paragraph: 
- 
It means the read is derived from a transcript/gene on the - strand (although it still maps to the + strand of the genome).
- 
I agree that for this particular RNA-seq strand thing, we care about where the transcript is located. 
最终我选择了手动提取
samtools view -h $bam | awk 'length($6) == 3 && $$2~/0/ && $5~/255/ || $0 ~/@/' \
  | samtools view -bS | samtools sort -@ 8 > ${sample}_uniq_sorted.bam


