求 bam文件单个位点碱基比例

2019-07-01 本文已影响0人 caokai001

IGV 上看碱基分布

目的：给你一个bam 文件，如何查看一个位点的碱基情况，直接IGV 就ok,如果需要看多个点，就需要使用软件了。(一般情况vcf DP4会告诉此位置ref base和非ref base 碱基个数)

在群里面大佬提示下，samtools mpileup 和bam-readcount 可以实现统计碱基分布。

1.简单的安装

conda install -c bioconda bam-readcount

2.bam-readcount 帮助信息

[kcao@h1-lgl genome_index_bowtie2]$ bam-readcount -h
Usage: bam-readcount [OPTIONS] <bam_file> [region]
Generate metrics for bam_file at single nucleotide positions.
Example: bam-readcount -f ref.fa some.bam

Available options:
  -h [ --help ]                         produce this message
  -v [ --version ]                      output the version number
  -q [ --min-mapping-quality ] arg (=0) minimum mapping quality of reads used 
                                        for counting.
  -b [ --min-base-quality ] arg (=0)    minimum base quality at a position to 
                                        use the read for counting.
  -d [ --max-count ] arg (=10000000)    max depth to avoid excessive memory 
                                        usage.
  -l [ --site-list ] arg                file containing a list of regions to 
                                        report readcounts within.
  -f [ --reference-fasta ] arg          reference sequence in the fasta format.
  -D [ --print-individual-mapq ] arg    report the mapping qualities as a comma
                                        separated list.
  -p [ --per-library ]                  report results by library.
  -w [ --max-warnings ] arg             maximum number of warnings of each type
                                        to emit. -1 gives an unlimited number.
  -i [ --insertion-centric ]            generate indel centric readcounts. 
                                        Reads containing insertions will not be
                                        included in per-base counts

注：-w 限制warnings 输出的条数，不然好多warning!
      -p 碱基质量  1
      -q 比对质量 20

3.测试效果

github文档

[kcao@h1-lgl dup]$ bam-readcount -w 1 -f /home/kcao/data/Genome/genome_index_bowtie2/Homo_sapiens.GRCh38.dna.chromosome.fa /home/kcao/test/HHHHHHHHHHHH/SRR6213130.uniq.bam  1:89085865-89085865|sed -n '$p' >tmpjjj
Minimum mapping quality is set to 0
WARNING: In read SRR6213130.15145147: Couldn't find single-end mapping quality. Check to see if the SM tag is in BAM.
The previous warning has been emitted 1 times and will be disabled.

输出结果：
1   89085886    G   43  =:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00  A:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00  C:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00  G:43:60.00:37.51:2.79:33:10:0.43:0.01:15.51:33:0.63:101.00:0.56 T:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00  N:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00

tips: 列名依次是，染色体，位置，参考碱基，深度，“=”（没理解等号含义），“A”，“C”，“G”，T“”，“N”

4 统计 A,G,C,T 碱基个数

[kcao@h1-lgl dup]$ less tmpjjj |awk 'BEGIN{FS=OFS="\t"}{split($6,A,":");split($7,C,":");split($8,G,":");split($9,T,":");print $1,$2,$3,$4,"A:"A[2],"C:"C[2],"G:"G[2],"T:"T[2]}'


1   89085886    G   43  A:0 C:0 G:43    T:0

5.使用参数-l region_list 查看多个位点碱基分布

[kcao@h1-lgl dup]$ cat region_list 
1   89085886    89085886
1   89085975    89085975

[kcao@h1-lgl dup]$ bam-readcount -w 1 -f /home/kcao/data/Genome/genome_index_bowtie2/Homo_sapiens.GRCh38.dna.chromosome.fa /home/kcao/test/HHHHHHHHHHHH/SRR6213130.uniq.bam  -l region_list|awk 'BEGIN{FS=OFS="\t"}{split($6,A,":");split($7,C,":");split($8,G,":");split($9,T,":");print $1,$2,$3,$4,"A:"A[2],"C:"C[2],"G:"G[2],"T:"T[2]}' >test.txt

[kcao@h1-lgl dup]$ cat test.txt 
1   89085886    G   43  A:0 C:0 G:43    T:0
1   89085975    C   64  A:0 C:64    G:0 T:0

有兴趣的可以试试samtools mpileup 结果~~
samtools mpileup结果
有时候，IGV reads 数目和bam-readcount 不对应，可能因为标记了dup 的缘故.

求 bam文件单个位点碱基比例

1.简单的安装

2.bam-readcount 帮助信息

3.测试效果

4 统计 A,G,C,T 碱基个数

5.使用参数-l region_list 查看多个位点碱基分布

猜你喜欢

热点阅读

求 bam文件单个位点碱基比例

1.简单的安装

2.bam-readcount 帮助信息

3.测试效果

4 统计 A,G,C,T 碱基个数

5.使用 参数-l region_list 查看多个位点碱基分布

猜你喜欢

热点阅读

5.使用参数-l region_list 查看多个位点碱基分布