全基因组测序各种分析探索(挖坑,持续更新)

2020-11-27  本文已影响0人  一只烟酒僧

一、先准备一个小数据集(这里用E.coli),将各个过程文件都生成出来

#!/bin/bash
wk_dir=/media/whq/282A932A2A92F3D2/282A932A2A92F3D2/WHQ/WGS_practice
index=/media/whq/282A932A2A92F3D2/282A932A2A92F3D2/WHQ/WGS_practice/index/E.coli_K12_MG1655.fa
genome=/media/whq/282A932A2A92F3D2/282A932A2A92F3D2/WHQ/WGS_practice/genome/E.coli_K12_MG1655.fa
#bwa+samtools
cd $wk_dir/fastq
bwa mem -t 3 -R '@RG\tID:foo\tPL:illumina\tSM:E.coli_K12' $index SRR1770413_1.fastq.gz SRR1770413_2.fastq.gz |samtools sort -@ 3 -o SRR1770413.sorted.bam -
#markdup
gatk MarkDuplicates -I SRR1770413.sorted.bam -O SRR1770413.sorted.markdup.bam -M SRR1770413.sorted.bam.matrix
samtools index SRR1770413.sorted.markdup.bam
#gatk
gatk CreateSequenceDictionary -R ../genome/E.coli_K12_MG1655.fa -O ../genome/E.coli_K12_MG1655.dict
gatk HaplotypeCaller -R $genome --emit-ref-confidence GVCF -I SRR1770413.sorted.markdup.bam -O SRR1770413.g.vcf
gatk GenotypeGVCFs -R $genome -V SRR1770413.g.vcf -O SRR1770413.vcf

文件系统:

image.png
二、统计比对结果
https://www.jianshu.com/p/2107fc22e4a8
https://zhuanlan.zhihu.com/p/56430358
三、统计比对bam文件的深度和覆盖度
https://www.jianshu.com/p/2107fc22e4a8
https://zhuanlan.zhihu.com/p/56430358
四、统计bam上不同区间的深度和覆盖度
https://www.jianshu.com/p/dd9d015e090a?utm_campaign=haruki&utm_content=note&utm_medium=reader_share&utm_source=weixin
image.png
image.png
image.png

BamDeal_Linux statistics Coverage 的帮助文档

 Usage: Coverage  -l  <bam.list>  -r  <Ref.fa>  -o  <outPrefix>
        Usage: Coverage  -i  <A.bam B.bam>  -r  <Ref.fa>  -o  <outPrefix>

                -i    <str>     input SAM/BAM files, delimited by space
                -l    <str>     input list of SAM/BAM files
                -o    <str>     prefix of output file

                -r    <str>     input reference FASTA to get Depth-GC wig
                -w    <int>     windows size for Depth-GC wig, default [10000]
                -b    <str>     list of the regions of which the coverage and mean of depth would be given

                -q    <int>     the quality to filter reads, default [10]
                -d    <int>     Filter the duplicated read

                -h              show more details for help [hewm2008 v1.32]

一些计算coverage的示范操作

 1. Coverage -i <A.bam B.bam>  -r  <Ref.fa>  -o AAA -q Q
                Coverage -l <bam.list>  -r  <Ref.fa>  -o AAA -q Q
                This will generate four files (GC with depth, depth frequency, depth along reads in reference FASTA and basic statistics) and output the result to current directory. Output files are named with the prefix AAA.
                (1.1) the reads with quality lower than Q will be removed from analysis, default value of Q is 10.

                2. Coverage -i <A.bam B.bam>  -r  <Ref.fa>  -o AAA -b <bed.region>
                Coverage -l <bam.list>  -r  <Ref.fa>  -o AAA -b <bed.region>
                This will generate the same outputs as the example above with an extra file giving coverage and mean of depth information for the regions listed in the <bed.region>. The <bed.region> is formatted as [Chr_name  Start_site  End_site]. For example:
                Chr1  1 100
                Chr1  1000 2000
                Chr2  3  50
                ...

计算bam文件的区间覆盖度

mkdir BamDeal.res&&cd BamDeal.res
BamDeal_Linux statistics Coverage -i ../SRR1770413.sorted.bam -r ../../genome/E.coli_K12_MG1655.fa -o ./bamdeal.res -w 1000
#注意,如果输入时个bam的list,那计算的深度是所有bam在该位点的深度和!!!

结果展示


image.png

文件说明

less bamdeal.res.DepthGC.gz |head -4
#Depth  Number
0       47037
1       2660
2       1561


less bamdeal.res.DepthGC.gz |head -10
##Bigin Depth(X)        GC(%)
#>NC_000913.3   WindowsSite:    1000
1       63.95   50.70
1001    58.17   53.20
2001    60.37   51.90
3001    59.04   56.10
4001    62.20   53.20
5001    61.35   47.00
6001    53.03   51.40
7001    54.13   54.00


less bamdeal.res.depthsite.fa.gz |head -2#应该是每个位点的覆盖信息
>NC_000913.3
43 44 46 47 47 47 47 47 48 49 49 49 49 49 49 49 49 50 50 50 50 50 51 52 52 54 54 54 54 54 54 54 54 54 53 53 53 53 52 52 52 52 53 53 52 53 53 53 54 53 53 53 53 53 53 53 53 53 53 53 53 54 54 55 55 54 54 54 54 54 55 55 53 53 53 54 54 53 54 54 54 55 57 57 58 58 58 59 60 61 61 60 60 60 60 61 61 62 62 62 63 64 64 64 64 64 65 65 65 65 65 65 66 66 66 66 66 66 65 65 66 67 67 67 67 68 68 68 68 67 68 68 68 68 67 67 68 69 70 70 71 71 72 72 71 71 71 70 69 69 70 70 69 69 70 72 72 72 

less bamdeal.res.stat
#ID     Length  CoveredBase     TotalDepth      Coverage%       MeanDepth
NC_000913.3     4641652 4594615 267566554       98.99   57.64
#Genome 4641652 4594615 267566554       98.99   57.64

四:画circle图
https://www.jianshu.com/p/799ef8f2bfcb

上一篇 下一篇

猜你喜欢

热点阅读