Upstream Analysis for Next-Generation Whole-Genome Sequencing

2022-05-14 日月其除

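I found two reference sites. A few things in them still confuse me; I ended up following the second one.
This post focuses on upstream analysis. The sites also provide scripts for downstream analysis, but I won't go into those here.
Site 1:
https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/DNA_Seq_Variant_Calling_Pipeline/#somatic-variant-calling-workflow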

The pipeline has three main parts:
1. Genome Alignment
2. Alignment Co-Cleaning
3. Somatic Variant Calling
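Part 1:

1) bamtofastq: convert the BAM to FASTQ
2) bwa mem: align the reads to the reference genome
3) Sort the BAM file with picard
4) Merge BAMs with picard; my understanding is that this merges the BAMs that belong to the same sample
5) Deduplicate with picard (picard.jar MarkDuplicates)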
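
The five steps above can be sketched as a small Python helper that only builds the command lines (it does not run anything). This is a rough illustration rather than the GDC's actual scripts: the file names, the sample name, and the function names are all hypothetical, the picard options use the legacy `KEY=value` style, and the read-group string is a minimal assumed one.

```python
import shlex
from typing import List


def bamtofastq_cmd(bam: str, fq1: str, fq2: str) -> List[str]:
    # biobambam's bamtofastq takes key=value options.
    return ["bamtofastq", f"filename={bam}", f"F={fq1}", f"F2={fq2}"]


def bwa_mem_cmd(ref: str, fq1: str, fq2: str, sample: str, threads: int = 4) -> List[str]:
    # bwa expands the literal "\t" in the read-group string itself.
    # bwa mem writes SAM to stdout, so redirect it to a file when running.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return ["bwa", "mem", "-t", str(threads), "-R", read_group, ref, fq1, fq2]


def picard_cmd(tool: str, *options: str, jar: str = "picard.jar") -> List[str]:
    # jar is an assumed path to the picard jar.
    return ["java", "-jar", jar, tool, *options]


def sort_cmd(in_bam: str, out_bam: str) -> List[str]:
    return picard_cmd("SortSam", f"INPUT={in_bam}", f"OUTPUT={out_bam}",
                      "SORT_ORDER=coordinate")


def merge_cmd(in_bams: List[str], out_bam: str) -> List[str]:
    # One INPUT= per BAM of the same sample (e.g. one per lane).
    inputs = [f"INPUT={b}" for b in in_bams]
    return picard_cmd("MergeSamFiles", *inputs, f"OUTPUT={out_bam}")


def markdup_cmd(in_bam: str, out_bam: str, metrics: str) -> List[str]:
    return picard_cmd("MarkDuplicates", f"INPUT={in_bam}", f"OUTPUT={out_bam}",
                      f"METRICS_FILE={metrics}")


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    steps = [
        bamtofastq_cmd("S1.lane1.bam", "S1.lane1_1.fq", "S1.lane1_2.fq"),
        bwa_mem_cmd("ref.fa", "S1.lane1_1.fq", "S1.lane1_2.fq", "S1"),
        sort_cmd("S1.lane1.sam", "S1.lane1.sorted.bam"),
        merge_cmd(["S1.lane1.sorted.bam", "S1.lane2.sorted.bam"], "S1.merged.bam"),
        markdup_cmd("S1.merged.bam", "S1.dedup.bam", "S1.dedup.metrics.txt"),
    ]
    for cmd in steps:
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check them by eye before passing each list to `subprocess.run`.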
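Part 2:
This is where I started to get lost.
The three scripts in this part use four tools from GenomeAnalysisTK.jar: RealignerTargetCreator, IndelRealigner, BaseRecalibrator, and PrintReads.
However, I could not find the GenomeAnalysisTK.jar file itself, so I checked whether the same tools can be found in gatk.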

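PrintReads exists in gatk, where it is described as "Print reads in the SAM/BAM/CRAM file".
BaseRecalibrator in gatk "Generates recalibration table for Base Quality Score Recalibration (BQSR)".
I could not find IndelRealigner or RealignerTargetCreator in gatk.
After some Googling, it seems that GATK 4 no longer includes these tools; see this question: https://www.biostars.org/p/339650/
The pipeline has presumably been updated since then as well.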
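
For reference, GenomeAnalysisTK.jar is the jar name used by GATK 3.x and earlier, where tools are selected with `-T`. Below is a minimal sketch of how the four co-cleaning tools chain together under that older version. It is an assumption-laden illustration, not the GDC's script: it handles a single BAM (the GDC workflow processes a patient's tumor and normal BAMs together), and the jar path, file names, and function names are hypothetical.

```python
from typing import List

GATK3_JAR = "GenomeAnalysisTK.jar"  # assumed path to a GATK 3.x jar


def gatk3_cmd(tool: str, *args: str) -> List[str]:
    # GATK 3 selects the tool with -T.
    return ["java", "-jar", GATK3_JAR, "-T", tool, *args]


def co_cleaning_cmds(ref: str, bam: str, known_indels: str,
                     known_sites: str, prefix: str) -> List[List[str]]:
    intervals = f"{prefix}.intervals"      # realignment targets
    realigned = f"{prefix}.realigned.bam"  # BAM after indel realignment
    table = f"{prefix}.bqsr.grp"           # recalibration table
    final = f"{prefix}.recal.bam"          # BAM with recalibrated qualities
    return [
        # 1. Find intervals that likely need local realignment.
        gatk3_cmd("RealignerTargetCreator", "-R", ref, "-I", bam,
                  "-known", known_indels, "-o", intervals),
        # 2. Realign reads around indels within those intervals.
        gatk3_cmd("IndelRealigner", "-R", ref, "-I", bam,
                  "-known", known_indels,
                  "-targetIntervals", intervals, "-o", realigned),
        # 3. Build the base quality recalibration table.
        gatk3_cmd("BaseRecalibrator", "-R", ref, "-I", realigned,
                  "-knownSites", known_sites, "-o", table),
        # 4. Write a new BAM with the recalibration applied.
        gatk3_cmd("PrintReads", "-R", ref, "-I", realigned,
                  "--BQSR", table, "-o", final),
    ]
```

In GATK 3, PrintReads with `--BQSR` is what applies the recalibration, which is why it appears in this stage even though its GATK 4 description only mentions printing reads.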

Later I found another pipeline on the official GATK site.
Reference: https://gatk.broadinstitute.org/hc/en-us/sections/360007226651-Best-Practices-Workflows


It still starts with mapping, followed by various correction and filtering steps on the BAM file. Clicking through to "Data pre-processing for variant discovery" shows a more streamlined pipeline.

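However, this workflow does not come with concrete code; you have to click into each step to look it up.
Fortunately, a helpful community member has already put the code together; see: https://www.jianshu.com/p/3a146573a239
See also: 全基因组重测序流程【超细致！！】 (a very detailed whole-genome resequencing workflow) - 简书 (jianshu.com)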
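
As a rough illustration of what the GATK 4 pre-processing stage looks like after mapping, here is a sketch that builds the three commands (MarkDuplicates, BaseRecalibrator, ApplyBQSR). It is not taken from the linked posts: the file names and the function name are hypothetical, and it assumes the `gatk` wrapper script is on the PATH and the input BAM is already sorted.

```python
from typing import List


def gatk4_preprocess_cmds(ref: str, bam: str, known_sites: List[str],
                          prefix: str) -> List[List[str]]:
    dedup = f"{prefix}.dedup.bam"
    metrics = f"{prefix}.dedup.metrics.txt"
    table = f"{prefix}.recal.table"
    recal = f"{prefix}.recal.bam"

    # BaseRecalibrator accepts --known-sites more than once.
    known_args: List[str] = []
    for vcf in known_sites:
        known_args += ["--known-sites", vcf]

    return [
        # 1. Mark duplicate reads.
        ["gatk", "MarkDuplicates", "-I", bam, "-O", dedup, "-M", metrics],
        # 2. Build the recalibration table (BQSR, first pass).
        ["gatk", "BaseRecalibrator", "-R", ref, "-I", dedup,
         *known_args, "-O", table],
        # 3. Apply the table; this replaces GATK 3's PrintReads --BQSR.
        ["gatk", "ApplyBQSR", "-R", ref, "-I", dedup,
         "--bqsr-recal-file", table, "-O", recal],
    ]
```

Note that there is no indel realignment step here, which matches the observation that IndelRealigner and RealignerTargetCreator are missing from GATK 4.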

For the subsequent steps (variant calling, filtering, and annotation), this site also recommends software for different scenarios.
