GTAK4分析篇 -- 1.数据前处理
2019-12-10 本文已影响0人
面具男女
最新版的GATK4跟之前GATK2/3用法上很大的不同,下面是改版后的代码
1.创建bwa比对的index
bwa index -a bwtsw -p hg38 /home/NGS/gatk_ref/Homo_sapiens_assembly38.fasta
2.fastqc测序质量监控
fastqc /home/NGS/raw_data/A921C10.R1.fastq.gz /home/NGS/raw_data/A921C10.R2.fastq.gz -t 2 --outdir /home/NGS/1.fastqc/
3.去除低质量reads和接头
trim_galore -q 30 --phred33 --length 100 --stringency 3 --paired -o /home/NGS/2.trim-galore/ /home/NGS/raw_data/A921C10.R1.fastq.gz /home/NGS/raw_data/A921C10.R2.fastq.gz
4.比对参考基因组 + 按坐标排序
bwa mem /home/NGS/bwa_index/human /home/NGS/raw_data/A921C10.R1_trimmed.fq.gz /home/NGS/raw_data/A921C10.R2_trimmed.fq.gz -t 4 -R "@RG\tID:A921C10\tSM:A921C10\tLB:WGS\tPL:Illumina" | samtools sort -@ 4 -O BAM -o A921C10.bam
5.创建fasta的dict(用于GATK)
/home/gatk-4.1.4.1/gatk CreateSequenceDictionary -R /home/NGS/gatk_ref/Homo_sapiens_assembly38.fasta -O Homo_sapiens_assembly38.dict
6.创建fasta的fai(用于GATK)
samtools faidx /home/NGS/gatk_ref/Homo_sapiens_assembly38.fasta
7.统计比对结果
java -jar /home/picard/picard/build/libs/picard.jar CollectAlignmentSummaryMetrics R=/home/NGS/gatk_ref/Homo_sapiens.GRCh38.dna.primary_assembly.fa I=/home/NGS/3.bwa+sort/A921C10.bam O=./alignment_metrics.txt
java -jar /home/picard/picard/build/libs/picard.jar CollectInsertSizeMetrics INPUT=/home/NGS/3.bwa+sort/A921C10.bam OUTPUT=./insert_metrics.txt HISTOGRAM_FILE=insert_size_histogram.pdf
samtools depth -a /home/NGS/3.bwa+sort/A921C10.bam > depth_out.txt
8.标记重复序列(PCR)
java -jar /home/picard/picard/build/libs/picard.jar MarkDuplicates I=/home/NGS/3.bwa+sort/A921C10.bam M=./metrics.txt O=./A921C10_dedup.bam
9.修正且确保读取和匹配是同步的
java -jar /home/picard/picard/build/libs/picard.jar FixMateInformation I=/home/NGS/3.bwa+sort/A921C10.bam O=./A921C10_dedup_fix.bam ADD_MATE_CIGAR=true
10.校正碱基质量得分
input="/home/NGS/5.MarkDuplicates/A921C10_dedup.bam"
output="/home/NGS/7.Base_QualityScore_Recalibration"
ref="/home/NGS/gatk_ref/Homo_sapiens_assembly38.fasta"
snp="/home/NGS/gatk_ref/dbsnp_146.hg38.vcf"
indel="/home/NGS/gatk_ref/Mills_and_1000G_gold_standard.indels.hg38.vcf"
gatk="/home/gatk-4.1.4.1/gatk"
$gatk BaseRecalibrator -R $ref -I $input -O $ouput/bqsr.table --known-sites $snp --known-sites $indel
echo -e "\n\n\nBaseRecalibrator has finished!!!!!!!!\n\n\n"
$gatk ApplyBQSR -R $ref -I $input -bqsr-recal-file bqsr.table -O $ouput/A921C10_dedup_fix_bqsr.bam
echo -e "\n\n\nApplyBQSR has finished!!!!!!!!\n\n\n"
注意:所用的fasta和vcf最好在同一个平台下载,实践时出现了一个错误:fasta的染色体标记和vcf不一样。有的是1、2、Y、X、M,而有的是chr1、chr2、chrX、chrY、chrM,比对是就会没有overlap的reads