Biostar Handbook Study Group

Sequencing and Quality Control

2017-12-12  pearlp

I. Processing sequencing data

1. bcl2fastq: converts the per-cycle base calls (stored in BCL files) into FASTQ format (v1.8.4 for older sequencers, v2.18.x for newer sequencers). It can be obtained from Illumina's portal (iCom).

2. Visualizing read quality: FastQC

fastqc illumina.fq
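
To make concrete what a per-base quality plot summarizes, here is a minimal Python sketch. It assumes Phred+33 encoded, unwrapped four-line FASTQ records; the function name `mean_quality_per_position` and the two in-memory records are made up for illustration and are not part of FastQC.

```python
from io import StringIO


def mean_quality_per_position(handle):
    """Return the mean Phred score at each read position (Phred+33 assumed)."""
    totals, counts = [], []
    for i, line in enumerate(handle):
        if i % 4 != 3:  # the quality string is the 4th line of each record
            continue
        for pos, char in enumerate(line.rstrip("\n")):
            if pos == len(totals):  # first read that reaches this position
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - 33
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


# Two made-up records, used only for illustration.
example = StringIO("@r1\nACGT\n+\nIIII\n@r2\nACG\n+\nI5!\n")
print(mean_quality_per_position(example))  # [40.0, 30.0, 20.0, 40.0]
```

For a real file, the same function could be called on `open("illumina.fq")`. FastQC itself reports many more statistics than this single average.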

3. Quality control (QC) tools: Trimmomatic, BBDuk, flexbar and cutadapt

Trimmomatic, single-end (SE): trimmomatic SE SRR1553607_2.fastq trimmed_2.fq SLIDINGWINDOW:4:30

BBDuk, single-end: bbduk.sh in=SRR1553607_2.fastq out=bbduk.fq qtrim=r overwrite=true trimq=30

Trimmomatic, paired-end (PE): trimmomatic PE SRR1553607_1.fastq SRR1553607_2.fastq trimmed_1.fq unpaired_1.fq trimmed_2.fq unpaired_2.fq SLIDINGWINDOW:4:30

BBDuk, paired-end: bbduk.sh in1=SRR1553607_1.fastq in2=SRR1553607_2.fastq outm1=bbduk_1.fq out1=unpaired_bb_1.fq outm2=bbduk_2.fq out2=unpaired_bb_2.fq qtrim=r overwrite=true trimq=30
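
The idea behind a setting like SLIDINGWINDOW:4:30 can be sketched in a few lines of Python: scan the read from the 5' end and cut it at the first window of 4 bases whose mean quality drops below 30. This is a simplified illustration assuming Phred+33 qualities; the function `sliding_window_trim` is hypothetical, and Trimmomatic's actual implementation differs in its details.

```python
def sliding_window_trim(seq, qual, window=4, threshold=30):
    """Cut the read at the first window whose mean Phred quality is below threshold."""
    scores = [ord(c) - 33 for c in qual]  # Phred+33 assumed
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < threshold:
            # Keep only the bases before the failing window.
            return seq[:start], qual[:start]
    return seq, qual  # no window failed (or the read is shorter than the window)


# 'I' encodes Q40 and '5' encodes Q20 in Phred+33.
print(sliding_window_trim("ACGTACGTAC", "IIIIII5555"))  # ('ACGTA', 'IIIII')
```

In the example, the window starting at position 5 covers qualities 40, 20, 20, 20 (mean 25), so the read is cut there.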

4. Trimming adapters

Illumina Universal Adapter: AGATCGGAAGAG

Illumina Small RNA 3' Adapter: TGGAATTCTCGG

Illumina Small RNA 5' Adapter: GATCGTCGGACT

Nextera Transposase Sequence: CTGTCTCTTATA

SOLiD Small RNA Adapter: CGCCTTGGCCGT
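
As a rough illustration of what adapter trimming does with these sequences, the sketch below looks for an adapter inside a read and removes it together with everything downstream. It uses exact matching only, and `detect_adapters`, `trim_adapter` and the example read are made up for illustration; tools such as cutadapt additionally tolerate mismatches and partial adapters at the read end.

```python
# Adapter prefixes taken from the list above.
ADAPTERS = {
    "Illumina Universal Adapter": "AGATCGGAAGAG",
    "Illumina Small RNA 3' Adapter": "TGGAATTCTCGG",
    "Illumina Small RNA 5' Adapter": "GATCGTCGGACT",
    "Nextera Transposase Sequence": "CTGTCTCTTATA",
    "SOLiD Small RNA Adapter": "CGCCTTGGCCGT",
}


def detect_adapters(read):
    """Return the names of all adapters that occur exactly in the read."""
    return [name for name, adapter in ADAPTERS.items() if adapter in read]


def trim_adapter(read, adapter):
    """Remove the adapter and everything after it (exact match only)."""
    idx = read.find(adapter)
    return read if idx == -1 else read[:idx]


# Made-up read: 8 bases of insert followed by the universal adapter.
read = "ACGTACGT" + "AGATCGGAAGAG" + "CACA"
print(detect_adapters(read))                                     # ['Illumina Universal Adapter']
print(trim_adapter(read, ADAPTERS["Illumina Universal Adapter"]))  # ACGTACGT
```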

5. Sequence duplication: picard MarkDuplicates
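
For intuition about duplication levels in raw reads, the sketch below counts reads that are exact copies of another read. Note the assumption: picard MarkDuplicates works on aligned reads and identifies duplicates from their alignment positions, so this sequence-based count is a different and much simpler notion. The function `duplication_rate` and the example reads are made up for illustration.

```python
from collections import Counter


def duplication_rate(sequences):
    """Fraction of reads that are exact copies of an earlier read."""
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Every read beyond the first copy of each distinct sequence is a duplicate.
    return (total - len(counts)) / total


reads = ["ACGT", "ACGT", "TTGA", "ACGT", "GGCC"]
print(duplication_rate(reads))  # 0.4
```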

6. multiqc: combining the QC reports of multiple samples

After installing multiqc with conda, upgrade it: pip install --upgrade multiqc

multiqc illumina_fastqc iontorrent_fastqc

If multiqc runs into problems, an older version of networkx can be installed: source activate multiqc; conda install networkx=1.11

7. Merging paired-end reads

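Paired-end analysis usually expects the combined length of the two reads to be smaller than the sequenced fragment. When the combined read length is larger than the fragment, the two reads overlap, and FLASH (Fast Length Adjustment of SHort reads) can merge such a pair into a single longer read.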

Compare the merging results of flash and bbtools.

Install flash: conda install flash -y
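
The merging idea can be sketched as follows: reverse-complement the second read, find the longest overlap between the end of read 1 and the start of that sequence, and join the two. This is a minimal sketch that assumes exact overlaps; `merge_pair`, `reverse_complement` and the 22 bp fragment are made up for illustration, and real mergers such as FLASH score overlaps while allowing mismatches.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def merge_pair(read1, read2, min_overlap=10):
    """Merge a read pair at the longest exact overlap, or return None."""
    rc2 = reverse_complement(read2)
    # Try the longest possible overlap first, then shorter ones.
    for size in range(min(len(read1), len(rc2)), min_overlap - 1, -1):
        if read1[-size:] == rc2[:size]:
            return read1 + rc2[size:]
    return None


# Made-up 22 bp fragment sequenced with two 15 bp reads (30 > 22, so they overlap).
fragment = "ACGTTGCAAGGCTTAACCGGTA"
read1 = fragment[:15]
read2 = reverse_complement(fragment[-15:])
print(merge_pair(read1, read2, min_overlap=5) == fragment)  # True
```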

8. AfterQC: must be installed in a Python 2.7 environment

9. Error correction

Sequencing errors can be corrected without knowing the reference genome by computing the so-called k-mer density of the data. The bbmap package includes the tadpole.sh error corrector:

  tadpole.sh in=SRR519926_1.fastq out=tadpole.fq mode=correct overwrite=true

bfc can also be used: bfc SRR519926_1.fastq > bfc.fq
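
The k-mer reasoning behind this kind of correction can be illustrated with a short sketch: a k-mer that occurs many times is probably genuine, while one that occurs only once probably contains a sequencing error. The code below only flags rare k-mers and does not correct anything; `kmer_counts`, `rare_kmers`, the `min_count` cutoff and the example reads are made up for illustration and do not reflect how tadpole.sh or bfc are implemented.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def rare_kmers(reads, k, min_count=2):
    """Return k-mers seen fewer than min_count times, sorted alphabetically."""
    return sorted(kmer for kmer, n in kmer_counts(reads, k).items() if n < min_count)


# Three identical reads plus one with a single substitution (A -> T at position 4).
reads = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGTTCGT"]
print(rare_kmers(reads, k=4))  # ['CGTT', 'GTTC', 'TCGT', 'TTCG']
```

All four rare k-mers overlap the substituted base, which is how a low k-mer count points at the position of a likely error.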
