BBTools/BBMap option notes, 2020-12-15

2023-03-02, by 土雕艺术家

BBTools website:
https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/data-preprocessing/

Remove duplicates --------.2dupl.fq.gz
Quality trimming --------.3trim.fq.gz
Normalize coverage --------.4norm.fq.gz
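
The suffixes above suggest one output file per stage for each sample. A minimal sketch of that naming scheme, assuming a made-up sample prefix; the helper name `stage_outputs` is hypothetical and not part of BBTools:

```shell
#!/bin/sh
# Hypothetical helper: print the output file name of each stage for one
# sample prefix, following the suffixes listed above.
stage_outputs() {
    prefix=$1
    printf '%s\n' \
        "${prefix}.2dupl.fq.gz" \
        "${prefix}.3trim.fq.gz" \
        "${prefix}.4norm.fq.gz"
}

# Example with an invented prefix.
stage_outputs sampleA_R1
```

Each name can then be passed as the out1=/out2= of one step and the in1=/in2= of the next.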

Step 1: Remove duplicate sequences

Deduplication. Optional; mainly for exome-capture. This is not actually part of RQCFilter because JGI does not typically do exon-capture. Tool: Either Dedupe or DedupeByMapping can be used if you have sufficient memory. If not, there are 3rd-party deduplication tools based on sorting that do not need much memory.

Remove duplicate reads, using pigz for compression.

clumpify.sh in1=fq1.in.fq.gz in2=fq2.in.fq.gz out1=fq1.out.fq.gz out2=fq2.out.fq.gz pigz dedupe 

--------------------------
Compression Flags

ziplevel=2              (zl) Compression level for zip or gzip output; 1-9.
unpigz=                 Spawn a pigz process for faster decompression. Requires pigz to
                         be installed.  Valid values are t or f; the default varies by program.
pigz=                   Spawn a pigz process for faster compression. Requires pigz to be
                         installed.  Valid values are t, f, or a number; the default varies by
                         program.  "pigz=X" will enable pigz, and also force all pigz
                         processes to use exactly X threads.

Deduplication parameters:
dedupe=f            Remove duplicate reads.  For pairs, both must match.
                    By default, deduplication does not occur.
                    If dedupe and markduplicates are both false, none of
                    the other duplicate-related flags will have any effect.

Step 2: Trim low-quality bases

bbduk.sh in1=fq1.in.fq.gz in2=fq2.in.fq.gz out1=fq1.out.fq.gz out2=fq2.out.fq.gz  \
         pigz ordered qtrim=rl trimq=20 minlen=100 ecco=t maxns=5  \
         trimpolya=10 trimpolyg=10 trimpolyc=10 

ordered=t   
Set to true to output reads in same order as input.

qtrim=rl
Quality-trim both the right and the left end of each read.

trimq=20   Trim regions with average quality below 20
This quality-trims to Q20 using the Phred algorithm, which is more accurate than naive trimming.
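
For orientation, a Phred score Q corresponds to an error probability of 10^(-Q/10), so Q20 means roughly 1 expected error per 100 bases. A small sketch of that conversion; the function name `phred_to_perr` is invented for illustration and has nothing to do with how BBDuk is invoked:

```shell
#!/bin/sh
# Convert a Phred quality score to its error probability: P = 10^(-Q/10).
phred_to_perr() {
    awk -v q="$1" 'BEGIN { printf "%.4f\n", 10 ^ (-q / 10) }'
}

phred_to_perr 20   # prints 0.0100
phred_to_perr 30   # prints 0.0010
```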

minlen=100   Length filtering
Reads shorter than 100 bp after trimming to Q20 are discarded.

ecco=t   Error correction by overlap
For overlapping paired reads only.  Performs error correction with BBMerge prior to kmer operations.

maxns=5
Reads with more than 5 Ns (after trimming) will be discarded.
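
To make the minlen= and maxns= rules concrete, here is an illustration with awk on a tiny single-end FASTQ. It only mimics the filtering rules as described above; it is not BBDuk's implementation, it ignores pairing, and the function name `filter_len_ns` is made up:

```shell
#!/bin/sh
# Illustration only: keep 4-line FASTQ records whose sequence has at least
# `minlen` bases and at most `maxns` Ns. Single-end input, no trimming.
filter_len_ns() {
    awk -v minlen="$1" -v maxns="$2" '
        NR % 4 == 1 { header = $0 }
        NR % 4 == 2 { seq = $0 }
        NR % 4 == 3 { plus = $0 }
        NR % 4 == 0 {
            tmp = seq
            n = gsub(/N/, "", tmp)      # gsub returns the number of Ns removed
            if (length(seq) >= minlen && n <= maxns)
                print header "\n" seq "\n" plus "\n" $0
        }'
}

# r1 passes; r2 has 4 Ns (> 2); r3 is only 3 bases long (< 5).
printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACNNNNGT\n+\nIIIIIIII\n@r3\nACG\n+\nIII\n' |
    filter_len_ns 5 2
```

With the small thresholds used here (minlen 5, maxns 2), only the r1 record is printed.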

Polymer trimming:
trimpolya=0         If greater than 0, trim poly-A or poly-T tails of
                    at least this length on either end of reads.
trimpolygleft=0     If greater than 0, trim poly-G prefixes of at least this
                    length on the left end of reads.  Does not trim poly-C.
trimpolygright=0    If greater than 0, trim poly-G tails of at least this 
                    length on the right end of reads.  Does not trim poly-C.
trimpolyg=0         This sets both left and right at once.
filterpolyg=0       If greater than 0, remove reads with a poly-G prefix of
                    at least this length (on the left).
Note: there are also equivalent poly-C flags.
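
As a rough picture of what right-end poly-G trimming does, the sketch below strips a trailing run of G of at least a given length and shortens the quality string to match. It assumes exact matching of G only; how BBDuk actually scores the run is not covered in these notes, and `trim_polyg_right` is a hypothetical name:

```shell
#!/bin/sh
# Illustration only: remove a poly-G tail of at least `min` bases from the
# right end of each read in a 4-line FASTQ, trimming quality to the same length.
trim_polyg_right() {
    awk -v min="$1" '
        NR % 4 == 2 {
            seq = $0
            len = length(seq)
            run = 0
            # Count consecutive Gs from the right end.
            while (run < len && substr(seq, len - run, 1) == "G") run++
            keep = (run >= min) ? len - run : len
            print substr(seq, 1, keep); next
        }
        NR % 4 == 0 { print substr($0, 1, keep); next }
        { print }'
}

# The 6-base G tail is removed, leaving ACGT with 4 quality characters.
printf '@r1\nACGTGGGGGG\n+\nIIIIIIIIII\n' | trim_polyg_right 5
```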

Step 3: Normalize coverage and remove low-coverage reads

bbnorm.sh in1=fq1.in.fq.gz in2=fq2.in.fq.gz out1=fq1.out.fq.gz out2=fq2.out.fq.gz  \
               target=10 min=2 histcol=2 khist=khist.txt peaks=peaks.txt

Normalization parameters:
fixspikes=f         (fs) Do a slower, high-precision bloom filter lookup of kmers that appear to have an abnormally high depth due to collisions.
target=100          (tgt) Target normalization depth.  NOTE:  All depth parameters control kmer depth, not read depth.
                    For kmer depth Dk, read depth Dr, read length R, and kmer size K:  Dr=Dk*(R/(R-K+1))
maxdepth=-1         (max) Reads will not be downsampled when below this depth, even if they are above the target depth.            
mindepth=5          (min) Kmers with depth below this number will not be included when calculating the depth of a read.
minkmers=15         (mgkpr) Reads must have at least this many kmers over min depth to be retained.  Aka 'mingoodkmersperread'.
percentile=54.0     (dp) Read depth is by default inferred from the 54th percentile of kmer depth, but this may be changed to any number 1-100.
uselowerdepth=t     (uld) For pairs, use the depth of the lower read as the depth proxy.
deterministic=t     (dr) Generate random numbers deterministically to ensure identical output between multiple runs.  May decrease speed with a huge number of threads.
passes=2            (p) 1 pass is the basic mode.  2 passes (default) allows greater accuracy, error detection, and better control of output depth.
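
Because target= and min= are kmer depths, it helps to convert them to read depth with the formula given under target=. A worked sketch, assuming 150 bp reads and K=31 (both values are picked for illustration and should be replaced with your own); `kmer_to_read_depth` is a made-up helper name:

```shell
#!/bin/sh
# Dr = Dk * (R / (R - K + 1))
# $1 = kmer depth Dk, $2 = read length R, $3 = kmer size K
kmer_to_read_depth() {
    awk -v dk="$1" -v r="$2" -v k="$3" \
        'BEGIN { printf "%.2f\n", dk * (r / (r - k + 1)) }'
}

kmer_to_read_depth 10 150 31   # target=10 from the command above; prints 12.50
kmer_to_read_depth 2 150 31    # min=2 from the command above; prints 2.50
```

Under these assumptions, target=10 corresponds to a read depth of about 12.5x.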

Histogram parameters:
hist=<file>         Specify a file to write the input kmer depth histogram.
histout=<file>      Specify a file to write the output kmer depth histogram.
histcol=3           (histogramcolumns) Number of histogram columns, 2 or 3.
pzc=f               (printzerocoverage) Print lines in the histogram with zero coverage.
histlen=1048576     Max kmer depth displayed in histogram.  Also affects statistics displayed, but does not affect normalization.

Peak calling parameters:
peaks=<file>        Write the peaks to this file.  Default is stdout.
minHeight=2         (h) Ignore peaks shorter than this.
minVolume=5         (v) Ignore peaks with less area than this.
minWidth=3          (w) Ignore peaks narrower than this.
minPeak=2           (minp) Ignore peaks with an X-value below this.
maxPeak=BIG         (maxp) Ignore peaks with an X-value above this.
maxPeakCount=8      (maxpc) Print up to this many peaks (prioritizing height).
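
To see why a lower bound such as minPeak matters, the sketch below picks the highest bin of a two-column depth histogram while ignoring depths below a cutoff. It is a crude stand-in and not BBNorm's peak caller. The input layout (a `#` header line, then tab-separated depth and count) is an assumption about the histcol=2 output, the numbers are invented, and `highest_bin` is a hypothetical name:

```shell
#!/bin/sh
# Crude stand-in for peak calling: report the depth with the largest count,
# skipping comment lines and any depth below `minp`.
highest_bin() {
    awk -v minp="$1" '
        /^#/ { next }
        $1 >= minp && $2 > best { best = $2; depth = $1 }
        END { if (best > 0) print depth "\t" best }'
}

# Invented data: the depth-1 bin is the largest overall, but it is skipped
# because it lies below the cutoff of 2.
printf '#Depth\tCount\n1\t900\n2\t40\n3\t75\n4\t120\n5\t60\n' | highest_bin 2
```

Here the reported bin is depth 4 with count 120, not the larger depth-1 bin, which in real data often holds mostly error kmers.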
