BBTools/BBMap option notes (2020-12-15)
2023-03-02
土雕艺术家
BBTools website:
https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/data-preprocessing/
Workflow overview (step: output file suffix):
Remove duplicates: .2dupl.fq.gz
Quality trimming: .3trim.fq.gz
Normalize coverage: .4norm.fq.gz
Step 1: Remove duplicate sequences.
Deduplication. Optional; mainly for exome-capture. This is not actually part of RQCFilter because JGI does not typically do exon-capture. Tool: Either Dedupe or DedupeByMapping can be used if you have sufficient memory. If not, there are 3rd-party deduplication tools based on sorting that do not need much memory.
Remove duplicate reads, using pigz for compression.
clumpify.sh in1=fq1.in.fq.gz in2=fq2.in.fq.gz out1=fq1.out.fq.gz out2=fq2.out.fq.gz pigz dedupe
--------------------------
Compression Flags
ziplevel=2 (zl) Compression level for zip or gzip output; 1-9.
unpigz= Spawn a pigz process for faster decompression. Requires pigz to be installed. Valid values are t or f; the default varies by program.
pigz= Spawn a pigz process for faster compression. Requires pigz to be installed. Valid values are t, f, or a number; the default varies by program. "pigz=X" will enable pigz, and also force all pigz processes to use exactly X threads.
Deduplication parameters:
dedupe=f Remove duplicate reads. For pairs, both must match. By default, deduplication does not occur. If dedupe and markduplicates are both false, none of the other duplicate-related flags will have any effect.
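To see how many reads the deduplication step removed, the record counts of the input and output files can be compared. The following is only a minimal sketch: the helper names `count_reads` and `dup_fraction` are made up for this note, and it assumes plain 4-line FASTQ records compressed with gzip.

```shell
# count_reads FILE
# Print the number of FASTQ records in a gzip-compressed file.
# Assumption: every record is exactly 4 lines (no wrapped sequences).
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# dup_fraction BEFORE AFTER
# Print the fraction of reads removed, given read counts before and after.
dup_fraction() {
    awk -v b="$1" -v a="$2" 'BEGIN {
        if (b == 0) { print "NA" }
        else        { printf "%.4f\n", (b - a) / b }
    }'
}
```

For example, `dup_fraction "$(count_reads fq1.in.fq.gz)" "$(count_reads fq1.out.fq.gz)"` reports the removed fraction for the first mate file of the clumpify.sh command above.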
Step 2: Trim low-quality bases
bbduk.sh in1=fq1.in.fq.gz in2=fq2.in.fq.gz out1=fq1.out.fq.gz out2=fq2.out.fq.gz \
pigz ordered qtrim=rl trimq=20 minlen=100 ecco=t maxns=5 \
trimpolya=10 trimpolyg=10 trimpolyc=10
ordered=t
Set to true to output reads in same order as input.
qtrim=rl
Quality-trim both the right and left ends of each read.
trimq=20 Trim regions with quality below Q20
This will quality-trim to Q20 using the Phred algorithm, which is more accurate than naive trimming.
minlen=100 Length filtering
This will discard reads shorter than 100bp after trimming to Q20
ecco=t Error correction by overlap
For overlapping paired reads only. Performs error-correction with BBMerge prior to kmer operations.
maxns=5
Reads with more than 5 Ns (after trimming) will be discarded.
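A quick sanity check on the trimmed output is to count reads that would still violate the minlen and maxns settings; after bbduk.sh the count should be 0. This is a minimal sketch, not part of BBTools: the function name `check_reads` is made up here, and it assumes uncompressed 4-line FASTQ on standard input.

```shell
# check_reads MINLEN MAXNS < reads.fq
# Print how many records are shorter than MINLEN or contain more than MAXNS Ns.
# Assumption: 4 lines per FASTQ record, sequence on the second line.
check_reads() {
    awk -v minlen="$1" -v maxns="$2" '
        NR % 4 == 2 {
            seq = $0
            n = gsub(/[Nn]/, "", seq)   # gsub returns the number of Ns replaced
            if (length($0) < minlen || n > maxns) bad++
        }
        END { print bad + 0 }
    '
}
```

With the settings used above, the check would be `gzip -dc fq1.out.fq.gz | check_reads 100 5`.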
Polymer trimming:
trimpolya=0 If greater than 0, trim poly-A or poly-T tails of at least this length on either end of reads.
trimpolygleft=0 If greater than 0, trim poly-G prefixes of at least this length on the left end of reads. Does not trim poly-C.
trimpolygright=0 If greater than 0, trim poly-G tails of at least this length on the right end of reads. Does not trim poly-C.
trimpolyg=0 This sets both left and right at once.
filterpolyg=0 If greater than 0, remove reads with a poly-G prefix of at least this length (on the left).
Note: there are also equivalent poly-C flags.
Step 3: Reduce coverage (normalize) and remove low-coverage reads
bbnorm.sh in1=fq1.in.fq.gz in2=fq2.in.fq.gz out1=fq1.out.fq.gz out2=fq2.out.fq.gz \
target=10 min=2 histcol=2 khist=khist.txt peaks=peaks.txt
Normalization parameters:
fixspikes=f (fs) Do a slower, high-precision bloom filter lookup of kmers that appear to have an abnormally high depth due to collisions.
target=100 (tgt) Target normalization depth. NOTE: All depth parameters control kmer depth, not read depth.
For kmer depth Dk, read depth Dr, read length R, and kmer size K: Dr=Dk*(R/(R-K+1))
maxdepth=-1 (max) Reads will not be downsampled when below this depth, even if they are above the target depth.
mindepth=5 (min) Kmers with depth below this number will not be included when calculating the depth of a read.
minkmers=15 (mgkpr) Reads must have at least this many kmers over min depth to be retained. Aka 'mingoodkmersperread'.
percentile=54.0 (dp) Read depth is by default inferred from the 54th percentile of kmer depth, but this may be changed to any number 1-100.
uselowerdepth=t (uld) For pairs, use the depth of the lower read as the depth proxy.
deterministic=t (dr) Generate random numbers deterministically to ensure identical output between multiple runs. May decrease speed with a huge number of threads.
passes=2 (p) 1 pass is the basic mode. 2 passes (default) allows greater accuracy, error detection, better control of output depth.
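Because target and the other depth parameters are in kmer depth, it helps to convert between kmer depth and read depth with the relation Dr=Dk*(R/(R-K+1)) given under target above. The helpers below are a small sketch of that arithmetic; the function names are made up for this note, and the K=31 used in the examples is an assumption about the kmer size in use.

```shell
# read_depth DK R K
# Convert kmer depth DK to read depth for read length R and kmer size K:
#   Dr = Dk * R / (R - K + 1)
read_depth() {
    awk -v dk="$1" -v r="$2" -v k="$3" 'BEGIN {
        if (r - k + 1 <= 0) { print "error: read length must be >= K" > "/dev/stderr"; exit 1 }
        printf "%.2f\n", dk * r / (r - k + 1)
    }'
}

# kmer_depth DR R K
# Inverse conversion: Dk = Dr * (R - K + 1) / R
kmer_depth() {
    awk -v dr="$1" -v r="$2" -v k="$3" 'BEGIN {
        if (r <= 0 || r - k + 1 <= 0) { print "error: read length must be >= K" > "/dev/stderr"; exit 1 }
        printf "%.2f\n", dr * (r - k + 1) / r
    }'
}
```

For 150 bp reads and K=31, `read_depth 10 150 31` prints 12.50, so target=10 corresponds to a read depth of about 12.5x. Going the other way, `kmer_depth 30 150 31` prints 24.00, the target value to use for roughly 30x read depth.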
Histogram parameters:
hist=<file> Specify a file to write the input kmer depth histogram.
histout=<file> Specify a file to write the output kmer depth histogram.
histcol=3 (histogramcolumns) Number of histogram columns, 2 or 3.
pzc=f (printzerocoverage) Print lines in the histogram with zero coverage.
histlen=1048576 Max kmer depth displayed in histogram. Also affects statistics displayed, but does not affect normalization.
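The histogram file can be inspected with a few lines of awk. The sketch below reports the depth with the highest count at or above a minimum depth, which gives a rough idea of where the main coverage peak sits. It is not the peak caller BBNorm uses for the peaks file. It assumes a two-column file (depth, count, as written with histcol=2) separated by whitespace, with header lines starting with '#'; the function name `hist_mode` is made up here.

```shell
# hist_mode FILE [MIN_DEPTH]
# Print "depth count" for the histogram row with the largest count
# among rows whose depth is >= MIN_DEPTH (default 2, to skip the error peak).
# Assumptions: column 1 = depth, column 2 = count, '#' starts a header line.
hist_mode() {
    awk -v min="${2:-2}" '
        /^#/ { next }
        $1 >= min && $2 > best { best = $2; depth = $1 }
        END { if (best > 0) print depth, best }
    ' "$1"
}
```

For the command above this would be called as `hist_mode khist.txt 2`.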
Peak calling parameters:
peaks=<file> Write the peaks to this file. Default is stdout.
minHeight=2 (h) Ignore peaks shorter than this.
minVolume=5 (v) Ignore peaks with less area than this.
minWidth=3 (w) Ignore peaks narrower than this.
minPeak=2 (minp) Ignore peaks with an X-value below this.
maxPeak=BIG (maxp) Ignore peaks with an X-value above this.
maxPeakCount=8 (maxpc) Print up to this many peaks (prioritizing height).
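The three steps can be chained per sample using the output suffixes listed at the top (.2dupl, .3trim, .4norm). The sketch below only prints the commands so they can be reviewed before running. The flags are copied from the commands in this note, while the function name `print_pipeline`, the `_R1`/`_R2` input naming and the per-sample histogram and peak file names are hypothetical.

```shell
# print_pipeline SAMPLE
# Print (do not run) the dedupe, trim and normalize commands for one
# paired-end sample. Assumed input names: SAMPLE_R1.fq.gz and SAMPLE_R2.fq.gz.
print_pipeline() {
    s="$1"
    # Step 1: remove duplicates
    echo "clumpify.sh in1=${s}_R1.fq.gz in2=${s}_R2.fq.gz out1=${s}_R1.2dupl.fq.gz out2=${s}_R2.2dupl.fq.gz pigz dedupe"
    # Step 2: quality trimming
    echo "bbduk.sh in1=${s}_R1.2dupl.fq.gz in2=${s}_R2.2dupl.fq.gz out1=${s}_R1.3trim.fq.gz out2=${s}_R2.3trim.fq.gz pigz ordered qtrim=rl trimq=20 minlen=100 ecco=t maxns=5 trimpolya=10 trimpolyg=10 trimpolyc=10"
    # Step 3: normalize coverage
    echo "bbnorm.sh in1=${s}_R1.3trim.fq.gz in2=${s}_R2.3trim.fq.gz out1=${s}_R1.4norm.fq.gz out2=${s}_R2.4norm.fq.gz target=10 min=2 histcol=2 khist=${s}.khist.txt peaks=${s}.peaks.txt"
}
```

`print_pipeline sampleA` lists the three commands; once they look right, `print_pipeline sampleA | sh` runs them in order. Sample names containing spaces or shell metacharacters would need quoting that this sketch does not do.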