『Third-generation sequencing』

falcon-1

2019-06-29  tobebettergirl

https://pb-falcon.readthedocs.io/en/latest/

The current PacBio Assembly suite documentation, which includes new bioconda instructions for installing FALCON, FALCON_unzip and their associated dependencies, can be found here (pb_assembly):

https://github.com/PacificBiosciences/pb-assembly

input_fofn: a file of file names (FOFN) that lists the paths to the input FASTA files
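A FOFN is just a text file with one path per line. The small Python sketch below writes one. The FASTA file names are hypothetical, and writing absolute paths is a choice made here so the list does not depend on the directory the pipeline is launched from. It is not a requirement taken from the FALCON docs.

```python
import os
import tempfile
from pathlib import Path


def write_fofn(fasta_paths, fofn_path):
    """Write a file-of-file-names: one absolute FASTA path per line."""
    lines = [str(Path(p).resolve()) for p in fasta_paths]
    Path(fofn_path).write_text("\n".join(lines) + "\n")
    return lines


# Demo with hypothetical file names, written into a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    fofn = os.path.join(tmp, "input.fofn")
    write_fofn(["subreads_1.fasta", "subreads_2.fasta"], fofn)
    print(Path(fofn).read_text())
```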

input_type: raw or preads
# if set to preads, the pipeline will skip the entire 0-rawreads pre-assembly phase.

# large genomes
pa_DBsplit_option=-x500 -s200
ovlp_DBsplit_option=-x500 -s200

# small genomes (<10 Mb)
pa_DBsplit_option=-x500 -s50
ovlp_DBsplit_option=-x500 -s50
# -x: filters out reads shorter than the specified length (bp)
# -s: controls the size of the DB blocks (in Mbp)
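The rule above can be sketched in Python: choose `-s50` below 10 Mb and `-s200` otherwise, then roughly estimate how many DB blocks the split produces. The helper names are hypothetical. The block estimate assumes `-s` is a size in megabases and ignores the fact that DBsplit places block boundaries between reads, so treat the result as approximate.

```python
import math


def dbsplit_option(genome_size_bp):
    """Pick the DBsplit option string from these notes: -s50 below 10 Mb, else -s200."""
    block_mbp = 50 if genome_size_bp < 10_000_000 else 200
    return f"-x500 -s{block_mbp}"


def estimate_blocks(read_lengths, min_len=500, block_mbp=200):
    """Rough block count: bases kept after the -x filter divided by the -s block size.

    Assumes -s is in megabases; real split points fall on read boundaries,
    so the true count can differ slightly.
    """
    kept = sum(n for n in read_lengths if n >= min_len)  # -x drops shorter reads
    return math.ceil(kept / (block_mbp * 1_000_000))


print(dbsplit_option(200000))  # genome_size from the config below -> "-x500 -s50"
```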

pa_HPCTANmask_option
# additional arguments for tandem repeat masking, passed to HPC.TANmask
pa_REPmask_code
# The second phase of masking deals with interspersed repeats and can run for up to 3 iterations, specified with the pa_REPmask_code option. Each iteration needs a group size and a coverage, written as group,coverage pairs separated by semicolons.
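To make the `group,coverage;group,coverage` format concrete, the hypothetical Python parser below turns a pa_REPmask_code value into one (group, coverage) pair per masking iteration and enforces the limit of 3 iterations. The numbers in the test are made up for illustration and are not recommended settings.

```python
def parse_repmask_code(code):
    """Parse 'group,coverage;group,coverage;...' into (group, coverage) int pairs."""
    pairs = []
    for item in code.split(";"):
        item = item.strip()
        if not item:  # tolerate a trailing semicolon
            continue
        group, coverage = (int(x) for x in item.split(","))
        pairs.append((group, coverage))
    if len(pairs) > 3:
        raise ValueError("pa_REPmask_code allows at most 3 iterations")
    return pairs
```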

genome_size=200000

seed_coverage=30
# seed coverage is typically 20-40x.

length_cutoff=-1
# -1: compute the length cutoff automatically from genome_size and seed_coverage.
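A minimal sketch of the idea behind an automatic cutoff follows. Take reads from longest to shortest and stop once their total reaches genome_size × seed_coverage. The length of that last read becomes the seed cutoff. This illustrates the concept under that assumption and is not FALCON's own code.

```python
def auto_length_cutoff(read_lengths, genome_size, seed_coverage):
    """Longest cutoff whose reads still reach genome_size * seed_coverage bases."""
    target = genome_size * seed_coverage
    total = 0
    for length in sorted(read_lengths, reverse=True):  # longest reads first
        total += length
        if total >= target:
            return length  # reads >= this length give the requested coverage
    raise ValueError("not enough data to reach the requested seed coverage")
```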

pa_daligner_option=-h70 -e.75 -l1000 -s100 -k18
# -e: average correlation rate (average sequence identity); 0.70 (low-quality data) to 0.80 (high-quality data). A higher value helps prevent haplotype collapse.
# -l: minimum overlap length; 1000 (shorter libraries) to 5000 (longer libraries).
# -k: k-mer size; 14 (low-quality data) to 18 (high-quality data). Lower values of -k give higher sensitivity at the cost of more disk space, higher memory use and slower run time, and tend to work best with lower-quality data. A larger -k gives higher specificity, uses fewer system resources and runs faster, but is only suitable for high-quality data.
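In these option strings each value is glued to its one-letter flag, so `-e.75` means e = 0.75. The hypothetical Python helpers below split such a string into a dict and flag values that fall outside the ranges suggested above. The checks only mirror these notes. They are not limits enforced by daligner.

```python
def parse_daligner_options(opts):
    """Split a daligner-style option string such as '-h70 -e.75' into {flag: value}."""
    parsed = {}
    for token in opts.split():
        if not token.startswith("-") or len(token) < 2:
            raise ValueError(f"unexpected token: {token!r}")
        flag, value = token[1], token[2:]
        # '.75' -> 0.75, '1000' -> 1000, bare flag such as '-v' -> True
        parsed[flag] = float(value) if "." in value else int(value) if value else True
    return parsed


def check_ranges(parsed):
    """Return warnings for values outside the ranges suggested in these notes."""
    warnings = []
    if "e" in parsed and not 0.70 <= parsed["e"] <= 0.80:
        warnings.append("-e outside 0.70-0.80")
    if "k" in parsed and not 14 <= parsed["k"] <= 18:
        warnings.append("-k outside 14-18")
    if "l" in parsed and not 1000 <= parsed["l"] <= 5000:
        warnings.append("-l outside 1000-5000")
    return warnings


print(parse_daligner_options("-h70 -e.75 -l1000 -s100 -k18"))
```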

falcon_sense_option=--output-multi --min-idt 0.70 --min-cov 4 --max-n-read 200

# The --output-multi flag is necessary for generating proper FASTA headers and should not be removed unless your use case specifically requires it.
# The parameters --min-idt, --min-cov and --max-n-read set the minimum alignment identity, minimum coverage necessary and max number of reads, respectively, for calling consensus to make the preads.
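To show what the three thresholds control, here is a deliberately simplified Python toy. It drops reads below --min-idt, keeps at most --max-n-read of them, and calls a base by plain majority vote only where at least --min-cov reads cover a column. Where coverage is too low it breaks the output into separate segments instead of joining across the gap. The input layout, the function name and the majority vote are all assumptions made for illustration. FALCON's real consensus algorithm works differently.

```python
from collections import Counter


def toy_consensus(alignments, min_idt=0.70, min_cov=4, max_n_read=200):
    """alignments: list of (identity, row); rows are equal-length strings over
    'ACGT', with ' ' where a read does not cover that seed column."""
    rows = [row for idt, row in alignments if idt >= min_idt][:max_n_read]
    segments, current = [], []
    for col in zip(*rows):
        bases = [b for b in col if b != " "]
        if len(bases) >= min_cov:  # enough support to call this base
            current.append(Counter(bases).most_common(1)[0][0])
        elif current:  # coverage gap: close the current segment
            segments.append("".join(current))
            current = []
    if current:
        segments.append("".join(current))
    return segments
```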

pa_HPCdaligner_option=-v -B24 -M16
# -v is passed to the LAsort and LAmerge programs, while -B and -M are passed to the daligner sub-commands.

[job.defaults]
job_type=sge
# job_type: allowed values are sge, pbs, torque, slurm, lsf and local.
pwatcher_type=blocking
# pwatcher_type: blocking or fs_based
# fs_based: the default; the pipeline periodically polls the file system for a sentinel file that signals it to continue.
# blocking: uses a blocking process watcher instead, which can help on systems with file-system latency issues.
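The fs_based strategy amounts to a polling loop. The Python sketch below waits until a sentinel file appears or a timeout expires. The function name, poll interval and timeout are hypothetical and not pypeflow's actual implementation. The sketch also hints at why file-system latency matters: if the file appears late on a network file system, every poll after that delay is wasted time.

```python
import os
import time


def wait_for_sentinel(path, poll_seconds=10.0, timeout_seconds=None):
    """Poll until `path` exists; True when it appears, False on timeout."""
    start = time.monotonic()
    while not os.path.exists(path):
        if timeout_seconds is not None and time.monotonic() - start >= timeout_seconds:
            return False
        time.sleep(poll_seconds)  # wait before checking the file system again
    return True
```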

JOB_QUEUE = default
MB = 32768
NPROC = 6
njobs = 32
submit = qsub -S /bin/bash -sync y -V  \
  -q ${JOB_QUEUE}     \
  -N ${JOB_NAME}      \
  -o "${JOB_STDOUT}"  \
  -e "${JOB_STDERR}"  \
  -pe smp ${NPROC}    \
  -l h_vmem=${MB}M    \
  "${JOB_SCRIPT}"
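The `${...}` placeholders in `submit` are filled in per job. As an illustration only, the Python sketch below does the same substitution with the standard library's `string.Template`. JOB_QUEUE, NPROC and MB come from the config above. The job name, log paths and script name in the usage line are hypothetical, and the pipeline's own substitution code may differ.

```python
from string import Template

SUBMIT = (
    "qsub -S /bin/bash -sync y -V -q ${JOB_QUEUE} -N ${JOB_NAME} "
    '-o "${JOB_STDOUT}" -e "${JOB_STDERR}" -pe smp ${NPROC} '
    '-l h_vmem=${MB}M "${JOB_SCRIPT}"'
)


def render_submit(template, **values):
    """Fill ${VAR} placeholders; raises KeyError if any placeholder is left unset."""
    return Template(template).substitute(**values)


print(render_submit(SUBMIT, JOB_QUEUE="default", NPROC=6, MB=32768,
                    JOB_NAME="job1", JOB_STDOUT="job1.out",
                    JOB_STDERR="job1.err", JOB_SCRIPT="run_job1.sh"))
```

`substitute` is used rather than `safe_substitute` so that a missing value fails loudly instead of reaching qsub as a literal `${...}`.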

[job.step.da]
NPROC=4
# NPROC: number of processors per job

MB=49152
# MB: memory allocated per job (in MB)

njobs=240
# njobs: number of concurrently running jobs
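Because NPROC and MB apply per job and njobs caps how many jobs run at once, the worst-case demand on the cluster is their product. The short Python calculation below uses the values from this config. The helper name is hypothetical.

```python
def peak_demand(nproc, mb, njobs):
    """Worst case with all njobs running at once: (total cores, total memory in GB)."""
    return nproc * njobs, mb * njobs / 1024


print(peak_demand(6, 32768, 32))   # [job.defaults] -> (192, 1024.0)
print(peak_demand(4, 49152, 240))  # [job.step.da]  -> (960, 11520.0)
```

So the daligner step alone can request about 11.5 TB of memory in total if all 240 jobs are scheduled together.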

Parameter reference: https://github.com/PacificBiosciences/pb-assembly
