falcon-1
2019-06-29
tobebettergirl
https://pb-falcon.readthedocs.io/en/latest/
The current PacBio Assembly suite documentation, which includes new bioconda instructions for installing FALCON, FALCON_unzip and their associated dependencies, can be found at pb_assembly:
https://github.com/PacificBiosciences/pb-assembly
input_fofn: path to a file of file names (FOFN) that lists the input fasta files, one path per line
input_type: raw or preads
# With input_type=preads, the pipeline skips the entire 0-rawreads pre-assembly phase.
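A quick sanity check of the FOFN before launching can save a failed run. The sketch below is only an illustration, not part of FALCON: the helper names `parse_fofn` and `missing_inputs` are made up here, and it assumes the FOFN holds one path per line.

```python
import os


def parse_fofn(text):
    """Return the non-empty lines of a FOFN, stripped of surrounding whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def missing_inputs(paths):
    """Return the listed paths that do not exist as regular files."""
    return [p for p in paths if not os.path.isfile(p)]


# Hypothetical FOFN content; real paths would point at your fasta files.
example_fofn = "reads/movie1.fasta\n\n  reads/movie2.fasta  \n"
listed = parse_fofn(example_fofn)
not_found = missing_inputs(listed)
```

Anything returned by `missing_inputs` is a path the pipeline would presumably fail on, so it is worth fixing before submitting jobs.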
# large genomes
pa_DBsplit_option=-x500 -s200
ovlp_DBsplit_option=-x500 -s200
# small genomes (<10Mb)
pa_DBsplit_option = -x500 -s50
ovlp_DBsplit_option = -x500 -s50
# -x: filters out reads shorter than the specified length (bp)
# -s: sets the size of the DB blocks (Mbp)
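To get a feel for how -x and -s interact, the following sketch estimates how many reads survive the length filter and roughly how many blocks result. It is an approximation under stated assumptions, not DBsplit itself: it assumes -s is in Mbp and simply divides the retained bases by the block size, whereas the real tool assigns whole reads to blocks. The function name `dbsplit_preview` is hypothetical.

```python
import math


def dbsplit_preview(read_lengths, min_length, block_size_mbp):
    """Estimate (reads kept, bases kept, block count) for given -x and -s values."""
    kept = [n for n in read_lengths if n >= min_length]  # -x filter
    total_bp = sum(kept)
    # -s is assumed to be in Mbp; round up so a partial block still counts.
    n_blocks = math.ceil(total_bp / (block_size_mbp * 1_000_000))
    return len(kept), total_bp, n_blocks


# Toy input: one read below the 500 bp cutoff, two above it.
small = dbsplit_preview([400, 500, 1000], min_length=500, block_size_mbp=50)
```

Fewer, larger blocks (large -s) mean fewer but heavier daligner jobs; smaller blocks mean more, lighter jobs.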
pa_HPCTANmask_option
#additional arguments for tandem repeat masking that will be passed to HPC.TANmask
pa_REPmask_code
# The second phase of masking deals with interspersed repeats and can be run in up to 3 iterations, specified with the pa_REPmask_code option. Each iteration takes a group size and a coverage, written as a group,coverage pair; the pairs are separated by semicolons.
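The group,coverage format can be checked with a few lines of Python. This is a minimal sketch; the value `"1,20;10,15;50,10"` is a made-up example for illustration, not a recommended setting, and `parse_repmask_code` is a hypothetical helper.

```python
def parse_repmask_code(code):
    """Split a 'group,coverage;group,coverage;...' string into integer pairs."""
    pairs = []
    for item in code.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        group, coverage = item.split(",")
        pairs.append((int(group), int(coverage)))
    if len(pairs) > 3:
        raise ValueError("at most 3 REPmask iterations are supported")
    return pairs


# Hypothetical value, used only to show the format.
iterations = parse_repmask_code("1,20;10,15;50,10")
```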
genome_size=200000
seed_coverage=30
# Aim for 20-40x seed coverage.
length_cutoff=-1
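As I understand it, length_cutoff=-1 asks the pipeline to derive the seed-read length cutoff from genome_size and seed_coverage. The sketch below shows the general idea only and is not FALCON's actual code: take the longest reads first until their total reaches genome_size × seed_coverage, and use the length of the last read taken as the cutoff. The function name is hypothetical.

```python
def seed_length_cutoff(read_lengths, genome_size, seed_coverage):
    """Return the read length at which the longest reads reach the target coverage.

    Returns None if all reads together fall short of the target.
    """
    target_bp = genome_size * seed_coverage
    total = 0
    for length in sorted(read_lengths, reverse=True):
        total += length
        if total >= target_bp:
            return length
    return None


# Toy numbers: a 1 kb "genome" at 20x needs 20,000 bp of seed reads.
cutoff = seed_length_cutoff([2000, 10000, 4000, 8000, 6000],
                            genome_size=1000, seed_coverage=20)
```

With genome_size=200000 and seed_coverage=30 as in the config above, the target would be 6,000,000 bp of seed reads.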
pa_daligner_option=-h70 -e.75 -l1000 -s100 -k18
# -e: average correlation rate (average sequence identity), 0.70 (low-quality data) to 0.80 (high-quality data). A higher value helps prevent haplotype collapse.
# -l: minimum overlap length, 1000 (shorter library) to 5000 (longer library)
# -k: k-mer size, 14 (low-quality data) to 18 (high-quality data). Lower values of -k give higher sensitivity at the cost of more disk space, higher memory consumption and slower run time, and tend to work best with lower-quality data. A larger -k gives higher specificity, uses fewer system resources and runs faster, but is only suitable for high-quality data.
falcon_sense_option=--output-multi --min-idt 0.70 --min-cov 4 --max-n-read 200
# --output-multi flag is necessary for generating proper fasta headers and should not be removed unless your specific use case requires it.
# The parameters --min-idt, --min-cov and --max-n-read set the minimum alignment identity, minimum coverage necessary and max number of reads, respectively, for calling consensus to make the preads.
pa_HPCdaligner_option=-v -B24 -M16
# the -v parameter is passed to the LAsort and LAmerge programs while -B and -M parameters are passed to the daligner sub-commands.
[job.defaults]
job_type=sge
# job_type: allowed values are sge, pbs, torque, slurm, lsf and local.
pwatcher_type=blocking
# pwatcher_type: blocking or fs_based
# fs_based: the default; the pipeline polls the file system periodically for a sentinel file whose appearance signals that it can continue.
# blocking: a blocking process watcher, which can help on systems with filesystem latency issues.
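The fs_based idea, polling until a sentinel file shows up, can be sketched as follows. This is a generic illustration and not the pipeline's own watcher; the function name, the polling interval and the timeout are all assumptions.

```python
import os
import time


def wait_for_sentinel(path, poll_seconds=1.0, timeout_seconds=60.0):
    """Poll the file system until `path` exists; return False on timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
```

On a file system with high latency, the sentinel can become visible well after the job has finished, which is presumably why the blocking watcher is offered as an alternative.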
JOB_QUEUE = default
MB = 32768
NPROC = 6
njobs = 32
submit = qsub -S /bin/bash -sync y -V \
-q ${JOB_QUEUE} \
-N ${JOB_NAME} \
-o "${JOB_STDOUT}" \
-e "${JOB_STDERR}" \
-pe smp ${NPROC} \
-l h_vmem=${MB}M \
"${JOB_SCRIPT}"
[job.step.da]
NPROC=4
# NPROC: number of processors per job
MB=49152
# MB: memory allocated per job
njobs=240
# njobs: number of concurrently running jobs
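To see how the [job.step.da] values relate to [job.defaults] and to the submit template, here is a rough sketch using Python's standard `configparser` and `string.Template`. It assumes that step sections override the defaults key by key and that the `${...}` placeholders are filled per job; the actual substitution is done by the pipeline, not by this code. The submit line is joined onto one line for parsing, and the JOB_NAME, JOB_STDOUT, JOB_STDERR and JOB_SCRIPT values are hypothetical.

```python
import configparser
from string import Template

# Values copied from the notes above; submit is written on a single line here.
CONFIG_TEXT = """\
[job.defaults]
JOB_QUEUE = default
MB = 32768
NPROC = 6
njobs = 32
submit = qsub -S /bin/bash -sync y -V -q ${JOB_QUEUE} -N ${JOB_NAME} -o "${JOB_STDOUT}" -e "${JOB_STDERR}" -pe smp ${NPROC} -l h_vmem=${MB}M "${JOB_SCRIPT}"

[job.step.da]
NPROC = 4
MB = 49152
njobs = 240
"""


def resolve_step(parser, step):
    """Start from job.defaults and let the step section override matching keys."""
    merged = dict(parser["job.defaults"])
    if parser.has_section(step):
        merged.update(parser[step])
    return merged


def expand_submit(settings, job_fields):
    """Fill the ${...} placeholders of the submit template."""
    return Template(settings["submit"]).substitute({**settings, **job_fields})


parser = configparser.ConfigParser(interpolation=None)  # keep ${...} untouched
parser.optionxform = str  # preserve key case (NPROC, MB, ...)
parser.read_string(CONFIG_TEXT)

da = resolve_step(parser, "job.step.da")
# Hypothetical per-job values, for illustration only.
job_fields = {"JOB_NAME": "da_0001", "JOB_STDOUT": "run.stdout",
              "JOB_STDERR": "run.stderr", "JOB_SCRIPT": "run.sh"}
command = expand_submit(da, job_fields)
peak_cores = int(da["NPROC"]) * int(da["njobs"])  # upper bound on cores in use
```

With these numbers the da step could occupy up to 960 cores at once, so njobs is worth checking against what the cluster queue actually allows.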
