Snakemake notes: submitting jobs for the genome assembler Megahit
2023-03-21
小明的数据分析笔记本
What I wrote at first:
SAMPLES, = glob_wildcards("01.clean.fq/{sample}_1.fq.gz")
print("Total sample: ", len(SAMPLES))

rule all:
    input:
        expand("02.megahit/{sample}/{sample}.contigs.fa", sample=SAMPLES)

rule run_megahit:
    input:
        r1 = "01.clean.fq/{sample}_1.fq.gz",
        r2 = "01.clean.fq/{sample}_2.fq.gz"
    output:
        "02.megahit/{sample}/{sample}.contigs.fa"
    threads:
        12
    resources:
        mem = 24000
    params:
        output_folder = "02.megahit/{sample}",
        prefix = "{sample}",
        mem = "24000000000"
    shell:
        """
        megahit -1 {input.r1} -2 {input.r2} -o {params.output_folder} --out-prefix {params.prefix} -t {threads} -m {params.mem}
        """
Every time I submitted the job, it failed with this error:
Output directory /data/myan/raw_data/pome/pan.raw.fq/02.megahit/Gl_MOL already exists, please change the parameter -o to another value to avoid overwriting.
I changed it to the following:
SAMPLES, = glob_wildcards("../01.clean.fq/{sample}_1.fq.gz")
print("Total sample: ", len(SAMPLES))

rule all:
    input:
        expand("{sample}.log", sample=SAMPLES)

rule run_megahit:
    input:
        r1 = "../01.clean.fq/{sample}_1.fq.gz",
        r2 = "../01.clean.fq/{sample}_2.fq.gz"
    output:
        "{sample}.log"
    threads:
        12
    resources:
        mem = 24000
    params:
        output_folder = "{sample}",
        prefix = "{sample}",
        mem = "24000000000"
    log:
        "{sample}.log"
    shell:
        """
        megahit -1 {input.r1} -2 {input.r2} -o {params.output_folder} --out-prefix {params.prefix} -t {threads} -m {params.mem} --min-contig-len 500 1>{log} 2>&1
        """
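The cause: when a path in output includes a directory, Snakemake creates that directory before the job runs. The shell command then passes the same directory to -o, and megahit sees that it already exists and aborts with the error above. The only workaround I can think of so far is to keep the directory out of output; I don't know whether there is another solution.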
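One other workaround that might be worth trying is to keep the contigs file as the real output and delete the directory Snakemake pre-created just before calling megahit. Below is a minimal sketch of the first version's rule with that change, not a tested solution. The `rm -rf` line and the log path `logs/megahit/{sample}.log` are my own assumptions for illustration; the log is deliberately placed outside the megahit output folder so that removing the folder does not remove the log's directory.

```snakemake
rule run_megahit:
    input:
        r1 = "01.clean.fq/{sample}_1.fq.gz",
        r2 = "01.clean.fq/{sample}_2.fq.gz"
    output:
        "02.megahit/{sample}/{sample}.contigs.fa"
    threads:
        12
    resources:
        mem = 24000
    params:
        output_folder = "02.megahit/{sample}",
        prefix = "{sample}",
        mem = "24000000000"
    log:
        # Hypothetical log location, kept outside the megahit output folder
        "logs/megahit/{sample}.log"
    shell:
        """
        # Snakemake has already created the output folder for the contigs file;
        # remove it so that megahit can create it itself (assumed workaround).
        rm -rf {params.output_folder}
        megahit -1 {input.r1} -2 {input.r2} -o {params.output_folder} --out-prefix {params.prefix} -t {threads} -m {params.mem} 1>{log} 2>&1
        """
```

With this rule, `rule all` can keep requesting `02.megahit/{sample}/{sample}.contigs.fa`, so Snakemake tracks the actual assembly rather than a log file. Newer MEGAHIT releases also appear to have a force-overwrite option; check `megahit --help` for your installed version before relying on it.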
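I ran into the same problem earlier when writing a batch workflow that assembles chloroplast genomes with get_organelle_from_reads.py. That script, however, has an option to overwrite the output folder if it already exists, and adding that option fixed it.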