Snakemake notes: submitting jobs for the genome assembler MEGAHIT

2023-03-21  小明的数据分析笔记本

This is what I wrote at first:

SAMPLES, = glob_wildcards("01.clean.fq/{sample}_1.fq.gz")

print("Total sample: ",len(SAMPLES))

rule all:
    input:
        expand("02.megahit/{sample}/{sample}.contigs.fa",sample=SAMPLES)

rule run_megahit:
    input:
        r1 = "01.clean.fq/{sample}_1.fq.gz",
        r2 = "01.clean.fq/{sample}_2.fq.gz"
    output:
        "02.megahit/{sample}/{sample}.contigs.fa"
    threads:
        12
    resources:
        mem = 24000
    params:
        output_folder = "02.megahit/{sample}",
        prefix = "{sample}",
        mem = "24000000000"
    shell:
        """
        megahit -1 {input.r1} -2 {input.r2} -o {params.output_folder} --out-prefix {params.prefix} -t {threads} -m {params.mem}
        """

Every time I submitted the job, it failed with this error:

Output directory /data/myan/raw_data/pome/pan.raw.fq/02.megahit/Gl_MOL already exists, please change the parameter -o to another value to avoid overwriting.

I changed the Snakefile to the following:

SAMPLES, = glob_wildcards("../01.clean.fq/{sample}_1.fq.gz")

print("Total sample: ",len(SAMPLES))

rule all:
    input:
        expand("{sample}.log",sample=SAMPLES)

rule run_megahit:
    input:
        r1 = "../01.clean.fq/{sample}_1.fq.gz",
        r2 = "../01.clean.fq/{sample}_2.fq.gz"
    output:
        "{sample}.log"
    threads:
        12
    resources:
        mem = 24000
    params:
        output_folder = "{sample}",
        prefix = "{sample}",
        mem = "24000000000"
    log:
        "{sample}.log"
    shell:
        """
        megahit -1 {input.r1} -2 {input.r2} -o {params.output_folder} --out-prefix {params.prefix} -t {threads} -m {params.mem} --min-contig-len 500 1>{log} 2>&1
        """

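The cause is that when the path under output contains a folder, Snakemake creates that folder before the job runs. The -o option in the shell command then points at a folder that already exists, so MEGAHIT detects it and aborts with the error above. The only workaround I have come up with so far is to leave the folder out of the output path (here the rule declares only {sample}.log as its output and lets MEGAHIT create the {sample} folder itself). I don't know whether there are other solutions.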
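One other approach that may be worth trying is to keep the contig file as the declared output and delete the pre-created folder at the start of the shell command, so that MEGAHIT can create it itself. The following is only a sketch under assumptions: the log path logs/megahit/{sample}.log is a hypothetical location chosen to sit outside the MEGAHIT output folder, and the rm -rf line assumes nothing else writes into 02.megahit/{sample}.

```snakemake
# Sketch only: the first version of the rule plus one extra shell line.
SAMPLES, = glob_wildcards("01.clean.fq/{sample}_1.fq.gz")

rule all:
    input:
        expand("02.megahit/{sample}/{sample}.contigs.fa", sample=SAMPLES)

rule run_megahit:
    input:
        r1 = "01.clean.fq/{sample}_1.fq.gz",
        r2 = "01.clean.fq/{sample}_2.fq.gz"
    output:
        "02.megahit/{sample}/{sample}.contigs.fa"
    log:
        # hypothetical path, deliberately outside the MEGAHIT output folder
        "logs/megahit/{sample}.log"
    threads:
        12
    resources:
        mem = 24000
    params:
        output_folder = "02.megahit/{sample}",
        prefix = "{sample}",
        mem = "24000000000"
    shell:
        """
        # Snakemake has already created {params.output_folder}; remove it
        # (and any leftovers from a failed run) before MEGAHIT starts.
        rm -rf {params.output_folder}
        megahit -1 {input.r1} -2 {input.r2} -o {params.output_folder} --out-prefix {params.prefix} -t {threads} -m {params.mem} > {log} 2>&1
        """
```

With this layout the rule all target stays the real assembly result rather than a log file, so Snakemake can tell whether the contigs were actually produced. The trade-off is that rm -rf discards any partial results from an interrupted run.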

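I ran into the same problem earlier when writing a workflow to batch-assemble chloroplast genomes with get_organelle_from_reads.py. That script has an option that overwrites the output folder if it already exists, and adding that option fixed the problem.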
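MEGAHIT may offer a comparable switch. As far as I know, the 1.2.x releases list -f/--force ("overwrite output directory") in their help text, but this is an assumption to verify with megahit --help on the installed version before relying on it. If the flag is available, a sketch of the rule could look like this (the log path is again a hypothetical choice):

```snakemake
# Sketch only: assumes the installed MEGAHIT supports -f/--force.
SAMPLES, = glob_wildcards("01.clean.fq/{sample}_1.fq.gz")

rule all:
    input:
        expand("02.megahit/{sample}/{sample}.contigs.fa", sample=SAMPLES)

rule run_megahit:
    input:
        r1 = "01.clean.fq/{sample}_1.fq.gz",
        r2 = "01.clean.fq/{sample}_2.fq.gz"
    output:
        "02.megahit/{sample}/{sample}.contigs.fa"
    log:
        # hypothetical path, outside the folder that MEGAHIT overwrites
        "logs/megahit/{sample}.log"
    threads:
        12
    resources:
        mem = 24000
    params:
        output_folder = "02.megahit/{sample}",
        prefix = "{sample}",
        mem = "24000000000"
    shell:
        """
        # --force lets MEGAHIT replace the folder that Snakemake pre-created
        megahit --force -1 {input.r1} -2 {input.r2} -o {params.output_folder} --out-prefix {params.prefix} -t {threads} -m {params.mem} > {log} 2>&1
        """
```

On a MEGAHIT build without this flag the command would stop with an unrecognized-option error, so in that case the folder has to be handled outside MEGAHIT.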
