Notes on commonly used Snakemake features

2021-08-29  无话_

Using wildcards

rule complex_conversion:
    input:
        "{dataset}/inputfile"
    output:
        "{dataset}/file.{group}.txt"
    shell:
        "somecommand --group {wildcards.group} < {input} > {output}"
# In shell commands, refer to a wildcard as wildcards.xxx
Advanced usage
output: "{dataset,\d+}.{group}.txt"
# Constrain a single wildcard with a regular expression

wildcard_constraints:
    dataset="\d+"
rule a:
    ...
rule b:
    ...
# Global constraint: applies to every rule in the workflow
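
Why a constraint matters can be illustrated in plain Python. The sketch below is not Snakemake's implementation; `pattern_to_regex` is a hypothetical helper that mimics the idea that an unconstrained wildcard matches `.+`, while a constrained one matches only the given regex. The target file name is made up for illustration.

```python
import re

def pattern_to_regex(pattern, constraints=None):
    """Simplified stand-in: turn "{name}" placeholders into named regex groups.

    Unconstrained wildcards match ".+"; constrained ones use the given regex.
    """
    constraints = constraints or {}
    parts = []
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        parts.append(re.escape(pattern[pos:m.start()]))  # literal text before the wildcard
        name = m.group(1)
        parts.append("(?P<%s>%s)" % (name, constraints.get(name, ".+")))
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))  # literal text after the last wildcard
    return re.compile("^" + "".join(parts) + "$")

target = "12.3.abc.txt"  # hypothetical requested file

# Without a constraint, the greedy match puts "12.3" into dataset.
loose = pattern_to_regex("{dataset}.{group}.txt").match(target).groupdict()

# With dataset restricted to digits, the split becomes unambiguous.
strict = pattern_to_regex(
    "{dataset}.{group}.txt", {"dataset": r"\d+"}
).match(target).groupdict()

print(loose)   # {'dataset': '12.3', 'group': 'abc'}
print(strict)  # {'dataset': '12', 'group': '3.abc'}
```

The same file name yields two different wildcard assignments, which is the kind of ambiguity that `{dataset,\d+}` or `wildcard_constraints` is meant to rule out.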

expand (user-defined lists)

rule aggregate:
    input:
        expand("{dataset}/a.{ext}", dataset=DATASETS, ext=FORMATS)
    output:
        "aggregated.txt"
    shell:
        ...
Advanced usage
expand("{{dataset}}/a.{ext}", ext=FORMATS)
# Double braces keep {dataset} as a wildcard instead of expanding it
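
The behaviour of expand, including the double-brace trick, can be approximated with `str.format` and `itertools.product`. This is only a sketch: `expand_sketch` is a hypothetical stand-in, not Snakemake's own function, and the values of `DATASETS` and `FORMATS` are invented for the example.

```python
from itertools import product

def expand_sketch(pattern, **wildcards):
    """Fill `pattern` with every combination of the given wildcard values."""
    names = list(wildcards)
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*(wildcards[n] for n in names))
    ]

DATASETS = ["d1", "d2"]   # hypothetical values
FORMATS = ["txt", "csv"]  # hypothetical values

# Every dataset/ext combination becomes one concrete path.
all_files = expand_sketch("{dataset}/a.{ext}", dataset=DATASETS, ext=FORMATS)
print(all_files)  # ['d1/a.txt', 'd1/a.csv', 'd2/a.txt', 'd2/a.csv']

# "{{dataset}}" is an escaped brace, so it survives as a literal "{dataset}".
kept = expand_sketch("{{dataset}}/a.{ext}", ext=FORMATS)
print(kept)  # ['{dataset}/a.txt', '{dataset}/a.csv']
```

The second call shows why the double braces work: only `ext` is filled in, and the result still contains `{dataset}` for Snakemake to resolve per job.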

# multiext: a simplified form of expand
rule plot:
    input:
        ...
    output:
        multiext("some/plot", ".pdf", ".svg", ".png")
    shell:
        ...
# Same as expand("some/plot{ext}", ext=[".pdf", ".svg", ".png"])

Threads and Resources

# attempt and --restart-times
# Used together, these let a memory-hungry job be resubmitted automatically, requesting more memory on each attempt
https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#resources
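
A minimal sketch of the idea, assuming a resource callable that takes `wildcards` and `attempt` (the signature described in the linked documentation). The base value of 4000 MB is an arbitrary assumption, and the loop only simulates what happens across retries.

```python
def get_mem_mb(wildcards, attempt):
    """Memory request in MB that grows linearly with the attempt number."""
    base_mb = 4000  # hypothetical starting request
    return base_mb * attempt

# Simulate the first three attempts of one job.
requests = [get_mem_mb(None, attempt) for attempt in (1, 2, 3)]
print(requests)  # [4000, 8000, 12000]
```

In a Snakefile this function would be referenced as `resources: mem_mb=get_mem_mb`, and the workflow started with `--restart-times` set so that a failed job is retried with the next attempt number.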

Messages

rule NAME:
    input: "path/to/inputfile", "path/to/other/inputfile"
    output: "path/to/outputfile", "path/to/another/outputfile"
    threads: 8
    message: "Executing somecommand with {threads} threads on the following files {input}."
    shell: "somecommand --threads {threads} {input} {output}"

Priorities

# A larger number means a higher priority; in my experience this is rarely needed
rule:
  input: ...
  output: ...
  priority: 50
  shell: ...

Log-Files

rule abc:
    input: "input.txt"
    output: "output.txt"
    log: "logs/abc.log"
    shell: "somecommand --log {log} {input} {output}"
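# A log file is created, but only if the command itself supports writing one (here via --log)
# Without a --log option, try redirecting standard output to the log file (not tested)
# Use wildcards to write one log file per job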
log: "logs/abc.{dataset}.log"
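
The redirection mentioned above would be written in the shell directive as something like `somecommand {input} {output} > {log} 2>&1`. The Python sketch below shows the same mechanism with `subprocess`; the child command is a stand-in for `somecommand`, and the temporary log path is an assumption for the example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

log_path = Path(tempfile.mkdtemp()) / "abc.log"

# Stand-in for "somecommand": writes one line to stdout and one to stderr.
cmd = [
    sys.executable,
    "-c",
    "import sys; print('progress'); print('warning', file=sys.stderr)",
]

# Equivalent of "> log 2>&1": both streams end up in the same file.
with open(log_path, "w") as log:
    subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT, check=True)

log_text = log_path.read_text()
print(log_text)
```

If the command writes its actual result to standard output, only standard error should go to the log (`2> {log}` in shell terms), otherwise the result would be mixed into the log file.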

parameters

rule:
    input:
        ...
    params:
        prefix="somedir/{sample}"
    output:
        "somedir/{sample}.csv"
    shell:
        "somecommand -o {params.prefix}"
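# Some tools do not take the output file itself but only part of its name (e.g. a prefix), or even a directory
# params is also the place to specify other command-line options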

rule:
    input:
        ...
    params:
        prefix=lambda wildcards, output: output[0][:-4]
    output:
        "somedir/{sample}.csv"
    shell:
        "somecommand -o {params.prefix}"
# Functions can be used as input or params
#wildcards has to be the first argument
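
What the lambda computes can be checked outside Snakemake. The sketch below calls it with a plain list standing in for the rule's `output` object, and also shows `os.path.splitext` as one possible alternative that does not depend on the extension being exactly four characters. The file name and the `prefix_of` helper are hypothetical.

```python
import os.path

# Same expression as in the rule: drop the last four characters (".csv").
strip_csv = lambda wildcards, output: output[0][:-4]

def prefix_of(wildcards, output):
    """Alternative: remove whatever the final extension is."""
    return os.path.splitext(output[0])[0]

output = ["somedir/sampleA.csv"]  # stand-in for the rule's output

print(strip_csv(None, output))  # somedir/sampleA
print(prefix_of(None, output))  # somedir/sampleA
```

Both functions keep `wildcards` as the first parameter, as the note above requires, even though neither uses it.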

python

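# Call an external Python script that reads the rule's properties from the snakemake object; the script probably needs to sit in the same directory as the Snakefile (not tested)
# This is not the same as running Python code inline with run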

# Example external Python script
def do_something(data_path, out_path, threads, myparam):
    ...  # Python code goes here

do_something(snakemake.input[0], snakemake.output[0], snakemake.threads, snakemake.config["myparam"])
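
To try such a script without running the workflow, one option is to fake the `snakemake` object. The sketch below does that with `types.SimpleNamespace`; the stand-in object, its paths and the `myparam` value are all assumptions, and the function body is a placeholder that only echoes its arguments.

```python
from types import SimpleNamespace

def do_something(data_path, out_path, threads, myparam):
    # Placeholder body: report what the script received.
    return {"in": data_path, "out": out_path, "threads": threads, "myparam": myparam}

# Stand-in only. In a real run Snakemake provides `snakemake` itself,
# so this assignment would not be part of the script.
snakemake = SimpleNamespace(
    input=["data/in.txt"],       # hypothetical path
    output=["results/out.txt"],  # hypothetical path
    threads=4,
    config={"myparam": 0.5},     # hypothetical config value
)

result = do_something(
    snakemake.input[0],
    snakemake.output[0],
    snakemake.threads,
    snakemake.config["myparam"],
)
print(result)
```

This keeps the script's logic testable on its own, while the attribute names (`input`, `output`, `threads`, `config`) match the ones used in the example above.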

R

https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#r-and-r-markdown

temp/protected

# This output is deleted once the rules that use it have finished
rule NAME:
    input:
        "path/to/inputfile"
    output:
        temp("path/to/outputfile")
    shell:
        "somecommand {input} {output}"
# This output is write-protected once it has been created
rule NAME:
    input:
        "path/to/inputfile"
    output:
        protected("path/to/outputfile")
    shell:
        "somecommand {input} {output}"

directory

# A directory as output; avoid this whenever possible
rule NAME:
    input:
        "path/to/inputfile"
    output:
        directory("path/to/outputdir")
    shell:
        "somecommand {input} {output}"

flag file

rule all:
    input: "mytask.done"

rule mytask:
    output: touch("mytask.done")
    shell: "mycommand ..."
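
The flag-file idea is simply "create an empty marker file only if the command succeeded". A rough Python equivalent, with `run_then_touch` as a hypothetical helper and trivial child processes standing in for `mycommand`:

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def run_then_touch(cmd, flag):
    """Run `cmd`; create the empty flag file only when it exits with status 0."""
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    flag.touch()

workdir = Path(tempfile.mkdtemp())

# Successful command: the flag file appears.
done_flag = workdir / "mytask.done"
run_then_touch([sys.executable, "-c", "pass"], done_flag)

# Failing command: no flag file is written.
failed_flag = workdir / "failed.done"
try:
    run_then_touch([sys.executable, "-c", "raise SystemExit(1)"], failed_flag)
except subprocess.CalledProcessError:
    pass

print(done_flag.exists(), failed_flag.exists())  # True False
```

This is useful for commands that produce no output file of their own, because downstream rules can then depend on the marker file.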

Job Properties

https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#job-properties

Functions as Input Files

https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#functions-as-input-files

config

The config file uses YAML format
https://www.runoob.com/w3cnote/yaml-intro.html
# snakemake --config overrides parameters that are defined in the config file
# Do not read input through sys.argv: it is split on whitespace and also contains Snakemake's own options (e.g. -s, -np, --config)
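
The override behaviour can be pictured as a dictionary merge in which command-line values win. The sketch below is deliberately simplified: `parse_overrides` is a hypothetical helper, the keys and values are invented, and Snakemake's real parsing and merging (for example of nested entries) is more involved.

```python
def parse_overrides(pairs):
    """Turn ["key=value", ...] into a dict, converting plain integers."""
    overrides = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        overrides[key] = int(value) if value.isdigit() else value
    return overrides

# Hypothetical content of the YAML config file, already loaded as a dict.
file_config = {"myparam": 10, "samples": "samples.tsv"}

# Hypothetical invocation: snakemake --config myparam=20
cli_overrides = parse_overrides(["myparam=20"])

# Later entries win, so the command-line value replaces the file value.
config = {**file_config, **cli_overrides}
print(config)  # {'myparam': 20, 'samples': 'samples.tsv'}
```

Inside the workflow, values should then be read from `config` rather than from `sys.argv`, for the reason given in the note above.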

Miscellaneous

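# A command that modifies its input file in place while also producing a new file
# causes problems when Snakemake resumes an interrupted run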

# Do not fall into the trap of defining variables in rule all and then running the workflow a second time
# http://www.xknote.com/ask/60f336aaf2eb8.html

# Read the input file size to set a parameter
https://stackoverflow.com/questions/50891407/snakemake-how-to-dynamically-set-memory-resource-based-on-input-file-size
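
One way to do this, sketched in plain Python with `os.path.getsize`: a factory returns a callable with the `(wildcards, input, attempt)` signature that resource functions can use. The helper name `make_mem_fn`, the factor of 2 and the floor value are assumptions chosen for illustration, not recommendations.

```python
import os
import tempfile

def make_mem_fn(factor, floor_mb):
    """Build a resource callable: memory = max(floor, factor * input size) * attempt."""
    def mem_mb(wildcards, input, attempt=1):
        size_mb = sum(os.path.getsize(f) for f in input) / (1024 * 1024)
        return int(max(floor_mb, factor * size_mb) * attempt)
    return mem_mb

# Create a 3 MiB dummy input file to demonstrate the calculation.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"\0" * (3 * 1024 * 1024))
    dummy_input = handle.name

small_floor = make_mem_fn(factor=2, floor_mb=1)
large_floor = make_mem_fn(factor=2, floor_mb=1000)

print(small_floor(None, [dummy_input]))             # 6
print(large_floor(None, [dummy_input]))             # 1000
print(large_floor(None, [dummy_input], attempt=2))  # 2000

os.remove(dummy_input)
```

In a rule, the returned function would be assigned as `resources: mem_mb=make_mem_fn(2, 1000)`, so that small inputs get the floor value and large inputs scale with file size.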