Common Workflow Language [Part 2]

2020-06-30, by 生信师姐

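6. Parameter References

How can a parameter value be reused in another location?

In the previous example, we extracted a file using the tar program. That example is very limited, however, because it assumes that the file we are interested in is called "hello.txt", and that name is hard-coded in the .cwl file. This is not a good approach, because the file name may vary or may depend on the input file used. To avoid this, we can specify the name of the file we want in the job parameter file (.yml). In this example, we will see how to dynamically reference the value of an input parameter from another field, which lets us specify the name of the file to extract.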

tar-param.cwl

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
baseCommand: [tar, --extract]
inputs:
  tarfile:
    type: File
    inputBinding:
      prefix: --file
  extractfile:
    type: string
    inputBinding:
      position: 1
outputs:
  extracted_file:
    type: File
    outputBinding:
      glob: $(inputs.extractfile)   # parameter reference to the extractfile input

tar-param-job.yml

tarfile:
  class: File
  path: hello.tar
extractfile: goodbye.txt

Create the tar archive, then invoke cwl-runner:

$ rm hello.tar || true && touch goodbye.txt && tar -cvf hello.tar goodbye.txt
$ cwl-runner tar-param.cwl tar-param-job.yml
[job tar-param.cwl] /tmp/tmpwH4ouT$ tar \
    --extract --file \
    /tmp/tmpREYiEt/stgd7764383-99c9-4848-af51-7c2d6e5527d9/hello.tar \
    goodbye.txt
[job tar-param.cwl] completed success
{
    "extracted_file": {
        "location": "file:///home/me/cwl/user_guide/goodbye.txt",
        "basename": "goodbye.txt",
        "class": "File",
        "checksum": "sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709",
        "size": 0,
        "path": "/home/me/cwl/user_guide/goodbye.txt"
    }
}
Final process status is success

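Some fields permit parameter references, which are enclosed in $(...). When the tool runs, each reference is evaluated and replaced with the value it refers to.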

outputs:
  extracted_file:
    type: File
    outputBinding:
      glob: $(inputs.extractfile)

References are written using a subset of JavaScript syntax. In this example, $(inputs.extractfile), $(inputs["extractfile"]), and $(inputs['extractfile']) are equivalent.
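As a small illustration of that equivalence, the fragment below declares three outputs whose glob fields use the three spellings. The output names by_dot, by_double_quotes, and by_single_quotes are made up for this sketch; all three references should resolve to the same extractfile value.

```yaml
# Sketch only: the three output names are hypothetical.
outputs:
  by_dot:
    type: File
    outputBinding:
      glob: $(inputs.extractfile)        # dot notation
  by_double_quotes:
    type: File
    outputBinding:
      glob: $(inputs["extractfile"])     # bracket notation, double quotes
  by_single_quotes:
    type: File
    outputBinding:
      glob: $(inputs['extractfile'])     # bracket notation, single quotes
```

In practice the dot form is usually the most readable; the bracket forms matter mainly when a field name is not a valid identifier.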

The value of the "inputs" variable is the input object that was provided when the CWL tool was invoked.

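Note: because File parameters are objects, you must reference the path field on the File object to get the path of an input file. To reference the path of the tar file in the example above, you would write $(inputs.tarfile.path).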
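To make the note concrete, here is a hedged variation on tar-param.cwl that passes the archive path through the arguments field instead of an inputBinding on tarfile. It is a sketch, not the guide's own example, and it assumes the same tar-param-job.yml job file as above.

```yaml
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
baseCommand: [tar, --extract]
arguments:
  - --file
  - $(inputs.tarfile.path)   # path field of the File object
inputs:
  tarfile:
    type: File               # no inputBinding: the path is supplied via arguments
  extractfile:
    type: string
    inputBinding:
      position: 1
outputs:
  extracted_file:
    type: File
    outputBinding:
      glob: $(inputs.extractfile)
```

Writing $(inputs.tarfile) without .path would reference the whole File object rather than its path string, which is why the note above matters.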

Where are parameter references allowed?

You can only use parameter references in certain fields. These are:

From CommandLineTool
    arguments
        valueFrom
    stdin
    stdout
    stderr
    From CommandInputParameter
        format
        secondaryFiles
        From inputBinding
            valueFrom
    From CommandOutputParameter
        format
        secondaryFiles
        From CommandOutputBinding
            glob
            outputEval

From Workflow
    From InputParameter and WorkflowOutputParameter
        format
        secondaryFiles
    From steps
        From WorkflowStepInput
            valueFrom

From ExpressionTool
    expression
    From InputParameter and ExpressionToolOutputParameter
        format
        secondaryFiles

From ResourceRequirement
    coresMin
    coresMax
    ramMin
    ramMax
    tmpdirMin
    tmpdirMax
    outdirMin
    outdirMax

From InitialWorkDirRequirement
    listing
        in Dirent
            entry
            entryname

From EnvVarRequirement
    From EnvironmentDef
        envValue

Summary

Some fields permit parameter references enclosed in $(...).
References are written using a subset of JavaScript syntax.

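7. Running Tools Inside Docker

Docker containers simplify software installation by providing a complete, known-good runtime for software and its dependencies. However, containers are also isolated from the host system, so running a tool inside a Docker container requires extra work to make the input files available inside the container and to recover the output files from it. A CWL runner can perform this work automatically, letting you use Docker to simplify software management while avoiding the complexity of invoking and managing Docker containers yourself.

One of the responsibilities of the CWL runner is to adjust the paths of input files to reflect the location where they appear inside the container.

This example runs a simple Node.js script inside a Docker container, which then prints "Hello World" to standard output.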

docker.cwl

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
baseCommand: node
hints:
  DockerRequirement:
    dockerPull: node:slim
inputs:
  src:
    type: File
    inputBinding:
      position: 1
outputs:
  example_out:
    type: stdout
stdout: output.txt

docker-job.yml

src:
  class: File
  path: hello.js

baseCommand: node
hints:
  DockerRequirement:
    dockerPull: node:slim

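baseCommand: node is the command to run, and in this tool it is executed inside a container. The hints section tells the CWL runner which container we want; in this case, DockerRequirement lists the requirements for the Docker container.

The value of the dockerPull: parameter is the same value you would pass to the docker pull command, that is, the name of the container image. You can also specify a tag, which is a good practice when using containers for reproducible research. In this case we have used a container called node:slim.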
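The fragment below sketches two optional variations on docker.cwl. First, it pins a specific image tag instead of the floating slim tag; node:18-slim is only an illustrative choice, so substitute whichever tag you have verified for your own work. Second, it lists DockerRequirement under requirements rather than hints, which in CWL turns it from a suggestion the runner may ignore into a condition the runner must satisfy.

```yaml
baseCommand: node
requirements:                    # must be satisfied, unlike hints
  DockerRequirement:
    dockerPull: node:18-slim     # assumed tag, for illustration only
```

Keeping the requirement under hints, as the example above does, lets the tool still run on a system without Docker, provided node is installed locally.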

$ echo "console.log(\"Hello World\");" > hello.js
$ cwl-runner docker.cwl docker-job.yml
[job docker.cwl] /tmp/tmpgugLND$ docker \
    run \
    -i \
    --volume=/tmp/tmpgugLND:/var/spool/cwl:rw \
    --volume=/tmp/tmpSs5JoN:/tmp:rw \
    --volume=/home/me/cwl/user_guide/hello.js:/var/lib/cwl/job369354770_examples/hello.js:ro \
    --workdir=/var/spool/cwl \
    --read-only=true \
    --user=1000 \
    --rm \
    --env=TMPDIR=/tmp \
    --env=HOME=/var/spool/cwl \
    node:slim \
    node \
    /var/lib/cwl/job369354770_examples/hello.js > /tmp/tmpgugLND/output.txt
[job docker.cwl] completed success
{
    "example_out": {
        "location": "file:///home/me/cwl/user_guide/output.txt",
        "basename": "output.txt",
        "class": "File",
        "checksum": "sha1$648a6a6ffffdaa0badb23b8baf90b6168dd16b3a",
        "size": 12,
        "path": "/home/me/cwl/user_guide/output.txt"
    }
}
Final process status is success
$ cat output.txt
Hello World

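Notice that the CWL runner has constructed a Docker command line to run the script.

In this example, the path to the script hello.js is /home/me/cwl/user_guide/hello.js outside the container but /var/lib/cwl/job369354770_examples/hello.js inside the container, and the inner path is the one passed to the node command.

Key Points

Containers can help simplify the management of a tool's software requirements.
Specify a Docker image for a tool with DockerRequirement in the hints section.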

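8. Additional Arguments and Parameters

How do I specify arguments that do not require an input value?
How do I reference runtime parameters?

Sometimes tools require additional command line options that do not correspond exactly to input parameters.

In this example, we will wrap the Java compiler to compile a Java source file into a class file. By default, "javac" creates the class files in the same directory as the source file. However, CWL input files (and the directories in which they appear) may be read-only, so we need to instruct "javac" to write the class file to the designated output directory instead.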

arguments.cwl

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
label: Example trivial wrapper for Java 9 compiler
hints:
  DockerRequirement:
    dockerPull: openjdk:9.0.1-11-slim
baseCommand: javac
arguments: ["-d", $(runtime.outdir)]
inputs:
  src:
    type: File
    inputBinding:
      position: 1
outputs:
  classfile:
    type: File
    outputBinding:
      glob: "*.class"

arguments-job.yml

src:
  class: File
  path: Hello.java

$ echo "public class Hello {}" > Hello.java
$ cwl-runner arguments.cwl arguments-job.yml
[job arguments.cwl] /tmp/tmpwYALo1$ docker \
 run \
 -i \
 --volume=/home/peter/work/common-workflow-language/v1.0/examples/Hello.java:/var/lib/cwl/stg8939ac04-7443-4990-a518-1855b2322141/Hello.java:ro \
 --volume=/tmp/tmpwYALo1:/var/spool/cwl:rw \
 --volume=/tmp/tmpptIAJ8:/tmp:rw \
 --workdir=/var/spool/cwl \
 --read-only=true \
 --user=1001 \
 --rm \
 --env=TMPDIR=/tmp \
 --env=HOME=/var/spool/cwl \
 openjdk:9.0.1-11-slim \
 javac \
 -d \
 /var/spool/cwl \
 /var/lib/cwl/stg8939ac04-7443-4990-a518-1855b2322141/Hello.java
Final process status is success
{
  "classfile": {
    "size": 416,
    "location": "/home/example/Hello.class",
    "checksum": "sha1$2f7ac33c1f3aac3f1fec7b936b6562422c85b38a",
    "class": "File"
  }
}

Here we use the arguments field to add an extra argument to the command line that is not tied to a specific input parameter.

arguments: ["-d", $(runtime.outdir)]

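This example references a runtime parameter. Runtime parameters provide information about the hardware or software environment at the time the tool is actually executed. The $(runtime.outdir) parameter is the path to the designated output directory. Other parameters include $(runtime.tmpdir), $(runtime.ram), $(runtime.cores), $(runtime.outdirSize), and $(runtime.tmpdirSize).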
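One way to inspect these values is a throwaway tool that simply echoes a few of them. The sketch below is not part of the original guide; the output name runtime_report and the file name runtime.txt are arbitrary, and the numbers reported will depend on the runner and the machine.

```yaml
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
baseCommand: echo
arguments:
  - "cores:"
  - $(runtime.cores)     # number of CPU cores reserved for the job
  - "ram:"
  - $(runtime.ram)       # amount of RAM reserved for the job
  - "tmpdir:"
  - $(runtime.tmpdir)    # path to the job's temporary directory
inputs: []               # this tool takes no inputs
outputs:
  runtime_report:        # hypothetical output name
    type: stdout
stdout: runtime.txt
```

Because the tool declares no inputs, it can be run without a job file.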

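Key Points

Use the arguments section to describe command line options that do not correspond exactly to input parameters.
Runtime parameters provide information about the environment when the tool is actually executed.
Runtime parameters are referenced under the runtime namespace.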

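9. Array Inputs

It is easy to add arrays of input parameters to the command line. There are two ways to specify an array parameter.

Provide type: array and items inside the type field, where items defines the valid data types that may appear in the array.
Append square brackets [] to the type name to indicate that the input parameter is an array of that type.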
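The two forms can be compared side by side in the fragment below. The parameter names names_long and names_short are hypothetical; both declare an array of strings.

```yaml
inputs:
  names_long:            # hypothetical name, long form
    type:
      type: array
      items: string
  names_short:           # hypothetical name, shorthand form
    type: string[]
```

The long form is needed when the array elements themselves carry an inputBinding, as filesB does in the example that follows.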

array-inputs.cwl

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
inputs:
  filesA:
    type: string[]
    inputBinding:
      prefix: -A
      position: 1

  filesB:
    type:
      type: array
      items: string
      inputBinding:
        prefix: -B=
        separate: false
    inputBinding:
      position: 2

  filesC:
    type: string[]
    inputBinding:
      prefix: -C=
      itemSeparator: ","
      separate: false
      position: 4

outputs:
  example_out:
    type: stdout
stdout: output.txt
baseCommand: echo

array-inputs-job.yml

filesA: [one, two, three]
filesB: [four, five, six]
filesC: [seven, eight, nine]

$ cwl-runner array-inputs.cwl array-inputs-job.yml
[job array-inputs.cwl] /home/examples$ echo \
    -A \
    one \
    two \
    three \
    -B=four \
    -B=five \
    -B=six \
    -C=seven,eight,nine > /home/examples/output.txt
[job array-inputs.cwl] completed success
{
    "example_out": {
        "location": "file:///home/examples/output.txt",
        "basename": "output.txt",
        "class": "File",
        "checksum": "sha1$91038e29452bc77dcd21edef90a15075f3071540",
        "size": 60,
        "path": "/home/examples/output.txt"
    }
}
Final process status is success
$ cat output.txt
-A one two three -B=four -B=five -B=six -C=seven,eight,nine

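The inputBinding can appear either on the outer array parameter definition or on the inner array element definition, and, as shown above, the two placements produce different behavior when the command line is constructed. In addition, if the itemSeparator field is provided, the array values are concatenated into a single argument separated by the item separator string.

Note: the input arrays are specified inside square brackets [] in array-inputs-job.yml. Arrays can also be expressed over multiple lines, where array values that are not defined with an associated key are marked by a leading -.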
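For comparison, the same job file can be written in the multi-line form. This is plain YAML and should be read by the runner exactly like the bracketed version above.

```yaml
filesA:
  - one
  - two
  - three
filesB:
  - four
  - five
  - six
filesC:
  - seven
  - eight
  - nine
```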

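Key Points

The definition of an array parameter is nested under the type field with type: array.
How array parameters appear on the command line depends on where the inputBinding field is provided in the description.
Use the itemSeparator field to control how array values are concatenated.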

10. Array Outputs

You can capture multiple output files into an array of files using glob.

array-outputs.cwl

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
baseCommand: touch
inputs:
  touchfiles:
    type:
      type: array
      items: string
    inputBinding:
      position: 1
outputs:
  output:
    type:
      type: array
      items: File
    outputBinding:
      glob: "*.txt"

array-outputs-job.yml

touchfiles:
  - foo.txt
  - bar.dat
  - baz.txt

$ cwl-runner array-outputs.cwl array-outputs-job.yml
[job 140190876078160] /home/example$ touch foo.txt bar.dat baz.txt
Final process status is success
{
  "output": [
    {
      "size": 0,
      "location": "foo.txt",
      "checksum": "sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709",
      "class": "File"
    },
    {
      "size": 0,
      "location": "baz.txt",
      "checksum": "sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709",
      "class": "File"
    }
  ]
}

The array of expected outputs is specified in array-outputs-job.yml with each entry marked by a leading -. This format can also be used in CWL descriptions to mark entries in arrays, as demonstrated in several of the upcoming sections. Note that bar.dat is created by touch but does not appear in the output array, because it does not match the "*.txt" glob.

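Key Points

Capture multiple output files into an array of files using glob wildcards.
Use wildcards and filenames to specify the output files that will be returned after the tool runs.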
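The glob field also accepts a list of patterns. As a hedged variation on array-outputs.cwl, the outputs section below should pick up bar.dat in addition to the two .txt files when run with the same job file.

```yaml
outputs:
  output:
    type:
      type: array
      items: File
    outputBinding:
      glob:
        - "*.txt"    # matches foo.txt and baz.txt
        - "*.dat"    # additionally matches bar.dat
```

The quotes around each pattern are needed because an unquoted leading * has a special meaning in YAML.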
