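Using Nextflow (Part 1)
These are my personal notes from learning Nextflow, written down to help my own understanding.
This post covers only basic scripting constructs, regular expressions, and file operations in Nextflow.
Part 2 of this series covers Processes.
Part 3 covers Channels.
Part 4 covers Channel operations (operators) and global configuration.
Before reading this post, install Nextflow by following the official site and run the official example, nextflow_example.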
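Basic concepts
1. Each program is wrapped in a process, and each data flow is a channel.
2. Pipelines can run on cluster schedulers such as SGE.
3. A nextflow.config file can be placed in the pipeline's main directory to hold global settings, for example: process.executor = 'sge'
4. Code conventions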
Nextflow scripting is an extension of the Groovy programming language, which in turn is a super-set of the Java programming language. Groovy can be considered as Python for Java in that it simplifies the writing of code and is more approachable.
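As a rough illustration of point 3 above, a nextflow.config placed next to the main script could look like the sketch below. The executor value comes from these notes; the queue name is a hypothetical placeholder that I made up for the example.

```groovy
// nextflow.config (sketch): settings applied to every process in the pipeline
process {
    executor = 'sge'    // submit each process as an SGE job
    queue = 'all.q'     // hypothetical queue name; replace with one from your cluster
}
```

The block form and the one-line form process.executor = 'sge' should be equivalent; the block is just easier to extend once there are several settings.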
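Writing Nextflow scripts
Basic types: variables, lists (arrays), maps (dictionaries)
Other constructs: multiple assignment ((a, b, c) = [10, 20, 'foo']), if/else, printing, closures (blocks of code that can be assigned to a variable and passed around like functions)
// list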
myList = [1776, -1, 33, 99, 0, 928734928763]
println myList.size()
// maps
scores = [ "Brett":100, "Pete":"Did not finish", "Andrew":86.87934 ]
println scores["Pete"]
scores["Pete"] = 3
// if else
x = Math.random()
if( x < 0.5 ) {
    println "You lost."
}
else {
    println "You won!"
}
// printing
println "he said 'cheese' once"
println 'he said "cheese!" again'
a = "world"
print "hello " + a + "\n"
// closures
square = { it * it }
[ 1, 2, 3, 4 ].collect(square)
// returns [1, 4, 9, 16]
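Multiple assignment is mentioned above without a snippet, and the closure example only uses the implicit parameter. The sketch below shows both in plain Groovy. The names add, squares and total are mine, chosen just for illustration.

```groovy
// Multiple assignment: unpack a list into several variables at once
def (a, b, c) = [10, 20, 'foo']
assert a == 10 && b == 20 && c == 'foo'

// A closure using the implicit parameter `it`
def square = { it * it }

// A closure with explicit, named parameters
def add = { x, y -> x + y }

def squares = [1, 2, 3, 4].collect(square)
def total = [1, 2, 3, 4].inject(0, add)

println squares   // prints [1, 4, 9, 16]
println total     // prints 10

// Closures can also iterate over a map, receiving key and value
def scores = [Brett: 100, Andrew: 86]
scores.each { name, score -> println "$name: $score" }
```

Named parameters are worth the extra typing as soon as a closure takes more than one argument, since `it` only exists for single-argument closures.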
Regular expressions
assert 'foobar' =~ /foo/ // passes: the pattern is found in the string
// replacement
x = "colour".replaceFirst(/ou/, "o")
println x // prints: color
y = "cheesecheese".replaceAll(/cheese/, "nice")
println y // prints: nicenice
// capture groups
programVersion = '2.7.3-beta'
m = programVersion =~ /(\d+)\.(\d+)\.(\d+)-?(.+)/
assert m[0] == ['2.7.3-beta', '2', '7', '3', 'beta']
assert m[0][1] == '2'
assert m[0][2] == '7'
assert m[0][3] == '3'
assert m[0][4] == 'beta'
programVersion = '2.7.3-beta'
(full, major, minor, patch, flavor) = (programVersion =~ /(\d+)\.(\d+)\.(\d+)-?(.+)/)[0]
println full // 2.7.3-beta
println major // 2
println minor // 7
println patch // 3
println flavor // beta
// remove the matched text
// define the regexp pattern
wordStartsWithGr = ~/(?i)\s+Gr\w+/
// apply and verify the result
assert ('Hello Groovy world!' - wordStartsWithGr) == 'Hello world!'
assert ('Hi Grails users' - wordStartsWithGr) == 'Hi users'
assert ('Remove first match of 5 letter word' - ~/\b\w{5}\b/) == 'Remove  match of 5 letter word'
assert ('Line contains 20 characters' - ~/\d+\s+/) == 'Line contains characters'
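One thing that tripped me up: =~ only checks whether the pattern occurs somewhere in the string. Groovy also has ==~, which is not used in the notes above and requires the whole string to match. A minimal sketch of the difference, with variable names of my own choosing:

```groovy
def version = '2.7.3-beta'

// =~ creates a java.util.regex.Matcher; find() reports whether the
// pattern occurs anywhere in the string
def found = (version =~ /\d+\.\d+/).find()

// ==~ is true only when the entire string matches the pattern
def exactPartial = (version ==~ /\d+\.\d+/)
def exactFull = (version ==~ /\d+\.\d+\.\d+-\w+/)

println found         // prints true
println exactPartial  // prints false
println exactFull     // prints true
```

So =~ is the one to use for searching and capturing, and ==~ is the safer choice for validating that an input has exactly the expected shape.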
Files and I/O
Options accepted by file(): glob, type (file, dir, any), hidden, maxDepth, followLinks, checkIfExists
listOfFiles = file('some/path/*.fa')
// include hidden files
listWithHidden = file('some/path/*.fa', hidden: true)
Writing files
// write to a file
myFile.text = 'Hello world!'
myFile.append('Add this line\n')
// or
myFile << 'Add a line more\n'
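To try the three write styles above outside a pipeline, here is a sketch in plain Groovy. It assumes a temporary java.io.File as a stand-in for the object returned by Nextflow's file(); the file name prefix is arbitrary.

```groovy
// A temporary java.io.File stands in for the object returned by file()
def myFile = File.createTempFile('nf_notes_', '.txt')
myFile.deleteOnExit()

myFile.text = 'Hello world!\n'      // replaces any existing content
myFile.append('Add this line\n')    // appends to the end
myFile << 'Add a line more\n'       // also appends

def lines = myFile.readLines()
println lines.size()   // prints 3
println lines[0]       // prints Hello world!
```

None of these calls adds a newline for you, which is why every string in the sketch ends in \n; without it the first two pieces would end up on the same line.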
Reading files
// read a binary file
binaryContent = myFile.bytes
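// or, in the other direction, write binary data
myFile.bytes = binaryBuffer
// Note: avoid reading large files this way, because the whole content is loaded into memory
Reading a file line by line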
myFile = file('some/my_file.txt')
allLines = myFile.readLines()
for( line : allLines ) {
println line
}
// the code above can be written more idiomatically as
file('some/my_file.txt')
.readLines()
.each { println it }
// for very large files prefer eachLine, because readLines loads the whole content into memory
count = 0
myFile.eachLine { str ->
println "line ${count++}: $str"
}
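The eachLine example above keeps its own counter. In plain Groovy, eachLine can also hand the line number to the closure if it declares a second parameter. The sketch below assumes a temporary java.io.File rather than a Nextflow file object, and the sample content is made up.

```groovy
def myFile = File.createTempFile('nf_lines_', '.txt')
myFile.deleteOnExit()
myFile.text = 'first\nsecond\nthird\n'

def numbered = []
// With two closure parameters, eachLine also passes the line number,
// which starts at 1 by default
myFile.eachLine { str, number ->
    numbered << "line ${number}: ${str}".toString()
}
numbered.each { println it }
```

Note that this numbering starts at 1, whereas the manual counter in the example above starts at 0.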
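Advanced file reading operations:
getText, getBytes, readLines, withReader, withInputStream, newReader, newInputStream
// newReader and newInputStream return a reader or stream object that gives finer control over reading; with withReader there is no need to call close
// newInputStream and withInputStream are the counterparts of the Reader methods for binary data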
myReader = myFile.newReader()
String line
while( line = myReader.readLine() ) {
    println line
}
myReader.close()
// or
myFile.withReader {
    String line
    while( line = it.readLine() ) {
        println line
    }
}
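A caveat with the loop condition used above: in Groovy an empty string counts as false, so while( line = reader.readLine() ) stops at the first blank line, not only at the end of the file. A sketch that compares against null instead, again assuming a temporary java.io.File in place of a Nextflow file object:

```groovy
def myFile = File.createTempFile('nf_reader_', '.txt')
myFile.deleteOnExit()
myFile.text = 'alpha\n\ngamma\n'   // note the empty second line

def seen = []
// withReader passes a BufferedReader to the closure and closes it afterwards
myFile.withReader { reader ->
    String line
    // comparing against null keeps empty lines from ending the loop early
    while( (line = reader.readLine()) != null ) {
        seen << line
    }
}
println seen.size()   // prints 3
```

For files without blank lines both loop forms behave the same, so this only matters for input where empty lines are possible.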
File writing operations
setText, setBytes, write, append, newWriter, newPrintWriter, newOutputStream, withWriter, withPrintWriter, withOutputStream
// list all files in a directory
myDir = file('any/path')
allFiles = myDir.list()
for( def file : allFiles ) {
println file
}
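Note: list and listFiles differ in what they return. list returns the file names as strings, whereas listFiles returns file objects, which give access to attributes such as size and last modified time.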
myDir.eachFile { item ->
if( item.isFile() ) {
println "${item.getName()} - size: ${item.size()}"
}
else if( item.isDirectory() ) {
println "${item.getName()} - DIR"
}
}
Similar methods: eachFile, eachDir, eachFileMatch, eachDirMatch, eachFileRecurse, eachDirRecurse
A related facility, Channel.fromPath, is covered later in this series.
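The eachFile listing above can be tried without a real data directory. The sketch below is plain Groovy and assumes a temporary directory in place of file('any/path'); the names a.txt and sub are made up. It uses length() from java.io.File where the Nextflow example uses size().

```groovy
import java.nio.file.Files

// A temporary directory stands in for file('any/path')
def myDir = Files.createTempDirectory('nf_dir_').toFile()
new File(myDir, 'a.txt').text = 'hello'
new File(myDir, 'sub').mkdir()

def report = []
myDir.eachFile { item ->
    if( item.isFile() ) {
        report << "${item.name} - size: ${item.length()}".toString()
    }
    else if( item.isDirectory() ) {
        report << "${item.name} - DIR".toString()
    }
}
// eachFile does not promise any particular order, so sort before printing
report.sort().each { println it }

// remove the temporary directory and its contents
myDir.deleteDir()
```

Collecting the lines into a list first, instead of printing inside the closure, makes the output order predictable.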
// create a directory
myDir.mkdirs()
// create a link; options include hard and overwrite
myFile = file('/some/path/file.txt')
myFile.mklink('/user/name/link-to-file.txt')
// copy
myFile.copyTo('new_name.txt')
// move
myDir = file('/any/dir_a')
myDir.moveTo('/any/dir_b')
// rename
myFile.renameTo('new_file_name.txt')
// delete a file
result = myFile.delete()
println result ? "OK" : "Cannot delete: $myFile"
Inspecting file attributes
The main attribute methods of a file object are: getName, getBaseName, getSimpleName, getExtension, getParent, size, exists, isEmpty, isFile, isDirectory, isHidden, lastModified
println "File ${myFile.getName()} size: ${myFile.size()}"
// get and set file permissions
permissions = myFile.getPermissions()
myFile.setPermissions('rwxr-xr-x')
myFile.setPermissions(7,5,5)

HTTP/FTP files
// example
pdb = file('http://files.rcsb.org/header/5FID.pdb')
println pdb.text
Counting
// countLines counts the number of lines
def sample = file('/data/sample.txt')
println sample.countLines()
// countFasta counts the number of FASTA records
def sample = file('/data/sample.fasta')
println sample.countFasta()
// countFastq counts the number of FASTQ records
def sample = file('/data/sample.fastq')
println sample.countFastq()