
parallel on Linux: powerful parallel processing

2019-07-05  Amy_Cui

Usage example

If you need to run jobs in parallel, install parallel and learn the basic example command below; the other options can wait.

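for f in /public/project/RNA/airway/raw_fq/*gz ; do echo "name=`basename $f .gz`; gunzip -c $f >~/\$name"; done | parallel -j 2

# Parallelize a loop: -j sets how many jobs run at the same time
# The loop only prints one shell command per file; parallel reads those commands from stdin and runs them, 2 at a time
# No need to split files by hand, or to manage the jobs yourself with shell loops, conditionals and the like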

Installation

Installing without admin rights, from the source tarball:

# https://www.gnu.org/software/
# https://www.gnu.org/manual/manual.html
# https://www.gnu.org/software/parallel/
# http://ftp.gnu.org/gnu/parallel/
wget -c https://ftp.gnu.org/gnu/parallel/parallel-latest.tar.bz2
tar -jxvf parallel-latest.tar.bz2
cd parallel-20190622    # the directory is named after the release you actually downloaded
cat README
mkdir $HOME/parallel
./configure --prefix=$HOME/parallel && make && make install
# $HOME/parallel/ is the custom install location

$HOME/parallel/bin/parallel --help

The build output (end of configure, then make and make install) looks like this:

checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether ln -s works... yes
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: creating src/Makefile
config.status: creating config.h
make  all-recursive
make[1]: Entering directory '/home/qmcui/parallel-20190622'
Making all in src
make[2]: Entering directory '/home/qmcui/parallel-20190622/src'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/qmcui/parallel-20190622/src'
make[2]: Entering directory '/home/qmcui/parallel-20190622'
make[2]: Leaving directory '/home/qmcui/parallel-20190622'
make[1]: Leaving directory '/home/qmcui/parallel-20190622'
Making install in src
make[1]: Entering directory '/home/qmcui/parallel-20190622/src'
make[2]: Entering directory '/home/qmcui/parallel-20190622/src'
 /bin/mkdir -p '/home/qmcui/parallel/bin'
 /usr/bin/install -c parallel sql niceload parcat parset env_parallel env_parallel.ash env_parallel.bash env_parallel.csh env_parallel.dash env_parallel.fish env_parallel.ksh env_parallel.mksh env_parallel.pdksh env_parallel.sh env_parallel.tcsh env_parallel.zsh '/home/qmcui/parallel/bin'
make  install-exec-hook
make[3]: Entering directory '/home/qmcui/parallel-20190622/src'
rm /home/qmcui/parallel/bin/sem || true
rm: cannot remove '/home/qmcui/parallel/bin/sem': No such file or directory
ln -s parallel /home/qmcui/parallel/bin/sem
make[3]: Leaving directory '/home/qmcui/parallel-20190622/src'
 /bin/mkdir -p '/home/qmcui/parallel/share/doc/parallel'
 /usr/bin/install -c -m 644 parallel.html env_parallel.html sem.html sql.html niceload.html parallel_tutorial.html parallel_book.html parallel_design.html parallel_alternatives.html parcat.html parset.html parallel.texi env_parallel.texi sem.texi sql.texi niceload.texi parallel_tutorial.texi parallel_book.texi parallel_design.texi parallel_alternatives.texi parcat.texi parset.texi parallel.pdf env_parallel.pdf sem.pdf sql.pdf niceload.pdf parallel_tutorial.pdf parallel_book.pdf parallel_design.pdf parallel_alternatives.pdf parcat.pdf parset.pdf parallel_cheat.pdf '/home/qmcui/parallel/share/doc/parallel'
 /bin/mkdir -p '/home/qmcui/parallel/share/man/man1'
 /usr/bin/install -c -m 644 parallel.1 env_parallel.1 sem.1 sql.1 niceload.1 parcat.1 parset.1 '/home/qmcui/parallel/share/man/man1'
 /bin/mkdir -p '/home/qmcui/parallel/share/man/man7'
 /usr/bin/install -c -m 644 parallel_tutorial.7 parallel_book.7 parallel_design.7 parallel_alternatives.7 '/home/qmcui/parallel/share/man/man7'
make[2]: Leaving directory '/home/qmcui/parallel-20190622/src'
make[1]: Leaving directory '/home/qmcui/parallel-20190622/src'
make[1]: Entering directory '/home/qmcui/parallel-20190622'
make[2]: Entering directory '/home/qmcui/parallel-20190622'
make[2]: Nothing to be done for 'install-exec-am'.
make[2]: Nothing to be done for 'install-data-am'.
make[2]: Leaving directory '/home/qmcui/parallel-20190622'
make[1]: Leaving directory '/home/qmcui/parallel-20190622'

parallel --help

Usage:

parallel [options] [command [arguments]] < list_of_arguments
parallel [options] [command [arguments]] (::: arguments|:::: argfile(s))...
cat ... | parallel --pipe [options] [command [arguments]]

-j n            Run n jobs in parallel
-k              Keep same order
-X              Multiple arguments with context replace
--colsep regexp Split input on regexp for positional replacements
{} {.} {/} {/.} {#} {%} {= perl code =} Replacement strings
{3} {3.} {3/} {3/.} {=3 perl code =}    Positional replacement strings
With --plus:    {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} = {..}.{+..} =
                {+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}

-S sshlogin     Example: foo@server.example.com
--slf ..        Use ~/.parallel/sshloginfile as the list of sshlogins
--trc {}.bar    Shorthand for --transfer --return {}.bar --cleanup
--onall         Run the given command with argument on all sshlogins
--nonall        Run the given command with no arguments on all sshlogins

--pipe          Split stdin (standard input) to multiple jobs.
--recend str    Record end separator for --pipe.
--recstart str  Record start separator for --pipe.

See 'man parallel' for details

Academic tradition requires you to cite works you base your article on.
If you use programs that use GNU Parallel to process data for an article in a
scientific publication, please cite:

  O. Tange (2018): GNU Parallel 2018, Mar 2018, ISBN 9781387509881,
  DOI https://doi.org/10.5281/zenodo.1146014

This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

Configuring the PATH environment variable

vim ~/.bashrc
# add this line to ~/.bashrc:
export PATH=$HOME/parallel/bin:$PATH
# then reload the file
. ~/.bashrc

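Removing parallel's citation notice

This step is optional. If you are not comfortable reading code, don't modify anything here; just skip it.

vim $HOME/parallel/bin/parallel     # or: vim parallel
# Remove the code that prints the notice, so the message does not appear on every run,
# then save the script.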

Examples

https://www.jianshu.com/p/cc54a72616a1

$ parallel echo ::: a b c d e | tee a.txt
a
b
c
d
e

$ parallel echo ::: A B C ::: D E F | tee b.txt
A D
A E
A F
B D
B E
B F
C D
C E
C F

$  parallel -a a.txt -a b.txt echo
a A D
a A E
a A F
a B D
a B E
a B F
a C D
a C E
a C F
......
e C E
e C F
# Same as: cat a.txt | parallel -a - -a b.txt echo
# "-" means standard input, a placeholder for the piped data
# Same as: cat a.txt | parallel echo :::: - b.txt
# Same as: parallel echo ::: a b c d e :::: b.txt

# GNU Parallel uses --no-run-if-empty to skip empty lines:
$ (echo 1; echo; echo 2) | parallel --no-run-if-empty echo
1
2
$ (echo 1; echo; echo 2) | parallel echo
1

2

Options explained

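Common options:
:::  followed by arguments
::::  followed by argument files
-j, --jobs  number of jobs to run in parallel
-N  number of arguments passed to each job
--xargs  put as many arguments as possible on one command line
--xapply  take one argument from each input source (or one line from each file) per job; newer releases also call this --link
--header  use the first value of each input source as that source's name
-m  put multiple arguments into one job without repeating the surrounding "context" text
-X  the opposite of -m: the context text around {} is repeated for every argument
-q  quote the command that follows, protecting it from the shell
--trim lr  strip spaces from both ends of each argument; only spaces are removed, newlines and tabs are not
--keep-order/-k  make the output order match the argument order
--tmpdir / --results  --tmpdir sets where parallel keeps its temporary (buffered output) files, while --results saves each job's output in a structured directory tree
--delay  delay the start of each job
--halt  stop running jobs, e.g. as soon as one of them fails
--pipe  split the input (stdin) into blocks, each handed to a separate job
--block  set the size of each block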

Learning resources
