Managing many command lines in parallel with just one script

2022-01-26  小汪Waud

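In upstream analysis, many samples usually have to be processed at the same time, and to save time we often write a simple script to run them.

Take an accessionlist like the one below, with only a few samples: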

SRR15927225
SRR15927226
SRR15927227
SRR15927228
SRR15927229
SRR15927230

We first generate an alignment script:

awk '{print "hisat2 -p 6 -x ../1_genome/ASM130575v1_genomic -1 ~/donkey_oocyte/3_cleandata/"$1"_1*.gz -2 ~/donkey_oocyte/3_cleandata/"$1"_2*.gz -S ./"$1".sam &"}' accessionlist > hisat2.sh
nohup bash hisat2.sh >hisat2.log 2>&1 &

The key is print "xxxx &": the last character inside the quotes is &. The resulting hisat2.sh looks like this:

hisat2 -p 6 -x ../1_genome/ASM130575v1_genomic -1 ~/donkey_oocyte/3_cleandata/SRR15927225_1*.gz -2 ~/donkey_oocyte/3_cleandata/SRR15927225_2*.gz -S ./SRR15927225.sam &
hisat2 -p 6 -x ../1_genome/ASM130575v1_genomic -1 ~/donkey_oocyte/3_cleandata/SRR15927226_1*.gz -2 ~/donkey_oocyte/3_cleandata/SRR15927226_2*.gz -S ./SRR15927226.sam &
hisat2 -p 6 -x ../1_genome/ASM130575v1_genomic -1 ~/donkey_oocyte/3_cleandata/SRR15927227_1*.gz -2 ~/donkey_oocyte/3_cleandata/SRR15927227_2*.gz -S ./SRR15927227.sam &
hisat2 -p 6 -x ../1_genome/ASM130575v1_genomic -1 ~/donkey_oocyte/3_cleandata/SRR15927228_1*.gz -2 ~/donkey_oocyte/3_cleandata/SRR15927228_2*.gz -S ./SRR15927228.sam &
hisat2 -p 6 -x ../1_genome/ASM130575v1_genomic -1 ~/donkey_oocyte/3_cleandata/SRR15927229_1*.gz -2 ~/donkey_oocyte/3_cleandata/SRR15927229_2*.gz -S ./SRR15927229.sam &
hisat2 -p 6 -x ../1_genome/ASM130575v1_genomic -1 ~/donkey_oocyte/3_cleandata/SRR15927230_1*.gz -2 ~/donkey_oocyte/3_cleandata/SRR15927230_2*.gz -S ./SRR15927230.sam &

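Every command ends with "&", which sends it to the background, so all six commands start at the same time. In this case the script occupies up to 6*6=36 threads.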
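The effect of the trailing "&" can be seen with a harmless stand-in. The sketch below assumes that `sleep 1` plays the role of the real hisat2 call and that three made-up sample names (A, B, C) replace the accession list; the final `wait` line is added only for this demo so that the elapsed time can be measured.

```shell
# Stand-in demo: "sleep 1" replaces the real hisat2 call (assumption for illustration).
demo=$(mktemp)
printf 'A\nB\nC\n' | awk '{print "sleep 1 && echo finished_"$1" &"}' > "$demo"
echo "wait" >> "$demo"          # keep the script alive until every background job ends

start=$(date +%s)
out=$(bash "$demo")             # runs the three generated lines
elapsed=$(( $(date +%s) - start ))

n_finished=$(printf '%s\n' "$out" | grep -c finished_)
echo "jobs finished: $n_finished, elapsed: ${elapsed}s"
rm -f "$demo"
```

Because the three jobs run side by side, the elapsed time stays close to one second instead of three.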

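I kept using this method until I ran into 34 samples, paired-end, 68 files in total, and it fell apart.

Once there are too many samples, server load has to be taken into account.

On the 96-thread server I use, even if I had it all to myself (which is rarely the case), I would still have to do the arithmetic: if every file got its own command and all 68 were launched at once as above, each command could use at most 1 thread (2×68>96). And if the server failed or crashed midway, every job would be lost at once. What can be done?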
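The thread budget can be worked out with shell integer arithmetic. This is a small sketch using the numbers quoted in the text (96 threads, 68 commands) together with the 4 queues and `-p 6` that appear later in the article; the variable names are made up for illustration.

```shell
total_threads=96    # server size quoted in the text
n_commands=68       # one command per file, all started at once
threads_each=$(( total_threads / n_commands ))   # integer division
echo "all at once: $threads_each thread(s) per command"

n_shares=4          # number of parallel queues used later in the article
threads_per_cmd=6   # the -p value in the hisat2 commands
peak=$(( n_shares * threads_per_cmd ))
echo "$n_shares queues x $threads_per_cmd threads: $peak threads at peak"
```

Limiting the number of commands that run at the same time, rather than the threads per command, keeps the peak load predictable.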

The handy tool: submit.sh

So I asked Teacher Zeng (曾老师) about this problem and got a neat solution: submit.sh. The code is as follows

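# Usage: bash submit.sh <command_file> <n_shares> <share_index>
# Runs each line of <command_file> whose zero-based index i satisfies i % n_shares == share_index.
i=0
while IFS= read -r id || [ -n "$id" ]    # also handles a last line without a trailing newline
do
    if ((i % $2 == $3))
    then
        eval "$id"    # eval so that ~, pipes and redirections inside the line are honoured
    fi
    i=$((i+1))
done < "$1"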

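To run everything, execute the following command. The & before the closing parenthesis sends each share to the background; without it the four shares would run one after another.

for i in {0..3};do (nohup bash submit.sh script2.sh 4 $i 2>&1 &);done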

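Code walkthrough

Let me walk through these two commands.

In a Linux shell script, $1, $2 and $3 stand for the arguments passed to the script at the corresponding positions.

In the script above, $1 is the first argument given to submit.sh, i.e. the second script (script2.sh); likewise $2 is 4 and $3 is $i.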
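Positional parameters can be tried out without touching the real scripts. In this sketch, `show_args` is a hypothetical function; inside a function, $1, $2 and $3 behave the same way as inside a script called with arguments.

```shell
# show_args is a made-up helper that only reports what it received.
show_args() {
    echo "file=$1 shares=$2 remainder=$3"
}

result=$(show_args script2.sh 4 0)
echo "$result"
```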

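Here we take a hisat2.sh (playing the role of script2.sh) that contains 34 command lines as the example. Its contents are as follows (note that the commands do not end with "&"):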

hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_001_1*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_001_1*_2.fq.gz -S ./TR_5445_001_1.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_001_2*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_001_2*_2.fq.gz -S ./TR_5445_001_2.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_001_3*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_001_3*_2.fq.gz -S ./TR_5445_001_3.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_002_1*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_002_1*_2.fq.gz -S ./TR_5445_002_1.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_002_2*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_002_2*_2.fq.gz -S ./TR_5445_002_2.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_002_3*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_002_3*_2.fq.gz -S ./TR_5445_002_3.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_003_1*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_003_1*_2.fq.gz -S ./TR_5445_003_1.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_003_2*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_003_2*_2.fq.gz -S ./TR_5445_003_2.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_003_3*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_003_3*_2.fq.gz -S ./TR_5445_003_3.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_004_1*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_004_1*_2.fq.gz -S ./TR_5445_004_1.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_004_2*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_004_2*_2.fq.gz -S ./TR_5445_004_2.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_004_3*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_004_3*_2.fq.gz -S ./TR_5445_004_3.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_005_1*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_005_1*_2.fq.gz -S ./TR_5445_005_1.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_005_2*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_005_2*_2.fq.gz -S ./TR_5445_005_2.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_005_3*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_005_3*_2.fq.gz -S ./TR_5445_005_3.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_006_1*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_006_1*_2.fq.gz -S ./TR_5445_006_1.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_006_2*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_006_2*_2.fq.gz -S ./TR_5445_006_2.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_006_3*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_006_3*_2.fq.gz -S ./TR_5445_006_3.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_007_1*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_007_1*_2.fq.gz -S ./TR_5445_007_1.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_007_2*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_007_2*_2.fq.gz -S ./TR_5445_007_2.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_007_3*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_007_3*_2.fq.gz -S ./TR_5445_007_3.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_008_1*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_008_1*_2.fq.gz -S ./TR_5445_008_1.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_008_2*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_008_2*_2.fq.gz -S ./TR_5445_008_2.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_009_1*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_009_1*_2.fq.gz -S ./TR_5445_009_1.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_009_2*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_009_2*_2.fq.gz -S ./TR_5445_009_2.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_010_1*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_010_1*_2.fq.gz -S ./TR_5445_010_1.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_010_2*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_010_2*_2.fq.gz -S ./TR_5445_010_2.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_011_1*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_011_1*_2.fq.gz -S ./TR_5445_011_1.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_011_2*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_011_2*_2.fq.gz -S ./TR_5445_011_2.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_012_1*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_012_1*_2.fq.gz -S ./TR_5445_012_1.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_012_2*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_012_2*_2.fq.gz -S ./TR_5445_012_2.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_013_1*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_013_1*_2.fq.gz -S ./TR_5445_013_1.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_013_2*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_013_2*_2.fq.gz -S ./TR_5445_013_2.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_013_3*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_013_3*_2.fq.gz -S ./TR_5445_013_3.sam

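To make this easier to follow, we call the i of the for ... in loop "loop i" and the i inside submit.sh "line i".

Note that the counter i in submit.sh starts at 0, so the first command in the file counts as line 0, whereas cat -n numbers lines from 1. Below, "line X" always refers to the zero-based count used by submit.sh.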

https://files.mdnice.com/user/13938/c280c53d-36e3-4d1e-915a-7931b47f0073.png

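submit.sh reads hisat2.sh one line at a time. For each line it computes the remainder of [line i] divided by 4; if the remainder equals [loop i], the command on that line is executed, and the if statement ends only once that command has finished. Then [line i] is increased by 1 and the next line is read, until the file is exhausted.

When [loop i] = 0 in the for ... in loop: [line i] starts at 0, 0%4=0=[loop i], so line 0 is executed; [line i]=0+1=1, 1%4=1≠[loop i], so line 1 is skipped; [line i]=1+1=2, 2%4=2≠[loop i], so line 2 is skipped; [line i]=2+1=3, 3%4=3≠[loop i], so line 3 is skipped; [line i]=3+1=4, 4%4=0=[loop i], so line 4 is executed...

Therefore, when [loop i] = 0, lines 0, 4, 8, ... 4n are executed; when [loop i] = 1, lines 1, 5, 9, ... 4n+1; when [loop i] = 2, lines 2, 6, 10, ... 4n+2; when [loop i] = 3, lines 3, 7, 11, ... 4n+3.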
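The selection rule can be checked without running any alignment. In the sketch below, `which_lines` is a hypothetical helper (not part of submit.sh) that prints the zero-based line numbers a given share would execute for a file of 34 lines, using the same remainder test.

```shell
# which_lines N K TOTAL: list the zero-based indices i < TOTAL with i % N == K.
which_lines() {
    n=$1; k=$2; total=$3
    i=0; picked=""
    while [ "$i" -lt "$total" ]; do
        if [ $(( i % n )) -eq "$k" ]; then
            picked="$picked$i "
        fi
        i=$(( i + 1 ))
    done
    echo "${picked% }"    # drop the trailing space
}

share0=$(which_lines 4 0 34)
share1=$(which_lines 4 1 34)
echo "loop i = 0 runs lines: $share0"
echo "loop i = 1 runs lines: $share1"
```

With 34 lines split four ways, shares 0 and 1 get nine commands each and shares 2 and 3 get eight.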

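In this way all the commands are split into four shares. Each of the four submit.sh invocations has to run in the background (an & at the end of the command inside the parentheses) for the shares to run at the same time; within each share, the next command starts only after the previous one has finished.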
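The whole scheme can be rehearsed end to end with harmless commands. This is a minimal sketch under stated assumptions: eight `echo` lines stand in for the hisat2 commands, and `run_share` is a hypothetical function that repeats the selection logic of submit.sh so that the example is self-contained.

```shell
workdir=$(mktemp -d)
cmds="$workdir/cmds.sh"

# Eight stand-in commands, one per line; each appends its own number to done.txt.
n=0
while [ "$n" -lt 8 ]; do
    echo "echo $n >> $workdir/done.txt" >> "$cmds"
    n=$(( n + 1 ))
done

# run_share FILE N K: run the lines of FILE whose zero-based index i has i % N == K.
run_share() {
    i=0
    while IFS= read -r line; do
        if [ $(( i % $2 )) -eq "$3" ]; then
            eval "$line"
        fi
        i=$(( i + 1 ))
    done < "$1"
}

for k in 0 1 2 3; do
    run_share "$cmds" 4 "$k" &     # '&' puts each share in the background
done
wait                               # block until all four shares have finished

total_ran=$(grep -c . "$workdir/done.txt")
distinct_ran=$(sort -nu "$workdir/done.txt" | grep -c .)
echo "commands run: $total_ran, distinct: $distinct_ran"
```

Equal totals show that every command ran exactly once, which is the property the remainder test is meant to guarantee.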

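The Linux system can be pictured as the ticket office of a scenic spot, with every command line being one visitor. Because of the crowds, the site opens several entrances; everyone queues by a fixed rule, and a visitor can pass only after the person in front has gone through.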

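How to use and modify it

If the code above still does not make sense, that is fine too: nobody needs to know how a refrigerator works in order to use one.

When running a script that contains many command lines, only the parts inside the red box in the figure below need to be changed.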

https://files.mdnice.com/user/13938/e42b47b6-a495-473e-bca3-ab1590f94606.png

In the same way, to split the commands into 10 parallel shares, all that is needed is

for i in {0..9};do (nohup bash submit.sh script2.sh 10 $i 2>&1 &);done
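When the number of shares is kept in a variable, brace expansion such as {0..9} cannot be used, because bash expands braces before variables. The sketch below builds the launch commands with a counting loop instead; `n_shares`, `script` and `launch_plan` are illustrative names, and the commands are only printed here so the plan can be inspected before anything is started.

```shell
n_shares=10          # illustrative value; choose what the server can take
script=script2.sh    # file with one command per line
k=0
launch_plan=""
while [ "$k" -lt "$n_shares" ]; do
    # Collected as text rather than executed (dry run).
    launch_plan="${launch_plan}nohup bash submit.sh $script $n_shares $k 2>&1 &
"
    k=$(( k + 1 ))
done
printf '%s' "$launch_plan"
```

The second and third arguments stay consistent this way: the remainders always run from 0 to n_shares-1.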

WeChat official account: 小汪Waud
