Managing many commands in parallel takes just one script
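In upstream analysis, many samples usually have to be processed at the same time, and to save time we often write a simple script to run them together.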
For example, take the following accessionlist, which has only a few samples:
```text
SRR15927225
SRR15927226
SRR15927227
SRR15927228
SRR15927229
SRR15927230
```
First we generate an alignment script and run it:
```shell
awk '{print "hisat2 -p 6 -x ../1_genome/ASM130575v1_genomic -1 ~/donkey_oocyte/3_cleandata/"$1"_1*.gz -2 ~/donkey_oocyte/3_cleandata/"$1"_2*.gz -S ./"$1".sam &"}' accessionlist > hisat2.sh
nohup bash hisat2.sh >hisat2.log 2>&1 &
```
The key is print "xxxx &": the quoted string ends with &. Here is the generated hisat2.sh:
```shell
hisat2 -p 6 -x ../1_genome/ASM130575v1_genomic -1 ~/donkey_oocyte/3_cleandata/SRR15927225_1*.gz -2 ~/donkey_oocyte/3_cleandata/SRR15927225_2*.gz -S ./SRR15927225.sam &
hisat2 -p 6 -x ../1_genome/ASM130575v1_genomic -1 ~/donkey_oocyte/3_cleandata/SRR15927226_1*.gz -2 ~/donkey_oocyte/3_cleandata/SRR15927226_2*.gz -S ./SRR15927226.sam &
hisat2 -p 6 -x ../1_genome/ASM130575v1_genomic -1 ~/donkey_oocyte/3_cleandata/SRR15927227_1*.gz -2 ~/donkey_oocyte/3_cleandata/SRR15927227_2*.gz -S ./SRR15927227.sam &
hisat2 -p 6 -x ../1_genome/ASM130575v1_genomic -1 ~/donkey_oocyte/3_cleandata/SRR15927228_1*.gz -2 ~/donkey_oocyte/3_cleandata/SRR15927228_2*.gz -S ./SRR15927228.sam &
hisat2 -p 6 -x ../1_genome/ASM130575v1_genomic -1 ~/donkey_oocyte/3_cleandata/SRR15927229_1*.gz -2 ~/donkey_oocyte/3_cleandata/SRR15927229_2*.gz -S ./SRR15927229.sam &
hisat2 -p 6 -x ../1_genome/ASM130575v1_genomic -1 ~/donkey_oocyte/3_cleandata/SRR15927230_1*.gz -2 ~/donkey_oocyte/3_cleandata/SRR15927230_2*.gz -S ./SRR15927230.sam &
```
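Every line ends with "&", which sends that command to the background, so all six commands are launched at the same time. Since each one uses -p 6, the script can occupy up to 6 × 6 = 36 threads.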
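To check the thread arithmetic without running hisat2, the sketch below regenerates hisat2.sh from the same six accessions with the awk one-liner above, inside a temporary directory, then counts the lines ending in `&` and multiplies by the `-p 6` value. Nothing is aligned; the temporary directory and the variable names are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# Dry run: build hisat2.sh exactly as above, but only inspect it (hisat2 is never executed)
tmp=$(mktemp -d)
printf '%s\n' SRR15927225 SRR15927226 SRR15927227 \
              SRR15927228 SRR15927229 SRR15927230 > "$tmp/accessionlist"

awk '{print "hisat2 -p 6 -x ../1_genome/ASM130575v1_genomic -1 ~/donkey_oocyte/3_cleandata/"$1"_1*.gz -2 ~/donkey_oocyte/3_cleandata/"$1"_2*.gz -S ./"$1".sam &"}' \
    "$tmp/accessionlist" > "$tmp/hisat2.sh"

# Every line ending in "&" is started in the background, i.e. all of them at once
jobs_at_once=$(grep -c '&$' "$tmp/hisat2.sh")
threads_per_job=6                         # the value of -p in each command
peak_threads=$(( jobs_at_once * threads_per_job ))
echo "$jobs_at_once jobs x $threads_per_job threads = $peak_threads threads"

rm -rf "$tmp"
```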
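I kept doing it this way until I ran into 34 samples: paired-end sequencing, 68 files in total. That broke me.
Once there are that many samples, I have to think about how much of the server I am occupying.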
![](https://img.haomeiwen.com/i25976702/c85623f818e32e19.png)
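On the 96-thread server I use, even if I had it all to myself (which is rarely the case), I still have to do the math: written the way above, the script would start all 34 alignments (one per sample, two files each) at once, so each command could get at most 2 threads (34 × 3 = 102 > 96), while -p 6 would ask for 34 × 6 = 204 threads. And if the server ran into a fault or crashed midway, every run would fail together. What should I do?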
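The budget is plain integer division: available threads divided by the number of jobs running at the same moment. A minimal sketch, where `threads_per_job` is a hypothetical helper name and 96 is the thread count of the server described above:

```shell
#!/usr/bin/env bash
# threads_per_job TOTAL_THREADS CONCURRENT_JOBS
# Largest per-job thread count that keeps CONCURRENT_JOBS within TOTAL_THREADS.
threads_per_job() {
    echo $(( $1 / $2 ))        # bash integer division rounds down
}

all_at_once=$(threads_per_job 96 34)   # 34 hisat2 commands started together with "&"
four_groups=$(threads_per_job 96 4)    # only 4 commands running at any moment
echo "34 concurrent jobs: at most $all_at_once threads each"
echo "4 concurrent jobs:  at most $four_groups threads each"
```

With only four jobs in flight, the -p 6 setting (24 threads in total) fits comfortably.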
The magic tool: submit.sh
So I asked Teacher Zeng about this problem and got a perfect solution: the magic tool submit.sh. The code is as follows:
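```shell
#!/bin/bash
# Usage: bash submit.sh <command_file> <n_groups> <group_index>
# Runs, one after another, every line whose 0-based index i satisfies i % n_groups == group_index
i=0
while IFS= read -r id || [ -n "$id" ]   # -r keeps backslashes; the || part keeps a last line without newline
do
    if ((i%$2==$3))
    then
        eval "$id" < /dev/null   # eval also expands ~, quotes and redirections; /dev/null keeps the command off the loop's input
    fi
    i=$((i+1))
done < "$1"
```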
Then, to run the commands in the script, execute:
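```shell
# The trailing & (inside the parentheses) starts each group in the background, so the 4 groups run at the same time
for i in {0..3};do (nohup bash submit.sh script2.sh 4 $i > submit_$i.log 2>&1 &);done
```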
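Before pointing submit.sh at real alignments, it can be tried on harmless commands. The sketch below is a dry run under assumed names (`demo_cmds.txt`, `done.log` in a temporary directory): it writes ten `echo` commands, saves a copy of submit.sh that reads with `read -r` and runs each line with `eval`, starts four groups in the background, waits for all of them with `wait`, and checks that every command ran exactly once. `nohup` is left out only because the demo finishes in a second.

```shell
#!/usr/bin/env bash
# Dry run of submit.sh: 10 harmless commands split into 4 background groups
tmp=$(mktemp -d)

# Copy of submit.sh: run every line whose 0-based index i satisfies i % N == K
cat > "$tmp/submit.sh" <<'EOF'
i=0
while IFS= read -r id || [ -n "$id" ]
do
    if (( i % $2 == $3 ))
    then
        eval "$id" < /dev/null
    fi
    i=$((i+1))
done < "$1"
EOF

# Command file: each line appends its own name to done.log
for n in $(seq 0 9); do
    echo "echo cmd$n >> $tmp/done.log"
done > "$tmp/demo_cmds.txt"

# Start the 4 groups in the background ("&"), then block until all have finished
for i in {0..3}; do
    bash "$tmp/submit.sh" "$tmp/demo_cmds.txt" 4 "$i" > "$tmp/submit_$i.log" 2>&1 &
done
wait

total=$(( $(wc -l < "$tmp/done.log") ))
unique=$(( $(sort -u "$tmp/done.log" | wc -l) ))
echo "commands run: $total (distinct: $unique)"
rm -rf "$tmp"
```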
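Code walkthrough
Next, let me walk you through these two commands.
In a Linux shell script, $1, $2 and $3 stand for the arguments passed to the script in the corresponding positions.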
![](https://img.haomeiwen.com/i25976702/6b5f8952c2d7d4d2.png)
In the command above, $1 is the first argument given to submit.sh, i.e. the second script (script2.sh); likewise, $2 is 4 and $3 is $i.
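A quick way to see which value lands in which positional parameter is to call a throwaway script (`show_args.sh`, a hypothetical name) with the same arguments the for loop passes to submit.sh:

```shell
#!/usr/bin/env bash
# Throwaway script that only reports its positional parameters
tmp=$(mktemp -d)
cat > "$tmp/show_args.sh" <<'EOF'
echo "\$1=$1 \$2=$2 \$3=$3"
EOF

# Same argument pattern as: bash submit.sh script2.sh 4 $i
for i in {0..3}; do
    bash "$tmp/show_args.sh" script2.sh 4 "$i"
done
first=$(bash "$tmp/show_args.sh" script2.sh 4 0)
rm -rf "$tmp"
```

The first call prints `$1=script2.sh $2=4 $3=0`; only `$3` changes from one iteration to the next.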
Here we use a hisat2.sh containing 34 commands as script2.sh. Its contents are as follows (note that these commands do not end with "&"):
```shell
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_001_1*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_001_1*_2.fq.gz -S ./TR_5445_001_1.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_001_2*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_001_2*_2.fq.gz -S ./TR_5445_001_2.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_001_3*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_001_3*_2.fq.gz -S ./TR_5445_001_3.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_002_1*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_002_1*_2.fq.gz -S ./TR_5445_002_1.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_002_2*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_002_2*_2.fq.gz -S ./TR_5445_002_2.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_002_3*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_002_3*_2.fq.gz -S ./TR_5445_002_3.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_003_1*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_003_1*_2.fq.gz -S ./TR_5445_003_1.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_003_2*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_003_2*_2.fq.gz -S ./TR_5445_003_2.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_003_3*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_003_3*_2.fq.gz -S ./TR_5445_003_3.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_004_1*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_004_1*_2.fq.gz -S ./TR_5445_004_1.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_004_2*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_004_2*_2.fq.gz -S ./TR_5445_004_2.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_004_3*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_004_3*_2.fq.gz -S ./TR_5445_004_3.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_005_1*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_005_1*_2.fq.gz -S ./TR_5445_005_1.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_005_2*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_005_2*_2.fq.gz -S ./TR_5445_005_2.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_005_3*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_005_3*_2.fq.gz -S ./TR_5445_005_3.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_006_1*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_006_1*_2.fq.gz -S ./TR_5445_006_1.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_006_2*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_006_2*_2.fq.gz -S ./TR_5445_006_2.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_006_3*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_006_3*_2.fq.gz -S ./TR_5445_006_3.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_007_1*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_007_1*_2.fq.gz -S ./TR_5445_007_1.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_007_2*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_007_2*_2.fq.gz -S ./TR_5445_007_2.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_007_3*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_007_3*_2.fq.gz -S ./TR_5445_007_3.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_008_1*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_008_1*_2.fq.gz -S ./TR_5445_008_1.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_008_2*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_008_2*_2.fq.gz -S ./TR_5445_008_2.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_009_1*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_009_1*_2.fq.gz -S ./TR_5445_009_1.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_009_2*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_009_2*_2.fq.gz -S ./TR_5445_009_2.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_010_1*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_010_1*_2.fq.gz -S ./TR_5445_010_1.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_010_2*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_010_2*_2.fq.gz -S ./TR_5445_010_2.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_011_1*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_011_1*_2.fq.gz -S ./TR_5445_011_1.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_011_2*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_011_2*_2.fq.gz -S ./TR_5445_011_2.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_012_1*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_012_1*_2.fq.gz -S ./TR_5445_012_1.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_012_2*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_012_2*_2.fq.gz -S ./TR_5445_012_2.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_013_1*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_013_1*_2.fq.gz -S ./TR_5445_013_1.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_013_2*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_013_2*_2.fq.gz -S ./TR_5445_013_2.sam
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome -1 /home/xiaowang/proj1115/3_trim/TR_5445_013_3*_1.fq.gz -2 /home/xiaowang/proj1115/3_trim/TR_5445_013_3*_2.fq.gz -S ./TR_5445_013_3.sam
```
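To keep things easy to follow, we call the i of the for loop "loop i" and the i in submit.sh "line i".
Note that line i counts from 0: the first line of the file is checked with line i = 0, the second with line i = 1, and so on, whereas cat -n numbers lines starting from 1. In what follows, "line X" always means the value of line i, i.e. counted from 0.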
![](https://img.haomeiwen.com/i25976702/1bd837ed09b1e46f.png)
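Even without an explicit `i=0`, the counter would start at 0: a variable that was never set evaluates to 0 in bash arithmetic. The sketch below counts three made-up lines the way submit.sh does (check with the current value, then add 1) and prints `cat -n`'s 1-based numbering for comparison.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
printf 'cmd_a\ncmd_b\ncmd_c\n' > "$tmp/cmds.txt"   # three made-up lines

unset i
start=$(( i ))                      # a never-set variable evaluates to 0
echo "value of an unset i: $start"

# Count lines the way submit.sh does: use the current i, then add 1
while IFS= read -r line; do
    echo "line i = $(( i )): $line"
    i=$(( i + 1 ))
done < "$tmp/cmds.txt"

cat -n "$tmp/cmds.txt"              # same lines, numbered 1, 2, 3
rm -rf "$tmp"
```

The loop prints indices 0, 1, 2 while `cat -n` shows 1, 2, 3 for the same three lines.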
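In the background, each submit.sh process reads hisat2.sh one line at a time. For every line it computes the remainder of line i divided by 4; if the remainder equals loop i, it runs that line's command and waits for it to finish, which ends the if statement. Either way, line i then becomes line i + 1 and the next line is read, until the end of the file.
When loop i = 0: line i starts at 0, and 0 % 4 = 0 = loop i, so line 0 runs; then line i = 0 + 1 = 1, 1 % 4 = 1 ≠ loop i, so line 1 is skipped; line i = 1 + 1 = 2, 2 % 4 = 2 ≠ loop i, so line 2 is skipped; line i = 2 + 1 = 3, 3 % 4 = 3 ≠ loop i, so line 3 is skipped; line i = 3 + 1 = 4, 4 % 4 = 0 = loop i, so line 4 runs, and so on.
Therefore, when loop i = 0, lines 0, 4, 8, 12, ..., 4n are run; when loop i = 1, lines 1, 5, 9, ..., 4n+1; when loop i = 2, lines 2, 6, 10, ..., 4n+2; and when loop i = 3, lines 3, 7, 11, ..., 4n+3.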
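The same modulo rule can be evaluated without running anything. This sketch lists, for the 34-line hisat2.sh, which line indices each group takes; `group_lines` is a hypothetical helper, not part of submit.sh.

```shell
#!/usr/bin/env bash
# group_lines TOTAL N K: print the 0-based indices i (0 <= i < TOTAL) with i % N == K
group_lines() {
    local i
    for (( i = 0; i < $1; i++ )); do
        if (( i % $2 == $3 )); then printf '%s ' "$i"; fi
    done
    echo
}

for k in {0..3}; do
    echo "group $k: $(group_lines 34 4 "$k")"
done
```

Groups 0 and 1 get 9 commands each and groups 2 and 3 get 8 each; 9 + 9 + 8 + 8 = 34, so every line is run exactly once.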
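In this way, all the commands are divided into four groups. As long as each iteration of the for loop starts its submit.sh in the background with a trailing & (inside the parentheses), the four groups run at the same time; without that &, the loop would wait for one group to finish before starting the next. Within a group, each command starts only after the previous one has finished, so at most four hisat2 jobs (4 × 6 = 24 threads with -p 6) run at any moment.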
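Whether the groups really overlap is easy to check with `sleep` standing in for hisat2. In this sketch (file names in a temporary directory are placeholders), eight `sleep 1` commands are split into four groups, each started with `&`; the whole run takes about 2 seconds, whereas running the groups one after another would take about 8.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
# Copy of submit.sh (same modulo rule)
cat > "$tmp/submit.sh" <<'EOF'
i=0
while IFS= read -r id || [ -n "$id" ]; do
    if (( i % $2 == $3 )); then eval "$id" < /dev/null; fi
    i=$((i+1))
done < "$1"
EOF
for n in $(seq 1 8); do echo "sleep 1"; done > "$tmp/cmds.txt"

SECONDS=0                                             # bash built-in timer
for i in {0..3}; do
    bash "$tmp/submit.sh" "$tmp/cmds.txt" 4 "$i" &    # each group in the background
done
wait                                                  # wait for all four groups
elapsed=$SECONDS
echo "8 x sleep 1 in 4 parallel groups took about ${elapsed}s"
rm -rf "$tmp"
```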
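You can think of the Linux system as the ticket office of a scenic spot, with each command being one visitor. Because the crowd is too large, the site opens several entrances; everyone queues according to a fixed rule, and the next person can pass only after the one in front has gone through.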
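How to use and modify it
If you really can't follow the code above, that's fine: do you need to know how a refrigerator works in order to use one?
When running a script that contains many commands, you only need to change the parts in the red boxes in the figure below.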
![](https://img.haomeiwen.com/i25976702/0f246ebf50511bff.png)
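Likewise, to split the commands into 10 parallel groups, change both the range ({0..9}) and the group count (10):
```shell
for i in {0..9};do (nohup bash submit.sh script2.sh 10 $i > submit_$i.log 2>&1 &);done
```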
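If the number of groups changes often, the range and the group count can come from a single variable so they cannot disagree. Brace expansion such as `{0..9}` does not expand variables, so this sketch uses `seq` instead. `run_in_groups` is a hypothetical wrapper name, and the demo below feeds it a copy of submit.sh plus 23 harmless `echo` commands in a temporary directory.

```shell
#!/usr/bin/env bash
# run_in_groups SUBMIT CMD_FILE N: start N background copies of SUBMIT, one per group
run_in_groups() {
    local submit=$1 cmds=$2 n=$3 k
    for k in $(seq 0 $(( n - 1 ))); do
        nohup bash "$submit" "$cmds" "$n" "$k" > "${cmds}.group$k.log" 2>&1 &
    done
    wait    # block until every group is done; drop this to return immediately
}

# Demo: 23 harmless commands split into 10 groups
tmp=$(mktemp -d)
cat > "$tmp/submit.sh" <<'EOF'
i=0
while IFS= read -r id || [ -n "$id" ]; do
    if (( i % $2 == $3 )); then eval "$id" < /dev/null; fi
    i=$((i+1))
done < "$1"
EOF
for n in $(seq 1 23); do echo "echo job$n >> $tmp/done.log"; done > "$tmp/cmds.txt"

run_in_groups "$tmp/submit.sh" "$tmp/cmds.txt" 10
total=$(( $(wc -l < "$tmp/done.log") ))
ran=$(( $(sort -u "$tmp/done.log" | wc -l) ))
echo "commands run: $total (distinct: $ran)"
rm -rf "$tmp"
```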
WeChat official account: 小汪Waud