awk
awk is the most useful command for handling text files. It operates on an entire file line by line. By default it uses whitespace to separate the fields. The most common syntax for awk command is
awk '/search_pattern/ { action_to_take_if_pattern_matches; }' file_to_parse
awk是解释性语言,一行处理完,处理下一行
Print a Text File
awk '{ print }' /etc/passwd
- OR
awk '{ print $0 }' /etc/passwd
单引号中的被大括号括着的就是awk的语句,其只能被单引号包含。
其中的n表示第几列。注:$0表示整个行。
awk的格式化输出,和C语言的printf没什么两样
$ awk '{printf "%-8s %-8s %-8s %-18s %-22s %-15s\n", $1,$2,$3,$4,$5,$6}' netstat.txt
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN
tcp 0 0 coolshell.cn:80 124.205.5.146:18245 TIME_WAIT
Print Specific Field
Use : as the input field separator and print first field only i.e. usernames (will print the the first field. all other fields are ignored):
awk -F':' '{ print $1 }' /etc/passwd
Send output to sort command using a shell pipe:
awk -F':' '{ print $1 }' /etc/passwd | sort
Pattern Matching
You can only print line of the file if pattern matched. For e.g. display all lines from Apache log file if HTTP error code is 500 (9th field logs status error code for each http request):
awk '$9 == 500 { print $0}' /var/log/httpd/access.log
The part outside the curly braces is called the “pattern”, and the part inside is the “action”. The comparison operators include the ones from C:
== != < > <= >= ?:
If no pattern is given, then the action applies to all lines. If no action is given, then the entire line is printed. If “print” is used all by itself, the entire line is printed. Thus, the following are equivalent:
awk '$9 == 500 ' /var/log/httpd/access.log
awk '$9 == 500 {print} ' /var/log/httpd/access.log
awk '$9 == 500 {print $0} ' /var/log/httpd/access.log
Print Lines Containing tom, jerry AND vivek
Print pattern possibly on separate lines:
awk '/tom|jerry|vivek/' /etc/passwd
Print 1st Line From File, (如果我们需要表头的话,我们可以引入内建变量NR)
awk "NR==1{print;exit}" /etc/resolv.conf
awk "NR==$line{print;exit}" /etc/resolv.conf
字符串匹配
$ awk '$6 ~ /FIN/ || NR==1 {print NR,$4,$5,$6}' OFS="\t" netstat.txt
1 Local-Address Foreign-Address State
6 coolshell.cn:80 61.140.101.185:37538 FIN_WAIT2
9 coolshell.cn:80 116.234.127.77:11502 FIN_WAIT2
13 coolshell.cn:80 124.152.181.209:26825 FIN_WAIT1
18 coolshell.cn:80 117.136.20.85:50025 FIN_WAIT2
上面的第一个示例匹配FIN状态, 第二个示例匹配WAIT字样的状态。其实 ~ 表示模式开始。/ /中是模式。这就是一个正则表达式的匹配。
模式取反的例子
$ awk '$6 !~ /WAIT/ || NR==1 {print NR,$4,$5,$6}' OFS="\t" netstat.txt
或者
awk '!/WAIT/' netstat.txt
1 Local-Address Foreign-Address State
2 0.0.0.0:3306 0.0.0.0:* LISTEN
3 0.0.0.0:80 0.0.0.0:* LISTEN
4 127.0.0.1:9000 0.0.0.0:* LISTEN
格式化输出
$ awk '$3==0 && $6=="LISTEN" || NR==1 {printf "%-20s %-20s %s\n",$4,$5,$6}' netstat.txt
Local-Address Foreign-Address State
0.0.0.0:3306 0.0.0.0:* LISTEN
0.0.0.0:80 0.0.0.0:* LISTEN
127.0.0.1:9000 0.0.0.0:* LISTEN
:::22 :::* LISTEN
内建变量
$0 当前记录(这个变量中存放着整个行的内容)
$1~$n 当前记录的第n个字段,字段间由FS分隔
FS 输入字段分隔符 默认是空格或Tab
NF 当前记录中的字段个数,就是有多少列
NR 已经读出的记录数,就是行号,从1开始,如果有多个文件话,这个值也是不断累加中。
FNR 当前记录数,与NR不同的是,这个值会是各个文件自己的行号
RS 输入的记录分隔符, 默认为换行符
OFS 输出字段分隔符, 默认也是空格
ORS 输出的记录分隔符,默认为换行符
FILENAME 当前输入文件的名字
指定分隔符
$ awk 'BEGIN{FS=":"} {print $1,$3,$6}' /etc/passwd
// 等价
$ awk -F: '{print $1,$3,$6}' /etc/passwd
// 多个分隔符
awk -F '[;:]'
折分文件
awk拆分文件很简单,使用重定向就好了。下面这个例子,是按第6例分隔文件,相当的简单(其中的NR!=1表示不处理表头)。
$ awk 'NR!=1{print > $6}' netstat.txt
$ ls
ESTABLISHED FIN_WAIT1 FIN_WAIT2 LAST_ACK LISTEN netstat.txt TIME_WAIT
统计
计算所有的C文件,CPP文件和H文件的文件大小总和
$ ls -l *.cpp *.c *.h | awk '{sum+=$5} END {print sum}'
// 统计各个connection状态的用法, 注意其中的数组的用法
$ awk 'NR!=1{a[$6]++;} END {for (i in a) print i ", " a[i];}' netstat.txt
TIME_WAIT, 3
FIN_WAIT1, 1
ESTABLISHED, 6
FIN_WAIT2, 3
LAST_ACK, 1
LISTEN, 4
// 统计每个用户的进程的占了多少内存
$ ps aux | awk 'NR!=1{a[$1]+=$6;} END { for(i in a) print i ", " a[i]"KB";}'
dbus, 540KB
mysql, 99928KB
www, 3264924KB
You get the sum of all the numbers in a column:
awk '{total += $1} END {print total}' earnings.txt
Shell cannot calculate with floating point numbers, but awk can:
awk 'BEGIN {printf "%.3f\n", 2005.50 / 3}'
awk脚本
BEGIN, END这两个关键字意味着执行前和执行后的意思,语法如下:
- BEGIN{ 这里面放的是执行前的语句 }
- END {这里面放的是处理完所有的行后要执行的语句 }
- {这里面放的是处理每一行时要执行的语句}
假设有这么一个文件(学生成绩表):
$ cat score.txt
Marry 2143 78 84 77
Jack 2321 66 78 45
Tom 2122 48 77 71
Mike 2537 87 97 95
Bob 2415 40 57 62
awk脚本如下
#!/bin/awk -f
#运行前
BEGIN {
math = 0
english = 0
computer = 0
printf "NAME NO. MATH ENGLISH COMPUTER TOTAL\n"
printf "---------------------------------------------\n"
}
#运行中
{
math+=$3
english+=$4
computer+=$5
printf "%-6s %-6s %4d %8d %8d %8d\n", $1, $2, $3,$4,$5, $3+$4+$5
}
#运行后
END {
printf "---------------------------------------------\n"
printf " TOTAL:%10d %8d %8d \n", math, english, computer
printf "AVERAGE:%10.2f %8.2f %8.2f\n", math/NR, english/NR, computer/NR
}
// 这样运行 ./cal.awk score.txt
环境变量
即然说到了脚本,我们来看看怎么和环境变量交互:(使用-v参数和ENVIRON,使用ENVIRON的环境变量需要export)
$ x=5
$ y=10
$ export y
$ echo $x $y
5 10
$ awk -v val=$x '{print $1, $2, $3, $4+val, $5+ENVIRON["y"]}' OFS="\t" score.txt
Marry 2143 78 89 87
Jack 2321 66 83 55
Tom 2122 48 82 81
Call AWK From Shell Script
A shell script to list all IP addresses that accessing your website. This script use awk for processing log file and verification is done using shell script commands.
#!/bin/bash
d=$1
OUT=/tmp/spam.ip.$$
HTTPDLOG="/www/$d/var/log/httpd/access.log"
[ $# -eq 0 ] && { echo "Usage: $0 domain-name"; exit 999; }
if [ -f $HTTPDLOG ];
then
awk '{print}' $HTTPDLOG >$OUT
awk '{ print $1}' $OUT | sort -n | uniq -c | sort -n
else
echo "$HTTPDLOG not found. Make sure domain exists and setup correctly."
fi
/bin/rm -f $OUT
AWK and Shell Functions
Here is another example. chrootCpSupportFiles() find out the shared libraries required by each program (such as perl / php-cgi) or shared library specified on the command line and copy them to destination. This code calls awk to print selected fields from the ldd output:
chrootCpSupportFiles() {
# Set CHROOT directory name
local BASE="$1" # JAIL ROOT
local pFILE="$2" # copy bin file libs
[ ! -d $BASE ] && mkdir -p $BASE || :
FILES="$(ldd $pFILE | awk '{ print $3 }' |egrep -v ^'\(')"
for i in $FILES
do
dcc="$(dirname $i)"
[ ! -d $BASE$dcc ] && mkdir -p $BASE$dcc || :
/bin/cp $i $BASE$dcc
done
sldl="$(ldd $pFILE | grep 'ld-linux' | awk '{ print $1}')"
sldlsubdir="$(dirname $sldl)"
if [ ! -f $BASE$sldl ];
then
/bin/cp $sldl $BASE$sldlsubdir
else
:
fi
}
This function can be called as follows:
chrootCpSupportFiles /lighttpd-jail /usr/local/bin/php-cgi
AWK and Shell Pipes
List your top 10 favorite commands:
history | awk '{print $2}' | sort | uniq -c | sort -rn | head
Sample Output:
172 ls
144 cd
69 vi
62 grep
41 dsu
36 yum
29 tail
28 netstat
21 mysql
20 cat
Another example to find out domain expiry date:
$ whois cyberciti.com | awk '/Registry Expiry Date:/ { print $4 }'
2018-07-31T18:42:58Z