The Three Linux Text-Processing Tools: grep

2018-05-30  Hye_Lau

The three text-processing tools: grep, sed, awk

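I. grep

Purpose:

grep is a text search tool: it checks the target text line by line against a user-specified pattern and prints the lines that match.

Pattern:

A filter condition written with regular-expression metacharacters and literal text characters.

Regular-expression engines:

grep uses basic regular expressions (BRE) by default; the -E option switches to extended regular expressions (ERE).

Command syntax: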

grep [OPTIONS] PATTERN [FILE...]
grep [OPTIONS] [-e PATTERN | -f FILE] [FILE...]

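Common options

--color=auto: highlight the matched text in color;
-i, --ignore-case: ignore case distinctions;
-o: print only the matched string itself;
-v, --invert-match: print the lines that the pattern does not match;
-E: enable extended regular expression metacharacters;
-q, --quiet, --silent: quiet mode, print nothing and report the result through the exit status only;
-A #: after, also print the # lines after each match
-B #: before, also print the # lines before each match
-C #: context, also print the # lines before and after each match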
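
To make the options above concrete, here is a minimal sketch that runs several of them against a tiny made-up input. The sample text and the variable names are assumptions for illustration only; they do not come from the article's test file.

```shell
#!/bin/sh
# Hypothetical sample input (not the article's regular_express.txt).
sample=$(printf '%s\n' 'Linux is free' 'linux kernel' 'BSD is free too' 'end')

# -i: ignore case, so both "Linux" and "linux" lines match
ci_lines=$(printf '%s\n' "$sample" | grep -i 'linux')

# -o: print only the matched text, one match per output line
only=$(printf '%s\n' "$sample" | grep -o 'free')

# -v: print the lines that do NOT match
inverted=$(printf '%s\n' "$sample" | grep -v 'free')

# -q: print nothing; only the exit status says whether a match exists
if printf '%s\n' "$sample" | grep -q 'kernel'; then found=yes; else found=no; fi

# -A 1: print each matching line plus one line after it
after=$(printf '%s\n' "$sample" | grep -A 1 'kernel')

printf '%s\n' "$ci_lines" "$only" "$inverted" "$found" "$after"
```

The -q form is the one usually used in scripts, since the condition of an if statement only needs the exit status.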

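II. Basic regular expression metacharacters

1. Character matching

 .: matches any single character;
[]: matches any single character within the specified set;
[^]: matches any single character outside the specified set;

The set inside [] can be written as a range such as [a-z], [A-Z], or [0-9], or as a POSIX character class such as [[:lower:]], [[:upper:]], [[:alpha:]], [[:digit:]], [[:alnum:]], [[:space:]], or [[:punct:]].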
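
As a rough illustration of bracket expressions, the sketch below tries a range, two POSIX character classes, and a negated class on four made-up strings. The input words are hypothetical, and LC_ALL=C is set as an assumption so that [a-z] covers only ASCII lowercase letters regardless of the system locale.

```shell
#!/bin/sh
# Assumption: force the C locale so ranges like [a-z] behave predictably.
LC_ALL=C
export LC_ALL

# Hypothetical sample words, one per line.
words=$(printf '%s\n' 'bat' 'b9t' 'bAt' 'b t')

# Range: a lowercase letter between b and t
lower=$(printf '%s\n' "$words" | grep 'b[a-z]t')

# POSIX class: a digit between b and t
digit=$(printf '%s\n' "$words" | grep 'b[[:digit:]]t')

# POSIX class: an uppercase letter between b and t
upper=$(printf '%s\n' "$words" | grep 'b[[:upper:]]t')

# Negated class: any character that is neither a letter nor a digit
other=$(printf '%s\n' "$words" | grep 'b[^[:alnum:]]t')

printf '%s\n' "$lower" "$digit" "$upper" "$other"
```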

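2. Repetition

*: matches the preceding character any number of times: 0, 1, or more;
     Example: grep "x*y"    abxy, aby, xxxxy, and yab all match
.*: matches any characters of any length
\?: matches the preceding character 0 or 1 times, i.e. the preceding character is optional;
\+: matches the preceding character 1 or more times, i.e. it must appear at least once;
\{m\}: matches the preceding character exactly m times;
\{m,n\}: matches the preceding character at least m and at most n times;
\{0,n\}: at most n times;
\{m,\}: at least m times;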
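
The repetition operators can be compared side by side on the four strings from the x*y example above. This is a sketch that assumes GNU grep, since \? and \+ are GNU extensions to basic regular expressions; the variable names are made up for illustration.

```shell
#!/bin/sh
# The four strings from the "x*y" example, one per line.
sample=$(printf '%s\n' 'abxy' 'aby' 'xxxxy' 'yab')

# x*y: zero or more x followed by y, so every line matches (count is 4)
all_count=$(printf '%s\n' "$sample" | grep -c 'x*y')

# bx\?y: b, an optional x, then y
optional=$(printf '%s\n' "$sample" | grep 'bx\?y')

# x\+y: at least one x followed by y
one_or_more=$(printf '%s\n' "$sample" | grep 'x\+y')

# x\{2,\}y: at least two x followed by y
at_least_two=$(printf '%s\n' "$sample" | grep 'x\{2,\}y')

printf '%s\n' "$all_count" "$optional" "$one_or_more" "$at_least_two"
```

Note that x*y matches yab because zero occurrences of x still satisfy x*, which is why * alone rarely narrows a search.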

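3. Position anchors

^: anchors to the start of the line; placed at the far left of the pattern;
$: anchors to the end of the line; placed at the far right of the pattern;
^PATTERN$: PATTERN must match the entire line;
       ^$: an empty line;
       ^[[:space:]]*$: an empty line or a line containing only whitespace characters;

Word: any run of consecutive non-special characters (a string) is called a word;
\< or \b: anchors to the start of a word; placed at the left of the word pattern;
\> or \b: anchors to the end of a word; placed at the right of the word pattern;
\<PATTERN\>: matches a whole word;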
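
The anchors can be checked on a small made-up input that contains one empty line and one whitespace-only line. This is a sketch assuming GNU grep (the \< and \> word anchors are GNU extensions); the sample lines are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample: line 2 is empty, line 4 holds three spaces.
sample=$(printf '%s\n' 'the end' '' 'other' '   ' 'breathe the air')

# ^the: lines that start with "the"
starts=$(printf '%s\n' "$sample" | grep '^the')

# air$: lines that end with "air"
ends=$(printf '%s\n' "$sample" | grep 'air$')

# ^$: truly empty lines only
empty_count=$(printf '%s\n' "$sample" | grep -c '^$')

# ^[[:space:]]*$: empty lines plus whitespace-only lines
blank_count=$(printf '%s\n' "$sample" | grep -c '^[[:space:]]*$')

# \<the\>: "the" as a whole word, so "other" and "breathe" alone do not count
whole_word=$(printf '%s\n' "$sample" | grep '\<the\>')

printf '%s\n' "$starts" "$ends" "$empty_count" "$blank_count" "$whole_word"
```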

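4. Grouping and back-references

 \(\): groups one or more characters so that they are treated as a unit; the text matched by a group can be referenced later in the same pattern:
 \1: the text matched by the pattern between the first left parenthesis (counting from the left) and its matching right parenthesis;
 \2: the text matched by the pattern between the second left parenthesis (counting from the left) and its matching right parenthesis;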
 \3:
     ....
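
A back-reference matches the same text that the group matched, not the group's pattern again. The sketch below shows the difference on four made-up sentences; the sentences and variable names are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical sample sentences.
sample=$(printf '%s\n' \
  'He loves his lover.' \
  'He likes his lover.' \
  'She likes her liker.' \
  'She loves her liker.')

# \(l..e\) captures a four-letter string such as "love" or "like";
# \1 then requires that exact same string to appear again later in the line.
repeated=$(printf '%s\n' "$sample" | grep '\(l..e\).*\1')

# \(.\)\1: any character immediately followed by itself
doubled=$(printf '%s\n' 'google' | grep -o '\(.\)\1')

printf '%s\n' "$repeated" "$doubled"
```

Lines 2 and 4 contain both "like" and "love", but neither string occurs twice, so they are not printed.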

Examples

1. Searching for a specific string
//(1) Find the lines in ~/scripts/regular_express.txt that contain the string 'the'
[root@localhost ~]# grep -n 'the' ~/scripts/regular_express.txt 
8:I can't finish the test.
12:the symbol '*' is represented as start.
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
18:google is the best tools for search keyword.

//(2) Invert the match: print the lines that do not contain 'the', i.e. every line except 8, 12, 15, 16, and 18
[root@localhost ~]# grep -vn 'the' ~/scripts/regular_express.txt 
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
4:this dress doesn't fit me.
5:However, this dress is about $ 3183 dollars.
6:GNU is free air not free beer.
7:Her hair is very beauty.
9:Oh! The soup taste good.
10:motorcycle is cheap than car.
11:This window is clear.
13:Oh!  My god!
14:The gd software is a library for drafting programs.
17:I like dog.
19:goooooogle yes!
20:go! go! Let's go.
21:# I am VBird

//(3) Match 'the' case-insensitively
[root@localhost ~]# grep -in 'the' ~/scripts/regular_express.txt 
8:I can't finish the test.
9:Oh! The soup taste good.
12:the symbol '*' is represented as start.
14:The gd software is a library for drafting programs.
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
18:google is the best tools for search keyword.

2. Searching for a character set with brackets [ ]
//(1) Find the words test and taste, which share the form 't?st'
[root@localhost ~]# grep -n 't[ae]st' ~/scripts/regular_express.txt 
8:I can't finish the test.
9:Oh! The soup taste good.

//(2) Use the negated set [^] to find oo preceded by a character other than g
[root@localhost ~]# grep -n '[^g]oo' ~/scripts/regular_express.txt 
2:apple is my favorite food.
3:Football game is not use feet only.
18:google is the best tools for search keyword.
19:goooooogle yes!

//(3) Find oo preceded by a character that is not a lowercase letter
[root@localhost ~]# grep -n '[^a-z]oo' ~/scripts/regular_express.txt 
3:Football game is not use feet only.
or
[root@localhost ~]# grep -n '[^[:lower:]]oo' ~/scripts/regular_express.txt 
3:Football game is not use feet only.

//(4) Find the lines that contain a digit
[root@localhost ~]# grep -n '[0-9]' ~/scripts/regular_express.txt 
5:However, this dress is about $ 3183 dollars.
15:You are the best is mean you are the no. 1.
or
[root@localhost ~]# grep -n '[[:digit:]]' ~/scripts/regular_express.txt 
5:However, this dress is about $ 3183 dollars.
15:You are the best is mean you are the no. 1.

3. The line-start anchor ^ and the line-end anchor $
//(1) Find the lines that begin with the
[root@localhost ~]# grep -n '^the' ~/scripts/regular_express.txt 
12:the symbol '*' is represented as start.

//(2) Find the lines that begin with a lowercase letter
[root@localhost ~]# grep -n '^[a-z]' ~/scripts/regular_express.txt 
2:apple is my favorite food.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
12:the symbol '*' is represented as start.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
or
[root@localhost ~]# grep -n '^[[:lower:]]' ~/scripts/regular_express.txt 
2:apple is my favorite food.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
12:the symbol '*' is represented as start.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.

//(3) Find empty lines
grep -n '^$' ~/scripts/regular_express.txt 

4. Any single character . and the repetition character *
//(1) . stands for exactly one arbitrary character
[root@localhost ~]# grep -n 'g..d' ~/scripts/regular_express.txt 
1:"Open Source" is a good mechanism to develop programs.
9:Oh! The soup taste good.
16:The world <Happy> is the same with "glad".

//(2) * repeats the preceding character zero or more times, so 'ooo*' matches two or more consecutive o's
[root@localhost ~]# grep -n 'ooo*' ~/scripts/regular_express.txt 
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.
18:google is the best tools for search keyword.
19:goooooogle yes!

5. Limiting the number of consecutive repetitions with {}
//(1) Find strings that contain two consecutive o's
[root@localhost ~]# grep -n 'o\{2\}' ~/scripts/regular_express.txt 
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.
18:google is the best tools for search keyword.
19:goooooogle yes!

//(2) Find g followed by 2 to 5 o's and then another g
[root@localhost ~]# grep -n 'go\{2,5\}g' ~/scripts/regular_express.txt 
18:google is the best tools for search keyword.

//(3) Find goo...g with 2 or more o's
[root@localhost ~]# grep -n 'go\{2,\}g' ~/scripts/regular_express.txt 
18:google is the best tools for search keyword.
19:goooooogle yes!

Exercises

[root@localhost ~]# grep -v "/bin/bash$" /etc/passwd
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
......
slackware:x:2002:2018::/home/slackware:/bin/tcsh
[root@localhost ~]# grep "\<[0-9]\{2,3\}\>" /etc/passwd
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
uucp:x:10:14:uucp:/var/spool/uucp:/sbin/nologin
......
basher:x:502:502::/home/basher:/bin/bash
grep "^[[:space:]]\+[^[:space:]]" /etc/rc.d/rc.sysinit
[root@localhost ~]# netstat -tan | grep "LISTEN[[:space:]]*$"
tcp        0      0 0.0.0.0:43150               0.0.0.0:*                   LISTEN      
tcp        0      0 0.0.0.0:111                 0.0.0.0:*                   LISTEN      
tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN      
tcp        0      0 127.0.0.1:631               0.0.0.0:*                   LISTEN      
tcp        0      0 127.0.0.1:25                0.0.0.0:*                   LISTEN      
tcp        0      0 :::111                      :::*                        LISTEN      
tcp        0      0 :::60309                    :::*                        LISTEN      
tcp        0      0 :::22                       :::*                        LISTEN      
tcp        0      0 ::1:631                     :::*                        LISTEN      
tcp        0      0 ::1:25                      :::*                        LISTEN     
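
Because /etc/passwd differs between machines, the first two exercise commands can also be tried on a fixed sample. The sketch below assumes GNU grep and feeds it four passwd-style lines; the mail, basher, and slackware lines are taken from the output above, while the root line is a hypothetical addition.

```shell
#!/bin/sh
# Passwd-style sample; the root entry is a made-up line for contrast.
sample=$(printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'mail:x:8:12:mail:/var/spool/mail:/sbin/nologin' \
  'basher:x:502:502::/home/basher:/bin/bash' \
  'slackware:x:2002:2018::/home/slackware:/bin/tcsh')

# Accounts whose login shell is not /bin/bash
non_bash=$(printf '%s\n' "$sample" | grep -v '/bin/bash$')

# Lines containing a standalone two- or three-digit number.
# The word anchors keep 2002 and 2018 from matching on a partial run of digits.
two_three=$(printf '%s\n' "$sample" | grep '\<[0-9]\{2,3\}\>')

printf '%s\n' "$non_bash" "$two_three"
```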

III. egrep

Command syntax and common options

egrep [OPTIONS] PATTERN [FILE...]
OPTIONS: -i, -o, -v, -A, -B, -C
 -G: use basic regular expressions

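IV. Extended regular expression metacharacters

1. Character matching

.: any single character
[]: any single character within the specified set
[^]: any single character outside the specified set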

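2. Repetition

*: any number of times: 0, 1, or more;
?: 0 or 1 times, i.e. the preceding character is optional;
+: the preceding character at least once;
{m}: the preceding character exactly m times;
{m,n}: at least m and at most n times;
{0,m}: at most m times;
{m,}: at least m times;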
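
The main practical difference from the basic syntax is that the repetition operators lose their backslashes. The sketch below compares the two forms on a few made-up strings. It calls grep -E rather than egrep, on the assumption that a recent GNU grep is installed, where egrep is marked obsolescent; the sample words and variable names are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample words.
sample=$(printf '%s\n' 'gg' 'gog' 'google' 'goooooogle')

# Same interval written in basic and in extended syntax
bre=$(printf '%s\n' "$sample" | grep 'go\{2,5\}g')
ere=$(printf '%s\n' "$sample" | grep -E 'go{2,5}g')

# +: at least one o between the two g's
plus=$(printf '%s\n' "$sample" | grep -E 'go+g')

# ?: at most one o between the two g's
opt=$(printf '%s\n' "$sample" | grep -E 'go?g')

printf '%s\n' "$bre" "$ere" "$plus" "$opt"
```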

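3. Position anchors

^: anchors to the start of the line; placed at the far left of the pattern;
$: anchors to the end of the line; placed at the far right of the pattern;
\< or \b: anchors to the start of a word; placed at the left of the word pattern;
\> or \b: anchors to the end of a word; placed at the right of the word pattern;
\<PATTERN\>: matches a whole word;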

4. Grouping and back-references

a|b: a or b;
C|cat: C or cat
(c|C)at: cat or Cat
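
Alternation applies to everything on each side of |, so parentheses are what limit its scope. A minimal sketch on four made-up words shows the difference between C|cat and (c|C)at; the anchors ^ and $ are added here only to make the whole-line results easy to read.

```shell
#!/bin/sh
# Hypothetical sample words.
sample=$(printf '%s\n' 'C' 'cat' 'Cat' 'at')

# C|cat without anchors: any line containing "C" or "cat"
loose=$(printf '%s\n' "$sample" | grep -E 'C|cat')

# ^(C|cat)$: the whole line is either "C" or "cat"
either=$(printf '%s\n' "$sample" | grep -E '^(C|cat)$')

# ^(c|C)at$: the group limits the alternation to the first letter
first_letter=$(printf '%s\n' "$sample" | grep -E '^(c|C)at$')

printf '%s\n' "$loose" "$either" "$first_letter"
```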

Exercises:

grep -i '^s' /proc/meminfo
grep '^[Ss]' /proc/meminfo
grep -E '^(S|s)' /proc/meminfo
[root@localhost ~]# grep -i '^s' /proc/meminfo
SwapCached:            0 kB
SwapTotal:       1023992 kB
SwapFree:        1023992 kB
Shmem:               236 kB
Slab:              61920 kB
SReclaimable:      31508 kB
SUnreclaim:        30412 kB

[root@localhost ~]# grep -E '^(S|s)' /proc/meminfo
SwapCached:            0 kB
SwapTotal:       1023992 kB
SwapFree:        1023992 kB
Shmem:               236 kB
Slab:              61924 kB
SReclaimable:      31504 kB
SUnreclaim:        30420 kB
[root@localhost ~]# grep  '^[Ss]' /proc/meminfo
SwapCached:            0 kB
SwapTotal:       1023992 kB
SwapFree:        1023992 kB
Shmem:               236 kB
Slab:              61904 kB
SReclaimable:      31504 kB
SUnreclaim:        30400 kB