Getting Started with Regular Expressions

2017-06-05 数据革命

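Regular Expressions

Most people who have studied computing have heard of regular expressions,
especially those who program for a living.

So what is a regular expression?

A regular expression (often abbreviated regex, regexp, or RE in code), sometimes also called a rule expression, is a concept from computer science. Regular expressions are typically used to search for, and replace, text that matches a given pattern (rule).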

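The ancestry of regular expressions can perhaps be traced all the way back to scientists' early work on how the human nervous system operates. Enough background, though: this chapter is about how to learn regular expressions,
and specifically regular expressions as used on Linux.
To learn them you first need to know the text commands, such as the "three swordsmen" grep, sed, and awk,
plus a handful of other important text commands; only by combining them can you claim to have learned even a little.
Regular expressions are extremely flexible, which has earned them the nickname "the lark" in programming circles.
Let's start with the simplest things first.
A regular expression is a pattern template that you define and that a Linux tool uses to filter text. As data flows into a Linux tool (such as grep or egrep), the tool matches it against the regular expression pattern. Data that matches the pattern is accepted and processed further; data that does not match is filtered out.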
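
The filtering model described above can be sketched in a few lines of shell. The three sample lines and the helper name `filter_demo` are invented for illustration; only lines that match the pattern pass through grep.

```shell
# Hypothetical sample data: three log-like lines piped through grep.
filter_demo() {
  printf '%s\n' 'error: disk full' 'info: all good' 'error: timeout' |
    grep 'error'   # lines that do not match the pattern are filtered out
}

filter_demo
```

Swapping `grep 'error'` for `grep -v 'error'` inverts the filter and keeps only the non-matching line.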

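Text-processing tools

The cat command

We use this command all the time, usually to view a file, but its options get much less use. I'll go through them here, partly to reinforce my own memory.

  -A, --show-all               equivalent to -vET
  -b, --number-nonblank        number nonempty output lines
  -e                           equivalent to -vE
  -E, --show-ends              display $ at the end of each line
  -n, --number                 number all output lines
  -s, --squeeze-blank          squeeze repeated empty output lines into one
  -t                           equivalent to -vT
  -T, --show-tabs              display TAB characters as ^I
  -v, --show-nonprinting       use ^ and M- notation, except for LFD and TAB
  --help                       display this help and exit
  --version                    output version information and exit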

[root@localhost ~]#cat test        #plain output
1111111111


2222222222

3333333333

[root@localhost ~]#cat -n test     #number every line
     1  1111111111
     2
     3
     4  2222222222
     5
     6  3333333333

[root@localhost ~]#cat -E test     #mark the end of each line with $
1111111111$
$
$
2222222222$
$
3333333333$

[root@localhost ~]#cat -s test     #squeeze runs of blank lines into a single blank line
1111111111

2222222222

3333333333

[root@localhost ~]#cat -ns test   #squeeze blank lines, then number the lines
     1  1111111111
     2
     3  2222222222
     4
     5  3333333333


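[root@localhost ~]#cat x* > google_bak.tar.gz   #concatenate files

[root@localhost ~]#cat test.tar.gz_?? > test.tar.gz   #cat can join the pieces of a split archive back into one

[root@localhost ~]#tar -xvzf test.tar.gz            #then extract it with tar

[root@localhost ~]#cat > aa  #write keyboard input to a file; Enter ends each line, Ctrl+D ends the input and saves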
4234234
234234


[root@localhost ~]#cat file1 file2 > file  #concatenate two files into one
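
As a small self-contained sketch of the `-s` option shown above, the snippet below rebuilds a file like `test` in a temporary location (the temporary file and the variable name `squeezed_out` are assumptions made for this example) and captures the squeezed output.

```shell
# Recreate a file with runs of blank lines, similar to the "test" file above.
tmp=$(mktemp)
printf '1111111111\n\n\n2222222222\n\n3333333333\n' > "$tmp"

# -s collapses each run of blank lines into a single blank line.
squeezed_out=$(cat -s "$tmp")
printf '%s\n' "$squeezed_out"

rm -f "$tmp"
```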

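The tail command

-c, --bytes=N                        output the last N bytes
-f, --follow[={name|descriptor}]     output appended data as the file grows
-n, --lines=N                        output the last N lines instead of the default last 10
--pid=PID                            with -f, terminate after process ID PID dies
-q, --quiet, --silent                never output headers giving file names
-s, --sleep-interval=S               with -f, sleep about S seconds between iterations
-v, --verbose                        always output headers giving file names
--help                               display this help and exit
--version                            output version information and exit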


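[root@localhost ~]#tail /etc/passwd                              #default: show the last 10 lines

[root@localhost ~]#tail -n 2 /etc/passwd                     #show the last 2 lines

[root@localhost ~]#tail -q -n k file1 file2 file3            #show the last k lines of several files, without the file-name headers

[root@localhost ~]#tail -n +k /etc/passwd                  #output starting at line k from the top

[root@localhost ~]#tail -f /var/log/messages              #-f keeps reading new content as it arrives, giving a live view; stop with Ctrl+C

[root@localhost ~]#tail -n +10 file.txt | head -1            #show line 10 of file.txt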

[root@localhost ~]#cat 1.txt  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20

[root@localhost ~]#tail 1.txt  
11
12
13
14
15
16
17
18
19
20

[root@localhost ~]#tail -3 1.txt  
18
19
20

[root@localhost ~]#tail -n 3 1.txt  
18
19
20

[root@localhost ~]#tail --lines=3 1.txt  
18
19
20

[root@localhost ~]#tail -n +14 1.txt  
14
15
16
17
18
19
20
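
The `tail -n +K | head -1` idiom from the list above can be wrapped in a small function. This is only a sketch: the helper name `nth_line` and the temporary 20-line file are assumptions for the example, not part of the original article.

```shell
# nth_line N FILE: print line N of FILE (hypothetical helper name).
nth_line() {
  tail -n "+$1" "$2" | head -n 1
}

tmp=$(mktemp)
seq 1 20 > "$tmp"                          # same content as 1.txt above

tenth=$(nth_line 10 "$tmp")                # the 10th line
last3=$(tail -n 3 "$tmp" | tr '\n' ' ')    # the last three lines, joined by spaces

echo "$tenth"
echo "$last3"
rm -f "$tmp"
```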

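The more command

- more is a pager, the kind of viewer used when reading help pages with man
- Space or f: show the next screen of text
- Enter: show just the next line of text
- Slash (/): type a pattern after it to search for the next match in the text
- h: show the help screen; b: show the previous screen
- q: quit more

The less command

less displays file contents. It works much like more, and both can be used to page through text files; the difference is that less lets you move freely backward as well as forward through the file, while more is mainly meant for moving forward. When viewing a file in less, PageUp scrolls up a page and PageDown scrolls down a page. Press q to quit less.

-e: exit automatically the second time the end of the file is reached
-f: force the file to be opened
-g: highlight only the current search match rather than all matches, which speeds up display
-i: ignore case when searching
-N: show a line number at the start of each line
-s: squeeze consecutive blank lines into a single line
-S: chop long lines instead of wrapping them
-x<number>: set tab stops to every <number> columns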
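The tr translate command

tr substitutes, squeezes, and deletes characters read from standard input. It maps one set of characters onto another and is a staple of compact one-liners.

Usage: tr [OPTION]... SET1 [SET2]
Translate, squeeze, and/or delete characters from standard input, writing to standard output.

  -c, -C, --complement             use the complement of SET1
  -d, --delete                     delete characters in SET1, do not translate
  -s, --squeeze-repeats            replace each run of a repeated character that is listed
                                   in the last specified SET with a single occurrence
  -t, --truncate-set1              first truncate SET1 to the length of SET2

Sequence	Meaning
`\\`	backslash
\a	audible bell
\b	backspace
\f	form feed
\n	newline
\r	carriage return
\t	horizontal tab
\v	vertical tab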

[root@localhost ~]#echo "TANK" |tr A-Z a-z   #uppercase to lowercase
tank

[root@localhost ~]#echo 'tank zhang' | tr a-z A-Z    #lowercase to uppercase
TANK ZHANG

[root@localhost ~]#cat aaa.txt       #the original file
aaa

bbb

[root@localhost ~]#cat aaa.txt|tr 'a' 'c'     #replace the letter a with c
ccc

bbb

[root@localhost ~]#cat aaa.txt|tr -d 'a'    #delete every letter a


bbb

[root@localhost ~]#cat aaa.txt|tr -d '\n\t'   #delete the newline '\n' and tab '\t' characters from the file's contents
aaabbb

[root@localhost ~]#cat aaa.txt|tr -s 'a-zA-Z'   #squeeze each run of a repeated letter down to one
a

b

[root@localhost ~]#cat aaa.txt|tr -s '\n'    #squeeze runs of newlines, which removes the blank lines
aaa
bbb

[root@localhost ~]#cat aaa.txt |tr -s '\011' '\040'   #replace tabs (\011) with spaces (\040), squeezing repeats
aaa

bbb

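[root@localhost ~]#tr a c < test     #turn every a in the file test into c

Class	Characters matched
[:alnum:]	all letters and digits
[:alpha:]	all letters
[:blank:]	all horizontal whitespace
[:cntrl:]	all control characters
[:digit:]	all digits
[:graph:]	all printable characters, not including space
[:lower:]	all lowercase letters
[:print:]	all printable characters, including space
[:space:]	all horizontal or vertical whitespace
[:upper:]	all uppercase letters
[:xdigit:]	all hexadecimal digits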
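
The character classes in the table can be used directly as tr sets. A brief sketch, with made-up input strings and variable names chosen for this example:

```shell
# Translate one class to another: lowercase to uppercase.
upper=$(echo 'tank zhang' | tr '[:lower:]' '[:upper:]')

# Delete every character in a class: strip the digits.
no_digits=$(echo 'a1b22c333' | tr -d '[:digit:]')

# Squeeze runs of blanks (spaces or tabs) down to a single character.
single_spaced=$(echo 'a   b    c' | tr -s '[:blank:]')

printf '%s\n' "$upper" "$no_digits" "$single_spaced"
```

Quoting the class names keeps the shell from treating the brackets as a file-name pattern.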

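The cut command

Extracts sections from each line of a file.

-b, --bytes=LIST            select only these bytes
-c, --characters=LIST       select only these characters
-d, --delimiter=DELIM       use DELIM instead of TAB as the field delimiter
-f, --fields=LIST           select only these fields
-s, --only-delimited        do not print lines that contain no delimiter

[root@localhost ~]#cat /etc/passwd | cut -b 1 |head -5      #print the first byte of each line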
r
b
d
a
l

[root@localhost ~]#cat /etc/passwd | cut -c 1-4 |head -5    #print the first four characters of each line
root
bin:
daem
adm:
lp:x

[root@localhost ~]#cat /etc/passwd | cut -f1 -d ':' |head -5   #split each line on : and print the first field
root
bin
daemon
adm
lp


[root@localhost ~]#cat a.txt |cut -f1,3 -d $'\t'   #columns 1 and 3
ssss    dddd
rrr     adfa


    
[root@localhost ~]#cut -c4 file.txt #print the fourth character of every line
x  
u  
l  

[root@localhost ~]#cut -c4,6 file.txt   #print the fourth and sixth characters of every line
xo  
ui  
ln  

[root@localhost ~]#cut -c4-7 file.txt  #print characters four through seven; the range is inclusive
x or  
unix  
linu  


[root@localhost ~]#cut -c-6 file.txt   #print the first six characters of every line
unix o  
is uni  
is lin  

[root@localhost ~]#cut -c10- file.txt  #print everything from the tenth character to the end of the line
inux os  
ood os  
good os  

[root@localhost ~]#cut -d ' ' -f2 file.txt   #use a space as the delimiter and print the second field of every line
or  
unix  
linux  

[root@localhost ~]#cut -d ' ' -f2,3 file.txt    #print the second and third fields
or linux  
unix good  
linux good 

[root@localhost ~]#cut -d ' ' -f1-3 file.txt    #print the first, second, and third fields

[root@localhost ~]#cut -d ' ' -f-3 file.txt     #print the first three fields
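
To try the field and character selections without touching a real /etc/passwd, here is a minimal sketch on two invented passwd-style lines (the sample data and variable names are assumptions for illustration).

```shell
sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin'

# Fields 1 and 7 (login name and shell); cut keeps the delimiter between them.
names_shells=$(printf '%s\n' "$sample" | cut -d ':' -f 1,7)

# Characters 1 through 3 of every line; the range is inclusive.
first3=$(printf '%s\n' "$sample" | cut -c 1-3)

printf '%s\n' "$names_shells" "$first3"
```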

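The paste command

Usage: paste [OPTION]... [FILE]...
-d, --delimiters=LIST   use characters from LIST instead of TABs as separators
-s, --serial            paste one file at a time instead of in parallel, so each file becomes one output line

[root@localhost ~]#paste test1 test     #merge the two files side by side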
asdfasdfas  1234
asdfasdf    

[root@localhost ~]#echo -n "aaa" | paste -s   #emit the input on a single line of its own
aaa

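The wc command

wc counts the bytes, words, and lines in the given files and prints the totals.

-c, --bytes                    print the byte counts
-m, --chars                    print the character counts
-l, --lines                    print the newline counts
-L, --max-line-length          print the length of the longest line
-w, --words                    print the word counts

[root@localhost ~]#cat /etc/passwd |wc -l    #count the lines in passwd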
38

[root@localhost ~]#echo "aaa bbb ccc" |wc -w    #count the words in the output
3

[root@localhost ~]#echo "12344" |wc -m  #count the characters in the output (the trailing newline counts too)
6

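The sort command

Usage: sort [OPTION]... [FILE]...

 -b, --ignore-leading-blanks           ignore leading blanks
 -d, --dictionary-order                consider only blanks and alphanumeric characters
 -f, --ignore-case                     ignore letter case
 -g, --general-numeric-sort            compare according to general numerical value
 -i, --ignore-nonprinting              consider only printable characters
 -h, --human-numeric-sort              compare human-readable numbers (e.g., 2K 1G)
 -n, --numeric-sort                    compare according to string numerical value
 -R, --random-sort                     sort by a random hash of keys
      --random-source=FILE             get random bytes from FILE
 -r, --reverse                         reverse the result of comparisons
 -V, --version-sort                    natural sort of (version) numbers within text
 -t, --field-separator=SEP             use SEP as the field separator
 -k, --key=KEYDEF                      sort via a key; KEYDEF gives the field (and character) positions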

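[root@localhost ~]#cat /etc/passwd | sort

By default sort compares whole lines as strings, starting from the first character, so the output is in ascending order beginning with the letter a.

[root@localhost ~]#cat /etc/passwd | sort -t ':' -k 3
/etc/passwd is delimited by :, and this sorts on the third field

[root@localhost ~]#cat /etc/passwd | sort -t ':' -k 3n
Sort numerically; the default is a string comparison

[root@localhost ~]#cat /etc/passwd | sort -t ':' -k 3nr
Sort in reverse order; the default is ascending

[root@localhost ~]#cat /etc/passwd | sort -t':' -k 6.2,6.4 -k 1r
Sort /etc/passwd first by the 2nd through 4th characters of the sixth field in ascending order, then by the first field in reverse

[root@localhost ~]#cat /etc/passwd |  sort -t':' -k 7 -u
See which shells /etc/passwd uses: sort on the seventh field, then drop duplicates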
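
The difference between string and numeric ordering of the third field is easy to see on a tiny sample. The four passwd-style lines below are invented for this sketch, and the key is written as `-k 3,3` so that it stops at the end of field 3 rather than running on to the end of the line.

```shell
sample='ftp:x:14:50
root:x:0:0
mail:x:8:12
bin:x:1:1'

# String comparison on field 3: "14" sorts before "8".
by_text=$(printf '%s\n' "$sample" | sort -t ':' -k 3,3 | cut -d ':' -f 1 | tr '\n' ' ')

# Numeric comparison on field 3: 8 sorts before 14.
by_number=$(printf '%s\n' "$sample" | sort -t ':' -k 3,3n | cut -d ':' -f 1 | tr '\n' ' ')

echo "$by_text"
echo "$by_number"
```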

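The uniq command

Usage: uniq [OPTION]... [FILE]
Filters adjacent matching lines from the input file (or standard input), writing to the output file (or standard output). With no options, matching lines are merged into the first occurrence and the adjacent repeats are dropped.

  -c, --count                          prefix lines by the number of occurrences
  -d, --repeated                       only print duplicate lines, one for each group
  -D, --all-repeated[=delimit-method]  print all duplicate lines
  -f, --skip-fields=N                  avoid comparing the first N fields
  -i, --ignore-case                    ignore differences in case when comparing
  -s, --skip-chars=N                   avoid comparing the first N characters
  -u, --unique                         only print unique lines
  -z, --zero-terminated                end lines with '\0' (NUL) instead of newline
  -w, --check-chars=N                  compare no more than N characters in lines

[root@localhost ~]#cat uniqtest    #the test file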
this is a test  
this is a test  
this is a test  
i am tank  
i love tank  
i love tank  
this is a test  
whom have a try  
WhoM have a try  
you  have a try  
i want to abroad  
those are good men  
we are good men  

[root@localhost ~]#uniq -c uniqtest    #uniq only compares adjacent lines when looking for repeats, and duplicated data is often not adjacent
 3 this is a test
 1 i am tank
 2 i love tank
 1 this is a test          #a duplicate of the first line
 1 whom have a try
 1 WhoM have a try
 1 you  have a try
 1 i want to abroad
 1 those are good men
 1 we are good men

[root@localhost ~]#sort uniqtest |uniq -c      #sorting first solves the problem from the previous example
 1 WhoM have a try  
 1 i am tank  
 2 i love tank  
 1 i want to abroad  
 4 this is a test  
 1 those are good men  
 1 we are good men  
 1 whom have a try  
 1 you  have a try  

[root@localhost ~]# uniq -d -c uniqtest      #uniq -d shows only the repeated lines
 3 this is a test  
 2 i love tank  

[root@localhost ~]# uniq -D uniqtest       #uniq -D shows only the repeated lines and prints every copy; it cannot be combined with -c
 this is a test  
 this is a test  
 this is a test  
 i love tank  
 i love tank  

[root@localhost ~]#uniq -f 1 -c uniqtest    #"those" appears on only one line yet is counted twice, because -f 1 skips the first field and compares from the second field on
 3 this is a test  
 1 i am tank  
 2 i love tank  
 1 this is a test  
 2 whom have a try  
 1 you  have a try  
 1 i want to abroad  
 2 those are good men     #only one such line, but counted as two

[root@localhost ~]#uniq -i -c uniqtest     #ignore case when comparing
 3 this is a test  
 1 i am tank  
 2 i love tank  
 1 this is a test  
 2 whom have a try  #one uppercase, one lowercase
 1 you  have a try  
 1 i want to abroad  
 1 those are good men  
 1 we are good men  

[root@localhost ~]#uniq -s 4 -c uniqtest    #skip the first 4 characters when comparing, so "whom have a try" and "you  have a try" count as the same
 3 this is a test  
 1 i am tank  
 2 i love tank  
 1 this is a test  
 3 whom have a try    #compare this with the previous example
 1 i want to abroad  
 1 those are good men  
 1 we are good men  

[root@localhost ~]#uniq -u uniqtest     #print only the lines that are not repeated
 i am tank  
 this is a test  
 whom have a try  
 WhoM have a try  
 you  have a try  
 i want to abroad  
 those are good men  
 we are good men 

[root@localhost ~]#uniq -w 2 -c uniqtest  #compare only the first 2 characters of each line, so "i am tank" and "i love tank" count as the same
 3 this is a test  
 3 i am tank  
 1 this is a test  
 1 whom have a try  
 1 WhoM have a try  
 1 you  have a try  
 1 i want to abroad  
 1 those are good men  
 1 we are good men  

[root@localhost ~]#grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /var/log/nginx/access.log |sort |uniq -c    #count the nginx requests per IP address
      1 101.200.78.64
      2 103.41.52.94
      1 106.185.47.161
      2 113.240.250.155
    260 13.0.782.215
      2 185.130.5.231
     26 192.168.10.16
      6 192.168.10.17
    148 192.168.10.2
    189 192.168.10.202
    270 192.168.10.222
     25 192.168.10.235
    291 192.168.10.3
     12 192.168.10.5
      2 23.251.63.45
     20 7.0.11.0
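
Because uniq only merges adjacent lines, counting pipelines like the one above usually sort first and then sort again by count. A hedged sketch follows: the helper name `top_ips` and the sample addresses are invented, and awk is used only to normalize the padding that `uniq -c` puts in front of each count.

```shell
# top_ips N: read one address per line on stdin, print the N most frequent
# ones with their counts (hypothetical helper name).
top_ips() {
  sort | uniq -c | sort -rn | head -n "$1"
}

top2=$(printf '%s\n' 10.0.0.1 10.0.0.2 10.0.0.1 10.0.0.3 10.0.0.1 10.0.0.2 |
  top_ips 2 | awk '{print $1, $2}')

printf '%s\n' "$top2"
```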

The diff and patch commands

diff shows the differences between two files

[root@localhost ~]#diff test1.rb test.rb            #compare two files



[root@localhost ~]#diff myweb/ html/                  #compare two directories

patch

[root@localhost ~]#diff -Nrua linux-2.6.14/Makefile  linux-2.6.26/Makefile >c.patch   #write the differences to c.patch; view the result with cat c.patch

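The grep command

grep searches one or more files for a string pattern. A pattern that contains spaces must be quoted, and every string after the pattern is treated as a file name. The results go to the screen; the original files are not modified.
The grep family has three members: grep, egrep, and fgrep. fgrep does not support regular expressions.
Here we cover only grep; egrep is left for another time.

 --color=auto              colorize the matched text
 -v                        show the lines that do not match the pattern
 -i                        ignore case
 -n                        show the line number of each matching line
 -c                        print a count of matching lines
 -o                        show only the matched part of each line
 -q                        quiet mode; print nothing
 -A #                      after: also print # lines following each match
 -B #                      before: also print # lines preceding each match
 -C #                      context: also print # lines before and after each match
 -w, --word-regexp         match whole words only
 -x, --line-regexp         match whole lines only
 -E, --extended-regexp     use extended regular expressions (ERE), like egrep
 -F, --fixed-strings       treat the pattern as newline-separated fixed strings, like fgrep
 -G, --basic-regexp        use basic regular expressions (BRE); this is the default
 -P, --perl-regexp         use Perl-compatible regular expressions
 -e, --regexp=PATTERN      give the pattern explicitly; repeat -e to OR several patterns
 -f, --file=FILE           read patterns from FILE
 -z, --null-data           input lines end with a 0 byte instead of a newline

The test file, modeled on /etc/passwd; the examples use both basic and extended regular expressions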

root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/bin/false,aaa,bbbb,cccc,aaaaaa
DADddd:x:2:2:daemon:/sbin:/bin/false
mail:x:8:12:mail:/var/spool/mail:/bin/false
ftp:x:14:11:ftp:/home/ftp:/bin/false
&nobody:$:99:99:nobody:/:/bin/false
zhangy:x:1000:100:,,,:/home/zhangy:/bin/bash
http:x:33:33::/srv/http:/bin/false
dbus:x:81:81:System message bus:/:/bin/false
hal:x:82:82:HAL daemon:/:/bin/false
mysql:x:89:89::/var/lib/mysql:/bin/false
aaa:x:1001:1001::/home/aaa:/bin/bash
ba:x:1002:1002::/home/zhangy:/bin/bash
test:x:1003:1003::/home/test:/bin/bash
zhangying:*:1004:1004::/home/test:/bin/bash
policykit:x:102:1005:Po

a, match lines that contain root

[root@localhost ~]#grep root test  
root:x:0:0:root:/root:/bin/bash  

b, match lines that start with root or with zhang; note the backslashes

[root@localhost ~]#cat test |grep '^\(root\|zhang\)'  
root:x:0:0:root:/root:/bin/bash  
zhangy:x:1000:100:,,,:/home/zhangy:/bin/bash  

c, the same match as the previous example, again with backslashes; -e is implied when it is omitted

[root@localhost ~]#cat test |grep -e '^\(root\|zhang\)'  
root:x:0:0:root:/root:/bin/bash  
zhangy:x:1000:100:,,,:/home/zhangy:/bin/bash  

d, match a string that starts with zhang and otherwise contains only lowercase letters

[root@localhost ~]#echo 'zhangying' |grep '^zhang[a-z]*$'  
zhangying  

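e, match lines that start with bin, here with -E (egrep). -G gives the same result; -F does not, because it treats ^ as a literal character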

[root@localhost ~]#cat test |grep -E '^bin'  
bin:x:1:1:bin:/bin:/bin/false,aaa,bbbb,cccc,aaaaaa  

f, prefix each matching line with its line number in the file or input

[root@localhost ~]#cat test|grep -n zhangy  
7:zhangy:x:1000:100:,,,:/home/zhangy:/bin/bash  
13:ba:x:1002:1002::/home/zhangy:/bin/bash  
15:@zhangying:*:1004:1004::/home/test:/bin/bash 

g, show the lines that do not start with bin, with their line numbers

[root@localhost ~]#cat test|grep -nv '^bin'  
1:root:x:0:0:root:/root:/bin/bash
3:DADddd:x:2:2:daemon:/sbin:/bin/false
4:mail:x:8:12:mail:/var/spool/mail:/bin/false
5:ftp:x:14:11:ftp:/home/ftp:/bin/false
6:&nobody:$:99:99:nobody:/:/bin/false
7:zhangy:x:1000:100:,,,:/home/zhangy:/bin/bash
8:http:x:33:33::/srv/http:/bin/false
9:dbus:x:81:81:System message bus:/:/bin/false
10:hal:x:82:82:HAL daemon:/:/bin/false
11:mysql:x:89:89::/var/lib/mysql:/bin/false
12:aaa:x:1001:1001::/home/aaa:/bin/bash
13:ba:x:1002:1002::/home/zhangy:/bin/bash
14:test:x:1003:1003::/home/test:/bin/bash
15:zhangying:*:1004:1004::/home/test:/bin/bash
16:policykit:x:102:1005:Po

h, print the number of matching lines rather than the lines themselves

[root@localhost ~]#cat test|grep -c zhang  
3  

i, search for system: without -i nothing matches

[root@localhost ~]#grep  system test  
[root@localhost ~]#grep -ni  system test  
9:dbus:x:81:81:System message bus:/:/bin/false  

j, -w zhan matches nothing while -w zhangy does, because zhangy appears in the test file as a whole word

[root@localhost ~]#cat test|grep -w zhan  
[root@localhost ~]#cat test|grep -w zhangy  
zhangy:x:1000:100:,,,:/home/zhangy:/bin/bash  
ba:x:1002:1002::/home/zhangy:/bin/bash  

k, with -x a line is printed only when the pattern matches the entire line

[root@localhost ~]#echo "aaaaaa" |grep -x aaa  
[root@localhost ~]#echo "aaaa" |grep -x aaaa  
aaaa  

l, stop after the first matching line; without -m 1 there would be three

[root@localhost ~]#cat test |grep -m 1 zhang  
zhangy:x:1000:100:,,,:/home/zhangy:/bin/bash  


m, -b prefixes each matching line with its byte offset, counted from the start of the input

[root@localhost ~]#cat test |grep -b zha  
241:zhangy:x:1000:100:,,,:/home/zhangy:/bin/bash  
480:ba:x:1002:1002::/home/zhangy:/bin/bash  
558:@zhangying:*:1004:1004::/home/test:/bin/bash  

n, when searching several files, prefix each matching line with its file name

[root@localhost ~]#grep -H 'root' test test2 testbak  
test:root:x:0:0:root:/root:/bin/bash  
test2:root  
testbak:root:x:0:0:root:/root:/bin/bash  

o, when searching several files, do not prefix matching lines with the file name

[root@localhost ~]#grep -h 'root' test test2 testbak  
root:x:0:0:root:/root:/bin/bash  
root  
root:x:0:0:root:/root:/bin/bash  

p, when searching several files, print only the names of the files that contain a match

[root@localhost ~]#grep -l 'root' test test2 testbak DAta  
test  
test2  
testbak  

q, without -o one line matches, and that line contains root three times; with -o each of the three occurrences is printed on its own line

[root@localhost ~]#grep  'root' test  
root:x:0:0:root:/root:/bin/bash  
[root@localhost ~]#grep -o 'root' test  
root  
root  
root  

r, search recursively. Create a mytest directory under the test directory and copy the test file into it to reproduce the result below

[root@localhost ~]#grep test -R /tmp/test/mytest  
/tmp/test/mytest/test:test:x:1003:1003::/home/test:/bin/bash  
/tmp/test/mytest/test:@zhangying:*:1004:1004::/home/test:/bin/bash  

s, show each line that matches root plus the 3 lines after it

[root@localhost ~]#cat test |grep -A 3 root  
root:x:0:0:root:/root:/bin/bash  
bin:x:1:1:bin:/bin:/bin/false,aaa,bbbb,cccc,aaaaaa  
daemon:x:2:2:daemon:/sbin:/bin/false  
mail:x:8:12:mail:/var/spool/mail:/bin/false  

 

 Search recursively through all files, whatever their names

 [root@localhost ~]#grep -R C1079651000621  *   
20150727/503/20150701000104001317.xml:            C1079651000621
20150727/503/20150701000104001317.xml:            C1079651000621
20150727/503/20150701000104001333.xml:            C1079651000621
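
Several of the options above combine naturally. The snippet below is a minimal sketch on three lines borrowed from the test file; the variable names are assumptions made for the example.

```shell
sample='root:x:0:0:root:/root:/bin/bash
dbus:x:81:81:System message bus:/:/bin/false
zhangy:x:1000:100:,,,:/home/zhangy:/bin/bash'

# -c counts matching lines: how many accounts use bash?
count_bash=$(printf '%s\n' "$sample" | grep -c 'bash$')

# -o prints each match on its own line, so counting those lines counts occurrences.
root_hits=$(printf '%s\n' "$sample" | grep -o 'root' | grep -c '')

# -i ignores case and -n adds the line number.
numbered=$(printf '%s\n' "$sample" | grep -in 'system')

printf '%s\n' "$count_bash" "$root_hits" "$numbered"
```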

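Regular expressions

Basic regular expressions

We start with character matching.
Regular expressions look a lot like the file-name wildcards covered earlier, but they are not for handling file names.
They match file contents or strings. Be clear about the distinction: wildcards match file names, picking out particular strings within a name, while a regular expression matches strings in text, which may come from a file's contents or from the output of a command.
Regular expressions are powerful, and their features fall into four groups: character matching, repetition counts, position anchoring, and grouping.
Matching characters is called character matching.
Matching how many times a character repeats is called a repetition count.
Pinning down where a character appears is called position anchoring.
Combining several characters into a single unit is called grouping.
That is what makes regular expressions so capable.
Regular expressions rely on some special symbols, which are called metacharacters.
For the full details, see the manual page: man 7 regex
Let's look at character matching first.

Character matching

The . metacharacter

To match a single character, use a dot: . stands for exactly one character,
that is, any one character in the text being searched.
For example, to match one character in some text: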

[root@localhost ~]# echo abcd|grep a.
abcd
[root@localhost ~]# echo abcd|grep a..
abcd
[root@localhost ~]# echo abcd|grep a...
abcd

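Each added dot matches one more character. Does abcd match? Yes, and the line is printed, though only the matched part is colored. Chinese characters match too, so the dot matches any character. Note that several dots each match any character; they need not all match the same one.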

[root@localhost ~]# grep r..t /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin

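The output shows the dots matching any characters: not just root but also r/ft. That is what matching any character means.
Each dot still stands for exactly one character.

[ ] bracket expressions

In a regular expression, [ ] also stands for a single character: any one of the characters listed between the brackets.
The command below matches ar or br, the strings allowed by [ab]r.

[root@localhost ~]# echo ar br cr |grep '[ab]r'
ar br cr

ar and br match; cr was not listed, so it is not part of the match

So [ ] stands for exactly one character, taken from those between the brackets:
a choice of one out of two, or one out of many.
A bracket expression can also be negated.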

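[^ ] negated bracket expressions

[^ ] matches any single character outside the given set.
Negation means everything except what is listed in the brackets;
any character that is not in the set will do.

[root@localhost ~]# echo ar br cr |grep '[^ab]r'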
ar br cr
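
Since grep prints the whole line whenever any part of it matches, the two commands above produce identical output. Adding `-o` shows which substrings actually matched. A small sketch (the variable names are made up for this example):

```shell
# [ab]r matches "ar" and "br", but not "cr".
listed=$(echo 'ar br cr' | grep -o '[ab]r' | tr '\n' ' ')

# [^ab]r matches an r preceded by anything except a or b: only "cr" here.
negated=$(echo 'ar br cr' | grep -o '[^ab]r' | tr '\n' ' ')

echo "$listed"
echo "$negated"
```

The patterns are quoted so the shell does not try to expand the brackets as a file-name pattern before grep sees them.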

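Repetition counts

The * metacharacter

* means any number of times, including zero.
Note that it repeats the single character before it, not the whole word.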

[root@localhost ~]# grep goo*gle 11.txt
google
gooooooooooogle

The .* combination

.* matches a string of any length.

[root@localhost ~]# grep g.*gle 11.txt
google
gooooooooooogle
gooooogle
goooooooogle

The \? metacharacter

It matches the character before it one time or zero times.
Put simply, zero times means absent and one time means present.

[root@localhost ~]# grep "go\?gle" 11.txt
ggle
gogle

The \+ metacharacter

We can also ask for one or more occurrences.

[root@localhost ~]# grep "go\+gle" 11.txt
google
gooooooooooogle
gooooogle
goooooooogle
gogle

The \{N\} metacharacter

Matches the preceding character N times,
for example 5 times or 2 times:

[root@localhost ~]# grep "go\{5\}gle" 11.txt
gooooogle
[root@localhost ~]# grep "go\{2\}gle" 11.txt
google

It can also express N or more:

[root@localhost ~]# grep "go\{2,\}gle" 11.txt
google
gooooooooooogle
gooooogle
goooooooogle

or a range:

[root@localhost ~]# grep "go\{2,5\}gle" 11.txt
google
gooooogle

or an upper bound:
[root@localhost ~]# grep "go\{,5\}gle" 11.txt
google
gooooogle
ggle
gogle
That covers repetition counts.
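
In basic regular expressions the interval braces are written with backslashes, while `grep -E` takes them bare. The sketch below runs both spellings on a word list modeled on 11.txt; the list and the variable names are assumptions for this example.

```shell
words='ggle
gogle
google
gooooogle
goooooooogle'

# BRE: between 2 and 5 o's, braces escaped.
bre=$(printf '%s\n' "$words" | grep 'go\{2,5\}gle' | tr '\n' ' ')

# ERE: the same interval, braces unescaped.
ere=$(printf '%s\n' "$words" | grep -E 'go{2,5}gle' | tr '\n' ' ')

# ERE: zero or one o.
optional=$(printf '%s\n' "$words" | grep -E '^go?gle$' | tr '\n' ' ')

printf '%s\n' "$bre" "$ere" "$optional"
```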

Position anchoring

^ anchors the start of a line

[root@localhost ~]# grep "^root"  /etc/passwd
root:x:0:0:root:/root:/bin/bash

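A caret outside brackets, ^, anchors the match to the start of the line.
A caret at the start of a bracket expression, [^ ], negates the set instead.

The $ metacharacter

$ anchors the match to the end of the line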

[root@localhost ~]# grep bash$  /etc/passwd
root:x:0:0:root:/root:/bin/bash
mageedu:x:1000:1000:mageedu:/home/mageedu:/bin/bash
mage:x:1001:1001::/home/mage:/bin/bash
wang:x:1002:1002::/home/wang:/bin/bash
shadow:x:0:1003::/home/shadow:/bin/bash
hyma:x:0:1004::/home/hyma:/bin/bash

It can be combined with other anchors

[root@localhost ~]# grep ^google$ 11.txt
google

Written back to back, ^$ matches an empty line

[root@localhost ~]# grep -n ^$ 11.txt
1:
2:
3:
4:
5:
6:
7:
8:
15:
16:
17:
18:
19:
20:
22:
31:

It can also be used to exclude empty lines

[root@localhost ~]# grep -nv ^$ 11.txt
9:google
10:gooooooooooogle
11:gooooogle
12:goooooooogle
13:ggle
14:gogle
21:        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
23:lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
24:        inet 127.0.0.1  netmask 255.0.0.0
25:        inet6 ::1  prefixlen 128  scopeid 0x10<host>
26:        loop  txqueuelen 1  (Local Loopback)
27:        RX packets 6  bytes 446 (446.0 B)
28:        RX errors 0  dropped 0  overruns 0  frame 0
29:        TX packets 6  bytes 446 (446.0 B)
30:        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Word anchoring

[root@localhost ~]# grep   "^\<lilin\>" /etc/passwd  
lilin:x:1003:1005::/home/lilin:/bin/bash

Anchoring the start of a word

[root@localhost ~]# grep   "^\<lilin" /etc/passwd  
lilin:x:1003:1005::/home/lilin:/bin/bash

Left and right word boundaries

[root@localhost ~]# grep "^\<root\>"  /etc/passwd
root:x:0:0:root:/root:/bin/bash

\b can also mark a word boundary

[root@localhost ~]# grep "^\broot\b"  /etc/passwd 
root:x:0:0:root:/root:/bin/bash
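
To see what word anchoring excludes, compare a plain search with a whole-word search on a made-up line. The sketch uses the `-w` option, which applies the same word-boundary test as wrapping the pattern in \< and \>; the sample text and variable names are assumptions.

```shell
line='root rooted uproot'

# Plain search: "root" occurs inside all three words.
anywhere=$(echo "$line" | grep -o 'root' | grep -c '')

# Whole-word search: only the standalone "root" qualifies.
whole=$(echo "$line" | grep -ow 'root' | grep -c '')

echo "$anywhere"
echo "$whole"
```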

Grouping

Grouping treats several characters as a single unit

[root@localhost ~]# echo rererererre |grep "\(re\)\{3\}" 
rererererre
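
With `-o` it becomes visible that the quantifier applies to the whole group rather than to the last character. The second command is an added illustration of a back-reference, where \1 repeats whatever the first group matched; the input strings and variable names are invented.

```shell
# \(re\)\{3\} matches "re" three times in a row.
three=$(echo 'rererererre' | grep -o '\(re\)\{3\}')

# \1 refers back to the text matched by the first group.
doubled=$(echo 'abab abba' | grep -o '\(ab\)\1')

# The ERE spelling of the first pattern drops the backslashes.
three_ere=$(echo 'rererererre' | grep -oE '(re){3}')

printf '%s\n' "$three" "$doubled" "$three_ere"
```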

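Pattern	Description
.	matches any single character
[ ]	matches any single character in the given set
[^ ]	matches any single character outside the given set
[:alnum:]	letters and digits
[:alpha:]	any letter, upper or lower case, i.e. A-Z, a-z
[:lower:]	lowercase letters
[:upper:]	uppercase letters
[:blank:]	blank characters (space and tab)
[:space:]	horizontal and vertical whitespace (a wider set than [:blank:])
[:cntrl:]	non-printing control characters (backspace, delete, bell...)
[:digit:]	decimal digits
[:xdigit:]	hexadecimal digits
[:graph:]	printable characters other than whitespace
[:print:]	printable characters

Those are the basic regular expression metacharacters on Linux.
Regular expressions come in two flavors: BRE, basic regular expressions, and ERE, extended regular expressions.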

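| BRE metacharacter | Description | Example |
| :-------- | :-------- | :-------- |
| . | matches any single character | a.c |
| [ ] | matches any single character in the set | [a-z] |
| [ ] | [[:digit:]]: matches a digit 0-9 | 1[[:digit:]] |
| [ ] | [[:alpha:]]: matches any letter, upper or lower case | a[[:alpha:]] |
| [ ] | [[:alnum:]]: matches any letter or digit: 0-9, a-z, A-Z | a[[:alnum:]]789 |
| [ ] | [[:blank:]]: matches a space or tab | Hello[[:blank:]]world |
| [ ] | [[:lower:]]: matches a lowercase letter a-z | abcde[[:lower:]]g |
| [ ] | [[:upper:]]: matches an uppercase letter A-Z | ABCDEF[[:upper:]]G |
| [ ] | [[:print:]]: matches any printable character | |
| [ ] | [[:punct:]]: matches a punctuation character | attention[[:punct:]] |
| [ ] | [[:space:]]: matches any whitespace character | Hello[[:space:]]world |
| [^ ] | matches any single character outside the set | [^a-z] |
| * | matches the preceding character any number of times (0, 1, or more) | ab* |
| .* | any characters, any length | .* matches a whole line |
| \+ | matches the preceding character at least once | a\+b |
| \? | matches the preceding character 0 or 1 times | a\?b |
| \{m\} | the preceding character appears exactly m times; m is a non-negative integer | a\{4\} |
| \{m,n\} | the preceding character appears at least m and at most n times | b\{2,4\} |
| ^ | anchors at the start of a line | ^Head |
| $ | anchors at the end of a line | tail$ |
| \< or \b | matches the left edge of a word | \<Hello |
| \> or \b | matches the right edge of a word | hello\> |
| \(x\) | treats whatever x matches as a single unit | |
| \(x\) | groups are numbered by their opening parenthesis: pat1\(pat2\)pat3\(pat4\(pat5\)pat6\) | \2 refers to pat4\(pat5\)pat6 |
| \n | the text matched by the nth parenthesized group | \1 |

Those are the BRE metacharacters.
The ERE metacharacters follow.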

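| ERE metacharacter | Description | Example |
| :-------- | :-------- | :-------- |
| . | matches any single character | a.c |
| [ ] | matches any single character in the set | [a-z] |
| [ ] | [[:digit:]]: matches a digit 0-9 | 1[[:digit:]] |
| [ ] | [[:alpha:]]: matches any letter, upper or lower case | a[[:alpha:]] |
| [ ] | [[:alnum:]]: matches any letter or digit: 0-9, a-z, A-Z | a[[:alnum:]]789 |
| [ ] | [[:blank:]]: matches a space or tab | Hello[[:blank:]]world |
| [ ] | [[:lower:]]: matches a lowercase letter a-z | abcde[[:lower:]]g |
| [ ] | [[:upper:]]: matches an uppercase letter A-Z | ABCDEF[[:upper:]]G |
| [ ] | [[:print:]]: matches any printable character | |
| [ ] | [[:punct:]]: matches a punctuation character | attention[[:punct:]] |
| [ ] | [[:space:]]: matches any whitespace character | Hello[[:space:]]world |
| [^ ] | matches any single character outside the set | [^a-z] |
| * | matches the preceding character any number of times (0, 1, or more) | ab* |
| .* | any characters, any length | .* matches a whole line |
| + | matches the preceding character at least once | a+b |
| ? | matches the preceding character 0 or 1 times | a?b |
| {m} | the preceding character appears exactly m times; m is a non-negative integer | a{4} |
| {m,n} | the preceding character appears at least m and at most n times | b{2,4} |
| ^ | anchors at the start of a line | ^Head |
| $ | anchors at the end of a line | tail$ |
| \< or \b | matches the left edge of a word | \<hello |
| \> or \b | matches the right edge of a word | hello\> |
| (x) | treats whatever x matches as a single unit | |
| (x) | groups are numbered by their opening parenthesis: pat1(pat2)pat3(pat4(pat5)pat6) | \2 refers to pat4(pat5)pat6 |
| \n | the text matched by the nth parenthesized group | \1 |

One last metacharacter, the vertical bar |, is left out of the tables because it collides with the table syntax. It matches either the pattern on its left or the pattern on its right.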
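
A short sketch of alternation, assuming three invented passwd-style lines: with `grep -E` the bar is written bare inside a group, and repeating `-e` gives the same either-or behavior without any alternation syntax.

```shell
sample='root:x:0:0
zhangy:x:1000:100
bin:x:1:1'

# ERE alternation: lines starting with root or with zhang.
alt_ere=$(printf '%s\n' "$sample" | grep -E '^(root|zhang)' | cut -d ':' -f 1 | tr '\n' ' ')

# The same result with two separate patterns joined by repeated -e.
alt_multi=$(printf '%s\n' "$sample" | grep -e '^root' -e '^zhang' | cut -d ':' -f 1 | tr '\n' ' ')

echo "$alt_ere"
echo "$alt_multi"
```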
