sed and awk 2022-10-13

2022-10-13  9_SooHyun

sed

sed, the stream editor, is typically used to modify files.
sed [OPTION]... {script-only-if-no-other-script} [input-file]...
[] optional
{} required, chosen from the given set
<> required

The sed script (script-only-if-no-other-script) is usually wrapped in single or double quotes.

sed [options] 'command' file(s) # single quotes: the script reaches sed exactly as written
sed [options] "command" file(s) # double quotes: the shell expands $variables before sed sees the script
sed [options] -f scriptfile file(s)
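
The quoting difference is easiest to see side by side. The following is a minimal sketch; the variable name word and the sample text are made up for illustration.

```shell
word=foo   # hypothetical shell variable

# Double quotes: the shell expands $word, so sed receives s/foo/baz/
double=$(printf 'foo bar\n' | sed "s/$word/baz/")

# Single quotes: sed receives the literal text $word, which does not occur in the input
single=$(printf 'foo bar\n' | sed 's/$word/baz/')

printf '%s\n%s\n' "$double" "$single"
```

The double-quoted form prints "baz bar", while the single-quoted form leaves the line as "foo bar".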

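sed processes the data in a text file according to script commands, which are either typed on the command line or stored in a file.
sed works through the input in the following order:
1. Read one line of input into the pattern space.
2. Apply the script's commands to that line, in order.
3. Print the result and move on to the next line.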

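By default, a sed command applies to every line of the input; in other words, sed's default address is all lines.
To apply a command only to a specific line or set of lines, specify an address in {script-only-if-no-other-script}.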

address specified for sed

An address determines which lines a sed command applies to.
A sed command can specify zero, one, or two addresses.
An address can be a line number, a line addressing symbol, or a regular expression that describes a pattern.

To illustrate how addressing works, let's look at examples using the delete command, d.
A script consisting of simply the d command and no address:
d
produces no output since it deletes all lines.

one-address situation

When a line number is supplied as an address, the command affects only that line. For instance, the following example deletes only the first line:
1d
The line number refers to an internal line count maintained by sed. This counter is not reset for multiple input files. Thus, no matter how many files were specified as input, there is only one line 1 in the input stream.

Similarly, the input stream has only one last line. It can be specified using the addressing symbol, $. The following example deletes the last line of input:
$d
The $ symbol should not be confused with the $ used in regular expressions, where it means the end of the line.

When a regular expression is supplied as an address, the command affects only the lines matching that pattern. The regular expression must be enclosed by slashes (/). The following delete command:
/^$/d
deletes only blank lines. All other lines are passed through untouched.
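
The three single-address forms can be tried on a small input. This is only a sketch with made-up sample lines; the variable names are illustrative.

```shell
input=$(printf 'first\n\nmiddle\nlast')   # four lines, the second one blank

no_first=$(printf '%s\n' "$input" | sed '1d')      # line-number address: drop line 1
no_last=$(printf '%s\n' "$input" | sed '$d')       # $ address: drop the last line
no_blank=$(printf '%s\n' "$input" | sed '/^$/d')   # regex address: drop blank lines

printf '%s\n' "$no_blank"
```

The last command prints first, middle, and last on separate lines, with the blank line gone.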

two-address situation

If you supply two addresses, then you specify a range of lines over which the command is executed. The following example shows how to delete all lines surrounded by a pair of macros, in this case, .TS and .TE, that mark a table as tbl input:
/^\.TS/,/^\.TE/d
It deletes all lines beginning with the line matched by the first pattern up to and including the line matched by the second pattern. Lines outside this range are not affected. If there is more than one table (another .TS/.TE pair after the first), those tables will also be deleted.

The following command deletes from line 50 to the last line in the file:
50,$d
You can mix a line address and a pattern address:
1,/^$/d
This example deletes from the first line up to the first blank line.
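
The range forms can be checked on a small made-up document. This is a sketch only: the sample text and variable names are illustrative, and line 3 stands in for line 50 to keep the input short.

```shell
doc=$(printf 'intro\n.TS\nrow 1\n.TE\nmiddle\n.TS\nrow 2\n.TE\nend')

# Pattern range: every .TS/.TE block is removed, delimiter lines included
no_tables=$(printf '%s\n' "$doc" | sed '/^\.TS/,/^\.TE/d')

# Line number to end of input: only the first two lines survive
head_only=$(printf '%s\n' "$doc" | sed '3,$d')

# Line number to pattern: drop a header that ends at the first blank line
body=$(printf 'To: a\nFrom: b\n\nhello\n' | sed '1,/^$/d')

printf '%s\n' "$no_tables"
```

Both tables disappear from the first result, leaving intro, middle, and end.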

Advanced sed: hold space & pattern space

h H Copy/append pattern space to hold space.
g G Copy/append hold space to pattern space.
n N Read/append the next line of input into the pattern space.

When sed reads a file line by line, the line currently being read is placed in the pattern buffer (pattern space). The pattern buffer is a temporary scratchpad that holds the current line. When you tell sed to print, it prints the pattern buffer.

The hold buffer (hold space) is like long-term storage: you can capture something, store it, and reuse it later when sed is processing another line. Most commands do not operate on the hold space directly; instead, you copy or append it to the pattern space when you want to do something with it. For example, the print command p prints the pattern space only. Likewise, s operates on the pattern space.
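
As a small illustration of moving data between the two buffers, the sketch below swaps each pair of adjacent lines using h, n, and G. The input is made up, and an even number of lines is assumed so that n always has a next line to read.

```shell
# h: copy the current line to the hold space
# n: replace the pattern space with the next input line (-n suppresses the usual print)
# G: append the held line to the pattern space, giving "second\nfirst"
# p: print the pattern space
swapped=$(printf '1\n2\n3\n4\n' | sed -n 'h;n;G;p')

printf '%s\n' "$swapped"
```

This prints 2, 1, 4, 3, one per line.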

Here is an example:

sed -n '1!G;h;$p'
(the -n option suppresses automatic printing of lines)

There are three commands here: 1!G, h and $p. 1!G has an address, 1 (first line), but the ! means that the command will be executed everywhere but on the first line. $p on the other hand will only be executed on the last line. So what happens is this:

echo -e "1\n2\n3\n4" | sed -n '1!G;h;$p'
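
The state of the two buffers for this four-line input can be traced step by step, as in the comments of the sketch below. It uses printf instead of echo -e only for portability; the sed script is unchanged.

```shell
# line 1: G is skipped (1!)    -> pattern "1";           h -> hold "1"
# line 2: G appends the hold   -> pattern "2\n1";        h -> hold "2\n1"
# line 3: G                    -> pattern "3\n2\n1";     h -> hold "3\n2\n1"
# line 4: G                    -> pattern "4\n3\n2\n1";  h, then $p prints it
reversed=$(printf '1\n2\n3\n4\n' | sed -n '1!G;h;$p')

printf '%s\n' "$reversed"
```

The output is 4, 3, 2, 1, one per line: the one-liner prints its input in reverse line order.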

sed for string replacing

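sed s/oldcontent/newcontent/ : replaces oldcontent in the pattern space with newcontent (only the first match on each line, unless the g flag is given)
Speed optimization: when execution speed matters for some reason (a large input file, a slow CPU or disk, and so on), consider putting an address expression in front of the substitute command ("s/.../.../"). For example:
sed 's/foo/bar/g' filename # standard substitute command
sed '/foo/s/foo/bar/g' filename # /foo/ in front acts as an address; can be faster
sed '/foo/s//bar/g' filename # shorthand: the empty regex reuses the last one, /foo/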
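
The sketch below checks that the three forms produce the same text, and shows what changes when the g flag is left out. It says nothing about speed; the sample input and variable names are made up.

```shell
text=$(printf 'foo foo\nother line\nfoo')

first_only=$(printf '%s\n' "$text" | sed 's/foo/bar/')        # no g flag: first match per line only
plain=$(printf '%s\n' "$text" | sed 's/foo/bar/g')            # every match on every line
addressed=$(printf '%s\n' "$text" | sed '/foo/s/foo/bar/g')   # s runs only on lines matching /foo/
shorthand=$(printf '%s\n' "$text" | sed '/foo/s//bar/g')      # empty regex reuses /foo/

printf '%s\n' "$plain"
```

Here plain, addressed, and shorthand all hold the same three lines, while first_only still contains one foo on its first line.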

regex for sed

https://www.gnu.org/software/sed/manual/html_node/Extended-regexps.html
The only difference between basic and extended regular expressions is in the behavior of a few characters: ‘?’, ‘+’, parentheses, and braces (‘{}’).
Basic regular expressions require these to be escaped if you want them to behave as special characters; when using extended regular expressions, you must escape them if you want them to match a literal character.
For example:
to match "tt",

By default, sed uses POSIX.2 basic regular expressions (BREs).
The -E option enables extended regular expressions (EREs) in the script.

[root@TENCENT64 /]# echo "abc+def" | sed 's/+/--/g'
abc--def
-> BRE is the default, so + is literal and gets replaced

[root@TENCENT64 /]# echo "abc+def" | sed -E 's/+/--/g'
sed: -e expression #1, char 8: Invalid preceding regular expression
-> with ERE, + means "one or more times", and a lone + is not a valid regular expression
[root@TENCENT64 /]# echo "abc+def" | sed -E 's/\+/--/g'
abc--def
-> with ERE, \+ matches the literal "+" character
[root@TENCENT64 /]# echo "abc+def" | sed -E 's/c+/--/g'
ab--+def
-> with ERE, c+ means "c one or more times"
[root@TENCENT64 /]# echo "abc+def" | sed -E 's/c\+/--/g'
ab--def
-> with ERE, c\+ matches the literal string "c+"
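
The same rule applies to parentheses and braces. A brief sketch with made-up input:

```shell
# BRE (default): grouping and interval operators need backslashes
bre=$(echo 'ababab' | sed 's/\(ab\)\{2\}/X/')

# ERE (-E): the same operators are written bare
ere=$(echo 'ababab' | sed -E 's/(ab){2}/X/')

# ERE: escaped parentheses match literal characters
literal=$(echo 'f(x)' | sed -E 's/\(x\)/[x]/')

printf '%s\n%s\n%s\n' "$bre" "$ere" "$literal"
```

The first two commands both print Xab, and the third prints f[x].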

For more usage, see info sed.

awk

awk is a powerful text search and extraction command with rich support for filtering and extraction. Using awk feels like using a small database.

NAME
       awk - pattern-directed scanning and processing language

SYNOPSIS
       awk [ -F fs ] [ -v var=value ] [ 'prog' | -f progfile ] [ file ...  ]

Awk scans each input file for lines that match any of a set of patterns
       specified literally in prog or in one or more files specified as -f
       progfile.

-F:
The -F fs option defines the input field separator to be the regular expression fs. 
-F defines the separator used to split a line of text into fields. An input line is normally made up of fields separated by white space, or by the regular expression FS.
The fields are denoted $1, $2, ..., while $0 refers to the entire line. If FS is null, the input line is split into one field per character.
$1, $2, and so on can be referenced in the prog that follows.
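
A short sketch of field splitting, using made-up passwd-style input:

```shell
# Split on ":" instead of white space; $1 and $3 are the first and third fields
fields=$(printf 'root:x:0:0\nalice:x:1000:1000\n' | awk -F: '{print $1, $3}')

# With the default separator, any run of blanks separates two fields
second=$(printf 'a  b   c\n' | awk '{print $2}')

printf '%s\n%s\n' "$fields" "$second"
```

The first command prints "root 0" and "alice 1000"; the second prints b.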

-v:
The option -v followed by var=value is an assignment to be done before prog is executed; any number of -v options may be present.  
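
One common use of -v is passing a shell value into the program. A minimal sketch; limit and threshold are hypothetical names chosen for this example.

```shell
limit=10   # hypothetical shell value passed into awk

# -v assigns the awk variable "threshold" before the program starts
above=$(printf '5\n15\n20\n' | awk -v threshold="$limit" '$1 > threshold {print $1}')

printf '%s\n' "$above"
```

Only 15 and 20 are printed.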

prog:
An awk prog (program) has the following form:
pattern1 {action1} pattern2 {action2} …

With each pattern there can be an associated action that will be performed when a line of a file matches the pattern. 
Each line is matched against the pattern portion of every pattern-action statement; the associated action is performed for each matched pattern.  
For each line, action1 runs if the line matches pattern1, action2 runs if it matches pattern2, and so on.
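
The sketch below runs a program with two pattern-action statements over made-up input. A line that matches both patterns triggers both actions, in program order.

```shell
matches=$(printf 'apple 3\nbanana 0\nmango 2\n' | awk '
  /an/   { print "has an:", $1 }     # regular-expression pattern
  $2 > 0 { print "positive:", $1 }   # expression pattern on the second field
')

printf '%s\n' "$matches"
```

apple matches only the second pattern, banana only the first, and mango matches both, so it is reported twice.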

e.g.
awk '$6 != 0 {print $0}' filetest # print a line if its sixth field is not 0

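awk built-in variables

NR: number of records (lines) read so far, not the number of lines output (Number of Records)
FNR: the same count scoped to the current input file; it restarts at 1 for each file (File Number of Records)
NF: number of fields in the current line (Number of Fields)
Trick: with multiple input files, NR==FNR means the current line is in the first file, and NR>FNR means it is not.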
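
A typical use of the NR==FNR trick is to load the first file into an array and then filter the second file against it. The sketch below assumes two small made-up files, keys.txt and data.txt, created in a temporary directory.

```shell
tmpdir=$(mktemp -d)
printf 'alice\ncarol\n' > "$tmpdir/keys.txt"                 # first file: names to keep
printf 'alice 30\nbob 25\ncarol 41\n' > "$tmpdir/data.txt"   # second file: records

# NR==FNR holds only while the first file is being read: remember its names, then skip ahead.
# For the second file NR>FNR, so only the second rule applies and prints remembered names.
matched=$(awk 'NR==FNR {keep[$1] = 1; next} $1 in keep' "$tmpdir/keys.txt" "$tmpdir/data.txt")

rm -rf "$tmpdir"
printf '%s\n' "$matched"
```

This prints the records for alice and carol and drops bob.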

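awk built-in special patterns

BEGIN: matches the position before the first line of the first input file
END: matches the position after the last line of the last input file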
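
A minimal sketch of the two special patterns around an ordinary per-line action, using made-up numeric input:

```shell
report=$(printf '1\n2\n3\n' | awk '
  BEGIN { print "start" }        # runs once, before any input is read
        { sum += $1 }            # runs for every input line
  END   { print "sum:", sum }    # runs once, after the last line
')

printf '%s\n' "$report"
```

The output is "start" followed by "sum: 6".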

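awk actions

Actions can perform arithmetic with the operators + - * / % (and ^ for exponentiation). Variables are used directly, without declaration.
Multiple statements in an action can be separated by ;
awk has only two types: numbers (double-precision floating point) and strings. Strings are concatenated by writing them side by side, separated by a space, and the result can be assigned to a variable.

awk also supports the control structures common in general-purpose languages (if, while, and for), written the same way as in C.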
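
The sketch below combines these pieces: an undeclared variable, a C-style for loop and if/else, arithmetic, and string concatenation. The input and the names total, label, and kind are made up for illustration.

```shell
rows=$(printf '1 2 3\n4 5 6\n' | awk '{
  total = 0                                      # variables need no declaration
  for (i = 1; i <= NF; i++) total += $i          # C-style loop over the fields
  label = "row" NR                               # concatenation: operands side by side
  if (total % 2 == 0) kind = "even"; else kind = "odd"
  print label, total, kind
}')

printf '%s\n' "$rows"
```

This prints "row1 6 even" and "row2 15 odd".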

[root@VM-165-116-centos ~]# free
              total        used        free      shared  buff/cache   available
Mem:       16132456     2242816     1608600      274992    12281040    13378556
Swap:       1048572       64256      984316
[root@VM-165-116-centos ~]# free | awk '{print $2}'
used
16132456
1048572
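# Here the header "used" ends up aligned with the total column, because $2 on the first line and $2 on the remaining lines refer to different columns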

[root@VM-165-116-centos ~]# free | awk '{if(NR==1){print $2}else{print $3}}'
used
2245456
64256
[root@VM-165-116-centos ~]#
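Both sed and awk support scripts (sed's script, awk's prog), which lets them implement fairly complex logic and cover typical content extraction, search, and modification needs. perl appears to be even more powerful and may be able to replace both sed and awk directly; to be covered in a later update.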