Python Regular Expressions (Part 3)

2020-12-01, by 名本无名

Preface

The previous two parts covered the syntax of Python regular expressions. This part looks at how the functions in the re module are used.

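Common functions

1. search

Introduction

re.search(pattern, string, flags=0)

pattern: the regular expression to match
string: the target string
flags: matching flags (modes)

Scans through string looking for the first location where pattern produces a match, and returns the corresponding match object.

Returns None if no position in the string matches. Note that this is different from finding a zero-length match.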

Example

import re

ans = re.search('abc', 'abcdd')
if ans:
    print('Search result: ', ans.group())
else:
    print('No match')
# out: Search result:  abc
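The difference between None and a zero-length match is easy to miss. A minimal sketch, with patterns and variable names chosen purely for illustration:

```python
import re

# 'x*' may match zero characters, so search() succeeds at position 0
# with an empty match.
empty = re.search(r'x*', 'abc')
print(empty is not None, repr(empty.group()))
# out: True ''

# 'x+' needs at least one 'x', so nothing matches and None is returned.
missing = re.search(r'x+', 'abc')
print(missing)
# out: None
```

A match object is always truthy, even when it matched the empty string, so the `if ans:` check used above distinguishes the two cases correctly.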

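2. match

Introduction

re.match(pattern, string, flags=0)

The parameters have the same meaning as above.

If zero or more characters at the beginning of string match the regular expression, returns the corresponding match object.

Returns None if the string does not match. Note that this is different from a zero-length match.

Note: even in multiline mode, re.match() only matches at the beginning of the string, not at the beginning of each line.

To find a match anywhere in string, use search() instead.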

Example

ans = re.match('abc', 'abcdd')
if ans:
    print('Match result: ', ans.group())
else:
    print('No match')
# out: Match result:  abc

# The pattern does not match at position 0, so match() fails
ans = re.match('abc', 'babcdd')
if ans:
    print('Match result: ', ans.group())
else:
    print('No match')
# out: No match
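The multiline behavior of match() can be checked directly. A small sketch with an illustrative two-line string:

```python
import re

text = 'first\nsecond'

# Even with re.MULTILINE, match() only tries position 0 of the string.
at_start = re.match(r'^second', text, flags=re.MULTILINE)
print(at_start)
# out: None

# search() with re.MULTILINE lets ^ match at the start of the second line.
anywhere = re.search(r'^second', text, flags=re.MULTILINE)
print(anywhere.group(), anywhere.start())
# out: second 6
```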

3. fullmatch

Introduction

re.fullmatch(pattern, string, flags=0)

The whole string must match the regular expression.

Returns the corresponding match object on success, otherwise None.

Example

# '.' matches any single character, so the pattern covers all six characters
ans = re.fullmatch('abc.dd', 'abcddd')

if ans:
    print('Match result: ', ans.group())
else:
    print('No match')
# out: Match result:  abcddd
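To see how fullmatch() differs from match(), the same pattern can be run through both. A minimal sketch reusing the earlier 'abcdd' string:

```python
import re

# match() only anchors at the start, so trailing characters are allowed.
prefix = re.match(r'abc', 'abcdd')
print(prefix.group())
# out: abc

# fullmatch() requires the pattern to consume the entire string.
whole = re.fullmatch(r'abc', 'abcdd')
print(whole)
# out: None
```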

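4. split

Introduction

re.split(pattern, string, maxsplit=0, flags=0)

Splits string by the occurrences of pattern.

If pattern contains capturing parentheses, the text of every group is also included in the resulting list.

If maxsplit is nonzero, at most maxsplit splits are made, and the remainder of the string is returned as the last element of the list.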

Example

# Split on runs of non-word characters (anything other than letters, digits, and underscore)
re.split(r'\W+', 'Words, words, words.')
# out: ['Words', 'words', 'words', '']

# With a capturing group, the separators are kept in the result list
re.split(r'(\W+)', 'Words, words, words.')
# out: ['Words', ', ', 'words', ', ', 'words', '.', '']

# Split only once
re.split(r'\W+', 'Words, words, words.', maxsplit=1)
# out: ['Words', 'words, words.']

# Split on runs of characters in [a-f], ignoring case (two equivalent spellings)
re.split('(?i)[a-f]+', '0a3aB9')
re.split('[a-f]+', '0a3aB9', flags=re.IGNORECASE)
# out: ['0', '3', '9']

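5. findall

Introduction

re.findall(pattern, string, flags=0)

Returns all non-overlapping matches of pattern in string as a list. The string is scanned from left to right, and matches are returned in the order found.

The result depends on the number of groups in the pattern: with no groups it is a list of whole matches, with one group a list of that group's text, and with several groups a list of tuples.

Empty matches are included in the result.

The previous two parts used findall throughout, so it is not covered in detail here.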
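Still, the way groups change the return value is easy to forget. A brief sketch, using a made-up 'key=value' string for illustration:

```python
import re

text = 'width=10, height=20'

# No group: each item is the whole match.
whole = re.findall(r'\w+=\d+', text)
print(whole)
# out: ['width=10', 'height=20']

# One group: each item is the text of that group.
values = re.findall(r'\w+=(\d+)', text)
print(values)
# out: ['10', '20']

# Several groups: each item is a tuple with one entry per group.
pairs = re.findall(r'(\w+)=(\d+)', text)
print(pairs)
# out: [('width', '10'), ('height', '20')]
```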

6. finditer

Introduction

re.finditer(pattern, string, flags=0)

Much like findall, except that it returns an iterator that yields match objects.

Example

# Each item is a match object; group() gives the matched text
for ans in re.finditer(r'\w+', 'Words, words, words.'):
    print(ans.group(), end='\t')
# out: Words words words
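Because finditer yields match objects rather than strings, position information is available as well. A small sketch (the variable name positions is only illustrative):

```python
import re

# Collect the matched text together with its start and end offsets.
positions = [(m.group(), m.start(), m.end())
             for m in re.finditer(r'\w+', 'Words, words, words.')]
print(positions)
# out: [('Words', 0, 5), ('words', 7, 12), ('words', 14, 19)]
```

This is something findall cannot provide, since it returns only the matched text.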

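7. sub

Introduction

re.sub(pattern, repl, string, count=0, flags=0)

Replaces the substrings of string that match pattern with repl and returns the resulting string.

If the pattern is not found, string is returned unchanged.

repl can be a string or a function.

String: backslash escapes in it are processed. For example, \n is converted to a single newline character. Unknown escapes made of a backslash and an ASCII letter raise an error (since Python 3.7), while other unknown escapes such as \& are left as they are. Backreferences such as \2 are replaced with the substring matched by group 2 of the pattern.
Function: it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument and returns the replacement string.

The optional argument count is the maximum number of occurrences to replace. It must be a non-negative integer; the default, 0, replaces all occurrences.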

Example

re.sub(r'\w+', '123', 'hello, world, hello python')
# out: '123, 123, 123 123'

# pattern: matches a Python function definition without parameters
# repl: \1 refers to the captured function name myfunc; the rest is output as written
re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
       r'static PyObject*\npy_\1(void)\n{',
       'def myfunc():')
# out: 'static PyObject*\npy_myfunc(void)\n{'

def dashrepl(matchobj):
    # A single dash becomes a space; a double dash becomes a single dash
    if matchobj.group(0) == '-':
        return ' '
    else:
        return '-'

re.sub('-{1,2}', dashrepl, 'pro----gram-files')
# out: 'pro--gram files'
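The count argument and a function repl can also be tried on the first example string. A minimal sketch (the lambda and variable names are illustrative only):

```python
import re

text = 'hello, world, hello python'

# count=2 replaces only the first two matches and leaves the rest alone.
limited = re.sub(r'\w+', '123', text, count=2)
print(limited)
# out: 123, 123, hello python

# A function repl receives each match object and returns the replacement.
shouted = re.sub(r'hello', lambda m: m.group(0).upper(), text)
print(shouted)
# out: HELLO, world, HELLO python
```

Passing count by keyword keeps it from being confused with flags, which is the next positional parameter.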

8. subn

Introduction

re.subn(pattern, repl, string, count=0, flags=0)

Same as sub(), but returns a tuple (new string, number of substitutions made).

Example

re.subn(r'\w+', '123', 'hello, world, hello python')
# out: ('123, 123, 123 123', 4)
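The substitution count is handy for checking whether anything changed at all. A short sketch with made-up input strings:

```python
import re

# Collapse each run of whitespace into one space and count the runs replaced.
cleaned, n = re.subn(r'\s+', ' ', 'a   b \t c')
print(repr(cleaned), n)
# out: 'a b c' 2

# No match: the string comes back unchanged and the count is 0.
unchanged, zero = re.subn(r'\d+', '#', 'abc')
print(repr(unchanged), zero)
# out: 'abc' 0
```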

Summary

That was a lot of functions in one go, and it will probably take a while to digest them all.

That's all for today.

See you tomorrow.
