Python notes: handling regular expressions with the re module

2025-05-08  _百草_

Study tip: while working through the re module, go back over regular-expression syntax so the examples are easier to follow.

1. Introduction to re

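In Python, regular expressions are handled by the re module (re is part of the standard library, so nothing needs to be installed).
import re  # import the module
Most module-level functions follow the form re.func(pattern, string, flags=0).

r prefix (raw string): backslashes in the literal are not processed as escape sequences, so r"\n" is the two characters \ and n (equal to "\\n"), whereas "\n" is a single newline character. Patterns are usually written as raw strings so that sequences such as \d, \b and \1 reach the re module unchanged.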
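A minimal sketch showing that r"\n" and "\n" are different strings, and why the difference matters for some escapes such as \b (the sample strings are made up for illustration):

```python
import re

plain = "\n"   # one character: a newline
raw = r"\n"    # two characters: a backslash and the letter n
print(len(plain), len(raw))  # 1 2
print(raw == "\\n")          # True

# Both patterns still match a newline, because re itself interprets the escape \n
print(bool(re.search(raw, "a\nb")), bool(re.search(plain, "a\nb")))  # True True

# \b is where raw strings matter: in a normal literal "\b" is a backspace character,
# while r"\b" reaches re unchanged and means "word boundary"
words = "cat concat cat"
with_raw = re.findall(r"\bcat\b", words)
without_raw = re.findall("\bcat\b", words)
print(with_raw)     # ['cat', 'cat']
print(without_raw)  # []
```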


2. re functions

2.1 re.match

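string = "abcdefgh12345"

# def match(pattern, string, flags=0):
#     """Try to apply the pattern at the start of the string, returning
#     a Match object, or None if no match was found."""
# Matches only at the start of the string: returns None on failure, otherwise a Match object
res = re.match(r"\w", string)
print(res)  # successful match: <re.Match object; span=(0, 1), match='a'>
if res:  # not None
    print(res.group())  # a

# Named groups (?P<name>...). match() only tries position 0, so on string it
# returns None ("abc" is not followed by a digit); on "fgh12345" it succeeds
print(re.match(r"(?P<letters>[a-zA-Z]{2,3})(?P<num>\d+?)", string))  # None
res = re.match(r"(?P<letters>[a-zA-Z]{2,3})(?P<num>\d+?)", "fgh12345")
if res:
    print(res.group("letters"), res.group("num"))  # fgh 1 (\d+? is lazy, so num takes one digit)

# Match a tag pair: the closing tag must repeat the opening tag's name
html = "<h1>Hello World!</h1>"
res = re.match(r"<(?P<tag>[a-zA-Z1-6]+?)>.*?</(?P=tag)>", html)  # named backreference (?P=tag)
res = re.match(r"<([a-zA-Z1-6]+?)>.*?</\1>", html)  # numbered backreference \1
print(res.group(), res.group(1))  # <h1>Hello World!</h1> h1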

re.fullmatch(pattern, string, flags=0): returns a Match object if the entire string matches the pattern, otherwise None.
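A small comparison of match() and fullmatch() on the same made-up string; match() only needs the pattern to fit at the start, fullmatch() needs it to cover everything:

```python
import re

text = "abcdefgh12345"
m1 = re.match(r"[a-z]+", text)         # only needs to match at the start
m2 = re.fullmatch(r"[a-z]+", text)     # must cover the whole string, so it fails here
m3 = re.fullmatch(r"[a-z]+\d+", text)  # letters followed by digits, up to the end
print(m1.group())  # abcdefgh
print(m2)          # None
print(m3.group())  # abcdefgh12345
```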

2.2 re.search

Scans the whole string and returns the first match (a Match object, or None if nothing matches).

# def search(pattern, string, flags=0):
res=re.search(r"([a-zA-Z]{2,3})(\d+?)",string)
print(res) # <re.Match object; span=(5, 9), match='fgh1'>
if res:
    print(res.groups()) # ('fgh', '1')
    print(res.group(1)) # fgh
    print(res.group(2)) # 1

2.3 re.findall

Scans the whole string and returns all non-overlapping matches as a list.

# re.findall
res=re.findall(r"(?P<letters>[a-zA-Z]{2,3})(?P<num>\d+?)",string)
print(res) # [('fgh', '1')]
res=re.findall(r"\d{2}",string)
print(res) # ['12', '34']  (with r"\d" instead, the result would be ['1', '2', '3', '4', '5'])
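The shape of findall()'s result depends on how many capturing groups the pattern has, which is why the named-group pattern above returns tuples. A short sketch on a made-up string:

```python
import re

text = "ab12 cd34"
no_group = re.findall(r"[a-z]+\d+", text)        # no groups: the whole matches
one_group = re.findall(r"([a-z]+)\d+", text)     # one group: only that group's text
two_groups = re.findall(r"([a-z]+)(\d+)", text)  # several groups: one tuple per match
print(no_group)    # ['ab12', 'cd34']
print(one_group)   # ['ab', 'cd']
print(two_groups)  # [('ab', '12'), ('cd', '34')]
```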

re.finditer(pattern, string, flags=0): like findall(), but returns an iterator of Match objects, so details such as the position of each match are available.

sentence = "This is my  string"
res = re.finditer(r"\S+", sentence)
print(res)  # <callable_iterator object at 0x0000029382A600D0>
for match in res:
    # print(match)  # <re.Match object; span=(0, 4), match='This'>
    print(match.group())  # This, is, my, string (one per line)
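A sketch of what the Match objects add over findall(): the start index of every match. It also shows that the iterator can only be consumed once (sample text is made up):

```python
import re

text = "a1 b22 c333"
positions = [(m.start(), m.group()) for m in re.finditer(r"\d+", text)]
print(positions)  # [(1, '1'), (4, '22'), (8, '333')]

it = re.finditer(r"\d+", text)
first = list(it)
second = list(it)  # the iterator is already exhausted
print(len(first), len(second))  # 3 0
```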

2.4 re.sub

Replaces the matches in a string and returns the new string.
re.sub(pattern, repl, string, count=0, flags=0)

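res = re.sub(r"\d{9}", "abc", string)
print(res)  # abcdefgh12345: nothing matches, so the original string is returned
# count is the maximum number of replacements; count=0 (the default) replaces every match.
# Since Python 3.13, passing count positionally, e.g. re.sub(r"\d", "#", string, 1), emits
# DeprecationWarning: 'count' is passed as positional argument. Pass it as count=1 instead.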

# repl may also be a function: it is called with each Match object and returns the replacement text
def jia1(matchobj):
    # print(matchobj)  # with pattern r"\d": <re.Match object; span=(8, 9), match='1'>
    # Call .group() first: len(matchobj) raises TypeError: object of type 're.Match' has no len()
    match_res = matchobj.group()
    # Replace every character with the one whose code point is 1 higher ('1' -> '2', '9' -> ':')
    return "".join(chr(ord(ch) + 1) for ch in match_res)

res = re.sub(r"\d{4}", jia1, string)
print(res)  # abcdefgh23455
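Two more sub() features in a minimal sketch (the date text is made up): in repl, \g<name> refers to a named group, and count passed as a keyword limits the number of replacements without the positional-argument warning:

```python
import re

text = "2025-05-19 and 2024-12-01"
# Reorder date parts with named groups
swapped = re.sub(r"(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})", r"\g<d>.\g<m>.\g<y>", text)
print(swapped)  # 19.05.2025 and 01.12.2024

# count as a keyword: only the first match is replaced
first_only = re.sub(r"\d{4}", "YYYY", text, count=1)
print(first_only)  # YYYY-05-19 and 2024-12-01
```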


# Date format conversion: 2025-05-19 -> 2025/05/19
import datetime
string = str(datetime.datetime.now()).split(".")[0]  # drop the microseconds: 2025-05-19 11:57:37.801868 -> 2025-05-19 11:57:37
print(re.sub(r"(\d+)-(\d+)-(\d+)", r"\1/\2/\3", string))  # e.g. 2025/05/19 11:57:37

re.subn(pattern, repl, string, count=0, flags=0): same as sub(), but returns a tuple (new_string, number_of_substitutions).
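A short sketch of subn() on a made-up string, with and without a count limit:

```python
import re

new_text, n = re.subn(r"\d", "#", "a1b22c333")
print(new_text, n)  # a#b##c### 6

limited = re.subn(r"\d", "#", "a1b22c333", count=2)
print(limited)  # ('a#b#2c333', 2)
```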

2.5 re.split

Splits a string wherever the pattern matches (the matches act as separators) and returns a list.
re.split(pattern, string, maxsplit=0, flags=0)

# re.split
# def split(pattern, string, *args, maxsplit=0, flags=0):
#     Split the source string by the occurrences of the pattern,
#     returning a list containing the resulting substrings.  If
#     capturing parentheses are used in pattern, then the text of all
#     groups in the pattern are also returned as part of the resulting
#     list.  If maxsplit is nonzero, at most maxsplit splits occur,
#     and the remainder of the string is returned as the final element
#     of the list.
string="This is my  string"
res=re.split(r"\s+", string) # 按\s+即空白字符分隔
print(res) # ['This', 'is', 'my', 'string']
res=re.findall(r"\S+", string) # 匹配非空白字符;与上述结果等同
print(res) # ['This', 'is', 'my', 'string']
res=re.split(r"\s+", string,maxsplit=1) # split at most once; the rest of the string stays in the last element
print(res) # ['This', 'is my  string']
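The docstring above mentions that capturing groups are also returned. A minimal sketch of that, plus splitting on several separators with a character class (sample strings are made up):

```python
import re

text = "a b  c"
plain = re.split(r"\s+", text)   # separators are dropped
kept = re.split(r"(\s+)", text)  # a capturing group keeps the separators
print(plain)  # ['a', 'b', 'c']
print(kept)   # ['a', ' ', 'b', '  ', 'c']

# A character class splits on different separators at once
mixed = re.split(r"[,;]\s*", "a, b;c")
print(mixed)  # ['a', 'b', 'c']
```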

2.6 re.compile

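Compiles a regular expression into a Pattern object, which can then be used through match(), search() and the other methods.
re.compile(pattern, flags=0)
Returns a Pattern object.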

import re
pattern=re.compile(r"\d+\.\d+") # matches a decimal number such as 5.12; repr: re.compile('\\d+\\.\\d+')
res=pattern.search("This is $5.12.") # <re.Match object; span=(9, 13), match='5.12'>
print(res.group()) # 5.12

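A Pattern object has its own attributes and methods:
Attributes: Pattern.flags, Pattern.groups, Pattern.groupindex, Pattern.pattern
Methods: the same as the re functions (match, search, findall, sub, split, ...), minus the pattern argument; match(), search(), fullmatch(), findall() and finditer() also accept optional pos and endpos arguments.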
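A sketch of the four Pattern attributes on a made-up pattern. One detail worth knowing: for str patterns, re adds re.UNICODE to flags automatically, so flags is not just the value that was passed in:

```python
import re

p = re.compile(r"(?P<letters>[a-z]+)(\d+)", re.I)
print(p.pattern)                     # (?P<letters>[a-z]+)(\d+)
print(p.groups)                      # 2: number of capturing groups
print(dict(p.groupindex))            # {'letters': 1}: named groups only
print(p.flags == re.I | re.UNICODE)  # True
# Pattern methods take the same arguments as the re functions, minus the pattern
print(p.findall("ab12 CD34"))        # [('ab', '12'), ('CD', '34')]
```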

2.7 re.escape

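Escapes the special characters in a string so that they are treated as literal characters, and returns the escaped string.
re.escape(pattern): pattern is the string to escape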

res=re.escape("<(?P<tag>[a-zA-Z1-6]+?)>")
print(res) # <\(\?P<tag>\[a\-zA\-Z1\-6\]\+\?\)>
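A typical use is searching for text that comes from outside (user_input here is a hypothetical value), where characters such as $ and . would otherwise be read as metacharacters:

```python
import re

user_input = "$5.12"  # hypothetical user-supplied text
text = "Price: $5.12, not 5912"
unescaped = re.findall(user_input, text)       # $ is the end-of-string anchor, so nothing matches
escaped = re.findall(re.escape(user_input), text)
print(unescaped)  # []
print(escaped)    # ['$5.12']
```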

2.8 re.purge

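Clears the regular-expression cache. The module-level re functions cache the pattern objects they compile; how many compiled patterns are kept differs between Python versions.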

# clear the cache
re.purge() # Clear the regular expression caches

3. Methods and attributes of Match objects

Methods of a Match object:

# span|start|end|group
string="<h1>Hello World!</h1>"
res=re.search("<(?P<tag>[a-zA-Z1-6]+?)>",string)
print(res) # <re.Match object; span=(0, 4), match='<h1>'>
print(res.group("tag")) # h1
print(res.start(),res.end()) # 0 4
print(res.span()) # (0, 4)
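# matchObj.groupdict(default=None) returns a dict mapping each named group to the text it matched
print(res.groupdict()) # {'tag': 'h1'}
# Unnamed groups are not included
# A named group that did not take part in the match gets the value of default

# matchObj.expand(template) fills the template with the match; \g<name> or \number in the template refers to a named or numbered group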
pattern=r"(?P<area_code>\d{3})-(\d+)-(\d+)"
text="400-881-8611"
res=re.search(pattern,text)
print(res.expand(r"(\g<area_code>) \2\3")) # (400) 8818611
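A sketch of the default parameter of groupdict(), using a made-up optional sign group that does not take part in the match:

```python
import re

m = re.match(r"(?P<sign>[+-])?(?P<digits>\d+)", "42")
print(m.groupdict())             # {'sign': None, 'digits': '42'}
print(m.groupdict(default="+"))  # {'sign': '+', 'digits': '42'}
```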

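Attributes of a Match object:

# res was reassigned to the phone-number match above, so search the <h1> string again
res = re.search(r"<(?P<tag>[a-zA-Z1-6]+?)>", string)
print(res.string)  # <h1>Hello World!</h1> (the string that was searched)
print(res.pos)  # 0: the pos passed to search()/match()
print(res.endpos)  # 21: the endpos passed to search()/match(); module-level functions use len(string)
print(res.lastgroup)  # tag: name of the last matched group
print(res.lastindex)  # 1: there is only one capturing group, so the index of the last matched group is 1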
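pos and endpos only become interesting when they are passed explicitly to a compiled Pattern's search() or match(); module-level functions always use 0 and len(string), which is why endpos equals the string length 21 for the <h1> example. A sketch with a made-up string:

```python
import re

p = re.compile(r"\d+")
text = "ab123cd456"
m = p.search(text, 5)      # start scanning at index 5
print(m.group(), m.pos, m.endpos)  # 456 5 10
m2 = p.search(text, 0, 4)  # behave as if the string ended at index 4
print(m2.group(), m2.pos, m2.endpos)  # 12 0 4
```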

4. Flags

import re

# re.I: ignore case
text="Cat cat cAt caT"
res=re.findall(r"cat",text,flags=re.I)
print(res) #['Cat', 'cat', 'cAt', 'caT']
res=re.findall(r"cat",text)
print(res) # ['cat']

# re.M: multiline mode; changes ^ and $ so they match at the start and end of every line
text="cat\ncat\ncA\n caT"
res=re.findall(r"^cat",text,flags=re.M)
print(res) #['cat', 'cat']
res=re.findall(r"^cat",text)
print(res) # ['cat']

# re.X (verbose): whitespace in the pattern is ignored and # starts a comment, so the pattern can be annotated
a=re.compile(
    r"""\d+ # the integral part
                \. # the decimal point
                \d* # some fractional part""",flags=re.X)
b=re.compile(r"\d+\.\d*")
print(a,b)
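print(a, b) only shows the two pattern objects. A minimal sketch that checks the verbose and compact patterns find the same matches, and shows that flags can be combined with | (sample texts are made up):

```python
import re

verbose = re.compile(
    r"""\d+   # the integral part
        \.    # the decimal point
        \d*   # some fractional part""",
    flags=re.X,
)
compact = re.compile(r"\d+\.\d*")
text = "pi is 3.14, e is 2.718"
same = verbose.findall(text) == compact.findall(text)
print(same)                   # True
print(compact.findall(text))  # ['3.14', '2.718']

# Several flags can be combined with |
lines = "Cat\ncat\nCAT dog"
both = re.findall(r"^cat", lines, flags=re.I | re.M)
only_m = re.findall(r"^cat", lines, flags=re.M)
only_i = re.findall(r"^cat", lines, flags=re.I)
print(both)    # ['Cat', 'cat', 'CAT']
print(only_m)  # ['cat']
print(only_i)  # ['Cat']
```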

