Go regular expressions: the precedence of repetition and alternation
2021-09-12
CodingCode
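This note is about the difference in precedence between repetition and alternation.
In short:
- Repetition (*, +, {n,m}) applies only to the single character (or group) immediately before it.
- Alternation (|) chooses between whole character sequences.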
That is:
ab+ == a(b+) != (ab)+
ab|cd == (ab)|(cd) != a(b|c)d
A few examples:
text := "XXXabbYYYababZZZ"
fmt.Printf("%q\n", regexp.MustCompile(`ab+`).FindAllString(text, -1)) //["abb" "ab" "ab"]
fmt.Printf("%q\n", regexp.MustCompile(`a(b+)`).FindAllString(text, -1)) //["abb" "ab" "ab"]
fmt.Printf("%q\n", regexp.MustCompile(`(ab)+`).FindAllString(text, -1)) //["ab" "abab"]
text = "XXXababYYYacdZZZ" // plain assignment: text is already declared above, so a second := would not compile
fmt.Printf("%q\n", regexp.MustCompile(`ab|cd`).FindAllString(text, -1)) //["ab" "ab" "cd"]
fmt.Printf("%q\n", regexp.MustCompile(`(ab)|(cd)`).FindAllString(text, -1)) //["ab" "ab" "cd"]
fmt.Printf("%q\n", regexp.MustCompile(`a(b|c)d`).FindAllString(text, -1)) //["acd"]
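The thing to understand here is operator precedence in regular expressions.
POSIX defines the precedence of regular expression operators as shown below. Go's regexp package uses RE2 syntax rather than POSIX syntax, but repetition, concatenation, and alternation bind in the same relative order there.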
Basic Regular Expressions Precedence
+---+----------------------------------------------------------+
|   | BRE Precedence (from high to low)                        |
+---+-----------------------------------+----------------------+
| 1 | Collation-related bracket symbols | [==] [::] [..]       |
| 2 | Escaped characters                | \<special character> |
| 3 | Bracket expression                | []                   |
| 4 | Subexpressions/back-references    | \(\) \n              |
| 5 | Single-character-BRE duplication  | * \{m,n\}            |
| 6 | Concatenation                     |                      |
| 7 | Anchoring                         | ^ $                  |
+---+-----------------------------------+----------------------+
Extended Regular Expressions Precedence
+---+----------------------------------------------------------+
|   | ERE Precedence (from high to low)                        |
+---+-----------------------------------+----------------------+
| 1 | Collation-related bracket symbols | [==] [::] [..]       |
| 2 | Escaped characters                | \<special character> |
| 3 | Bracket expression                | []                   |
| 4 | Grouping                          | ()                   |
| 5 | Single-character-ERE duplication  | * + ? {m,n}          |
| 6 | Concatenation                     |                      |
| 7 | Anchoring                         | ^ $                  |
| 8 | Alternation                       | |                    |
+---+-----------------------------------+----------------------+
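The ERE table shows that alternation ('|') has the lowest precedence of all: repetition binds tighter than concatenation, which in turn binds tighter than alternation.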
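One practical consequence of alternation ranking below anchoring is that ^ab|cd$ reads as (^ab)|(cd$), not ^(ab|cd)$. The sketch below illustrates this with Go's regexp package; the helper name matches and the sample inputs are assumptions made for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches is a hypothetical helper (an assumption for this sketch): it reports
// whether pattern matches anywhere in s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	// Without parentheses, each anchor belongs to only one branch:
	// "starts with ab" OR "ends with cd".
	fmt.Println(matches(`^ab|cd$`, "abXXX")) // true
	fmt.Println(matches(`^ab|cd$`, "XXXcd")) // true

	// With parentheses, both anchors apply to the whole alternation:
	// the entire string must be exactly "ab" or "cd".
	fmt.Println(matches(`^(ab|cd)$`, "abXXX")) // false
	fmt.Println(matches(`^(ab|cd)$`, "cd"))    // true
}
```

When an alternation is meant to be anchored as a whole, wrapping it in a group (or a non-capturing group, (?:ab|cd)) keeps the anchors outside the branches.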