Go regular expressions: the precedence of repetition and alternation

2021-09-12  CodingCode

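This note covers the difference in precedence between repetition and alternation.

In short:

  1. Repetition (*, +, {n,m}) applies only to the single item immediately before it: one character, one bracket expression, or one parenthesized group.
  2. Alternation (|) chooses between the whole character sequences on either side of it.

That is: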

ab+     ==   a(b+)      !=   (ab)+
ab|cd   ==   (ab)|(cd)  !=   a(b|c)d

A few examples make this concrete:

// Assumes the fmt and regexp packages are imported.
text := "XXXabbYYYababZZZ"
fmt.Printf("%q\n", regexp.MustCompile(`ab+`).FindAllString(text, -1))   // ["abb" "ab" "ab"]
fmt.Printf("%q\n", regexp.MustCompile(`a(b+)`).FindAllString(text, -1)) // ["abb" "ab" "ab"]
fmt.Printf("%q\n", regexp.MustCompile(`(ab)+`).FindAllString(text, -1)) // ["ab" "abab"]

// text is already declared above, so assign with = rather than :=.
text = "XXXababYYYacdZZZ"
fmt.Printf("%q\n", regexp.MustCompile(`ab|cd`).FindAllString(text, -1))     // ["ab" "ab" "cd"]
fmt.Printf("%q\n", regexp.MustCompile(`(ab)|(cd)`).FindAllString(text, -1)) // ["ab" "ab" "cd"]
fmt.Printf("%q\n", regexp.MustCompile(`a(b|c)d`).FindAllString(text, -1))   // ["acd"]
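The comparisons above can also be checked programmatically. The following is a minimal sketch, not a proof of equivalence: `sameMatches` is a hypothetical helper introduced here for illustration, and it only compares what two patterns find in one particular text.

```go
package main

import (
	"fmt"
	"reflect"
	"regexp"
)

// sameMatches reports whether two patterns find exactly the same
// substrings in text. Hypothetical helper, used only for this note.
func sameMatches(p1, p2, text string) bool {
	m1 := regexp.MustCompile(p1).FindAllString(text, -1)
	m2 := regexp.MustCompile(p2).FindAllString(text, -1)
	return reflect.DeepEqual(m1, m2)
}

func main() {
	rep := "XXXabbYYYababZZZ"
	alt := "XXXababYYYacdZZZ"

	fmt.Println(sameMatches(`ab+`, `a(b+)`, rep))     // true
	fmt.Println(sameMatches(`ab+`, `(ab)+`, rep))     // false
	fmt.Println(sameMatches(`ab|cd`, `(ab)|(cd)`, alt)) // true
	fmt.Println(sameMatches(`ab|cd`, `a(b|c)d`, alt))   // false
}
```

Agreement on one input does not show that two patterns are equivalent in general, but a single disagreement is enough to show that they differ.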

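The key point is operator precedence in regular expressions.
Go's regexp package follows RE2 syntax rather than POSIX, but the POSIX precedence tables are a convenient reference, and the examples above are consistent with them:
Basic Regular Expressions Precedence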

+---+----------------------------------------------------------+
|   |             BRE Precedence (from high to low)            |
+---+----------------------------------------------------------+
| 1 | Collation-related bracket symbols | [==] [::] [..]       |
| 2 | Escaped characters                | \<special character> |
| 3 | Bracket expression                | []                   |
| 4 | Subexpressions/back-references    | \(\) \n              |
| 5 | Single-character-BRE duplication  | * \{m,n\}            |
| 6 | Concatenation                     |                      |
| 7 | Anchoring                         | ^ $                  |
+---+-----------------------------------+----------------------+

Extended Regular Expressions Precedence

+---+----------------------------------------------------------+
|   |             ERE Precedence (from high to low)            |
+---+----------------------------------------------------------+
| 1 | Collation-related bracket symbols | [==] [::] [..]       |
| 2 | Escaped characters                | \<special character> |
| 3 | Bracket expression                | []                   |
| 4 | Grouping                          | ()                   |
| 5 | Single-character-ERE duplication  | * + ? {m,n}          |
| 6 | Concatenation                     |                      |
| 7 | Anchoring                         | ^ $                  |
| 8 | Alternation                       | |                    |
+---+-----------------------------------+----------------------+

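The tables show that alternation '|' has the lowest precedence, while duplication binds more tightly than concatenation. That is why ab+ means a(b+) and ab|cd means (ab)|(cd).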
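One practical consequence of alternation having the lowest precedence is that anchors bind to a single branch rather than to the whole alternation. The sketch below illustrates this with Go's regexp package; `matches` is a hypothetical helper, and the non-capturing group `(?:...)` is used only to override the default precedence.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in s.
// Hypothetical helper, used only for this note.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	// Parsed as (^ab)|(cd$): either branch alone is enough.
	fmt.Println(matches(`^ab|cd$`, "abXXX")) // true
	fmt.Println(matches(`^ab|cd$`, "XXXcd")) // true

	// Grouping makes both anchors apply to the whole alternation.
	fmt.Println(matches(`^(?:ab|cd)$`, "abXXX")) // false
	fmt.Println(matches(`^(?:ab|cd)$`, "cd"))    // true
}
```

When a pattern mixes `|` with anchors or with surrounding literals, wrapping the alternatives in a group is usually the safer choice.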
