leetcode 10. 正则表达式匹配

Question

leetcode 10. 正则表达式匹配

xxleyi opened this issue 5 years ago · 0 comments

xxleyi commented 5 years ago

题：

给你一个字符串 s 和一个字符规律 p，请你来实现一个支持 '.' 和 '*' 的正则表达式匹配。

'.' 匹配任意单个字符
'*' 匹配零个或多个前面的那一个元素
所谓匹配，是要涵盖整个字符串 s的，而不是部分字符串。

说明:

s 可能为空，且只包含从 a-z 的小写字母。
p 可能为空，且只包含从 a-z 的小写字母，以及字符 . 和 *。

来源：力扣（LeetCode）
链接：https://leetcode-cn.com/problems/regular-expression-matching
著作权归领扣网络所有。商业转载请联系官方授权，非商业转载请注明出处。

解：

这个题，我尝试了，但并没有靠自己的能力做出来。复盘原因的话，是没能找好「循环不变式」。或者说，对此题来说，没能找到描述状态转移的「递推方程」。

为什么没能找到呢？很简单，硬实力不够，积累不够，思维差点电火花，点不着。怎么办？还是得总结，得练。下面就是我用「循环不变式」视角进行的总结。

实现 . * 这两个正则匹配规则，其中 . 比较好办，但 * 的话，或许是因为我对正则不够熟悉的原因吧，我是没能想到实际的算法需要从后往前匹配，陷在了从前往后匹配的死胡同。

还有一点也挺关键，就是匹配时，以模式字符串为准，直到模式字符串为空，则匹配结束，得到结果。

剩下的就是准确定义初始值，以及循环的方式：

字符串 s 为空时，匹配也会照常进行，故而，循环初始位置为 s.length
对于模式字符串 p，我们应该从 p.length - 1 开始循环，但由于 * 匹配符的存在，我们同时也关心每一次当前循环位置之后的那个位置，也就是说，p.length 也会被使用到。

此外，循环之前的初始值不算在循环之内，所以我们 cache 变量的维度是 s.length + 2 x p.length + 2。

var isMatch = function(s, p) {

  let genCache = () => {
    // initialize loop invariant related variable cache with dimension s.length + 2 x p.length + 2
    // cache[s.length][p.length] is true, all other should be false to maintain loop invariant
    let cache = Array(s.length + 2)
    for (let i = 0; i < cache.length; i++) cache[i] = Array(p.length + 2).fill(false)
    cache[s.length][p.length] = true
    return cache
  }

  // cache is the loop invariant related variable
  function loop(i = s.length, j = p.length - 1 , cache = genCache()) {

    // loop termination: cache[0][0] will be our answer
    if (i < 0 || j < 0) return cache[0][0]
    
    // single character match result (include .)
    let singleMatch = p[j] === '.' && !!s[i] || p[j] === s[i]
    // zero match result for *
    let zeroMatch = cache[i][j + 2]
    // one character match result for *
    let oneMoreMatch = singleMatch && cache[i + 1][j]
    // last match result (here we loop from right to left, the last match is index bigger than current)
    let lastMatch = cache[i + 1][j + 1]

    // update cache to maintain loop invariant depends on whether the current pattern is * or not
    if (p[j + 1] === '*') {
      cache[i][j] = zeroMatch || oneMoreMatch
    } else {
      cache[i][j] = singleMatch && lastMatch
    }

    // next loop
    if (j > 0) {
      return loop(i, j - 1, cache)
    } else {
      return loop(i - 1, p.length - 1, cache)
    }
  }

  return loop()
};

上面我有意循环递归，为了刻意训练自己「循环不变式」的视角，它同时适用于递归和迭代，但递归显得更好玩一些。

可以发现，在动态规划类型的问题中，这个「循环不变式」相关的变量，必然囊括那个 dp table。

有了基于「循环不变式」的递归，转成循环就是比较容易的一件事了。

下面是我转为循环迭代的代码，你会发现，核心逻辑和注释无需任何修改。

var isMatch = function(s, p) {
  // initialize loop invariant related variable cache with dimension s.length + 2 x p.length + 2
  // cache[s.length][p.length] is true, all other should be false to maintain loop invariant
  let cache = Array(s.length + 2)
  for (let i = 0; i < cache.length; i++) cache[i] = Array(p.length + 2).fill(false)
  cache[s.length][p.length] = true

  for (let i = s.length; i >= 0; i--) {
    for (let j = p.length - 1; j >= 0; j--) {
      // single character match result (include .)
      let singleMatch = p[j] === '.' && !!s[i] || p[j] === s[i]
      // zero match result for *
      let zeroMatch = cache[i][j + 2]
      // one character match result for *
      let oneMoreMatch = singleMatch && cache[i + 1][j]
      // last match result (here we loop from right to left, the last match is index bigger than current)
      let lastMatch = cache[i + 1][j + 1]

      // update cache to maintain loop invariant depends on whether the current pattern is * or not
      if (p[j + 1] === '*') {
        cache[i][j] = zeroMatch || oneMoreMatch
      } else {
        cache[i][j] = singleMatch && lastMatch
      }
    }
  }
  
  // loop termination: cache[0][0] will be our answer
  return cache[0][0]
};

无论递归还是迭代，想要正确解题，三个阶段的「循环不变式」必须得到保证：

循环之前，正确初始化相关变量
每一次循环的开头和末尾，正确更新相关变量
循环体的结束后，「循环不变式」依然成立，且可从相关的变量中得到问题正确的解