数据结构和算法分析

Longest Tandem Subsequence - O(n

2019-11-11  本文已影响0人  江海小流

问题描述

对于一个字符串 ss[i:j] 表示的子串为 s[i],s[i+1],...,s[j-1]的连接。类似于一个半闭半开的区间。

Longest\ Tandem\ Subsequence 简称 LTS, LTS是一个长度为2ks的子序列 t,使得 t[0:k] = t[k:2k],且k最大。

给定 s ,求 t。假设 s 的长度为 n

思路

基于 LCS 的做法

复杂度为 O(n^2) 的做法

一些定义

该方法通过找 D_k(\cdot, \cdot)之间的规律,从而得到 F_k(\cdot, \cdot) 之间的关系,推导出复杂度为 O(n^2) 的算法。

具体的规律为(例子):

k=4 k=5 k=6 k=7 k=8

上述性质的证明方法:

对于 F_k(i, j)

  1. 如果s[i-1]=s[k-1],那么F_k(i, j) = F_{k-1}(i-1, j) + 1
  2. 如果s[i-1] \ne s[k-1],那么F_k(i, j) = max\{ F_{k}(i-1, j), F_{k-1}(i, j) \}
  3. 那么F_k(i,j) = max\{ F_{k-1}(\lambda_k(i) - 1, j) + 1, F_{k-1}(i, j) \}
    证明方法:
    • 如果 s[i-1] = s[k-1],那么 \lambda_k(i) = i-1,此时 F_k(i,j) = F_{k-1}(i-1, j) + 1 = max\{ F_{k-1}(i-1, j)+ 1, F_{k-1}(i, j) \}
    • 如果 s[i-1] \ne s[k-1],那么 F_k(i,j) = max\{ F_k(i-1, j), F_{k-1}(i, j) \} = max\{ F_k(i-2, j), F_{k-1}(i-1, j), F_{k-1}(i, j) \} = max\{ F_k(i-2, j), F_{k-1}(i, j) \} =max\{ F_{k}(\lambda_k(i), j), F_{k-1}(i, j) \} =max\{ F_{k-1}(\lambda_k(i)-1, j) + 1, F_{k-1}(i, j) \}

对于 D_k(i, j)

D_k(i, j) = F_k(i, j) - F_{k-1}(i, j)

  1. 如果 s[i-1] = s[k-1],那么:F_k(i, j) = F_{k-1}(i-1, j) + 1 F_k(i-1,j) = max\{ F_{k-1}(\lambda_k(i-1)-1, j) + 1, F_{k-1}(i-1,j )\}

    • w = F_{k-1}(\lambda_k(i-1)-1, j)d = \sum_{r=\lambda_k(i-1)}^{i-1}D_{k-1}(i,j)
    • 则有 D_k(i,j) = w + d + 1 - max\{w + 1, w + d \} = min\{d, 1\}
  2. 否则,令 w' = F_{k-1}(\lambda_k(i)-1, j)d'=\sum_{r=\lambda_{k}(i)}^{i-1}D_{k-1}(i,j)

    • 因为 \lambda_k(i) = \lambda_k(i-1)
    • 所以 F_k(i,j) = max\{ F_{k-1}(\lambda_k(i)-1,j)+1, F_{k-1}(i,j)\}=max\{w'+1, w'+d'+D_{k-1}(i,j)\} F_k(i-1,j) = max\{F_{k-1}(\lambda_k(i-1)-1,j) + 1, F_{k-1}(i-1,j)\} = max\{w'+1, w'+d'\}
    • 因此 D_k(i,j) = max\{w'+1, w'+d'+D_{k-1}(i,j)\} - max\{w'+1, w'+d'\} =max\{1, d'+D_{k-1}(i,j)\} - max\{1, d'\}

利用上述推导,可以得出如下结论

  1. 考虑D_{k-1}(i,j)为0 且 D_k(i,j) 为 1的条件:
    • s[i-1] = s[k-1] 且 (d' > 0d > 0)
  2. 考虑D_{k-1}(i,j)为1 且 D_k(i,j) 为 0的条件:
    • d' = 0d' = 0
  3. 利用上面两条性质,就可以得出example中的推论。

Demo 代码实现

验证代码,用于认证上述推论的正确性

def call_lts(s):
    n = len(s)
    pf = [0 for _ in range(n + 1)]
    f = [0 for _ in range(n + 1)]
    for k in range(1, n + 1):
        last_pos = dict()
        for i in range(1, k):
            if s[k-1] in last_pos:
                if s[i-1] == s[k-1]:
                    f[i] = max(pf[last_pos[s[k-1]]+1:i+1])
                else:
                    f[i] = max(pf[last_pos[s[k-1]]+1:i])
            else:
                if s[i-1] == s[k-1]:
                    f[i] = k - 1
                else:
                    f[i] = pf[i]

            last_pos[s[i-1]] = i - 1
        f, pf = pf, f
        print(pf) 


if __name__ == '__main__':
    call_lts('ABCABCAB')
    print('---')
    call_lts('BABBCA')

O(n^2)的实现,使用单调双端队列优化

from queue import deque

class OrderedDeque:

    def __init__(self, ref):
        self.data = deque()
        self.ref = ref
    
    def add(self, idx):
        while len(self.data) > 0 and self.ref[self.data[len(self.data)-1]] <= self.ref[idx]:
            self.data.pop()
        self.data.append(idx)
    
    def remove(self, idx):
        while len(self.data) > 0 and self.data[0] < idx:
            self.data.popleft()
    
    def biggest(self):
        return self.ref[self.data[0]]


def call_lts(s):
    n = len(s)
    pf = [0 for _ in range(n + 1)]
    f = [0 for _ in range(n + 1)]
    for k in range(1, n + 1):
        last_pos = dict()
        q = OrderedDeque(pf)
        for i in range(1, k):
            if s[k-1] in last_pos:
                q.remove(last_pos[s[k-1]]+1)
                if s[i-1] == s[k-1]:
                    q.add(i)
                    f[i] = q.biggest()
                else:
                    f[i] = q.biggest()
                    q.add(i)
            else:
                if s[i-1] == s[k-1]:
                    f[i] = k - 1
                    q.add(i)
                else:
                    f[i] = pf[i]
                    q.add(i)

            last_pos[s[i-1]] = i - 1
        f, pf = pf, f
        print(pf) 

    ans = 0
    for i in range(1, n + 1):
        update = 0
        for j in range(1, i+1):
            if pf[j] >= i:
                update += 1
        if update > ans:
            ans = update
    return ans


if __name__ == '__main__':
    print(call_lts('ABCABCAB'))
    print('---')
    print(call_lts('BABBCA'))

参考文献

  1. Kosowski A. An efficient algorithm for the longest tandem scattered subsequence problem[C]//International Symposium on String Processing and Information Retrieval. Springer, Berlin, Heidelberg, 2004: 93-100.
上一篇 下一篇

猜你喜欢

热点阅读