【DP】139. Word Break

2019-06-03 本文已影响0人牛奶芝麻

问题描述：

Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, determine if s can be segmented into a space-separated sequence of one or more dictionary words.

Note:

The same word in the dictionary may be reused multiple times in the segmentation.
You may assume the dictionary does not contain duplicate words.

Example 1:

Input: s = "leetcode", wordDict = ["leet", "code"]
Output: true
Explanation: Return true because "leetcode" can be segmented as "leet code".

Example 2:

Input: s = "applepenapple", wordDict = ["apple", "pen"]
Output: true
Explanation: Return true because "applepenapple" can be segmented as "apple pen apple".
             Note that you are allowed to reuse a dictionary word.

Example 3:

Input: s = "catsandog", wordDict = ["cats", "dog", "sand", "and", "cat"]
Output: false

解题思路：

题目大意是：给定一个非空字符串 s 和一个包含非空单词列表的字典 wordDict，判定 s 是否可以被空格拆分为一个或多个在字典中出现的单词。

注意： 字典中有可能有多余的单词。

想法1（贪婪算法，错误做法）：

使用两个索引 cur 和 nex 来遍历字符串，cur 指向当前字符，nex 每次向后移动一次；
判断 s[cur:nex] 是不是 wordDict 中的字符串。如果是，cur = nex 继续判断，直到 cur 移动到字符串末尾，即 cur = len(s)；
如果最后一个子串 s[cur:nex] 在 wordDict 中，说明 s 可以被完全划分，返回 True，否则为 False。

当然，我们可以将 wordDict 转化为 dict 加速判断。实现代码如下：

class Solution:
    def wordBreak(self, s: str, wordDict: List[str]) -> bool:
        dic = dict()
        for word in wordDict:
            dic[word] = True
        cur, nex = 0, 1
        while nex != len(s):
            if s[cur:nex] in dic:
                cur = nex
            nex += 1
        if s[cur:nex] in dic:
            return True
        return False

但是，这样的想法是错误的！！！ 我们使用这种贪心策略，会忽略这样的问题。

s = "aaaaaaa"
wordDict = ["aaaa","aaa"]

我们相当于每次取最短，上述代码最后的划分结果就是 ["aaa", "aaa", "a"]，从而返回 False 的错误答案。

想法2（DP，正确做法）：

使用一个 dp = [True] + [False] * len(s) 来标记字符串每个位置（相当于在每个字符串后加一个隔板）。dp[0] = True 是用来将字符串切出来的，dp[1] 到 dp[len(s)] 指向每个字符，初始化为 False。dp[i] 代表第 i 个位置是否可以划分。
遍历字符串每个位置，即 i = 1 to len(s) + 1，判断 s[i] 位置的 dp[i] 是否可以改写为 True。如果 s[i] 的位置可以划分，就要要求：
1、以 s[i] 结尾的子串在这个字典里（我们可以遍历 wordDict，来查看是否 s[i-len(word):i] == word）；
2、这个子串的前一个位置 dp[i-len(word)] 是 True 的状态。
最后，如果 dp[-1] 位 True，说明 s 可以被完全划分，否则为 False 代表不可以划分。

举例说明：

s = "catsandog", wordDict = ["cats", "cat", "and", "og", "aaa"]

首先 dp[0] = True；在遍历的过程中，dp[3] = True 因为以 t 结尾的 cat 在字典里面，且 dp[3-3] 为 True（这就是 dp[0] 为 True 的作用）；dp[4] = True 因为以 s 结尾的 cats 在字典里且 dp[4-4] 为 True；dp[7] 为 True 因为以 d 结尾的 and 在字典里且 dp[7-3] 为 True；dp[9] 为 True 因为以 g 结尾的 og 在字典里且 dp[9-2] 为 True；最后，dp[-1] = dp[9] = True，得到划分结果 ['cats', 'and', 'og']。

其实，可以发现 s[i] = True 将字符串划分为两部分，s[0:i] 为可以在字典中划分的，s[i:] 为待划分的字符串。

时间复杂度为 O(N*M)，空间复杂度 O(N)，其中 N 为字符串长度，M 为字典中 word 个数。

Python3 实现：

class Solution:
    def wordBreak(self, s: str, wordDict: List[str]) -> bool:
        dp = [True] + [False] * len(s)
        for i in range(1, len(s) + 1):
            for word in wordDict:
                if i >= len(word) and dp[i-len(word)] and word == s[i-len(word):i]:
                    dp[i] = True
        return dp[-1]