Word Break Problems

Word Break Problems
- Word Break
  - Description
  - Example
  - Analysis
  - AC Picture
  - Code
- Word Break II
  - Description
  - Example
  - Analysis
  - AC Picture
  - Code

Word Break

Description

Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, determine if s can be segmented into a space-separated sequence of one or more dictionary words.

Note:

The same word in the dictionary may be reused multiple times in the segmentation.
You may assume the dictionary does not contain duplicate words.

Example

Example One:

Input: s = “leetcode”, wordDict = [“leet”, “code”]
Output: true
Explanation: Return true because “leetcode” can be segmented as “leet code”.
Example Two:

Input: s = “applepenapple”, wordDict = [“apple”, “pen”]
Output: true
Explanation: Return true because “applepenapple” can be segmented as “apple pen apple”.
Note that you are allowed to reuse a dictionary word.
Example Three:

Input: s = “catsandog”, wordDict = [“cats”, “dog”, “sand”, “and”, “cat”]
Output: false

Analysis

题目意思为，给出一个字符串，和一个包含某些字符串的字典，判断给出字符串能否分割为字典内的字符串序列。如样例一，leetcode 和字典[“leet”, “code”]，这是可行的，因为 leetcode 可以分割为 “leet code”。

我们来试试 dp (虽然这个 dp 和普通循环的算法差别不是特别大，希望看看其他大佬的想法)。
设输入字符串为 $str[0…n-1]$，字典为 $dict=[…]$。构造 $dp[0…n]$，其中 $dp[m], m \in [0, n]$ 表示 $str[0…m]$ 能否被分割为合法的字典串序列。
原问题的解答就是 $dp[n]$。对于子问题：$dp[i]$ 的值，我们考量所有串 $str[j, i], j \in [0, i-1]$，当 $dp[j]$ 不合法时，那么无论串 $str[j, i]$ 是否合法，$dp[i]$ 都不合法；当 $dp[j]$ 合法时，那么 $dp[i]$ 合法性取决于串 $str[j, i]$ 的合法性。但是，对于 $dp[i]$ 的最终取值，它是所有 $j$ 决定的 $dp[i]$ 的值的或。可以考虑这个情况:
str = abcd，dict = [“a”, “bcd”]，dp[0…4] = {true, true, false, false, unknown}。
当前正在判断 $dp[4]$，当 $j = 3$ 时，计算得 $dp[4] = false$($dp[3] = fasle$)，但是，可以看到，继续计算到 $j = 1$ 时，计算得 $dp[4] = true$($dp[1] = true$ 且串 bcd 合法)，所以 $dp[4]$ 的值是所有求得的 $dp[4]$ 值的或。
对于 $dp[0]$，我们设为 true，因为空串属于字典是可以定义的。

下面是计算式：
$$dp[i] = \bigvee\limits_{0 <= j < i} (dp[j] \land valid(substr(j, i - j)))$$

根据计算式子，我们写出伪代码：

dp[0...n] = false
dp[0] = true
for i in [1 : n]:
  for j in [i-1 : 0]:
    # unnecessary to compute when dp[j] = false in OR computation
    if (dp[j]):
      if (valid(substr(j, i-j))):
        dp[i] = true
        # Valid break to next position because 
        # the number of solutions is not important
        break

return dp[n]

AC Picture

Code

bool wordBreak(string s, vector<string> &dict) {
  int size = s.size();
  vector<bool> dp(size + 1, false);
  dp[0] = true;
  
  for (int i = 1; i <= size; ++i) {
    for (int j = i - 1; j >= 0; --j) {
      if (dp[j]) { // unnecessary to compute when dp[j] = false in OR computation
        string word = s.substr(j, i - j);
        if (find(dict.begin(), dict.end(), word) != dict.end()) {
          dp[i] = true;
          // Valid break to next position because 
          // the number of solutions is not important
          break;
        }
      }
    }
  }
  
  return dp[size];
}

Word Break II

Description

Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, add spaces in s to construct a sentence where each word is a valid dictionary word. Return all such possible sentences.

Note:

The same word in the dictionary may be reused multiple times in the segmentation.
You may assume the dictionary does not contain duplicate words.

Example

Example One:

Input:
s = “catsanddog”
wordDict = [“cat”, “cats”, “and”, “sand”, “dog”]
Output:
[
“cats and dog”,
“cat sand dog”
]
Example Two:

Input:
s = “pineapplepenapple”
wordDict = [“apple”, “pen”, “applepen”, “pine”, “pineapple”]
Output:
[
“pine apple pen apple”,
“pineapple pen apple”,
“pine applepen apple”
]
Explanation: Note that you are allowed to reuse a dictionary word.
Example Three:

Input:
s = “catsandog”
wordDict = [“cats”, “dog”, “sand”, “and”, “cat”]
Output:
[]

Analysis

题目意思为输出给定串可分割的所有合法字典序列。我们可以利用上面的 dp 进行适当的改造，来构造答案：

增加一个二维数组，用于存储节点 i 的父节点；
当发现一个解时，存储其父节点值，并继续向前求解，而不是跳过；

算法伪代码如下：

dp[0...n] = false
dp[0] = true
int graph[0...n][] # store the parent node
graph[0] = [-1]
for i in [1 : n]:
  for j in [i-1 : 0]:
    # unnecessary to compute when dp[j] = false in OR computation
    if (dp[j]):
      if (valid(substr(j, i-j))):
        dp[i] = true
        graph[i].push(j)

return dfs(graph, n)

最终，我们可以得到一个存储了当前有解节点的父节点的二维数组，通过样例二来说明：
输入: str = “pineapplepenapple”, dict = [“apple”, “pen”, “applepen”, “pine”, “pineapple”]
当 i = 4，j = 0，得到 pine，dp[4] = true，graph[4] = [0]；
当 i = 9，j = 5，得到 apple，dp[9] = true，graph[9] = [5]；
当 i = 9，j = 0，得到 pineapple，dp[9] = true，graph[9] = [5，0]；
当 i = 12，j = 9，得到 pen，dp[12] = true，graph[12] = [9]；
当 i = 12，j = 4，得到 applepen，dp[12] = true，graph[12] = [9，4]；
当 i = 17，j = 12，得到 apple，dp[17] = true，graph[17] = [12]；
最终，我们得到：

dp[0...17] = {T,F,F,F,T,F,F,F,F,T,F,F,T,F,F,F,F,T}
graph[0...17] = [
  [-1],
  [],
  [],
  [],
  [0],
  [],
  [],
  [],
  [],
  [5,0],
  [],
  [],
  [9,4],
  [],
  [],
  [],
  [],
  [12]
]

根据 graph，我们利用 dfs 可以重构出所有的解序列。

# dfs current node(int)
# the order of the child node is not matter
void dfs(int graph[][], int &node) {
  if (node <= 0) return;

  int size = graph[node].size();
  for (int i = 0; i < size; ++i) {
    dfs(graph, graph[node][i]);

    # Process
  }
  return;
}

我们根据需要，每次 dfs 返回一个字符串数组，然后处理返回的字符串数组。对于返回的字符串数组，代表其子节点对应的解序列个数，每一个解序列，我们需要加上当前节点的字符串(有解序列则添加一个空格)。最终构造出所有的解序列。

AC Picture

Code

vector<string> wordBreak(string &s, vector<string>& dict) {
  int size = s.size();
  
  vector<vector<int> > graph;
  graph.resize(size + 1);
  graph[0].push_back(-1);
  
  vector<bool> dp(size + 1, false);
  dp[0] = true;
  
  for (int i = 1; i <= size; ++i) {
    for (int j = i - 1; j >= 0; --j) {
      if (dp[j]) {
        string word = s.substr(j, i - j);
        if (find(dict.begin(), dict.end(), word) != dict.end()) {
          dp[i] = true;
          graph[i].push_back(j);
        }
      }
    }
  }
  
  return dfs(s, graph, size);
}

// DFS to construct the result
// End node as origin node
vector<string> dfs(string &s, vector<vector<int> > &graph, int &node) {
  vector<string> ans;
  if (node <= 0) {
    ans.push_back("");
    return ans;
  }

  int size = graph[node].size();
  for (int i = 0; i < size; ++i) {
  // cur node string
    string str = s.substr(graph[node][i], node-graph[node][i]);
    vector<string> sub = dfs(s, graph, graph[node][i]);

    // Process the child node string with cur node string
    int loop = sub.size();
    for (int j = 0; j < loop; ++j) {
      sub[j] += (sub[j] == "" ? "" : " ") + str;
      ans.push_back(sub[j]);
    }
  }
  return ans;
}