String Problems

String Problems
- Unique Email Addresses
  - Description
  - Example
  - Analysis
  - Further
  - Comparison
- Longers Common Prefix

Unique Email Addresses

Description

very email consists of a local name and a domain name, separated by the @ sign.

For example, in `alice@leetcode.com,aliceis the local name, andleetcode.com` is the domain name.

Besides lowercase letters, these emails may contain '.'s or '+'s.

If you add periods ('.') between some characters in the local name part of an email address, mail sent there will be forwarded to the same address without dots in the local name. For example, "alice.z@leetcode.com" and "alicez@leetcode.com" forward to the same email address. (Note that this rule does not apply for domain names.)

If you add a plus ('+') in the local name, everything after the first plus sign will be ignored. This allows certain emails to be filtered, for example `m.y+name@email.comwill be forwarded tomy@email.com`. (Again, this rule does not apply for domain names.)

It is possible to use both of these rules at the same time.

Given a list of emails, we send one email to each address in the list. How many different addresses actually receive mails?

Example

Example One
- Input
  
  ["test.email+alex@leetcode.com","test.e.mail+bob.cathy@leetcode.com","testemail+david@lee.tcode.com"]
- Output
  
  2
- Explanation
  
  "testemail@leetcode.com" and "testemail@lee.tcode.com" actually receive mails
  
  Analysis

题目意思为给定一组邮件地址字符串，现在问相同的邮件地址数量是多少。邮件地址格式为形如 `user@mail.server.name的串，其中@前面为用户名，后面为域邮件名(*邮件地址的解释与题目有略微出入，但是意思是一样的，根据@` 符号划分*)。相同地址定义为：用户名和域名相同。不过用户名的匹配规则有如下两条：

'.' 符号会被自动忽略；
'+' 符号后的串会被忽略，且用户名不保留 '+' 符号

也就是说，`m.y+name@email.com和my@email.com地址是一样的，而m.y+name@em.ail.com和my@email.com` 是不一样的。两条规则可以同时生效，但仅对用户名生效。

很简单的一道题，最原始的想法是这样的：对每个字符串，找到 '+' 位置和 '@' 位置，'+' 位置前面的去除 '.'，'@'位置开始的，全部加入串。这样得到一个地址字符串，统计是否出现即可。上面所有的功能，都可以通过 c++ string 内置函数实现：

int numUniqueEmails(vector<string>& emails) {
  set<string> s;
  int plus_index = -1, at_index = -1;
  string temp = "";
  for (auto &email : emails) {
    temp = "", plus_index = email.find('+'), at_index = email.find('@');
    for (auto &c : email.substr(0, (plus_index == -1 ? at_index : plus_index))) {
      if (c != '.')   temp += c;
    }
    temp += email.substr(at_index);
    if (s.count(temp) == 0) {
      s.insert(temp);
    }
  }
  return s.size();
}

AC 截图：

上面的算法的复杂度，依赖 c++ 的内置库，比如，substr 的复杂度，在 cplusplus 中是这样描述的：Unspecified, but generally linear in the length of the returned object. 这是不定的，但和字符串长线性相关。不过，从下标开始取子字符串，那么我们依旧是在做一次扫描，以为之前的扫描是不用重复做的。但是find 函数必然是通过扫描字符串来实现的。所以推测时间复杂度为 $O(3mn)$ m = 字符串个数，n = 字符串长度。空间复杂度的话，没有递归/迭代调用，临时变量仅创建一次，所以是 $O(1)$

Further

能不能优化呢？可以的。我们抛弃内置函数，直接从头到尾扫描。当扫描到 '.'，直接跳过，如果扫描到 '+'，略过后面部分，直到 '@' 出现， '@' 及其后面的串，直接计入邮件地址。

int numUniqueEmails(vector<string>& emails) {
  set<string> s;
  for (auto &email : emails) {
    string temp = "";
    int size = email.size();
    bool flag = true;
    for (int i = 0; i < size; ++i) {
      if (flag) {
        if (email[i] == '.')  continue;
        else if (email[i] == '+') flag = false;
        else  temp += email[i];
      } else {
        if (email[i] == '@') {
          temp += email.substr(i);
          break;
        }
      }
    }
    if (s.count(temp) == 0) {
      s.insert(temp);
    }
  }
  return s.size();
}

AC 截图：

算法从头到尾对一个字符串，只扫描了一遍(虽然调用了一个 substr，但调用后直接跳出扫描循环，所以依旧只是从头到尾的一遍扫描)。所以时间复杂度为 $O(mn)$。空间复杂度为 $O(1)$。

Comparison

调库的 AC 运行时间为 32ms，而一遍扫描的运行时间为 20ms，虽然其中有随机样例的问题，但是，多次提交的情况来看，平均下来，一遍扫描的时间是比调库要快的。调库虽然方便，但是如果需要性能，还是需要自己手动实现。

Longers Common Prefix

Description

Write a function to find the longest common prefix string amongst an array of strings.

If there is no common prefix, return an empty string “”.

Note:

All given inputs are in lowercase letters a-z.

Example

Example 1:
- Input
  
  [“flower”,”flow”,”flight”]
- Output
  
  “fl”
Example 2:
- Input
  
  [“dog”,”racecar”,”car”]
- Output
  
  “”
- Explanation
  
  There is no common prefix among the input strings.

Analysis

意思简单明了：找字符串集中的最长公共前缀，且前缀指定从下标 0 开始。
那么算法很简单了，每次循环，扫描字符串集合中的每个字符串的一个元素，如果存在不相同，退出，返回结果；相同，则加入最长公共前缀。

string longestCommonPrefix(vector<string>& strs) {
  string ans = "";
  int size = strs.size();
  if (size == 0)  return ans;
  if (size == 1)  return strs[0];
  
  bool flag = true;
  int index = 0;
  char c;
  while (flag) {
    c = strs[0][index];
    for (int i = 1; i < size; ++i) {
      if (c != strs[i][index]) {
        flag = false;
        break;
      }
    }
    if (flag)   ans += c;
    index++;
  }
  return ans;
}

AC 截图：

算法的复杂度的话，空间复杂度为 $O(1)$，时间复杂度的话，由于我们每次循环扫描一遍集合，循环最多扫描次数最短字符串长度t，所以时间复杂度为 $O(tn)$。