2020-07-09LeetCode7 minutes read (About 1039 words) 0 visits

LeetCode/面试题17.13.恢复空格

面试题 17.13. 恢复空格

哦，不！你不小心把一个长篇文章中的空格、标点都删掉了，并且大写也弄成了小写。像句"I reset the computer. It still didn’t boot!"已经变成了"iresetthecomputeritstilldidntboot"。在处理标点符号和大小写之前，你得先把它断成词语。当然了，你有一本厚厚的词典dictionary，不过，有些词没在词典里。假设文章用sentence表示，设计一个算法，把文章断开，要求未识别的字符最少，返回未识别的字符数。

注意：本题相对原题稍作改动，只需返回未识别的字符数

示例：

输入：
dictionary = ["looked","just","like","her","brother"]
sentence = "jesslookedjustliketimherbrother"
输出： 7
解释： 断句后为"jess looked just like tim her brother"，共7个未识别字符。

提示：

0 <= len(sentence) <= 1000
dictionary中总字符数不超过 150000。
你可以认为dictionary和sentence中只包含小写字母。

来源：力扣（LeetCode）
著作权归领扣网络所有。商业转载请联系官方授权，非商业转载请注明出处。

题解：

本题直接借鉴官方题解，就是字符串+动态规划，其他的方法都不够通俗易懂。基本思想就是，用一个 dp 数组存储前i个字符中最少未识别的字符的数量，定义两个指针i,j，判断 sentence 中 j-1 到 i-1 的子串是否在 dictionary 中，如果在 dictionary 中，那么状态转移方程为 $dp[i]=min(dp[i], dp[j-1])$ ；如果不存在，状态转移方程为$dp[i]=dp[i-1]+1$，即为前 i-1 个未识别的字符数量，加上当前 i 结点的一个字符。

关于如何存储 dictionary 中的单词，由于暴力破解每次都需要重复遍历，浪费了大量的时间，所以这里官方采用 Trie (字典树) 来存储字典中的单词，将单词的每个字母逆序插入到字典树中，每次状态转移的时候就从 Trie 的根节点出发，如果 sentence[j] 在 Trie 没有出现，则 sentence[j…i-1] 不在字典中，否则需要判断该字串是不是 dictionary 中的一个单词，通过加入 isEnd 标记来判断是否成词。

具体代码如下：

class Solution {
    public int respace(String[] dictionary, String sentence) {
        //获取句子长度
        int len = sentence.length();

        //建立字典树
        Trie root = new Trie();
        for (String word : dictionary) {
            root.buildTrie(word);
        }

        int[] dp = new int[len + 1];  //定义dp数组，用来存放最少未识别的字符的个数
        Arrays.fill(dp, Integer.MAX_VALUE); //由于后面需要比较，所以这里先将dp数组的默认值设置为最大
        dp[0] = 0;  //dp[0] = 0表示空字符串的最少未识别字符个数为0个
        for (int i = 1; i <= len; i++) {
            //默认没有找到字串在dictionary中出现
            //所以就是前i-1的状态再加上当前未识别的字符
            dp[i] = dp[i - 1] + 1;

            Trie curr = root;
            for (int j = i; j >= 1; j--) {
                int t = sentence.charAt(j - 1) - 'a';
                //当前元素根本不在字典树中，直接跳出循环
                if (curr.next[t] == null)
                    break;
                //这种情况表示sentence中[j-1,i-1]字串是dictionary中的一个单词
                //这时只需要判断dp[i]和dp[j-1]中较小值即可
                if (curr.next[t].isEnd == true)
                    dp[i] = Math.min(dp[i], dp[j - 1]);
                curr = curr.next[t];
            }
        }
        return dp[len];
    }
}

class Trie {
    public Trie[] next;
    public boolean isEnd; //添加isEnd标志是为了判断当前字母在字典树中是否为最后一个单词的字母

    public Trie() {
        next = new Trie[26]; //这里因为全是小写字母，所以一个结点的子结点最多有26个
        isEnd = false;
    }

    public void buildTrie(String s) {
        Trie currentPoints = this;
        
        //将所有的单词，逆序插入字典树中
        for (int i = s.length() - 1; i >= 0; i--) {
            int temp = s.charAt(i) - 'a'; //将字符转换为整数方便处理
            if (currentPoints.next[temp] == null)
                currentPoints.next[temp] = new Trie();
            currentPoints = currentPoints.next[temp];
        }
        currentPoints.isEnd = true; //将该单词的最后一个字母标记为结尾单词
    }
}