LeetCode HashTable 30 Substring with Concatenation of All Words

Posted 2020-10-01 梦_星_空

You are given a string s and a list of words, words, all of the same length. Find all starting indices of substrings in s that are a concatenation of every word in words exactly once, without any intervening characters.

For example, given:
s: "barfoothefoobarman"
words: ["foo", "bar"]

You should return the indices: [0,9].
(order does not matter).
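
Before either approach below, it helps to have a reference that follows the definition literally. This is a minimal sketch (the `BruteForce` class is hypothetical, not the author's method): for every start index i it cuts the window of length words.length * len into word-sized chunks and compares their counts with the counts of words. It assumes all words are non-empty and equally long, as the problem states.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BruteForce {
    // Returns every start index i whose window of words.length * wordLen characters
    // splits into word-sized chunks that use each entry of words exactly once.
    static List<Integer> findSubstring(String s, String[] words) {
        List<Integer> res = new ArrayList<>();
        if (words.length == 0 || words[0].isEmpty()) return res;
        int wordLen = words[0].length();
        int total = wordLen * words.length;
        Map<String, Integer> need = new HashMap<>(); // word -> required count
        for (String w : words) need.merge(w, 1, Integer::sum);
        for (int i = 0; i + total <= s.length(); i++) {
            Map<String, Integer> seen = new HashMap<>(); // chunk -> count in this window
            for (int k = 0; k < words.length; k++) {
                String chunk = s.substring(i + k * wordLen, i + (k + 1) * wordLen);
                seen.merge(chunk, 1, Integer::sum);
            }
            if (seen.equals(need)) res.add(i);
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println(findSubstring("barfoothefoobarman", new String[]{"foo", "bar"}));
    }
}
```

For the example above, main prints [0, 9]. With m words this costs O(n * m * len), so it is only meant as a correctness baseline.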

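This problem asks for substrings formed by concatenating all the words: given a long string and several words of equal length, find the starting positions of all substrings that are a concatenation of every given word. On my own I only managed an algorithm that passed 168 test cases, so I still had to borrow other people's ideas and code. I'll post my own approach first anyway (to avoid misleading anyone, feel free to skip the next part).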

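First, we use a hash table whose keys are the strings that appear in the words array and whose values are the number of times each string appears in words. A counter count records how many strings from words have been matched in the current substring; when count == words.length, a valid substring has been found. The left pointer walks through s one character at a time, and right points to the right end of the current substring. The inner loop repeatedly takes a word-length substring from s and compares it with the keys of the hash table: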

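If the extracted substring is a key in the hash table, check whether its value is greater than 0. If it is, the word has not yet appeared in the substring left..right, or it occurs more than once in words and another repetition is still allowed: increment count and move right one word length to the right. Otherwise the word may not appear in left..right again, so break out of the loop, reinitialize the hash table, and reset count to 0.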

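If the extracted substring is not a key in the hash table, a string that is not allowed has appeared and this contiguous substring can no longer qualify, so likewise break out of the loop, reinitialize the hash table, and reset count to 0.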

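Once the inner loop ends, the substring starting at left has been fully checked. If count == words.length, a qualifying substring has been found and left is recorded. Independently, if count > 0 the hash table was modified and must be reinitialized; if count is 0 it was never touched, so the reinitialization can be skipped.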
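
The per-left check described in these steps can be isolated into one helper. This is a sketch, not the author's code: `countWords` and `matchesAt` are hypothetical names, and instead of clearing and rebuilding a shared hash table after each attempt it decrements a fresh copy, which gives up the count > 0 shortcut in exchange for simpler bookkeeping.

```java
import java.util.HashMap;
import java.util.Map;

class PerLeftCheck {
    // The hash table from the description: word -> number of times it occurs in words.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) counts.merge(w, 1, Integer::sum);
        return counts;
    }

    // One iteration of the outer loop: starting at left, consume word-length chunks while
    // each chunk still has a remaining count > 0; succeed once count reaches wordCount.
    // It decrements a copy, so the caller's table never has to be re-initialized.
    static boolean matchesAt(String s, int left, int wordLen, int wordCount,
                             Map<String, Integer> counts) {
        Map<String, Integer> remaining = new HashMap<>(counts);
        int count = 0;
        int right = left;
        while (count < wordCount && right + wordLen <= s.length()) {
            String sub = s.substring(right, right + wordLen);
            Integer avail = remaining.get(sub);
            if (avail == null || avail == 0) break; // not a word, or no repetitions left
            remaining.put(sub, avail - 1);
            count++;
            right += wordLen;
        }
        return count == wordCount;
    }

    public static void main(String[] args) {
        String s = "barfoothefoobarman";
        String[] words = {"foo", "bar"};
        int wordLen = words[0].length();
        Map<String, Integer> counts = countWords(words);
        for (int left = 0; left + wordLen * words.length <= s.length(); left++) {
            if (matchesAt(s, left, wordLen, words.length, counts)) System.out.println(left);
        }
    }
}
```

For the example, main prints 0 and 9, one per line. Copying the table costs O(m) per left, the same order as the reinitialization it replaces, so this does not fix the time limit problem discussed next.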

The code is as follows:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class Solution {
    public List<Integer> findSubstring(String s, String[] words) {
        List<Integer> li = new ArrayList<Integer>();
        if (s == null || words == null || words.length == 0 || words[0].length() == 0)
            return li;
        int wlen = words[0].length();
        if (s.length() / wlen < words.length) // s is too short to hold all the words, return early
            return li;
        HashMap<String, Integer> map = new HashMap<String, Integer>();
        initValue(map, words);
        int slen = s.length();
        int wordslen = wlen * words.length;
        for (int left = 0; left <= slen - wordslen; left++) {
            int count = 0;
            int right = left;
            String sub = s.substring(right, right + wlen);
            // For each left, move right word by word while the next word is still available
            while (map.containsKey(sub) && map.get(sub) > 0) {
                map.put(sub, map.get(sub) - 1);
                count++;
                right += wlen;
                if (right + wlen > slen) break;
                sub = s.substring(right, right + wlen);
            }

            if (count == words.length)
                li.add(left);

            if (count > 0) { // the map was modified, so restore the original counts
                map.clear();
                initValue(map, words);
            }
        }
        return li;
    }

    // Initialize the hash table: word -> number of occurrences in words
    public void initValue(HashMap<String, Integer> map, String[] words) {
        for (int i = 0; i < words.length; i++) {
            if (map.containsKey(words[i])) {
                map.put(words[i], map.get(words[i]) + 1);
            } else {
                map.put(words[i], 1);
            }
        }
    }
}

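This approach times out. After reading other people's solutions, I realized that mine advances one character at a time, whereas a cleverer approach advances one word at a time. The analysis in [LeetCode] Substring with Concatenation of All Words explains it clearly, so I quote it here: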

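There is also a solution that runs in O(n) time (if the cost of extracting each len-character word is treated as constant). The idea is very clever, but hard to come up with. Instead of moving one character at a time, it moves one word at a time. Using the example from the problem: the length n of s is 18, words contains two words (cnt = 2), and each word has length len = 3. The scan visits 0, 3, 6, 9, 12, 15; then, shifted by one character, 1, 4, 7, 10, 13; then, shifted by one more character, 2, 5, 8, 11, 14 (positions 16 and 17 are too close to the end of s to start a full word). These len passes cover every possible case.

As before, a hash table m1 records all the words in words. Each pass starts from its offset, with left marking the left boundary of the window and count the number of words matched so far. We then move word by word. If the current word t exists in m1, we add it to a second hash table m2. If its count in m2 is less than or equal to its count in m1, count is incremented. If it is greater, some cleanup is needed. Consider s = barfoofoo, words = {bar, foo, abc}; abc is added to words only so that the scan does not stop at barfoo. When we reach the second foo, m2[foo] = 2 while m1[foo] = 1, so the window is no longer valid and the left boundary must move. We take out the first word t1 = bar and decrement m2[t1]; if m2[t1] < m1[t1] afterwards, one match has been lost, so count is decremented as well; then left advances by len. This repeats until m2[t] <= m1[t] again (here, left also has to move past the first foo).

Whenever count equals cnt, a match has been found, so the current left boundary left is added to the result res. Then the leftmost word is removed, count is decremented, and left moves right by len so that matching can continue. If we read a word that is not in m1, the window is broken: m2 is reset, count is set to 0, and left moves to j + len, where j is the position of the current word.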
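
The traversal order is easy to check mechanically. This hypothetical `TraversalOrder` helper lists, for each offset 0 to len - 1, the word-aligned positions j that one pass reads, using the same bound as the inner loop of the code below: a word starting at j must end by n.

```java
import java.util.ArrayList;
import java.util.List;

class TraversalOrder {
    // For each starting offset 0..wordLen-1, list the positions j = offset, offset + wordLen, ...
    // that a word-by-word pass reads, stopping once a full word no longer fits (j + wordLen > n).
    static List<List<Integer>> order(int n, int wordLen) {
        List<List<Integer>> passes = new ArrayList<>();
        for (int offset = 0; offset < wordLen; offset++) {
            List<Integer> pass = new ArrayList<>();
            for (int j = offset; j + wordLen <= n; j += wordLen) pass.add(j);
            passes.add(pass);
        }
        return passes;
    }

    public static void main(String[] args) {
        // n = 18 and len = 3, as for "barfoothefoobarman" with words of length 3
        for (List<Integer> pass : order(18, 3)) System.out.println(pass);
    }
}
```

For n = 18 and len = 3 it prints [0, 3, 6, 9, 12, 15], [1, 4, 7, 10, 13] and [2, 5, 8, 11, 14].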

The code is as follows:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class Solution {
    public List<Integer> findSubstring(String S, String[] L) {
        List<Integer> res = new ArrayList<Integer>();
        if (S == null || S.length() == 0 || L == null || L.length == 0 || L[0].length() == 0)
            return res;
        int wlen = L[0].length(); // length of every word in L
        // Reference hash table (word -> required count): the standard every window is compared against
        HashMap<String, Integer> map = new HashMap<String, Integer>();
        for (int i = 0; i < L.length; i++) {
            if (map.containsKey(L[i]))
                map.put(L[i], map.get(L[i]) + 1);
            else
                map.put(L[i], 1);
        }
        // One word-aligned pass per starting offset 0..wlen-1; shifting the start by one
        // character each time covers every possible starting position
        for (int i = 0; i < wlen; i++) {
            // A new hash table that changes dynamically as the window slides right
            HashMap<String, Integer> curMap = new HashMap<String, Integer>();
            int count = 0;  // number of words currently matched in the window
            int left = i;   // left edge of the window; the pass ends when j reaches the end of S
            for (int j = i; j <= S.length() - wlen; j += wlen) {
                String str = S.substring(j, j + wlen);

                if (map.containsKey(str)) { // str is one of the words: update the window's hash table
                    if (curMap.containsKey(str))
                        curMap.put(str, curMap.get(str) + 1);
                    else
                        curMap.put(str, 1);
                    if (curMap.get(str) <= map.get(str)) // str has not exceeded its allowed count
                        count++;
                    else {
                        // str now occurs too often in the window: drop words from the left,
                        // one word length at a time, updating curMap and count, until the
                        // count of str is back to exactly its allowed number
                        while (curMap.get(str) > map.get(str)) {
                            String temp = S.substring(left, left + wlen);
                            if (curMap.containsKey(temp)) {
                                curMap.put(temp, curMap.get(temp) - 1);
                                if (curMap.get(temp) < map.get(temp))
                                    count--;
                            }
                            left += wlen;
                        }
                    }
                    if (count == L.length) { // found a match: record it, drop the leftmost word, keep scanning
                        res.add(left);
                        String temp = S.substring(left, left + wlen);
                        if (curMap.containsKey(temp))
                            curMap.put(temp, curMap.get(temp) - 1);
                        count--;
                        left += wlen;
                    }
                } else {
                    // str is not a word, so no window containing it can match:
                    // left jumps straight to the position just after str
                    curMap.clear();
                    count = 0;
                    left = j + wlen;
                }
            }
        }
        return res;
    }
}
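
For comparison, here is a more compact sketch of the same word-by-word sliding window, using the names m1, m2, t, t1, cnt and len from the quoted explanation and `Map.merge` (Java 8+) in place of the containsKey/put pairs. The `SlidingWindow` class name is hypothetical, and this is a restatement of the algorithm above rather than a different method.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SlidingWindow {
    static List<Integer> findSubstring(String s, String[] words) {
        List<Integer> res = new ArrayList<>();
        if (s == null || words == null || words.length == 0 || words[0].isEmpty()) return res;
        int len = words[0].length(), cnt = words.length, n = s.length();
        Map<String, Integer> m1 = new HashMap<>();         // required count of each word
        for (String w : words) m1.merge(w, 1, Integer::sum);
        for (int offset = 0; offset < len; offset++) {
            Map<String, Integer> m2 = new HashMap<>();     // counts inside the current window
            int left = offset, count = 0;
            for (int j = offset; j + len <= n; j += len) {
                String t = s.substring(j, j + len);
                if (!m1.containsKey(t)) {                 // t breaks every window containing it
                    m2.clear();
                    count = 0;
                    left = j + len;
                    continue;
                }
                m2.merge(t, 1, Integer::sum);
                if (m2.get(t) <= m1.get(t)) {
                    count++;
                } else {
                    // too many copies of t: drop words from the left until t fits again
                    while (m2.get(t) > m1.get(t)) {
                        String t1 = s.substring(left, left + len);
                        m2.merge(t1, -1, Integer::sum);
                        if (m2.get(t1) < m1.get(t1)) count--;
                        left += len;
                    }
                }
                if (count == cnt) {                       // window holds exactly the words
                    res.add(left);
                    String t1 = s.substring(left, left + len);
                    m2.merge(t1, -1, Integer::sum);
                    count--;
                    left += len;
                }
            }
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println(findSubstring("barfoothefoobarman", new String[]{"foo", "bar"}));
        System.out.println(findSubstring("barfoofoo", new String[]{"bar", "foo", "abc"}));
    }
}
```

main prints [0, 9] for the problem's example and [] for the barfoofoo example from the quote. Results are grouped by offset, so they are not necessarily sorted, which the problem allows.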
