lihe/Leetcode

Leetcode_187_Repeated DNA Sequences

Opened this issue · 0 comments

lihe commented

Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

Example:

Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"

Output: ["AAAAACCCCC", "CCCCCAAAAA"]

题意:将DNA序列看成一个只包含[A,B,C,D]四个字符的字符串,给一个DNA字符串,找出所有长度为10的且出现次数超过一次的字符串。

算法思路:

方法一:

  1. 枚举DNA字符串中所有的长度为10的子串,将其插入到哈希Map中,并记录子串的数量;
  2. 遍历哈希Map,将所有出现次数超过一次的子串存储到结果中

算法复杂度为O(n)

class Solution{
public:
    std::vector<std::string> findRepeatedDnaSequences(std::string s){
        std::map<std::string, int> word_map;
        std::vector<std::string> result;
        for(int i = 0; i < s.length(); i++){
            std::string word = s.substr(i, 10);
            if(word_map.find(word) != word_map.end()){
                word_map[word] += 1;
            }
            else{
                word_map[word] = 1;
            }
        }
        std::map<std::string, int> :: iterator it;
        for(it = word_map.begin(); it != word_map.end(); it++){
            if(it -> second > 1){
                result.push_back(it -> first);
            }
        }
        return result;
    }
};

方法二:

  1. 设置全局变量哈希int g_hash_map[1028576] 1048576 = 2^20,表示所有的长度为10的DNA序列;
  2. 将DNA字符串的前十个字符使用左移运算转化为整数keyg_hash_map[key]++;
  3. 从DNA的第11个字符开始,按顺序遍历各个字符,遇到一个字符即将key右移二位(去掉最低位),并将新的DNA字符s[i]转化为整数后,或到最后两位,g_hash_map[key]++;
  4. 遍历哈希表g_hash_map,若g_hash_map[i] > 1,将i从低到高位转化为10个DNA序列,push到结果数组。
int g_hash_map[1048576] = {0};

std::string change_int_to_DNA(int DNA){
    static const char DNA_CHAR[] = {'A', 'C', 'G', 'T'};
    std::string str;
    for(int i = 0; i < 10; i++){
        str += DNA_CHAR[DNA & 3];
        DNA = DNA >> 2;
    }
    return str;
}

class Solution{
public:
    std::vector<std::string> findRepeatedDnaSequences(std::string s){
        std::vector<std::string> result;
        if(s.length() < 10){
            return result;
        }
        for(int i = 0; i < 1048576; i++){
            g_hash_map[i] = 0;
        }
        int char_map[128] = {0};
        char_map['A'] = 0;
        char_map['C'] = 1;
        char_map['G'] = 2;
        char_map['T'] = 3;
        int key = 0;
        for(int i = 9; i >= 0; i--){
            key = (key << 2) + char_map[s[i]];
        }
        g_hash_map[key] = 1;
        for(int i = 10; i < s.length(); i++){
            key = key >> 2;
            key = key | (char_map[s[i]] << 18);
            g_hash_map[key]++;
        }
        for(int i = 0; i < 1048576; i++){
            if(g_hash_map[i] > 1){
                result.push_back(change_int_to_DNA(i));
            }
        }
        return result;
    }
};