Leetcode_187_Repeated DNA Sequences
Opened this issue · 0 comments
lihe commented
Repeated DNA Sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
Example:
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC", "CCCCCAAAAA"]
题意:将DNA序列看成一个只包含[A,B,C,D]
四个字符的字符串,给一个DNA字符串,找出所有长度为10的且出现次数超过一次的字符串。
算法思路:
方法一:
- 枚举DNA字符串中所有的长度为10的子串,将其插入到哈希Map中,并记录子串的数量;
- 遍历哈希Map,将所有出现次数超过一次的子串存储到结果中
算法复杂度为O(n)
class Solution{
public:
std::vector<std::string> findRepeatedDnaSequences(std::string s){
std::map<std::string, int> word_map;
std::vector<std::string> result;
for(int i = 0; i < s.length(); i++){
std::string word = s.substr(i, 10);
if(word_map.find(word) != word_map.end()){
word_map[word] += 1;
}
else{
word_map[word] = 1;
}
}
std::map<std::string, int> :: iterator it;
for(it = word_map.begin(); it != word_map.end(); it++){
if(it -> second > 1){
result.push_back(it -> first);
}
}
return result;
}
};
方法二:
- 设置全局变量哈希
int g_hash_map[1028576]
1048576 = 2^20,表示所有的长度为10的DNA序列; - 将DNA字符串的前十个字符使用左移运算转化为整数
key
,g_hash_map[key]++
; - 从DNA的第11个字符开始,按顺序遍历各个字符,遇到一个字符即将
key
右移二位(去掉最低位),并将新的DNA字符s[i]
转化为整数后,或到最后两位,g_hash_map[key]++
; - 遍历哈希表
g_hash_map
,若g_hash_map[i] > 1
,将i
从低到高位转化为10个DNA序列,push
到结果数组。
int g_hash_map[1048576] = {0};
std::string change_int_to_DNA(int DNA){
static const char DNA_CHAR[] = {'A', 'C', 'G', 'T'};
std::string str;
for(int i = 0; i < 10; i++){
str += DNA_CHAR[DNA & 3];
DNA = DNA >> 2;
}
return str;
}
class Solution{
public:
std::vector<std::string> findRepeatedDnaSequences(std::string s){
std::vector<std::string> result;
if(s.length() < 10){
return result;
}
for(int i = 0; i < 1048576; i++){
g_hash_map[i] = 0;
}
int char_map[128] = {0};
char_map['A'] = 0;
char_map['C'] = 1;
char_map['G'] = 2;
char_map['T'] = 3;
int key = 0;
for(int i = 9; i >= 0; i--){
key = (key << 2) + char_map[s[i]];
}
g_hash_map[key] = 1;
for(int i = 10; i < s.length(); i++){
key = key >> 2;
key = key | (char_map[s[i]] << 18);
g_hash_map[key]++;
}
for(int i = 0; i < 1048576; i++){
if(g_hash_map[i] > 1){
result.push_back(change_int_to_DNA(i));
}
}
return result;
}
};