Repeated DNA Sequences

2018-05-10, by 徐凯_xp

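Treat a DNA sequence as a string containing only the four characters 'A', 'C', 'G', and 'T'. Given a DNA string s, find all substrings of length 10 that occur more than once.
For example:
s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].
s = "AAAAAAAAAAA", Return: ["AAAAAAAAAA"].
Source: LeetCode 187. Repeated DNA Sequences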

Method 1:

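#include <map>
#include <string>
#include <vector>

class Solution{
public:
    std::vector<std::string> findRepeatedDnaSequences(std::string s){
        std::map<std::string,int> word_map; // substring -> number of occurrences
        std::vector<std::string> result;
        // Stop at the last full window so substr never returns fewer than 10 characters.
        for(std::size_t i = 0; i + 10 <= s.length(); i++){
            std::string word = s.substr(i,10); // window of length 10 starting at i
            if( word_map.find(word) != word_map.end()){
                word_map[word] += 1;
            }
            else{
                word_map[word] = 1;
            }
        }
        // Walk the whole map and collect every substring seen more than once.
        std::map<std::string,int>::iterator it;
        for( it = word_map.begin(); it != word_map.end(); it++){
            if( it->second > 1){
                result.push_back(it->first);
            }
        }
        return result;
    }
};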
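
std::map keeps its keys ordered, so each lookup costs O(log n) string comparisons. As an alternative that is not part of the original solution, the same counting idea can be sketched with std::unordered_map in a single pass. The function name find_repeated_hashed is hypothetical.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical free function (not from the article): same counting idea as
// Method 1, but with a hash map and no second pass over the counts.
std::vector<std::string> find_repeated_hashed(const std::string& s){
    std::unordered_map<std::string,int> counts; // substring -> occurrences so far
    std::vector<std::string> result;
    for(std::size_t i = 0; i + 10 <= s.length(); i++){
        std::string word = s.substr(i, 10);
        // Record the window exactly once, when its count reaches 2.
        if(++counts[word] == 2){
            result.push_back(word);
        }
    }
    return result;
}
```

Unlike Method 1, which returns results in lexicographic order because it iterates a std::map, this sketch returns them in the order in which each substring is seen for the second time.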

Method 2:

Encode each length-10 DNA sequence as an integer:
The four characters ['A', 'C', 'G', 'T'] are mapped to [0, 1, 2, 3] (binary 00, 01, 10, 11), so a DNA sequence of length 10 fits in a 20-bit integer.
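
To make the encoding concrete, here is a minimal sketch. It assumes the bit layout used by the code in this method, where the first character of the window sits in the lowest two bits. The helper names base_code and encode_window are hypothetical and do not appear in the original solution.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: map one base to its 2-bit code (A=0, C=1, G=2, T=3).
int base_code(char c){
    switch(c){
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3; // assumes the input contains only A, C, G, T
    }
}

// Hypothetical helper: pack the 10 characters starting at `start` into 20 bits,
// with s[start] in the lowest two bits and s[start + 9] in the highest two.
int encode_window(const std::string& s, std::size_t start){
    assert(start + 10 <= s.length());
    int key = 0;
    for(int i = 9; i >= 0; i--){
        key = (key << 2) | base_code(s[start + i]);
    }
    return key;
}
```

Under this layout, "AAAAAAAAAA" encodes to 0, "CAAAAAAAAA" to 1, and "TTTTTTTTTT" to 1048575 (2^20 - 1).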

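1. Declare a global integer table int g_hash_map[1048576]; 1048576 = 2^20, so it holds one counter for every possible DNA sequence of length 10.
2. Convert the first 10 characters of the DNA string into an integer key using left shifts, reading from s[9] down to s[0] so that s[0] ends up in the lowest two bits, then g_hash_map[key]++.
3. Starting from the 11th character of the DNA string, visit each character in order: shift key right by 2 bits (dropping the lowest two bits, which hold the oldest character), convert the new character s[i] to an integer, OR it into the highest two bits (the 19th and 20th bits), then g_hash_map[key]++.
4. Scan the table g_hash_map; wherever g_hash_map[i] > 1, convert i back into a 10-character DNA sequence, reading from the lowest bits to the highest, and push it onto the result array.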
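
Steps 3 and 4 can be sketched in isolation as follows. This is an illustration under the same bit layout (oldest character in the lowest two bits), and the helper names slide_key and decode_key are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: advance a 20-bit window key by one character.
// The oldest character (lowest two bits) is shifted out and the new
// character is placed in bits 18-19.
int slide_key(int key, char next){
    static const std::string kBases = "ACGT"; // index in this string = 2-bit code
    std::size_t code = kBases.find(next);
    assert(code != std::string::npos);        // assumes only A, C, G, T appear
    assert(key >= 0 && key < (1 << 20));
    return (key >> 2) | (static_cast<int>(code) << 18);
}

// Hypothetical helper: turn a 20-bit key back into its 10-character sequence,
// reading two bits at a time from the lowest bits upward.
std::string decode_key(int key){
    static const char kBases[] = {'A', 'C', 'G', 'T'};
    std::string out;
    for(int i = 0; i < 10; i++){
        out += kBases[key & 3];
        key >>= 2;
    }
    return out;
}
```

For example, the key 349184 decodes to "AAAAACCCCC"; sliding it with 'A' gives 87296, which decodes to "AAAACCCCCA", the next window of a string that continues with 'A'.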

#include <string>
#include <vector>

int g_hash_map[1048576] = {0}; // one counter per 20-bit key (2^20 entries)
std::string change_int_to_DNA(int DNA){
    static const char DNA_CHAR[] = {'A', 'C', 'G', 'T'};
    std::string str;
    for(int i = 0; i < 10; i++){
        str += DNA_CHAR[DNA & 3]; // 3 is binary 11, so this keeps the lowest two bits
        DNA = DNA >> 2;
    }
    return str;
}
class Solution{
public:
    std::vector<std::string> findRepeatedDnaSequences(std::string s){
        std::vector<std::string> result;
        if(s.length() < 10){
            return result;
        }
        for(int i = 0; i < 1048576; i++){
            g_hash_map[i] = 0; // clear counts left over from a previous call
        }
        int char_map[128] = {0};
        char_map['A'] = 0;
        char_map['C'] = 1;
        char_map['G'] = 2;
        char_map['T'] = 3;
        int key = 0;
        // Encode the first window; s[0] ends up in the lowest two bits.
        for(int i = 9; i >= 0; i--){
            key = (key << 2) + char_map[s[i]];
        }
        g_hash_map[key] = 1;
        for(std::size_t i = 10; i < s.length(); i++){
            key = key >> 2;                     // drop the oldest character
            key = key | (char_map[s[i]] << 18); // new character goes into bits 18-19
            g_hash_map[key]++;
        }
        for(int i = 0; i < 1048576; i++){
            if(g_hash_map[i] > 1){
                result.push_back(change_int_to_DNA(i));
            }
        }
        return result;
    }
};
