[刷题防痴呆] 0187 - 重复的DNA序列 (Repeate
2022-01-26 本文已影响0人
西出玉门东望长安
题目地址
https://leetcode.com/problems/repeated-dna-sequences/description/
题目描述
187. Repeated DNA Sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
Example:
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC", "CCCCCAAAAA"]
思路
- hash加滑动窗口.
关键点
- 注意, hashset里可以存数, 把DNA当做一个4进制的数. 这样可以节省空间.
代码
- 语言支持:Java
// 直接string hashset
class Solution {
public List<String> findRepeatedDnaSequences(String s) {
Set<String> set = new HashSet<>();
Set<String> resSet = new HashSet<>();
int len = 10;
for (int i = 0; i + len - 1 < s.length(); i++) {
String sub = s.substring(i, i + len);
if (set.contains(sub)) {
resSet.add(sub);
} else {
set.add(sub);
}
}
return new ArrayList<>(resSet);
}
}
// 也可以用encode来将dna转换成integer, 本质上是一个四进制数
public List<String> findRepeatedDnaSequences(String s) {
HashSet<Integer> seen = new HashSet<>();
HashSet<String> repeated = new HashSet<>();
int l = 10;
for (int i = 9; i < s.length(); i++) {
String sub = s.substring(i - 9, i + 1);
int encoded = encode(sub);
if (seen.contains(encoded)) {
repeated.add(sub);
} else {
seen.add(encoded);
}
}
return new ArrayList(repeated);
}
private int encode(String s) {
int n = 0;
for (int i = 0; i < s.length(); i++) {
if (s.charAt(i) == 'A') {
n = n * 4;
} else if (s.charAt(i) == 'C') {
n = n * 4 + 1;
} else if (s.charAt(i) == 'G') {
n = n * 4 + 2;
} else {
n = n * 4 + 3;
}
}
return n;
}
}