数据结构:重叠区间的个数
2016-12-09 本文已影响60人
Bioconductor
本文首发为CSDN博客,地址为:http://blog.csdn.net/xxzhangx/article/details/53544822
欢迎关注,谢谢!引用转载请注明作者和地址!
题目:给定多个可能的重叠的区间,找出重叠区间的个数。
伪代码:
区间的定义如下:
class Interval{
int start; //起点
int end; //止点
Interval (int a,int b){
start =a;
end = b;
}
}
首先,要定义区间的类,实现Comparable接口,含有起点与止点的值和类型,还要重写用于排序的compareTo函数。
class Point implements Comparable<Point>{
int value;//数值
int type;//点的类型,0为起点,1为止点
Point (int v,int t){
value = v;
type = t;
}
//还需要实现compareTo函数,以便排序
public int compareTo(Point p){
if (this.value == p.value){
return 0;
}else if (this.value > p.value){
return 1;
}else {
return -1;
}
}
}
其次,区间转换为点,并将点排序,然后统计重叠的个数。
int getOverlappingCount(Interval[] A){
int max =0,count = 1;
if (A == null || A.length == 0) return max;
Point [] points = new Point[A.length*2];
for (int i=0;i<A.length;i++){//转为可排序的点
points[2*i] = new Point(A[i].start,0);
points[2*i+1] = new Point(A[i].end,1);
}
Collections.sort(points);//排序
for(int i =0;i<points.length;i++){
if(points[i].type==0){
count++;//起点
max = Math.max(max,count);
}else{
count--;
}
}
return max;
}
R语言
这里没有按照伪代码中给的样式来写,源于没有发现R语言中能采用这种方式来写,在谷歌是发现一个R包intervals,它可以实现overlap功能,也就是我们这里将用到的交集。
两个区间集合之间的重叠个数计算:
> a=matrix(c(1:16),ncol = 2, byrow = TRUE)
> a
[,1] [,2]
[1,] 1 2
[2,] 3 4
[3,] 5 6
[4,] 7 8
[5,] 9 10
[6,] 11 12
[7,] 13 14
[8,] 15 16
> to <- Intervals(a,closed = c( TRUE, FALSE ),type = "R")
> #集合to
> to
Object of class Intervals
8 intervals over R:
[1, 2)
[3, 4)
[5, 6)
[7, 8)
[9, 10)
[11, 12)
[13, 14)
[15, 16)
> dim(to)
[1] 8 2
> b <- matrix(c(2.121, 8,8, 9,6, 9,11, 12,3, 3),ncol = 2, byrow = TRUE)
> b
[,1] [,2]
[1,] 2.121 8
[2,] 8.000 9
[3,] 6.000 9
[4,] 11.000 12
[5,] 3.000 3
> from <- Intervals(b,closed = c( FALSE, FALSE ),type = "R")
> # 集合from
> from
Object of class Intervals
5 intervals over R:
(2.121, 8)
(8, 9)
(6, 9)
(11, 12)
(3, 3)
> rownames(from) <- c(1:nrow(from))
> empty(to)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> empty(from)
[1] FALSE FALSE FALSE FALSE TRUE
> b1 <- interval_overlap(from, to)
Warning message:
Some empty 'from' intervals encountered. Setting to NA...
> b1
$`1`
[1] 2 3 4
$`2`
integer(0)
$`3`
[1] 4
$`4`
[1] 6
$`5`
integer(0)
> sum(lengths(b1))
[1] 5
关于区间是开区间还是闭区间,可以通过对closed 设置来改变,不同的区间还可以用append 来附加。
对于输入的是一个集合,计算一个集合内的区间重叠数
例子1
> b <- matrix(c(2, 8,8, 9,6, 9,11, 12,3, 3),ncol = 2, byrow = TRUE)
> b
[,1] [,2]
[1,] 2 8
[2,] 8 9
[3,] 6 9
[4,] 11 12
[5,] 3 3
> from <- Intervals(b,closed = c( T, T ),type = "R")
> from
Object of class Intervals
5 intervals over R:
[2, 8]
[8, 9]
[6, 9]
[11, 12]
[3, 3]
> b1 <- interval_overlap(from, from)
> b1
[[1]]
[1] 1 2 3 5
[[2]]
[1] 1 2 3
[[3]]
[1] 1 2 3
[[4]]
[1] 4
[[5]]
[1] 1 5
> (sum(lengths(b1))-nrow(b))/2
[1] 4
例子2
> b <- matrix(c(1, 5,10, 15,5, 10,20, 30),ncol = 2, byrow = TRUE)
> b
[,1] [,2]
[1,] 1 5
[2,] 10 15
[3,] 5 10
[4,] 20 30
> from <- Intervals(b,closed = c( T, T ),type = "R")
> from
Object of class Intervals
4 intervals over R:
[1, 5]
[10, 15]
[5, 10]
[20, 30]
> b1 <- interval_overlap(from, from)
> b1
[[1]]
[1] 1 3
[[2]]
[1] 2 3
[[3]]
[1] 1 2 3
[[4]]
[1] 4
> (sum(lengths(b1))-nrow(b))/2
[1] 2
啦啦啦啦,发现书上的结果错了。
python
这里写代码片