4. Median of Two Sorted Arrays
Description:
There are two sorted arrays nums1 and nums2 of size m and n respectively.
Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).
Example 1:
nums1 = [1, 3]
nums2 = [2]
The median is 2.0
Example 2:
nums1 = [1, 2]
nums2 = [3, 4]
The median is (2 + 3)/2 = 2.5
Solution:
What is a median?
"The middle number in a given sequence of numbers, taken as the average of the two middle numbers when the sequence has an even number of numbers:
4 is the median of 1, 3, 4, 8, 9." (found in dictionary.com)
If a sequence is 1, 3, 4, 5, 8, 9, it's median is (4+5)/2=4.5.
So we must sort the sequence if it is unordered to find the median.
First let's see how we can find the median of one array and then we deal with two arrays.
Find the median of one sorted array
Let's say that "even case" is that the length of the array is even and "odd case" is that the sum length of the array is odd.
In even case, the median is the average value of the middle 2 elements and in odd case, it is just the value of the middle element.
- even case
Since the array is sorted, we just need to find the middle 2 elements. Preparing for the 2 arrays problem, we can split the array into 2 equal length parts. This way, all the elements in the left part are no greater than right. For example, if arrayx
has 6 elements, it can be split as:
left right
x x[0] x[1] x[2] x[3] x[4] x[5]
Then the median is (x[2] + x[3])/2
. So in even case, if an array's length is n
, the median is (x[n/2]+x[n/2+1])/2
.
Meanwhile, x[n/2]
is the first element of the right subset, and x[n/2-1]
is the last element of the left subset.
- odd case
In odd case, the array cannot be split into 2 length-equal subsets. Say that arrayx
has 7 elements, it can be split as:
left right
x x[0] x[1] x[2] x[3] x[4] x[5] x[6]
But all the elements in the left part is still no greater than right. In this case, the median is x[3]
. And clearly we know x[3]
is the first element of the right subset.
However, we can just use x[3] = x[7/2]
to get the median. So in odd case, if an array's length is n
, the median is x[n/2]
.
Meanwhile, x[n/2]
is the first element of the right subset and x[n/2-1]
is the last element of the left subset.
Find the median of 2 sorted array
Let's say the "even case" is that the sum length of the arrays is even and the same meaning with the "odd case".
From the above we know that if we can split an array into 2 subsets that is equal or "nearly equal", the median can be easily gotten. In the same way, we can split each array into 2 subsets.
We assume the arrays are x
and y
. After split, x
and y
are as follows:
left right
x x[0]... x[i-1] x[i]... x[n-1]
y y[0]... y[j-1] y[j]... y[m-1]
To find the median, there are 2 conditions to be met.
- the sum length of the left is equal or "nearly equal" to the right.
- all the elements in the left subset are no greater than the right
This way we can get the median by comparing or calculating with x[i-1], x[i], y[j-1], y[j]
. We'll talk about it later.
To meet the first condition, we calculate the sum length of the subsets is i+j
, the right n-i+m-j
. If left length is equal to right, then i+j = n-i+m-j
.
So i
can be expressed as
i = (m+n)/2 - j
But how do we split the 2 arrays. At first we just let
lo = 0;
hi = m;
j = (lo + hi)/2 = m/2;
and then we can get i
through that equation. One more thing to note is that n >= m is a must. This is because:
i = (m+n)/2 - j = (m+n)/2 - m/2 = (n-m)/2
If n < m, i
will be negative. This will cause ArrayIndexOutOfBoundaryException. To deal with this issue, we can just change x
and y
and call the function again if n < m. That is :
if (n < m) return f(y, x);
To meet the second condition, we let xl = x[i-1], xh = [xi], yl = y[j-1], yh = y[j]
. Since xl < xh
and yl < yh
are definitely correct. We just need
xl < yh
and yl < xh
. If these two cannot be met, we need to use binary search to adjust the search range. We woulddiscuss these cases in 3 parts:
- if
xl < yh
andyl < xh
This means the median is found. To find the median, however, the method of the even case is a little different from the odd case.
- even case
We have to find the 2 middle elements and calculate the average. That is, we have to find the maximum element in the left subset and the minimum elements in the right subset.
As the arrays are sorted and split, this can be down by following steps:
leftMax = max(xl, yl);
minRight = min(xh, yh);
median = (leftMax + minRight)/2;
- odd case
As discussed in the one sorted array case, the middle element is the median, which is the same in the two sorted arrays. Now, the key becomes how can we know which element is the middle element? Can we make sure that the middle element is the one in[xl, yl, xh, yr]
? More discussion is needed.
Calling that at firstj = m/2
andi = (m+n)/2 - j
, we can use two example to make it simple.-
x
's length is odd andy
's length is even
That isn
is odd andm
is even. Som%2 = 0
andy
can be perfectly split into 2 length-equal subset. However,i = (m+n)/2 - j = (m+n)/2 - m/2 = n/2
. Sincen
is odd, son/2
actually means(n-1)/2
, which would lead the right subset ofy
is longer than the left (for example,n = 5
, thenn/2 = (n-1)/2 = 2/2 = 2
, so the left subset is[x[0], x[1]]
and the right is[x[2], x[3], x[4]]
). The split result can be shown as following:
-
left right
x x[0] ... x[(n-1)/2-1](x[i-1]) (x[i])x[(n-1)/2] ... x[n-1]
y y[0] ... y[m/2-1](y[j-1]) (y[j])y[m/2] ... y[m-1]
\_________________________/ \________________________/
left_length = (m+n-1)/2 right_length = (m+n+1)/2
Then right_length - left_length = 1
. As discussed above, the middle element is the minimum element of the right subset. So,
median = min(xh, yh)
- `x`'s length is odd and `y`'s length is even
I'll skip because the result is also
median = min(xh, yh)
You can do it yourself.
-
if
xl > yh
It means thaty[j]
is too small, we need to splity
in a way which make s the right subset shorter and left subset longer so thaty[j]
can be big enough, in the same wayi
will decrease andx[i-1]
will be smaller. How can we do that?
We can use binary search method. Sincej = (lo + hi)/2
, letlo = j+1
, thenj
would be bigger andy[j]
is too. -
if
yl > xh
It means thaty[j-1]
is too big, in contrast we sethi = j-1
and calculatej
again to makey[j-1]
smaller.
By updating lo
and hi
and then checking if xl < yh
and yl < xh
are met. We can finally find the median.
More discussion: boundary condition
From the above we know j
and i
can be change if some condition can not be met. Meanwhile, xl = x[i-1], xh = x[i], yl = y[j-1], yh = y[j]
. Here comes the problem: what if i = 0 or n, j = 0 or m
? This would make x[-1], x[n], y[-1], y[m]
senseless.
A little trick could handle this. We just need to extend both x
and y
, which means x[-1] = y[-1] = Integer.MIN_VALUE
and x[n] = y[m] = Integer.MAX_VALUE
. The sample code is:
xl = (i == 0 ? Integer.MIN_VALUE : x[i-1]);
xh = (i == n ? Integer.MAX_VALUE : x[i]);
yl = (j == 0 ? Integer.MIN_VALUE : y[j-1]);
yh = (j == m ? Integer.MAX_VALUE : y[j]);
At last, an accepted code is posted:
public class Solution {
public double findMedianSortedArrays(int[] x, int[] y) {
int n = x.length, m = y.length;
// must note that 'return' is needed or the programme
// would continue if n < m and this will cause the
// programme to fail
if (n < m) return findMedianSortedArrays(y, x);
// set the initial lo and hi
int lo = 0, hi = m;
while (lo <= hi) {
int j = (lo+hi)/2, i = (m+n)/2-j; // i+j == n-i+m-j
// note that how to deal with the boundary condition
int xl = (i == 0 ? Integer.MIN_VALUE : x[i-1]);
int xh = (i == n ? Integer.MAX_VALUE : x[i]);
int yl = (j == 0 ? Integer.MIN_VALUE : y[j-1]);
int yh = (j == m ? Integer.MAX_VALUE : y[j]);
if (xl > yh) lo = j+1; // y[j] too small, j need to be greater
else if (yl > xh) hi = j-1; // y[j-1] too big, j need to be smaller
else {
if ((m+n)%2 == 1) return xh < yh ? xh : yh; // odd case
int leftMax = xl > yl ? xl : yl;
int rightMin = xh < yh ? xh : yh;
return (double)(leftMax + rightMin)/2; // even case
}
}
return -1; // actually never execute
}
}