4. Median of Two Sorted Arrays

2017-07-14  本文已影响0人  CharlieGuo

Description:

There are two sorted arrays nums1 and nums2 of size m and n respectively.
Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).

Example 1:

nums1 = [1, 3]
nums2 = [2]
The median is 2.0

Example 2:

nums1 = [1, 2]
nums2 = [3, 4]
The median is (2 + 3)/2 = 2.5

Solution:

What is a median?

"The middle number in a given sequence of numbers, taken as the average of the two middle numbers when the sequence has an even number of numbers:
4 is the median of 1, 3, 4, 8, 9." (found in dictionary.com)
If a sequence is 1, 3, 4, 5, 8, 9, it's median is (4+5)/2=4.5.

So we must sort the sequence if it is unordered to find the median.

First let's see how we can find the median of one array and then we deal with two arrays.

Find the median of one sorted array

Let's say that "even case" is that the length of the array is even and "odd case" is that the sum length of the array is odd.
In even case, the median is the average value of the middle 2 elements and in odd case, it is just the value of the middle element.

                 left       right
x    x[0] x[1] x[2]       x[3] x[4] x[5]

Then the median is (x[2] + x[3])/2. So in even case, if an array's length is n, the median is (x[n/2]+x[n/2+1])/2.
Meanwhile, x[n/2] is the first element of the right subset, and x[n/2-1] is the last element of the left subset.

                 left       right
x    x[0] x[1] x[2]       x[3] x[4] x[5] x[6]

But all the elements in the left part is still no greater than right. In this case, the median is x[3]. And clearly we know x[3] is the first element of the right subset.
However, we can just use x[3] = x[7/2] to get the median. So in odd case, if an array's length is n, the median is x[n/2].
Meanwhile, x[n/2] is the first element of the right subset and x[n/2-1] is the last element of the left subset.

Find the median of 2 sorted array

Let's say the "even case" is that the sum length of the arrays is even and the same meaning with the "odd case".
From the above we know that if we can split an array into 2 subsets that is equal or "nearly equal", the median can be easily gotten. In the same way, we can split each array into 2 subsets.
We assume the arrays are x and y. After split, x and y are as follows:

               left   right
x    x[0]... x[i-1]   x[i]... x[n-1]
y    y[0]... y[j-1]   y[j]... y[m-1]

To find the median, there are 2 conditions to be met.

  1. the sum length of the left is equal or "nearly equal" to the right.
  2. all the elements in the left subset are no greater than the right

This way we can get the median by comparing or calculating with x[i-1], x[i], y[j-1], y[j]. We'll talk about it later.

To meet the first condition, we calculate the sum length of the subsets is i+j, the right n-i+m-j. If left length is equal to right, then i+j = n-i+m-j.
So i can be expressed as

i = (m+n)/2 - j

But how do we split the 2 arrays. At first we just let

lo = 0;
hi = m;
j = (lo + hi)/2 = m/2;

and then we can get i through that equation. One more thing to note is that n >= m is a must. This is because:

i = (m+n)/2 - j = (m+n)/2 - m/2 = (n-m)/2

If n < m, i will be negative. This will cause ArrayIndexOutOfBoundaryException. To deal with this issue, we can just change x and y and call the function again if n < m. That is :

if (n < m) return f(y, x);

To meet the second condition, we let xl = x[i-1], xh = [xi], yl = y[j-1], yh = y[j]. Since xl < xh and yl < yh are definitely correct. We just need
xl < yh and yl < xh. If these two cannot be met, we need to use binary search to adjust the search range. We woulddiscuss these cases in 3 parts:

  1. if xl < yh and yl < xh
    This means the median is found. To find the median, however, the method of the even case is a little different from the odd case.
leftMax = max(xl, yl);
minRight = min(xh, yh);
median = (leftMax + minRight)/2;
                                    left    right
x   x[0] ... x[(n-1)/2-1](x[i-1])    (x[i])x[(n-1)/2] ... x[n-1]
y   y[0] ... y[m/2-1](y[j-1])        (y[j])y[m/2] ... y[m-1]
           \_________________________/      \________________________/
             left_length = (m+n-1)/2         right_length = (m+n+1)/2

Then right_length - left_length = 1. As discussed above, the middle element is the minimum element of the right subset. So,

median = min(xh, yh)
- `x`'s length is odd and `y`'s length is even

I'll skip because the result is also

median = min(xh, yh)

You can do it yourself.

  1. if xl > yh
    It means that y[j] is too small, we need to split y in a way which make s the right subset shorter and left subset longer so that y[j] can be big enough, in the same way i will decrease and x[i-1] will be smaller. How can we do that?
    We can use binary search method. Since j = (lo + hi)/2, let lo = j+1, then j would be bigger and y[j] is too.

  2. if yl > xh
    It means that y[j-1] is too big, in contrast we set hi = j-1 and calculate j again to make y[j-1] smaller.

By updating lo and hi and then checking if xl < yh and yl < xh are met. We can finally find the median.

More discussion: boundary condition

From the above we know j and i can be change if some condition can not be met. Meanwhile, xl = x[i-1], xh = x[i], yl = y[j-1], yh = y[j]. Here comes the problem: what if i = 0 or n, j = 0 or m ? This would make x[-1], x[n], y[-1], y[m] senseless.
A little trick could handle this. We just need to extend both x and y, which means x[-1] = y[-1] = Integer.MIN_VALUE and x[n] = y[m] = Integer.MAX_VALUE. The sample code is:

xl = (i == 0 ? Integer.MIN_VALUE : x[i-1]);
xh = (i == n ? Integer.MAX_VALUE : x[i]);
yl = (j == 0 ? Integer.MIN_VALUE : y[j-1]);
yh = (j == m ? Integer.MAX_VALUE : y[j]);

At last, an accepted code is posted:

public class Solution {
    public double findMedianSortedArrays(int[] x, int[] y) {
        int n = x.length, m = y.length;

        // must note that 'return' is needed or the programme 
        // would continue if n < m and this will cause the 
        // programme to fail
        if (n < m) return findMedianSortedArrays(y, x);
        
        // set the initial lo and hi
        int lo = 0, hi = m;
        while (lo <= hi) {
            int j = (lo+hi)/2, i = (m+n)/2-j; // i+j == n-i+m-j
            
            // note that how to deal with the boundary condition
            int xl = (i == 0 ? Integer.MIN_VALUE : x[i-1]);
            int xh = (i == n ? Integer.MAX_VALUE : x[i]);
            int yl = (j == 0 ? Integer.MIN_VALUE : y[j-1]);
            int yh = (j == m ? Integer.MAX_VALUE : y[j]);
            
            if (xl > yh) lo = j+1; // y[j] too small, j need to be greater 
            else if (yl > xh) hi = j-1; // y[j-1] too big, j need to be smaller
            else {
                if ((m+n)%2 == 1) return xh < yh ? xh : yh; // odd case
                int leftMax = xl > yl ? xl : yl;
                int rightMin = xh < yh ? xh : yh;
                return (double)(leftMax + rightMin)/2; // even case
            }
        }
        return -1; // actually never execute
    }
}
上一篇下一篇

猜你喜欢

热点阅读