ArrayMap解析

2019-01-03 本文已影响12人 jxiang112

注：来自于Android中ArrayMap的解析
问题：
1、ArrayMap采用的数据结构是？
2、ArrayMap默认容量是多大？
3、ArrayMap最大容量是多少？每次扩容多大？
4、ArrayMap get原理
5、ArrayMap put原理
6、ArrayMap 与 HashMap比较？优缺点？应用场景？

问题1：ArrayMap采用的数据结构是？

采用的是两个一维数组，一个用来存储key的hashcode，其下标代表添加元素的起始下标；一个用来存储添加元素的key和value。它属于哈希表

int[] mHashes;
Object[] mArray;

其中mHashes存储的是key的hashcode; mArray存储的是元素的的key和value其结构图如下：

图1

问题2：ArrayMap默认容量是多大？

ArrayMap没有定义默认容量是多少，我们通过其默认构造函数得知其默认容量是0,看源码：

public ArrayMap() {
        this(0, false);
    }

问题3：ArrayMap最大容量是多少？每次扩容多大？

ArrayMap我们在问题5中一并回答此问题

问题4：ArrayMap get原理

先看起源码：

 @Override
  public V get(Object key) {
        //取得key的hashcode所在mHashes的下标,
        final int index = indexOfKey(key);
        /**
        根据mArray的数据存储结构，得知mHashes的 下标*2  (index << 1)便得到其对应元素在mArray的起始下标，第一个是key，第二个是value
        */
        return index >= 0 ? (V)mArray[(index<<1)+1] : null;
  }
public int indexOfKey(Object key) {
        return key == null ? indexOfNull()
                : indexOf(key, mIdentityHashCode ? System.identityHashCode(key) : key.hashCode());
}

int indexOf(Object key, int hash) {
        final int N = mSize;

        // Important fast case: if nothing is in here, nothing to look for.
        if (N == 0) {
            return ~0;
        }
        
        //二分法找到hash所在的下标
        int index = binarySearchHashes(mHashes, N, hash);

        // If the hash code wasn't found, then we have no entry for this key.
        if (index < 0) {
            //没找到，直接返回
            return index;
        }

        // If the key at the returned index matches, that's what we want.
        if (key.equals(mArray[index<<1])) {
            //如果hash下标对应mArray中的key与要找的key相等，直接返回当前下标
            return index;
        }
        //出现冲突处理方案
        //遍历当前index之后元素，找到匹配的key所在的下标
        // Search for a matching key after the index.
        int end;
        for (end = index + 1; end < N && mHashes[end] == hash; end++) {
            if (key.equals(mArray[end << 1])) return end;
        }
        //遍历当前index之前元素，找到匹配的key所在的下标
        // Search for a matching key before the index.
        for (int i = index - 1; i >= 0 && mHashes[i] == hash; i--) {
            if (key.equals(mArray[i << 1])) return i;
        }

        // Key not found -- return negative value indicating where a
        // new entry for this key should go.  We use the end of the
        // hash chain to reduce the number of array entries that will
        // need to be copied when inserting.
        return ~end;
}

不难发现get的原理是：
1、取key的hashcode作为散列函数的值hash
2、通过二分法找到hash在mHashes中的位置下标index
3、根据hash的下标index计算出映射地址(index << 1)
4、取出映射地址的元素，判断元素的key与要找的key是否匹配，如果匹配则返回index值
5、步骤4中，如果不匹配，代表有冲突，则先遍历index之后的元素，再遍历index之前的元素，直到匹配的元素并返回对应的下标

信息量：
1、key是允许为空的
2、采用二分法查找key

问题5：ArrayMap put原理

先看put操作的源码：

@Override
    public V put(K key, V value) {
        //当前容量
        final int osize = mSize;
        //key的散列值
        final int hash;
        //key的hash所在的下标
        int index;

        if (key == null) {
            //key为空hash值为0
            hash = 0;
            //找到key的hash值的下标
            index = indexOfNull();
        } else {
            //key的hash值
            hash = mIdentityHashCode ? System.identityHashCode(key) : key.hashCode();
            // 找到key的hash值的下标
            index = indexOf(key, hash);
        }
        if (index >= 0) {
            //当前要添加的元素已经存在，则直接进行替换操作
            index = (index<<1) + 1;
            final V old = (V)mArray[index];
            mArray[index] = value;
            return old;
        }
        //取反得到要添加元素的位置
        index = ~index;
        if (osize >= mHashes.length) {
            //扩容新的容量
            final int n = osize >= (BASE_SIZE*2) ? (osize+(osize>>1))
                    : (osize >= BASE_SIZE ? (BASE_SIZE*2) : BASE_SIZE);

            if (DEBUG) Log.d(TAG, "put: grow from " + mHashes.length + " to " + n);
            //原hash数组
            final int[] ohashes = mHashes;
            //原散列表
            final Object[] oarray = mArray;
            //扩容操作
            allocArrays(n);

            if (CONCURRENT_MODIFICATION_EXCEPTIONS && osize != mSize) {
                throw new ConcurrentModificationException();
            }

            if (mHashes.length > 0) {
                if (DEBUG) Log.d(TAG, "put: copy 0-" + osize + " to 0");
                //将原数组中的拷贝回新数组中
                System.arraycopy(ohashes, 0, mHashes, 0, ohashes.length);
                System.arraycopy(oarray, 0, mArray, 0, oarray.length);
            }
            //回收释放操作
            freeArrays(ohashes, oarray, osize);
        }

        if (index < osize) {
            if (DEBUG) Log.d(TAG, "put: move " + index + "-" + (osize-index)
                    + " to " + (index+1));
            //将index处（含index）及其之后的数据往后移
            System.arraycopy(mHashes, index, mHashes, index + 1, osize - index);
            System.arraycopy(mArray, index << 1, mArray, (index + 1) << 1, (mSize - index) << 1);
        }

        if (CONCURRENT_MODIFICATION_EXCEPTIONS) {
            if (osize != mSize || index >= mHashes.length) {
                throw new ConcurrentModificationException();
            }
        }
        //将数据添加到index处
        mHashes[index] = hash;
        mArray[index<<1] = key;
        mArray[(index<<1)+1] = value;
        mSize++;
        return null;
    }

private void allocArrays(final int size) {
        if (mHashes == EMPTY_IMMUTABLE_INTS) {
            //扩容时如果mHashes 是不可变的，则抛出异常
            throw new UnsupportedOperationException("ArrayMap is immutable");
        }
        if (size == (BASE_SIZE*2)) {
            //如果扩容容量为8（BASE_SIZE=4）
            synchronized (ArrayMap.class) {
                if (mTwiceBaseCache != null) {
                    /**
                      如果当前有容量为8的int缓存可复用数组和容量为16的object缓存可复用数组，则复用这些数组，而不重新new
                     */
                    final Object[] array = mTwiceBaseCache;
                    mArray = array;
                    mTwiceBaseCache = (Object[])array[0];
                    mHashes = (int[])array[1];
                    array[0] = array[1] = null;
                    mTwiceBaseCacheSize--;
                    if (DEBUG) Log.d(TAG, "Retrieving 2x cache " + mHashes
                            + " now have " + mTwiceBaseCacheSize + " entries");
                    return;
                }
            }
        } else if (size == BASE_SIZE) {
             //如果扩容容量为4（BASE_SIZE=4）
            synchronized (ArrayMap.class) {
                if (mBaseCache != null) {
                    /**
                      如果当前有容量为4的int缓存可复用数组和容量为8的object缓存可复用数组，则复用这些数组，而不重新new
                     */
                    final Object[] array = mBaseCache;
                    mArray = array;
                    mBaseCache = (Object[])array[0];
                    mHashes = (int[])array[1];
                    array[0] = array[1] = null;
                    mBaseCacheSize--;
                    if (DEBUG) Log.d(TAG, "Retrieving 1x cache " + mHashes
                            + " now have " + mBaseCacheSize + " entries");
                    return;
                }
            }
        }

        mHashes = new int[size];
        mArray = new Object[size<<1];
    }

    private static void freeArrays(final int[] hashes, final Object[] array, final int size) {
        if (hashes.length == (BASE_SIZE*2)) {
            //如果当前容量为8（BASE_SIZE=4）
            synchronized (ArrayMap.class) {
                if (mTwiceBaseCacheSize < CACHE_SIZE) {
                    //缓存当前数组，并将数组下标为2之后的数据设置为null
                    array[0] = mTwiceBaseCache;
                    array[1] = hashes;
                    for (int i=(size<<1)-1; i>=2; i--) {
                        array[i] = null;
                    }
                    mTwiceBaseCache = array;
                    mTwiceBaseCacheSize++;
                    if (DEBUG) Log.d(TAG, "Storing 2x cache " + array
                            + " now have " + mTwiceBaseCacheSize + " entries");
                }
            }
        } else if (hashes.length == BASE_SIZE) {
            //如果当前容量为4（BASE_SIZE=4）
            synchronized (ArrayMap.class) {
                //缓存当前数组，并将数组下标为2之后的数据设置为null
                if (mBaseCacheSize < CACHE_SIZE) {
                    array[0] = mBaseCache;
                    array[1] = hashes;
                    for (int i=(size<<1)-1; i>=2; i--) {
                        array[i] = null;
                    }
                    mBaseCache = array;
                    mBaseCacheSize++;
                    if (DEBUG) Log.d(TAG, "Storing 1x cache " + array
                            + " now have " + mBaseCacheSize + " entries");
                }
            }
        }
    }

根据上述源码可以概述put原理如下：
1、根据key的hashcode通过二分法找到匹配hash的下标值index
2、如果找到匹配的元素，表示已经存在要添加的元素，则直接根据index替换元素的值
3、如果没有找到匹配的元素，表示当前不存在要添加的元素需要添加新元素，对下标值取反操作得到要添加元素的位置下标（为什么取反操作得到的是添加元素位置下标）
4、添加元素之前，判断是否需要扩容，如果需要扩容先进行扩容，将原来容器中的数据拷贝到新容器中，接着回收释放原来容器的数据
5、添加新元素到index的位置
通过put操作我们可以得出其中的信息量：

key 和 value是可以null的
没有最大扩容的限制直到出现oom
每次扩容时，如果当前容量>=8, 则扩容的容量是原来容量的一半；如果当前容量>=4&&<8则扩容到容量为8的大小；如果当前容量<8,则扩容到容量为4的大小
put操作并非线程安全
对应容量为8和容量为4是有缓存容器即mHashess数组和mArray数组的功能，提供内存使用效率

问题6：ArrayMap 与 HashMap比较？优缺点？应用场景？

比较：
1、ArrayMap采用的数据结构是两个一维数组，而HashMap使用的是一维数组和单链表数据结构
2、ArrayMap默认容量为0，而HashMap默认容量是16
3、ArrayMap没有最大容量的限制，直到报oom，而HashMap最大容量最大是Integer.MAX_VALUE
4、ArrayMap默认每次扩容时原来容量一半的增量；而HashMap默认每次扩容时原来容量0.75倍的增量
优点：
1、相比HashMap内存空间更优，因为比HashMap少了一个实体类进行装饰
2、容量为4或者8时又缓存复用功能
3、扩容比HashMap高效，因为HashMap扩容时相当于重构，需要重新重新计算hash值和移动元素；而ArrayMap扩容时只需拷贝
缺点：
1、数据量大的情况下查询效率比HashMap差
2、存取效率比HashMap低，因为每次存取都需要二分法找到对应的下标
3、没有实现Serializable，不便在Android bundle进行传输

使用场景：
1、数据量小，建议百量级别
2、内存要求高