ArrayMap源码分析
ArrayMap使用
//创建
Map<String, String> arrayMap = new ArrayMap<>();
//添加数据
arrayMap.put("1", "1");
arrayMap.put("1", null);
arrayMap.put(null, null);
arrayMap.put(null, "1");
//移除
arrayMap.remove("1");
//获取
String value = arrayMap.get("1");
可以看到ArrayMap的使用和HashMap的使用并无多大差别,Android官方设计了ArrayMap也是想在某些场景下用来替代HashMap。
ArrayMap源码分析
下面简单分析下ArrayMap的源码实现,首先看下类的注释
ArrayMap is a generic key->value mapping data structure that is designed to be more memory efficient than a traditional {@link java.util.HashMap}. It keeps its mappings in an array data structure-- an integer array of hash codes for each item, and an Object array of the key/value pairs. This allows it to avoid having to create an extra object for every entry put in to the map, and it also tries to control the growth of the size of these arrays more aggressively (since growing them only requires copying the entries in the array, not rebuilding a hash map).
上面这段注释解释了ArrayMap的设计初衷,旨在提供比HashMap更高的内存效率的键值对存储结构,ArrayMap内部有一个int类型的数组用来存储key的hashcode,还有一个Object的数组用来存储key/value,跟HashMap相比避免了创建额外的数据结构(Entry),同时在扩容时只需要操作数组而不需要重建hash表。
<p>Note that this implementation is not intended to be appropriate for data structures that may contain large numbers of items. It is generally slower than a traditional HashMap, since lookups require a binary search and adds and removes require inserting and deleting entries in the array. For containers holding up to hundreds of items, the performance difference is not significant, less than 50%.</p>
以上注释则说明了ArrayMap使用用二分查找来获取key的hash值的索引,同时在增加和删除数据时需要操作整个数组,在数据量大时性能较差。
ArrayMap构造方法
public final class ArrayMap<K, V> implements Map<K, V> {
/**
* The minimum amount by which the capacity of a ArrayMap will increase.
* This is tuned to be relatively space-efficient.
*/
private static final int BASE_SIZE = 4;
/**
* Maximum number of entries to have in array caches.
*/
private static final int CACHE_SIZE = 10;
/**
* Special hash array value that indicates the container is immutable.
*/
static final int[] EMPTY_IMMUTABLE_INTS = new int[0];
/**
* @hide Special immutable empty ArrayMap.
*/
public static final ArrayMap EMPTY = new ArrayMap<>(-1);
/**
* Caches of small array objects to avoid spamming garbage. The cache
* Object[] variable is a pointer to a linked list of array objects.
* The first entry in the array is a pointer to the next array in the
* list; the second entry is a pointer to the int[] hash code array for it.
*/
//存放大小为BASE_SIZE的 int[] 和 Object[]
static Object[] mBaseCache;
//缓存大小
static int mBaseCacheSize;
//存放大小为2*BASE_SIZE的 int[] 和 Object[]
static Object[] mTwiceBaseCache;
//缓存大小
static int mTwiceBaseCacheSize;
final boolean mIdentityHashCode;
//存放key的hash值的数组
int[] mHashes;
//存放key/value的数组
Object[] mArray;
//已保存的数据大小
int mSize;
MapCollections<K, V> mCollections;
/**
* Create a new empty ArrayMap. The default capacity of an array map is 0, and
* will grow once items are added to it.
*/
public ArrayMap() {
this(0, false);
}
/**
* Create a new ArrayMap with a given initial capacity.
*/
public ArrayMap(int capacity) {
this(capacity, false);
}
/** {@hide} */
public ArrayMap(int capacity, boolean identityHashCode) {
mIdentityHashCode = identityHashCode;
// If this is immutable, use the sentinal EMPTY_IMMUTABLE_INTS
// instance instead of the usual EmptyArray.INT. The reference
// is checked later to see if the array is allowed to grow.
if (capacity < 0) {
mHashes = EMPTY_IMMUTABLE_INTS;
mArray = EmptyArray.OBJECT;
} else if (capacity == 0) {
mHashes = EmptyArray.INT;
mArray = EmptyArray.OBJECT;
} else {
//这里会为存储key hash的int数组和存放 key/value的Object数组进行初始化,可能会复用缓存
allocArrays(capacity);
}
mSize = 0;
}
/**
* Create a new ArrayMap with the mappings from the given ArrayMap.
*/
public ArrayMap(ArrayMap<K, V> map) {
this();
if (map != null) {
putAll(map);
}
}
//省略...
}
ArrayMap对外提供了三个构造方法,在ArrayMap内部有两个比较重要的数组,mHashes和mArray。mHashes是一个int数组用来存放key的hashcode值,而mArray用来存储key和value的值,它是一个Object数组。
在ArrayMap的构造方法中,有一句比较重要的代码,allocArrays(capacity),这个方法会初始化mHashes和mArray两个数组,同时方法内涉及了数组缓存的复用。
private void allocArrays(final int size) {
if (mHashes == EMPTY_IMMUTABLE_INTS) {
throw new UnsupportedOperationException("ArrayMap is immutable");
}
//如果数组容量为4或者8,且之前有缓存则复用给mArray mHashes
if (size == (BASE_SIZE*2)) {
synchronized (ArrayMap.class) {
if (mTwiceBaseCache != null) {
final Object[] array = mTwiceBaseCache;
mArray = array;
mTwiceBaseCache = (Object[])array[0];
mHashes = (int[])array[1];
array[0] = array[1] = null;
mTwiceBaseCacheSize--;
if (DEBUG) Log.d(TAG, "Retrieving 2x cache " + mHashes
+ " now have " + mTwiceBaseCacheSize + " entries");
return;
}
}
} else if (size == BASE_SIZE) {
synchronized (ArrayMap.class) {
if (mBaseCache != null) {
final Object[] array = mBaseCache;
mArray = array;
mBaseCache = (Object[])array[0];
mHashes = (int[])array[1];
array[0] = array[1] = null;
mBaseCacheSize--;
if (DEBUG) Log.d(TAG, "Retrieving 1x cache " + mHashes
+ " now have " + mBaseCacheSize + " entries");
return;
}
}
}
//不能复用缓存,新建两个数组,size<<1 = size * 2 所以mArray的容量是mHashes的两倍
mHashes = new int[size];
mArray = new Object[size<<1];
}
当要申请的容量为4或者8(BASE_SIZE=4)时,如果数组有缓存则复用,可以看到mArray的容量是mHashes的两倍,它们的结构如下图所示。
ArrayMap内部存储结构
获取数据
@Override
public V get(Object key) {
//获取key的索引,如果索引值小于0则说明key不存在
final int index = indexOfKey(key);
return index >= 0 ? (V)mArray[(index<<1)+1] : null;
}
public int indexOfKey(Object key) {
return key == null ? indexOfNull()
: indexOf(key, mIdentityHashCode ? System.identityHashCode(key) : key.hashCode());
}
int indexOf(Object key, int hash) {
final int N = mSize;
// Important fast case: if nothing is in here, nothing to look for.
//集合为空
if (N == 0) {
return ~0;
}
//二分查找keyhash对应的索引值,未找到时将索引值取反,从而可以根据返回值的正负来判断是否命中
int index = binarySearchHashes(mHashes, N, hash);
// If the hash code wasn't found, then we have no entry for this key.
//index小于0,不存在该key值
if (index < 0) {
return index;
}
// If the key at the returned index matches, that's what we want.
//存在该hash值,判断该索引值对应的key和传入的key值是否相等因为有可能会发生hash冲突
if (key.equals(mArray[index<<1])) {
return index;
}
//hash值相等,但key值不等,发生了hash冲突,以index+1开始向后遍历判断相等的key值
// Search for a matching key after the index.
int end;
for (end = index + 1; end < N && mHashes[end] == hash; end++) {
if (key.equals(mArray[end << 1])) return end;
}
//向后遍历没找到该key值,再以index-1开始向前遍历
// Search for a matching key before the index.
for (int i = index - 1; i >= 0 && mHashes[i] == hash; i--) {
if (key.equals(mArray[i << 1])) return i;
}
//均未找到,将end取反,注意end是hash冲突后从冲突位置往后遍历得到的值
// Key not found -- return negative value indicating where a
// new entry for this key should go. We use the end of the
// hash chain to reduce the number of array entries that will
// need to be copied when inserting.
return ~end;
}
获取数据的代码比较简单,先获取key的索引,获取成功的话返回对应值没找到则返回null。值得关注的是通过hash值获取对应索引的方法indexOf(Object key, int hash),代码已经加了相关注释,简单总结下这个方法的逻辑:
- ArrayMap通过二分查找获取hash对应的索引值,所以存放hash数组一定是有序的,这是二分查找的前提
- 若未找到时将索引值取反,从而可以根据返回值的正负来判断是否命中,大于0表示存在该hash值,小于0则表示不存在
- 当发现存在hash值时也需要判断该索引值对应的key和传入的key值是否相等,因为有可能会发生hash冲突
- 当hash值相等但key值不等,先以index+1开始向后遍历hash数组判断相等的key值,向后遍历也未找到时继续以index-1开头向前遍历
- 注意!到了最后均未找到时将end取反,end是hash冲突后从index+1往后遍历得到的值,不同于HashMap解决hash冲突采用的是链地址法,ArrayMap采用的是开放地址法
保存数据
@Override
public V put(K key, V value) {
final int osize = mSize;
final int hash;
int index;
//允许key为空null,key为空时hash为0
if (key == null) {
hash = 0;
index = indexOfNull();
} else {
hash = mIdentityHashCode ? System.identityHashCode(key) : key.hashCode();
//获取key对应存放的索引,采用二分查找hash的索引值,最后没找到的话会将index取反,所以可以根据返回的index值的正负来判断是否命中
index = indexOf(key, hash);
}
//index大于0,说明已存在这个key,新的value直接覆盖掉原来的value值
if (index >= 0) {
index = (index<<1) + 1;
final V old = (V)mArray[index];
mArray[index] = value;
return old;
}
//index取反
index = ~index;
//存放的值的大小大于等于现有数组长度,需要扩容
if (osize >= mHashes.length) {
//扩容策略:若数组的容量大于2 * BASE_SIZE时扩容为当前容量的1.5倍,介于2 * BASE_SIZE与BASE_SIZE之间扩容为2 * BASE_SIZE,小于BASE_SIZE则扩容为BASE_SIZE
final int n = osize >= (BASE_SIZE*2) ? (osize+(osize>>1))
: (osize >= BASE_SIZE ? (BASE_SIZE*2) : BASE_SIZE);
if (DEBUG) Log.d(TAG, "put: grow from " + mHashes.length + " to " + n);
final int[] ohashes = mHashes;
final Object[] oarray = mArray;
//重新创建扩容后的数组
allocArrays(n);
if (CONCURRENT_MODIFICATION_EXCEPTIONS && osize != mSize) {
throw new ConcurrentModificationException();
}
//将原数组数据拷贝至扩容后的新数组
if (mHashes.length > 0) {
if (DEBUG) Log.d(TAG, "put: copy 0-" + osize + " to 0");
System.arraycopy(ohashes, 0, mHashes, 0, ohashes.length);
System.arraycopy(oarray, 0, mArray, 0, oarray.length);
}
//回收原数组
freeArrays(ohashes, oarray, osize);
}
//需要移动原数组,腾出index所在的位置
if (index < osize) {
if (DEBUG) Log.d(TAG, "put: move " + index + "-" + (osize-index)
+ " to " + (index+1));
System.arraycopy(mHashes, index, mHashes, index + 1, osize - index);
System.arraycopy(mArray, index << 1, mArray, (index + 1) << 1, (mSize - index) << 1);
}
if (CONCURRENT_MODIFICATION_EXCEPTIONS) {
if (osize != mSize || index >= mHashes.length) {
throw new ConcurrentModificationException();
}
}
//保存
mHashes[index] = hash;
mArray[index<<1] = key;
mArray[(index<<1)+1] = value;
mSize++;
return null;
}
总结一下put方法的逻辑:
- ArrayMap允许key为空null,key为空时hash默认为0
- 通过上面分析的indexOf(Object key, int hash)方法获取hash值存放的索引
- 返回的索引值大于0,说明已存在这个key,新的value直接覆盖掉原来的value值
- 索引值不存在,先判断是否需要扩容,ArrayMap的扩容策略是:若数组的容量大于2 * BASE_SIZE时扩容为当前容量的1.5倍,介于2 * BASE_SIZE与BASE_SIZE之间扩容为2 * BASE_SIZE,小于BASE_SIZE则扩容为BASE_SIZE,扩容之后会调用freeArrays()方法回收原数组
-
判断是否需要移动数组,mHashs和mArray的index关系结构如下图所示
index关系结构
移除数据
@Override
public V remove(Object key) {
//查找key是否存在
final int index = indexOfKey(key);
if (index >= 0) {
//删除对应key
return removeAt(index);
}
return null;
}
public V removeAt(int index) {
//获取原索引值对应的value
final Object old = mArray[(index << 1) + 1];
final int osize = mSize;
final int nsize;
if (osize <= 1) {
//存放的数据小于等于1,移除完数据后集合为空回收数组,避免内存占用
// Now empty.
if (DEBUG) Log.d(TAG, "remove: shrink from " + mHashes.length + " to 0");
final int[] ohashes = mHashes;
final Object[] oarray = mArray;
mHashes = EmptyArray.INT;
mArray = EmptyArray.OBJECT;
//回收数组
freeArrays(ohashes, oarray, osize);
nsize = 0;
} else {
nsize = osize - 1;
if (mHashes.length > (BASE_SIZE*2) && mSize < mHashes.length/3) {
//现有数组容量过大且占用率不高,收缩数组降低容量,减少内存占用
// Shrunk enough to reduce size of arrays. We don't allow it to
// shrink smaller than (BASE_SIZE*2) to avoid flapping between
// that and BASE_SIZE.
final int n = osize > (BASE_SIZE*2) ? (osize + (osize>>1)) : (BASE_SIZE*2);
if (DEBUG) Log.d(TAG, "remove: shrink from " + mHashes.length + " to " + n);
final int[] ohashes = mHashes;
final Object[] oarray = mArray;
//重新为收缩后的数组分配内存空间
allocArrays(n);
if (CONCURRENT_MODIFICATION_EXCEPTIONS && osize != mSize) {
throw new ConcurrentModificationException();
}
//拷贝数组将原数组的数据拷贝至新数组,移除掉index位置的数据
if (index > 0) {
if (DEBUG) Log.d(TAG, "remove: copy from 0-" + index + " to 0");
System.arraycopy(ohashes, 0, mHashes, 0, index);
System.arraycopy(oarray, 0, mArray, 0, index << 1);
}
if (index < nsize) {
if (DEBUG) Log.d(TAG, "remove: copy from " + (index+1) + "-" + nsize
+ " to " + index);
//将index位置后面的元素往前移动一位
System.arraycopy(ohashes, index + 1, mHashes, index, nsize - index);
System.arraycopy(oarray, (index + 1) << 1, mArray, index << 1,
(nsize - index) << 1);
}
} else {
//不需要收缩数组,直接移动数组内的元素即可,将index位置后面的元素往前移动一位
if (index < nsize) {
if (DEBUG) Log.d(TAG, "remove: move " + (index+1) + "-" + nsize
+ " to " + index);
System.arraycopy(mHashes, index + 1, mHashes, index, nsize - index);
System.arraycopy(mArray, (index + 1) << 1, mArray, index << 1,
(nsize - index) << 1);
}
mArray[nsize << 1] = null;
mArray[(nsize << 1) + 1] = null;
}
}
if (CONCURRENT_MODIFICATION_EXCEPTIONS && osize != mSize) {
throw new ConcurrentModificationException();
}
mSize = nsize;
return (V)old;
}
remove(Object key)方法也比较简单,主要实现逻辑在removeAt(int index)方法,简单分析这个方法的逻辑:
- 若移除完数据后集合为空,将数组置为空数组,大小置为0,并且调用freeArrays()方法回收数组,避免内存占用
- 如果数组的容量大于2*BASE_SIZE且占用率小于容量的1/3,则会收缩数组降低容量,减少内存占用
- 以上条件都不满足时,将index位置后面的元素往前移动一位以此来删除数据
- ArrayMap移除数据时用复制数组的操作来覆盖指定位置的元素,所以当数据量大时效率较差
private static void freeArrays(final int[] hashes, final Object[] array, final int size) {
//将数组长度为BASE_SIZE*2缓存至mTwiceBaseCache中
if (hashes.length == (BASE_SIZE*2)) {
synchronized (ArrayMap.class) {
if (mTwiceBaseCacheSize < CACHE_SIZE) {
array[0] = mTwiceBaseCache;
array[1] = hashes;
for (int i=(size<<1)-1; i>=2; i--) {
array[i] = null;
}
mTwiceBaseCache = array;
mTwiceBaseCacheSize++;
if (DEBUG) Log.d(TAG, "Storing 2x cache " + array
+ " now have " + mTwiceBaseCacheSize + " entries");
}
}
} else if (hashes.length == BASE_SIZE) {
//将数组长度为BASE_SIZE缓存至mBaseCacheSize中
synchronized (ArrayMap.class) {
if (mBaseCacheSize < CACHE_SIZE) {
array[0] = mBaseCache;
array[1] = hashes;
for (int i=(size<<1)-1; i>=2; i--) {
array[i] = null;
}
mBaseCache = array;
mBaseCacheSize++;
if (DEBUG) Log.d(TAG, "Storing 1x cache " + array
+ " now have " + mBaseCacheSize + " entries");
}
}
}
}
在前面的分析也可以看到ArrayMap在put和remove方法中可能会频繁的出现多个容量为BASE_SIZE和BASE_SIZE * 2的int数组和Object数组,所以为了避免连续创建对象,将容量为BASE_SIZE和BASE_SIZE * 2的int数组和Object数组进行缓存。
总结
优点:
- 能够重复的利用因为数据扩容而遗留下来的数组空间,方便下一个ArrayMap的使用
- 积极地控制存储数组的长度的增加,占用率较低时及时收缩数组减少内存占用,同时在扩容时只需拷贝数组而不用重构哈希
- ArrayMap在扩容在上采用的是System.appaycopy所以效率上更高
缺点:
- 不适用于含有大量条目的数据类型,在查找时需要进行二分查找,增加或删除时需要移动数组。