ArrayList Source Code Analysis
ArrayList
Overview
ArrayList is one of the most frequently used collections; when a List is needed, it is usually the first choice. Let's start with its declaration:
public class ArrayList<E> extends AbstractList<E>
implements List<E>, RandomAccess, Cloneable, java.io.Serializable
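Of the interfaces ArrayList implements, RandomAccess, Cloneable, and Serializable are all marker interfaces, so ArrayList is a pure implementation of the List interface. Its sibling LinkedList, by contrast, also implements Deque and therefore doubles as a double-ended queue.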
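The difference between the two classes can be checked with instanceof. The following is a small sketch; the class name MarkerInterfaceDemo and its helper methods are made up for illustration and are not part of the JDK.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

// Hypothetical demo class, for illustration only.
class MarkerInterfaceDemo {
    // True if the list carries the RandomAccess marker interface.
    static boolean isRandomAccess(List<?> list) {
        return list instanceof RandomAccess;
    }

    // True if the list can also be used as a double-ended queue.
    static boolean isDeque(List<?> list) {
        return list instanceof Deque;
    }

    public static void main(String[] args) {
        List<String> arrayList = new ArrayList<>();
        List<String> linkedList = new LinkedList<>();
        System.out.println(isRandomAccess(arrayList) + " " + isDeque(arrayList));
        System.out.println(isRandomAccess(linkedList) + " " + isDeque(linkedList));
    }
}
```

Generic algorithms can use the same instanceof RandomAccess test to choose between an index-based loop and an iterator-based one.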
Structure
transient Object[] elementData;
// Inherited from the parent class AbstractList
protected transient int modCount = 0;
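As the name ArrayList suggests, this is a List backed by an array, in other words a resizable array; the data is stored in the object array elementData. Besides elementData, one more field deserves attention: modCount, which is inherited from the parent class AbstractList. modCount records the number of times this List has been structurally modified. A structural modification is one that changes the size of the List, or otherwise perturbs it in a way that could make an in-progress iteration yield incorrect results. modCount is used mainly by iterators. If the List is structurally modified during iteration by anything other than the iterator's own methods, whether from another thread or from the iterating thread itself, the iterator throws ConcurrentModificationException instead of risking an incorrect result. This is the iterator's fail-fast mechanism.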
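Fail-fast behavior does not require a second thread. The sketch below triggers it from a single thread by removing an element inside a for-each loop; the class name FailFastDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical demo class, for illustration only.
class FailFastDemo {
    // Returns true if removing an element inside a for-each loop
    // makes the iterator throw ConcurrentModificationException.
    static boolean modifyWhileIterating() {
        List<String> list = new ArrayList<>();
        list.add("a");
        list.add("b");
        list.add("c");
        try {
            for (String s : list) {
                if (s.equals("a")) {
                    // Structural modification that bypasses the iterator:
                    // modCount changes, the iterator's expectedModCount does not.
                    list.remove(s);
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(modifyWhileIterating());
    }
}
```

The check is best-effort. For example, removing the second-to-last element makes hasNext() return false before next() gets a chance to compare the counters, so no exception is thrown in that case.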
Adding elements
/**
* Appends the specified element to the end of this list.
*/
public boolean add(E e) {
ensureCapacityInternal(size + 1); // Increments modCount!!
elementData[size++] = e;
return true;
}
/**
* Inserts the specified element at the specified position in this
* list. Shifts the element currently at that position (if any) and
* any subsequent elements to the right (adds one to their indices).
*/
public void add(int index, E element) {
rangeCheckForAdd(index);
ensureCapacityInternal(size + 1); // Increments modCount!!
System.arraycopy(elementData, index, elementData, index + 1,
size - index);
elementData[index] = element;
size++;
}
/**
* Appends all of the elements in the specified collection to the end of
* this list, in the order that they are returned by the
* specified collection's Iterator. The behavior of this operation is
* undefined if the specified collection is modified while the operation
* is in progress. (This implies that the behavior of this call is
* undefined if the specified collection is this list, and this
* list is nonempty.)
*/
public boolean addAll(Collection<? extends E> c) {
Object[] a = c.toArray();
int numNew = a.length;
ensureCapacityInternal(size + numNew); // Increments modCount
System.arraycopy(a, 0, elementData, size, numNew);
size += numNew;
return numNew != 0;
}
/**
* Inserts all of the elements in the specified collection into this
* list, starting at the specified position. Shifts the element
* currently at that position (if any) and any subsequent elements to
* the right (increases their indices). The new elements will appear
* in the list in the order that they are returned by the
* specified collection's iterator.
*/
public boolean addAll(int index, Collection<? extends E> c) {
rangeCheckForAdd(index);
Object[] a = c.toArray();
int numNew = a.length;
ensureCapacityInternal(size + numNew); // Increments modCount
int numMoved = size - index;
if (numMoved > 0)
System.arraycopy(elementData, index, elementData, index + numNew,
numMoved);
System.arraycopy(a, 0, elementData, index, numNew);
size += numNew;
return numNew != 0;
}
private void rangeCheck(int index) {
if (index >= size)
throw new IndexOutOfBoundsException(outOfBoundsMsg(index));
}
private void rangeCheckForAdd(int index) {
if (index > size || index < 0)
throw new IndexOutOfBoundsException(outOfBoundsMsg(index));
}
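There are several methods for adding elements to an ArrayList: add(E e) appends to the end of the array, add(int index, E element) inserts at the specified position, addAll(Collection<? extends E> c) appends a batch of elements to the end, and addAll(int index, Collection<? extends E> c) inserts a batch of elements at the specified position.
These methods all work in essentially the same way. The two variants that take an index first validate it with rangeCheckForAdd (rangeCheck is the stricter check used by get, set, and remove, where index == size is not allowed). Next, ensureCapacityInternal makes sure the array has enough capacity: it checks whether the current capacity suffices and grows the array if not, as covered in the section on growing capacity. Note that adding elements is a structural modification of the ArrayList, so modCount has to be incremented. The source puts the modCount increment inside ensureCapacityInternal, which feels a bit odd: the method name says it exists to ensure capacity, yet it also modifies a field that has nothing to do with capacity, so I don't think the design is very clean. The authors apparently felt the same, which would explain the comment on the call: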
ensureCapacityInternal(size + 1); // Increments modCount!!
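Back to the main topic: once the capacity is guaranteed, the static method System.arraycopy() shifts the existing elements to make room, and the new elements are written into the gap. Appending to the end needs no shifting at all; the new element simply goes into the slot at index size. Finally size is updated, and the add operation is complete.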
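The shift-then-write step can be reproduced on a plain array. This is a simplified sketch rather than the JDK code: the class InsertSketch and its methods are hypothetical, the caller tracks size itself, and the capacity check is reduced to a single condition.

```java
import java.util.Arrays;

// Hypothetical sketch of inserting into an array-backed list.
class InsertSketch {
    // Inserts value at index among the first `size` slots of data.
    // Returns the array holding the result, which is a new array if data was full.
    static Object[] insert(Object[] data, int size, int index, Object value) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
        if (size == data.length) {
            // Grow by about 1.5x, but always by at least one slot.
            int newLength = Math.max(size + 1, data.length + (data.length >> 1));
            data = Arrays.copyOf(data, newLength);
        }
        // Shift elements index..size-1 one slot to the right.
        System.arraycopy(data, index, data, index + 1, size - index);
        data[index] = value;
        return data;
    }

    static String demo() {
        Object[] data = {"a", "c"}; // full array: size == length == 2
        data = insert(data, 2, 1, "b");
        return Arrays.toString(data);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

System.arraycopy handles the overlapping source and destination ranges correctly, which is why a single call is enough to open the gap.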
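Growing capacity
ArrayList is based on a resizable array: when the backing array runs out of capacity, it is replaced by a larger one. The code is as follows: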
private void ensureCapacityInternal(int minCapacity) {
if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
}
ensureExplicitCapacity(minCapacity);
}
private void ensureExplicitCapacity(int minCapacity) {
modCount++;
// overflow-conscious code
if (minCapacity - elementData.length > 0)
grow(minCapacity);
}
/**
* Increases the capacity to ensure that it can hold at least the
* number of elements specified by the minimum capacity argument.
*/
private void grow(int minCapacity) {
// overflow-conscious code
int oldCapacity = elementData.length;
int newCapacity = oldCapacity + (oldCapacity >> 1);
if (newCapacity - minCapacity < 0)
newCapacity = minCapacity;
if (newCapacity - MAX_ARRAY_SIZE > 0)
newCapacity = hugeCapacity(minCapacity);
// minCapacity is usually close to size, so this is a win:
elementData = Arrays.copyOf(elementData, newCapacity);
}
private static int hugeCapacity(int minCapacity) {
if (minCapacity < 0) // overflow
throw new OutOfMemoryError();
return (minCapacity > MAX_ARRAY_SIZE) ?
Integer.MAX_VALUE :
MAX_ARRAY_SIZE;
}
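The methods whose names start with ensure check whether the current array can hold minCapacity elements; only if it cannot do they call grow(int minCapacity), so let's go straight to grow().
grow() first computes a new capacity of roughly 1.5 times the old one (the statement int newCapacity = oldCapacity + (oldCapacity >> 1), which rounds down because of the integer shift). It then checks whether that satisfies the minimum required capacity minCapacity; if not, newCapacity is set to minCapacity. Next it checks whether newCapacity exceeds the maximum array size MAX_ARRAY_SIZE. If so, hugeCapacity() decides the result: Integer.MAX_VALUE when minCapacity itself is larger than MAX_ARRAY_SIZE, otherwise MAX_ARRAY_SIZE (a negative minCapacity means the int overflowed, and an OutOfMemoryError is thrown). Finally, Arrays.copyOf copies the contents of the old array into a new array of that capacity.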
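To make the arithmetic concrete, the capacity calculation can be pulled out into a pure function. This is a sketch: the class GrowthSketch and the methods newCapacity and sequence are hypothetical, and MAX_ARRAY_SIZE uses the value from the JDK 8 source (Integer.MAX_VALUE - 8), which the excerpts above do not show.

```java
// Hypothetical sketch that mirrors the capacity arithmetic of grow().
class GrowthSketch {
    // Value taken from the JDK 8 source; assumed here, not shown in the excerpts.
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    // Computes the capacity grow(minCapacity) would pick for an array of oldCapacity.
    static int newCapacity(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        // Subtraction instead of direct comparison keeps the checks
        // correct even when newCapacity has overflowed to a negative value.
        if (newCapacity - minCapacity < 0)
            newCapacity = minCapacity;
        if (newCapacity - MAX_ARRAY_SIZE > 0)
            newCapacity = hugeCapacity(minCapacity);
        return newCapacity;
    }

    static int hugeCapacity(int minCapacity) {
        if (minCapacity < 0) // overflow
            throw new OutOfMemoryError();
        return (minCapacity > MAX_ARRAY_SIZE) ? Integer.MAX_VALUE : MAX_ARRAY_SIZE;
    }

    // Capacities reached when each growth step asks for one more slot than is available.
    static String sequence(int start, int steps) {
        StringBuilder sb = new StringBuilder().append(start);
        int capacity = start;
        for (int i = 0; i < steps; i++) {
            capacity = newCapacity(capacity, capacity + 1);
            sb.append(" -> ").append(capacity);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(sequence(10, 4)); // 10 -> 15 -> 22 -> 33 -> 49
    }
}
```

Because each growth step multiplies the capacity by about 1.5, appending n elements one at a time triggers only O(log n) array copies.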
Removing elements
/**
* Removes the element at the specified position in this list.
* Shifts any subsequent elements to the left (subtracts one from their
* indices).
*/
public E remove(int index) {
rangeCheck(index);
modCount++;
E oldValue = elementData(index);
int numMoved = size - index - 1;
if (numMoved > 0)
System.arraycopy(elementData, index+1, elementData, index,
numMoved);
elementData[--size] = null; // clear to let GC do its work
return oldValue;
}
/**
* Removes the first occurrence of the specified element from this list,
* if it is present. If the list does not contain the element, it is
* unchanged. More formally, removes the element with the lowest index
*/
public boolean remove(Object o) {
if (o == null) {
for (int index = 0; index < size; index++)
if (elementData[index] == null) {
fastRemove(index);
return true;
}
} else {
for (int index = 0; index < size; index++)
if (o.equals(elementData[index])) {
fastRemove(index);
return true;
}
}
return false;
}
/*
* Private remove method that skips bounds checking and does not
* return the value removed.
*/
private void fastRemove(int index) {
modCount++;
int numMoved = size - index - 1;
if (numMoved > 0)
System.arraycopy(elementData, index+1, elementData, index,
numMoved);
elementData[--size] = null; // clear to let GC do its work
}
/**
* Removes all of the elements from this list. The list will
* be empty after this call returns.
*/
public void clear() {
modCount++;
// clear to let GC do its work
for (int i = 0; i < size; i++)
elementData[i] = null;
size = 0;
}
/**
* Removes from this list all of the elements whose index is between
* {@code fromIndex}, inclusive, and {@code toIndex}, exclusive.
* Shifts any succeeding elements to the left (reduces their index).
* This call shortens the list by {@code (toIndex - fromIndex)} elements.
* (If {@code toIndex==fromIndex}, this operation has no effect.)
*/
protected void removeRange(int fromIndex, int toIndex) {
modCount++;
int numMoved = size - toIndex;
System.arraycopy(elementData, toIndex, elementData, fromIndex,
numMoved);
// clear to let GC do its work
int newSize = size - (toIndex-fromIndex);
for (int i = newSize; i < size; i++) {
elementData[i] = null;
}
size = newSize;
}
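The principle behind removal is simple: System.arraycopy moves the elements that should be kept into their correct positions, and size is adjusted. Then, to avoid memory leaks, the slots that are no longer in use are explicitly set to null so the objects they referenced can be garbage collected. Simple as the principle is, there are many details to get right, most of them off-by-one issues with index values.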
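One practical consequence of having both remove(int index) and remove(Object o) shows up with a List of Integer: an int argument selects the index overload, not the element overload. The sketch below illustrates this; the class name RemoveOverloadDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, for illustration only.
class RemoveOverloadDemo {
    static String demo() {
        List<Integer> list = new ArrayList<>(Arrays.asList(10, 1, 20));

        // remove(int index): removes the element at index 1, which is the value 1.
        list.remove(1);
        String afterIndexRemove = list.toString(); // [10, 20]

        // remove(Object o): removes the first element equal to 10.
        list.remove(Integer.valueOf(10));

        return afterIndexRemove + " " + list;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [10, 20] [20]
    }
}
```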
Next, let's look at the methods that remove or retain elements in bulk.
/**
* Removes from this list all of its elements that are contained in the
* specified collection.
*/
public boolean removeAll(Collection<?> c) {
Objects.requireNonNull(c);
return batchRemove(c, false);
}
/**
* Retains only the elements in this list that are contained in the
* specified collection. In other words, removes from this list all
* of its elements that are not contained in the specified collection.
*/
public boolean retainAll(Collection<?> c) {
Objects.requireNonNull(c);
return batchRemove(c, true);
}
private boolean batchRemove(Collection<?> c, boolean complement) {
final Object[] elementData = this.elementData;
int r = 0, w = 0;
boolean modified = false;
try {
for (; r < size; r++)
// 1) Removing the elements in c: complement == false
// keep elementData[r] if it is not in c
// 2) Retaining the elements in c: complement == true
// keep elementData[r] if it is in c
if (c.contains(elementData[r]) == complement)
elementData[w++] = elementData[r];
} finally {
// Preserve behavioral compatibility with AbstractCollection,
// even if c.contains() throws.
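// 1) r == size: the loop finished normally
// 2) r != size: c.contains() threw an exception,
// perhaps because an element's type is incompatible with c, or c does not permit null elements;
// in that case copy the elements not yet examined forward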
if (r != size) {
System.arraycopy(elementData, r,
elementData, w,
size - r);
w += size - r;
}
if (w != size) {
// clear to let GC do its work
for (int i = w; i < size; i++)
elementData[i] = null;
modCount += size - w;
size = w;
modified = true;
}
}
return modified;
}
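Both the bulk-removal method removeAll() and the bulk-retention method retainAll() delegate to batchRemove, so let's look at that method directly.
Here is how it works. It traverses the whole array, picks out the elements to keep, and writes them back into elementData starting at index 0. If the traversal completes without an exception (that is, r == size), the slots from w onward are explicitly set to null so the GC can reclaim those objects. If the traversal is interrupted by an exception (r != size), the elements not yet examined are first copied to index w and onward, so none of them is lost, and the exception still propagates to the caller. According to the source comment, this is done to preserve behavioral compatibility with AbstractCollection; I am not sure whether this is the best way to handle the exception.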
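The read-index/write-index compaction at the heart of batchRemove can be sketched on a plain array. This is a simplified illustration, not the JDK code: the class CompactSketch and its methods are hypothetical, and the exception-handling path in the finally block is deliberately left out.

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical sketch of two-index in-place compaction.
class CompactSketch {
    // Keeps the elements e of data[0..size) for which c.contains(e) == complement,
    // packs them at the front, nulls the tail, and returns the new size.
    static int compact(Object[] data, int size, Collection<?> c, boolean complement) {
        int w = 0; // next slot to write a kept element into
        for (int r = 0; r < size; r++) { // r scans every element once
            if (c.contains(data[r]) == complement) {
                data[w++] = data[r];
            }
        }
        // Clear the unused tail so the dropped objects can be collected.
        for (int i = w; i < size; i++) {
            data[i] = null;
        }
        return w;
    }

    static String demo(boolean complement) {
        Object[] data = {1, 2, 3, 4, 5};
        int newSize = compact(data, data.length, Arrays.asList(2, 4), complement);
        return Arrays.toString(Arrays.copyOf(data, newSize));
    }

    public static void main(String[] args) {
        System.out.println(demo(false)); // removeAll-style: [1, 3, 5]
        System.out.println(demo(true));  // retainAll-style: [2, 4]
    }
}
```

Since w never overtakes r, the write never clobbers an element that has not been read yet, and the whole operation needs one pass and no extra array.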
Lookup and update
public boolean contains(Object o) {
return indexOf(o) >= 0;
}
/**
* Returns the index of the first occurrence of the specified element
* in this list, or -1 if this list does not contain the element.
* More formally, returns the lowest index <tt>i</tt> such that
* <tt>(o==null ? get(i)==null : o.equals(get(i)))</tt>,
* or -1 if there is no such index.
*/
public int indexOf(Object o) {
if (o == null) {
for (int i = 0; i < size; i++)
if (elementData[i]==null)
return i;
} else {
for (int i = 0; i < size; i++)
if (o.equals(elementData[i]))
return i;
}
return -1;
}
/**
* Returns the index of the last occurrence of the specified element
* in this list, or -1 if this list does not contain the element.
* More formally, returns the highest index <tt>i</tt> such that
* <tt>(o==null ? get(i)==null : o.equals(get(i)))</tt>,
* or -1 if there is no such index.
*/
public int lastIndexOf(Object o) {
if (o == null) {
for (int i = size-1; i >= 0; i--)
if (elementData[i]==null)
return i;
} else {
for (int i = size-1; i >= 0; i--)
if (o.equals(elementData[i]))
return i;
}
return -1;
}
/**
* Returns the element at the specified position in this list.
*/
public E get(int index) {
rangeCheck(index);
return elementData(index);
}
/**
* Replaces the element at the specified position in this list with
* the specified element.
*/
public E set(int index, E element) {
rangeCheck(index);
E oldValue = elementData(index);
elementData[index] = element;
return oldValue;
}
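Because the implementation is array-based, looking up and updating elements is straightforward: get and set access a slot by index directly, while contains, indexOf, and lastIndexOf scan the array linearly. None of these methods changes the structure of the List, so none of them modifies modCount.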
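Since set() leaves modCount untouched, it can be called while a for-each loop is running without tripping the fail-fast check. The sketch below shows this together with a null lookup; the class name LookupUpdateDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, for illustration only.
class LookupUpdateDemo {
    // Doubles every element in place while iterating with for-each.
    static String doubleInPlace() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        int i = 0;
        for (Integer value : list) {
            // set() is not a structural modification, so no exception is thrown.
            list.set(i++, value * 2);
        }
        return list.toString();
    }

    // indexOf and lastIndexOf handle null through the dedicated == null branch.
    static String nullLookup() {
        List<String> list = new ArrayList<>(Arrays.asList("a", null, "b", null));
        return list.indexOf(null) + "," + list.lastIndexOf(null) + "," + list.indexOf("z");
    }

    public static void main(String[] args) {
        System.out.println(doubleInPlace()); // [2, 4, 6]
        System.out.println(nullLookup());    // 1,3,-1
    }
}
```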
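Iteration
Iterating over a list is also very common in everyday development, especially with the for-each statement. The Collection interface extends Iterable, and ArrayList implements Collection indirectly, so it has to implement the iterator() method of Iterable. Let's take a look.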
public Iterator<E> iterator() {
return new Itr();
}
/**
* An optimized version of AbstractList.Itr
*/
private class Itr implements Iterator<E> {
int cursor; // index of next element to return
int lastRet = -1; // index of last element returned; -1 if no such
int expectedModCount = modCount;
public boolean hasNext() {
return cursor != size;
}
@SuppressWarnings("unchecked")
public E next() {
checkForComodification();
int i = cursor;
if (i >= size)
throw new NoSuchElementException();
Object[] elementData = ArrayList.this.elementData;
if (i >= elementData.length)
throw new ConcurrentModificationException();
cursor = i + 1;
return (E) elementData[lastRet = i];
}
public void remove() {
if (lastRet < 0)
throw new IllegalStateException();
checkForComodification();
try {
ArrayList.this.remove(lastRet);
cursor = lastRet;
lastRet = -1;
expectedModCount = modCount;
} catch (IndexOutOfBoundsException ex) {
throw new ConcurrentModificationException();
}
}
final void checkForComodification() {
if (modCount != expectedModCount)
throw new ConcurrentModificationException();
}
}
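The iterator uses cursor to mark the index of the next element to return and lastRet to mark the index of the element returned last. ArrayList is not thread-safe, and its fail-fast mechanism is built on the modCount field. Both next and remove call checkForComodification(), which compares the modCount captured when the Iterator was created (expectedModCount) with the current modCount. If they differ, the structure of the ArrayList has been modified since the iterator was created (other than through the iterator's own remove(), which resynchronizes expectedModCount), and ConcurrentModificationException is thrown.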
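One point worth noting: remove() checks whether lastRet < 0 and throws IllegalStateException if so. lastRet < 0 occurs in only two situations: right after the iterator is created, before next() has been called, and right after a call to remove(), which resets lastRet to -1. Calling remove() twice in a row therefore throws an exception.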
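Both situations are easy to reproduce. The following sketch uses a hypothetical class IteratorRemoveDemo and only public java.util APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class, for illustration only.
class IteratorRemoveDemo {
    // Calls remove() before any next(): lastRet is still -1.
    static boolean removeBeforeNext() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        Iterator<String> it = list.iterator();
        try {
            it.remove();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Calls remove() twice after a single next().
    static String removeTwice() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        Iterator<String> it = list.iterator();
        it.next();   // returns "a", lastRet = 0
        it.remove(); // removes "a", lastRet reset to -1
        try {
            it.remove(); // no element to remove
            return "no exception";
        } catch (IllegalStateException e) {
            return "IllegalStateException, list=" + list;
        }
    }

    public static void main(String[] args) {
        System.out.println(removeBeforeNext()); // true
        System.out.println(removeTwice());      // IllegalStateException, list=[b, c]
    }
}
```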
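The List interface also supports another kind of iterator, ListIterator. It can move forward with next() and backward with previous(), and in addition to removing elements with remove() during iteration it can also insert elements with add().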
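A short sketch of both capabilities follows, using a hypothetical class ListIteratorDemo: it inserts an element in the middle of an iteration and then walks the list backwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

// Hypothetical demo class, for illustration only.
class ListIteratorDemo {
    static String demo() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "c"));
        ListIterator<String> it = list.listIterator();

        it.next();   // returns "a"
        it.add("b"); // inserted just before the cursor: [a, b, c]
        it.next();   // returns "c"; the cursor is now at the end

        // Walk back to the start with previous().
        StringBuilder backwards = new StringBuilder();
        while (it.hasPrevious()) {
            backwards.append(it.previous());
        }
        return list + " " + backwards;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [a, b, c] cba
    }
}
```

Adding through the iterator keeps its expected modification count in sync, so the subsequent next() does not throw ConcurrentModificationException.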
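Summary
ArrayList is implemented on top of an array and therefore offers efficient random access. Inserting or removing elements anywhere but the end, however, requires shifting the elements after that position, and an append that exceeds the capacity copies the whole array, so these operations can be expensive. ArrayList is a good fit when the container will be read heavily after creation but insertions and removals are relatively rare.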