ArrayList Source Code Analysis

2017-01-23  Leocat

ArrayList

Original article: Java 容器源码分析之 ArrayList (Java Container Source Code Analysis: ArrayList)

Overview

ArrayList is one of the most frequently used collections. When a List is needed, ArrayList is usually the first choice. Let's start with its declaration:

public class ArrayList<E> extends AbstractList<E>
        implements List<E>, RandomAccess, Cloneable, java.io.Serializable

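Of the interfaces ArrayList implements, RandomAccess, Cloneable, and Serializable are all marker interfaces, so ArrayList is a very pure implementation of the List interface. Its sibling LinkedList, by contrast, also implements Deque and so doubles as a double-ended queue.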
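
The difference in implemented interfaces can be checked at runtime with instanceof. The following is a small sketch; the class name MarkerInterfaceDemo and its helper methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

class MarkerInterfaceDemo {
    // True if the list advertises fast index-based access via the marker interface.
    static boolean isRandomAccess(List<?> list) {
        return list instanceof RandomAccess;
    }

    // True if the list can also be used as a double-ended queue.
    static boolean isDeque(List<?> list) {
        return list instanceof Deque;
    }

    public static void main(String[] args) {
        System.out.println(isRandomAccess(new ArrayList<String>()));  // true
        System.out.println(isRandomAccess(new LinkedList<String>())); // false
        System.out.println(isDeque(new ArrayList<String>()));         // false
        System.out.println(isDeque(new LinkedList<String>()));        // true
    }
}
```

Generic algorithms can use the RandomAccess check to decide between index-based loops and iterator-based loops.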

Structure

transient Object[] elementData;

// Inherited from the parent class AbstractList
protected transient int modCount = 0;

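As the name suggests, ArrayList is a List backed by an array, in other words a resizable array, and the data is stored in the Object array elementData. Besides elementData, one other member variable deserves attention: modCount, which is inherited from the parent class AbstractList. modCount records the number of times this List has been structurally modified; a structural modification is an operation that changes the size of the List. modCount is mainly used by iterators: if a List is structurally modified while it is being iterated, the iteration may produce wrong results. So if the structure changes during iteration by any means other than the iterator's own methods, whether from another thread or from the iterating thread itself, the iterator throws ConcurrentModificationException. This is the iterator's fail-fast mechanism.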
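
A minimal sketch of the fail-fast behavior, using only one thread. The class and method names are hypothetical; the list has three elements and the first one is removed, so the next call to next() sees a changed modCount.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

class FailFastDemo {
    // Returns true if removing an element through the list (not the iterator)
    // inside a for-each loop triggers ConcurrentModificationException.
    static boolean modifyWhileIterating() {
        List<String> list = new ArrayList<>();
        list.add("a");
        list.add("b");
        list.add("c");
        try {
            for (String s : list) {
                if (s.equals("a")) {
                    list.remove(s); // structural modification: modCount is incremented
                }
            }
        } catch (ConcurrentModificationException e) {
            return true; // the iterator noticed modCount != expectedModCount
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(modifyWhileIterating()); // true
    }
}
```

No second thread is involved here, which shows that the exception name is somewhat misleading: it signals a modification that happened behind the iterator's back, not necessarily a concurrent one.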

Adding Elements

/**
 * Appends the specified element to the end of this list.
 */
public boolean add(E e) {
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    elementData[size++] = e;
    return true;
}

/**
 * Inserts the specified element at the specified position in this
 * list. Shifts the element currently at that position (if any) and
 * any subsequent elements to the right (adds one to their indices).
 */
public void add(int index, E element) {
    rangeCheckForAdd(index);

    ensureCapacityInternal(size + 1);  // Increments modCount!!
    System.arraycopy(elementData, index, elementData, index + 1,
                     size - index);
    elementData[index] = element;
    size++;
}

/**
 * Appends all of the elements in the specified collection to the end of
 * this list, in the order that they are returned by the
 * specified collection's Iterator.  The behavior of this operation is
 * undefined if the specified collection is modified while the operation
 * is in progress.  (This implies that the behavior of this call is
 * undefined if the specified collection is this list, and this
 * list is nonempty.)
 */
public boolean addAll(Collection<? extends E> c) {
    Object[] a = c.toArray();
    int numNew = a.length;
    ensureCapacityInternal(size + numNew);  // Increments modCount
    System.arraycopy(a, 0, elementData, size, numNew);
    size += numNew;
    return numNew != 0;
}

/**
 * Inserts all of the elements in the specified collection into this
 * list, starting at the specified position.  Shifts the element
 * currently at that position (if any) and any subsequent elements to
 * the right (increases their indices).  The new elements will appear
 * in the list in the order that they are returned by the
 * specified collection's iterator.
 */
public boolean addAll(int index, Collection<? extends E> c) {
    rangeCheckForAdd(index);

    Object[] a = c.toArray();
    int numNew = a.length;
    ensureCapacityInternal(size + numNew);  // Increments modCount

    int numMoved = size - index;
    if (numMoved > 0)
        System.arraycopy(elementData, index, elementData, index + numNew,
                         numMoved);

    System.arraycopy(a, 0, elementData, index, numNew);
    size += numNew;
    return numNew != 0;
}

private void rangeCheck(int index) {
    if (index >= size)
        throw new IndexOutOfBoundsException(outOfBoundsMsg(index));
}

private void rangeCheckForAdd(int index) {
    if (index > size || index < 0)
        throw new IndexOutOfBoundsException(outOfBoundsMsg(index));
}

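There are several methods for adding elements to an ArrayList: add(E e) appends to the end of the array, add(int index, E element) inserts at the specified position, addAll(Collection<? extends E> c) appends a batch of elements to the end, and addAll(int index, Collection<? extends E> c) inserts a batch of elements at the specified position.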

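These methods all work in essentially the same way. The variants that take an index first validate it with rangeCheckForAdd (rangeCheck, listed alongside it, is the check used by get, set, and remove). Then ensureCapacityInternal makes sure the array has enough capacity: it checks whether the current capacity suffices and grows the array if not, as described in the next section. Note that adding elements structurally modifies the ArrayList, so modCount must be incremented. The source puts the modCount increment inside ensureCapacityInternal, which feels a bit odd: the name says the method exists to ensure capacity, yet it modifies a member variable that has nothing to do with capacity, so I don't think the design is very sound. The authors apparently felt the same way, which is presumably why they flagged it with a comment: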

ensureCapacityInternal(size + 1); // Increments modCount!!

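Back to the topic at hand: once capacity is ensured, the static method System.arraycopy() shifts the existing elements to make room, and the new elements are written into place. Appending to the end requires no shifting; the element is simply placed at the end. Finally, size is updated and the add operation is complete.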
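
The shift-then-write step can be sketched on a plain array. This is an illustration only, assuming the array already has a free slot at the end; InsertShiftDemo and insertAt are hypothetical names, not part of ArrayList.

```java
import java.util.Arrays;

class InsertShiftDemo {
    // Inserts value at index among the first `size` slots of data.
    // Assumes data.length > size, i.e. capacity has already been ensured.
    static void insertAt(Object[] data, int size, int index, Object value) {
        // Move elements [index, size) one slot to the right.
        System.arraycopy(data, index, data, index + 1, size - index);
        data[index] = value;
    }

    public static void main(String[] args) {
        Object[] data = {"a", "b", "d", null}; // 3 elements, capacity 4
        insertAt(data, 3, 2, "c");
        System.out.println(Arrays.toString(data)); // [a, b, c, d]
    }
}
```

System.arraycopy handles overlapping source and destination ranges correctly, which is why the same array can be used on both sides.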

Growing Capacity

ArrayList is based on a resizable array; when the backing array runs out of capacity, it is grown. The code is as follows:

private void ensureCapacityInternal(int minCapacity) {
    if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
    }

    ensureExplicitCapacity(minCapacity);
}

private void ensureExplicitCapacity(int minCapacity) {
    modCount++;

    // overflow-conscious code
    if (minCapacity - elementData.length > 0)
        grow(minCapacity);
}

/**
 * Increases the capacity to ensure that it can hold at least the
 * number of elements specified by the minimum capacity argument.
 */
private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);
}

private static int hugeCapacity(int minCapacity) {
    if (minCapacity < 0) // overflow
        throw new OutOfMemoryError();
    return (minCapacity > MAX_ARRAY_SIZE) ?
        Integer.MAX_VALUE :
        MAX_ARRAY_SIZE;
}

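The methods whose names start with ensure check whether the current array capacity can hold minCapacity elements; only if it cannot do they grow the array by calling grow(int minCapacity). Let's go straight to grow().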

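grow() first computes a new capacity of roughly 1.5 times the old one, via the statement int newCapacity = oldCapacity + (oldCapacity >> 1). It then checks whether the new capacity satisfies the minimum required capacity minCapacity; if not, newCapacity is set to minCapacity. Next it checks whether newCapacity exceeds the maximum array size MAX_ARRAY_SIZE; if so, hugeCapacity() picks the result: Integer.MAX_VALUE if minCapacity itself is larger than MAX_ARRAY_SIZE, otherwise MAX_ARRAY_SIZE. Finally, the contents of the old array are copied into a new array of that size.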
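
To make the growth arithmetic concrete, here is a small sketch that reproduces only the first two steps of grow() as quoted above. GrowthDemo and nextCapacity are hypothetical names, and the MAX_ARRAY_SIZE branch is deliberately left out.

```java
class GrowthDemo {
    // Mirrors the capacity arithmetic of the quoted grow():
    // old + old/2, but never less than minCapacity.
    // The MAX_ARRAY_SIZE / hugeCapacity handling is omitted in this sketch.
    static int nextCapacity(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        if (newCapacity - minCapacity < 0)
            newCapacity = minCapacity;
        return newCapacity;
    }

    public static void main(String[] args) {
        int capacity = 10; // DEFAULT_CAPACITY in the quoted source
        for (int i = 0; i < 6; i++) {
            System.out.print(capacity + " "); // 10 15 22 33 49 73
            capacity = nextCapacity(capacity, capacity + 1);
        }
        System.out.println();
    }
}
```

Because the right shift rounds down, the factor is only approximately 1.5, and for very small arrays (capacity 1, for example) the minCapacity branch is what actually makes the array grow.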

Removing Elements

/**
 * Removes the element at the specified position in this list.
 * Shifts any subsequent elements to the left (subtracts one from their
 * indices).
 */
public E remove(int index) {
    rangeCheck(index);

    modCount++;
    E oldValue = elementData(index);

    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index,
                         numMoved);
    elementData[--size] = null; // clear to let GC do its work

    return oldValue;
}

/**
 * Removes the first occurrence of the specified element from this list,
 * if it is present.  If the list does not contain the element, it is
 * unchanged.  More formally, removes the element with the lowest index
 */
public boolean remove(Object o) {
    if (o == null) {
        for (int index = 0; index < size; index++)
            if (elementData[index] == null) {
                fastRemove(index);
                return true;
            }
    } else {
        for (int index = 0; index < size; index++)
            if (o.equals(elementData[index])) {
                fastRemove(index);
                return true;
            }
    }
    return false;
}

/*
 * Private remove method that skips bounds checking and does not
 * return the value removed.
 */
private void fastRemove(int index) {
    modCount++;
    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index,
                         numMoved);
    elementData[--size] = null; // clear to let GC do its work
}

/**
 * Removes all of the elements from this list.  The list will
 * be empty after this call returns.
 */
public void clear() {
    modCount++;

    // clear to let GC do its work
    for (int i = 0; i < size; i++)
        elementData[i] = null;

    size = 0;
}

/**
 * Removes from this list all of the elements whose index is between
 * {@code fromIndex}, inclusive, and {@code toIndex}, exclusive.
 * Shifts any succeeding elements to the left (reduces their index).
 * This call shortens the list by {@code (toIndex - fromIndex)} elements.
 * (If {@code toIndex==fromIndex}, this operation has no effect.)
 */
protected void removeRange(int fromIndex, int toIndex) {
    modCount++;
    int numMoved = size - toIndex;
    System.arraycopy(elementData, toIndex, elementData, fromIndex,
                     numMoved);

    // clear to let GC do its work
    int newSize = size - (toIndex-fromIndex);
    for (int i = newSize; i < size; i++) {
        elementData[i] = null;
    }
    size = newSize;
}

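The principle behind removal is simple: System.arraycopy moves the elements that should be kept into their correct positions, and size is adjusted. Finally, to prevent memory leaks, the slots that are no longer in use must be explicitly set to null. Simple as the principle is, there are many details to watch, mostly small matters of index arithmetic.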
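
The same index arithmetic can be tried out on a plain array. This sketch follows the shape of the quoted remove(int index) but is not the library code; RemoveShiftDemo and removeAt are hypothetical names.

```java
import java.util.Arrays;

class RemoveShiftDemo {
    // Removes the element at index from the first `size` slots of data
    // and returns the new size. No bounds checking is done in this sketch.
    static int removeAt(Object[] data, int size, int index) {
        int numMoved = size - index - 1; // elements to the right of index
        if (numMoved > 0)
            System.arraycopy(data, index + 1, data, index, numMoved);
        data[--size] = null; // clear the vacated last slot so the GC can reclaim it
        return size;
    }

    public static void main(String[] args) {
        Object[] data = {"a", "b", "c", "d"};
        int size = removeAt(data, 4, 1);
        System.out.println(size + " " + Arrays.toString(data)); // 3 [a, c, d, null]
    }
}
```

Without the final null assignment, the last slot would keep a stale reference to "d" even though the logical size no longer covers it.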

Next, let's look at the methods that remove or retain elements in bulk.

/**
 * Removes from this list all of its elements that are contained in the
 * specified collection.
 */
public boolean removeAll(Collection<?> c) {
    Objects.requireNonNull(c);
    return batchRemove(c, false);
}

/**
 * Retains only the elements in this list that are contained in the
 * specified collection.  In other words, removes from this list all
 * of its elements that are not contained in the specified collection.
 */
public boolean retainAll(Collection<?> c) {
    Objects.requireNonNull(c);
    return batchRemove(c, true);
}

private boolean batchRemove(Collection<?> c, boolean complement) {
    final Object[] elementData = this.elementData;
    int r = 0, w = 0;
    boolean modified = false;
    try {
        for (; r < size; r++)
            // 1) Removing the elements of c (complement == false):
            //    keep elementData[r] if it is NOT in c
            // 2) Retaining the elements of c (complement == true):
            //    keep elementData[r] if it IS in c
            if (c.contains(elementData[r]) == complement)
                elementData[w++] = elementData[r];
    } finally {
        // Preserve behavioral compatibility with AbstractCollection,
        // even if c.contains() throws.
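        // 1) r == size: the loop completed normally
        // 2) r != size: c.contains() threw an exception,
        //      perhaps because an element's type is incompatible with c,
        //      or because c does not support null elements;
        //      copy the not-yet-examined elements forward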
        if (r != size) {
            System.arraycopy(elementData, r,
                             elementData, w,
                             size - r);
            w += size - r;
        }
        if (w != size) {
            // clear to let GC do its work
            for (int i = w; i < size; i++)
                elementData[i] = null;
            modCount += size - w;
            size = w;
            modified = true;
        }
    }
    return modified;
}

Both the bulk removeAll() method and the bulk retainAll() method delegate to batchRemove, so let's look at that method directly.

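First, the principle. The method traverses the whole array, finds the elements to keep, and stores them back into elementData one after another, starting at index 0. If the traversal finishes without an exception (that is, r == size), the slots that are no longer in use are explicitly set to null so the GC can reclaim the elements. If an exception occurs during traversal (r != size), the elements that were not yet visited are first copied to index w and beyond, and the leftover slots are then cleared in the same way. According to the comment in the source, this is done to preserve behavioral compatibility with AbstractCollection; for now I'm not sure whether this exception handling is reasonable.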
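
The read-pointer/write-pointer compaction at the heart of batchRemove can be sketched as follows. This is a simplified illustration that leaves out the try/finally recovery and the modCount bookkeeping; BatchRemoveDemo, compact, and keepIfContained are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collection;

class BatchRemoveDemo {
    // Compacts the first `size` slots of data in place and returns the new size.
    // keepIfContained == false: drop elements found in c (like removeAll).
    // keepIfContained == true:  keep only elements found in c (like retainAll).
    static int compact(Object[] data, int size, Collection<?> c, boolean keepIfContained) {
        int w = 0; // write index: next slot for a kept element
        for (int r = 0; r < size; r++) { // r is the read index
            if (c.contains(data[r]) == keepIfContained)
                data[w++] = data[r];
        }
        for (int i = w; i < size; i++)
            data[i] = null; // clear the tail so the GC can reclaim dropped elements
        return w;
    }

    public static void main(String[] args) {
        Object[] data = {1, 2, 3, 4, 5};
        int size = compact(data, 5, Arrays.asList(2, 4), false);
        System.out.println(size + " " + Arrays.toString(data)); // 3 [1, 3, 5, null, null]
    }
}
```

Since w never gets ahead of r, each kept element is written to a slot that has already been read, so a single pass over the array is enough and no temporary array is needed.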

Searching and Updating

public boolean contains(Object o) {
    return indexOf(o) >= 0;
}

/**
 * Returns the index of the first occurrence of the specified element
 * in this list, or -1 if this list does not contain the element.
 * More formally, returns the lowest index <tt>i</tt> such that
 * <tt>(o==null&nbsp;?&nbsp;get(i)==null&nbsp;:&nbsp;o.equals(get(i)))</tt>,
 * or -1 if there is no such index.
 */
public int indexOf(Object o) {
    if (o == null) {
        for (int i = 0; i < size; i++)
            if (elementData[i]==null)
                return i;
    } else {
        for (int i = 0; i < size; i++)
            if (o.equals(elementData[i]))
                return i;
    }
    return -1;
}

/**
 * Returns the index of the last occurrence of the specified element
 * in this list, or -1 if this list does not contain the element.
 * More formally, returns the highest index <tt>i</tt> such that
 * <tt>(o==null&nbsp;?&nbsp;get(i)==null&nbsp;:&nbsp;o.equals(get(i)))</tt>,
 * or -1 if there is no such index.
 */
public int lastIndexOf(Object o) {
    if (o == null) {
        for (int i = size-1; i >= 0; i--)
            if (elementData[i]==null)
                return i;
    } else {
        for (int i = size-1; i >= 0; i--)
            if (o.equals(elementData[i]))
                return i;
    }
    return -1;
}

/**
 * Returns the element at the specified position in this list.
 */
public E get(int index) {
    rangeCheck(index);

    return elementData(index);
}

/**
 * Replaces the element at the specified position in this list with
 * the specified element.
 */
public E set(int index, E element) {
    rangeCheck(index);

    E oldValue = elementData(index);
    elementData[index] = element;
    return oldValue;
}

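Because the implementation is array-based, looking up and updating elements is straightforward: get and set access the array directly by index, while contains, indexOf, and lastIndexOf scan it linearly. None of these methods changes the structure of the List, so none of them modifies modCount.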
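
One visible consequence of set() leaving modCount alone is that replacing elements during a for-each loop does not trip the fail-fast check. A small sketch, with hypothetical class and method names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SetDuringIterationDemo {
    // Replaces every element with its upper-case form while iterating.
    // set() is not a structural modification, so the iterator does not complain.
    static List<String> upperCaseInPlace(List<String> list) {
        int i = 0;
        for (String s : list) {
            list.set(i++, s.toUpperCase());
        }
        return list;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b"));
        System.out.println(upperCaseInPlace(list)); // [A, B]
    }
}
```

Calling add() or remove() on the list at the same spot would change modCount and make the next call to next() throw ConcurrentModificationException.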

Iteration

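Iterating over a list is also very common in development, especially with the for-each statement. The Collection interface extends Iterable, and ArrayList indirectly implements Collection, so it has to implement Iterable's iterator() method. Let's take a look.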

public Iterator<E> iterator() {
    return new Itr();
}
/**
 * An optimized version of AbstractList.Itr
 */
private class Itr implements Iterator<E> {
    int cursor;       // index of next element to return
    int lastRet = -1; // index of last element returned; -1 if no such
    int expectedModCount = modCount;

    public boolean hasNext() {
        return cursor != size;
    }

    @SuppressWarnings("unchecked")
    public E next() {
        checkForComodification();
        int i = cursor;
        if (i >= size)
            throw new NoSuchElementException();
        Object[] elementData = ArrayList.this.elementData;
        if (i >= elementData.length)
            throw new ConcurrentModificationException();
        cursor = i + 1;
        return (E) elementData[lastRet = i];
    }

    public void remove() {
        if (lastRet < 0)
            throw new IllegalStateException();
        checkForComodification();

        try {
            ArrayList.this.remove(lastRet);
            cursor = lastRet;
            lastRet = -1;
            expectedModCount = modCount;
        } catch (IndexOutOfBoundsException ex) {
            throw new ConcurrentModificationException();
        }
    }

    final void checkForComodification() {
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
    }
}

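The iterator uses cursor to mark the index of the next element to return and lastRet to mark the index of the last element returned. ArrayList is not thread-safe, and its fail-fast mechanism is implemented through the modCount variable. Both next() and remove() call checkForComodification(), which compares expectedModCount (the modCount captured when the Iterator was created, and refreshed after the iterator's own remove()) with the current modCount. If they differ, the ArrayList has been structurally modified behind the iterator's back, and ConcurrentModificationException is thrown.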

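One point to note: remove() checks whether lastRet < 0 and throws IllegalStateException if so. lastRet < 0 occurs in only two situations: right after the iterator is created, before next() has been called, and after a call to remove(), which sets lastRet to -1. So calling remove() twice in a row throws an exception.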
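
Both behaviors, safe removal through the iterator and the exception on a second consecutive remove(), can be seen in a short sketch. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class IteratorRemoveDemo {
    // Removes even numbers through the iterator, which keeps
    // expectedModCount in sync with modCount.
    static List<Integer> removeEvens(List<Integer> list) {
        for (Iterator<Integer> it = list.iterator(); it.hasNext();) {
            if (it.next() % 2 == 0)
                it.remove();
        }
        return list;
    }

    // Returns true if a second remove() without an intervening next() throws.
    static boolean secondRemoveThrows() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        Iterator<Integer> it = list.iterator();
        it.next();
        it.remove(); // fine: lastRet was 0, now reset to -1
        try {
            it.remove(); // lastRet < 0
        } catch (IllegalStateException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(removeEvens(new ArrayList<>(Arrays.asList(1, 2, 3, 4)))); // [1, 3]
        System.out.println(secondRemoveThrows()); // true
    }
}
```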

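The List interface also supports another kind of iterator, ListIterator. It can iterate forward with next() and backward with previous(), and besides removing elements with remove() during iteration, it can also add elements with add().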
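
A short sketch of the two extra capabilities of ListIterator, inserting during iteration and walking backward. ListIteratorDemo and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

class ListIteratorDemo {
    // Inserts value right after the first element, using the iterator's add().
    static List<String> insertAfterFirst(List<String> list, String value) {
        ListIterator<String> it = list.listIterator();
        it.next();     // step over the first element
        it.add(value); // inserted at the cursor position
        return list;
    }

    // Collects the elements in reverse order by starting at the end
    // and walking backward with previous().
    static <T> List<T> reversed(List<T> list) {
        List<T> result = new ArrayList<>();
        ListIterator<T> it = list.listIterator(list.size());
        while (it.hasPrevious())
            result.add(it.previous());
        return result;
    }

    public static void main(String[] args) {
        System.out.println(insertAfterFirst(new ArrayList<>(Arrays.asList("a", "c")), "b")); // [a, b, c]
        System.out.println(reversed(Arrays.asList(1, 2, 3))); // [3, 2, 1]
    }
}
```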

Summary

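ArrayList is implemented with an internal array and offers efficient random access. Inserting and removing elements, however, often requires copying parts of the array, which is expensive. ArrayList is a good fit when a container will be accessed heavily after creation but sees relatively few insertions and removals.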
