集合框架——List接口及其实现类源码解析（JDK8）

2018-07-21 本文已影响23人黄金矿工00七

不多说，直接上源码

List接口:

JDK文档中是这样描述的

/**
 * An ordered collection (also known as a <i>sequence</i>).  The user of this
 * interface has precise control over where in the list each element is
 * inserted.  The user can access elements by their integer index (position in
 * the list), and search for elements in the list.<p>
 * Unlike sets, lists typically allow duplicate elements.  More formally,
 * lists typically allow pairs of elements <tt>e1</tt> and <tt>e2</tt>
 * such that <tt>e1.equals(e2)</tt>, and they typically allow multiple
 * null elements if they allow null elements at all.  It is not inconceivable
 * that someone might wish to implement a list that prohibits duplicates, by
 * throwing runtime exceptions when the user attempts to insert them, but we
 * expect this usage to be rare.<p>
Note: While it is permissible for lists to contain themselves as elements,
 * extreme caution is advised: the <tt>equals</tt> and <tt>hashCode</tt>
 * methods are no longer well defined on such a list.
 */

我提取几点比较重要的：
- List是有序的
- List可以通过index精确的控制elements
- List是允许重复元素以及null（但是不同的实现可能有变化）
- List允许将他自己作为element，但是hashcode和equals不在有用（不要使用）
  关于接口中的方法不在详细说，在具体实现中在谈论，List接口中提供了一个特殊的迭代器ListIterator，提供额外的元素替换、添加、双向访问的方法，以及提供了一个用于获取从指定位置开始迭代的迭代器的方法。

ArrayList

吐槽简书，这段注释不识别

  /**
   * Resizable-array implementation of the <tt>List</tt> interface.  Implements
   * all optional list operations, and permits all elements, including
   * <tt>null</tt>.  In addition to implementing the <tt>List</tt> interface,
   * this class provides methods to manipulate the size of the array that is
   * used internally to store the list.  (This class is roughly equivalent to
   * <tt>Vector</tt>, except that it is unsynchronized.)
   */

总结一下，文档中描述的大概就是下面这些东西：

首先，ArrayList 是一个可变的集合类，它实现了List接口的所有方法，允许存在null（可以全是），提供了操作内置数组大小的方法（大概相当于vector，不过是非线程安全）
size、isEmpty、get、set、iterator、listIterator的操作时间是constant time，意思就是说O（1），时间粗略的来说是一个常数，比如add n个元素需要O（n），那add 1个就需要O（1），是一个线性的关系，而且常数因子对比LinkedList来说要低。
ArrayList 是非线程安全的，你要用的话就用Collections.synchronizedList来包装一下它，最好是new的时候，就像这样 List list = Collections.synchronizedList(new ArrayList(...));
迭代器（Iterator和ListIterator是fail-fast机制的）：在用迭代器遍历一个集合对象时，如果遍历过程中对集合对象的内容进行了修改（增加、删除、修改），则会抛出Concurrent Modification Exception。整个集合框架中的集合类都是这样的，原理下面再说。但是不要认为他可以保证并发时的安全
ArrayList的基类和实现的接口
- AbstractList：这玩意就不说了，以最小化程度实现了List接口，是实现array以及linked list应该优先使用的类，然后做了一些对不可修改集合和可修改集合的规定。
- RandomAccess：RandomAccess 是一个标记接口，用于标明实现该接口的List支持快速随机访问，主要目的是使算法能够在随机和顺序访问的list中表现的更加高效。上一篇文章中，说过Collection中获取不可修改的List和线程安全的List等等之类会判断是否实现了这个接口，然后采用相应的算法实现。
- Cloneable：Java 中一个类要实现clone功能必须实现 Cloneable接口，否则在调用 clone() 时会报 CloneNotSupportedException 异常。ArrayList#clone实现的是浅拷贝，这里多说一点
  **数组拷贝System.arraycopy>Arrays.copyOf>clone>for（自己实现）都是浅拷贝这是拷贝的效率，Arrays.copyOf内部使用的是System.arraycopy实现，ArrayList中clone中使用的是Object.clone和Arrays.copyOf（进行元素填充），网上很多错误的说法说是需要转换类型。。。，这里贴一下源码给你们看。

    public Object clone() {
        try {
            ArrayList<?> v = (ArrayList<?>) super.clone();
            v.elementData = Arrays.copyOf(elementData, size);
            v.modCount = 0;
            return v;
        } catch (CloneNotSupportedException e) {
            // this shouldn't happen, since we are Cloneable
            throw new InternalError(e);
        }
    }

ArrayList内部数据结构

    /**默认容量大小
     * Default initial capacity.
     */
    private static final int DEFAULT_CAPACITY = 10;

    /**用于空实例的共享空数组实例（当设置容量为0时或者使用collection来构造时使用的数组实例）
     * Shared empty array instance used for empty instances.
     */
    private static final Object[] EMPTY_ELEMENTDATA = {};

    /**默认使用的数组实例
     * Shared empty array instance used for default sized empty instances. We
     * distinguish this from EMPTY_ELEMENTDATA to know how much to inflate when
     * first element is added.
     */
    private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};

    /**存储ArrayList元素的数组缓冲区。ArrayList的容量等于这个数组缓冲区的长度。
     * The array buffer into which the elements of the ArrayList are stored.
     * The capacity of the ArrayList is the length of this array buffer. Any
     * empty ArrayList with elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA
     * will be expanded to DEFAULT_CAPACITY when the first element is added.
     */
    transient Object[] elementData; // non-private to simplify nested class access

    /**数组元素的数量
     * The size of the ArrayList (the number of elements it contains).
     *
     * @serial
     */
    private int size;

总结一下：数组默认容量为10，有两个空数组的实例，一个数组缓冲区的引用，会指向两个空数组实例之一，缓冲区的大小等于ArrayList的容量，以及数组元素的数量

ArrayList数组扩容机制
我们以默认创建的ArrayList为例,

List <String> list = new ArrayList <> ();
list.add("123");

图片.png

在list 初始化之后，数组缓冲区的长度为0，add一个元素之后，数组缓冲区的长度等于DEFAULT_CAPACITY =10，我们再来看一下，继续添加元素之后的扩容机制。

    private void ensureCapacityInternal(int minCapacity) {
          //如果是默认创建的list，则最小容量等于DEFAULT_CAPACITY, minCapacity之间的最大值
         // 也就是说在数组元素的数量小于10的时候，最小容量=10，大于等于10的时候，最小容量为当前size+1
        if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
            minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
        }

        ensureExplicitCapacity(minCapacity);
    }


    private void ensureExplicitCapacity(int minCapacity) {
        //修改次数+1，用来实现迭代器是fail-fast机制(是在AbstractList中实现的。)，所有对集合的修改都会增加
        //在迭代器初始化过程中会将这个值赋给迭代器的 expectedModCount。
        //在迭代过程中，判断 modCount 跟 expectedModCount 是否相等，如果不相等就表示已经有其他线程修改了集合对象
        modCount++;

        // overflow-conscious code 考虑溢出的情况
        //上面代码中minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);实际上
        //minCapacity有可能超过Integer.MAX_VALUE，发生了溢出  那么minCapacity 就会等于DEFAULT_CAPACITY，所以就不会继续增加数组的容量，已经是最大值了
        if (minCapacity - elementData.length > 0)
            grow(minCapacity);
    }


  private void grow(int minCapacity) {
        // overflow-conscious code 考虑溢出
        //旧容量为当前数组的长度，第一次添加为0
        int oldCapacity = elementData.length;
        //新的数组容量为原来的1.5倍 >>等于/2
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        if (newCapacity - minCapacity < 0)
        //如果小于最小容量则新的容量为最小容量
            newCapacity = minCapacity;
        if (newCapacity - MAX_ARRAY_SIZE > 0)
        //如果新的数组容量超过Integer.MAX_VALUE - 8
            newCapacity = hugeCapacity(minCapacity);
        // minCapacity is usually close to size, so this is a win:
        elementData = Arrays.copyOf(elementData, newCapacity);
    }

  private static int hugeCapacity(int minCapacity) {
        //对溢出进行检测 ，结合下面叙述来看
        if (minCapacity < 0) // overflow
            throw new OutOfMemoryError();
        return (minCapacity > MAX_ARRAY_SIZE) ?
            Integer.MAX_VALUE :
            MAX_ARRAY_SIZE;
    }

上面的代码 if (newCapacity - minCapacity < 0)以及 if (newCapacity - MAX_ARRAY_SIZE > 0) 这里很关键，为啥不使用newCapacity < minCapacity 和newCapacity >MAX_ARRAY_SIZE来判断 ,我们知道newCapacity 是有可能超过int的正数表示范围的，那么就会溢出，成为一个负数，而minCapacity是一个正数，负数<正数是恒成立true，而newCapacity - minCapacity，负数减正数是有可能是一个正数，下面同理newCapacity 在MAX_VALUE -1到MAX_VALUE 之间或者newCapacity <-10 (如果溢出肯定小于)，这个条件就会成立，如果使用newCapacity >MAX_ARRAY_SIZE，则newCapacity 溢出的时候会恒不成立

手动扩容：
上面的代码中我们可以看到，数组以10为基础，使用Arrays.copyOf(elementData, newCapacity)进行1.5倍的扩容，每次满了就扩容，
　向数组中添加第一个元素时，数组容量为10.

　　向数组中添加到第10个元素时，数组容量仍为10.

　　向数组中添加到第11个元素时，数组容量扩为15.

　　向数组中添加到第16个元素时，数组容量扩为22.
这样带来了很大的性能问题，所以有时候我们需要对数组进行手动扩容，避免数组大量的自动扩容
如果参数大于底层数组长度的1.5倍，那么这个数组的容量就会被扩容到这个参数值，
如果参数小于底层数组长度的1.5倍，那么这个容量就会被扩容到底层数组长度的1.5倍。

    /**
     * Increases the capacity of this <tt>ArrayList</tt> instance, if
     * necessary, to ensure that it can hold at least the number of elements
     * specified by the minimum capacity argument.
     *扩展数组容量，尽量使用最小容量
     * @param   minCapacity   the desired minimum capacity
     */
  public void ensureCapacity(int minCapacity) {
          //判断list是否是默认方式创建，如果是则最小扩展量为DEFAULT_CAPACITY，如果不是则等于0
        int minExpand = (elementData != DEFAULTCAPACITY_EMPTY_ELEMENTDATA)
            // any size if not default element table
            ? 0
            // larger than default for default empty table. It's already
            // supposed to be at default size.
            : DEFAULT_CAPACITY;
        //如果参数大于底层数组长度的1.5倍，那么这个数组的容量就会被扩容到这个参数值，
        //如果参数小于底层数组长度的1.5倍，那么这个容量就会被扩容到底层数组长度的1.5倍。
        if (minCapacity > minExpand) {
            ensureExplicitCapacity(minCapacity);
        }
    }

剩下的add（index）、addAll（index）、remove方法都是基于System.arraycopy（）来实现的不在多说。
然后我们发现还有一个subList方法，这个方法是做什么的呢？

List<E> subList(int fromIndex, int toIndex);
- 该方法返回的是父list的一个视图，从fromIndex（包含），到toIndex（不包含）。fromIndex=toIndex 表示子list为空
- 父子list做的非结构性修改（non-structural changes）都会影响到彼此：所谓的“非结构性修改”，是指不涉及到list的大小改变的修改。相反，结构性修改，指改变了list大小的修改。（比如add、remove）
- 对于结构性修改，子list的所有操作都会反映到父list上。但父list的修改将会导致返回的子list失效（即修改原list,则sublist的所有操作会报错）。
- tips：如何删除list中的某段数据：
  我们可以用这个方法来实现对指定范围内数据的清除，
  比如说list中有1/2/3/4/5五个元素
  我现在想删除34怎么做呢

 public static void main(String[] args) {

    LinkedList<String> ll = new LinkedList<>();
    ll.add("1");
    ll.add("2");
    ll.add("3");
    ll.add("4");
    ll.add("5");
    List<String> l2 =  ll.subList(2, 4);
    l2.clear();
    System.out.println(ll);

  }

图片.png

List遍历
List中有四种遍历方式：
- for ：基于计数器，遍历者自己在集合外部维护一个计数器，然后依次读取每一个位置的元素，当读取到最后一个元素后，停止。主要就是需要按元素的位置来读取元素。
- foreach : 根据反编译的字节码可以发现，foreach内部也是采用了Iterator的方式实现，只不过Java编译器帮我们生成了这些代码。
- Iterator ：每一个具体实现的数据集合，一般都需要提供相应的Iterator。相比于传统for循环，Iterator取缔了显式的遍历计数器。所以基于顺序存储集合的Iterator可以直接按位置访问数据。而基于链式存储集合的Iterator，正常的实现，都是需要保存当前遍历的位置。然后根据当前位置来向前或者向后移动指针。
- ListIterator ：Iterator实现了集合框架的Iterator接口可以应用于所有的集合，Set、List和Map和这些集合的子类型。而ListIterator在实现了Iterator的基础上实现了ListIterator接口，提供了额外的方法，只能用于List及其子类型。
  - ListIterator有add方法，可以向List中添加对象，而Iterator不能。
  - ListIterator和Iterator都有hasNext()和next()方法，可以实现顺序向后遍历，但是ListIterator有hasPrevious()和previous()方法，可以实现逆向（顺序向前）遍历。Iterator不可以。
  - ListIterator可以定位当前索引的位置，nextIndex()和previousIndex()可以实现。Iterator没有此功能。
  - 都可实现删除操作，但是ListIterator可以实现对象的修改，set()方法可以实现。Iterator仅能遍历，不能修改

public class Test {

  public static void main(String[] args) {
    //  初始化链表5000个元素
    List<String> list_5000 = new ArrayList<String>(5000);
    List<String> list_10000 = new ArrayList<String>(10000);
    List<String> list_100000 = new ArrayList<String>(100000);
    for (int i = 0; i < 1000; i++) {
      list_5000.add("1");
      list_5000.add("2");
      list_5000.add("3");
      list_5000.add("4");
      list_5000.add("5");
    }
    for (int j = 0; j < 2000; j++) {
      list_10000.add("1");
      list_10000.add("2");
      list_10000.add("3");
      list_10000.add("4");
      list_10000.add("5");
    }
    for (int k = 0; k < 20000; k++) {
      list_100000.add("1");
      list_100000.add("2");
      list_100000.add("3");
      list_100000.add("4");
      list_100000.add("5");
    }
    //avgTime(list_5000);
    //avgTime(list_10000);
    avgTime(list_100000);
    //System.out.println(traverseByFor(list_100000));
  }

  public static void avgTime(List list) {
    long forAvgTime = 0;
    long foreachAbgTime = 0;
    long iteratorAvgTime = 0;
    long listIteratorAvgTime = 0;
    for (int i = 0; i < 20; i++) {
      forAvgTime += traverseByFor(list);
      foreachAbgTime += traverseByForeach(list);
      iteratorAvgTime += traverseByIterator(list);
      listIteratorAvgTime += traverseByListIterator(list);
    }
    System.out.println("使用for遍历时间 ：" + forAvgTime );
    System.out.println("使用迭代器遍历时间 ：" + iteratorAvgTime );
    System.out.println("使用List迭代器遍历时间 ：" + listIteratorAvgTime );
    System.out.println("使用foreach遍历时间 ：" + foreachAbgTime );
  }

  /**
   * 使用for遍历
   */
  public static long traverseByFor(List list) {
    long start = System.currentTimeMillis();
    for (int i = 0; i < list.size(); i++) {
      String s = (String) list.get(i);
      //相应的业务逻辑
    }
    long between = System.currentTimeMillis() - start;
    return between;
  }

  /**
   * 使用迭代器遍历
   */
  public static long traverseByIterator(List list) {
    long start = System.currentTimeMillis();

    Iterator iterator = list.iterator();
    while (iterator.hasNext()) {
      String s = (String) iterator.next();
      //相应的业务逻辑
    }
    long between = System.currentTimeMillis() - start;
    return between;
  }

  public static long traverseByListIterator(List list) {
    long start = System.currentTimeMillis();

    Iterator iterator = list.listIterator();
    while (iterator.hasNext()) {
      String s = (String) iterator.next();
      //相应的业务逻辑
    }
    long between = System.currentTimeMillis() - start;
    return between;
  }

  /**
   * 使用foreach遍历
   */
  public static long traverseByForeach(List<String> list) {
    long start = System.currentTimeMillis();
    for (String attribute : list) {
      //相应的业务逻辑
    }
    long between = System.currentTimeMillis() - start;
    return between;
  }

}

List遍历的性能
- for ：对于顺序存储，因为读取特定位置元素的平均时间复杂度是O(1)，所以遍历整个集合的平均时间复杂度为O(n)。而对于链式存储，因为读取特定位置元素的平均时间复杂度是O(n)，所以遍历整个集合的平均时间复杂度为O(n2)（n的平方）。
  不过LinkedList做了一定优化,查询位置在链表前半部分，从链表头开始查找,查询位置在链表后半部分，从链表尾开始查找

public E get(int index) {
    checkElementIndex(index);
    return node(index).item;
}

Node<E> node(int index) {
    if (index < (size >> 1)) {   //查询位置在链表前半部分，从链表头开始查找
        Node<E> x = first;
        for (int i = 0; i < index; i++)
            x = x.next;
        return x;
    } else {                     //查询位置在链表后半部分，从链表尾开始查找
        Node<E> x = last;
        for (int i = size - 1; i > index; i--)
            x = x.prev;
        return x;
    }
}

Iterator : 就性能而言，对于RandomAccess类型的集合来说，与for对比并没有太大意义，但是对于Sequential Access的集合来说，就有很重大的意义了，因为Iterator内部维护了当前遍历的位置，所以每次遍历，读取下一个位置并不需要从集合的第一个元素开始查找，只要把指针向后移一位就行了，这样一来，遍历整个集合的时间复杂度就降低为O(n)；但是使用迭代器主要是为了解耦合，它可以把访问逻辑从不同类型的集合类中抽象出来，从而避免向客户端暴露集合的内部结构。例如，如果没有使用Iterator，遍历一个数组的方法是使用索引；而访问一个链表（LinkedList）又必须使用while循环。
foreach ：分析Java字节码可知，foreach内部实现原理，也是通过Iterator实现的，只不过这个Iterator是Java编译器帮我们生成的，所以我们不需要再手动去编写。但是因为每次都要做类型转换检查，所以花费的时间比Iterator略长。时间复杂度和Iterator一样。
总的来说，还是推荐使用迭代器，可以使代码解耦合，也不必考虑集合内部实现，降低了时间复杂度，同时可以在遍历的同时对集合进行操作。

集合框架——List接口及其实现类源码解析（JDK8）

List接口:

ArrayList

猜你喜欢

热点阅读