java集合系列之一HashMap

2016-05-13 本文已影响555人 heichong

1. HashMap概述

HashMap是基于哈希表的Map接口的非同步实现。此实现提供所有可选的映射操作，并允许使用null值和null键。此类不保证映射的顺序，特别是它不保证该顺序恒久不变。

2. HashMap的数据结构

本文涉及的图中，每个圆角方块代表一个Entry对象，里面包括key、value等元素，图中黄色方块内的值为Entry对象的key。

HashMap实际上是一个“链表散列”的数据结构，即数组和链表的结合体。

HashMap数据结构

HashMap实际上就是用table数组来进行存储数据的。可以看出，Entry就是数组中的元素，每个 Map.Entry 其实就是一个key-value对，它持有一个指向下一个元素的引用，这就构成了链表，即单向列表。
下面我们来看下Entry对象的代码:

static class Entry<K,V> implements Map.Entry<K,V> {  
    final K key;  //key值
    V value;  //value值
    Entry<K,V> next;  //指向下一个Entry对象的引用
    final int hash;  //hash值
  
    /** 
     * Creates new entry. 
     */  
    Entry(int h, K k, V v, Entry<K,V> n) {  
        value = v;  
        next = n;  
        key = k;  
        hash = h;  
    }  
  
    ......  
}

每一个键值对都对应一个 Entry 对象。 Entry 对象中有一个next，指向另一个Entry对象的引用。这样就组成了一个单向列表。

2. HashMap初始化

默认初始化方法

/** 
 * Constructs an empty <tt>HashMap</tt> with the default initial capacity 
 * (16) and the default load factor (0.75). 
 */  
public HashMap() {  
    this.loadFactor = DEFAULT_LOAD_FACTOR;  
    threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);  
    table = new Entry[DEFAULT_INITIAL_CAPACITY];  
    init();  
}

从上面代码中可以看出，内部HashMap的负载因子（load factor ）为 0.75 ，初始最大容量（initial capacity，即内部数组的长度）为 16，实际最大容量（ threshold ）为0.75*16=12

负载因子（load factor ），表述为实际最大容量 / 初始最大容量的比值。简单来说，负载因子相当于table数组的填充率，默认为0.75。这个值是对空间和时间效率的一个平衡选择。

初始最大容量（initial capacity，即内部数组的长度），内部数组的长度。

** 实际最大容量（ threshold ）** ，定义为负载因子与初始最大容量的乘积（求整）。当table数组的已使用数量大于 threshold 时，HashMap就会重新扩张table数组的长度，长度扩展为当前的两倍

我们可以通过代码来看看“实际最大容量（ threshold ）”是如何扩展的。调用HashMap.put(key,value)时，其中部分代码如下：

/** 
    * Adds a new entry with the specified key, value and hash code to 
    * the specified bucket.  It is the responsibility of this 
    * method to resize the table if appropriate. 
    * 
    * Subclass overrides this to alter the behavior of put method. 
    */  
   void addEntry(int hash, K key, V value, int bucketIndex) {  
       Entry<K,V> e = table[bucketIndex];  
       table[bucketIndex] = new Entry<K,V>(hash, key, value, e);  
       if (size++ >= threshold)  
           resize(2 * table.length);  
   }

从上面我们可以看出，初始化HashMap时，我们最好能预估到HashMap的大小，否则在put的过程中，当table数组的已使用数量大于 threshold 时， HashMap就会自动扩展（我们把这种行为叫做rehash，调用resize(2 * table.length);方法），代码如下：

void resize(int newCapacity) {  
    Entry[] oldTable = table;  
    int oldCapacity = oldTable.length;  
    if (oldCapacity == MAXIMUM_CAPACITY) {  
        threshold = Integer.MAX_VALUE;  
        return;  
    }  
    //新建一个table数组  
    Entry[] newTable = new Entry[newCapacity];  
    transfer(newTable);//把当前数组元素移动到新数组中  
    table = newTable;  
    threshold = (int)(newCapacity * loadFactor);

/** 
 * Transfers all entries from current table to newTable. 
 */  
void transfer(Entry[] newTable) {  
    Entry[] src = table;  
    int newCapacity = newTable.length;  
    for (int j = 0; j < src.length; j++) {//循环old数组中所有元素  
        Entry<K,V> e = src[j];  
        if (e != null) {  
            src[j] = null;  
            do {//循环每个元素所属的单向链表  
                Entry<K,V> next = e.next;  
                int i = indexFor(e.hash, newCapacity);//根据Entry的hash重新计算它在新数组中的位置  
                e.next = newTable[i];//新数组之前移动的数组元素作为Entry的下一个Entry：即插入到当前链表的最顶端  
                newTable[i] = e;  
                e = next;  
            } while (e != null);  
        }  
    }  
}

从上面代码可以看到，rehash导致所有已存入的元素都要根据key的hash与新数组的长度值，重新计算每个Entry在新数组中的位置，并存入新数组。这个动作是相当费时间的，所以我们要尽量避免让HashMap进行自动扩展。

指定参数初始化方法

实际上，我们可以通过 HashMap(int initialCapacity)：构建一个初始容量为 initialCapacity，负载因子为 0.75 的 HashMap；或者 HashMap(int initialCapacity, float loadFactor)：以指定初始容量、指定的负载因子创建一个 HashMap。
一般情况下，我们不需要修改默认的负载因子，所以常用HashMap(int initialCapacity)。其初始化实际上是调用的HashMap(int initialCapacity, float loadFactor)方法，代码如下：

/** 
 * Constructs an empty <tt>HashMap</tt> with the specified initial 
 * capacity and load factor. 
 * 
 * @param  initialCapacity the initial capacity 
 * @param  loadFactor      the load factor 
 * @throws IllegalArgumentException if the initial capacity is negative 
 *         or the load factor is nonpositive 
 */  
public HashMap(int initialCapacity, float loadFactor) {  
    if (initialCapacity < 0)  
        throw new IllegalArgumentException("Illegal initial capacity: " +  
                                           initialCapacity);  
    if (initialCapacity > MAXIMUM_CAPACITY)  
        initialCapacity = MAXIMUM_CAPACITY;  
    if (loadFactor <= 0 || Float.isNaN(loadFactor))  
        throw new IllegalArgumentException("Illegal load factor: " +  
                                           loadFactor);  
  
    // Find a power of 2 >= initialCapacity数组的长度必须是2的n次方，假如初始化initialCapacity=6，实际table数组的长度是大于6、且2的n次方的最小值 ，即 2的3次方=8
    int capacity = 1;  
    while (capacity < initialCapacity)  
        capacity <<= 1;  
  
    this.loadFactor = loadFactor;  
    threshold = (int)(capacity * loadFactor);  
    table = new Entry[capacity];  
    init();

这里，要求table的实际长度（初始最大容量（initial capacity，即内部数组的长度）），必须为2的n次方。原因是为了更高性能的计算出每个key-value对应的元素位置.

情况1：我们如果需要使用HashMap保存20个key-value对，那我们就要Map map= HashMap(20),而实际上，20并不是2的n次方，所以初始化时，table实际的长度会被初始化2的5次方=32，而此map在不扩展的情况下最多可以容纳320.75=24个key-value对。*

情况2：我们需要保存16个key-value对，如果我们Map map= HashMap(16)这样初始化，因为16正好的2的4次方，所以table数组的长度就会被初始化为16，而此map在不扩展的情况下最多可以容纳160.75=12个key-value对.而我们需要保存16个，这样就会导致map自动扩展table数组，非常影响性能。实际上，为了能够保存16个key-value对，考虑的负载因子0.75，我们应该这样初始化Map map= HashMap(32)，（其实17-32之间的任何一个值都可以，在初始化时，最终都会被修正为2的5次方：32）*

参考链接：
http://blog.csdn.net/turkeyzhou/article/details/3197520
http://zhangshixi.iteye.com/blog/672697