Java HashMap Source Code Analysis and Documentation Summary

2018-12-04  abaisa


Source Code Analysis

Constant definitions

    // Default initial capacity, 16. The table capacity must be a power of two.
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    // Maximum capacity
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Default load factor
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // Bin count threshold at which a linked list is converted to a red-black tree
    static final int TREEIFY_THRESHOLD = 8;

    // The array of Node<K,V> elements, i.e. the vertical bucket array of the HashMap in the figure above; its length must be a power of two
    transient Node<K,V>[] table;

    // The number of Node<key,value> entries stored, counting both the array slots and the chained lists
    transient int size;

    // The resize threshold, i.e. the maximum number of key-value pairs the map can hold before growing; when size > threshold the table is resized
    int threshold;

    // The load factor
    final float loadFactor;
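
As a quick sanity check of how these constants interact, the default resize threshold works out to capacity times load factor. The class and field names below are illustrative copies, not HashMap internals:

```java
// Illustrative demo (HashMapConstantsDemo is a hypothetical name): with the
// defaults, threshold = 16 * 0.75 = 12, so the 13th insertion into a
// default-constructed HashMap triggers the first resize.
public class HashMapConstantsDemo {
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // 16
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    public static void main(String[] args) {
        int threshold = (int) (DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
        System.out.println(threshold); // prints 12
    }
}
```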

The put method

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        // tab is the current table; p is the node (bucket head) being examined
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        
        // If the table is null or empty, call resize() to create it; this is the lazy initialization on the first insert
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // i is the index where the entry will be stored; if tab[i] is empty, create a new Node there
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        // The slot is already occupied: a collision has occurred, handle it
        else {
            // e is a temporary node for collision handling: it records the existing Node whose value needs updating, and stays null if a brand-new Node is added
            Node<K,V> e; K k;
            // If p's key equals the key being inserted, p is the Node whose value will be updated
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // If the bin is a red-black tree, insert via the tree
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            // Not a red-black tree: resolve the collision by chaining, appending at the tail
            else {
                for (int binCount = 0; ; ++binCount) {
                    // Reached the tail: append the new node
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        // If the list length reaches TREEIFY_THRESHOLD (8) after insertion, convert the bin to a red-black tree
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    // Found an existing Node with the same key
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            // e != null means an existing Node has a key equal to the one being inserted; update that Node's value
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                // No new Node was inserted, so return the old value directly
                return oldValue;
            }
        }
        // A new Node was inserted, so increment modCount
        ++modCount;
        // A new Node exists, so increment size; if it exceeds the threshold, resize
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
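
The observable behavior of putVal can be checked from the public API: put returns the previous value (or null for a new mapping), and putIfAbsent reaches putVal with onlyIfAbsent = true, so it keeps an existing non-null value. A minimal sketch (PutDemo is a hypothetical name):

```java
import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        // New mapping: putVal inserts a new Node and returns null.
        System.out.println(map.put("a", 1));         // prints null
        // Existing key: the old value is returned and replaced.
        System.out.println(map.put("a", 2));         // prints 1
        // onlyIfAbsent = true: the existing non-null value is kept.
        System.out.println(map.putIfAbsent("a", 3)); // prints 2
        System.out.println(map.get("a"));            // prints 2
    }
}
```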

Hash-related code in putVal

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        ...... // omitted
        // n is a power of two, so (n - 1) & hash is equivalent to hash % n; the bitwise AND is faster, which is why it is used
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        ...... // omitted

    static final int hash(Object key) {
        int h;
        // Take the key's hashCode and XOR its high 16 bits with its low 16 bits
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
Why XOR the high 16 bits of the key's hashCode with the low 16 bits?

Suppose length is 16. Then only the low 4 bits of the hash determine the index: whenever those bits are 0, the result is 0 no matter how the high bits vary. Using only the low bits therefore raises the probability of collisions (the AND step can collide for many reasons; this is just one example). To mitigate these low-bit collisions, a "spread" (perturbation) function XORs the high 16 bits with the low 16 bits. An int is 32 bits, so an unsigned right shift by 16 followed by XOR mixes the high and low halves of the original hash code, improving the distribution of the resulting indices.
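
The effect can be sketched directly (HashSpreadDemo is a hypothetical name; the hash function is copied from the source above). Two hash codes that differ only in their high 16 bits land in the same bucket without the spread, but in different buckets with it:

```java
public class HashSpreadDemo {
    // Copied from HashMap.hash above.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16; // table length
        int h1 = 0x00010000, h2 = 0x00020000; // differ only in the high 16 bits
        // Without spreading, both map to bucket 0: their low 4 bits are all zero.
        System.out.println((n - 1) & h1); // prints 0
        System.out.println((n - 1) & h2); // prints 0
        // With spreading, the high bits influence the index.
        System.out.println((n - 1) & hash(h1)); // prints 1
        System.out.println((n - 1) & hash(h2)); // prints 2
    }
}
```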

The tableSizeFor method

    /**
     * Returns a power of two size for the given target capacity.
     */
    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

A HashMap's capacity must be a power of two. This method returns the smallest power of two greater than or equal to cap (if cap is already a power of two, it is returned unchanged).
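
A sketch that copies tableSizeFor verbatim and exercises it (TableSizeForDemo is an illustrative wrapper, not part of HashMap):

```java
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copied from HashMap.tableSizeFor above: the OR-shift cascade fills every
    // bit below the highest set bit of (cap - 1), then n + 1 rounds up to a
    // power of two.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // prints 16
        System.out.println(tableSizeFor(16)); // prints 16: already a power of two
        System.out.println(tableSizeFor(17)); // prints 32
    }
}
```

Note the initial cap - 1: without it, an input that is already a power of two would be doubled.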

The resize method

    final Node<K,V>[] resize() {
        // oldTab holds the original table
        Node<K,V>[] oldTab = table;
        // oldCap is the original table length; oldThr is its resize threshold
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            // The old table has already reached the maximum capacity (2^30): set the threshold to Integer.MAX_VALUE and return without growing further
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            // Double the new table length and double the threshold
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        // The old table length is 0 but a threshold was set: the initial capacity was stored in threshold, so use it as the new capacity
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        // The old threshold is also 0: initialize everything with the defaults
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        // If the new threshold is still 0, compute it from newCap * loadFactor
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        // Update the HashMap's threshold
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
            // Allocate the new table and install it as the HashMap's table
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        // If the old table is non-empty, migrate its entries to the new table
        if (oldTab != null) {
            // Iterate over the buckets of the old table
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    // A bucket with no chain: move the single node directly to its new slot
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    // If e is a red-black tree node, split the tree
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    // This bucket holds a linked list: split and reorder the list
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        // Walk the list
                        do {
                            next = e.next;
                            // My understanding: oldCap is 2^n and newCap is 2^(n+1). e.hash & oldCap tests whether bit (n+1) of the hash is set; if it is 0, the node keeps the same index in the new table (the "lo" list), and if it is 1, the node moves to index j + oldCap (the "hi" list)
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }
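
The lo/hi split can be verified with plain bit arithmetic (ResizeSplitDemo is a hypothetical name). For a node whose hash has the oldCap bit set, the new index is exactly the old index plus oldCap:

```java
public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        int hash = 0b1_0101; // 21: the bit tested by (hash & oldCap) is set
        int oldIndex = hash & (oldCap - 1); // mask with 0b01111 -> 5
        int newIndex = hash & (newCap - 1); // mask with 0b11111 -> 21
        // (hash & oldCap) isolates exactly the extra bit the new mask uses.
        System.out.println((hash & oldCap) != 0);          // prints true -> "hi" list
        System.out.println(newIndex == oldIndex + oldCap); // prints true
    }
}
```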

JDK documentation

java.lang.Object

java.util.AbstractMap<K, V>

java.util.HashMap<K, V>

public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable

Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table. Note that using many keys with the same hashCode() is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are Comparable, this class may use comparison order among keys to help break ties.
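
Pre-sizing as the paragraph suggests can be sketched like this (PresizeDemo and the capacity formula are illustrative; dividing the expected entry count by the default load factor is a common way to avoid intermediate resizes):

```java
import java.util.HashMap;

public class PresizeDemo {
    public static void main(String[] args) {
        int expected = 1000;
        // Size the table so that `expected` entries stay under the default
        // 0.75 load factor, so no resize happens while filling the map.
        int initialCapacity = (int) Math.ceil(expected / 0.75);
        HashMap<Integer, Integer> map = new HashMap<>(initialCapacity);
        for (int i = 0; i < expected; i++) {
            map.put(i, i);
        }
        System.out.println(map.size()); // prints 1000
    }
}
```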


Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:

Map m = Collections.synchronizedMap(new HashMap(...));

The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.

Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
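
A minimal sketch of the fail-fast behavior (FailFastDemo is a hypothetical name). Removing through the map, rather than through the iterator's own remove, is a structural modification, so the next iterator operation typically throws ConcurrentModificationException (best-effort, as the paragraph above notes):

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                map.remove(key); // structural modification during iteration
            }
        } catch (ConcurrentModificationException e) {
            System.out.println("fail-fast triggered");
        }
    }
}
```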

Constructor and Description

HashMap()
    Constructs an empty HashMap with the default initial capacity (16) and the default load factor (0.75).
HashMap(int initialCapacity)
    Constructs an empty HashMap with the specified initial capacity and the default load factor (0.75).
HashMap(int initialCapacity, float loadFactor)
    Constructs an empty HashMap with the specified initial capacity and load factor.
HashMap(Map<? extends K,? extends V> m)
    Constructs a new HashMap with the same mappings as the specified Map.

I am a beginner; if there are any mistakes in this article, please point them out. Discussion is welcome.
