HashMap源码分析

2016-05-20 本文已影响83人言西枣

源码来自jdk1.8

实现了Map<K, V>接口
可以有null键和null值（于此相对，HashTable不允许null，且是同步的）
get和put操作O(1)
性能受initial capacity 和 load factor影响
不同步，解决办法：

This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method.

Map m = Collections.synchronizedMap(new HashMap(...));

iterator fast fail

public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable {
     /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * The bin count threshold for using a tree rather than list for a
     * bin.  Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     */
    static final int TREEIFY_THRESHOLD = 8;

    /**
     * The bin count threshold for untreeifying a (split) bin during a
     * resize operation. Should be less than TREEIFY_THRESHOLD, and at
     * most 6 to mesh with shrinkage detection under removal.
     */
    static final int UNTREEIFY_THRESHOLD = 6;

    /**
     * The smallest table capacity for which bins may be treeified.
     * (Otherwise the table is resized if too many nodes in a bin.)
     * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
     * between resizing and treeification thresholds.
     */
    static final int MIN_TREEIFY_CAPACITY = 64;
    
    /* ---------------- Fields -------------- */

    /**
     * The table, initialized on first use, and resized as
     * necessary. When allocated, length is always a power of two.
     * (We also tolerate length zero in some operations to allow
     * bootstrapping mechanics that are currently not needed.)
     */
    transient Node<K,V>[] table;

    /**
     * Holds cached entrySet(). Note that AbstractMap fields are used
     * for keySet() and values().
     */
    transient Set<Map.Entry<K,V>> entrySet;

    /**
     * The number of key-value mappings contained in this map.
     */
    transient int size;

    /**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     */
    transient int modCount;

    /**
     * The next size value at which to resize (capacity * load factor).
     *
     * @serial
     */
    // (The javadoc description is true upon serialization.
    // Additionally, if the table array has not been allocated, this
    // field holds the initial array capacity, or zero signifying
    // DEFAULT_INITIAL_CAPACITY.)
    int threshold;

    /**
     * The load factor for the hash table.
     *
     * @serial
     */
    final float loadFactor;
    // ....
}

常量中比较重要的几点

Capacity一直是2的幂，也就是下面table数组的长度
默认装载因子0.75
bin由链表转化为红黑树的临界值是8

属性中比较重要的

Node<K,V>[] table 是map存储键值对的对象，每个键值对就是一个实现了Entry接口的Node，table中的每个元素即是一个键值对Node，也是这个键值对链表的头节点，通过hash得到相同坐标的键值对通过链表链接在头节点后面
threshold resize()的临界值，也就是（Capacity＊loadfactor）
loadFactor 装载因子
这里要注意的是size指的是所有键值对的数量，而与table数组长度无关

table结构

Node<K,V>

static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }

        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }

单个节点中要注意的是，节点的hash值和key都是final修饰的，而value和下一个节点next是可以更改的。
还有就是这个Node是链表结构的，所以转换为红黑树以后，要相应的换成树节点（也是Node类型的子类），后文会提到。

put

理解了put函数，也就理解了HashMap底层是如何存放键值对(Node)的.
put函数的流程大致如下：

计算key的hash值，这里不仅是调用key.hashCode()函数，还有进一步的计算。
通过hash值进一步计算键值对在数组中的位置，相同hash值的键值对在数组中的坐标相同，也就是说相同hash值的键值对处于同一个bin中，他们以链表的形式存放在数组的这个坐标下。
如果是第一个就直接放，如果碰撞了就链接到链表后面去。
如果链表>=TREEIFY_THRESHOLD，就将链表转换为红黑树（这样查找的时间由O(n)变为O(log(n))）
如果节点存在就更新值，返回旧值
如果map的size>= threshold，那么就要resize()

    public V put(K key, V value) {
        // 这里计算了hash值
        return putVal(hash(key), key, value, false, true);
    }
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
         // 如果map为空，那么通过resize初始化
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // 如果这个bin为空，那么就把这个键值对放到这个index的位置上，成为这个bin的第一个元素
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        // 如果这个位置已经有其它元素了，那就依次比较，存在就更新，不存在就添加
        else {
            Node<K,V> e; K k;
            // p用来保存这个位置的第一个元素，如果正好就是这个元素，那么就用e来返回找到的元素，
            // 如果不是就分链表和红黑树继续找，同样也是找到就用e返回，找不到就添加
            // 这里比较的时候先比较了hash值，然后再比较key是否相等，之所以要比较hash值是
            // 因为在定位到这个位置的时候只使用了hash值的低位（n - 1）& hash，这个的分析见get
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // 红黑树的情况，添加或者返回找到的节点
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            // 链表的情况，添加或者返回找到的节点
            else {
                for (int binCount = 0; ; ++binCount) {
                    // 到达链表结尾，没有找到，那么插入新节点
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        // 如果链表过长，那么转化为红黑树
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    // 在链表中找到了相同key的节点
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            // 如果e不为空，就是找到了节点，那么更新节点的value
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        // 增加修改次数
        ++modCount;
        // 如果大于临界值，扩大
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

get

get与put思路基本一致，只是缺少了添加（更新）这个步骤：

key->hashCode()->hash->index
比较节点
a) 第一个节点
b) 链表查找
c) 红黑树查找

    public V get(Object key) {
        Node<K,V> e;
        // 这里计算hash值
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }
    
    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        // 如果table不为空，且index位置有节点
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
             // 检查第一个节点
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            // 如果第一个节点不是，且后面还有节点
            if ((e = first.next) != null) {
                // 红黑树的情况
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                // 链表的情况
                do {
                    // 找到就返回
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }

hash

计算key的hash值HashMap的关键点，hash函数关系着能否均匀的将键值对散列开，如果散列效果不好，就会发生很多碰撞，影响查找添加的效率，如果计算太过复杂，同样也会影响效率，这里给出jdk1.8的实现

static final int hash(Object key) {
        int h;
        // 保持hashCode的高16位，将低16位与高16位异或
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

resize

resize函数用来申请table数组，所以会在放进第一个键值对初始化和达到threshold进行扩容两种情况下使用。
函数总体也分为上下两个部分，上半部分得到新的table的capacity和threshold值，分配数组，下半部分用于在扩容情况下，将所有的键值对重新计算得到坐标，也就是个再散列的过程。

final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        // 到这里为止，函数只做了两件事，就是确定新的table的capacity和map的新的threshold
        threshold = newThr;
        // 这里申请新的table
        @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        // 从这里开始，对原来所有的键值对重新散列，hash值是存在节点里的，不需要重新计算
        if (oldTab != null) {
            // 按坐标遍历table，每次处理一个bin
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                // 如果这个bin有节点
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    // 如果只有一个节点，那么重新散列，其实也就是由于capacity扩大，计算index时多使用了hash值中的一个高位
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    // 如果这个bin是红黑树结构
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    // 如果这个bin是链表结构
                    else { // preserve order
                        // lo其实也就是原来的index位置
                        Node<K,V> loHead = null, loTail = null;
                        // hi就是扩容后与lo相对的新的index，坐标相差了原来的capacity
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            // 这里的到的是hash值中新加入计算的高位，根据这位是0或1将原来的链表拆分成两个链表
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        // 将链表（高位为0）放到原来的index位置
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                         // 将链表（高位为1）放到新的index位置
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

iterator

HashMap中的iterator分为key，value，entry三种，基本实现都是一样的，唯一需要关注的是，由于是按照table来遍历的，所以顺序会看起来是无序的。

参考阅读
how-does-a-hashmap-work-in-java
HashMap的key可以是可变的对象吗？

HashMap源码分析

Node<K,V>

put

get

hash

resize

iterator

猜你喜欢

热点阅读