HashMap 源码分析

2021-01-25 本文已影响0人 terry蒋

一、前言

花了一周学习 HashMap 的源码，探究其原理和设计思路，发现 HashMap 设计非常精妙，而且一直在精益求精地完善。学后发现 HashMap 有太多有价值的知识，难怪面试官对 HashMap 这么关注。

打算写一篇总结的文章，但是 HashMap 有非常多的细节，一篇文章肯定不够讲述清楚，本文只分析 HashMap 基础原理，后面再抽空再写其他文章补充细节。

二、概述

１. HashMap 简介

HashMap 是使用频率非常高的 Key-Value 关系的数据结构。从 JDK 1.2 开始就已经有该类，并在不同版本中不断迭代优化。

２. HashMap 特性

允许 key 为 null，value 为 null。key 为 null 时，hash 值为 0。
HashMap　键值对是无序的，键值对存放的位置和插入的顺序无法。且在进行某些操作(put、remove)后，键值对的顺序可能会发生变化。
HashMap　非线程安全。

3. HashMap 原理

算法：HashMap 底层基于拉链式的散列算法实现。
数据结构：HashMap 在 JDK 1.7 基于 数组 + 链表 实现，在 JDK 1.8 基于 数组 + 链表 + 红黑树 实现。

三、源码分析

本文将分析 JDK 1.7 和 JDK 1.8 的 HashMap 源码。

1. 核心变量

JDK 1.8 中 HashMap 源码中主要的变量

    /* --------------  静态变量 -------------- */
    /**
     * The default initial capacity - MUST be a power of two.
     * 中文解释: 默认初始容量-必须是 2 的幂
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * The load factor used when none specified in constructor.
     * 中文解释: 默认的负载因子。构造函数中未指定负载因子时，将使用该默认值.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * The bin count threshold for using a tree rather than list for a
     * bin.  Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     * 中文解释: 决定桶用红黑树而非链表的阀值。当添加一个元素到桶时，如果桶中元素个数达到该阀值，桶将转换为树。
     * 该值必须大于 2 且小于等于 8 。
     */
    static final int TREEIFY_THRESHOLD = 8;

    /**
     * The bin count threshold for untreeifying a (split) bin during a
     * resize operation. Should be less than TREEIFY_THRESHOLD, and at
     * most 6 to mesh with shrinkage detection under removal.
     * 中文解释: 用于容量调整操作时将拆分树结构的阀值。应该小于TREEIFY_THRESHOLD，且最大为6。
     */
    static final int UNTREEIFY_THRESHOLD = 6;

    /**
     * The smallest table capacity for which bins may be treeified.
     * (Otherwise the table is resized if too many nodes in a bin.)
     * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
     * between resizing and treeification thresholds.
     * 中文解释: 必须大于等于该值，桶才允许被转换成数结构。(否则数组将因为单个桶中节点过多而扩容)
     * 应该至少是 TREEIFY_THRESHOLD 的四倍，以避免调整大小和树化阈值之间的冲突。
     */
    static final int MIN_TREEIFY_CAPACITY = 64;


    /* --------------  成员变量 -------------- */
    /**
     * The table, initialized on first use, and resized as
     * necessary. When allocated, length is always a power of two.
     * (We also tolerate length zero in some operations to allow
     * bootstrapping mechanics that are currently not needed.)
     * 中文解释: 该 table 在首次使用时初始化，并根据需要调整大小。
     * 分配空间时，长度始终是 2 的幂。
     */
    transient Node<K,V>[] table;

    /**
     * Holds cached entrySet(). Note that AbstractMap fields are used
     * for keySet() and values().
     * 中文解释: 保存缓存的 entry 集合。          
     * 请注意，keySet() 和 values() 取的是 HashMap 的 父类 AbstractMap 中的字段 keySet 和keySet。
     */
    transient Set<Map.Entry<K,V>> entrySet;

    /**
     * The number of key-value mappings contained in this map.
     * 中文解释: 在当前 map 中 key-value 键值对的总数
     */
    transient int size;

    /**
     * The next size value at which to resize (capacity * load factor).
     * 中文解释: 到达该值时将跳转 table 的大小（该值 = 容量 * 负载因子）。
     * @serial
     */
    // (The javadoc description is true upon serialization.
    // Additionally, if the table array has not been allocated, this
    // field holds the initial array capacity, or zero signifying
    // DEFAULT_INITIAL_CAPACITY.)
    // 如果尚未分配表数组，则此字段保留初始数组容量，或者为零，表示DEFAULT_INITIAL_CAPACITY
    int threshold;

    /**
     * The load factor for the hash table.
     * 中文解释: hash table 的负载因子
     *
     * @serial
     */
    final float loadFactor;

initialCapacity

loadFactor

threshold

2. 核心方法

小提示：

HashMap 源码中，经常会在 if 判断表达式、for 循环遍历等步骤中进行赋值操作，这个赋值操作可能比较容易忽略，且表达式过长导致阅读时不易理解，需要多留意。

在 putVal() 方法中有一段代码作为示例：

// 注意此时 p 被赋值了 tab[i = (n - 1) & hash]。与我们平时的习惯不一样，平时一般会在 if 判断前进行赋值
if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
else {
    Node<K,V> e; K k;
    // p 在此处用到
    if (p.hash == hash &&
        ((k = p.key) == key || (key != null && key.equals(k))))
        ......
        ......
        ......
}
}

2.1 构造函数

JDK 1.8 中 HashMap 的构造函数源码

    /**
     * 构造函数1(最常用): 无参构造函数。负载因子取默认的0.75
     * Constructs an empty <tt>HashMap</tt> with the default initial capacity
     * (16) and the default load factor (0.75).
     */
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }   

    /**
     * 构造函数2 (推荐使用): 指定初始容量。负载因子取默认的0.75
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and the default load factor (0.75).
     *
     * @param  initialCapacity the initial capacity.
     * @throws IllegalArgumentException if the initial capacity is negative.
     */
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

    /**
     * 构造函数3: 指定初始容量和负载因子
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and load factor.
     *
     * @param  initialCapacity the initial capacity
     * @param  loadFactor      the load factor
     * @throws IllegalArgumentException if the initial capacity is negative
     *         or the load factor is nonpositive
     */
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        // 找到大于或等于 initialCapacity 的最小2的幂，设置为容量
        this.threshold = tableSizeFor(initialCapacity);
    }

    /**
     * Returns a power of two size for the given target capacity.
     * 计算出大于或等于 initialCapacity 的最小2的幂
     */
    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    /**
     * 构造函数4: 指定要拷贝的map。负载因子取默认的0.75
     * Constructs a new <tt>HashMap</tt> with the same mappings as the
     * specified <tt>Map</tt>.  The <tt>HashMap</tt> is created with
     * default load factor (0.75) and an initial capacity sufficient to
     * hold the mappings in the specified <tt>Map</tt>.
     *
     * @param   m the map whose mappings are to be placed in this map
     * @throws  NullPointerException if the specified map is null
     */
    public HashMap(Map<? extends K, ? extends V> m) {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        putMapEntries(m, false);
    }

构造函数 1 是最常用的构造函数，但是推荐使用构造函数 2 ，指定初始容量。当未指定初始容量时，会指定默认的初始容量 16，如果在实际使用中容量需求远大于 16 ，则需要多次扩容，造成性能消耗。

2.2 put() 插入

JDK 1.8 中 HashMap 的 put() 方法源码

/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 * 中文解释: 将指定的 value 与 当前 map 中指定的 key 相关联。
 * 如果当前 map 原先已有 指定 key 相关的 映射关系，则老的 value 将被覆盖。
 * @param key key with which the specified value is to be associated
 * 将与 value 关联的 key
 * @param value value to be associated with the specified key
 * 将与 key 关联的 value
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 * 返回原先与 key 关联的值；如果没有 key 关联的映射，则为 null 。 （返回 null 也可能是因为原先 key 关联的 value 为 null。）
 */
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

/**
 * Implements Map.put and related methods
 * 中文解释: 该方法是 Map.put() 和 其他相关方法 的具体实现
 * @param hash hash for key  中文解释:  key 的 hash 值
 * @param key the key        中文解释:  key 值
 * @param value the value to put   中文解释: put 设置的 value
 * @param onlyIfAbsent if true, don't change existing value 
 * 中文解释: onlyIfAbsent，直译是，是否仅仅在不存在时才执行。直白解释是，如果为 true ，则不改变已存在的 value。一般情况，都是 false，只有 putIfAbsent(K key, V value) 方法中 调用 putVal() 时，onlyIfAbsent = true。
 * @param evict if false, the table is in creation mode.  中文解释: 如果为 false，则 table 处于创建模式
 * @return previous value, or null if none 中文解释: 返回原先关联的 value ，如果无关联的 value 则返回 null
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    
    // 如果桶数组 table 为空，即未进行初始化，则进行初始化
    if ((tab = table) == null || (n = tab.length) == 0)
        // 用 resize() 扩容的方式进行初始化
        n = (tab = resize()).length;
    // 如果 table 中不包含 hash 值对应的节点，则新建键值对节点，并放入 table。注意此时已经将 p 指向了tab[i = (n - 1) & hash]，即 table 中 hash 值相同
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // 如果 table 中包含 hash 值对应的节点
    else {
        Node<K,V> e; K k;
        
        /* 
         * 如果当前桶中第一个节点的 hash 值、 键值都相等时，则将 e 指向该节点。
         * 为什么是第一个呢？因为 table 存放的是链表的第一个节点或者红黑树的根节点
         */
        if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // 如果桶中的引用类型为TreeNode，则调用红黑树的插入方法
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // 如果不是红黑树，则一定是链表
        else {
            // 对链表进行遍历，并在遍历的过程中统计链表长度
            for (int binCount = 0; ; ++binCount) {
                // 链表中不包含要插入的键值对节点时，则将该节点接在链表的最后
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // 如果链表长度大于或等于树化阈值，则链表转为树
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // 如果当前链表包含了相同 key 的键值对，则终止遍历
                if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        
        // 当前 map 中 存在要插入的 key 
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            // onlyIfAbsent 如果为 false ，且 原先的 value 为 null 时，才覆盖旧的值。
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            // 在 HashMap 中 afterNodeAccess(e) 是空方法
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    
    // 键值对数量超过阀值，则进行扩容
    if (++size > threshold)
        resize();
    
    // 在 HashMap 中 afterNodeInsertion(evict) 是空方法
    afterNodeInsertion(evict);
    return null;
}

HashMap会缩容吗？

HashMap 只有扩容，没有缩容的机制。

参考：为什么java的hashmap不支持动态缩小容量？

2.3 get() 查询

JDK 1.8 中 HashMap 的 get() 方法源码

/**
 * Returns the value to which the specified key is mapped,
 * or {@code null} if this map contains no mapping for the key.
 *
 * <p>More formally, if this map contains a mapping from a key
 * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
 * key.equals(k))}, then this method returns {@code v}; otherwise
 * it returns {@code null}.  (There can be at most one such mapping.)
 *
 * <p>A return value of {@code null} does not <i>necessarily</i>
 * indicate that the map contains no mapping for the key; it's also
 * possible that the map explicitly maps the key to {@code null}.
 * The {@link #containsKey containsKey} operation may be used to
 * distinguish these two cases.
 *
 * @see #put(Object, Object)
 */
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

/**
 * Implements Map.get and related methods
 * 中文解释: 该方法是 Map.get() 和 其他相关方法 的具体实现
 * @param hash hash for key  中文解释: key 的 hash 值
 * @param key the key   中文解释: key 值
 * @return the node, or null if none  中文解释: 返回 key 对应的节点，如果节点不存在，则返回 null
 */
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    
    // 如果 table 不为空，且 key 在 table 存在，则定位 key 对应节点所在位置
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // 如果正好是桶的第一个节点
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // 如果不是桶的第一个节点，则继续往后查找
        if ((e = first.next) != null) {
            // 如果桶中的引用类型为TreeNode，则根据红黑树的查找方式查找节点
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // 如果不是树，则一定是链表。遍历链表，查找出节点
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    
    
    // 如果 table 空，或 key 在 table 不存在，则返回 null
    return null;
}

查找的步骤还是比较简单的。

2.4 遍历

遍历的方式

Map<String, Integer> map = new HashMap<>(16);
map.put("a", 1);
map.put("b", 2);

// 遍历方式1
for (Map.Entry<String, Integer> next : map.entrySet()) {
    System.out.println("key=" + next.getKey() + " value=" + next.getValue());
}

// 遍历方式2
for (String key : map.keySet()) {
    System.out.println("key=" + key + " value=" + map.get(key));
}

推荐使用方式 1 ，因为第一种方式可以直接把 key 和 value 取出，而方式 2 需要通过 key 再去搜索一次，效率差得多。

2.5 remove() 删除

JDK 1.8 中 HashMap 的 remove() 方法源码

/**
 * Removes the mapping for the specified key from this map if present.
 * 删除当前 map 中指定 key 对应的映射关系
 *
 * @param  key key whose mapping is to be removed from the map
 * 中文解释: 要从 map 中删除对应映射关系的 key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 * 中文解释: 返回 key 原先关联的 value，当原先的 value 为 null，或 key 无对应的映射关系时，则返回 null
 */
public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

/**
 * Implements Map.remove and related methods
 * 中文解释: 该方法是 Map.remove() 和 其他相关方法 的具体实现
 * @param hash hash for key 中文解释: key 对应的 hash 值
 * @param key the key 中文解释: key
 * @param value the value to match if matchValue, else ignored 中文解释: 如果matchValue 为 true 才匹配的 value ， 否则忽略 TODO 还是不太理解干啥用的
 * @param matchValue if true only remove if value is equal 中文解释: 如果为 true ，则仅在 value 相等时才删除
 * @param movable if false do not move other nodes while removing 中文解释: 如果为 false ，则在删除时，不会移动其他节点
 * @return the node, or null if none 中文解释: 返回 key 对应的节点，如果不存在对应节点，则返回 null
 */
final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
  
  // 和 get() 类似，如果 table 不为空，且 key 在 table 存在，则定位 key 对应节点所在位置
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v;
    
    // 如果正好是桶的第一个节点
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
    // 如果不是桶的第一个节点，则继续往后查找
        else if ((e = p.next) != null) {
      // 如果桶中节点的引用类型为TreeNode，则根据红黑树的查找方式查找节点
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
      // 如果不是树，则一定是链表。遍历链表，查找出节点
            else {
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
          // p = e 的目的是用 p 指向当前节点，而while 表达式中 e = e.next 是让 e 指向 下一个节点，该步骤是后面移除节点，修复链表的准备工作
                    p = e;
                } while ((e = e.next) != null); // 如果 (e = e.next) == null 说明已经遍历到最后一个节点，依旧没有找到对应的节点，将退出 do-while 循环，此时 node 依旧为 null
            }
        }
    
    // 如果存在对应节点，则删除节点，并修复链表或红黑树
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
      // 如果桶中节点的引用类型为TreeNode，则按红黑树删除节点的方式删除该节点
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
      // 如果不是树，则一定是链表结构。如果是第一个节点，将 table 的 index 下标的位置指向 node 节点的下一个节点，即第二个节点移动到第一个节点，这样 node 节点就在链表中删除了
            else if (node == p)
                tab[index] = node.next;
      // 如果是链表结构，且不是第一个第一个节点，则 key 对应的节点 node 的前一个节点的 next 指 向 node 之后的节点，这样 node 节点就就在链表中删除了
            else
                p.next = node.next;
            ++modCount;
      
      // map 的 键值对的总数 size 减一 
            --size;
      // 在 HashMap 中 afterNodeRemoval(e) 是空方法
            afterNodeRemoval(node);
            return node;
        }
    }
  
  // 如果 table 空，或 key 在 table 不存在，则返回 null
    return null;
}

romove() 前半部分和 get() 方法步骤是几乎是一样的。