
Source Code Deep-Dive Series - HashMap

2019-10-07  sunyelw

The HashMap source covered here is just the parts I want to talk about; whatever isn't covered is either something I don't want to cover or don't understand yet. This is purely a learning record.


Data Structure

/**
 * Basic hash bin node, used for most entries.  (See below for
 * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
 */
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

HashMap is backed by an array whose elements are these Nodes. Besides the basic key / value / hash fields, a Node carries a `next` field of type Node, so each array slot (bucket) clearly holds a singly linked list.

  1. The bucket index is derived from the key's hash: index = (length - 1) & hash, so a node can be reached directly from its key in O(1) time (the array's killer feature: random access by index).
  2. Since 1.8 a bucket can also hold TreeNodes: the famous red-black tree optimization, applied when a chain grows past 8 and the table length is at least 64.

So the bottom layer of HashMap is array + linked list (1.7), with red-black trees added in 1.8.

Internal Fields

/**
 * The table, initialized on first use, and resized as
 * necessary. When allocated, length is always a power of two.
 * (We also tolerate length zero in some operations to allow
 * bootstrapping mechanics that are currently not needed.)
 */
transient Node<K,V>[] table;

/**
 * Holds cached entrySet(). Note that AbstractMap fields are used
 * for keySet() and values().
 */
transient Set<Map.Entry<K,V>> entrySet;

/**
 * The number of key-value mappings contained in this map.
 */
transient int size;

/**
 * The number of times this HashMap has been structurally modified.
 * Structural modifications are those that change the number of mappings in
 * the HashMap or otherwise modify its internal structure (e.g.,
 * rehash).  This field is used to make iterators on Collection-views of
 * the HashMap fail-fast.  (See ConcurrentModificationException).
 */
transient int modCount;

/**
 * The next size value at which to resize (capacity * load factor).
 *
 * @serial
 */
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold;

/**
 * The load factor for the hash table.
 *
 * @serial
 */
final float loadFactor;

The loadFactor deserves a closer look on its own.

Different keys can produce the same hash value (the hash collision every hashing scheme has to face). HashMap's hash method is:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

The hash value is an int, i.e. at most 32 bits, which gives 2^32 possible values; with more keys than that, distinct keys are guaranteed to map to the same hash value.

What about duplicates? Remember that `next` field at the end of Node? Colliding entries are simply chained behind the existing one; what else would the linked list be for...
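Before moving on, the XOR in hash() is worth a quick demonstration. The bucket index (n - 1) & hash only looks at the low bits, so without folding the high 16 bits down, keys whose hashCodes differ only in high bits would always collide. A minimal sketch (the class name and sample values are mine, for illustration):

```java
public class HashSpread {
    // Same spreading step as HashMap.hash(): fold the high 16 bits
    // into the low 16 so they influence the bucket index.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;              // table length (power of two)
        int a = 0x10000;         // a and b differ only in high bits
        int b = 0x20000;
        // Without spreading, both land in bucket 0:
        System.out.println((a & (n - 1)) + " vs " + (b & (n - 1)));           // 0 vs 0
        // With spreading, the high bits separate them:
        System.out.println((spread(a) & (n - 1)) + " vs " + (spread(b) & (n - 1))); // 1 vs 2
    }
}
```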

Back to the load factor. The actual load is total entries / current table length, so it keeps growing as data is inserted into the HashMap; once size exceeds threshold (capacity * loadFactor, with loadFactor fixed at construction), the table must be resized.

For example, with loadFactor = 1.0 the table isn't resized until it is completely full; if the keys inserted before that all hash to the same value, access degrades from O(1) to O(N).

Conversely, if loadFactor is too small, you pay for frequent resizes and wasted memory.

HashMap's default loadFactor is 0.75.

Constructors

1. Constructing without data

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and load factor.
 *
 * @param  initialCapacity the initial capacity
 * @param  loadFactor      the load factor
 * @throws IllegalArgumentException if the initial capacity is negative
 *         or the load factor is nonpositive
 */
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and the default load factor (0.75).
 *
 * @param  initialCapacity the initial capacity.
 * @throws IllegalArgumentException if the initial capacity is negative.
 */
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

/**
 * Constructs an empty <tt>HashMap</tt> with the default initial capacity
 * (16) and the default load factor (0.75).
 */
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

These three constructors take no actual data; only the first, fullest one needs a close read.

It starts with basic argument checks, then calls the tableSizeFor function:

/**
 * Returns a power of two size for the given target capacity.
 */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

As the comment says, it returns a power of two for the given target capacity. The logic: the cascade of shift-and-OR steps copies the highest set bit of n into every lower position (e.g. 0b1001 becomes 0b1111), so n + 1 is the smallest power of two >= cap; the initial cap - 1 keeps an exact power of two from being doubled.

Note that when the constructor finishes, no memory has been allocated for HashMap's array; only a few parameters (threshold / loadFactor) have been fixed.
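To see the bit-smearing in action, here is the method copied out with a tiny driver (the standalone class is mine; the constant and method body match the JDK source above):

```java
public class TableSize {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copy of HashMap.tableSizeFor: smallest power of two >= cap.
    static int tableSizeFor(int cap) {
        int n = cap - 1;   // so an exact power of two maps to itself
        n |= n >>> 1;      // after these ORs, every bit below the
        n |= n >>> 2;      // highest set bit is also set,
        n |= n >>> 4;      // e.g. 0b1001 -> 0b1111
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16
        System.out.println(tableSizeFor(17)); // 32
    }
}
```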

  2. Constructing with data
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}

final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
    int s = m.size();
    if (s > 0) {
        if (table == null) { // pre-size
            float ft = ((float)s / loadFactor) + 1.0F;
            int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                     (int)ft : MAXIMUM_CAPACITY);
            if (t > threshold)
                threshold = tableSizeFor(t);
        }
        else if (s > threshold)
            resize();
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
            K key = e.getKey();
            V value = e.getValue();
            putVal(hash(key), key, value, false, evict);
        }
    }
}

Inserting Data

Next, let's insert an entry and see what happens.

/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

Per the javadoc: associates value with key in this map; if the key already has a mapping, the old value is replaced.

On to the implementation:

/**
 * Implements Map.put and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

Two extra parameters appear here, onlyIfAbsent and evict; their meanings are in the javadoc above.

The main logic:

First, if the table is null or empty, resize() is called to allocate a fresh array (resizing is covered shortly).

Then check the target bucket tab[(n - 1) & hash]: if it's empty, the new node goes straight in.
Otherwise:
(1) If the first node p matches, i.e. same hash and equal key, keep the node just pulled from the array as the existing entry: e = p.
(2) If p is a TreeNode (red-black tree), insert via putTreeVal.
(3) Otherwise it's a linked list: walk it; if a node with the same key is found, stop and keep it as e; on reaching the tail, append a new node, then check whether the chain has reached TREEIFY_THRESHOLD (8) and needs treeifyBin (which itself just resizes instead when the table is still shorter than 64).

Then decide whether to overwrite:

if (e != null) { // existing mapping for key
    V oldValue = e.value;
    if (!onlyIfAbsent || oldValue == null)
        e.value = value;
    afterNodeAccess(e);
    return oldValue;
}

e != null can arise from two branches, but means the same thing either way: an entry with an identical key already exists. This is where onlyIfAbsent earns its keep: the old value is replaced with the new one when onlyIfAbsent is false (overwrite unconditionally), or when the key's existing value is null.

Finally size and modCount are incremented; if size now exceeds threshold, a resize is needed (the load factor has been exceeded).

Whether or not anything was overwritten, the method returns the old value (when there was no previous mapping, it "overwrites null" and returns null).
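The overwrite rules are easy to check from the public API: put always overwrites and returns the old value, while putIfAbsent (which calls putVal with onlyIfAbsent = true) leaves existing non-null values alone but does replace a null value. A small sketch:

```java
import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        System.out.println(map.put("a", 1));         // null: no previous mapping
        System.out.println(map.put("a", 2));         // 1: old value returned, now replaced
        System.out.println(map.putIfAbsent("a", 3)); // 2: onlyIfAbsent=true, no overwrite
        System.out.println(map.get("a"));            // 2

        map.put("b", null);
        System.out.println(map.putIfAbsent("b", 9)); // null: the existing value was null...
        System.out.println(map.get("b"));            // 9: ...so it WAS overwritten
    }
}
```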

Two methods remain unmentioned: afterNodeInsertion / afterNodeAccess. They are empty hooks in HashMap, left for subclasses to override (e.g. LinkedHashMap):

void afterNodeAccess(Node<K,V> p) { }
void afterNodeInsertion(boolean evict) { }

LinkedHashMap overrides both; pasted here for reference:

void afterNodeInsertion(boolean evict) { // possibly remove eldest
    LinkedHashMap.Entry<K,V> first;
    if (evict && (first = head) != null && removeEldestEntry(first)) {
        K key = first.key;
        removeNode(hash(key), key, null, false, true);
    }
}

void afterNodeAccess(Node<K,V> e) { // move node to last
    LinkedHashMap.Entry<K,V> last;
    if (accessOrder && (last = tail) != e) {
        LinkedHashMap.Entry<K,V> p =
            (LinkedHashMap.Entry<K,V>)e, b = p.before, a = p.after;
        p.after = null;
        if (b == null)
            head = a;
        else
            b.after = a;
        if (a != null)
            a.before = b;
        else
            last = b;
        if (last == null)
            head = p;
        else {
            p.before = last;
            last.after = p;
        }
        tail = p;
        ++modCount;
    }
}

Resizing the Table

Before walking through resize(), i.e. the growth logic, one fast modulo trick. Stating the identity here without a rigorous proof:

x % n = x & (n - 1)   (for n = 2^y, y >= 0, and non-negative x)

It works because n - 1 is a mask of exactly the bits below n's single set bit, and those low bits are precisely the remainder; in the end it comes down to bit operations being the fastest.
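A quick brute-force check of the identity (the test values are arbitrary; note it only holds for non-negative x and power-of-two n):

```java
public class ModTrick {
    public static void main(String[] args) {
        // For n a power of two, n - 1 is a mask of the low bits,
        // so x & (n - 1) keeps exactly x's remainder mod n.
        int[] ns = {1, 2, 8, 16, 1024};
        for (int n : ns)
            for (int x = 0; x < 1000; x++)
                if ((x & (n - 1)) != x % n)
                    throw new AssertionError(x + " % " + n);
        System.out.println("x & (n - 1) == x % n for all tested n = 2^y");
    }
}
```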

Now the resize() method:

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

This method is fairly involved; let's take it in two steps.

  1. Computing newCap and newThr
Node<K,V>[] oldTab = table;
int oldCap = (oldTab == null) ? 0 : oldTab.length;
int oldThr = threshold;
int newCap, newThr = 0;
if (oldCap > 0) {
    if (oldCap >= MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return oldTab;
    }
    else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
             oldCap >= DEFAULT_INITIAL_CAPACITY)
        newThr = oldThr << 1; // double threshold
}
else if (oldThr > 0) // initial capacity was placed in threshold
    newCap = oldThr;
else {               // zero initial threshold signifies using defaults
    newCap = DEFAULT_INITIAL_CAPACITY;
    newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
}
if (newThr == 0) {
    float ft = (float)newCap * loadFactor;
    newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
              (int)ft : Integer.MAX_VALUE);
}
threshold = newThr;
@SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
table = newTab;

Three cases here:
(1) The old table is non-empty, i.e. already has a length: growth is simple, newCap = oldCap << 1, and if oldCap was at least the default 16 the threshold doubles too.

(2) The table is still empty but oldThr > 0: remember where the constructor stashed the smallest power of two >= the requested capacity? In threshold, i.e. this oldThr. This branch exists for the very first allocation: that stashed value simply becomes the new array capacity.

(3) Neither holds: use the defaults (newCap = 16, newThr = 16 * 0.75 = 12).

After the three-way branch, newThr must still be guarded against being 0 (as in case 2); it is then computed directly as newCap * loadFactor.

Apart from temporarily holding the initial capacity before the first allocation, threshold always stores capacity * loadFactor, where capacity is the post-resize capacity.

Then a new array of length newCap is allocated and assigned to table.

  2. With the array built, the next step is migrating the data (if the old array is non-empty).
    This is where rehashing comes in:
if (oldTab != null) {
    for (int j = 0; j < oldCap; ++j) {
        Node<K,V> e;
        if ((e = oldTab[j]) != null) {
            oldTab[j] = null;
            if (e.next == null)
                newTab[e.hash & (newCap - 1)] = e;
            else if (e instanceof TreeNode)
                ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
            else { // preserve order
                Node<K,V> loHead = null, loTail = null;
                Node<K,V> hiHead = null, hiTail = null;
                Node<K,V> next;
                do {
                    next = e.next;
                    if ((e.hash & oldCap) == 0) {
                        if (loTail == null)
                            loHead = e;
                        else
                            loTail.next = e;
                        loTail = e;
                    }
                    else {
                        if (hiTail == null)
                            hiHead = e;
                        else
                            hiTail.next = e;
                        hiTail = e;
                    }
                } while ((e = next) != null);
                if (loTail != null) {
                    loTail.next = null;
                    newTab[j] = loHead;
                }
                if (hiTail != null) {
                    hiTail.next = null;
                    newTab[j + oldCap] = hiHead;
                }
            }
        }
    }
}

For each slot, the head node is saved into e and the old slot is nulled out right away, to help GC:

oldTab[j] = null;

The actual handling again splits into three cases:

(1) The bucket holds a single node: simply compute its rehashed index directly:

if (e.next == null)
  newTab[e.hash & (newCap - 1)] = e;

(2) A red-black tree: delegate to its split method (I will figure out red-black trees one day, I swear).

if (e instanceof TreeNode)
   ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);

(3) A chain with more than one key. Now there's a question to consider: after the resize, do the entries on this chain stay at the same index?

We know a resize always doubles the capacity, i.e. shifts the current capacity left one bit. To see whether an index changes, compare the remainder modulo the new length with the old remainder: different means it moved, equal means it stayed.

Using the fast modulo identity from above, take an example: the table grows from length 8 to 16, with the dividend x held fixed:

// old
// x & (8 - 1) = x % 8
x & 0000 0111  
// new   
// x & (16 - 1) = x % 16 
x & 0000 1111

The new mask simply gains one bit, and the bits above it still cannot affect the result. Look closer: the bit it gains sits exactly at oldCap's position, so the only possible change to the result is the addition of oldCap (all higher bits are zeroed by the mask; only the low bits participate).

In other words, after a resize, each key on a given chain lands either at the same index or at index + oldCap.

So how do we tell the two apart? Since only the oldCap bit affects the final result, split the chain on that bit into two lists, and link them into the new array at index j and index j + oldCap respectively.

(e.hash & oldCap) == 0

This test reads whether the deciding bit is 0 or 1;
then it's a walk down the chain, appending each node to the matching list:

do {
    next = e.next;
    if ((e.hash & oldCap) == 0) {
        if (loTail == null)
            loHead = e;
        else
            loTail.next = e;
        loTail = e;
    }
    else {
        if (hiTail == null)
            hiHead = e;
        else
            hiTail.next = e;
        hiTail = e;
    }
} while ((e = next) != null);

Head and tail pointers for the two chains; the loop ends when next == null, i.e. the chain is exhausted. All fairly simple; the code speaks for itself.

If the bit was 0 the index is unchanged; if 1, the index gains oldCap:

if (loTail != null) {
    loTail.next = null;
    newTab[j] = loHead;
}
if (hiTail != null) {
    hiTail.next = null;
    newTab[j + oldCap] = hiHead;
}

One more difference from 1.7: 1.7 rehashed every node directly against the new length (x & (newCap - 1)) and re-linked with head insertion, which reverses a chain's original order; the 1.8 split-into-two-lists approach preserves the original order.

Run every slot of the old array through this, return the new array, and the resize is complete.
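The same-index-or-plus-oldCap rule can be checked numerically. A small sketch (the hash values are mine, chosen so they all share old bucket 5 with oldCap = 16):

```java
public class ResizeSplit {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        // Every hash in old bucket j moves to j or j + oldCap,
        // decided solely by the bit (hash & oldCap).
        int[] hashes = {5, 21, 37, 53}; // all in bucket 5 when cap = 16
        for (int h : hashes) {
            int oldIdx = h & (oldCap - 1);
            int newIdx = h & (newCap - 1);
            int expected = ((h & oldCap) == 0) ? oldIdx : oldIdx + oldCap;
            System.out.println(h + ": " + oldIdx + " -> " + newIdx
                    + " (lo/hi rule gives " + expected + ")");
        }
    }
}
```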

As you can see, every resize migrates all existing data, O(N) work, on top of possible list/tree conversions, so resizing is the most expensive thing a HashMap does. In real projects, if you roughly know the data volume in advance, hand the constructor a suitable power of two as the initial capacity, with your machines' actual memory taken into account.
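Based on the threshold formula, pre-sizing means asking for enough capacity that the expected entry count stays below capacity * loadFactor. A sketch of a helper (the name withExpectedSize is mine; this is essentially the same formula putMapEntries uses to pre-size above):

```java
import java.util.HashMap;
import java.util.Map;

public class PreSize {
    // Capacity such that `expected` entries stay under threshold,
    // i.e. no resize happens while filling the map.
    static <K, V> Map<K, V> withExpectedSize(int expected) {
        return new HashMap<>((int) (expected / 0.75f) + 1);
    }

    public static void main(String[] args) {
        // new HashMap<>(1000) would still resize on the way to 1000
        // entries: tableSizeFor(1000) = 1024, threshold = 768 < 1000.
        Map<Integer, Integer> m = withExpectedSize(1000); // asks for 1334 -> capacity 2048
        for (int i = 0; i < 1000; i++) m.put(i, i);
        System.out.println(m.size());
    }
}
```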

Looking Up Data

After everything above, you can probably guess how HashMap looks data up.

Let's see how the source plays it:

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

Pretty much what we guessed: locate the bucket via (n - 1) & hash, always check the first node, then either search the tree or walk the list until the key matches.

Removing Data

Removal mirrors lookup: find the node the same way getNode does, then unlink it from its chain (or remove it from its tree):

public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}


final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        else if ((e = p.next) != null) {
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                tab[index] = node.next;
            else
                p.next = node.next;
            ++modCount;
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}

Further Questions

  1. Why convert a chain to a red-black tree only once its length reaches 8?

First, why have the tree conversion at all?

As noted earlier, any hashing scheme must deal with hash collisions, and HashMap's answer is chaining: on hitting the same hash, the entry is simply linked behind the current node, forming a singly linked list.

Now imagine the extreme case where every key's hash value is identical. What happens? All the data forms one long chain, and access degrades sharply from O(1) to O(N), intolerable for an industrial-strength library.

So why a tree? Put it this way: a red-black tree is a binary search tree that maintains perfect black balance, so even its worst-case operations are O(log N); lookup cost can't deteriorate too badly.

Then you may ask: why not use a red-black tree outright, since O(log N) beats a list's O(N)? We can't, for two simple reasons:
(1) Every tree node must store pointers to its left and right children; using red-black trees everywhere costs too much memory.
(2) When N is small, O(log N) is not necessarily faster than O(N); constants and coefficients matter.

Hence a crossover point is needed: above it, store the bin as a tree; below it, keep the list.

Back to the original question: why 8?

The source contains a comment explaining the choice of 8:

/*
 * Because TreeNodes are about twice the size of regular nodes, we
 * use them only when bins contain enough nodes to warrant use
 * (see TREEIFY_THRESHOLD). And when they become too small (due to
 * removal or resizing) they are converted back to plain bins.  In
 * usages with well-distributed user hashCodes, tree bins are
 * rarely used.  Ideally, under random hashCodes, the frequency of
 * nodes in bins follows a Poisson distribution
 * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
 * parameter of about 0.5 on average for the default resizing
 * threshold of 0.75, although with a large variance because of
 * resizing granularity. Ignoring variance, the expected
 * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
 * factorial(k)). The first values are:
 *
 * 0:    0.60653066
 * 1:    0.30326533
 * 2:    0.07581633
 * 3:    0.01263606
 * 4:    0.00157952
 * 5:    0.00015795
 * 6:    0.00001316
 * 7:    0.00000094
 * 8:    0.00000006
 * more: less than 1 in ten million
 */

The gist: under random, well-distributed hash codes, bin sizes follow a Poisson distribution, and the probability of a single bin reaching 8 nodes is about 0.00000006, less than one in ten million. Based on those odds, the crossover point was set at 8.
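The numbers in that comment follow directly from the Poisson pmf P(k) = e^(-0.5) * 0.5^k / k!. A sketch that reproduces the table, using the recurrence P(k+1) = P(k) * 0.5 / (k + 1) to avoid factorials:

```java
public class Poisson {
    public static void main(String[] args) {
        // Expected bin-size frequencies under random hashes at the
        // default load factor (lambda ~= 0.5), per the JDK comment.
        double lambda = 0.5;
        double p = Math.exp(-lambda); // P(0)
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, p);
            p = p * lambda / (k + 1); // P(k+1) from P(k)
        }
    }
}
```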

There is also a named constant for this 8:

/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;

Java is twenty-plus years old, and every change like this one rests on solid, quantified reasoning.


I meant to read ConcurrentHashMap today; instead, revisiting HashMap, which I assumed I could pick back up in half an hour, took the whole afternoon......

Review the old and learn the new; the ancients weren't kidding.
