Java数据结构_JDK1.7HashMap的工作原理

2019-05-16 本文已影响9人未见哥哥

本文的源码是基于 JDK 1.7版本

JDK 1.7版本HashMap

顺序表&链表

在分析 HashMap 源码之前，先来了解一下两个重要的数据结构，顺序表和链表。

HashMap 采用的两者的优点，构成的 HashMap 的 hash 表。

对应的顺序表结构的代表就是数组,而链表结构哦的代表就是链表。

顺序表&链表

Hash 表

HashMap 中 hash 结构

hash 表结构由链表和顺序表组成

hash 表结构

下面是 HashMap 中 hash 表的代码体现。

//顺序表结构
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

//链表结构
static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    int hash;
}

链表和数组是如何工作的？

数组是用于存放链表头结点，根据数组的特性，能快速定位 key 对应 value 所在的链表，

链表的特性，能快速增删数据，因此找到对应链表后，将数据 Entry 添加到链表的某一位置。

hash 的原理

根据 key 计算出对应的 hash 值，然后根据 hash 找到数组的索引位置，然后再往链表中添加/删除/查找数据。

如何计算 hash ？为什么要计算 hash 值？

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

Hash 值是一个数字摘要，表示一个数据的身份，而在 Object 就定义 hashCode() 方法，用于计算每一个对象的 hash 值。

如何根据 hash 值得到数组索引？

在 HashMap 中提供 indexFor 来计算 hash 对应的数组索引。

两个参数分别表示：h 表示 key 对应的 hash 值，length 表示顺序表 tabl[] 的长度，调用 indexFor 方法就是可以得出 hash 对应在 table 的那个索引index 值。

/**
* Returns index for hash code h.
*/
static int indexFor(int h, int length) {
   // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
   return h & (length-1);//等同于index =  hash % n
}

Hash 碰撞与链表

什么叫 hash 碰撞？

hash 碰撞就是说不同的 key 对象计算出来的数组索引 index 值是一样就表示出现 hash 碰撞。

如果两个对象的 hashcode 一样，会发生什么事？或者你该怎么取值？

如何去解决 hash 碰撞呢？

使用一个单链表来存储 hash 碰撞的元素，用于解决 Hash 碰撞的方案,加入一个 next 记录下一个节点的信息。

举例：例如下图的 A ，B，C 元素对应的 key 计算出来的数组 index 都是一样的，为了存储 A,B,C 数据，就是用链表结构来存储，每一个元素都会有一个 next 节点记录下一个节点的信息，例如 A 节点的 next 就是 B，B 节点的 next 就是 C，C 节点的 next 是 NULL。

hash碰撞

put 和 get 方法实现的底层原理

put() 的工作原理

计算 key 的 hash 值

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

根据 hash 与 table 的 length 计算出对应数组索引

/**
 * Returns index for hash code h.
 */
static int indexFor(int h, int length) {
    // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
    return h & (length-1);
}

Entry<Key,Value> 插入到链表的头部中(头插法)

void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    //将当前的链表 e 赋值给 new Entry 的 next 属性，所以新的节点是插入到链表的头部
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}

头插法的原因：后插入的元素，被使用的机会更大，使用头插法，可以更快的遍历到对应的元素。

get()的工作原理

计算 key 的 hash 值——与 put 原理一致
根据 hash 与 table 的 length 计算出对应数组索引index——与 put 原理一致
获取指定 index 对应的 Entry，遍历链表元素

for (Entry<K,V> e = table[indexFor(hash, table.length)];
     e != null;
     e = e.next) {
    Object k;
    if (e.hash == hash &&
        ((k = e.key) == key || (key != null && key.equals(k))))
        return e;
}

hash 表扩容原理

在了解 hash 表扩容前，先来看看下面列举几个比较重要的属性：

默认的数组容量为16

/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

默认的加载因子大小0.75f

/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

面试题：为什么需要加载因子呢？

·扩容的阈值

初始阈值：threashold = loadFactor * capacity,16*0.75 = 12也就是说，当数组存放的元素达到 12 个时(达到阈值)就要开始扩容。

/**
 * The next size value at which to resize (capacity * load factor).
 * @serial
 */
// If table == EMPTY_TABLE then this is the initial capacity at which the
// table will be created when inflated.
int threshold;

如何 resize 扩容?

当存储数据时，会检测是否达到阈值，如果超过阈值，那么就要开始扩容了。

size >= threshold

void addEntry(int hash, K key, V value, int bucketIndex) {
    //size >= threshold 表示当前的 size 已经达到了阈值了。
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }
    createEntry(hash, key, value, bucketIndex);
}

第一步：开始扩容 resize(2 * table.length) ,2 * table.length 表示扩容后的数组长度。

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    table = newTable;
    //重新计算阈值
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

第二步：重新计算旧的 table 数组的每一个元素的在 newTable 的索引值。

/**
 * Transfers all entries from current table to newTable.
 */
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while(null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}

第三步：重新计算阈值 threshold

threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);

扩容会产生什么问题呢？

扩容之后，数组的 length 就发生改变，所以之前基于这个 length 计算的每一个元素的数组 index 就失效了，因此需要重新计算整个 hash 表的数据，然后存入一个新的数组中。

JDK 1.8 HashMap 有什么优化？

JDK1.8融入了红黑树来替代链表，也就是说当元素发生 Hash 冲突后不再是用链表来保存相同 index 的节点，
相应的采用红黑树（高性能的平衡树）来保存冲突节点，节点查找优先级由 O（n）-> 提高到了O（logn）。

本文是笔者学习之后的总结，方便日后查看学习，有任何不对的地方请指正。

记录于 2019年5月12号

Java数据结构_JDK1.7HashMap的工作原理

顺序表&链表

Hash 表

HashMap 中 hash 结构

链表和数组是如何工作的？

hash 的原理

如何计算 hash ？为什么要计算 hash 值？

如何根据 hash 值得到数组索引？

Hash 碰撞与链表

什么叫 hash 碰撞？

如何去解决 hash 碰撞呢？

put 和 get 方法实现的底层原理

put() 的工作原理

get()的工作原理

hash 表扩容原理

如何 resize 扩容?

扩容会产生什么问题呢？

JDK 1.8 HashMap 有什么优化？

猜你喜欢

热点阅读