HashMap源码解析

2021-03-13 本文已影响0人如愿以偿丶

1.简介

HashMap它是基于哈希表的Map接口的实现，是以key-value的像是存在，主要存放的是键值对。他不是线程安全的。

&（按位与运算）：如果相同二进制数位上，数字都是1的时候为1，否则为0
^ （按位异或运算）：如果相同二进制数位上，数字相同，结果为0，不同为1
| （按位或运算）：如果相同二进制数位上，数字都是1的时候为1，否则为0

2.HashMap数据结构（数据存储方式）

HashMap它内部是以数组 + 链表 + 红黑树三种数据结构进行存储。
ArrayList它内部是以数组进行存储
LinkList它内部是以双向链表进行存储。

2.1.HashMap的存储流程：

HashMap首先是以数组进行存储，通过存储元素的key的hashCode值进行hash计算。获取存储的索引值，如果数组索引位置没有元素，那么直接进行存储，如果数组索引值位置已经有元素，那么就会通过进行对比两元素的hashCode值，如果hashCode值不相等，那么就会在数组索引值上创建链接，进行存储，如果hashCode值相等，那么会通过equals方法获取内容信息进行对比，如果不相等，存储在链表上，如果相等，那么就会进行覆盖，并赋予新的value值。

image.png

3.HashMap集合类成员

    /**
     * 默认的初始容量-必须是2的n次幂。如果不指定默认为16
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * 最大的容量必须是2的幂<= 1<<30，也就是2^30次幂
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * 默认的加载因子0.75（用于扩容计算）
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    
    /**
     * Because TreeNodes are about twice the size of regular nodes, we
     * use them only when bins contain enough nodes to warrant use
     * (see TREEIFY_THRESHOLD). And when they become too small (due to
     * removal or resizing) they are converted back to plain bins.  In
     * usages with well-distributed user hashCodes, tree bins are
     * rarely used.  Ideally, under random hashCodes, the frequency of
     * nodes in bins follows a Poisson distribution
     * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
     * parameter of about 0.5 on average for the default resizing
     * threshold of 0.75, although with a large variance because of
     * resizing granularity. Ignoring variance, the expected
     * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
     * factorial(k)). The first values are:
     * 翻译：因为树节点的大小是普通节点的两倍，我们仅当垃圾箱（数组角标）包含足够的节点时才使用
     * (见TREEIFY_THRESHOLD)。当它们变得太小(由于移除或调整大小)它们被转换回普通箱子。在
     * 使用分布良好的用户hashCodes, tree bins很少使用。在理想情况下，在随机hashCodes下，
     * 频率为bin中的节点遵循泊松分布
     * 0:    0.60653066
     * 1:    0.30326533
     * 2:    0.07581633
     * 3:    0.01263606
     * 4:    0.00157952
     * 5:    0.00015795
     * 6:    0.00001316
     * 7:    0.00000094
     * 8:    0.00000006    能达到8个节点的概率
     *
     * 红黑树的最大阈值，当链表节点数超过8时，并且数组长度要大于等于64，就会转为红黑树，他是根据泊松分布图
     * 由文档可知，转为红黑树的概率极低，其实设计者也不想让转为红黑树，毕竟树节点占用内存的大小是普通节点的两倍
     */
    static final int TREEIFY_THRESHOLD = 8;

    /**
     * 当红黑树长度小于6时，重新转为链表
     */
    static final int UNTREEIFY_THRESHOLD = 6;

    /**
     * 当节点数大于8并且数组长度大于等于64，会转为红黑树
     */
    static final int MIN_TREEIFY_CAPACITY = 64;
    
    /**
     * Hashmap中存储键值对的个数，不是table的长度
     */
    transient int size;

    /**
     * 用来记录Hashmap的修改次数，每次扩容和修改map结构的计数器
     */
    transient int modCount;


    /**
     * 用来扩容Hashmap容量大小下一个容量。临界值计算方式 =  map容量 *  负载因子
     * 也就是说当容量达到临界值，就会进行扩容
     */
    int threshold;

    /**
     * 负载因子
     */
    final float loadFactor;

4.HashMap构造方法分析：


    public HashMap() {
        // 设置负载因子，默认为0.75
        this.loadFactor = DEFAULT_LOAD_FACTOR;
    }
    
    //指定容量大小
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

    
    public HashMap(int initialCapacity, float loadFactor) {
        //判断初始化容量是否小于0
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        //判断初始化容量是否大于最大容量2^30次幂
        if (initialCapacity > MAXIMUM_CAPACITY)
            //超过最大容量，就将容量设置为最大容量
            initialCapacity = MAXIMUM_CAPACITY;
        //判断负载因子是否小于0 或者 负载因子是否是一个非数值
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        //指定负载因子
        this.loadFactor = loadFactor;

        //指定扩容阈值。（注意：这里还并没有 * 负载因子，最终会在put方法中进行赋值）
        this.threshold = tableSizeFor(initialCapacity);
    }
    
    //计算容量大小，当容量不是2的n次幂时，会将之设置为大于指定容量的最小2的次幂数。
    static final int tableSizeFor(int cap) {
        //为什么会先减去1呢，如果不减去1时，那么指定的值刚好为2的n次幂的话，计算的结果会是指定值的双倍。
        //比如：指定的8，不减去1，计算结果就是16
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

5.HashMap的put方法：


    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

    
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;

        //默认数组长度tab为null，
        if ((tab = table) == null || (n = tab.length) == 0)
            //默认情况都未0，先扩容大小
            n = (tab = resize()).length;
        //（n - 1) & hash通过数组长度-1&hash值获取数组索引值，如果当前位置为空
        if ((p = tab[i = (n - 1) & hash]) == null)
            //数组索引位置，创建一个Node节点，并赋值
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            //如果数组位置有值，对比元素的hash值，key值是否相等或者key的equals值是否相等。如果都相等，直接替换
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            //判断P是否是树节点
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            //表示链表上可以添加节点
            else {
                //遍历链表
                for (int binCount = 0; ; ++binCount) {
                    //如果链表为空，直接添加
                    if ((e = p.next) == null) {
                        //表示没有链表，创建链表并赋值
                        p.next = newNode(hash, key, value, null);
                        //判断链表节点长度 >= 7，如果大于，进行判断数组长度是否超过64，未超过进行扩容，否则转换为红黑树
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    //对比hash值，和key的equals方法内容是否相等，相等直接赋值，否则添加节点
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            //当key相同时，进行value值替换，并返回老的value值
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        //存储数组的修改次数，和扩容次数
        ++modCount;
        //如果数组元素个数大于扩容边界值，进行扩容。
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }


   /**
    * 扩容方法
    */
   final Node<K,V>[] resize() {
        //table为数组长度，默认为null
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        //默认请款都为0
        if (oldCap > 0) {
            //当oldCap是否大于最大值2^30
            if (oldCap >= MAXIMUM_CAPACITY) {
                //扩容边界值，也为最大。也就是说不能在进行扩容了
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            //新的长度 = 老的数组长度左移1位，也就是老的长度的2倍大小。
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                //新的扩容边界值 = 为老的边界值左移1位，也是老边界值得2倍，比如12变为24
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        //首次默认会进入，进行扩容
        else {               // zero initial threshold signifies using defaults
            //默认数组长度为16
            newCap = DEFAULT_INITIAL_CAPACITY;
            //扩展边界值为数组长度 16 *  负载因子0.75
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        //将扩展边界值赋值
        threshold = newThr;
        //创建一个数组，长度为扩容的长度。
        @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        //赋值给table
        table = newTab;
        
        if (oldTab != null) {
            //下边是遍历数组,重新赋值数据到扩容后的数组上。
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

   /**
    * 该方法判断是否需要进行扩容，如果数组长度大于等于就进行扩容，否则就转换为红黑树。
    */
   final void treeifyBin(Node<K,V>[] tab, int hash) {
        int n, index; Node<K,V> e;
        //如果tab不等于空，并且<64,此时就进行扩容
        if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
            resize();
        //否则该索引转换为红黑树
        else if ((e = tab[index = (n - 1) & hash]) != null) {
            TreeNode<K,V> hd = null, tl = null;
            do {
                TreeNode<K,V> p = replacementTreeNode(e, null);
                if (tl == null)
                    hd = p;
                else {
                    p.prev = tl;
                    tl.next = p;
                }
                tl = p;
            } while ((e = e.next) != null);
            if ((tab[index] = hd) != null)
                //红黑树的转换方法
                hd.treeify(tab);
        }
    }

4.面试问题：

4.1.为什么HashMap的长度必须是2的n次幂？如果不是2的n次幂，比如为10会怎样？

答：为了防止哈希碰撞，提高性能。所以HashMap数组的长度一定是2的n次幂，如果不是2的n次幂，会通过右移，按位或运算将长度变为指定容量大的最小的2的n次幂。

   数组索引 = hash值&(length - 1)
   例如：长度为8的时候， 4&(8-1)=4  2&(8-1)=2
       hashCode值：  4     00000100
       length-1      7     00000111
       --------------------------------
       索引值：            00000100   4
    
       hashCode值：  2     00000010
       length-1      7     00000111
       --------------------------------
       索引值：            00000010   2
      总结：不容易产生哈希碰撞。
    
    例如：长度为10的时候
       hashCode值：  4     00000100
       length-1      9     00001001
       --------------------------------
       索引值：            00000000   0
    
       hashCode值：  2     00000010
       length-1      9     00001001
       --------------------------------
       索引值：            00000000   0
      总结：hash值相等，就会产生哈希碰撞。
    
    //会通过我们指定的值的无符号右移，与运算计算出长度。
    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

4.2.为什么指定容量计算时先减1后，才进行右移和与运算？

答：当我们不减1直接进行运算是，如果我们指定的值刚好为2的n次幂，那么计算出来的容量就会为指定的2倍。

4.3.HashMap中hash函数是怎么实现的？还有那些hash函数的实现方式？

答：通过源码可知，他是通过元素的key的hashCode值进行无符号向右移16位。我们也可以直接通过对key的hashCode值进行取余（%）。key.hashCode()%16，只是该方式效率不如无符号向右移动高。

   static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

4.4.何时会发生哈希碰撞？

答：当通过元素key的hashCode值进行位计算后，如果hash值相等，就会产生哈希碰撞，在JDK8之前采用链表解决，JDK8之后采用链表+红黑树解决。

4.5.引用红黑树的目的是什么？

答：引用红黑树的目的就是为了提高查找效率，通过文档说明，红黑树的节点占用内存大小是普通节点的2倍，都有好有坏。

4.6.当两个元素的hashCode值相等会怎样？

答：如果hashCode值相等，那么就会获取key的equals方法，获取内容信息，对比两个元素内容信息是否相同，如果相同，那么就直接覆盖，赋值新的Value值，如果不同，那么直接添加在链表上。

4.7.负载因子为什么是0.75呢，为什么不是0.3，0.9呢？

答：数值大的话，数组元素存储较多，发生哈希冲撞的几率增大，查找元素效率低。数值小的话，数组元素存储的个数较少，就会产生扩容，造成空间的浪费。

比如：如果为0.3，16 * 0.3 = 4.8，也就是说存储4个元素就会进行扩容，还有12个数组空间被浪费，太小了不好。如果是0.9，16 * 0.9 = 14.4，也就是说存储14个元素才进行扩容，基本块存储满了，这样会出现哈希冲突的几率非常大，所以说为我们选用了一个最优值0.75,不建议指定该值。

。