源码的魅力 - HashMap 的工作原理

2017-09-14 本文已影响2060人 Nichool

HashMap 的工作原理（Android7.1源码）

其他相关文章

简介

HashMap 是以散列表形式实现的Map (Key-Value)

初始化

   ...
   //保存数据的散列表
   transient HashMapEntry<K，V>[] table = (HashMapEntry<K，V>[]) EMPTY_TABLE;
   ...
   //实际存在的size个数
   transient int size;
   ...
   //table扩展的阈值
   int threshold;
   //HashMap构造函数中并没有对table分配空间 而是使用EMPTY_TABLE
   public HashMap(int initialCapacity， float loadFactor) {
           if (initialCapacity < 0)
               throw new IllegalArgumentException("Illegal initial capacity: " +
                                                  initialCapacity);
           if (initialCapacity > MAXIMUM_CAPACITY) {
               initialCapacity = MAXIMUM_CAPACITY;
           } else if (initialCapacity < DEFAULT_INITIAL_CAPACITY) {
               initialCapacity = DEFAULT_INITIAL_CAPACITY;
           }

           if (loadFactor <= 0 || Float.isNaN(loadFactor))
               throw new IllegalArgumentException("Illegal load factor: " +
                                                  loadFactor);
           // Android-Note: We always use the default load factor of 0.75f.

           // This might appear wrong but it's just awkward design. We always call
           // inflateTable() when table == EMPTY_TABLE. That method will take "threshold"
           // to mean "capacity" and then replace it with the real threshold (i.e， multiplied with
           // the load factor).
           //注释的意思是在当table为空(也就是当前，刚创建的HashMap就是一个空列表)时inflateTable中会对table散列表进行分配空间
           threshold = initialCapacity;
           //空实现
           init();
       }

新创建的HashMap并没有对table散列表分配内存空间，在后面的put方法中我们将分析具体分配空间的位置以及函数.散列表的存储元素是HashMapEntry。

    /** @hide */  // Android added.
    static class HashMapEntry<K，V> implements Map.Entry<K，V> {
        final K key;
        V value;
        HashMapEntry<K，V> next;
        int hash;

   }

除了Key与Value值之外还有HashMapEntry的引用，这里先简单介绍下这个next值，它链接的对象将会是一个链表的Head或者红黑树的Head，它就是解决HashMap冲突的方法之一 - 链接法。

put 方法

  public V put(K key， V value) {
      if (table == EMPTY_TABLE) {
          inflateTable(threshold);
      }
      if (key == null)
          return putForNullKey(value);
      int hash = sun.misc.Hashing.singleWordWangJenkinsHash(key);
      int i = indexFor(hash， table.length);
      for (HashMapEntry<K，V> e = table[i]; e != null; e = e.next) {
          Object k;
          if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
              V oldValue = e.value;
              e.value = value;
              e.recordAccess(this);
              return oldValue;
          }
      }

      modCount++;
      addEntry(hash， key， value， i);
      return null;
  }

当table为空时，第一次使用put方法时会触发这个table散列表的初始化。
当key是空时，将会插入value，并返回老的数据

通过singleWordWangJenkinsHash方法来获取HashCode.

    ...
    public static int singleWordWangJenkinsHash(Object k) {
        int h = k.hashCode();
        h += (h <<  15) ^ 0xffffcd7d;
        h ^= (h >>> 10);
        h += (h <<   3);
        h ^= (h >>>  6);
        h += (h <<   2) + (h << 14);
        return h ^ (h >>> 16);
    }

实质是通过key的hashCode()，然后再处理得到hash值，这个hash的值很重要。

通过indexFor方法计算得出当前数据在table散列表的索引位置

static int indexFor(int h， int length) {
    // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
    return h & (length-1);
}

依旧是通过'与'运算来高效计算出索引值，由于length永远是2的倍数或者是0，所以在这里位运算提高了速度（通过与上length - 1可以快速计算出index）。
一般HashMap的平均查找数据时间复杂度(1)，这一个优点主要得益于这个Hashcode的计算，为了更低的冲突率，在前面的singleWordWangJenkinsHash函数中最后一步的h ^ (h >>> 16)，就是将h的高16与低16位异或操作，让hash值尽力不会出现部分位数相同的情况，让indexFor计算更加平均，每一个值对应一个index，减少冲突率。

通过获得的index在散列表指定位置找到HashMapEntry，由于HashMap是使用链接法来解决冲突的，所以如果出现冲突(也就是不同的key得到的index相同)，通过上面我们讲的next值向下查找如果找到一样的数据，则替换并返回，如果不存在则在此处添加数据。

void addEntry(int hash， K key， V value， int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? sun.misc.Hashing.singleWordWangJenkinsHash(key) : 0;
        bucketIndex = indexFor(hash， table.length);
    }

    createEntry(hash， key， value， bucketIndex);
}
//扩充table散列表
void resize(int newCapacity) {
    HashMapEntry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    HashMapEntry[] newTable = new HashMapEntry[newCapacity];
    transfer(newTable);
    table = newTable;
    threshold = (int)Math.min(newCapacity * loadFactor， MAXIMUM_CAPACITY + 1);
}
//将老列表中的数据插入到新数据表中
void transfer(HashMapEntry[] newTable) {
     int newCapacity = newTable.length;
     for (HashMapEntry<K，V> e : table) {
         //此处e代表table中的Entry
         while(null != e) {
             //这个while循环是如果Entry含有next值，将会顺着next向下查找
             HashMapEntry<K，V> next = e.next;
             //计算在新table中的index
             int i = indexFor(e.hash， newCapacity);
             //将当前Entry拷贝到新位置前如果那个位置存在数据
             //则保存到Entry的next中
             e.next = newTable[i];
             //移到新位置
             newTable[i] = e;
             e = next;
         }
     }
}

调用addEntry方法
- 当数据数量达到阈值则要扩展成原先的两倍
- 在resize函数中，当列表的大小已经是最大值，设置阈值为integer的最大值，不再扩展
- 生成一个新的表，然后执行transfer将老表中的数据转换到新表中去。
- 在transfer函数中，先遍历老表table，找出已经有数据的Entry，重新通过indexFor计算在新表中的index，将原先的entry移到新位置，如果原先数据中存在next值则继续顺着next进行移动数据。transfer函数不仅仅是扩展散列表大小那么简单，通过transfer这一步可以将原先已经存在的冲突均匀分散开，这一步可以提高当前HashMap的获取数据的速度，重点就在indexFor方法中的与操作，待会我将来分析为何起到这个作用
- transfer完数据后，更新阈值.
- 结束了resize方法后，重新计算bucketIndex，然后通过createEntry来插入数据.

indexFor 的神奇作用

//计算公式
h & (length-1);

//假设一个Key的hashCode是  0000 0000 0000 0000 0000 0001 1111 1111
//另一个Key的hashCode是    0000 0000 0000 0000 0000 0000 1111 1111
//length正好是256 也就是2的八次方                    0001 0000 0000
//length - 1等于                                   0000 1111 1111
//执行与运算
//两个index一样 index1 = index2 = 255;
//当resize()函数执行翻倍时
//length正好是512 也就是2的九次方                    0010 0000 0000
//length - 1等于                                   0001 1111 1111
//执行与运算
//index1 等于                                      0001 1111 1111
//index2 等于                                      0000 1111 1111
//两者不相等了

通过上面的注释的计算介绍，可以很清晰的看到原本冲突的两个key，通过扩充后，并且只要一个indexFor的函数，执行相与操作就可以将冲突完美化解。

get 方法

  public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K，V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }


    final Entry<K，V> getEntry(Object key) {
          if (size == 0) {
              return null;
          }
        int hash = (key == null) ? 0 : sun.misc.Hashing.singleWordWangJenkinsHash(key);
        for (HashMapEntry<K，V> e = table[indexFor(hash， table.length)]; e != null; e = e.next) {
              Object k;
              if (e.hash == hash &&
                  ((k = e.key) == key || (key != null && key.equals(k))))
                  return e;
          }
          return null;
      }

get方法很简单，就是通过key直接找数据，第一步获取hash值，通过indexFor获取数据在数组的位置，然后遍历，如果没有冲突的话，直接可以获取到数据并退出遍历，存在冲突就需要在next的链表中查找。

冲突的坏处

这个标题段也许是多余的吧，希望不明白的人可以知道下吧。
HashMap的高效率依靠的就是通过HashCode散列式插入到表的不同位置，当不存在冲突的时候，get()查找可以是(1)的时间复杂度，直接就可以取到数据，如果存在冲突就必须沿着next的链表一个一个查询比对，效率大大降低。

题外话

Java8中的HashMap源码中，在解决冲突部分，使用了红黑树与链表替换使用的方式来管理冲突的数据，提高冲突时的get(object)搜索速度，当冲突数据少时用链表，大时使用红黑树。
总是HashMap是出了名的用空间换时间的数据结构，也是常用的数据结构，但是内存使用率低是它致命的弱点，为此Android有一个ArrayMap数据结构在一定程度上来替代它，下面的章节中我将分析ArrayMap这个数据结构，讲解什么时候使用ArrayMap什么时候使用HashMap。

更多好文章请关注微信公众号【Android技术栈】，微信公众号刚刚创建不久希望大家多多支持。

Android技术栈

源码的魅力 - HashMap 的工作原理

HashMap 的工作原理（Android7.1源码）

猜你喜欢

热点阅读