JDK1.7 HashMap源码解读
之前分析过JDK8的源码,对于共同的内容就不再赘述。
首先看hashmap的继承关系:
可以看出与JDK8变化不大。
那我们从头开始读吧。
/**
* holds values which can't be initialized until after VM is booted.
*/
private static class Holder {
/**
* Table capacity above which to switch to use alternative hashing.
*/
static final int ALTERNATIVE_HASHING_THRESHOLD;
static {
String altThreshold = java.security.AccessController.doPrivileged(
new sun.security.action.GetPropertyAction(
"jdk.map.althashing.threshold"));
int threshold;
try {
threshold = (null != altThreshold)
? Integer.parseInt(altThreshold)
: ALTERNATIVE_HASHING_THRESHOLD_DEFAULT;
// disable alternative hashing if -1
if (threshold == -1) {
threshold = Integer.MAX_VALUE;
}
if (threshold < 0) {
throw new IllegalArgumentException("value must be positive integer.");
}
} catch(IllegalArgumentException failed) {
throw new Error("Illegal value for 'jdk.map.althashing.threshold'", failed);
}
ALTERNATIVE_HASHING_THRESHOLD = threshold;
}
}
先看这段源码,这段是JDK7内的第一段代码.作为静态内部类,这个类的作用只有一个,那就是初始化ALTERNATIVE_HASHING_THRESHOLD值。看其过程可知,是为了解析出初始的临界值,超出这个值map则会再hash扩大。并且在调用的地方也写明了,只会在jvm启动之后才会调用。对于这段代码stackoverflow还有一个提问:http://stackoverflow.com/questions/29918624/what-is-the-use-of-holder-class-in-hashmap
看一下hashmap最重要的一个构造方法:
/**
* Constructs an empty <tt>HashMap</tt> with the specified initial
* capacity and load factor.
*
* @param initialCapacity the initial capacity
* @param loadFactor the load factor
* @throws IllegalArgumentException if the initial capacity is negative
* or the load factor is nonpositive
*/
public HashMap(int initialCapacity, float loadFactor) {
if (initialCapacity < 0)
throw new IllegalArgumentException("Illegal initial capacity: " +
initialCapacity);
if (initialCapacity > MAXIMUM_CAPACITY)
initialCapacity = MAXIMUM_CAPACITY;
if (loadFactor <= 0 || Float.isNaN(loadFactor))
throw new IllegalArgumentException("Illegal load factor: " +
loadFactor);
this.loadFactor = loadFactor;
threshold = initialCapacity;
init();
}
这个构造方法就是为了给负载因子和临界值赋值。对于我们一般常用的new HashMap();的初始是多少呢?
/**
* Constructs an empty <tt>HashMap</tt> with the default initial capacity
* (16) and the default load factor (0.75).
*/
public HashMap() {
this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}
注释中写的很清楚了。
这里还有一个init方法,看源码可以看出是一个空方法,那么是不是有点奇怪呢?其实这个是在hashmap中是用不着的,但是在linkedHashMap中会覆写这个方法。和jdk8有点类似。
看一下对于定位元素的核心2个方法:
/**
* Retrieve object hash code and applies a supplemental hash function to the
* result hash, which defends against poor quality hash functions. This is
* critical because HashMap uses power-of-two length hash tables, that
* otherwise encounter collisions for hashCodes that do not differ
* in lower bits. Note: Null keys always map to hash 0, thus index 0.
*/
final int hash(Object k) {
int h = hashSeed;
if (0 != h && k instanceof String) {
return sun.misc.Hashing.stringHash32((String) k);
}
h ^= k.hashCode();
// This function ensures that hashCodes that differ only by
// constant multiples at each bit position have a bounded
// number of collisions (approximately 8 at default load factor).
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
}
/**
* Returns index for hash code h.
*/
static int indexFor(int h, int length) {
// assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
return h & (length-1);
}
先说indexFor,看注释:length must be a non-zero power of 2,意思是length必定是被2除尽,也就是这里的length肯定是2的幂次方,好处可以自行查询。这个indexFor的含义其实是定位出这个hash值位于HashMap的哪个bucket中,等同于h%length。
对于hash方法,这里引入了hashSeed,为了降低hash冲突的概率。若入参为String且hashSeed不为0,即需要动态生成新的hash值。(加入这段分支代码,是为了解决jdk7u40以下的一个bug,有兴趣可以单独了解)
否则的话会走下面的分支。下面这段分支的含义是为了解决hash冲突而做的二次hash值,即将高位的hash值与低位进行异或操作,降低冲突的概率。
看完了定位元素,那么我们再来看获取元素:
/**
* Returns the value to which the specified key is mapped,
* or {@code null} if this map contains no mapping for the key.
*
* <p>More formally, if this map contains a mapping from a key
* {@code k} to a value {@code v} such that {@code (key==null ? k==null :
* key.equals(k))}, then this method returns {@code v}; otherwise
* it returns {@code null}. (There can be at most one such mapping.)
*
* <p>A return value of {@code null} does not <i>necessarily</i>
* indicate that the map contains no mapping for the key; it's also
* possible that the map explicitly maps the key to {@code null}.
* The {@link #containsKey containsKey} operation may be used to
* distinguish these two cases.
*
* @see #put(Object, Object)
*/
public V get(Object key) {
if (key == null)
return getForNullKey();
Entry<K,V> entry = getEntry(key);
return null == entry ? null : entry.getValue();
}
/**
* Offloaded version of get() to look up null keys. Null keys map
* to index 0. This null case is split out into separate methods
* for the sake of performance in the two most commonly used
* operations (get and put), but incorporated with conditionals in
* others.
*/
private V getForNullKey() {
if (size == 0) {
return null;
}
for (Entry<K,V> e = table[0]; e != null; e = e.next) {
if (e.key == null)
return e.value;
}
return null;
}
/**
* Returns the entry associated with the specified key in the
* HashMap. Returns null if the HashMap contains no mapping
* for the key.
*/
final Entry<K,V> getEntry(Object key) {
if (size == 0) {
return null;
}
int hash = (key == null) ? 0 : hash(key);
for (Entry<K,V> e = table[indexFor(hash, table.length)];
e != null;
e = e.next) {
Object k;
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
return e;
}
return null;
}
取元素的代码分为3段,主要看最下面方法getEntry。
Entry<K,V> e = table[indexFor(hash, table.length)]这段是定位到该元素在hashmap是位于哪个bucket内。jdk中的hashmap是数组+链表的结构,会根据key的hash值映射到对应的数组元素内,但是这个数组元素不是存储一个对象,而是维护一个链表。每个数组元素被称之为一个bucket。在取的时候里面会判断是否为同一个key。判断的逻辑有2层,注意体会。
put方法,则更复杂一点。
/**
* Associates the specified value with the specified key in this map.
* If the map previously contained a mapping for the key, the old
* value is replaced.
*
* @param key key with which the specified value is to be associated
* @param value value to be associated with the specified key
* @return the previous value associated with <tt>key</tt>, or
* <tt>null</tt> if there was no mapping for <tt>key</tt>.
* (A <tt>null</tt> return can also indicate that the map
* previously associated <tt>null</tt> with <tt>key</tt>.)
*/
public V put(K key, V value) {
if (table == EMPTY_TABLE) {
inflateTable(threshold);
}
if (key == null)
return putForNullKey(value);
int hash = hash(key);
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
V oldValue = e.value;
e.value = value;
e.recordAccess(this);
return oldValue;
}
}
modCount++;
addEntry(hash, key, value, i);
return null;
}
/**
* Offloaded version of put for null keys
*/
private V putForNullKey(V value) {
for (Entry<K,V> e = table[0]; e != null; e = e.next) {
if (e.key == null) {
V oldValue = e.value;
e.value = value;
e.recordAccess(this);
return oldValue;
}
}
modCount++;
addEntry(0, null, value, 0);
return null;
}
/**
* Adds a new entry with the specified key, value and hash code to
* the specified bucket. It is the responsibility of this
* method to resize the table if appropriate.
*
* Subclass overrides this to alter the behavior of put method.
*/
void addEntry(int hash, K key, V value, int bucketIndex) {
if ((size >= threshold) && (null != table[bucketIndex])) {
resize(2 * table.length);
hash = (null != key) ? hash(key) : 0;
bucketIndex = indexFor(hash, table.length);
}
createEntry(hash, key, value, bucketIndex);
}
/**
* Like addEntry except that this version is used when creating entries
* as part of Map construction or "pseudo-construction" (cloning,
* deserialization). This version needn't worry about resizing the table.
*
* Subclass overrides this to alter the behavior of HashMap(Map),
* clone, and readObject.
*/
void createEntry(int hash, K key, V value, int bucketIndex) {
Entry<K,V> e = table[bucketIndex];
table[bucketIndex] = new Entry<>(hash, key, value, e);
size++;
}
inflateTable这个方法就不说了,主要是初始化table。在jdk8已废弃,我就没细看了。
put时会先判断put的key是否在bucket存在,若存在,则更新value,返回旧的value。否则的话,则会去新增一个bucket。
若map存储空间不够时(大于临界值threshold),则会进行扩容。看一下源码:
/**
* Rehashes the contents of this map into a new array with a
* larger capacity. This method is called automatically when the
* number of keys in this map reaches its threshold.
*
* If current capacity is MAXIMUM_CAPACITY, this method does not
* resize the map, but sets threshold to Integer.MAX_VALUE.
* This has the effect of preventing future calls.
*
* @param newCapacity the new capacity, MUST be a power of two;
* must be greater than current capacity unless current
* capacity is MAXIMUM_CAPACITY (in which case value
* is irrelevant).
*/
void resize(int newCapacity) {
Entry[] oldTable = table;
int oldCapacity = oldTable.length;
if (oldCapacity == MAXIMUM_CAPACITY) {
threshold = Integer.MAX_VALUE;
return;
}
Entry[] newTable = new Entry[newCapacity];
transfer(newTable, initHashSeedAsNeeded(newCapacity));
table = newTable;
threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}
/**
* Transfers all entries from current table to newTable.
*/
void transfer(Entry[] newTable, boolean rehash) {
int newCapacity = newTable.length;
for (Entry<K,V> e : table) {
while(null != e) {
Entry<K,V> next = e.next;
if (rehash) {
e.hash = null == e.key ? 0 : hash(e.key);
}
int i = indexFor(e.hash, newCapacity);
e.next = newTable[i];
newTable[i] = e;
e = next;
}
}
}
先判断容量是否已达最大值(1<<30),若已达系统默认最大值,则直接赋为Integer.MAX_Value.未达最大值,则会走下面的transfer逻辑。
transfer的逻辑是将原map各bucket的值,按照新的策略,重新放入各bucket内。但是由于HashMap是非线程安全的,在这一步可能会出现环,那么在get的时候就会出现oom异常。一个例子可以看这个:http://www.cnblogs.com/ITtangtang/p/3966467.html
移除元素的源码如下:
/**
* Removes the mapping for the specified key from this map if present.
*
* @param key key whose mapping is to be removed from the map
* @return the previous value associated with <tt>key</tt>, or
* <tt>null</tt> if there was no mapping for <tt>key</tt>.
* (A <tt>null</tt> return can also indicate that the map
* previously associated <tt>null</tt> with <tt>key</tt>.)
*/
public V remove(Object key) {
Entry<K,V> e = removeEntryForKey(key);
return (e == null ? null : e.value);
}
/**
* Removes and returns the entry associated with the specified key
* in the HashMap. Returns null if the HashMap contains no mapping
* for this key.
*/
final Entry<K,V> removeEntryForKey(Object key) {
if (size == 0) {
return null;
}
int hash = (key == null) ? 0 : hash(key);
int i = indexFor(hash, table.length);
Entry<K,V> prev = table[i];
Entry<K,V> e = prev;
while (e != null) {
Entry<K,V> next = e.next;
Object k;
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k)))) {
modCount++;
size--;
if (prev == e)
table[i] = next;
else
prev.next = next;
e.recordRemoval(this);
return e;
}
prev = e;
e = next;
}
return e;
}
核心也是在下面的removeEntryForKey方法上。步骤也是类似的,先定位到bucket,然后对这个bucket进行循环,找到该key对应的元素,将其指向下一个,当前元素没有引用了,会被GC回收掉。
最后看一下类图中继承的cloneable接口的实现:
/**
* Returns a shallow copy of this <tt>HashMap</tt> instance: the keys and
* values themselves are not cloned.
*
* @return a shallow copy of this map
*/
public Object clone() {
HashMap<K,V> result = null;
try {
result = (HashMap<K,V>)super.clone();
} catch (CloneNotSupportedException e) {
// assert false;
}
if (result.table != EMPTY_TABLE) {
result.inflateTable(Math.min(
(int) Math.min(
size * Math.min(1 / loadFactor, 4.0f),
// we have limits...
HashMap.MAXIMUM_CAPACITY),
table.length));
}
result.entrySet = null;
result.modCount = 0;
result.size = 0;
result.init();
result.putAllForCreate(this);
return result;
}
注释里已经声明了这个clone方法只是一个浅复制。一般是不推荐直接clone的。