重写equals 方法一定要重写hashcode 吗？

2019-01-06 本文已影响120人 zh_harry

equals & hashCode 方法

java 是oop的，而Object 对象是所有java对象的父类，其中equals 和hashCode就是这个对象的两个方法，可见其的重要性。

hashCode到底是什么？

在《白话 Synchronized》https://www.jianshu.com/p/dd87f60ff37c 一文中，笔者提到过hashcode 其本质是一个随机数。

hash contains the identity hash value: largest value is
//    31 bits, see os::random()

我们查看os::random 方法发现

long os::random() {
  /* standard, well-known linear congruential random gener�ator with
   * next_rand = (16807*seed) mod (2**31-1)
   * see
   * (1) "Random Number Generators: Good Ones Are Hard to Find",
   *      S.K. Park and K.W. Miller, Communications of the ACM 31:10 (Oct 1988),
   * (2) "Two Fast Implementations of the 'Minimal Standard' Random
   *     Number Generator", David G. Carta, Comm. ACM 33, 1 (Jan 1990), pp. 87-88.
  */

LCG 随机算法: https://en.wikipedia.org/wiki/Linear_congruential_generator

hashCode() 的取值

JNIEXPORT jint JNICALL
Java_java_lang_System_identityHashCode(JNIEnv *env, jobject this, jobject x)
{
    return JVM_IHashCode(env, x);
}


JVM_ENTRY(jint, JVM_IHashCode(JNIEnv* env, jobject handle))
  JVMWrapper("JVM_IHashCode");
  // as implemented in the classic virtual machine; return 0 if object is NULL
  return handle == NULL ? 0 : ObjectSynchronizer::FastHashCode (THREAD, JNIHandles::resolve_non_null(handle)) ;
JVM_END

intptr_t ObjectSynchronizer::FastHashCode (Thread * Self, oop obj) {
//偏向锁
  if (UseBiasedLocking) {
    // NOTE: many places throughout the JVM do not expect a safepoint
    // to be taken here, in particular most operations on perm gen
    // objects. However, we only ever bias Java instances and all of
    // the call sites of identity_hash that might revoke biases have
    // been checked to make sure they can handle a safepoint. The
    // added check of the bias pattern is to avoid useless calls to
    // thread-local storage.
    if (obj->mark()->has_bias_pattern()) {
      // Box and unbox the raw reference just in case we cause a STW safepoint.
      Handle hobj (Self, obj) ;
      // Relaxing assertion for bug 6320749.
      assert (Universe::verify_in_progress() ||
              !SafepointSynchronize::is_at_safepoint(),
             "biases should not be seen by VM thread here");
      BiasedLocking::revoke_and_rebias(hobj, false, JavaThread::current());
      obj = hobj() ;
      assert(!obj->mark()->has_bias_pattern(), "biases should be revoked by now");
    }
  }

  // hashCode() is a heap mutator ...
  // Relaxing assertion for bug 6320749.
  assert (Universe::verify_in_progress() ||
          !SafepointSynchronize::is_at_safepoint(), "invariant") ;
  assert (Universe::verify_in_progress() ||
          Self->is_Java_thread() , "invariant") ;
  assert (Universe::verify_in_progress() ||
         ((JavaThread *)Self)->thread_state() != _thread_blocked, "invariant") ;

  ObjectMonitor* monitor = NULL;
  markOop temp, test;
  intptr_t hash;
  markOop mark = ReadStableMark (obj);

  // object should remain ineligible for biased locking
  assert (!mark->has_bias_pattern(), "invariant") ;

  if (mark->is_neutral()) {
    hash = mark->hash();              // this is a normal header
    //正常对象
    if (hash) {                       // if it has hash, just return it
      return hash;
    }
    hash = get_next_hash(Self, obj);  // allocate a new hash code
    temp = mark->copy_set_hash(hash); // merge the hash code into header
    // use (machine word version) atomic operation to install the hash
    test = (markOop) Atomic::cmpxchg_ptr(temp, obj->mark_addr(), mark);
    if (test == mark) {
      return hash;
    }
    // If atomic operation failed, we must inflate the header
    // into heavy weight monitor. We could add more code here
    // for fast path, but it does not worth the complexity.
    //重量级锁
  } else if (mark->has_monitor()) {
    monitor = mark->monitor();
    temp = monitor->header();
    assert (temp->is_neutral(), "invariant") ;
    hash = temp->hash();
    if (hash) {
      return hash;
    }
    // Skip to the following code to reduce code size
  } else if (Self->is_lock_owned((address)mark->locker())) {
 //轻量级锁
    temp = mark->displaced_mark_helper(); // this is a lightweight monitor owned
    assert (temp->is_neutral(), "invariant") ;
    hash = temp->hash();              // by current thread, check if the displaced
    if (hash) {                       // header contains hash code
      return hash;
    }
    // WARNING:
    //   The displaced header is strictly immutable.
    // It can NOT be changed in ANY cases. So we have
    // to inflate the header into heavyweight monitor
    // even the current thread owns the lock. The reason
    // is the BasicLock (stack slot) will be asynchronously
    // read by other threads during the inflate() function.
    // Any change to stack may not propagate to other threads
    // correctly.
  }

  // Inflate the monitor to set hash code
  monitor = ObjectSynchronizer::inflate(Self, obj);
  // Load displaced header and check it has hash code
  mark = monitor->header();
  assert (mark->is_neutral(), "invariant") ;
  hash = mark->hash();
  if (hash == 0) {
    hash = get_next_hash(Self, obj);
    temp = mark->copy_set_hash(hash); // merge hash code into header
    assert (temp->is_neutral(), "invariant") ;
    test = (markOop) Atomic::cmpxchg_ptr(temp, monitor, mark);
    if (test != mark) {
      // The only update to the header in the monitor (outside GC)
      // is install the hash code. If someone add new usage of
      // displaced header, please update this code
      hash = test->hash();
      assert (test->is_neutral(), "invariant") ;
      assert (hash != 0, "Trivial unexpected object/monitor header usage.");
    }
  }
  // We finally get the hash
  return hash;
}

因为对象的word - mark 会随着锁升级，存储位置会发生改变，具体改变规则见笔者上一博文《白话 Synchronized》，所以通过以上代码我们发现获取hashcode 时也需要判断不同的锁级别。

hashcode 方法的官方文档

 /**
     * Returns a hash code value for the object. This method is
     * supported for the benefit of hash tables such as those provided by
     * 返回object 的hash code,这个方法为了支持hash table 的益处，比如象hashmap 所提供的。
     * {@link java.util.HashMap}.
     * <p>
     * The general contract of {@code hashCode} is:
     *  hash code 的常规合约是
     * <ul>
     * <li>Whenever it is invoked on the same object more than once during
     *     an execution of a Java application,
     *
     *     在一个应用程序执行期间(进程),同一个对象，任意时间，不管被执行多少次,
     *
     *     the {@code hashCode} method
     *     must consistently return the same integer,
     *
     *     hashCode 方法必须一致地返回相同的integer.
     *
     *     (provided 引导条件状语从句)
     *     provided no information
     *     used in {@code equals} comparisons on the object is modified.
     *
     *     前提是参与对象比较的信息没有被修改。
     *
     *     This integer need not remain consistent from one execution of an
     *     application to another execution of the same application.
     *
     *     同一个应用程序的不同进程，不需要保持hashCode的一致.
     *
     * <li>If two objects are equal according to the {@code equals(Object)}
     *     method, then calling the {@code hashCode} method on each of
     *     the two objects must produce the same integer result.
     *
     *     如果两个对象通过equals 方法比较后是"true"(是相等的),那么这两个对象必须生成相同的integer 结果。
     *
     * <li>It is <em>not</em> required that if two objects are unequal
     *     according to the {@link java.lang.Object#equals(java.lang.Object)}
     *     method, then calling the {@code hashCode} method on each of the
     *     two objects must produce distinct integer results.  However, the
     *     programmer should be aware that producing distinct integer results
     *     for unequal objects may improve the performance of hash tables.
     *     如果两个对象通过"equals"方法的结果是不相等，那么两个不同的对象不必生成相同的integer结果。
     *     然后作为程序员应该知道为不相等的对象生成不同的integer 结果会提升hash tables 的性能。
     * </ul>
     * <p>
     * As much as is reasonably practical, the hashCode method defined by
     * class {@code Object} does return distinct integers for distinct
     * objects. (This is typically implemented by converting the internal
     * address of the object into an integer, but this implementation
     * technique is not required by the
     * Java&trade; programming language.)
     *
     * 尽可能的合理实践，"Object" 对象的hashcode 方法的实现为不同的对象返回了不同的integer。
     * （典型的实现是将对象的内部地址转换为integer,但是这种实现技术不需要java 语言）
     *
     * @return  a hash code value for this object.
     * @see     java.lang.Object#equals(java.lang.Object)
     * @see     java.lang.System#identityHashCode
     */
    public native int hashCode();

从官方文档读到以下信息

hashCode 用途: 对hash table 有好处
jdk 对hashCode的三个规约(见上文翻译)
hashCode vs 地址
1. hashCode 并非对象地址
2. hashCode 和地址之间并无函数关系
3. hashCode 和地址之间只是映射关系（由c实现）
  4.实际上不同对象的hashCode 有可能相同

package com.sparrow.jdk.os;

import java.util.HashSet;
import java.util.Set;

/**
 * @author by harry
 */
public class RandomTest {
    public static void main(String[] args) {
        //return distinct integers for distinct objects
        Set<Integer> s = new HashSet<>();
        int i = 0;
        while (true) {
            Object o = new Object();
            if (!s.contains(o.hashCode())) {
                s.add(o.hashCode());
            } else {
                System.out.println(o.hashCode());
            }
        }
    }
}

equals 方法

 * The {@code equals} method for class {@code Object} implements
     * the most discriminating possible equivalence relation on objects;
     * Object 的equals 的实现方法最具识别能力；
     * that is, for any non-null reference values {@code x} and
     * {@code y}, this method returns {@code true} if and only
     * if {@code x} and {@code y} refer to the same object
     * ({@code x == y} has the value {@code true}).
     * 对于引用的值x 和y，当且仅当x和y引用同一个对象(x==y=true)时，x.eques(y)的结果才为true
     * <p>
     * Note that it is generally necessary to override the {@code hashCode}
     * method whenever this method is overridden, so as to maintain the
     * general contract for the {@code hashCode} method, which states
     * that equal objects must have equal hash codes.
     *
     * 注意：当这个方法被重写时通常需要重写 hashCode 方法，为保持 hashCode 方法的常规合约 即如果两个对象相等，则hash code 一定相同。
     *
     * @param   obj   the reference object with which to compare.
     * @return  {@code true} if this object is the same as the obj
     *          argument; {@code false} otherwise.
     * @see     #hashCode()
     * @see     java.util.HashMap
     */
    public boolean equals(Object obj) {
        return (this == obj);
    }

equals的几个性质本文没有列出,重点看一下Object 方法的具体实现和注意：

 Note that it is generally necessary to override the {@code hashCode}
     * method whenever this method is overridden, so as to maintain the
     * general contract for the {@code hashCode} method, which states
     * that equal objects must have equal hash codes.

笔者理解：这里的提示并非强制，而是建议我们在重写equals方法时重写hashcode。

equals 和hashcode 的关系

通过以上分析，如果只是为了重写equals 方法而重写hashcode，那么两者之间并无直接关系。
那么我们思考，为什么jdk这样提示？
为什么需要hashcode 要满足三大规约？
从hascode的官方文档注释可知，hashcode 为hash table 服务。jdk 中的hash 实现包括hash set ,hash map ,hash table 及concurrent hash map 等。
而hashcode 有利于提高hash table 的性能。为了更进一步分析hash code 与hash table 的关系，我们简单分析下hash map的代码 jdk 1.8

hash map

hash-map.jpg

 /**
     * The default initial capacity - MUST be a power of two.
     * 默认的初始化容量 一定是2的n次幂 下文会解释为什么？
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
    * 最大容量
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * The load factor used when none specified in constructor.
     *负载因子
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * The bin count threshold for using a tree rather than list for a
     * bin.  Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     * 升级红黑树的阀值
     */
    static final int TREEIFY_THRESHOLD = 8;

    /**
     * The bin count threshold for untreeifying a (split) bin during a
     * resize operation. Should be less than TREEIFY_THRESHOLD, and at
     * most 6 to mesh with shrinkage detection under removal.
    * 降为链表的阀值
     */
    static final int UNTREEIFY_THRESHOLD = 6;

    /**
     * The smallest table capacity for which bins may be treeified.
     * (Otherwise the table is resized if too many nodes in a bin.)
     * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
     * between resizing and treeification thresholds.
    *  升级为红黑树的最小表容量阀值
     */
    static final int MIN_TREEIFY_CAPACITY = 64;

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;//p-->pointer
       //表为空
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
       //表的糟为空 为什么？
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k; e-->exist node
            //这里是本文重点，如何判断是否存在
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                //
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        //default is 8 
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
//这里可能会升级红黑树，但不一定，需要进一步判断
                            treeifyBin(tab, hash);

                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;

        //超过初始容量
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }


 /**
     * Replaces all linked nodes in bin at index for given hash unless
     * table is too small, in which case resizes instead.
     */
    final void treeifyBin(Node<K,V>[] tab, int hash) {
        int n, index; Node<K,V> e;
        //default 64  没过超最小表容量，则不升级红黑树，只是resize
        if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
            resize();
        else if ((e = tab[index = (n - 1) & hash]) != null) {
            TreeNode<K,V> hd = null, tl = null;
   //升级红黑树
            do {
                TreeNode<K,V> p = replacementTreeNode(e, null);
                if (tl == null)
                    hd = p;
                else {
                    p.prev = tl;
                    tl.next = p;
                }
                tl = p;
            } while ((e = e.next) != null);
            if ((tab[index] = hd) != null)
                hd.treeify(tab);
        }
    }

hash code equals 在hash map的作用

hash map 是通过key 的hashcode 映射到hash map中,所以要保证(这里参见hash code 的三大规约), 如果对象内容相同，其hashcode一定相同，而object 的hashcode 是随机生成（大部分情况下不同），所以这里必须重写，否则hash map 中会存在内容相同的对象。
get put 方法排重时要先判断hash code 是否相同再判断equals方法是否相同，因为这样提高查询的性能。因为hashcode 不同，对象一定不同。

hash map 的容量为什么要求是2 的n次幂

if ((p = tab[i = (n - 1) & hash]) == null)

这里n为hash map 的容量(2的n次幂) hash是当前key的hash code.
通过以上公式可以通过按位与的方式直接获取当前key 所在的糟，性能会更好。
因为2 的n次幂的数都有一个特点，就是二进制数只有一位为1,而减1后的结果是除为1的那一位，其余低于该位全为1,所以取余运算就可以转换成与该数进行按位与即可。
举个例子
如果n=16 2的4次幂 (16-1)二进制为1111 全为1
而hash code 假如为19 二进制为10011 按位与的结果为011即为3 19%16=3
同样的代码在netty中也同样的实现

DefaultEventExecutorChooserFactory

private static final class PowerOfTwoEventExecutorChooser implements EventExecutorChooser {
        private final AtomicInteger idx = new AtomicInteger();
        private final EventExecutor[] executors;

        PowerOfTwoEventExecutorChooser(EventExecutor[] executors) {
            this.executors = executors;
        }

        @Override
        public EventExecutor next() {
            return executors[idx.getAndIncrement() & executors.length - 1];
        }
    }