Java的equals和hashCode方法

2019-04-27 本文已影响0人 siriusing

对于equals和hashCode 方法，Java建议在重写时应当满足规定：

如果两对象equals为true，hashCode也应该一致
重写equals就必须重写hashCode

在map中，关于equals和hashCode存在：

hashCode一致，equals不一致，对应同一个Node，这个Node上维护链表。
hashCode和equals都一致，认为是同一个对象
hashCoed 不一致，存在两个Node上。

String hashCode

对于String类， hash值就是对每个字符，关于31 做散列。String的父类是Object，因为Override了hashCode，所以也重写equals，逐个字符判断是否相等。

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {

    private final char value[];

    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }
    public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String)anObject;
            int n = value.length;
            if (n == anotherString.value.length) {
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = 0;
                while (n-- != 0) {
                    if (v1[i] != v2[i])
                        return false;
                    i++;
                }
                return true;
            }
        }
        return false;
    }

Object的hashCode

对于不同的对象Object.hashCode ，根据默认的算法，会根据对象地址返回不同的hashCode。

hashCode的实现，Java中约定：

一致性：当对象的属性没有发生改变时，无论何时调用，都应该返回相同的结果。
如果 a.equals(b)，那么a.hashCode()==b.hashCode()
如果 !a.equals(b),那么 a的hashCode 不需要一定不等于b.hashCode()，但是如果这样，可以提高性能。


    /**
     * Returns a hash code value for the object. This method is
     * supported for the benefit of hash tables such as those provided by
     * {@link java.util.HashMap}.
     * <p>
     * The general contract of {@code hashCode} is:
     * <ul>
     * <li>Whenever it is invoked on the same object more than once during
     *     an execution of a Java application, the {@code hashCode} method
     *     must consistently return the same integer, provided no information
     *     used in {@code equals} comparisons on the object is modified.
     *     This integer need not remain consistent from one execution of an
     *     application to another execution of the same application.
     * <li>If two objects are equal according to the {@code equals(Object)}
     *     method, then calling the {@code hashCode} method on each of
     *     the two objects must produce the same integer result.
     * <li>It is <em>not</em> required that if two objects are unequal
     *     according to the {@link java.lang.Object#equals(java.lang.Object)}
     *     method, then calling the {@code hashCode} method on each of the
     *     two objects must produce distinct integer results.  However, the
     *     programmer should be aware that producing distinct integer results
     *     for unequal objects may improve the performance of hash tables.
     * </ul>
     * <p>
     * As much as is reasonably practical, the hashCode method defined by
     * class {@code Object} does return distinct integers for distinct
     * objects. (This is typically implemented by converting the internal
     * address of the object into an integer, but this implementation
     * technique is not required by the
     * Java&trade; programming language.)
     *
     * @return  a hash code value for this object.
     * @see     java.lang.Object#equals(java.lang.Object)
     * @see     java.lang.System#identityHashCode
     */
    public native int hashCode();

JVM 中 Object.hashCode 的实现细节

实现Object.hashCode

在 jdk-8\src\share\native\java\lang\Object.c，可以看到前面有一段：

static JNINativeMethod methods[] = {
    {"hashCode",    "()I",                    (void *)&JVM_IHashCode},
    {"wait",        "(J)V",                   (void *)&JVM_MonitorWait},
    {"notify",      "()V",                    (void *)&JVM_MonitorNotify},
    {"notifyAll",   "()V",                    (void *)&JVM_MonitorNotifyAll},
    {"clone",       "()Ljava/lang/Object;",   (void *)&JVM_Clone},
};

JNIEXPORT void JNICALL
Java_java_lang_Object_registerNatives(JNIEnv *env, jclass cls)
{
   
    (*env)->RegisterNatives(env, cls,
                            methods, sizeof(methods)/sizeof(methods[0]));
}

这里将 hashCode() 映射到 (void *)&JVM_IHashCode 这个函数。

关于(void *)&JVM_IHashCode 的定义，在openJDK中。(jdk-8\src\share\javavm\export\jvm.h)

/*
 * java.lang.Object
 */
JNIEXPORT jint JNICALL
JVM_IHashCode(JNIEnv *env, jobject obj);

作为JVM 的implement，hotspot8实现了这个方法。 (hotspot-8\src\share\vm\prims\jvm.cpp)

// java.lang.Object ///////////////////////////////////////////////

JVM_ENTRY(jint, JVM_IHashCode(JNIEnv* env, jobject handle))
  JVMWrapper("JVM_IHashCode");
  // as implemented in the classic virtual machine; return 0 if object is NULL
  return handle == NULL ? 0 : ObjectSynchronizer::FastHashCode (THREAD, JNIHandles::resolve_non_null(handle)) ;
JVM_END

如果对象的handle 为null，返回0，否则调用 FastHashCode。继续跟进去，看到最后生成的逻辑。直接看到hash值的生成过程,在 get_next_hash 方法，就是详细的hash逻辑。(hotspot-8\src\share\vm\runtime\synchronizer.cpp)：

// hashCode() generation :
//
// Possibilities:
// * MD5Digest of {obj,stwRandom}
// * CRC32 of {obj,stwRandom} or any linear-feedback shift register function.
// * A DES- or AES-style SBox[] mechanism
// * One of the Phi-based schemes, such as:
//   2654435761 = 2^32 * Phi (golden ratio)
//   HashCodeValue = ((uintptr_t(obj) >> 3) * 2654435761) ^ GVars.stwRandom ;
// * A variation of Marsaglia's shift-xor RNG scheme.
// * (obj ^ stwRandom) is appealing, but can result
//   in undesirable regularity in the hashCode values of adjacent objects
//   (objects allocated back-to-back, in particular).  This could potentially
//   result in hashtable collisions and reduced hashtable efficiency.
//   There are simple ways to "diffuse" the middle address bits over the
//   generated hashCode values:
static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0 ;
  if (hashCode == 0) {
     // This form uses an unguarded global Park-Miller RNG,
     // so it's possible for two threads to race and generate the same RNG.
     // On MP system we'll have lots of RW access to a global, so the
     // mechanism induces lots of coherency traffic.
     value = os::random() ;
  } else
  if (hashCode == 1) {
     // This variation has the property of being stable (idempotent)
     // between STW operations.  This can be useful in some of the 1-0
     // synchronization schemes.
     intptr_t addrBits = cast_from_oop<intptr_t>(obj) >> 3 ;
     value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
  } else
  if (hashCode == 2) {
     value = 1 ;            // for sensitivity testing
  } else
  if (hashCode == 3) {
     value = ++GVars.hcSequence ;
  } else
  if (hashCode == 4) {
      // 直接用内存地址
     value = cast_from_oop<intptr_t>(obj) ;
  } else {
     // Marsaglia's xor-shift scheme with thread-specific state
     // This is probably the best overall implementation -- we'll
     // likely make this the default in future releases.
     unsigned t = Self->_hashStateX ;
     t ^= (t << 11) ;
     Self->_hashStateX = Self->_hashStateY ;
     Self->_hashStateY = Self->_hashStateZ ;
     Self->_hashStateZ = Self->_hashStateW ;
     unsigned v = Self->_hashStateW ;
     v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
     Self->_hashStateW = v ;
     value = v ;
  }

  value &= markOopDesc::hash_mask;
  if (value == 0) value = 0xBAD ;
  assert (value != markOopDesc::no_hash, "invariant") ;
  TEVENT (hashCode: GENERATE) ;
  return value;
}

hashcode=1，使用最简单的随机算法。
hashcode=4，直接使用的内存地址。
默认使用的是hashcode>=5的 xor-shift scheme算法。可以用JVM parameter -XX:hashCode来调整。

xor-shift scheme使用位移和异或运算生成随机数, 所以运算速度非常快(移位指令需要的机器周期更少).

Set 的 hashCode

对于集合类，就是每个对象的hashCode的累加。这样做的好处是：顺序无关。


public abstract class AbstractSet<E> extends AbstractCollection<E> implements Set<E> {

    public int hashCode() {
        int h = 0;
        Iterator<E> i = iterator();
        while (i.hasNext()) {
            E obj = i.next();
            if (obj != null)
                h += obj.hashCode();
        }
        return h;
    }
}

List 的 hashCode

用31做散列，做到了顺序相关。

public abstract class AbstractList<E> extends AbstractCollection<E> implements List<E> {

    public int hashCode() {
        int hashCode = 1;
        for (E e : this)
            hashCode = 31*hashCode + (e==null ? 0 : e.hashCode());
        return hashCode;
    }
}

Map 的 hashCode

map 其实就是set类的包装类，典型的装饰器模式。和 set类似，hashCode是每个entry的hashCode 总和。

public abstract class AbstractMap<K,V> implements Map<K,V> {
    
    public abstract Set<Entry<K,V>> entrySet();

    public int hashCode() {
        int h = 0;
        Iterator<Entry<K,V>> i = entrySet().iterator();
        while (i.hasNext())
            h += i.next().hashCode();
        return h;
    }
}

对于HashSet，实现了特殊的Entry，也就是Node节点。每个Entry的 hash都是key.hashCode*value.hashCode。而在ConcurrentHashMap中，也是如此。

    static class Node<K,V> implements Map.Entry<K,V> {
       
        public final int hashCode() {

            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }

总结：

hashCode 和equals的约束，不过是建议性的，或者说是约定，并不强制。
hashCode相等，并不能表示两个对象一致，正确的做法是重写 equals方法
大多数类的hashCode方法都和Object的hashCode方法相关
对于map的元素，每个item，如equals相同，而hashCode不同，有利于散列。
Object类的hashCode函数是native函数，为了避免频繁做JNI调用，大多数类对hash值做了缓存。