Equals And HashCode 梳理

2020-07-07 本文已影响0人史小豪

前言

常言道:"基础不牢，地动山摇"。万丈高楼平地起，不管学习什么语言，在高谈阔论框架和语言应用的同时，也不能忘了语言基础。
本文是关于Equals和HashCode关系梳理的一篇文章，会主要从定义、联系、使用这三个方面围绕展开。

正文

定义

equals()

equals()是object里的方法，话不多说，先直接上源码：

   /**
     * Indicates whether some other object is "equal to" this one.
     * <p>
     * The {@code equals} method implements an equivalence relation
     * on non-null object references:
     * <ul>
     * <li>It is <i>reflexive</i>: for any non-null reference value
     *     {@code x}, {@code x.equals(x)} should return
     *     {@code true}.
     * <li>It is <i>symmetric</i>: for any non-null reference values
     *     {@code x} and {@code y}, {@code x.equals(y)}
     *     should return {@code true} if and only if
     *     {@code y.equals(x)} returns {@code true}.
     * <li>It is <i>transitive</i>: for any non-null reference values
     *     {@code x}, {@code y}, and {@code z}, if
     *     {@code x.equals(y)} returns {@code true} and
     *     {@code y.equals(z)} returns {@code true}, then
     *     {@code x.equals(z)} should return {@code true}.
     * <li>It is <i>consistent</i>: for any non-null reference values
     *     {@code x} and {@code y}, multiple invocations of
     *     {@code x.equals(y)} consistently return {@code true}
     *     or consistently return {@code false}, provided no
     *     information used in {@code equals} comparisons on the
     *     objects is modified.
     * <li>For any non-null reference value {@code x},
     *     {@code x.equals(null)} should return {@code false}.
     * </ul>
     * <p>
     * The {@code equals} method for class {@code Object} implements
     * the most discriminating possible equivalence relation on objects;
     * that is, for any non-null reference values {@code x} and
     * {@code y}, this method returns {@code true} if and only
     * if {@code x} and {@code y} refer to the same object
     * ({@code x == y} has the value {@code true}).
     * <p>
     * Note that it is generally necessary to override the {@code hashCode}
     * method whenever this method is overridden, so as to maintain the
     * general contract for the {@code hashCode} method, which states
     * that equal objects must have equal hash codes.
     *
     * @param   obj   the reference object with which to compare.
     * @return  {@code true} if this object is the same as the obj
     *          argument; {@code false} otherwise.
     * @see     #hashCode()
     * @see     java.util.HashMap
     */
    public boolean equals(Object obj) {
        return (this == obj);
    }

正如注释第一句所谈到的，equals就是一个辨别两个对象之间的"相等"关系的函数。
当然这种相等关系应由子类自行定义，但注释中指出需遵守以下性质：
性质1：reflexive，自反性。对于任何非空引用x，x.equals(x)==true应恒成立。
性质2：symmetric，对称性。对于两个非空引用x和y，x.euqals（y)==true当且仅当y.euqals（x)==true。
性质3：transitive，传递性。简单地讲，x equals y , y equals z，则x equals z成立。
性质4：consistent，一致性。如果两个非空对象x,y没有发生变化，则反复调用x.euqals(y)所返回的结果应一致。
性质5：对于任何非空引用x，x.equals(null)==false成立。

了解完性质，关键的地方来了：

 * Note that it is generally necessary to override the {@code hashCode}
     * method whenever this method is overridden, so as to maintain the
     * general contract for the {@code hashCode} method, which states
     * that equal objects must have equal hash codes.

注释中指出，重写equals()则应重写hashcode()，使得equals相等的对象具有相等的hashCode。

hashcode()

hashcode是native方法，具体生成的细节和jdk版本有关，如何生成hashcode并不是今天的主题。接下来，我们看下hashcode的部分注释：

     * Returns a hash code value for the object. This method is
     * supported for the benefit of hash tables such as those provided by
     * {@link java.util.HashMap}.
     * <p>
     * The general contract
 * <li>Whenever it is invoked on the same object more than once during
     *     an execution of a Java application, the {@code hashCode} method
     *     must consistently return the same integer, provided no information
     *     used in {@code equals} comparisons on the object is modified.
     *     This integer need not remain consistent from one execution of an
     *     application to another execution of the same application.
     * <li>If two objects are equal according to the {@code equals(Object)}
     *     method, then calling the {@code hashCode} method on each of
     *     the two objects must produce the same integer result.
     * <li>It is <em>not</em> required that if two objects are unequal
     *     according to the {@link java.lang.Object#equals(java.lang.Object)}
     *     method, then calling the {@code hashCode} method on each of the
     *     two objects must produce distinct integer results.  However, the
     *     programmer should be aware that producing distinct integer results
     *     for unequal objects may improve the performance of hash tables.
     * </ul>

总的来讲：
hashcode的产生是为了hashmap等使用到hashcode的散列存储结构服务的。
hashcode有如下三个性质：
性质1：一致性：对于同一对象，多次调用hashcode()应返回相同的Integer。
性质2：equals相等，则hashCode必须相等。
性质3：equals不等，hashCode不是必须不等。

联系

了解了hashcode()以及equals()之后，肯定会发生这样一个疑问，为什么equals相等，hashcode必须相等？
实际上，这个问题只要搞懂hashcode()的用途，就可以很清楚的明白了。
hashcode()产生的目的就是为了hashMap等散列存储结构提供支持。
提供什么样的支持？
首先要明白，散列结构存储元素的重要过程之一是判别元素的“相同”/“不同”，这个相同与否是根据元素的hash值（为了降低hash冲突的概率，不同结构计算hash值的方式不同，但都是基于hashCode）决定的。每次put元素，则计算对应元素的hash值以及对应的坐标 ,有hash冲突则解决hash冲突，没有就直接存放。
这样做的好处是，当我要向一个散列结构存储一个元素时，我判别是否有重复元素，只需要计算hash值一次并比较，而不是对已经存储的每个元素依次equals（想象一个巨大的hashset，每次存放一个元素要和所有元素依次equals，这效率该多么低下）。
所以问题来了，如果散列结构存储的元素重写了equals方法却没有重写hashcode（）方法，就会导致一个问题：逻辑上我们认为相同的对象（equals==true），在散列存储结构的存储中因为hashcode的不同，最终被判定为两个不同的对象，从而存放了两份。
显然这并不是我们想要的。
下面附上一个例子，来展示下

public class Person {

    private int age;
    private String name;
    public Person(int age,String name){
        this.age=age;
        this.name=name;
    }

    @Override
    public boolean equals(Object object) {
        if (object == this) {
            return true;
        } else if (object instanceof Person) {
             if(((Person) object).age==this.age&&((Person) object).name.equals(this.name)){
                 return true;
             }
        } else {
            return false;
        }
        return false;
    }

//    @Override
//    public int hashCode(){
//        ` `
//    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}

这里创建了一个Person对象，重写了equals方法，equals方法的逻辑是如果是同一个对象或者两个Person对象的名字和年龄相等，则认为是“相等”的。

public class test {

    public static void main(String[] args){
        Person person1 = new Person(1,"alex");
        Person person2 = new Person(1,"alex");
        System.out.println("person1==person2?"+(person1==person2)+
                "\n"+"person1 Equals person2?"+(person1.equals(person2))+
                "\n"+"person1 hashCode == person2?"+(person1.hashCode()==person2.hashCode()));
        HashSet<Person> hs = new HashSet<>();
        hs.add(person1);
        hs.add(person2);
        System.out.println("hashSize=="+(hs.size()));
        for(Person temp :hs){
            System.out.println(temp.getName());
        }
    }
}

这里我们创建了两个我们认为逻辑上相等的对象，使用hashset测试结果为：

person1==person2?false
person1 Equals person2?true
person1 hashCode == person2?false
hashSize==2
alex
alex

显然hashset将其认为是两个对象，分别存储了。
所以我们现在要做的就是重写hashCode()方法，确保equals相等的实例hashcode()也相等。

使用

不要小瞧equals()和hashcode()的重写，如果不恰当的重写，会导致一系列难以预料的结果产生。

equals()重写：

在上面的person例子中，对equals的重写，我参照了String的equals重写。
大致逻辑是判断是否为同一个对象，如果否，判断是否相同类型且逻辑相等。
实际上《Effective Java》这本书中提出了一个关于重写equals的规范，可以进行参考。

使用＝＝操作符检查“参数是否为这个对象的引用” 。如果是，则返回 true。这只不过是一种性能优化，如果比较操作有可能很昂贵，就值得这么做。

使用 instanceof 操作符检查“参数是否为正确的类型”。如果不是，则返回 false0 一般说来，所谓“正确的类型”是指 equals 方法所在的那个类。某些情况下，是指该类所实现的某个接口。如果类实现的接口改进了 equals 约定，允许在实现了该接口的类之间进行比较，那么就使用接口。集合接口如 Set 、 List 、 Map 和 Map . Entry 具有这样的特性。

把参数转换成正确的类型。因为转换之前进行过且instanceof 测试，所以确保会成功。

对于该类中的每个“关键”（ significant ）域，检查参数中的域是否与该对象中对应的域相匹配

对于equals的重写，最应该记住的是传入的应该是object！（最好使用@Override来帮助检查），否则就不是重写而是重载了！

当然，在重写完equals后最好对其进行单元测试，看其是否符合之前所讲的equals方法所强调的性质。如果你是android开发人员，简单的junit单元测试可以参照我之前的一篇博客。https://www.jianshu.com/p/5fea0dcc53b6

HashCode()重写

同样的《Effective Java》这本书中提出了一个关于重写HashCode方法的建议，可以进行参考。

一个好的散列函数通常倾向于“为不相等的对象产生不相等的散列码” 。这正是 hashCode 约定中第三条的含义。理想情况下，散列函数应该把集合中不相等的实例均匀地分布到所有可能的 int 值上。要想完全达到这种理想的情形是非常困难的。幸运的是，相对接近这种理想情形则并不太困难。
下面给出一种简单的解决办法：

声明一个 int 变量并命名为 result ，将它初始化为对象中第一个关键域的散列码 c，如步骤 2.a 中计算所示（如第 10 条所述，关键域是指影响equals 比较的域）。

对象中剩下的每一个关键域 f 都完成以下步骤：
a. 为该域计算 int 类型的散列码 C:
I . 如果该域是基本类型，则计算 Type. hashCode ( f ），这里的 Type 是装箱基本类型的类，与 f 的类型相对应。
II .如果该域是一个对象引用，并且该类的 equals 方法通过递归地调用 equals 的方式来比较这个域，则同样为这个域递归地调用 hashCode。如果需要更复杂的比较，则为这个域计算一个“范式”（ canonical representation），然后针对这个范式调用 hashCode。如果这个域的值为 null ，则返回 0 （或者其他某个常数，但通常是 0 ）。
III. 如果该域是一个数组，则要把每一个元素当作单独的域来处理。也就是说递归地应用上述规则，对每个重要的元素计算一个散列码，然后根据步骤 2.b 中的做法把这些散列值组合起来。如果数组域中没有重要的元素，可以使用一个常量，但最好不要用 0。如果数组域中的所有元素都很重要，可以使用 Arrays . hashCode 方法。
b. 按照下面的公式，把步骤 2.a 中计算得到的散列码 c 合并到 result 中：
result=31*result+C;

返回 result。

我们参照上面的建议，重写一下Person类的hashcode()：

    @Override
    public int hashCode() {
        int result = 17;
        result = 31 * result + age;
        result = 31 * result + name.hashCode();
        return result;
    }

在未测试的情况下这个hashcode()的冲突处理是否合适还需考量，但逻辑上它确实保证了与equals方法的同步。
现在再进行测试，发现已经搞定。

person1==person2?false
person1 Equals person2?true
person1 hashCode == person2?true
hashSize==1
alex

Why 31?

看到《Effective Java》建议的你，一定会有些疑问，why must 31?
事实上，如果你看过String（Android jdk）的hashCode源码，你会发现它是这样的

   public int hashCode() {
        int h = hash;
        final int len = length();
        if (h == 0 && len > 0) {
            for (int i = 0; i < len; i++) {
                h = 31 * h + charAt(i);
            }
            hash = h;
        }
        return h;
    }

它竟然也使用了31这个数字！！
所以这背后的原理是什么呢？31又有怎样的好处呢？
《Effective Java》里也进行了解释：

之所以选择 31 ，是因为它是一个奇素数。如果乘数是偶数，并且乘法，溢出的话，信息就会丢失，因为与 2 相乘等价于移位运算。使用素数的好处并不很明显，但是习惯上都使用素数来计算散列结果。 31 有个很好的特性，即用移位和减法来代替乘法，可以得到更好的性能： 31 * i = = ( i < < 5 ) - i。现代的虚拟机可以自动完成这种优化。

这段话里已经讲的比较明确了：
1.选择奇数，防止溢出信息丢失。（实际这一点我还不太明白，有懂的老哥可以讲讲）
2.选择素数，because its traditional。(总的来讲就是和是否是素数关系不大。可以看下面的StackOverFlow链接，许多人认为和是否为素数没关系，毕竟所有的素数都是奇数（除了2）)
3.乘以31，可以被优化为位运算，就会比传统的乘法快很多。

所以为什么必须是31呢？41，51不行吗？
https://stackoverflow.com/questions/299304/why-does-javas-hashcode-in-string-use-31-as-a-multiplier
stackOverFlow的这篇问答里有一个老哥做了测试：

As Goodrich and Tamassia point out, If you take over 50,000 English words (formed as the union of the word lists provided in two variants of Unix), using the constants 31, 33, 37, 39, and 41 will produce less than 7 collisions in each case. Knowing this, it should come as no surprise that many Java implementations choose one of these constants.

也就是说，如果对超过50000个英语单词做hash运算，并采用31，33，37，39，41等做为常数乘，结果冲突次数都在7次以内。
那么31是最接近2幂次的数字，优化为位运算更简洁，自然采用31。

后记

这篇博客在草稿箱里也待了好几天，一直耽搁着没有发出来。今天完成了发出来真是畅快啊。