哈希函数的设计

2020-02-13 本文已影响0人滨岩

核心思想：

“键”通过哈希函数得到的“索引”分布越均匀越好

对于一些特殊领域，有特殊领域的哈希函数设计方式，甚至有专门的论文
字符串转换成整形：

166=1*10^2+6*10^1+6*10^0
code=c*26^3+o*26^2+d*26^1+e*26^0

hash(code)=(c*B^2+o*B^2+d*B^1+e*B^0)%M

等价于

hash(code)=((((c*B)+o)*B+d)*B+e)%M

等价于

hash(code)=((((c%M)*B+o)%M*B+d)%M*B+e)%M

转换成程序：

int hash=0;
for(int i=0;i<s.length();i++){
   hash=(hash*B+s.charAt(i))%M
}

这就是字符串的哈希函数的设计

复合对象的哈希函数设计

Date: year,month,day

hash(date)=((((date.year%M)*B+date.month)%M*B+date.day)%M

总而言之都是转成整型处理，并不是唯一的方法！

哈希函数设计的三个原则：

1、一致性：如果a==b，则hash(a)==hash(b)
2、高效性：计算高效简便
3、均匀性：哈希值均匀分布

Java 哈希函数

Java 中的HashCode

    public static void main(String[] args) {
        int t1=100;
        System.out.println(((Integer)t1).hashCode());   //输出100

        int t2=-100;
        System.out.println(  ((Integer)t2).hashCode());//输出-100

        double t3=3.1415296;
        System.out.println( ((Double)t3).hashCode() );// 输出89289278

        String t4="bing";
        System.out.println(t4.hashCode()); //3023936
        
    }

Integer的哈希函数：

    @Override
    public int hashCode() {
        return Integer.hashCode(value);
    }
    public static int hashCode(int value) {
        return value;
    }

Double 的哈希函数：

    @Override
    public int hashCode() {
        return Double.hashCode(value);
    }
    public static int hashCode(double value) {
        long bits = doubleToLongBits(value);
        return (int)(bits ^ (bits >>> 32));
    }

String 的哈希函数：

    /**
     * Returns a hash code for this string. The hash code for a
     * {@code String} object is computed as
     * <blockquote><pre>
     * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
     * </pre></blockquote>
     * using {@code int} arithmetic, where {@code s[i]} is the
     * <i>i</i>th character of the string, {@code n} is the length of
     * the string, and {@code ^} indicates exponentiation.
     * (The hash value of the empty string is zero.)
     *
     * @return  a hash code value for this object.
     */
    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }

Java 默认的HashCode 是根据每一个对象的地址把它映射成整型

public class Student {

    private int classRoom;
    private int grade;
    private String name;


    public Student(int classRoom,int grade,String name){
        this.classRoom=classRoom;
        this.grade=grade;
        this.name=name;
    }


    @Override
    public int hashCode(){
        int hash=0;
        int b=31;

        hash=hash*b+classRoom;
        hash=hash*b+grade;
        hash=hash*b+name.toLowerCase().hashCode();  //是否区分大小写

        return  hash;
    }


    @Override
    public boolean equals(Object o){

        //如果地址一样
        if(this==o){
            return true;
        }

        if(o==null){
            return  false;
        }

        if(o.getClass()!=this.getClass()){
            return false;
        }

        Student other=(Student)o;

        return  other.classRoom==this.classRoom&&
                other.grade==this.grade&&
                other.name.toLowerCase().equals(this.name.toLowerCase());

    }

}

运行如下例子试试

        Student student=new Student(3,5,"bing");
        Student student2=new Student(3,5,"bing");

        HashSet<Student> hashSet=new HashSet<>();

        hashSet.add(student);
        hashSet.add(student2);
        System.out.println(hashSet.size()); //不重写 equals  会是2  自定义重写是 1

        HashMap<Student,Integer> hashMap=new HashMap<>();

        hashMap.put(student,100);
        hashMap.put(student2,200);

        System.out.println(hashMap.size()); //不重写 equals  会是2  自定义重写是 1

HashSet 和HashMap 除了调用对象的hashcode 来判断是否hash 冲突，还调用equals 判断对象是否为同一个对象。

我们来看看HashMap的put 方法的实现代码：

 /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

    /**
     * Implements Map.put and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

HashMap中添加新的元素，从put方法的具体实现可知，会先调用hashCode方法得到该元素的hashCode值，然后查看table中是否存在该hashCode值，如果存在则调用equals方法重新确定是否存在该元素，如果存在，则更新value值，否则将新的元素添加到HashMap中。从这里可以看出，hashCode方法的存在是为了减少equals方法的调用次数，从而提高程序效率。

java8 Hashmap hash函数为:

    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

h 是hashcode 。 h>>>16是用来取出h的高16，（>>>是无符号右移），如下图所示：

0000 0100 1011 0011  1101 1111 1110 0001
 
>>> 16 
 
0000 0000 0000 0000  0000 0100 1011 0011

为什么用^而不用&和|

因为&和|都会使得结果偏向0或者1 ,并不是均匀的概念,所以用^。

异或运算符（^）

参加运算的两个数据，按二进制位进行“异或”运算。

运算规则：0^0=0； 0^1=1； 1^0=1； 1^1=0；

即：参加运算的两个对象，如果两个相应位为“异”（值不同），则该位结果为1，否则为0。

HashTable 与HashMap的区别，HashTable 是线程安全的它的添加方法使用了synchronized来保证线程安全，HashTable已经处于废弃状态，HashMap是非线程安全的。

HashMap的复杂度分析

总共M个地址，如果放入哈希表的元素是N，那么链表的时间复杂度是 N/M
如果每个地址是平衡树 O(log(N/M))
平均每个地址承载的元素过多过一定程度立即扩容
N/M >=upperTol
哈希表牺牲了顺序性
二分搜索树、AVL树、红黑色保证有顺序性