哈希函数的设计
核心思想:
“键”通过哈希函数得到的“索引”分布越均匀越好
对于一些特殊领域,有特殊领域的哈希函数设计方式,甚至有专门的论文
字符串转换成整形:
166=1*10^2+6*10^1+6*10^0
code=c*26^3+o*26^2+d*26^1+e*26^0
hash(code)=(c*B^2+o*B^2+d*B^1+e*B^0)%M
等价于
hash(code)=((((c*B)+o)*B+d)*B+e)%M
等价于
hash(code)=((((c%M)*B+o)%M*B+d)%M*B+e)%M
转换成程序:
int hash=0;
for(int i=0;i<s.length();i++){
hash=(hash*B+s.charAt(i))%M
}
这就是字符串的哈希函数的设计
复合对象的哈希函数设计
Date: year,month,day
hash(date)=((((date.year%M)*B+date.month)%M*B+date.day)%M
总而言之都是转成整型处理,并不是唯一的方法!
哈希函数设计的三个原则:
1、一致性:如果a==b,则hash(a)==hash(b)
2、高效性:计算高效简便
3、均匀性:哈希值均匀分布
Java 哈希函数
Java 中的HashCode
public static void main(String[] args) {
int t1=100;
System.out.println(((Integer)t1).hashCode()); //输出100
int t2=-100;
System.out.println( ((Integer)t2).hashCode());//输出-100
double t3=3.1415296;
System.out.println( ((Double)t3).hashCode() );// 输出89289278
String t4="bing";
System.out.println(t4.hashCode()); //3023936
}
Integer的哈希函数:
@Override
public int hashCode() {
return Integer.hashCode(value);
}
public static int hashCode(int value) {
return value;
}
Double 的哈希函数:
@Override
public int hashCode() {
return Double.hashCode(value);
}
public static int hashCode(double value) {
long bits = doubleToLongBits(value);
return (int)(bits ^ (bits >>> 32));
}
String 的哈希函数:
/**
* Returns a hash code for this string. The hash code for a
* {@code String} object is computed as
* <blockquote><pre>
* s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
* </pre></blockquote>
* using {@code int} arithmetic, where {@code s[i]} is the
* <i>i</i>th character of the string, {@code n} is the length of
* the string, and {@code ^} indicates exponentiation.
* (The hash value of the empty string is zero.)
*
* @return a hash code value for this object.
*/
public int hashCode() {
int h = hash;
if (h == 0 && value.length > 0) {
char val[] = value;
for (int i = 0; i < value.length; i++) {
h = 31 * h + val[i];
}
hash = h;
}
return h;
}
Java 默认的HashCode 是根据每一个对象的地址把它映射成整型
public class Student {
private int classRoom;
private int grade;
private String name;
public Student(int classRoom,int grade,String name){
this.classRoom=classRoom;
this.grade=grade;
this.name=name;
}
@Override
public int hashCode(){
int hash=0;
int b=31;
hash=hash*b+classRoom;
hash=hash*b+grade;
hash=hash*b+name.toLowerCase().hashCode(); //是否区分大小写
return hash;
}
@Override
public boolean equals(Object o){
//如果地址一样
if(this==o){
return true;
}
if(o==null){
return false;
}
if(o.getClass()!=this.getClass()){
return false;
}
Student other=(Student)o;
return other.classRoom==this.classRoom&&
other.grade==this.grade&&
other.name.toLowerCase().equals(this.name.toLowerCase());
}
}
运行如下例子试试
Student student=new Student(3,5,"bing");
Student student2=new Student(3,5,"bing");
HashSet<Student> hashSet=new HashSet<>();
hashSet.add(student);
hashSet.add(student2);
System.out.println(hashSet.size()); //不重写 equals 会是2 自定义重写是 1
HashMap<Student,Integer> hashMap=new HashMap<>();
hashMap.put(student,100);
hashMap.put(student2,200);
System.out.println(hashMap.size()); //不重写 equals 会是2 自定义重写是 1
HashSet 和HashMap 除了调用对象的hashcode 来判断是否hash 冲突,还调用equals 判断对象是否为同一个对象。
我们来看看HashMap的put 方法的实现代码:
/**
* Associates the specified value with the specified key in this map.
* If the map previously contained a mapping for the key, the old
* value is replaced.
*
* @param key key with which the specified value is to be associated
* @param value value to be associated with the specified key
* @return the previous value associated with <tt>key</tt>, or
* <tt>null</tt> if there was no mapping for <tt>key</tt>.
* (A <tt>null</tt> return can also indicate that the map
* previously associated <tt>null</tt> with <tt>key</tt>.)
*/
public V put(K key, V value) {
return putVal(hash(key), key, value, false, true);
}
/**
* Implements Map.put and related methods
*
* @param hash hash for key
* @param key the key
* @param value the value to put
* @param onlyIfAbsent if true, don't change existing value
* @param evict if false, the table is in creation mode.
* @return previous value, or null if none
*/
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
boolean evict) {
Node<K,V>[] tab; Node<K,V> p; int n, i;
if ((tab = table) == null || (n = tab.length) == 0)
n = (tab = resize()).length;
if ((p = tab[i = (n - 1) & hash]) == null)
tab[i] = newNode(hash, key, value, null);
else {
Node<K,V> e; K k;
if (p.hash == hash &&
((k = p.key) == key || (key != null && key.equals(k))))
e = p;
else if (p instanceof TreeNode)
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
else {
for (int binCount = 0; ; ++binCount) {
if ((e = p.next) == null) {
p.next = newNode(hash, key, value, null);
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
treeifyBin(tab, hash);
break;
}
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
break;
p = e;
}
}
if (e != null) { // existing mapping for key
V oldValue = e.value;
if (!onlyIfAbsent || oldValue == null)
e.value = value;
afterNodeAccess(e);
return oldValue;
}
}
++modCount;
if (++size > threshold)
resize();
afterNodeInsertion(evict);
return null;
}
HashMap中添加新的元素,从put方法的具体实现可知,会先调用hashCode方法得到该元素的hashCode值,然后查看table中是否存在该hashCode值,如果存在则调用equals方法重新确定是否存在该元素,如果存在,则更新value值,否则将新的元素添加到HashMap中。从这里可以看出,hashCode方法的存在是为了减少equals方法的调用次数,从而提高程序效率。
java8 Hashmap hash函数为:
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
h 是hashcode 。 h>>>16是用来取出h的高16,(>>>是无符号右移),如下图所示:
0000 0100 1011 0011 1101 1111 1110 0001
>>> 16
0000 0000 0000 0000 0000 0100 1011 0011
为什么用^而不用&和|
因为&和|都会使得结果偏向0或者1 ,并不是均匀的概念,所以用^。
异或运算符(^)
参加运算的两个数据,按二进制位进行“异或”运算。
运算规则:0^0=0; 0^1=1; 1^0=1; 1^1=0;
即:参加运算的两个对象,如果两个相应位为“异”(值不同),则该位结果为1,否则为0。
HashTable 与HashMap的区别,HashTable 是线程安全的它的添加方法使用了synchronized来保证线程安全,HashTable已经处于废弃状态,HashMap是非线程安全的。
HashMap的复杂度分析
总共M个地址,如果放入哈希表的元素是N,那么链表的时间复杂度是 N/M
如果每个地址是平衡树 O(log(N/M))
平均每个地址承载的元素过多过一定程度 立即扩容
N/M >=upperTol
哈希表 牺牲了顺序性
二分搜索树、AVL树、红黑色保证有顺序性