Python: inspecting the values produced by simhash and minhash

2022-01-27  丙吉

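I read up on how the simhash and minhash algorithms work.
Most of what I found uses them directly in calculations, but I wanted to see what the hashed values actually look like.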
https://leons.im/posts/a-python-implementation-of-simhash-algorithm/
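The core idea behind a simhash value can be sketched with the standard library alone. This is a minimal sketch, not the `simhash` package's code. It assumes MD5 as the per-feature hash, a weight of 1 for every feature occurrence, and a 64-bit fingerprint. Each bit of each feature's hash casts a +1 or -1 vote, and a fingerprint bit is set wherever the votes sum to a positive number. Because of these assumptions, the number it prints need not equal `Simhash(...).value`. The shingling is the same character 3-gram `get_features()` used in the post's code.

```python
import hashlib
import re


def get_features(s, width=3):
    """Overlapping character 3-grams of the normalized string."""
    s = re.sub(r'[^\w]+', '', s.lower())  # drop whitespace and punctuation
    return [s[i:i + width] for i in range(max(len(s) - width + 1, 1))]


def simhash_value(features, f=64):
    """Build an f-bit simhash fingerprint from a list of string features.

    Assumption for this sketch: MD5 is the per-feature hash and every
    feature occurrence has weight 1.
    """
    votes = [0] * f
    for feat in features:
        h = int(hashlib.md5(feat.encode('utf-8')).hexdigest(), 16)
        for i in range(f):
            # bit i of the feature hash votes +1 if set, -1 if clear
            votes[i] += 1 if (h >> i) & 1 else -1
    value = 0
    for i in range(f):
        if votes[i] > 0:
            value |= 1 << i  # fingerprint bit is 1 where the vote is positive
    return value


v = simhash_value(get_features('How are you? I am fine. Thanks.'))
print('%x' % v)           # hexadecimal, the same format as the post's output
print(format(v, '064b'))  # the same value written out as 64 bits
```

With a single feature, every vote is ±1, so the fingerprint is simply the low 64 bits of that feature's hash. With many features, bits that most features agree on win the vote, which is why similar texts end up with similar fingerprints.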

For simhash, read the value with .value:

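import re

from simhash import Simhash


def get_features(s):
    # Split the normalized string into overlapping character 3-grams
    width = 3
    s = s.lower()
    s = re.sub(r'[^\w]+', '', s)  # drop whitespace and punctuation
    return [s[i:i + width] for i in range(max(len(s) - width + 1, 1))]


# .value is the fingerprint as a Python int (64 bits by default); '%x' prints it in hex
print('%x' % Simhash(get_features('How are you? I am fine. Thanks.')).value)
print('%x' % Simhash(get_features('How are u? I am fine.     Thanks.')).value)
print('%x' % Simhash(get_features('How r you?I    am fine. Thanks.')).value)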

The output:

[Screenshot: the three simhash values printed in hex]
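The hex strings mean most when two of them are compared. Simhash fingerprints are usually compared by Hamming distance, the number of bit positions where they differ, and the library's `Simhash` object has a `distance()` method for this. The standard-library sketch below computes the same quantity from two plain integers. The fingerprints in it are made-up values for illustration, not the ones in the screenshot.

```python
def hamming_distance(a, b, f=64):
    """Number of bit positions where two f-bit fingerprints differ."""
    x = (a ^ b) & ((1 << f) - 1)  # XOR leaves a 1 wherever the bits differ
    return bin(x).count('1')


# Hypothetical fingerprints, chosen only to illustrate the calculation.
a = 0x8c3a5f0012d4e6b7
b = 0x8c3a5f0012d4e6b4  # same as a except the lowest two bits
c = 0x73c5a0ffed2b1948  # bitwise complement of a

print(hamming_distance(a, b))  # 2
print(hamming_distance(a, c))  # 64
```

A small distance means the two inputs were probably near-duplicates. A distance near 32 is what two unrelated 64-bit values would typically show.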

For minhash, view the value with digest():

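from datasketch import MinHash

m1 = MinHash()  # num_perm defaults to 128
m2 = MinHash()
# Each update() adds one element to the set; here the whole sentence is a single element
m1.update('How are you? I am fine. Thanks.'.encode('utf8'))
m2.update('How r you?I am fine. Thanks.'.encode('utf8'))
# digest() returns a copy of the hash values as a numpy array
print(m1.digest())
print(m2.digest())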

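The result is a 128-dimensional vector: one value per permutation, and MinHash() uses num_perm=128 by default.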


[Screenshot: the two printed digest arrays]
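The 128 numbers come from the usual MinHash construction, which can be sketched with the standard library. Each of `num_perm` random linear hash functions (a*x + b) mod p is applied to every element of the set, and only the minimum is kept, so the digest has one entry per hash function. The SHA-1 base hash, the Mersenne prime p = 2^61 - 1, the 32-bit mask and the random seed are assumptions for illustration, so the numbers will not match datasketch's output. The fraction of positions at which two digests agree estimates the Jaccard similarity of the two sets. In datasketch this estimate is available as `MinHash.jaccard()`.

```python
import hashlib
import random

MERSENNE_PRIME = (1 << 61) - 1  # modulus for the linear hash functions (assumption)
MAX_HASH = (1 << 32) - 1        # keep 32 bits of each permuted hash (assumption)


def base_hash(token):
    """32-bit integer hash of a token: first 4 bytes of its SHA-1 digest."""
    return int.from_bytes(hashlib.sha1(token.encode('utf-8')).digest()[:4], 'little')


def make_permutations(num_perm=128, seed=1):
    """num_perm random (a, b) pairs, one per hash function h(x) = (a*x + b) mod p."""
    rng = random.Random(seed)
    return [(rng.randint(1, MERSENNE_PRIME - 1), rng.randint(0, MERSENNE_PRIME - 1))
            for _ in range(num_perm)]


def minhash_digest(tokens, perms):
    """Keep the minimum hash per function: a vector with len(perms) entries."""
    hs = [base_hash(t) for t in set(tokens)]
    if not hs:
        return [MAX_HASH] * len(perms)  # empty set: every slot stays at the maximum
    return [min(((a * h + b) % MERSENNE_PRIME) & MAX_HASH for h in hs)
            for a, b in perms]


def estimate_jaccard(d1, d2):
    """Share of positions at which the two digests agree."""
    return sum(x == y for x, y in zip(d1, d2)) / len(d1)


perms = make_permutations(128)
s1 = 'How are you? I am fine. Thanks.'.split()
s2 = 'How r you?I am fine. Thanks.'.split()
d1 = minhash_digest(s1, perms)
d2 = minhash_digest(s2, perms)

true_j = len(set(s1) & set(s2)) / len(set(s1) | set(s2))
print(len(d1))  # 128: one minimum per hash function
print(d1[:5])   # the first few entries of the vector
print(true_j, estimate_jaccard(d1, d2))
```

The datasketch example passes each whole sentence to update() as a single element. As a result, each MinHash describes a one-element set, and the two digests will almost never agree at any position. Splitting into tokens first gives sets that actually overlap; whitespace splitting is an arbitrary choice made here. For these two sentences the true Jaccard similarity of the token sets is 4/9.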

Listing the algorithms available in hashlib:

https://docs.python.org/3.5/library/hashlib.html

import hashlib
print(hashlib.algorithms_guaranteed)  # set of names available on every platform
[Screenshot: the printed set of guaranteed algorithm names]
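The 64-bit simhash value and the minhash vector can be compared with ordinary cryptographic digests. The sketch below runs one sample string through every algorithm in `hashlib.algorithms_guaranteed` and reports each digest's length in bits. The SHAKE algorithms are variable-length, so they need an explicit output length; 16 bytes is an arbitrary choice made here.

```python
import hashlib

data = 'How are you? I am fine. Thanks.'.encode('utf-8')
sizes = {}

for name in sorted(hashlib.algorithms_guaranteed):
    h = hashlib.new(name, data)
    if name.startswith('shake_'):
        # SHAKE is an extendable-output function: hexdigest() needs a byte length
        hex_value = h.hexdigest(16)
    else:
        hex_value = h.hexdigest()
    sizes[name] = len(hex_value) * 4  # each hex character encodes 4 bits
    print(f'{name:10} {sizes[name]:4} bits  {hex_value}')
```

Unlike simhash and minhash, these functions are designed so that a one-character change in the input scrambles the whole digest. Comparing two of their outputs therefore only shows whether the inputs are identical, not how similar they are.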