哈希表的简单实现和存取时间实验

2018-06-01  本文已影响26人  Dash_chan

今天看了哈希表原理,理解了个大概:

python 的 字典是基于哈希表实现的,之所以字典的存取操作的时间复杂度为O1也得益于哈希表。
我们知道,数组的时间复杂度为O1,是因为数组实现了以索引存取值,这样在存取数据的时候,只需要
将索引(内存地址)拿到,就能很快地将数据进行存取操作,一步到位。当然插入和删除因为要涉及全部元素
所以时间复杂度是另一个级别。

而字典,其实就是一个变相的数组。它将我们要存储的字符串先转化为 数字 索引,然后再存到列表里。
这里 可以想象,由于同样的字符串转化为数字索引的时候(如果是用同一种算法),可定会发生重复,即
碰撞。这里不写解决碰撞的方法,本例中 思路是将每个字符串存进表的时候,挂上一个独一无二的id。

talk is cheap,代码实现如下:

import random
import time


class Hastable(object):
    def __init__(self):
        self.table_size = 100007
        self.table = [0] * self.table_size

    def add(self, key, value):
        index = 1
        f = 1
        for s in key:
            index += ord(s) * f
            f *= 10
        index = index % self.table_size

        data = [key, value]
        if self.table[index] == 0:
            self.table[index] = [data]
        else:
            self.table[index].append(data)

    def get(self, key, defult=None):
        index = 1
        f = 1
        for k in key:
            index += ord(k) * f
            f *= 10
        index = index % self.table_size

        if isinstance(self.table[index], list):
            for ls in self.table[index]:
                if ls[0] == key:
                    return ls[1]
        else:
            return defult


# -----------------以下为测试时间-----------

ls = []


def ran():
    seed = 'abcdefghijklmnopqrstuvwxyz'
    s = ''
    for i in range(5):
        random_index = random.randint(0, len(seed) - 1)
        s += seed[random_index]
    ls.append(s)


def ye(n):
    i = 0
    while i < n:
        ran()
        i += 1


def test():
    import uuid
    t1 = time.time()
    print('list add start', t1)
    ye(500000)
    t2 = time.time()
    print('list add end', t2)
    print('total time', t2 - t1)

    print("=====================")

    t3 = time.time()
    print('Hastable add start', t3)
    ht = Hastable()
    for key in ls:
        value = uuid.uuid4()
        # print('value', value)
        ht.add(key, value)
    t4 = time.time()
    print('Hastable add total', t4 - t3)

    print("=====================")

    t5 = time.time()
    print('Hastable get start', t5)
    for key in ls:
        v = ht.get(key)
    t6 = time.time()
    print('Hastable get total', t6 - t5)

    print("=====================")

    t7 = time.time()
    print('list get start', t7)
    for i in range(len(ls)):
        ls.__getitem__(i)
    t8 = time.time()
    print('list get total', t8 - t7)

    print('all time', t8 - t1)

if __name__ == '__main__':
    test()

输出如下:

list add start 1527833095.2639012
list add end 1527833102.5443177
total time 7.280416488647461
=====================
Hastable add start 1527833102.5443177
Hastable add total 6.015344142913818
=====================
Hastable get start 1527833108.5596619
Hastable get total 1.8111035823822021
=====================
list get start 1527833110.3707654
list get total 0.07000398635864258
all time 15.176868200302124
[Finished in 15.7s]
上一篇下一篇

猜你喜欢

热点阅读