Design a key-value store for a search engine

2018-06-10  MontyOak

Design a key-value store for a search engine (link to the original article)

1. Describe the use cases and constraints

Use cases:

Assumptions and constraints:

Capacity estimation:
Data structure: key = query, value = results

2. Create the system design diagram

Overall system design diagram

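3. Design the key components

Use case: the user's request is already in the cache
The cache can be Memcached or Redis, which reduces read load on the Reverse Index Service and the Document Service. For the eviction policy, LRU (least recently used) is a reasonable choice.

An LRU cache can be implemented with a hash table and a doubly linked list: the hash table gives O(1) lookup by query, and the linked list keeps entries ordered from most to least recently used.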

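class Node(object):

    def __init__(self, query, results):
        self.query = query
        self.results = results
        self.prev = None  # neighbor toward the head (more recently used)
        self.next = None  # neighbor toward the tail (less recently used)


class LinkedList(object):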

    def __init__(self):
        self.head = None
        self.tail = None

    def move_to_front(self, node):
        ...

    def append_to_front(self, node):
        ...

    def remove_from_tail(self):
        ...

class Cache(object):

    def __init__(self, MAX_SIZE):
        self.MAX_SIZE = MAX_SIZE
        self.size = 0
        self.lookup = {}  # key: query, value: node
        self.linked_list = LinkedList()

    def get(self, query):
        """Get the stored query result from the cache.

        Accessing a node updates its position to the front of the LRU list.
        """
        # dict.get returns None on a miss instead of raising KeyError
        node = self.lookup.get(query)
        if node is None:
            return None
        self.linked_list.move_to_front(node)
        return node.results

    def set(self, query, results):
        """Set the result for the given query key in the cache.

        When updating an entry, updates its position to the front of the LRU list.
        If the entry is new and the cache is at capacity, removes the oldest entry
        before the new entry is added.
        """
        # dict.get returns None for a new key instead of raising KeyError
        node = self.lookup.get(query)
        if node is not None:
            # Key exists in cache, update the value
            node.results = results
            self.linked_list.move_to_front(node)
        else:
            # Key does not exist in cache
            if self.size == self.MAX_SIZE:
                # Remove the oldest entry from the linked list and lookup
                self.lookup.pop(self.linked_list.tail.query, None)
                self.linked_list.remove_from_tail()
            else:
                self.size += 1
            # Add the new key and value
            new_node = Node(query, results)
            self.linked_list.append_to_front(new_node)
            self.lookup[query] = new_node
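
The linked-list operations above are left as stubs. As a rough, self-contained sketch of how they might be filled in, the version below keeps the head as the most recently used entry and the tail as the least recently used one. The names LRUCache, max_size and the _unlink helper are assumptions made for this example and do not come from the original article.

```python
class Node:
    def __init__(self, query, results):
        self.query = query
        self.results = results
        self.prev = None
        self.next = None


class LinkedList:
    """Doubly linked list: head is most recently used, tail is least."""

    def __init__(self):
        self.head = None
        self.tail = None

    def _unlink(self, node):
        # Hypothetical helper: detach node and repair head/tail pointers.
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        node.prev = node.next = None

    def append_to_front(self, node):
        node.prev = None
        node.next = self.head
        if self.head is not None:
            self.head.prev = node
        self.head = node
        if self.tail is None:
            self.tail = node

    def move_to_front(self, node):
        if node is self.head:
            return
        self._unlink(node)
        self.append_to_front(node)

    def remove_from_tail(self):
        node = self.tail
        if node is not None:
            self._unlink(node)
        return node


class LRUCache:
    def __init__(self, max_size):
        self.max_size = max_size
        self.lookup = {}  # key: query, value: node
        self.linked_list = LinkedList()

    def get(self, query):
        node = self.lookup.get(query)
        if node is None:
            return None
        self.linked_list.move_to_front(node)
        return node.results

    def set(self, query, results):
        node = self.lookup.get(query)
        if node is not None:
            node.results = results
            self.linked_list.move_to_front(node)
            return
        if len(self.lookup) >= self.max_size:
            # Evict the least recently used entry from both structures.
            oldest = self.linked_list.remove_from_tail()
            if oldest is not None:
                del self.lookup[oldest.query]
        node = Node(query, results)
        self.linked_list.append_to_front(node)
        self.lookup[query] = node


cache = LRUCache(max_size=2)
cache.set("a", [1])
cache.set("b", [2])
cache.get("a")       # "a" becomes the most recently used entry
cache.set("c", [3])  # cache is full, so "b" (least recently used) is evicted
```

Both get and set stay O(1) here because the dictionary finds the node directly and the list only rewires a handful of pointers.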

Query API service:

class QueryApi(object):

    def __init__(self, memory_cache, reverse_index_service):
        self.memory_cache = memory_cache
        self.reverse_index_service = reverse_index_service

    def parse_query(self, query):
        """Remove markup, break text into terms, deal with typos,
        normalize capitalization, convert to use boolean operations.
        """
        ...

    def process_query(self, query):
        query = self.parse_query(query)
        results = self.memory_cache.get(query)
        if results is None:
            results = self.reverse_index_service.process_search(query)
            self.memory_cache.set(query, results)
        return results
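
To see the cache-hit path in action, here is a small runnable sketch of the same flow. DictCache and FakeReverseIndexService are hypothetical stand-ins invented for this example, and parse_query is reduced to trimming and lowercasing, which is far simpler than the parsing described in the docstring above.

```python
class DictCache:
    """Hypothetical stand-in for the memory cache (no eviction)."""

    def __init__(self):
        self.store = {}

    def get(self, query):
        return self.store.get(query)

    def set(self, query, results):
        self.store[query] = results


class FakeReverseIndexService:
    """Hypothetical stand-in that counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def process_search(self, query):
        self.calls += 1
        return ["doc matching " + query]


class QueryApi:
    def __init__(self, memory_cache, reverse_index_service):
        self.memory_cache = memory_cache
        self.reverse_index_service = reverse_index_service

    def parse_query(self, query):
        # Simplified assumption: collapse whitespace and lowercase only.
        return " ".join(query.lower().split())

    def process_query(self, query):
        query = self.parse_query(query)
        results = self.memory_cache.get(query)
        if results is None:
            # Cache miss: ask the index, then remember the answer.
            results = self.reverse_index_service.process_search(query)
            self.memory_cache.set(query, results)
        return results


index = FakeReverseIndexService()
api = QueryApi(DictCache(), index)
first = api.process_query("  Hello   World ")  # miss, reaches the index
second = api.process_query("hello world")      # hit, served from the cache
```

Normalizing the query before the lookup matters: without it, the two requests above would be stored under different keys and the second one would miss.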

The cache needs to be updated in the following cases:

4. Refine the design

Final design diagram

For a distributed cache, see Redis Cluster.
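
As a rough illustration of how a distributed cache can decide which node owns a query, the sketch below follows the key-to-slot rule Redis Cluster uses: CRC16 of the key modulo 16384 hash slots. The node names and the equal-range slot layout are assumptions for this example; a real cluster assigns slot ranges through its own configuration, and keys containing {...} hash tags are hashed on the tag only, which this sketch ignores.

```python
import binascii

NUM_SLOTS = 16384  # fixed number of hash slots in Redis Cluster


def key_slot(key):
    """Return CRC16 (XMODEM variant) of the key, modulo 16384.

    Hash tags ({...}) are not handled in this simplified version.
    """
    return binascii.crc_hqx(key.encode("utf-8"), 0) % NUM_SLOTS


def node_for_slot(slot, nodes):
    """Map a slot to a node using equal contiguous ranges (assumed layout)."""
    slots_per_node = -(-NUM_SLOTS // len(nodes))  # ceiling division
    return nodes[slot // slots_per_node]


# Hypothetical node names, for illustration only.
nodes = ["cache-a", "cache-b", "cache-c"]
slot = key_slot("how to design a cache")
owner = node_for_slot(slot, nodes)
```

Because the slot depends only on the key, every query API server routes the same query to the same cache node without any coordination.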
