Study Notes for 《Python 核心技术与实战》 (Python Core Technology and Practice), Day 12: Object-Oriented Programming

2023-01-26 _相信自己_

A “High-End” Search Engine

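True to its name, the word “engine” sounds impressive. In the early years of this century, search engines were one of the most important gateways to the Internet, and giant companies grew out of them: Baidu in China and Google in the United States.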

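Search engines made online life far more convenient and became an indispensable tool for anyone using the Internet. Internet advertising, which grew up around search, became the core business model of the giants in both Silicon Valley and China, and search itself keeps improving.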

A search engine consists of four parts: the searcher, the indexer, the retriever, and the user interface.

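The searcher is, in plain terms, what we usually call a crawler. It crawls large amounts of content from all kinds of websites and passes it to the indexer. The indexer processes the pages and their content into an index, which it stores in an internal database until it is needed for retrieval. The user interface is easy to understand: it is the web page or app front end, such as the Baidu or Google search page. Through the user interface, the user sends a query to the search engine; the query is parsed and passed to the retriever, which searches the index efficiently and returns the results to the user.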
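The four-part pipeline above can be made concrete with a toy model. The sketch below runs entirely in memory and rests on several assumptions: “the web” is a hard-coded dict of fake pages, the index is an inverted index (each word mapped to the set of pages containing it), and query “parsing” only lowercases and splits on whitespace. All names (FAKE_WEB, crawler, indexer, retriever, user_interface) are hypothetical and not part of the course code.

```python
# Toy, in-memory model of the four search engine components.
# Assumption: FAKE_WEB stands in for real websites; no network access is made.
FAKE_WEB = {
    "page1": "python is a programming language",
    "page2": "a search engine has a crawler and an indexer",
    "page3": "python makes writing a crawler easy",
}


def crawler(web):
    """Searcher: 'fetch' every page and yield (page_id, content) pairs."""
    for page_id, content in web.items():
        yield page_id, content


def indexer(pages):
    """Indexer: build an inverted index mapping each word to the pages containing it."""
    index = {}
    for page_id, content in pages:
        for word in content.lower().split():
            index.setdefault(word, set()).add(page_id)
    return index


def retriever(index, query):
    """Retriever: return the ids of pages containing every word of the query."""
    words = query.lower().split()  # the "parser" only lowercases and splits
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


def user_interface(index, query):
    """User interface: format the retriever's results for display."""
    results = retriever(index, query)
    return "found {} result(s): {}".format(len(results), ", ".join(results))


index = indexer(crawler(FAKE_WEB))
print(user_interface(index, "python crawler"))  # → found 1 result(s): page3
```

Keeping each component a separate function mirrors the division of labor described above: the crawler could be swapped for real HTTP fetching, or the index for a database, without touching the retriever's interface.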

First, define the SearchEngineBase base class:


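class SearchEngineBase:
    def __init__(self):
        pass

    def add_corpus(self, file_path):
        # Read the whole file and hand it to process_corpus, using the path as the document id
        with open(file_path, 'r', encoding='utf-8') as fin:
            text = fin.read()
        self.process_corpus(file_path, text)

    def process_corpus(self, doc_id, text):
        # Subclasses must store or index the document here
        raise NotImplementedError('process_corpus not implemented.')

    def search(self, query):
        # Subclasses must return a list of matching document ids
        raise NotImplementedError('search not implemented.')

def main(search_engine):
    # Index the five sample files
    for file_path in ['1.txt', '2.txt', '3.txt', '4.txt', '5.txt']:
        search_engine.add_corpus(file_path)

    # Answer queries from standard input until EOF (Ctrl-D, or Ctrl-Z on Windows)
    while True:
        try:
            query = input()
        except EOFError:
            break
        results = search_engine.search(query)
        print('found {} result(s):'.format(len(results)))
        for result in results:
            print(result)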
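SearchEngineBase only complains about a missing method when that method is called. If you would rather have Python refuse to create an incomplete engine at all, the standard library's abc module can declare the hooks abstract. The following is a sketch of that alternative, not the course's design; AbstractSearchEngineBase and IncompleteEngine are hypothetical names.

```python
from abc import ABC, abstractmethod


class AbstractSearchEngineBase(ABC):
    """Variant of SearchEngineBase whose hooks are declared abstract."""

    def add_corpus(self, file_path):
        # Shared logic lives in the base class; indexing is delegated to subclasses
        with open(file_path, 'r', encoding='utf-8') as fin:
            text = fin.read()
        self.process_corpus(file_path, text)

    @abstractmethod
    def process_corpus(self, doc_id, text):
        """Store or index one document."""

    @abstractmethod
    def search(self, query):
        """Return a list of matching document ids."""


class IncompleteEngine(AbstractSearchEngineBase):
    # Implements process_corpus but forgets search
    def process_corpus(self, doc_id, text):
        pass


try:
    IncompleteEngine()  # fails here, before any search is attempted
except TypeError as e:
    print('TypeError:', e)
```

The trade-off: the raise-based version in the course lets you instantiate a half-finished subclass and fails later, while the abc version surfaces the mistake as soon as the object is constructed.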

Next, we implement the most basic search engine that actually works:


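class SimpleEngine(SearchEngineBase):
    def __init__(self):
        super().__init__()
        # Double-underscore attribute: Python mangles it to _SimpleEngine__id_to_texts,
        # marking it as private to this class
        self.__id_to_texts = {}

    def process_corpus(self, doc_id, text):
        # The "index" is simply the raw text stored under its document id
        self.__id_to_texts[doc_id] = text

    def search(self, query):
        # Linear scan: a document matches if the query occurs anywhere in its text
        results = []
        for doc_id, text in self.__id_to_texts.items():
            if query in text:
                results.append(doc_id)
        return results

search_engine = SimpleEngine()
main(search_engine)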


########## Output ##########


simple
found 0 result(s):
little
found 2 result(s):
1.txt
2.txt
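Because main() needs the files 1.txt to 5.txt and then waits on input(), it is awkward to try out. One self-contained way to exercise the same classes, assuming made-up sample documents written to a temporary directory, is shown below (the classes are condensed copies of the ones above). It also exposes a limitation of SimpleEngine: `query in text` is a plain substring test, so the query cat also matches “concatenate”.

```python
import os
import tempfile


class SearchEngineBase:
    def add_corpus(self, file_path):
        with open(file_path, 'r', encoding='utf-8') as fin:
            text = fin.read()
        self.process_corpus(file_path, text)

    def process_corpus(self, doc_id, text):
        raise NotImplementedError('process_corpus not implemented.')

    def search(self, query):
        raise NotImplementedError('search not implemented.')


class SimpleEngine(SearchEngineBase):
    def __init__(self):
        super().__init__()
        self.__id_to_texts = {}

    def process_corpus(self, doc_id, text):
        self.__id_to_texts[doc_id] = text

    def search(self, query):
        # Same substring test as the original loop, written as a comprehension
        return [doc_id for doc_id, text in self.__id_to_texts.items() if query in text]


# Hypothetical sample documents standing in for 1.txt .. 5.txt
docs = {
    'a.txt': 'I have a little cat.',
    'b.txt': 'Please concatenate the two strings.',
}

engine = SimpleEngine()
found = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, content in docs.items():
        path = os.path.join(tmp, name)
        with open(path, 'w', encoding='utf-8') as fout:
            fout.write(content)
        engine.add_corpus(path)

    # Queries come from a list instead of input(), so the script terminates
    for query in ['little', 'cat']:
        found[query] = [os.path.basename(p) for p in engine.search(query)]
        print('{!r}: found {} result(s): {}'.format(query, len(found[query]), found[query]))
```

Running it prints one match for little but two for cat, because “concatenate” contains the letters c, a, t. Matching whole words rather than substrings would require tokenizing the text, which SimpleEngine does not do.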
