Rasa_NLU Source Code Analysis

2018-08-06 走在成长的道路上

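Over the weekend I found an NLP tool that works quite well: rasa_nlu. It provides entity recognition, intent classification, and related features, and attaching a simple action to each recognized intent is enough to build a basic chatbot. Its class diagram is shown below: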

Rasa_NLU class dependency diagram

The entry point of the whole program is the DataRouter class in data_router.py. Its main job is to manage models grouped by project and to control where data flows.

component_classes lists every Component class:

# Classes of all known components. If a new component should be added,
# its class name should be listed here.
component_classes = [
    SpacyNLP, MitieNLP,
    SpacyEntityExtractor, MitieEntityExtractor, DucklingExtractor,
    CRFEntityExtractor, DucklingHTTPExtractor,
    EntitySynonymMapper,
    SpacyFeaturizer, MitieFeaturizer, NGramFeaturizer, RegexFeaturizer,
    CountVectorsFeaturizer,
    MitieTokenizer, SpacyTokenizer, WhitespaceTokenizer, JiebaTokenizer,
    SklearnIntentClassifier, MitieIntentClassifier, KeywordIntentClassifier,
    EmbeddingIntentClassifier
]

# Mapping from a components name to its class to allow name based lookup.
registered_components = {c.name: c for c in component_classes}

registered_components is built by iterating over component_classes and mapping each class's name to the class itself.

The get_component_class function turns a name into the corresponding Component class.
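The name-based lookup described above can be sketched as follows. This is a minimal, self-contained illustration and not the rasa_nlu source: the stand-in classes, their `name` values, and the error message are assumptions made for the example.

```python
class Component:
    """Toy base class; `name` is the key used for the lookup."""
    name = ""


class WhitespaceTokenizer(Component):
    name = "tokenizer_whitespace"  # illustrative name, assumed for this sketch


class KeywordIntentClassifier(Component):
    name = "intent_classifier_keyword"  # illustrative name, assumed for this sketch


# Same pattern as the listing above: a list of classes, then a name -> class map.
component_classes = [WhitespaceTokenizer, KeywordIntentClassifier]
registered_components = {c.name: c for c in component_classes}


def get_component_class(component_name):
    """Return the class registered under `component_name`."""
    if component_name not in registered_components:
        raise KeyError("Unknown component name: {!r}".format(component_name))
    return registered_components[component_name]


tokenizer_cls = get_component_class("tokenizer_whitespace")
tokenizer = tokenizer_cls()  # the pipeline can now instantiate it by name
```

Keeping the registry as a plain dict means that supporting a new component only requires adding its class to the list.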

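The main architecture-related files are:

registry.py: maps the names in a pipeline to their classes and loads the corresponding model files
config.py: parses and converts the configuration file
model.py: holds the model-related classes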

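Class	Description
RasaNLUModelConfig	Holds the pipeline parameters used during training
Metadata	Parses the metadata.json file in the model directory and caches the result
Trainer	Trains all the Components involved; train runs the training and persist writes the result to storage
Interpreter	Parses text strings with the trained pipeline model
Persistor	Stores models in the cloud (AWS, GCS, Azure, etc.)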

The persist function walks the components cached in self.pipeline and writes their parameters, together with the configuration of their model files, into metadata.json.
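A rough sketch of that persistence step is shown below. It is not the Trainer code itself: `ToyComponent`, `persist_pipeline`, and the metadata keys (`language`, `pipeline`, `model_file`) are hypothetical names chosen for illustration.

```python
import json
import os


class ToyComponent:
    """Hypothetical component that saves one model file."""
    name = "toy_component"

    def persist(self, model_dir):
        # Write the component's own model file and return what metadata.json
        # needs in order to find it again at load time.
        file_name = self.name + ".json"
        with open(os.path.join(model_dir, file_name), "w") as f:
            json.dump({"weights": [1, 2, 3]}, f)
        return {"model_file": file_name}


def persist_pipeline(pipeline, model_dir, language="en"):
    """Persist every component, then record the pipeline in metadata.json."""
    os.makedirs(model_dir, exist_ok=True)
    metadata = {"language": language, "pipeline": []}
    for component in pipeline:
        entry = {"name": component.name}
        entry.update(component.persist(model_dir) or {})
        metadata["pipeline"].append(entry)
    with open(os.path.join(model_dir, "metadata.json"), "w") as f:
        json.dump(metadata, f, indent=2)
    return metadata
```

Recording each component's name alongside its files is what lets the pipeline be rebuilt later from metadata.json alone.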

Interpreter initialization

1. Load the Metadata contents
2. Build the Component execution sequence from the pipeline listed in metadata.json
3. Initialize the Interpreter's parameter list
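The three initialization steps above might look like this in miniature. Everything here is a toy stand-in: `ToyInterpreter`, `REGISTRY`, the component classes, and the metadata layout are assumptions, and the metadata is passed in as a JSON string rather than read from a model directory.

```python
import json


class ToyTokenizer:
    name = "toy_tokenizer"

    @classmethod
    def load(cls, component_meta):
        return cls()


class ToyClassifier:
    name = "toy_classifier"

    def __init__(self, keywords=None):
        self.keywords = keywords or {}

    @classmethod
    def load(cls, component_meta):
        # Restore the component from the entry stored in the metadata.
        return cls(component_meta.get("keywords"))


REGISTRY = {c.name: c for c in (ToyTokenizer, ToyClassifier)}


class ToyInterpreter:
    def __init__(self, pipeline, metadata):
        self.pipeline = pipeline
        self.metadata = metadata

    @classmethod
    def load(cls, metadata_json):
        # Step 1: load the metadata.
        metadata = json.loads(metadata_json)
        # Step 2: build the component sequence in the order the pipeline lists it.
        pipeline = [REGISTRY[entry["name"]].load(entry)
                    for entry in metadata["pipeline"]]
        # Step 3: initialize the interpreter with the pipeline and its metadata.
        return cls(pipeline, metadata)


metadata_json = json.dumps({
    "language": "en",
    "pipeline": [
        {"name": "toy_tokenizer"},
        {"name": "toy_classifier", "keywords": {"hello": "greet"}},
    ],
})
interpreter = ToyInterpreter.load(metadata_json)
```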

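How the Interpreter parses text

1. Wrap the text in a Message
2. Run the Message object through the Component execution sequence
3. Format the contents of the Message object as output

Message uses a map to store every intermediate result under its own key, and that map is finally formatted into the output.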
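The parsing flow above can be sketched as a small pipeline in which each component reads from and writes to a shared message. The classes below are simplified stand-ins written for this example; the keys `tokens` and `intent` and the keyword-matching logic are assumptions, not the library's implementation.

```python
class Message:
    """Wraps the input text and collects results in a dict."""

    def __init__(self, text):
        self.text = text
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def get(self, key, default=None):
        return self.data.get(key, default)

    def as_dict(self):
        result = {"text": self.text}
        result.update(self.data)
        return result


class WhitespaceSplit:
    def process(self, message):
        message.set("tokens", message.text.split())


class KeywordIntent:
    def __init__(self, keywords):
        self.keywords = keywords  # maps a lowercase token to an intent name

    def process(self, message):
        # Relies on the tokens written by an earlier component.
        for token in message.get("tokens", []):
            intent = self.keywords.get(token.lower())
            if intent is not None:
                message.set("intent", {"name": intent, "confidence": 1.0})
                return
        message.set("intent", {"name": None, "confidence": 0.0})


def parse(text, pipeline):
    message = Message(text)          # 1. wrap the text
    for component in pipeline:       # 2. run each component in order
        component.process(message)
    return message.as_dict()         # 3. format the output


pipeline = [WhitespaceSplit(), KeywordIntent({"hello": "greet"})]
result = parse("Hello there", pipeline)
```

Because later components depend on keys written by earlier ones, the order of the pipeline matters: here the classifier would find no tokens if it ran before the tokenizer.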
