A First Look at Rasa

2020-05-11 还闹不闹

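0. Introduction to the Rasa framework

Rasa is an open-source machine learning framework for building contextual AI assistants and chatbots.

Rasa has two main modules:
Rasa NLU: understands user messages by performing intent classification and entity extraction; it turns user input into structured data.
Rasa Core: a dialogue management platform that holds the conversation and decides what to do next.

Rasa X is a tool that helps you build, improve, and deploy AI assistants powered by the Rasa framework. It includes a user interface and a REST API.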

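1. Environment setup

Python 3.7
Install it yourself.
Install Visual Studio 2015
The MITIE installation later on depends on this environment.
Link: https://pan.baidu.com/s/1rWXW_5H_oeBDl_3depTJKw
Extraction code: i6he
Install cmake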

pip install cmake

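Install boost
boost does not ship a prebuilt installer. Download the archive, extract it, and build it yourself. The build depends on VS2015, which was installed above.
On my machine boost is extracted to: G:\App\boost\boost_1_73_0
Installation steps:
Open a command prompt and change to that directory. Running bootstrap.bat generates b2.exe and bjam.exe. Then run b2, with --prefix specifying the install directory.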

cd G:\App\boost\boost_1_73_0
bootstrap.bat
b2 --prefix=G:\boost\bin install

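vswhere and BuildTools
While building boost you may be told that vswhere cannot be found. If so, install vswhere and BuildTools from the link below.
Link: https://pan.baidu.com/s/1FdPZNq9YviTBIYEpEttPXA
Extraction code: 7xk7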

jieba

pip install jieba

Install rasa_nlu and rasa_core

pip install rasa_nlu
pip install rasa_core[tensorflow]

Then install the rasa package as well:

pip install rasa

Rasa_NLU_Chi
Rasa_NLU_Chi is a fork of Rasa_NLU that adds jieba as a Chinese tokenizer, which gives it Chinese support.

git clone https://github.com/crownpku/rasa_nlu_chi.git
cd rasa_nlu_chi
python setup.py install

Install MITIE

pip install git+https://github.com/mit-nlp/MITIE.git
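After working through the installs above, it can help to confirm that each package is visible to the interpreter you will train with. The following is a minimal sketch using only the standard library. The helper name check_modules is made up for illustration, and the import name "mitie" is assumed to be the module exposed by the MITIE Python bindings.

```python
import importlib.util

# Import names of the packages installed above. "mitie" is assumed to be the
# module name of the MITIE bindings; the others match their pip names.
REQUIRED_MODULES = ["jieba", "rasa_nlu", "rasa_core", "rasa", "mitie"]


def check_modules(names=REQUIRED_MODULES):
    """Return {module_name: bool} telling whether each module can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name:10s} {'ok' if found else 'MISSING'}")
```

A module reported as MISSING usually means the corresponding pip or setup.py step ran against a different Python installation.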

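2. Build the project

2.1 Prepare the corpus

A sample corpus is provided:
Link: https://pan.baidu.com/s/163HTSjpFklxUI2OJ_H5TIw
Extraction code: chg5
You can also build your own; Rasa provides an annotation tool, rasa_nlu_trainer. The data looks like this: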

{
  "rasa_nlu_data": {

    "common_examples": [

      {
        "text": "你好",
        "intent": "greet",
        "entities": []
      },

      {
        "text": "最近好吗",
        "intent": "greet",
        "entities": []
      },

      {
        "text": "我想找地方吃饭",
        "intent": "restaurant_search",
        "entities": []
      },

      {
        "text": "我想吃火锅啊",
        "intent": "restaurant_search",
        "entities": [
          {
            "start": 3,
            "end": 5,
            "value": "火锅",
            "entity": "food"
          }
        ]
      },

      {
        "text": "我想吃雪糕啊",
        "intent": "restaurant_search",
        "entities": [
          {
            "start": 3,
            "end": 5,
            "value": "雪糕",
            "entity": "food"
          }
        ]
      },

      {
        "text": "明天天气预报",
        "intent": "search_weather",
        "entities": [
          {
            "start": 0,
            "end": 2,
            "value": "明天",
            "entity": "datatime"
          }
        ]
      },

      {
        "text": "下午会下雨吗",
        "intent": "search_weather",
        "entities": [
          {
            "start": 0,
            "end": 2,
            "value": "下午",
            "entity": "datatime"
          }
        ]
      }

    ]

  }
}
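Entity annotations in this format use character offsets, with start inclusive and end exclusive, so text[start:end] should equal value. Hand-written offsets are easy to get wrong by one, so a quick check before training is worthwhile. The sketch below is one way to do it; the function names are hypothetical and the small sample contains a deliberately wrong offset for demonstration.

```python
import json


def find_bad_entities(nlu_data):
    """Yield (text, entity) pairs whose offsets do not slice out the entity value."""
    for example in nlu_data["rasa_nlu_data"]["common_examples"]:
        text = example["text"]
        for entity in example.get("entities", []):
            if text[entity["start"]:entity["end"]] != entity["value"]:
                yield text, entity


def check_file(path):
    """Load a corpus file such as yuliao/zh_yuliao.json and list its bad entities."""
    with open(path, encoding="utf-8") as f:
        return list(find_bad_entities(json.load(f)))


# Demonstration data: the first entity is deliberately off by one
# (text[2:5] is "吃火锅"), the second one is correct.
sample = {
    "rasa_nlu_data": {
        "common_examples": [
            {"text": "我想吃火锅啊", "intent": "restaurant_search",
             "entities": [{"start": 2, "end": 5, "value": "火锅", "entity": "food"}]},
            {"text": "明天天气预报", "intent": "search_weather",
             "entities": [{"start": 0, "end": 2, "value": "明天", "entity": "datatime"}]},
        ]
    }
}

bad = list(find_bad_entities(sample))
for text, entity in bad:
    print("offset mismatch:", text, entity)
```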

2.2 Training configuration file

config_jieba_mitie_sklearn.yml

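language: "zh"

pipeline:
- name: "MitieNLP"
  model: "total_word_feature_extractor_zh.dat"  # load the MITIE model
- name: "JiebaTokenizer"  # tokenize with jieba
- name: "MitieEntityExtractor"  # MITIE named entity recognition
- name: "EntitySynonymMapper"
- name: "RegexFeaturizer"
- name: "MitieFeaturizer"  # feature extraction
- name: "SklearnIntentClassifier"  # sklearn intent classification model

YAML comments must start with #. Annotations written with // are not valid YAML and will most likely cause an error when the config is loaded; if you see such an error, delete the annotations.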
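To find stray // annotations before training, a small check over the config text is enough. This is a rough sketch with a hypothetical helper name. It uses a simplified view of YAML quoting (no escaped quotes, and any unquoted # is treated as a comment start), which is adequate for a config as simple as the one above.

```python
def find_slash_comments(yaml_text):
    """Return (line_number, line) for lines with '//' outside quotes and # comments."""
    hits = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        quote = None  # the quote character we are currently inside, if any
        for i, ch in enumerate(line):
            if quote:
                if ch == quote:
                    quote = None
            elif ch in "\"'":
                quote = ch
            elif ch == "#":
                break  # the rest of the line is a real YAML comment
            elif line.startswith("//", i):
                hits.append((number, line))
                break
    return hits


config_text = (
    'pipeline:\n'
    '- name: "JiebaTokenizer"//tokenizer\n'
    '- name: "MitieNLP"  # fine, even with // in the comment\n'
    '  model: "a//b.dat"\n'
)
for number, line in find_slash_comments(config_text):
    print(f"line {number}: {line}")
```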

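2.3 Language model

Because the pipeline uses MITIE, a trained MITIE model is required (the Chinese text must be word-segmented first). The MITIE model is trained without supervision, much like the word embeddings in word2vec, and needs a large Chinese corpus. Training it is memory-hungry and very slow, so here we simply use a model file that a community member generated from Chinese Wikipedia and Baidu Baike corpora.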
https://github.com/howl-anderson/MITIE_Chinese_Wikipedia_corpus/releases

2.4 Overall project structure

|---yuliao                                  //corpus
|   |---zh_yuliao.json
|---total_word_feature_extractor_zh.dat     //MITIE model
|---total_word_feature_extractor.dat
|---conf                                    //training configuration
|   |---config_jieba_mitie_sklearn.yml
|---result_models                           //trained model output
|   |---nlu_1_demo_v11
|   |   |---nlu_1_demo_v11

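For comparison, a project scaffolded by rasa init (Rasa 1.x) typically contains:
__init__.py                 an empty file
actions.py                  code for your custom actions
config.yml                  configuration for Rasa NLU and Rasa Core
credentials.yml             details for connecting to other services, e.g. the Rasa API
data/nlu.md                 Rasa NLU training data
data/stories.md             Rasa stories
domain.yml                  the Rasa domain file
endpoints.yml               details for connecting to external messaging channels, e.g. fb messenger
models/<timestamp>.tar.gz   the initial trained model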

3. Train the rasa_nlu model

Open CMD in the project root and run the following command to train.

python -m rasa_nlu.train -c conf\config_jieba_mitie_sklearn.yml --data yuliao\zh_yuliao.json --path result_models --project nlu_1_demo_v11

Parameters:
-c: training configuration file
--data: training corpus
--path: directory where the model is saved
--project: project name
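If you prefer to launch training from a script, the same command can be assembled as an argument list. The following is only a sketch: build_train_command is a made-up helper, and the flags are exactly the four listed above. Going through pathlib means the path separators follow the current platform, so the script is not tied to CMD's backslashes.

```python
import sys
from pathlib import Path


def build_train_command(config, data, path, project):
    """Return the argv list for the rasa_nlu.train invocation described above."""
    return [
        sys.executable, "-m", "rasa_nlu.train",
        "-c", str(Path(config)),
        "--data", str(Path(data)),
        "--path", str(Path(path)),
        "--project", project,
    ]


cmd = build_train_command(
    config="conf/config_jieba_mitie_sklearn.yml",
    data="yuliao/zh_yuliao.json",
    path="result_models",
    project="nlu_1_demo_v11",
)
print(" ".join(cmd))
# To actually train (requires rasa_nlu to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```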

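When training finishes, the model files are saved under the directory given by --path. If a project name was specified (via --project), the model is stored in result_models/project_name/model_**, for example result_models/nlu_1_demo_v11/model_20200511-155530.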
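Since each training run creates a new timestamped directory, you need the latest model name when querying the server. Assuming the directory layout described above, where names of the form model_YYYYMMDD-HHMMSS sort chronologically as plain strings, a hypothetical helper could look like this:

```python
from pathlib import Path


def latest_model(path, project):
    """Return the name of the newest model_* directory of a project, or None."""
    project_dir = Path(path) / project
    if not project_dir.is_dir():
        return None
    # Timestamped names sort chronologically when sorted as strings.
    names = sorted(p.name for p in project_dir.glob("model_*") if p.is_dir())
    return names[-1] if names else None


if __name__ == "__main__":
    print(latest_model("result_models", "nlu_1_demo_v11"))
```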

4. Start the server

python -m rasa_nlu.server -c conf\config_jieba_mitie_sklearn.yml --path result_models

5. Test

Method 1
Open the following URL in a browser.
http://localhost:5000/parse?q=明天天气预报&project=nlu_1_demo_v11&model=model_20200511-155530
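Method 2
Open a new terminal and fetch the result with curl. The single-quoted JSON below works in a POSIX shell such as Git Bash; cmd.exe does not treat single quotes as quoting characters.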

curl -XPOST localhost:5000/parse -d '{"q":"明天天气预报", "project":"nlu_1_demo_v11", "model":"model_20200511-155530"}'
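Both methods can also be driven from Python with the standard library, which avoids shell quoting issues and takes care of URL-encoding the Chinese query text. The sketch below assumes the server from step 4 is listening on localhost:5000. The function names are made up for illustration, and the Content-Type header is an addition that the curl command above does not send.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:5000/parse"


def build_get_url(text, project, model, base_url=BASE_URL):
    """Method 1: the browser URL, with the query parameters percent-encoded."""
    query = urllib.parse.urlencode({"q": text, "project": project, "model": model})
    return f"{base_url}?{query}"


def build_post_request(text, project, model, base_url=BASE_URL):
    """Method 2: the same JSON body that the curl command sends."""
    body = json.dumps({"q": text, "project": project, "model": model}).encode("utf-8")
    return urllib.request.Request(
        base_url, data=body, method="POST",
        headers={"Content-Type": "application/json"},  # assumption, not in the curl call
    )


def parse(text, project, model):
    """Send the POST request and return the decoded JSON response."""
    with urllib.request.urlopen(build_post_request(text, project, model)) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(build_get_url("明天天气预报", "nlu_1_demo_v11", "model_20200511-155530"))
    # With the server running:
    #   print(parse("明天天气预报", "nlu_1_demo_v11", "model_20200511-155530"))
```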