Human-Machine Dialogue System (1)

2019-07-20 zidea

Human-machine dialogue

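NLP (natural language processing) is a field where machine learning is now widely applied. In recent years both Google and Baidu have announced major advances in machine translation, and whenever I open Bing to look something up, I also like to chat with Microsoft's chatbot.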

import nltk
from nltk.stem.lancaster import LancasterStemmer

import numpy
import tflearn
import tensorflow
import random
import json

# Load the intent definitions and print them to check the contents
with open("intents.json", encoding="utf-8") as file:
    data = json.load(file)
print(data)

Preparing the data

{'intents': [{'tag': 'greeting', 'patterns': ['Hi', 'How are you', 'Is anyone there?', 'Hello', 'Good day', 'Whats up'], 'response': ['Hello!', 'Good to see you again', 'Hi there, how can i help?'], 'context_set': ''}]}

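In this format, patterns collects the kinds of things a user might say to start an exchange, and response holds the replies the chatbot can return for them. We use this data to train the chatbot model. At first glance this may look like nothing more than looking up an answer by matching the input, but it is not: after training, the chatbot can give a fitting reply even to input that does not appear in this list.
Note also that each intent carries a tag. The chatbot classifies the user's input to decide which tag it belongs to.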
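Once a tag has been predicted, producing a reply only requires that tag's response list. The following is a minimal sketch, assuming the intents data has the structure printed above (including the key response, as in the sample). The helper pick_response is hypothetical, and the classifier that predicts the tag is not shown:

```python
import random

# Sample data with the same structure as the intents.json shown above
data = {
    "intents": [
        {
            "tag": "greeting",
            "patterns": ["Hi", "How are you", "Is anyone there?", "Hello", "Good day", "Whats up"],
            "response": ["Hello!", "Good to see you again", "Hi there, how can i help?"],
            "context_set": "",
        }
    ]
}

# Map each tag to its list of candidate replies
responses_by_tag = {intent["tag"]: intent["response"] for intent in data["intents"]}


def pick_response(tag, rng=random):
    """Hypothetical helper: return a random reply for an already-predicted tag."""
    return rng.choice(responses_by_tag[tag])


print(pick_response("greeting"))
```

Choosing randomly among several replies keeps the bot from answering the same greeting identically every time.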

Setting up the development environment

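Because tflearn has some problems with Python 3.7, we use Anaconda to create a clean Python 3.6 environment for this application.
After installing Anaconda from its official website, run the following command: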

conda create -n chatbot python=3.6

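Then activate the environment so that the application is developed under Python 3.6:

conda activate chatbot

(On older Anaconda installations on Windows, the command is simply activate chatbot.)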

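Next, install the required dependencies. The first is nltk (the Natural Language Toolkit), a collection of natural language processing tools: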

pip install nltk

We also need to install TensorFlow and tflearn. tflearn provides a higher-level API on top of TensorFlow that makes it easier for developers to build machine learning systems.

Starting development

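import nltk
from nltk.stem.lancaster import LancasterStemmer

# word_tokenize needs NLTK's Punkt tokenizer models; download them once
# (newer NLTK releases look for "punkt_tab" instead of "punkt")
nltk.download("punkt")

stemmer = LancasterStemmer()

import numpy
import tflearn
import tensorflow
import random
import json
import pickle

# Load the intent definitions and print them to check the contents
with open("intents.json", encoding="utf-8") as file:
    data = json.load(file)
    print(data)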

First, load the data from the JSON file and print it.
Next, we need to work out which tag each pattern belongs to.
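One way to picture "which tag each pattern belongs to" is a list of (pattern, tag) pairs. This is a minimal sketch on a hypothetical two-intent sample; the goodbye tag and its patterns are illustrative assumptions, and real data would come from intents.json:

```python
# Hypothetical two-intent sample in the intents.json format
data = {
    "intents": [
        {"tag": "greeting", "patterns": ["Hi", "Hello"], "response": ["Hello!"]},
        {"tag": "goodbye", "patterns": ["cya", "Goodbye"], "response": ["Bye!"]},
    ]
}

# One (pattern, tag) pair per training sentence
pairs = [
    (pattern, intent["tag"])
    for intent in data["intents"]
    for pattern in intent["patterns"]
]

for pattern, tag in pairs:
    print(f"{pattern!r} -> {tag}")
```

Each sentence is then a training example and its tag is the class the model should learn to predict.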

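words = []   # all tokens from all patterns
labels = []  # the tags
docs = []    # the pattern sentences

for intent in data["intents"]:
    for pattern in intent["patterns"]:
        # split the sentence into a list of tokens
        wrds = nltk.word_tokenize(pattern)
        print(wrds)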

First we use nltk's tokenizer to split each pattern (sentence) into a list of words (tokens).
Output:

['Hi']
['How', 'are', 'you']
['Is', 'anyone', 'there', '?']
['Hello']
['Good', 'day']
['Whats', 'up']
['cya']
['see', 'you', 'later']
['Goodbye']
['I', 'am', 'Leaving']
['Have', 'a', 'Good', 'day']
['how', 'old']
['how', 'old', 'is', 'tim']
['Goodbye']
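To make the tokenization step concrete without installing NLTK, here is a rough standard-library approximation: it keeps runs of word characters and splits off each punctuation mark. This is an assumption-laden stand-in, not nltk.word_tokenize's actual algorithm, and the function name simple_tokenize is hypothetical:

```python
import re

# Runs of word characters, or single punctuation marks.
# NOT nltk's algorithm; word_tokenize also handles contractions, quotes, etc.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def simple_tokenize(sentence):
    """Hypothetical helper: split a sentence into word and punctuation tokens."""
    return TOKEN_RE.findall(sentence)


for pattern in ["Hi", "How are you", "Is anyone there?", "Whats up"]:
    print(simple_tokenize(pattern))
```

On these patterns it reproduces the nltk output above. The two differ on contractions: word_tokenize turns "What's" into "What" and "'s", while this regex yields three tokens.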

words.extend(wrds)

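Then we add all the extracted words to the words list. A quick note on the difference between append and extend:
list.append(object) adds object to the end of the list as a single element, so appending a list produces a nested list: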

l1 = [1, 2, 3, 4, 5]
l2 = [1, 2, 3]

l1.append(l2)
print(l1)

Output:

[1, 2, 3, 4, 5, [1, 2, 3]]

list.extend(iterable) adds every element of the iterable to the end of the list:

l1 = [1, 2, 3, 4, 5]
l2 = [1, 2, 3]

l1.extend(l2)
print(l1)

Output:

[1, 2, 3, 4, 5, 1, 2, 3]
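In our loop this difference matters: words should be one flat list of every token, so we use extend, whereas append would store one sub-list per sentence. A small comparison on three of the tokenized patterns above (the variable names words_flat and words_nested are only for illustration):

```python
# Tokenized patterns, as produced by the loop above
tokenized = [["Hi"], ["How", "are", "you"], ["Is", "anyone", "there", "?"]]

words_flat = []    # extend: one flat list of tokens
words_nested = []  # append: one sub-list per sentence

for wrds in tokenized:
    words_flat.extend(wrds)
    words_nested.append(wrds)

print(words_flat)    # ['Hi', 'How', 'are', 'you', 'Is', 'anyone', 'there', '?']
print(words_nested)  # [['Hi'], ['How', 'are', 'you'], ['Is', 'anyone', 'there', '?']]
```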

Next, we also store each tag in labels. The complete loop is:

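words = []   # all tokens from all patterns
labels = []  # each tag, stored once
docs = []    # the pattern sentences

for intent in data["intents"]:
    for pattern in intent["patterns"]:
        wrds = nltk.word_tokenize(pattern)
        words.extend(wrds)   # add the tokens one by one
        docs.append(pattern) # keep the whole sentence

    # record the tag only the first time we see it
    if intent["tag"] not in labels:
        labels.append(intent["tag"])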

With the code above, each sentence from the intents is stored in docs, every word in words, and each tag in labels.
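To see what the three lists end up holding, here is the same loop run on a small hypothetical sample. It is a sketch that assumes a regex tokenizer as a stand-in for nltk.word_tokenize (so it runs without NLTK); the goodbye intent is illustrative, not taken from the author's intents.json:

```python
import re

# Hypothetical sample in the intents.json format
data = {
    "intents": [
        {"tag": "greeting", "patterns": ["Hi", "How are you"], "response": ["Hello!"]},
        {"tag": "goodbye", "patterns": ["cya", "see you later"], "response": ["Bye!"]},
    ]
}


def tokenize(sentence):
    # Stand-in for nltk.word_tokenize: word runs or single punctuation marks
    return re.findall(r"\w+|[^\w\s]", sentence)


words = []   # every token from every pattern (duplicates kept)
labels = []  # each tag once, in first-seen order
docs = []    # the raw pattern sentences

for intent in data["intents"]:
    for pattern in intent["patterns"]:
        words.extend(tokenize(pattern))
        docs.append(pattern)
    if intent["tag"] not in labels:
        labels.append(intent["tag"])

print(words)   # ['Hi', 'How', 'are', 'you', 'cya', 'see', 'you', 'later']
print(docs)    # ['Hi', 'How are you', 'cya', 'see you later']
print(labels)  # ['greeting', 'goodbye']
```

Note that words keeps duplicates ("you" appears twice), and docs on its own does not record which tag each sentence came from.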
