Build Your Own AI-Powered Virtual Assistant on the Web

2022-09-22 程序员DS

Building an accessible Alexa clone with TypeScript and React, powered by the Houndify API

Source: https://cs310.hashnode.dev/build-your-own-ai-powered-virtual-assistant-on-the-web-part1

Setup

We will create this project using create-react-app with the TypeScript template.

npx create-react-app my-assistant --template typescript

Once that finishes, let's install the libraries we need to get started.

npm i houndify jotai react-feather

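This gives us the following:

The Houndify Node SDK
Jotai: a simple state management library (an alternative to the React Context API)
react-feather: a wrapper around the excellent open-source icons from feathericons.com

We will also use Sass to style our components, installed with npm i -D sass.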

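Creating the server

We first need a server to authenticate requests to the Houndify API. The Houndify SDK provides a HoundifyExpress object, which attaches to an Express server and adds the route we need. Create a file named server.js and add the following code to set up the server. Note that it requires express and dotenv, which the install command above does not cover, so install them as well (npm i express dotenv):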

const express = require("express");
const path = require("path");
const houndifyExpress = require("houndify").HoundifyExpress;
const app = express();
require("dotenv").config({ path: "./.env" });

const PORT = process.env.PORT || 8080;

app.use(express.static(path.join(__dirname, "build")));

app.get("/", function (req, res) {
    res.sendFile(path.join(__dirname, "build", "index.html"));
});

app.get(
    "/houndifyAuth",
    houndifyExpress.createAuthenticationHandler({
        clientId: process.env.HOUNDIFY_CLIENT_ID,
        clientKey: process.env.HOUNDIFY_CLIENT_KEY,
    })
);

app.listen(PORT, () => console.log(`Listening on port ${PORT}`));

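Note that the server also serves the index.html file from the build directory. So to deploy this application, we would first run npm run build to generate the static files, then run node server.js to start the server and access the application from there.

Also note the environment variables we are using, which are configured in a .env file that we have not added yet. We do not have a client ID or client key either, so let's get started with the Houndify API.

Using the API dashboard

Create a free developer account on Houndify and create a new client. Once there, enter the application's name and type.

You should now see a page asking which domains you want to enable for the application. We will use ten domains, all from the first page.

Looking through them, you will notice that some domains require client integration, meaning the client must implement extra logic to support the feature. We can always add other domains later, but we do not need them for now.

Click the button that shows the overview and API keys.

Once there, copy the client ID and client key and add them to a .env file as follows: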

HOUNDIFY_CLIENT_ID={YOUR_CLIENT_ID}
HOUNDIFY_CLIENT_KEY={YOUR_CLIENT_SECRET}
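
Because server.js reads both values from process.env, a missing or misspelled entry would likely only surface later as a failed authentication request. One way to catch that early is a small startup check. The sketch below is an assumption rather than part of the Houndify SDK: requireEnv and loadHoundifyCredentials are hypothetical helper names, and only the two variable names come from the .env file above.

```typescript
// A plain map of environment variables, such as process.env in Node.
type Env = Record<string, string | undefined>;

// Hypothetical helper: returns a required variable or throws a descriptive error.
function requireEnv(env: Env, name: string): string {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
        throw new Error(`Missing required environment variable: ${name}`);
    }
    return value;
}

// Hypothetical helper: collects the two Houndify credentials used by server.js.
function loadHoundifyCredentials(env: Env) {
    return {
        clientId: requireEnv(env, "HOUNDIFY_CLIENT_ID"),
        clientKey: requireEnv(env, "HOUNDIFY_CLIENT_KEY"),
    };
}
```

In server.js, which is plain JavaScript, the same logic could be used without the type annotations by calling loadHoundifyCredentials(process.env) right after the dotenv line and passing the result to createAuthenticationHandler.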

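Now we are ready to start building the frontend.

Creating the voice request

First, open the src directory and delete the following files, which we do not need:

logo.svg
setupTests.ts
App.test.tsx

Then, define a function that initializes a voice request to the Houndify API. It takes information about the audio stream, along with handlers for the various events.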

// lib/initVoiceRequest.ts
import { RequestHandlers } from "./types";

export default function initVoiceRequest(
    recorder: any,
    conversationState: object,
    handlers: RequestHandlers
) {
    // @ts-ignore (2339)
    const voiceRequest = new window.Houndify.VoiceRequest({
        //Your Houndify Client ID
        clientId: "{YOUR_CLIENT_ID}",

        authURL: "/houndifyAuth",

        //REQUEST INFO JSON
        //See https://houndify.com/reference/RequestInfo
        requestInfo: {
            UserID: "test_user",
            //See https://www.latlong.net/ for your own coordinates
            Latitude: 37.388309,
            Longitude: -121.973968,
        },

        //Pass the current ConversationState stored from previous queries
        //See https://www.houndify.com/docs#conversation-state
        conversationState,

        //Sample rate of input audio
        sampleRate: recorder.sampleRate,

        //Enable Voice Activity Detection
        //Default: true
        enableVAD: true,

        //Partial transcript, response and error handlers
        onTranscriptionUpdate: handlers.onTranscriptionUpdate,
        onResponse: function (response: any, info: any) {
            recorder.stop();
            handlers.onResponse(response, info);
        },
        onError: function (err: any, info: any) {
            recorder.stop();
            handlers.onError(err, info);
        },
    });

    return voiceRequest;
}

Let's break down this code:

RequestHandlers: a type interface for the functions that respond to the state of the voice request

// lib/types.ts
export interface RequestHandlers {
  onResponse(response: any, info: any): void;
  onTranscriptionUpdate(transcript: any): void;
  onError(err: any, info: any): void;
}

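conversationState: see the conversation state documentation linked in the code comment above
recorder: an AudioRecorder object from the SDK. It captures the audio stream from the user's microphone. We have to use the any type here because the SDK does not ship type definitions (a pity!)
enableVAD: when enabled, the request is sent as soon as the user stops speaking
ts-ignore: suppresses the TypeScript error (2339) raised by accessing window.Houndify, which holds a reference to the Houndify browser SDK. We can include this SDK from a CDN by adding the following to public/index.html: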
```
<script src="https://unpkg.com/houndify@3.1.12/dist/houndify.js"></script>
```
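
The requestInfo object in initVoiceRequest hard-codes a user ID and a pair of coordinates. If you swap in your own values, a small builder can guard against out-of-range coordinates before they reach the API. This is only a sketch: buildRequestInfo and HoundifyRequestInfo are hypothetical names, and the three fields are the ones used in the code above, not the full RequestInfo schema.

```typescript
// The requestInfo fields used in initVoiceRequest (not the complete schema).
interface HoundifyRequestInfo {
    UserID: string;
    Latitude: number;
    Longitude: number;
}

// Hypothetical helper: validates coordinates before they are sent with a request.
function buildRequestInfo(
    userId: string,
    latitude: number,
    longitude: number
): HoundifyRequestInfo {
    if (!isFinite(latitude) || latitude < -90 || latitude > 90) {
        throw new RangeError(`Latitude out of range: ${latitude}`);
    }
    if (!isFinite(longitude) || longitude < -180 || longitude > 180) {
        throw new RangeError(`Longitude out of range: ${longitude}`);
    }
    return { UserID: userId, Latitude: latitude, Longitude: longitude };
}
```

The result could then be passed as the requestInfo option, for example buildRequestInfo("test_user", 37.388309, -121.973968).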

Displaying the voice input

Now create a file named VoiceInput.tsx in the src directory and add the following code:

import { useAtom } from "jotai";
import { useRef } from "react";
import { Mic, MicOff } from "react-feather";
import { recorderAtom, recordingAtom } from "./lib/store";
import styles from "./VoiceInput.module.scss";

interface VoiceInputProps {
    transcription: string;
}

export default function VoiceInput({ transcription }: VoiceInputProps) {
    const [recorder] = useAtom(recorderAtom);
    const [recording] = useAtom(recordingAtom);

    const onClickMic = () => {
        if (recorder && recorder.isRecording()) {
            recorder.stop();
            return;
        }

        recorder.start();
    };

    const Icon = recording ? MicOff : Mic;

    return (
        <div className={styles.inputContainer}>
            <button
                type="button"
                title={`${recording ? "Stop" : "Start"} voice query`}
                onClick={onClickMic}
            >
                <Icon size={64} color="#343434" />
            </button>
            <div>
                <div className={styles.transcript}>{transcription}</div>
            </div>
        </div>
    );
}

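This component does the following:

Takes the transcription string of what the user is currently saying
Uses the atoms defined in lib/store.ts. They give us access to our AudioRecorder object and a recording boolean, which indicates whether the user's microphone is being captured
Toggles the recorder when the button is clicked, after which the button's icon updates accordingly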
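
The toggle behaviour can be isolated from React and tried on its own. The sketch below assumes a minimal recorder shape, RecorderLike, based only on the three methods the component calls. toggleRecorder and createFakeRecorder are hypothetical names, and the fake recorder merely stands in for the SDK's AudioRecorder. Unlike the component, this version also tolerates a recorder that has not been created yet.

```typescript
// Assumed minimal shape: only the methods VoiceInput calls on the recorder.
interface RecorderLike {
    isRecording(): boolean;
    start(): void;
    stop(): void;
}

// Toggles the recorder and reports which action was taken.
// Returns "none" when the recorder has not been created yet.
function toggleRecorder(
    recorder: RecorderLike | null
): "started" | "stopped" | "none" {
    if (!recorder) return "none";
    if (recorder.isRecording()) {
        recorder.stop();
        return "stopped";
    }
    recorder.start();
    return "started";
}

// In-memory stand-in for the real recorder, for illustration only.
function createFakeRecorder(): RecorderLike {
    let active = false;
    return {
        isRecording: () => active,
        start: () => {
            active = true;
        },
        stop: () => {
            active = false;
        },
    };
}
```

Inside the component, onClickMic could simply call toggleRecorder(recorder).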

We can now add the stylesheet, src/VoiceInput.module.scss, with the following code:

$backgroundColor: #e8e1d3;
$complimentColor: #efe8e7;

.inputContainer {
    display: flex;
    flex-direction: column;
    align-items: center;
    width: 100%;

    button {
        border-radius: 50%;
        padding: 20px;
        background: transparent;
        border: 3px solid black;

        &:hover {
            cursor: pointer;
            background: $complimentColor;
        }
    }
}

.transcript {
    background-color: lighten($complimentColor, 5%);
    border-radius: 10px;
    display: flex;
    justify-content: center;
    align-items: center;
    height: 30px;
    margin-top: 20px;
    flex: 1;
    min-width: 33.3vw;
    padding: 5px 10px;
}

We can define our atoms in the src/lib/store.ts file:

import { atom } from "jotai";

export const recorderAtom = atom<any>(null);
export const recordingAtom = atom(false);

Building the App component

Replace the contents of src/App.tsx with the following code:

import { useCallback, useEffect, useRef, useState } from "react";
import styles from "./App.module.scss";
import initVoiceRequest from "./lib/initVoiceRequest";
import VoiceInput from "./VoiceInput";
import { useAtom } from "jotai";
import { recorderAtom, recordingAtom } from "./lib/store";

function App() {
    // Keep hold of the state
    const conversationState = useRef<any>(null);

    // Holds what the user is currently saying
    const [transcription, setTranscription] = useState("");

    // Any errors from the voice request will be stored here
    const [error, setError] = useState("");

    const [recorder, setRecorder] = useAtom(recorderAtom);
    const [recording, _setRecording] = useAtom(recordingAtom);

    const setRecording = (value: boolean) => {
        ...
        _setRecording(value);
    };
    ...

    return (
        <div className={styles.root}>
            <h1 className={styles.h1}>Assist310</h1>
            <VoiceInput transcription={transcription} />
            {error && <div className={styles.errorContainer}>{error}</div>}
        </div>
    );
}

export default App;

First, define our voice request handler functions:

const onResponse = useCallback((response: any, info: any) => {
    if (response.AllResults && response.AllResults.length) {
        const result = response.AllResults[0];
        conversationState.current = result.ConversationState;
        handleResult(result);
        setTranscription("");
    }
}, []);

const onTranscriptionUpdate = useCallback((transcript: any) => {
    setTranscription(transcript.PartialTranscript);
}, []);

const onError = useCallback((error: any, info: any) => {
    setError(JSON.stringify(error));
}, []);

const handleResult = (result: any) => {
    // We'll add more here later
};

The format of the server response is described in the Houndify documentation.
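
Since the handlers receive the response as any, it can help to describe just the fields this article reads and to pull out the first result in one place. The following is a minimal sketch: HoundifyResult, HoundifyResponse and firstResult are hypothetical names, the interfaces list only fields that appear in this article's code, and real responses carry many more.

```typescript
// Only the fields this article reads from a result; real results contain more.
interface HoundifyResult {
    ConversationState?: object;
    SpokenResponseLong?: string;
}

// Only the top-level field this article reads from a response.
interface HoundifyResponse {
    AllResults?: HoundifyResult[];
}

// Returns the first result, or null when the response has none.
function firstResult(
    response: HoundifyResponse | null | undefined
): HoundifyResult | null {
    if (!response || !response.AllResults || response.AllResults.length === 0) {
        return null;
    }
    return response.AllResults[0];
}
```

With a helper like this, onResponse could call firstResult(response) and return early on null instead of checking AllResults inline.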

We can now create a mount effect that initializes the AudioRecorder object and then binds its events to a newly initialized VoiceRequest object:

useEffect(() => {
    // @ts-ignore (2339)
    const audioRecorder = new window.Houndify.AudioRecorder();
    setRecorder(audioRecorder);

    let voiceRequest: any;

    audioRecorder.on("start", () => {
        setRecording(true);
        voiceRequest = initVoiceRequest(
            audioRecorder,
            conversationState.current,
            {
                onResponse,
                onTranscriptionUpdate,
                onError,
            }
        );
    });

    audioRecorder.on("data", (data: any) => {
        voiceRequest.write(data);
    });

    audioRecorder.on("end", () => {
        voiceRequest.end();
        setRecording(false);
    });

    audioRecorder.on("error", () => {
        // voiceRequest is still undefined if the error fires before "start"
        voiceRequest?.abort();
        setRecording(false);
    });
}, []);
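
The event wiring in this effect can be exercised without a microphone by replacing the recorder with a tiny in-memory emitter. The sketch below is an illustration under assumptions: RecorderEvents and VoiceRequestLike describe only the methods the effect calls, and bindRecorder and createFakeEmitter are hypothetical names, not SDK functions. It also guards the case where an event arrives before any request exists.

```typescript
type RecorderListener = (payload?: unknown) => void;

// Assumed minimal shapes: only the methods the effect calls.
interface RecorderEvents {
    on(event: string, listener: RecorderListener): void;
}
interface VoiceRequestLike {
    write(data: unknown): void;
    end(): void;
    abort(): void;
}

// Wires recorder events to a fresh voice request for each recording session.
function bindRecorder(
    recorder: RecorderEvents,
    createRequest: () => VoiceRequestLike,
    setRecording: (value: boolean) => void
): void {
    let request: VoiceRequestLike | undefined;

    recorder.on("start", () => {
        setRecording(true);
        request = createRequest();
    });
    recorder.on("data", (data) => {
        if (request) request.write(data);
    });
    recorder.on("end", () => {
        if (request) request.end();
        request = undefined;
        setRecording(false);
    });
    recorder.on("error", () => {
        // An error can fire before "start", when no request exists yet.
        if (request) request.abort();
        request = undefined;
        setRecording(false);
    });
}

// Tiny in-memory emitter standing in for the AudioRecorder, for illustration.
function createFakeEmitter() {
    const listeners: { [event: string]: RecorderListener[] } = {};
    return {
        on(event: string, listener: RecorderListener) {
            (listeners[event] = listeners[event] || []).push(listener);
        },
        emit(event: string, payload?: unknown) {
            (listeners[event] || []).forEach((listener) => listener(payload));
        },
    };
}
```

Emitting "start", "data" and "end" on the fake emitter walks through one full session, which makes the order of calls on the voice request easy to inspect.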

First preview

First, add our global styles to src/index.css:

body {
    margin: 0;
    font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", "Roboto",
        "Oxygen", "Ubuntu", "Cantarell", "Fira Sans", "Droid Sans",
        "Helvetica Neue", sans-serif;
    -webkit-font-smoothing: antialiased;
    -moz-osx-font-smoothing: grayscale;
    width: 100%;
    height: 100%;
    position: fixed;
}

#root {
    width: 100%;
    height: 100%;
}

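This makes our main div element fill the entire page and keeps all of the content within a single view.

To let the create-react-app development server know about our other server, we need to add a proxy entry to our package.json file: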

{
    ...
    "proxy": "http://localhost:8080",
    ...
}

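Now open two terminal windows in the project directory. Run node server.js in one window and npm start in the other.

If all goes well, the Express server should log that it is listening on port 8080, and the development server should report its own port. Open the latter in your browser and you should see the Assist310 heading and the microphone button.

If you click the button and speak a command, the client sends the request to the server and, in the meantime, displays the partial transcription. Nothing else happens after that, though, so let's add output to the application.

Adding text-to-speech

We can present either the written response or the spoken response from the server. Let's take the spoken response and implement TTS with the Web Speech API so that the application can "speak" its replies.

Setting it up is as simple as first adding the following to the top of the App.tsx file: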

const speech = new SpeechSynthesisUtterance();

// Set to your language code
speech.lang = "en";

const say = (text: string) => {
    speech.text = text;
    window.speechSynthesis.speak(speech);
};

Then add this code to our handleResult function:

const handleResult = (result: any) => {
    // We'll add more here later
    say(result["SpokenResponseLong"]);
};
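
A result may not always carry a usable SpokenResponseLong, in which case say would be handed undefined or an empty string. One possible safeguard is to pick the text first and skip speech when nothing is available. This is a sketch under assumptions: pickSpokenText and SpeakableResult are hypothetical names, and the fallback to WrittenResponseLong assumes that field is present in results, which you should verify against the response format.

```typescript
// Text fields of a result. WrittenResponseLong is an assumption here;
// check the response format before relying on it.
interface SpeakableResult {
    SpokenResponseLong?: string;
    WrittenResponseLong?: string;
}

// Chooses the text to read aloud, or null when there is nothing to say.
function pickSpokenText(result: SpeakableResult): string | null {
    const candidates = [result.SpokenResponseLong, result.WrittenResponseLong];
    for (let i = 0; i < candidates.length; i++) {
        const text = candidates[i];
        if (typeof text === "string" && text.trim() !== "") {
            return text.trim();
        }
    }
    return null;
}
```

In handleResult, this could be used as const text = pickSpokenText(result); followed by if (text) say(text);.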

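That's all there is to it!

Audio feedback

We can also add some audible feedback when the user presses the main button. We can play audio files with the Howler.js library, which we can install with the following line (in a TypeScript project you will likely also want its type definitions, npm i -D @types/howler):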

npm i howler

Then create a file, lib/playSound.ts, with a function that plays any audio source:

import { Howler, Howl, HowlOptions } from "howler";

export default function playSound(
    src: string,
    options?: Omit<HowlOptions, "src">
) {
    Howler.stop();
    new Howl({
        src,
        ...options,
    }).play();
}

Now import everything we need at the top of the App.tsx file, as shown below:

import { Howl } from "howler";

import startSound from "./audio/start.wav";
import stopSound from "./audio/stop.wav";
import playSound from "./lib/playSound";

const sources = {
    start: startSound,
    stop: stopSound,
};

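Download the microphone cue sounds and add them to the src/audio folder under their respective names, start.wav and stop.wav.

Then add the following line to the setRecording function: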

playSound(sources[value ? "start" : "stop"]);

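Now open the browser. You should hear the cue sounds play, followed by the spoken response.

Bonus: visualizing the audio input

We can use the Wave.js library to visualize the audio from the user's microphone. Install it with the following line: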

npm i https://github.com/WoolDoughnut310/Wave.js

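Make sure to install it from my repository rather than from NPM, because I made minor changes to make it work with our AudioRecorder object. I have opened a pull request but am still waiting for a reply from the library's author.

Open src/VoiceInput.tsx and add the following import statement: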

import { Wave } from "@foobar404/wave";

Then add the following code to the end of the onClickMic function we created earlier:

recorder.on("start", () => {
    if (canvasEl.current) {
        let wave = new Wave(
            {
                source: recorder.source as MediaElementAudioSourceNode,
                context: recorder.audioCtx as AudioContext,
            },
            canvasEl.current
        );
        wave.addAnimation(
            new wave.animations.Lines({
                top: true,
            })
        );
    }
});

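Now run the code, and we should be on par with the demo shown at the start.

Final notes

If you have any questions, feel free to leave a comment. If you enjoyed this web-based Jarvis clone, be sure to share it with your friends. Next time, we will add extra features such as music playback and song recognition. As before, I have left the code on my GitHub, so enjoy.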