Speech recognition on iOS with the native Speech framework

2019-06-10  智狸

I. Introduction

In 2016, alongside the release of iOS 10, Apple shipped the Speech framework (often referred to as Speech Kit), the same speech recognition technology behind the famous Siri. With it, we can convert speech to text with very little code. Below is a brief walkthrough of how to use it.

II. Implementation

1. Requesting user authorization

First, import the Speech framework:

#import <Speech/Speech.h>

Requesting authorization is straightforward. Add the following code before recognition starts (e.g. in viewDidAppear:):

- (void)viewDidAppear:(BOOL)animated {
    [super viewDidAppear:animated];
    __weak typeof(self) weakSelf = self;
    [SFSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus status) {
        // The callback may arrive on a background queue; hop to the main queue for UI work.
        dispatch_async(dispatch_get_main_queue(), ^{
            switch (status) {
                case SFSpeechRecognizerAuthorizationStatusNotDetermined:
                    weakSelf.recordButton.enabled = NO;
                    [weakSelf.recordButton setTitle:@"Speech recognition not yet authorized" forState:UIControlStateNormal];
                    break;
                case SFSpeechRecognizerAuthorizationStatusDenied:
                    weakSelf.recordButton.enabled = NO;
                    [weakSelf.recordButton setTitle:@"User denied speech recognition" forState:UIControlStateNormal];
                    break;
                case SFSpeechRecognizerAuthorizationStatusRestricted:
                    weakSelf.recordButton.enabled = NO;
                    [weakSelf.recordButton setTitle:@"Speech recognition is restricted on this device" forState:UIControlStateNormal];
                    break;
                case SFSpeechRecognizerAuthorizationStatusAuthorized:
                    weakSelf.recordButton.enabled = YES;
                    [weakSelf.recordButton setTitle:@"Start Recording" forState:UIControlStateNormal];
                    break;
                default:
                    break;
            }
        });
    }];
}

If the app crashes when you run it now, that is because since iOS 10 you must declare microphone and speech recognition usage descriptions in the Info.plist file:

Privacy - Speech Recognition Usage Description: Please allow speech recognition

Privacy - Microphone Usage Description: Please allow microphone access
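In raw source form, the two entries above correspond to the following keys in Info.plist (the description strings shown are examples; write your own user-facing explanations):

```xml
<key>NSSpeechRecognitionUsageDescription</key>
<string>Please allow speech recognition</string>
<key>NSMicrophoneUsageDescription</key>
<string>Please allow microphone access</string>
```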

Run the project and you will be prompted to grant speech recognition and microphone access. The authorization step is now complete.
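Besides the request callback, you can query the current authorization state at any time with the +authorizationStatus class method (a minimal sketch, useful for guarding recognition entry points):

```objectivec
#import <Speech/Speech.h>

// Returns YES only when the user has already granted speech recognition access.
static BOOL SpeechRecognitionIsAuthorized(void) {
    return [SFSpeechRecognizer authorizationStatus] == SFSpeechRecognizerAuthorizationStatusAuthorized;
}
```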

2. Initializing the speech recognition engine

#pragma mark - property

- (AVAudioEngine *)audioEngine {
    if (!_audioEngine) {
        _audioEngine = [[AVAudioEngine alloc] init];
    }
    return _audioEngine;
}

- (SFSpeechRecognizer *)speechRecognizer {
    if (!_speechRecognizer) {
        // The recognizer needs a locale; here we use Mandarin Chinese.
        NSLocale *locale = [[NSLocale alloc] initWithLocaleIdentifier:@"zh_CN"];
        _speechRecognizer = [[SFSpeechRecognizer alloc] initWithLocale:locale];
        _speechRecognizer.delegate = self;
    }
    return _speechRecognizer;
}

#pragma mark - SFSpeechRecognizerDelegate

// Called when the recognizer's availability changes.
- (void)speechRecognizer:(SFSpeechRecognizer *)speechRecognizer availabilityDidChange:(BOOL)available {
    if (available) {
        self.recordButton.enabled = YES;
        [self.recordButton setTitle:@"Start Recording" forState:UIControlStateNormal];
    } else {
        self.recordButton.enabled = NO;
        [self.recordButton setTitle:@"Speech recognition unavailable" forState:UIControlStateNormal];
    }
}

1. Initializing SFSpeechRecognizer requires an NSLocale object identifying the language to recognize, e.g. "zh_CN" for Mandarin Chinese or "en_US" for American English.

2. AVAudioEngine is the audio engine used to capture the microphone input.
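Not every locale is supported on every device. You can enumerate the supported locales with the +supportedLocales class method (a small sketch; logSupportedLocales is a hypothetical helper you might add to your view controller):

```objectivec
#import <Speech/Speech.h>

// Log every locale the Speech framework can recognize, and check whether
// a recognizer for zh_CN is currently usable.
- (void)logSupportedLocales {
    for (NSLocale *locale in [SFSpeechRecognizer supportedLocales]) {
        NSLog(@"Supported: %@", locale.localeIdentifier);
    }
    NSLocale *zh = [[NSLocale alloc] initWithLocaleIdentifier:@"zh_CN"];
    SFSpeechRecognizer *recognizer = [[SFSpeechRecognizer alloc] initWithLocale:zh];
    // isAvailable also reflects network reachability, not just locale support.
    NSLog(@"zh_CN recognizer available: %d", recognizer.isAvailable);
}
```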

3. Starting the speech recognition engine

Add the following code:

- (IBAction)recordButtonClicked {
    if ([self.audioEngine isRunning]) {
        [self endRecording];
        [self.recordButton setTitle:@"Stopping" forState:UIControlStateDisabled];
    } else {
        [self startRecording];
        [self.recordButton setTitle:@"Stop Recording" forState:UIControlStateNormal];
    }
}

- (void)startRecording {
    if (_recognitionTask) {
        [_recognitionTask cancel];
        _recognitionTask = nil;
    }

    // Configure the audio session for recording.
    AVAudioSession *audioSession = [AVAudioSession sharedInstance];
    NSError *error = nil;
    [audioSession setCategory:AVAudioSessionCategoryRecord error:&error];
    NSParameterAssert(!error);
    [audioSession setMode:AVAudioSessionModeMeasurement error:&error];
    NSParameterAssert(!error);
    [audioSession setActive:YES withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:&error];
    NSParameterAssert(!error);

    _recognitionRequest = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
    AVAudioInputNode *inputNode = [self.audioEngine inputNode];
    NSAssert(inputNode, @"Audio input device is not ready");
    NSAssert(_recognitionRequest, @"Failed to create recognition request");

    // Return partial results as they are produced, not just the final one.
    _recognitionRequest.shouldReportPartialResults = YES;

    __weak typeof(self) weakSelf = self;
    _recognitionTask = [self.speechRecognizer recognitionTaskWithRequest:_recognitionRequest resultHandler:^(SFSpeechRecognitionResult * _Nullable result, NSError * _Nullable error) {
        __strong typeof(weakSelf) strongSelf = weakSelf;
        BOOL isFinal = NO;
        if (result) {
            NSLog(@"%@", result.bestTranscription.formattedString);
            strongSelf.resultStringLabel.text = result.bestTranscription.formattedString;
            isFinal = result.isFinal;
        }
        if (error || isFinal) {
            [strongSelf.audioEngine stop];
            [inputNode removeTapOnBus:0];
            strongSelf.recognitionTask = nil;
            strongSelf.recognitionRequest = nil;
            strongSelf.recordButton.enabled = YES;
            [strongSelf.recordButton setTitle:@"Start Recording" forState:UIControlStateNormal];
        }
    }];

    AVAudioFormat *recordingFormat = [inputNode outputFormatForBus:0];
    // Remove any previous tap before installing a new one, or the engine may throw.
    [inputNode removeTapOnBus:0];
    [inputNode installTapOnBus:0 bufferSize:1024 format:recordingFormat block:^(AVAudioPCMBuffer * _Nonnull buffer, AVAudioTime * _Nonnull when) {
        __strong typeof(weakSelf) strongSelf = weakSelf;
        if (strongSelf.recognitionRequest) {
            [strongSelf.recognitionRequest appendAudioPCMBuffer:buffer];
        }
    }];

    [self.audioEngine prepare];
    [self.audioEngine startAndReturnError:&error];
    NSParameterAssert(!error);
    self.resultStringLabel.text = LoadingText;
}

1. Use the AVAudioSession object to configure audio recording.

2. Before recognition produces a final result it may produce several intermediate ones; setting the shouldReportPartialResults property of the SFSpeechAudioBufferRecognitionRequest to YES means each intermediate result is returned as soon as it is available.

3. Set the recording format and install a tap that handles the audio stream (appending each buffer to self.recognitionRequest).

4. Wire up the tap handler for self.recordButton's click action.

5. Start recording audio.

6. Update the button title.

4. Resetting the speech recognition engine

Add the following code:

- (void)endRecording {
    [self.audioEngine stop];
    if (_recognitionRequest) {
        [_recognitionRequest endAudio];
    }
    if (_recognitionTask) {
        [_recognitionTask cancel];
        _recognitionTask = nil;
    }
    self.recordButton.enabled = NO;
    if ([self.resultStringLabel.text isEqualToString:LoadingText]) {
        self.resultStringLabel.text = @"";
    }
}

1. Disable self.recordButton so it cannot be tapped while stopping.

2. Stop the audio engine.

3. End the audio stream and cancel the recognition task.

4. Update the button title.

5. Speech recognition result callbacks

Below is the API description from the SFSpeechRecognizer header:

// Recognize speech utterance with a request
// If request.shouldReportPartialResults is true, result handler will be called
// repeatedly with partial results, then finally with a final result or an error.
- (SFSpeechRecognitionTask *)recognitionTaskWithRequest:(SFSpeechRecognitionRequest *)request
                                          resultHandler:(void (^)(SFSpeechRecognitionResult * __nullable result, NSError * __nullable error))resultHandler;

// Advanced API: Recognize a custom request with a delegate
// The delegate will be weakly referenced by the returned task
- (SFSpeechRecognitionTask *)recognitionTaskWithRequest:(SFSpeechRecognitionRequest *)request
                                               delegate:(id <SFSpeechRecognitionTaskDelegate>)delegate;

Results can be delivered in two ways: via a delegate or via a block. For simplicity, this article uses the block-based callback.
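For reference, the delegate-based alternative looks roughly like this (a sketch only; it assumes the same view controller that already owns the speechRecognizer, recognitionRequest, resultStringLabel, and recordButton properties used earlier, and the delegate methods shown belong to SFSpeechRecognitionTaskDelegate):

```objectivec
// Adopt SFSpeechRecognitionTaskDelegate instead of passing a result handler block.
@interface ViewController () <SFSpeechRecognitionTaskDelegate>
@end

@implementation ViewController (TaskDelegate)

- (void)startDelegateBasedTask {
    // The task weakly references the delegate, as the header comment notes.
    [self.speechRecognizer recognitionTaskWithRequest:self.recognitionRequest delegate:self];
}

#pragma mark - SFSpeechRecognitionTaskDelegate

// Called repeatedly with partial transcriptions (the delegate analogue of
// shouldReportPartialResults in the block-based API).
- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task didHypothesizeTranscription:(SFTranscription *)transcription {
    self.resultStringLabel.text = transcription.formattedString;
}

// Called once with the final recognition result.
- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task didFinishRecognition:(SFSpeechRecognitionResult *)recognitionResult {
    self.resultStringLabel.text = recognitionResult.bestTranscription.formattedString;
}

// Called when the task completes, successfully or not.
- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task didFinishSuccessfully:(BOOL)successfully {
    self.recordButton.enabled = YES;
}

@end
```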

6. Recognizing an audio file

Add the following code:

/**
 Recognize a local audio file.
 */
- (IBAction)recognizeLocalAudioFile {
    NSLocale *locale = [[NSLocale alloc] initWithLocaleIdentifier:@"zh_CN"];
    SFSpeechRecognizer *localRecognizer = [[SFSpeechRecognizer alloc] initWithLocale:locale];
    NSURL *url = [[NSBundle mainBundle] URLForResource:@"录音.m4a" withExtension:nil];
    if (!url) return;
    SFSpeechURLRecognitionRequest *request = [[SFSpeechURLRecognitionRequest alloc] initWithURL:url];
    __weak typeof(self) weakSelf = self;
    [localRecognizer recognitionTaskWithRequest:request resultHandler:^(SFSpeechRecognitionResult * _Nullable result, NSError * _Nullable error) {
        if (error) {
            NSString *errMsg = [NSString stringWithFormat:@"Speech recognition failed: %@", error];
            [BaseViewController hudWithTitle:errMsg];
            NSLog(@"%@", errMsg);
        } else {
            weakSelf.resultStringLabel.text = result.bestTranscription.formattedString;
        }
    }];
}

1. Initialize the SFSpeechRecognizer.

2. Obtain the audio file's URL.

3. Create an SFSpeechURLRecognitionRequest for the file.

4. Handle the result in the callback.

III. Summary

This article showed how to convert audio to text with the Speech framework built into iOS. The framework is quite powerful; this article only covered the basics of live microphone recognition and audio file recognition. If you are interested, dig deeper, and feel free to discuss any questions.

Demo: https://github.com/jayZhangh/PhotosFrameworkBasicUsage.git

IV. References

https://swift.gg/2016/09/30/siri-speech-framework/

https://developer.apple.com/videos/play/wwdc2016/509/

https://www.raywenderlich.com/2422-building-an-ios-app-like-siri
