Speech recognition on iOS with the native Speech framework

2019-06-10  智狸

I. Introduction

In 2016, alongside the release of iOS 10, Apple shipped the Speech framework (often referred to as Speech Kit), the same speech recognition technology behind the famous Siri. With it, we can convert speech to text with very little code. Below is a brief walkthrough of how to use it.

II. Implementation

1. Requesting user authorization

First, import the Speech framework:

#import <Speech/Speech.h>

Requesting authorization is straightforward. Add the following code before recognition starts (e.g. in viewDidAppear:):

- (void)viewDidAppear:(BOOL)animated {
    [super viewDidAppear:animated];
    __weak typeof(self) weakSelf = self;
    [SFSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus status) {
        // The callback may arrive on a background queue; hop to the main queue for UI work.
        dispatch_async(dispatch_get_main_queue(), ^{
            switch (status) {
                case SFSpeechRecognizerAuthorizationStatusNotDetermined:
                    weakSelf.recordButton.enabled = NO;
                    [weakSelf.recordButton setTitle:@"Speech recognition not yet authorized" forState:UIControlStateNormal];
                    break;
                case SFSpeechRecognizerAuthorizationStatusDenied:
                    weakSelf.recordButton.enabled = NO;
                    [weakSelf.recordButton setTitle:@"User denied speech recognition" forState:UIControlStateNormal];
                    break;
                case SFSpeechRecognizerAuthorizationStatusRestricted:
                    weakSelf.recordButton.enabled = NO;
                    [weakSelf.recordButton setTitle:@"Speech recognition is restricted on this device" forState:UIControlStateNormal];
                    break;
                case SFSpeechRecognizerAuthorizationStatusAuthorized:
                    weakSelf.recordButton.enabled = YES;
                    [weakSelf.recordButton setTitle:@"Start Recording" forState:UIControlStateNormal];
                    break;
                default:
                    break;
            }
        });
    }];
}

If the app crashes when you run it now, that is because since iOS 10 you must declare microphone and speech recognition usage descriptions in the Info.plist file:

Privacy - Speech Recognition Usage Description: Please allow speech recognition

Privacy - Microphone Usage Description: Please allow microphone access
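In raw source form, the two entries above correspond to the following keys in Info.plist (the description strings shown are examples; write your own user-facing explanations):

```xml
<key>NSSpeechRecognitionUsageDescription</key>
<string>Please allow speech recognition</string>
<key>NSMicrophoneUsageDescription</key>
<string>Please allow microphone access</string>
```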

Run the project and you will be prompted to grant speech recognition and microphone access. The authorization step is now complete.
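Besides the request callback, you can query the current authorization state at any time with the +authorizationStatus class method (a minimal sketch, useful for guarding recognition entry points):

```objectivec
#import <Speech/Speech.h>

// Returns YES only when the user has already granted speech recognition access.
static BOOL SpeechRecognitionIsAuthorized(void) {
    return [SFSpeechRecognizer authorizationStatus] == SFSpeechRecognizerAuthorizationStatusAuthorized;
}
```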

2. Initializing the speech recognition engine

#pragma mark - property

- (AVAudioEngine *)audioEngine {
    if (!_audioEngine) {
        _audioEngine = [[AVAudioEngine alloc] init];
    }
    return _audioEngine;
}

- (SFSpeechRecognizer *)speechRecognizer {
    if (!_speechRecognizer) {
        // The recognizer needs a locale; here we use Mandarin Chinese.
        NSLocale *locale = [[NSLocale alloc] initWithLocaleIdentifier:@"zh_CN"];
        _speechRecognizer = [[SFSpeechRecognizer alloc] initWithLocale:locale];
        _speechRecognizer.delegate = self;
    }
    return _speechRecognizer;
}

#pragma mark - SFSpeechRecognizerDelegate

// Called when the recognizer's availability changes.
- (void)speechRecognizer:(SFSpeechRecognizer *)speechRecognizer availabilityDidChange:(BOOL)available {
    if (available) {
        self.recordButton.enabled = YES;
        [self.recordButton setTitle:@"Start Recording" forState:UIControlStateNormal];
    } else {
        self.recordButton.enabled = NO;
        [self.recordButton setTitle:@"Speech recognition unavailable" forState:UIControlStateNormal];
    }
}

1. Initializing SFSpeechRecognizer requires an NSLocale object identifying the language to recognize, e.g. "zh_CN" for Mandarin Chinese or "en_US" for American English.

2. AVAudioEngine is the audio engine used to capture the microphone input.
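Not every locale is supported on every device. You can enumerate the supported locales with the +supportedLocales class method (a small sketch; logSupportedLocales is a hypothetical helper you might add to your view controller):

```objectivec
#import <Speech/Speech.h>

// Log every locale the Speech framework can recognize, and check whether
// a recognizer for zh_CN is currently usable.
- (void)logSupportedLocales {
    for (NSLocale *locale in [SFSpeechRecognizer supportedLocales]) {
        NSLog(@"Supported: %@", locale.localeIdentifier);
    }
    NSLocale *zh = [[NSLocale alloc] initWithLocaleIdentifier:@"zh_CN"];
    SFSpeechRecognizer *recognizer = [[SFSpeechRecognizer alloc] initWithLocale:zh];
    // isAvailable also reflects network reachability, not just locale support.
    NSLog(@"zh_CN recognizer available: %d", recognizer.isAvailable);
}
```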

3. Starting the speech recognition engine

Add the following code:

- (IBAction)recordButtonClicked {
    if ([self.audioEngine isRunning]) {
        [self endRecording];
        [self.recordButton setTitle:@"Stopping" forState:UIControlStateDisabled];
    } else {
        [self startRecording];
        [self.recordButton setTitle:@"Stop Recording" forState:UIControlStateNormal];
    }
}

- (void)startRecording {
    if (_recognitionTask) {
        [_recognitionTask cancel];
        _recognitionTask = nil;
    }

    // Configure the audio session for recording.
    AVAudioSession *audioSession = [AVAudioSession sharedInstance];
    NSError *error = nil;
    [audioSession setCategory:AVAudioSessionCategoryRecord error:&error];
    NSParameterAssert(!error);
    [audioSession setMode:AVAudioSessionModeMeasurement error:&error];
    NSParameterAssert(!error);
    [audioSession setActive:YES withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:&error];
    NSParameterAssert(!error);

    _recognitionRequest = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
    AVAudioInputNode *inputNode = [self.audioEngine inputNode];
    NSAssert(inputNode, @"Audio input device is not ready");
    NSAssert(_recognitionRequest, @"Failed to create recognition request");

    // Return partial results as they are produced, not just the final one.
    _recognitionRequest.shouldReportPartialResults = YES;

    __weak typeof(self) weakSelf = self;
    _recognitionTask = [self.speechRecognizer recognitionTaskWithRequest:_recognitionRequest resultHandler:^(SFSpeechRecognitionResult * _Nullable result, NSError * _Nullable error) {
        __strong typeof(weakSelf) strongSelf = weakSelf;
        BOOL isFinal = NO;
        if (result) {
            NSLog(@"%@", result.bestTranscription.formattedString);
            strongSelf.resultStringLabel.text = result.bestTranscription.formattedString;
            isFinal = result.isFinal;
        }
        if (error || isFinal) {
            [strongSelf.audioEngine stop];
            [inputNode removeTapOnBus:0];
            strongSelf.recognitionTask = nil;
            strongSelf.recognitionRequest = nil;
            strongSelf.recordButton.enabled = YES;
            [strongSelf.recordButton setTitle:@"Start Recording" forState:UIControlStateNormal];
        }
    }];

    AVAudioFormat *recordingFormat = [inputNode outputFormatForBus:0];
    // Remove any previous tap before installing a new one, or the engine may throw.
    [inputNode removeTapOnBus:0];
    [inputNode installTapOnBus:0 bufferSize:1024 format:recordingFormat block:^(AVAudioPCMBuffer * _Nonnull buffer, AVAudioTime * _Nonnull when) {
        __strong typeof(weakSelf) strongSelf = weakSelf;
        if (strongSelf.recognitionRequest) {
            [strongSelf.recognitionRequest appendAudioPCMBuffer:buffer];
        }
    }];

    [self.audioEngine prepare];
    [self.audioEngine startAndReturnError:&error];
    NSParameterAssert(!error);
    self.resultStringLabel.text = LoadingText;
}

1. Use the AVAudioSession object to configure audio recording.

2. Before recognition produces a final result it may produce several intermediate ones; setting the shouldReportPartialResults property of the SFSpeechAudioBufferRecognitionRequest to YES means each intermediate result is returned as soon as it is available.

3. Set the recording format and install a tap that handles the audio stream (appending each buffer to self.recognitionRequest).

4. Wire up the tap handler for self.recordButton's click action.

5. Start recording audio.

6. Update the button title.

4. Resetting the speech recognition engine

Add the following code:

- (void)endRecording {
    [self.audioEngine stop];
    if (_recognitionRequest) {
        [_recognitionRequest endAudio];
    }
    if (_recognitionTask) {
        [_recognitionTask cancel];
        _recognitionTask = nil;
    }
    self.recordButton.enabled = NO;
    if ([self.resultStringLabel.text isEqualToString:LoadingText]) {
        self.resultStringLabel.text = @"";
    }
}

1. Disable self.recordButton so it cannot be tapped while stopping.

2. Stop the audio engine.

3. End the audio stream and cancel the recognition task.

4. Update the button title.

5. Speech recognition result callbacks

Below is the API description from the SFSpeechRecognizer header:

// Recognize speech utterance with a request
// If request.shouldReportPartialResults is true, result handler will be called
// repeatedly with partial results, then finally with a final result or an error.
- (SFSpeechRecognitionTask *)recognitionTaskWithRequest:(SFSpeechRecognitionRequest *)request
                                          resultHandler:(void (^)(SFSpeechRecognitionResult * __nullable result, NSError * __nullable error))resultHandler;

// Advanced API: Recognize a custom request with a delegate
// The delegate will be weakly referenced by the returned task
- (SFSpeechRecognitionTask *)recognitionTaskWithRequest:(SFSpeechRecognitionRequest *)request
                                               delegate:(id <SFSpeechRecognitionTaskDelegate>)delegate;

Results can be delivered in two ways: via a delegate or via a block. For simplicity, this article uses the block-based callback.
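For reference, the delegate-based alternative looks roughly like this (a sketch only; it assumes the same view controller that already owns the speechRecognizer, recognitionRequest, resultStringLabel, and recordButton properties used earlier, and the delegate methods shown belong to SFSpeechRecognitionTaskDelegate):

```objectivec
// Adopt SFSpeechRecognitionTaskDelegate instead of passing a result handler block.
@interface ViewController () <SFSpeechRecognitionTaskDelegate>
@end

@implementation ViewController (TaskDelegate)

- (void)startDelegateBasedTask {
    // The task weakly references the delegate, as the header comment notes.
    [self.speechRecognizer recognitionTaskWithRequest:self.recognitionRequest delegate:self];
}

#pragma mark - SFSpeechRecognitionTaskDelegate

// Called repeatedly with partial transcriptions (the delegate analogue of
// shouldReportPartialResults in the block-based API).
- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task didHypothesizeTranscription:(SFTranscription *)transcription {
    self.resultStringLabel.text = transcription.formattedString;
}

// Called once with the final recognition result.
- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task didFinishRecognition:(SFSpeechRecognitionResult *)recognitionResult {
    self.resultStringLabel.text = recognitionResult.bestTranscription.formattedString;
}

// Called when the task completes, successfully or not.
- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task didFinishSuccessfully:(BOOL)successfully {
    self.recordButton.enabled = YES;
}

@end
```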

6. Recognizing an audio file

Add the following code:

/**
 Recognize a local audio file.
 */
- (IBAction)recognizeLocalAudioFile {
    NSLocale *locale = [[NSLocale alloc] initWithLocaleIdentifier:@"zh_CN"];
    SFSpeechRecognizer *localRecognizer = [[SFSpeechRecognizer alloc] initWithLocale:locale];
    NSURL *url = [[NSBundle mainBundle] URLForResource:@"录音.m4a" withExtension:nil];
    if (!url) return;
    SFSpeechURLRecognitionRequest *request = [[SFSpeechURLRecognitionRequest alloc] initWithURL:url];
    __weak typeof(self) weakSelf = self;
    [localRecognizer recognitionTaskWithRequest:request resultHandler:^(SFSpeechRecognitionResult * _Nullable result, NSError * _Nullable error) {
        if (error) {
            NSString *errMsg = [NSString stringWithFormat:@"Speech recognition failed: %@", error];
            [BaseViewController hudWithTitle:errMsg];
            NSLog(@"%@", errMsg);
        } else {
            weakSelf.resultStringLabel.text = result.bestTranscription.formattedString;
        }
    }];
}

1. Initialize the SFSpeechRecognizer.

2. Obtain the audio file's URL.

3. Create an SFSpeechURLRecognitionRequest for the file.

4. Handle the result in the callback.

III. Summary

This article showed how to convert audio to text with the Speech framework built into iOS. The framework is quite powerful; this article only covered the basics of live microphone recognition and audio file recognition. If you are interested, dig deeper, and feel free to discuss any questions.

Demo: https://github.com/jayZhangh/PhotosFrameworkBasicUsage.git

IV. References

https://swift.gg/2016/09/30/siri-speech-framework/

https://developer.apple.com/videos/play/wwdc2016/509/

https://www.raywenderlich.com/2422-building-an-ios-app-like-siri
