iOS音视频开发音视频开发经验之路iOS实战

iOS完整推流采集音视频数据编码同步合成流

2019-07-12  本文已影响19人  小东邪啊

需求

众所周知,原始的音视频数据无法直接在网络上传输,推流需要编码后的音视频数据以合成的视频流,如flv, mov, asf流等,根据接收方需要的格式进行合成并传输,这里以合成asf流为例,讲述一个完整推流过程:即音视频从采集到编码,同步合成asf视频流,然后可以将流进行传输,为了方便,本例将合成的视频流写入一个asf文件中,以供测试.

注意: 测试需要使用终端通过: ffplay播放demo中录制好的文件,因为asf是windows才支持的格式,mac自带播放器无法播放.


实现原理


阅读前提


代码地址 : iOS完整推流

掘金地址 : iOS完整推流

简书地址 : iOS完整推流

博客地址 : iOS完整推流


总体架构

1.mux

对于iOS而言,我们可以通过底层API捕获视频帧与音频帧数据,捕获视频帧使用AVFoundation框架中的AVCaptureSession, 其实它同时也可以捕获音频数据,而因为我们想使用最低延时与最高音质的音频, 所以需要借助最底层的音频捕捉框架Audio Unit,然后使用VideoToolbox框架中的VTCompressionSessionRef可以对视频数据进行编码,使用AudioConverter可以对音频数据进行编码,我们在采集时可以将第一帧I帧产生时的系统时间作为音视频时间戳的一个起点,往后的视频说都基于此,由此可扩展做音视频同步方案,最终,我们将比那编码好的音视频数据通过FFmpeg进行合成,这里以asf流为例进行合成,并将生成好的asf流写入文件,以供测试. 生成好的asf流可直接用于网络传输.

简易流程

采集视频

采集音频

编码视频数据

编码音频数据

音视频同步

以编码的第一帧视频的系统时间作为音视频数据的基准时间戳,随后将采集到音视频数据中的时间戳减去该基准时间戳作为各自的时间戳, 同步有两种策略,一种是以音频时间戳为准, 即当出现错误时,让视频时间戳去追音频时间戳,这样做即会造成看到画面会快进或快退,二是以视频时间戳为准,即当出现错误时,让音频时间戳去追视时间戳,即声音可能会刺耳,不推荐.所以一般使用第一种方案,通过估计下一帧视频时间戳看看如果超出同步范围则进行同步.

FFmpeg合成数据流

文件结构

2.file

快速使用

- (void)viewDidLoad {
    [super viewDidLoad];
    // Do any additional setup after loading the view.
    
    [self configureCamera];
    [self configureAudioCapture];
    [self configureAudioEncoder];
    [self configurevideoEncoder];
    [self configureAVMuxHandler];
    [self configureAVRecorder];
}
- (void)xdxCaptureOutput:(AVCaptureOutput *)output didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection {
    if ([output isKindOfClass:[AVCaptureVideoDataOutput class]] == YES) {
        if (self.videoEncoder) {
            [self.videoEncoder startEncodeDataWithBuffer:sampleBuffer
                                        isNeedFreeBuffer:NO];
            
        }
        
    }
}
#pragma mark Video Encoder
- (void)receiveVideoEncoderData:(XDXVideEncoderDataRef)dataRef {
    [self.muxHandler addVideoData:dataRef->data size:(int)dataRef->size timestamp:dataRef->timestamp isKeyFrame:dataRef->isKeyFrame isExtraData:dataRef->isExtraData videoFormat:XDXMuxVideoFormatH264];
}

#pragma mark Audio Capture and Audio Encode
- (void)receiveAudioDataByDevice:(XDXCaptureAudioDataRef)audioDataRef {
    [self.audioEncoder encodeAudioWithSourceBuffer:audioDataRef->data
                                  sourceBufferSize:audioDataRef->size
                                               pts:audioDataRef->pts
                                   completeHandler:^(XDXAudioEncderDataRef dataRef) {
                                       if (dataRef->size > 10) {
                                           [self.muxHandler addAudioData:(uint8_t *)dataRef->data
                                                                    size:dataRef->size
                                                              channelNum:1
                                                              sampleRate:44100
                                                               timestamp:dataRef->pts];                                           
                                       }
                                       free(dataRef->data);
                                   }];
}
#pragma mark Mux
- (IBAction)startRecordBtnDidClicked:(id)sender {
    int size = 0;
    char *data = (char *)[self.muxHandler getAVStreamHeadWithSize:&size];
    [self.recorder startRecordWithIsHead:YES data:data size:size];
    self.isRecording = YES;
}


- (void)receiveAVStreamWithIsHead:(BOOL)isHead data:(uint8_t *)data size:(int)size {
    if (isHead) {
        return;
    }
    
    if (self.isRecording) {
        [self.recorder startRecordWithIsHead:NO data:(char *)data size:size];
    }
}

具体实现

本例中音视频采集编码模块在前面文章中已经详细介绍,这里不再重复,如需帮助请参考上文的阅读前提.下面仅介绍合成流.

1. 初始化FFmpeg相关对象.

- (void)configureFFmpegWithFormat:(const char *)format {
    if(m_outputContext != NULL) {
        av_free(m_outputContext);
        m_outputContext = NULL;
    }
    
    m_outputContext = avformat_alloc_context();
    m_outputFormat  = av_guess_format(format, NULL, NULL);
    
    m_outputContext->oformat    = m_outputFormat;
    m_outputFormat->audio_codec = AV_CODEC_ID_NONE;
    m_outputFormat->video_codec = AV_CODEC_ID_NONE;
    m_outputContext->nb_streams = 0;
    
    m_video_stream     = avformat_new_stream(m_outputContext, NULL);
    m_video_stream->id = 0;
    m_audio_stream     = avformat_new_stream(m_outputContext, NULL);
    m_audio_stream->id = 1;
    
    log4cplus_info(kModuleName, "configure ffmpeg finish.");
}

2. 配置视频流的详细信息

设置该编码的视频流中详细的信息, 如编码器类型,配置信息,原始视频数据格式,视频的宽高,比特率,帧率,基准时间戳,extra data等.

这里最重要的就是extra data,注意,因为我们要根据extra data才能生成正确的头数据,而asf流需要的是annux b格式的数据,苹果采集的视频数据格式为avcc所以在编码模块中已经将其转为annux b格式的数据,并通过参数传入,这里可以直接使用,关于这两种格式区别也可以参考阅读前提中的码流介绍的文章.

- (void)configureVideoStreamWithVideoFormat:(XDXMuxVideoFormat)videoFormat extraData:(uint8_t *)extraData extraDataSize:(int)extraDataSize {
    if (m_outputContext == NULL) {
        log4cplus_error(kModuleName, "%s: m_outputContext is null",__func__);
        return;
    }
    
    if(m_outputFormat == NULL){
        log4cplus_error(kModuleName, "%s: m_outputFormat is null",__func__);
        return;
    }

    AVFormatContext *formatContext = avformat_alloc_context();
    AVStream *stream = NULL;
    if(XDXMuxVideoFormatH264 == videoFormat) {
        AVCodec *codec = avcodec_find_encoder(AV_CODEC_ID_H264);
        stream = avformat_new_stream(formatContext, codec);
        stream->codecpar->codec_id = AV_CODEC_ID_H264;
    }else if(XDXMuxVideoFormatH265 == videoFormat) {
        AVCodec *codec = avcodec_find_encoder(AV_CODEC_ID_HEVC);
        stream = avformat_new_stream(formatContext, codec);
        stream->codecpar->codec_tag      = MKTAG('h', 'e', 'v', 'c');
        stream->codecpar->profile        = FF_PROFILE_HEVC_MAIN;
        stream->codecpar->format         = AV_PIX_FMT_YUV420P;
        stream->codecpar->codec_id       = AV_CODEC_ID_HEVC;
    }
    
    stream->codecpar->format             = AV_PIX_FMT_YUVJ420P;
    stream->codecpar->codec_type         = AVMEDIA_TYPE_VIDEO;
    stream->codecpar->width              = 1280;
    stream->codecpar->height             = 720;
    stream->codecpar->bit_rate           = 1024*1024;
    stream->time_base.den                = 1000;
    stream->time_base.num                = 1;
    stream->time_base                    = (AVRational){1, 1000};
    stream->codec->flags                |= AV_CODEC_FLAG_GLOBAL_HEADER;
    
    memcpy(m_video_stream, stream, sizeof(AVStream));
    
    if(extraData) {
        int newExtraDataSize = extraDataSize + AV_INPUT_BUFFER_PADDING_SIZE;
        m_video_stream->codecpar->extradata_size = extraDataSize;
        m_video_stream->codecpar->extradata      = (uint8_t *)av_mallocz(newExtraDataSize);
        memcpy(m_video_stream->codecpar->extradata, extraData, extraDataSize);
    }
    
    av_free(stream);

    m_outputContext->video_codec_id = m_video_stream->codecpar->codec_id;
    m_outputFormat->video_codec     = m_video_stream->codecpar->codec_id;
    
    self.isReadyForVideo = YES;
    
    [self productStreamHead];
}

3.配置音频流的详细信息

首先根据编码音频的类型生成编码器并生成流对象,然后 配置音频流的详细信息,如压缩数据格式,采样率,声道数,比特率,extra data等等.这里要注意的是extra data是为了保存mp4文件时播放器能够正确解码播放准备的,可以参考这几篇文章:audio extra data1,audio extra data2

- (void)configureAudioStreamWithChannelNum:(int)channelNum sampleRate:(int)sampleRate {
    AVFormatContext *formatContext  = avformat_alloc_context();
    AVCodec         *codec          = avcodec_find_encoder(AV_CODEC_ID_AAC);
    AVStream        *stream         = avformat_new_stream(formatContext, codec);
    
    stream->index         = 1;
    stream->id            = 1;
    stream->duration      = 0;
    stream->time_base.num = 1;
    stream->time_base.den = 1000;
    stream->start_time    = 0;
    stream->priv_data     = NULL;
    
    stream->codecpar->codec_type     = AVMEDIA_TYPE_AUDIO;
    stream->codecpar->codec_id       = AV_CODEC_ID_AAC;
    stream->codecpar->format         = AV_SAMPLE_FMT_S16;
    stream->codecpar->sample_rate    = sampleRate;
    stream->codecpar->channels       = channelNum;
    stream->codecpar->bit_rate       = 0;
    stream->codecpar->extradata_size = 2;
    stream->codecpar->extradata      = (uint8_t *)malloc(2);
    stream->time_base.den  = 25;
    stream->time_base.num  = 1;
    
    /*
     * why we put extra data here for audio: when save to MP4 file, the player can not decode it correctly
     * http://ffmpeg-users.933282.n4.nabble.com/AAC-decoder-td1013071.html
     * http://ffmpeg.org/doxygen/trunk/mpeg4audio_8c.html#aa654ec3126f37f3b8faceae3b92df50e
     * extra data have 16 bits:
     * Audio object type - normally 5 bits, but 11 bits if AOT_ESCAPE
     * Sampling index - 4 bits
     * if (Sampling index == 15)
     * Sample rate - 24 bits
     * Channel configuration - 4 bits
     * last reserved- 3 bits
     * for exmpale:  "Low Complexity Sampling frequency 44100Hz, 1 channel mono":
     * AOT_LC == 2 -> 00010
     -              * 44.1kHz == 4 -> 0100
     +              * 44.1kHz == 4 -> 0100  48kHz == 3 -> 0011
     * mono == 1 -> 0001
     * so extra data: 00010 0100 0001 000 ->0x12 0x8
     +                  00010 0011 0001 000 ->0x11 0x88
     +
     */
    
    if (stream->codecpar->sample_rate == 44100) {
        stream->codecpar->extradata[0] = 0x12;
        //iRig mic HD have two chanel 0x11
        if(channelNum == 1)
            stream->codecpar->extradata[1] = 0x8;
        else
            stream->codecpar->extradata[1] = 0x10;
    }else if (stream->codecpar->sample_rate == 48000) {
        stream->codecpar->extradata[0] = 0x11;
        //iRig mic HD have two chanel 0x11
        if(channelNum == 1)
            stream->codecpar->extradata[1] = 0x88;
        else
            stream->codecpar->extradata[1] = 0x90;
    }else if (stream->codecpar->sample_rate == 32000){
        stream->codecpar->extradata[0] = 0x12;
        if (channelNum == 1)
            stream->codecpar->extradata[1] = 0x88;
        else
            stream->codecpar->extradata[1] = 0x90;
    }
    else if (stream->codecpar->sample_rate == 16000){
        stream->codecpar->extradata[0] = 0x14;
        if (channelNum == 1)
            stream->codecpar->extradata[1] = 0x8;
        else
            stream->codecpar->extradata[1] = 0x10;
    }else if(stream->codecpar->sample_rate == 8000){
        stream->codecpar->extradata[0] = 0x15;
        if (channelNum == 1)
            stream->codecpar->extradata[1] = 0x88;
        else
            stream->codecpar->extradata[1] = 0x90;
    }
    
    stream->codec->flags|= AV_CODEC_FLAG_GLOBAL_HEADER;
    
    memcpy(m_audio_stream, stream, sizeof(AVStream));
    
    av_free(stream);
    
    m_outputContext->audio_codec_id = stream->codecpar->codec_id;
    m_outputFormat->audio_codec     = stream->codecpar->codec_id;
    
    self.isReadyForAudio = YES;

    [self productStreamHead];
}

4.生成流头数据

当前面2,3部都配置完成后,我们将音视频流注入上下文对象及对象中的流格式中,即可开始生成头数据.avformat_write_header

- (void)productStreamHead {
    log4cplus_debug("record", "%s,line:%d",__func__,__LINE__);
    
    if (m_outputFormat->video_codec == AV_CODEC_ID_NONE) {
        log4cplus_error(kModuleName, "%s: video codec is NULL.",__func__);
        return;
    }
    
    if(m_outputFormat->audio_codec == AV_CODEC_ID_NONE) {
        log4cplus_error(kModuleName, "%s: audio codec is NULL.",__func__);
        return;
    }
    
    /* prepare header and save header data in a stream */
    if (avio_open_dyn_buf(&m_outputContext->pb) < 0) {
        avio_close_dyn_buf(m_outputContext->pb, NULL);
        log4cplus_error(kModuleName, "%s: AVFormat_HTTP_FF_OPEN_DYURL_ERROR.",__func__);
        return;
    }
        
    /*
     * HACK to avoid mpeg ps muxer to spit many underflow errors
     * Default value from FFmpeg
     * Try to set it use configuration option
     */
    m_outputContext->max_delay = (int)(0.7*AV_TIME_BASE);
        
    int result = avformat_write_header(m_outputContext,NULL);
    if (result < 0) {
        log4cplus_error(kModuleName, "%s: Error writing output header, res:%d",__func__,result);
        return;
    }
        
    uint8_t * output = NULL;
    int len = avio_close_dyn_buf(m_outputContext->pb, (uint8_t **)(&output));
    if(len > 0 && output != NULL) {
        av_free(output);
        
        self.isReadyForHead = YES;
        
        if (m_avhead_data) {
            free(m_avhead_data);
        }
        m_avhead_data_size = len;
        m_avhead_data = (uint8_t *)malloc(len);
        memcpy(m_avhead_data, output, len);
        
        if ([self.delegate respondsToSelector:@selector(receiveAVStreamWithIsHead:data:size:)]) {
            [self.delegate receiveAVStreamWithIsHead:YES data:output size:len];
        }
        
        log4cplus_error(kModuleName, "%s: create head length = %d",__func__, len);
    }else{
        self.isReadyForHead = NO;
        log4cplus_error(kModuleName, "%s: product stream header failed.",__func__);
    }
}

5. 然后将传来的音视频数据装入数组中

该数组通过封装C++中的vector实现一个轻量级数据结构以缓存数据.

6.合成音视频数据

新建一条线程专门合成音视频数据,合成策略即取出音视频数据中时间戳较小的一帧先写,因为音视频数据总体偏差不大,所以理想情况应该是取一帧视频,一帧音频,当然因为音频采样较快,可能会相对多一两帧,而当音视频数据由于某种原因不同步时,则会等待,直至时间戳重新同步才能继续进行合成.

    int err = pthread_create(&m_muxThread,NULL,MuxAVPacket,(__bridge_retained void *)self);
    if(err != 0){
        log4cplus_error(kModuleName, "%s: create thread failed: %s",__func__, strerror(err));
    }
    
    void * MuxAVPacket(void *arg) {
    pthread_setname_np("XDX_MUX_THREAD");
    XDXAVStreamMuxHandler *instance = (__bridge_transfer XDXAVStreamMuxHandler *)arg;
    if(instance != nil) {
        [instance dispatchAVData];
    }
    
    return NULL;
}

#pragma mark Mux
- (void)dispatchAVData {
    XDXMuxMediaList audioPack;
    XDXMuxMediaList videoPack;
    
    memset(&audioPack, 0, sizeof(XDXMuxMediaList));
    memset(&videoPack, 0, sizeof(XDXMuxMediaList));
    
    [m_AudioListPack reset];
    [m_VideoListPack reset];

    while (true) {
        int videoCount = [m_VideoListPack count];
        int audioCount = [m_AudioListPack count];
        if(videoCount == 0 || audioCount == 0) {
            usleep(5*1000);
            log4cplus_debug(kModuleName, "%s: Mux dispatch list: v:%d, a:%d",__func__,videoCount, audioCount);
            continue;
        }
        
        if(audioPack.timeStamp == 0) {
            [m_AudioListPack popData:&audioPack];
        }
        
        if(videoPack.timeStamp == 0) {
            [m_VideoListPack popData:&videoPack];
        }
        
        if(audioPack.timeStamp >= videoPack.timeStamp) {
            log4cplus_debug(kModuleName, "%s: Mux dispatch input video time stamp = %llu",__func__,videoPack.timeStamp);
            
            if(videoPack.data != NULL && videoPack.data->data != NULL){
                [self addVideoPacket:videoPack.data
                           timestamp:videoPack.timeStamp
                 extraDataHasChanged:videoPack.extraDataHasChanged];
                
                av_free(videoPack.data->data);
                av_free(videoPack.data);
            }else{
                log4cplus_error(kModuleName, "%s: Mux Video AVPacket data abnormal",__func__);
            }
            videoPack.timeStamp = 0;
        }else {
            log4cplus_debug(kModuleName, "%s: Mux dispatch input audio time stamp = %llu",__func__,audioPack.timeStamp);
            
            if(audioPack.data != NULL && audioPack.data->data != NULL) {
                [self addAudioPacket:audioPack.data
                           timestamp:audioPack.timeStamp];
                av_free(audioPack.data->data);
                av_free(audioPack.data);
            }else {
                log4cplus_error(kModuleName, "%s: Mux audio AVPacket data abnormal",__func__);
            }
            
            audioPack.timeStamp = 0;
        }
    }
}

7.获取合成好的视频流

通过av_write_frame即可获取合成好的数据.

- (void)productAVDataPacket:(AVPacket *)packet extraDataHasChanged:(BOOL)extraDataHasChanged {
    BOOL    isVideoIFrame = NO;
    uint8_t *output       = NULL;
    int     len           = 0;
    
    if (avio_open_dyn_buf(&m_outputContext->pb) < 0) {
        return;
    }
    
    if(packet->stream_index == 0 && packet->flags != 0) {
        isVideoIFrame = YES;
    }
    
    if (av_write_frame(m_outputContext, packet) < 0) {
        avio_close_dyn_buf(m_outputContext->pb, (uint8_t **)(&output));
        if(output != NULL)
            free(output);
        
        log4cplus_error(kModuleName, "%s: Error writing output data",__func__);
        return;
    }
    
    
    len = avio_close_dyn_buf(m_outputContext->pb, (uint8_t **)(&output));
    
    if(len == 0 || output == NULL) {
        log4cplus_debug(kModuleName, "%s: mux len:%d or data abnormal",__func__,len);
        if(output != NULL)
            av_free(output);
        return;
    }
        
    if ([self.delegate respondsToSelector:@selector(receiveAVStreamWithIsHead:data:size:)]) {
        [self.delegate receiveAVStreamWithIsHead:NO data:output size:len];
    }
    
    if(output != NULL)
        av_free(output);
}

上一篇下一篇

猜你喜欢

热点阅读