WebRTC Video Encoding on iOS

2017-09-28 音视频直播技术专家

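Preface

In the article "iOS下WebRTC视频采集" (WebRTC video capture on iOS), I described how WebRTC captures video on iOS. This article covers how WebRTC encodes video on iOS.

During initialization, WebRTC first creates and configures the encoder and then starts capturing video. Each captured frame is delivered through a callback to the OnFrame() function of the VideoStreamEncoder class. That function creates an EncodeTask for every video frame and inserts it into the encode queue.

The encoding thread keeps taking tasks off the encode queue and encoding them, and finally outputs the encoded data through the encoder's callback function.

From this description we can see that there are two important callbacks: one is invoked after the camera captures video data, the other after encoding completes.

WebRTC uses callbacks heavily. They are the main thread running through the code, so keep them in mind. Otherwise the WebRTC code is very hard to follow.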
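
The flow described above (a capture callback posts one task per frame, and an encoding thread drains the queue and reports results through a second callback) can be sketched in plain C++. This is only a minimal illustration, not WebRTC's actual implementation: Frame, EncodedFrame and EncodeQueue are hypothetical names, and no real encoding takes place.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-ins for a captured frame and an encoded frame.
struct Frame { int64_t timestamp_us; };
struct EncodedFrame { int64_t timestamp_us; };

// Minimal single-consumer encode queue: OnFrame() posts one task per frame,
// and a dedicated thread runs the tasks in FIFO order.
class EncodeQueue {
 public:
  using EncodedCallback = std::function<void(const EncodedFrame&)>;

  explicit EncodeQueue(EncodedCallback on_encoded)
      : on_encoded_(std::move(on_encoded)), worker_([this] { Run(); }) {
    assert(on_encoded_);  // an output callback is required
  }

  ~EncodeQueue() { Stop(); }

  // Callback #1: called from the capture side for every captured frame.
  void OnFrame(const Frame& frame) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push([this, frame] {
        // A real encoder would compress the frame here.
        on_encoded_(EncodedFrame{frame.timestamp_us});  // callback #2
      });
    }
    cv_.notify_one();
  }

  // Lets the encoding thread drain the remaining tasks, then joins it.
  void Stop() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_one();
    if (worker_.joinable()) worker_.join();
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stopping and nothing left to do
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();  // run outside the lock so OnFrame() is never blocked by encoding
    }
  }

  EncodedCallback on_encoded_;
  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool stopping_ = false;
  std::thread worker_;  // declared last so it starts after the other members
};
```

Running each task outside the lock keeps the capture side from stalling while a frame is being encoded, which is the main reason for decoupling the two stages with a queue.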

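Basic iOS data structures

Before going into the details of WebRTC encoding, let's look at some basic data structures that come up all the time in video encoding on iOS. They are also essential for reading the WebRTC code.

CV: CoreVideo
CM: CoreMedia
VT: VideoToolbox

CMSampleBuffer: a container for video data. It can hold either raw video data or encoded video data.
CMVideoFormatDescription: describes the image: width/height, format type (kCMPixelFormat_32BGRA, kCMVideoCodecType_H264, ...), and extensions (pixel aspect ratio, color space, ...).
CVPixelBuffer: holds uncompressed (unencoded) raw image data.
CVPixelBufferPool: a pool of CVPixelBuffer objects.
pixelBufferAttributes: the width/height of the video, the pixel format (32BGRA, YCbCr420), and compatibility (OpenGL ES, Core Animation).
CMBlockBuffer: holds encoded video data.
CMTime/CMClock/CMTimebase: hold timestamps.

[Figure: diagram of CMSampleBuffer]

iOS system functions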

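The most important data type for video encoding on iOS is VTCompressionSession, which manages the VideoEncoder.

Basic encoding flow

Create the encoder.
Get video frames from the camera. The frames arrive as CVPixelBuffer objects. Cameras typically capture 30 frames per second.
Encode the frames with the VideoEncoder managed by the VTCompressionSession.
Output the H264 data, which is carried in CMSampleBuffer containers.

Creating and initializing the encoder

Create the VTCompressionSession object.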

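VTCompressionSessionCreate(
    allocator: CFAllocator,  // allocator for the session; NULL uses the default allocator
    width: Int32, // width of the video frames in pixels
    height: Int32, // height of the video frames in pixels
    codecType: CMVideoCodecType, // codec type, e.g. kCMVideoCodecType_H264
    encoderSpecification: CFDictionary, // which video encoder to use; NULL lets VideoToolbox choose
    sourceImageBufferAttributes: CFDictionary, // attributes of the source images, e.g. NV12 as the YUV format
    compressedDataAllocator: CFAllocator, // allocator for the compressed data; NULL uses the default allocator
    outputCallback: VTCompressionOutputCallback, // called after a frame is encoded; may be called asynchronously on a different thread
    outputCallbackRefCon: UnsafeMutableRawPointer, // user-defined context passed to the callback; may be NULL
    compressionSessionOut: UnsafeMutablePointer<VTCompressionSession> // receives the new compression session
) -> OSStatus // status indicating whether creation succeeded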

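The parameters in detail:

allocator: allocator for the session. NULL means the default allocator.
width: width of the video frames in pixels.
height: height of the video frames in pixels.
codecType: codec type, e.g. kCMVideoCodecType_H264.
encoderSpecification: which video encoder to use. NULL lets VideoToolbox choose.
sourceImageBufferAttributes: attributes of the source images, e.g. NV12 as the YUV format.
compressedDataAllocator: allocator for the compressed data. NULL means the default allocator.
outputCallback: the callback invoked after a frame is encoded. It may be called asynchronously on a different thread.
outputCallbackRefCon: user-defined context passed to the callback; may be NULL.
compressionSessionOut: receives the VTCompressionSession object.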

Configuring the CompressionSession

On iOS, the CompressionSession is configured with the VTSessionSetProperty function. Its prototype is:

VTSessionSetProperty(
    session: VTSession, // the VTCompressionSession object created above
    propertyKey: CFString, // the property to set
    propertyValue: CFTypeRef // the value of the property
) -> OSStatus // returned status

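The following items are usually configured:

Set RealTime, i.e. whether encoding happens in real time.
Set the profile level: Baseline, Main, High, and so on.
Set whether frame reordering is allowed (that is, whether B-frames may be produced).
Set the average bitrate and the data rate limit. WebRTC uses a limit of 1.5 times the average bitrate.
Set the maximum key frame interval, measured in frames (WebRTC uses 7200).
Set the maximum key frame interval duration (WebRTC uses 240 s, i.e. 4 minutes).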
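
The two key frame settings at the end of this list both bound the distance between key frames. As I understand the VideoToolbox documentation, when both are set a key frame is required as soon as either limit is reached. The sketch below is only a simplified model of that rule, not the encoder's real logic: KeyFramePolicy and KeyFrameDue are hypothetical names, and the values 7200 and 240 in the usage note are the ones WebRTC uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the two key frame limits.
struct KeyFramePolicy {
  int64_t max_interval_frames;   // cf. MaxKeyFrameInterval
  double max_interval_seconds;   // cf. MaxKeyFrameIntervalDuration
};

// Returns true when a key frame is due, i.e. when either the frame-count
// limit or the duration limit has been reached since the last key frame.
bool KeyFrameDue(const KeyFramePolicy& policy,
                 int64_t frames_since_key,
                 double seconds_since_key) {
  assert(policy.max_interval_frames > 0 && policy.max_interval_seconds > 0);
  return frames_since_key >= policy.max_interval_frames ||
         seconds_since_key >= policy.max_interval_seconds;
}
```

With a policy of {7200, 240.0} and a 30 fps camera, both limits are reached at the same moment (7200 / 30 = 240 s). At 15 fps the duration limit is reached first, after only 3600 frames.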

Encoding a video frame

Encoding is done by calling VTCompressionSessionEncodeFrame. Its prototype is:

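VTCompressionSessionEncodeFrame(
    session: VTCompressionSession, // the session created above
    imageBuffer: CVImageBuffer, // contains the video frame to be compressed
    presentationTimeStamp: CMTime, // pts
    duration: CMTime, // duration of the frame; pass kCMTimeInvalid if unknown
    frameProperties: CFDictionary, // key/value pairs specifying extra properties for this frame
    sourceFrameRefCon: UnsafeMutableRawPointer, // per-frame context that is passed through to the callback
    infoFlagsOut: UnsafeMutablePointer<VTEncodeInfoFlags> // receives information about the encode operation; may be NULL
) -> OSStatus // returned status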

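The parameters in detail:

session: the CompressionSession object.
imageBuffer: contains the video frame to be compressed.
presentationTimeStamp: the pts, i.e. the timestamp at which the frame is presented.
duration: the duration of the frame. Pass kCMTimeInvalid if there is no duration information.
frameProperties: key/value pairs specifying extra properties for this frame.
sourceFrameRefCon: a per-frame context that is passed through to the callback.
infoFlagsOut: receives information about the encode operation, such as whether it ran asynchronously or the frame was dropped. Pass NULL if you do not need it.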

How WebRTC does it

WebRTC has a dedicated class that wraps the encoding operations on iOS. Let's look in detail at how WebRTC uses the iOS hardware encoder.

The wrapper lives in webrtc/sdk/objc/Framework/Classes/Video/VideoToolbox/RTCVideoEncoderH264.mm

Creating the encoder

VTCompressionSessionCreate is called at RTCVideoEncoderH264.mm:512.

OSStatus status =
VTCompressionSessionCreate(nullptr,  // use default allocator
                         _width,
                         _height,
                         kCMVideoCodecType_H264,
                         encoder_specs,  // use hardware accelerated encoder if available
                         sourceAttributes,
                         nullptr,  // use default compressed data allocator
                         compressionOutputCallback,
                         nullptr,
                         &_compressionSession);

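Here is a closer look at the important arguments to VTCompressionSessionCreate:

4th argument: kCMVideoCodecType_H264 selects an H264 encoder.
5th argument: encoder_specs is nullptr, so VideoToolbox picks the encoder itself based on the codec type.

6th argument: sourceAttributes requests source images that are compatible with OpenGL ES, use the default IOSurface options, and use NV12 as the YUV format.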

// Set source image buffer attributes. These attributes will be present on
// buffers retrieved from the encoder's pixel buffer pool.
const size_t attributesSize = 3;
CFTypeRef keys[attributesSize] = {
    #if defined(WEBRTC_IOS)
    kCVPixelBufferOpenGLESCompatibilityKey,
    #elif defined(WEBRTC_MAC)
    kCVPixelBufferOpenGLCompatibilityKey,
    #endif
    kCVPixelBufferIOSurfacePropertiesKey,
    kCVPixelBufferPixelFormatTypeKey
};
CFDictionaryRef ioSurfaceValue = CreateCFTypeDictionary(nullptr, nullptr, 0);
int64_t nv12type = kCVPixelFormatType_420YpCbCr8BiPlanarFullRange;
CFNumberRef pixelFormat = CFNumberCreate(nullptr, kCFNumberLongType, &nv12type);
CFTypeRef values[attributesSize] = {kCFBooleanTrue, ioSurfaceValue, pixelFormat};
CFDictionaryRef sourceAttributes = CreateCFTypeDictionary(keys, values, attributesSize);
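
To make the NV12 format requested above more concrete: an NV12 image consists of a full-resolution Y (luma) plane followed by a half-height plane of interleaved Cb/Cr bytes, one pair per 2x2 block of pixels. The sketch below computes the plane sizes under the assumption of even dimensions and no row padding; real CVPixelBuffers may pad each row, so treat the numbers as a lower bound. Nv12Layout and ComputeNv12Layout are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Sizes of the two planes of a tightly packed NV12 image, in bytes.
struct Nv12Layout {
  std::size_t y_plane_bytes;     // one luma byte per pixel
  std::size_t cbcr_plane_bytes;  // interleaved Cb,Cr: two bytes per 2x2 block
  std::size_t total_bytes;
};

// Assumes even dimensions and rows without padding (an assumption; real
// pixel buffers can use a larger bytes-per-row value).
Nv12Layout ComputeNv12Layout(std::size_t width, std::size_t height) {
  assert(width % 2 == 0 && height % 2 == 0);
  const std::size_t y = width * height;
  const std::size_t cbcr = width * (height / 2);
  return Nv12Layout{y, cbcr, y + cbcr};
}
```

For a 640x480 frame this gives 307200 bytes of luma plus 153600 bytes of chroma, i.e. 1.5 bytes per pixel.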

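8th argument: the callback invoked after encoding. It can post-process the encoded data, which is eventually sent to the remote peer over the network. The code is at RTCVideoEncoderH264.mm:144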

// This is the callback function that VideoToolbox calls when encode is
// complete. From inspection this happens on its own queue.
void compressionOutputCallback(void *encoder,
                               void *params,
                               OSStatus status,
                               VTEncodeInfoFlags infoFlags,
                               CMSampleBufferRef sampleBuffer) {
                               
......

}

That completes the creation of the encoder. The next step is to configure it.

Configuring the encoder

The encoder is configured in configureCompressionSession() at RTCVideoEncoderH264.mm:561.

- (void)configureCompressionSession {

  SetVTSessionProperty(_compressionSession, kVTCompressionPropertyKey_RealTime, true); // real-time encoding
  SetVTSessionProperty(_compressionSession, kVTCompressionPropertyKey_ProfileLevel, _profile);
  SetVTSessionProperty(_compressionSession, kVTCompressionPropertyKey_AllowFrameReordering, false);
  [self setEncoderBitrateBps:_targetBitrateBps];
  
  ......

  // Set a relatively large value for keyframe emission (7200 frames or 4 minutes).
  SetVTSessionProperty(_compressionSession, kVTCompressionPropertyKey_MaxKeyFrameInterval, 7200);
  SetVTSessionProperty(
      _compressionSession, kVTCompressionPropertyKey_MaxKeyFrameIntervalDuration, 240);
}

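WebRTC enables real-time encoding, sets the profile to WebRTC-H264HighProfile, disables frame reordering, and sets the maximum key frame interval to 7200 frames. The setEncoderBitrateBps function sets the encoding bitrate; its code is as follows: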

VTSessionSetProperty(_compressionSession, kVTCompressionPropertyKey_AverageBitRate, bitrateBps);

......

VTSessionSetProperty(
        _compressionSession, kVTCompressionPropertyKey_DataRateLimits, dataRateLimits);

WebRTC sets the bitrate according to the resolution, and DataRateLimits is 1.5 times the average bitrate.
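
One detail is easy to miss here: the average bitrate is given in bits per second, while a data rate limit is a pair of (bytes, seconds). The 1.5x factor therefore comes with a division by 8. The sketch below shows only that arithmetic in plain C++. DataRateLimit and ComputeDataRateLimit are hypothetical names, kLimitToAverageBitRateFactor is my own label for the 1.5 factor, and the one-second window is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Plain-C++ stand-in for one (bytes, seconds) pair of a data rate limit.
struct DataRateLimit {
  int64_t bytes;
  int64_t seconds;
};

// Ratio between the hard limit and the average bitrate.
constexpr double kLimitToAverageBitRateFactor = 1.5;

// Converts an average bitrate in bits per second into a limit expressed as
// "at most N bytes per 1-second window" (the window length is an assumption).
DataRateLimit ComputeDataRateLimit(uint32_t average_bitrate_bps) {
  const int64_t bytes_per_second = static_cast<int64_t>(
      average_bitrate_bps * kLimitToAverageBitRateFactor / 8);
  return DataRateLimit{bytes_per_second, 1};
}
```

For an average bitrate of 1,000,000 bps, the limit works out to 187,500 bytes per second.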

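Encoding

Once the encoder is configured, it can encode the data captured from the camera. The article "iOS下WebRTC视频采集" already covered the capture process. After a frame is captured, it is passed down through layers of callbacks until it reaches encode() at RTCVideoEncoderH264.mm:329.

There, VTCompressionSessionEncodeFrame() is finally called to encode the frame.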

OSStatus status =
VTCompressionSessionEncodeFrame(_compressionSession,
                                pixelBuffer,
                                presentationTimeStamp,
                                kCMTimeInvalid,
                                frameProperties,
                                encodeParams.release(),
                                nullptr);

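The important arguments of this call are explained below:

2nd argument: pixelBuffer is a CVPixelBufferRef that points to the video frame to be encoded.

5th argument: specifies whether this encode call must produce a key frame.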

CFDictionaryRef frameProperties = nullptr;
if (isKeyframeRequired) {
    CFTypeRef keys[] = {kVTEncodeFrameOptionKey_ForceKeyFrame}; // force a key frame
    CFTypeRef values[] = {kCFBooleanTrue};
    frameProperties = CreateCFTypeDictionary(keys, values, 1);
}

6th argument: a user-defined context that is passed through unchanged to the encoder callback.

// Struct that we pass to the encoder per frame to encode. We receive it again
// in the encoder callback.
struct RTCFrameEncodeParams {   

    ......
    
};

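Once a frame has been encoded, the encoder invokes the callback that was registered when the CompressionSession was created, and that callback does the further processing.

Note that this callback may be invoked asynchronously on a different thread.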
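
The per-frame context deserves a closer look, because ownership crosses a C API boundary: the caller gives up ownership with release(), the encoder carries an opaque void* through, and the callback takes ownership back. The sketch below shows this pattern in plain C++. It is an illustration under my own assumptions, not WebRTC's code: FrameEncodeParams, EncodedSink and FakeEncodeFrame are hypothetical, and the fake encoder calls back synchronously, whereas the real one may call back later on another thread.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical receiver of encoded frames.
struct EncodedSink {
  int calls = 0;
  int64_t last_render_time_ms = -1;
  uint32_t last_timestamp = 0;
};

// Hypothetical per-frame context handed to the encoder with each frame.
struct FrameEncodeParams {
  EncodedSink* sink;
  int64_t render_time_ms;
  uint32_t timestamp;
};

// C-style callback, loosely modeled on the shape of an encoder output callback.
using OutputCallback = void (*)(void* session_context, void* frame_context, int status);

// Stand-in for the encoder: it never inspects frame_context, it only hands
// the pointer back to the callback.
void FakeEncodeFrame(OutputCallback callback, void* session_context, void* frame_context) {
  callback(session_context, frame_context, /*status=*/0);
}

void OnEncoded(void* session_context, void* frame_context, int status) {
  (void)session_context;  // unused here, like a NULL session-level context
  // Take ownership back so the params are freed when the callback returns.
  std::unique_ptr<FrameEncodeParams> params(static_cast<FrameEncodeParams*>(frame_context));
  if (status != 0 || !params) return;
  params->sink->calls += 1;
  params->sink->last_render_time_ms = params->render_time_ms;
  params->sink->last_timestamp = params->timestamp;
}

void EncodeOneFrame(EncodedSink* sink, int64_t render_time_ms, uint32_t timestamp) {
  std::unique_ptr<FrameEncodeParams> params(
      new FrameEncodeParams{sink, render_time_ms, timestamp});
  // release() transfers ownership across the C boundary; OnEncoded reclaims it.
  FakeEncodeFrame(&OnEncoded, /*session_context=*/nullptr, params.release());
}
```

Because the params struct carries everything the callback needs, including a pointer to the receiver, the session-level context can stay null. In a real encoder the callback may run on another thread, so whatever the callback touches must be safe to use from there.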

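Releasing the encoder

When encoding is finished, iOS requires you to release the encoder explicitly. The call for that is VTCompressionSessionInvalidate(). The code is as follows: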

- (void)destroyCompressionSession {
  if (_compressionSession) {
    VTCompressionSessionInvalidate(_compressionSession);
    CFRelease(_compressionSession);
    _compressionSession = nullptr;
  }
}

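Summary

This article covered the details of how WebRTC encodes video on iOS. I had also planned to describe the complete flow of data from capture to encoding, but my ability is limited, and after several attempts I still could not describe it clearly and concisely.

The difficulty is that a high-level description cannot explain the details of WebRTC, and the devil is in those details. Yet if the description is too detailed, the article becomes too long for readers to stay focused.

Despite these difficulties, I still intend to finish that work; it will just take some time.

Please stay tuned. Thank you!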