Video Decoding with WebRTC on iOS

2017-10-19 音视频直播技术专家

Preface

This article explains how WebRTC decodes video on iOS. Video capture and encoding with WebRTC on iOS are covered in separate articles.

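Basic decoding flow

The flow closely mirrors the encoder flow:

Create a decoder instance.
Configure the decoder.
Decode.
Release the decoder.

Let's look at the prototypes of the key functions.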

Creating the decoder

On iOS, a decoder is created with VTDecompressionSessionCreate. Its prototype is:

VTDecompressionSessionCreate(
    allocator: CFAllocator, // session allocator; NULL selects the default allocator
    videoFormatDescription: CMVideoFormatDescription, // format description of the source video frames
    videoDecoderSpecification: CFDictionary, // a specific video decoder; NULL lets VideoToolbox choose one
    destinationImageBufferAttributes: CFDictionary, // required attributes of the output pixel buffers
    outputCallback: VTDecompressionOutputCallbackRecord *, // callback invoked after each frame is decoded
    decompressionSessionOut: VTDecompressionSession* // receives the newly created decompression session
) -> OSStatus

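The parameters in detail:

allocator : the session allocator. NULL selects the default allocator.
videoFormatDescription : the format description of the source video frames.
videoDecoderSpecification : a specific video decoder. NULL lets VideoToolbox choose the decoder itself.
destinationImageBufferAttributes: the required attributes of the output pixel buffers.
outputCallback: the callback invoked with each decoded frame.
decompressionSessionOut: receives the created session instance.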

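Configuring the decoder

On iOS, a DecompressionSession is configured with the VTSessionSetProperty function. Its prototype is:

VTSessionSetProperty(
    session: VTSession, // the VTDecompressionSession object created above
    propertyKey: CFString, // property key
    propertyValue: CFTypeRef // property value
) -> OSStatus // returned status

This is the same call used to configure the encoder. For the decoder, the only property that needs to be set is whether decoding runs in real time.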

Decoding

Frames are decoded with VTDecompressionSessionDecodeFrame. Its prototype is:

OSStatus
VTDecompressionSessionDecodeFrame(
    VTDecompressionSessionRef   session, // the decoder session
    CMSampleBufferRef   sampleBuffer, // the source video frame
    VTDecodeFrameFlags  decodeFlags, // decode flags; bit 0 is enableAsynchronousDecompression
    void *  sourceFrameRefCon,  // pointer to a user-defined parameter
    VTDecodeInfoFlags * infoFlagsOut // receives information flags about the decode
)

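session : the session obtained when the decoder was created.
sampleBuffer : the video frame to decode.
decodeFlags : decode flags. Setting bit 0 (kVTDecodeFrame_EnableAsynchronousDecompression) allows asynchronous decoding; with no flags set, decoding is synchronous.
sourceFrameRefCon : a user-defined parameter, handed back to the output callback.
infoFlagsOut : receives information flags about the decode operation. May be NULL.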

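Releasing the decoder

As with the encoder, the decoder is torn down on iOS with an Invalidate call, here VTDecompressionSessionInvalidate. A typical teardown looks like this:

- (void)destroyDecompressionSession {
  if (_decompressionSession) {
    // Stop the session first, then drop our reference to it.
    VTDecompressionSessionInvalidate(_decompressionSession);
    CFRelease(_decompressionSession);
    _decompressionSession = nullptr;
  }
}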
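
The invalidate, release, then clear-the-handle order above can be sketched in portable C++. This is only an illustration of the pattern: FakeSession, FakeInvalidate, FakeRelease, Counters and DecoderHolder are hypothetical stand-ins, not VideoToolbox or WebRTC types.

```cpp
#include <cassert>

// Hypothetical bookkeeping so the sketch can show what was called.
struct Counters {
  int invalidated = 0;
  int released = 0;
};

// Hypothetical stand-in for a decoder session handle.
struct FakeSession {
  Counters* counters;
};

void FakeInvalidate(FakeSession* s) { ++s->counters->invalidated; }
void FakeRelease(FakeSession* s) {
  ++s->counters->released;
  delete s;
}

class DecoderHolder {
 public:
  explicit DecoderHolder(Counters* counters)
      : session_(new FakeSession{counters}) {}
  ~DecoderHolder() { Destroy(); }
  DecoderHolder(const DecoderHolder&) = delete;
  DecoderHolder& operator=(const DecoderHolder&) = delete;

  // Invalidate, release, then clear the handle so a second call is a no-op.
  void Destroy() {
    if (session_) {
      FakeInvalidate(session_);
      FakeRelease(session_);
      session_ = nullptr;
    }
  }

  bool HasSession() const { return session_ != nullptr; }

 private:
  FakeSession* session_;
};
```

Clearing the handle after releasing it is what makes Destroy() safe to call more than once, for example once explicitly and once from the destructor.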

How WebRTC uses the decoder

As with the encoder, WebRTC wraps the iOS decoding operations in a dedicated class. The file is:

webrtc/sdk/objc/Framework/Classes/Video/VideoToolbox/RTCVideoDecoderH264.mm

Creating the decoder

VTDecompressionSessionCreate is called at RTCVideoDecoderH264.mm:208 to create the decoder:

OSStatus status = VTDecompressionSessionCreate(
      nullptr, 
      _videoFormat, 
      nullptr, 
      attributes, 
      &record, 
      &_decompressionSession
);

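WebRTC uses the default session allocator when creating the decoder session.
The second argument, _videoFormat, holds the format of the video to decode. It is obtained by parsing the SPS and PPS. WebRTC on iOS does not create the decoder first and then start receiving data to decode. It works the other way around: video data arrives first, and during decoding each frame is checked for SPS and PPS. Only when they are present is the decoder actually created. If this is unclear, read the decode() function carefully.
The third argument is null, which lets VideoToolbox choose the decoder itself.

The fourth argument, attributes, is set up as follows: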
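
To make the SPS/PPS check concrete, here is a minimal sketch that scans an H.264 Annex B buffer for start codes and reads each NAL unit type. It is an illustration only, not WebRTC's parser: NaluTypes and HasParameterSets are hypothetical names, and the sketch assumes Annex B framing (00 00 01 start codes).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// H.264 NAL unit types for the two parameter sets.
constexpr std::uint8_t kNaluSps = 7;
constexpr std::uint8_t kNaluPps = 8;

// Collects the nal_unit_type of every NAL unit in an Annex B buffer.
// A 4-byte start code (00 00 00 01) ends with the 3-byte one, so matching
// 00 00 01 covers both forms.
std::vector<std::uint8_t> NaluTypes(const std::uint8_t* data,
                                    std::size_t size) {
  std::vector<std::uint8_t> types;
  for (std::size_t i = 0; i + 3 < size; ++i) {
    if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
      types.push_back(data[i + 3] & 0x1F);  // low 5 bits of the NAL header
      i += 2;                               // skip the rest of the start code
    }
  }
  return types;
}

// True when the buffer carries both an SPS and a PPS, which is the point at
// which a format description, and therefore a decoder session, can be built.
bool HasParameterSets(const std::uint8_t* data, std::size_t size) {
  bool sps = false;
  bool pps = false;
  for (std::uint8_t type : NaluTypes(data, size)) {
    sps = sps || type == kNaluSps;
    pps = pps || type == kNaluPps;
  }
  return sps && pps;
}
```

A key frame typically arrives as SPS, PPS, then an IDR slice, so HasParameterSets returns true for it and false for an ordinary delta frame.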

static size_t const attributesSize = 3;
CFTypeRef keys[attributesSize] = {
#if defined(WEBRTC_IOS)
    kCVPixelBufferOpenGLESCompatibilityKey,
#elif defined(WEBRTC_MAC)
    kCVPixelBufferOpenGLCompatibilityKey,
#endif
    kCVPixelBufferIOSurfacePropertiesKey,
    kCVPixelBufferPixelFormatTypeKey
};

CFDictionaryRef ioSurfaceValue = CreateCFTypeDictionary(nullptr, nullptr, 0);
int64_t nv12type = kCVPixelFormatType_420YpCbCr8BiPlanarFullRange;
CFNumberRef pixelFormat = CFNumberCreate(nullptr, kCFNumberLongType, &nv12type);
CFTypeRef values[attributesSize] = {kCFBooleanTrue, ioSurfaceValue, pixelFormat};

CFDictionaryRef attributes = CreateCFTypeDictionary(keys, values, attributesSize);

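These attributes specify the YUV format of the decoded data (NV12, full range) and require compatibility with OpenGL (OpenGL ES on iOS).

The fifth argument specifies the callback to run after decoding, via record. It is set up as follows: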
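
For readers unfamiliar with NV12, the sketch below computes the size of its two planes. It assumes even dimensions and tightly packed rows; real pixel buffers may pad each row, so production code should query the buffer for its bytes-per-row instead. Nv12Layout and ComputeNv12Layout are hypothetical names.

```cpp
#include <cstddef>

struct Nv12Layout {
  std::size_t y_bytes;     // plane 0: one luma byte per pixel
  std::size_t cbcr_bytes;  // plane 1: interleaved Cb/Cr, subsampled 2x2
};

// Assumes even width and height and no per-row padding.
constexpr Nv12Layout ComputeNv12Layout(std::size_t width, std::size_t height) {
  return Nv12Layout{width * height, (width / 2) * (height / 2) * 2};
}
```

For a 640x480 frame this gives 307200 luma bytes and 153600 chroma bytes, that is 1.5 bytes per pixel overall.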
```
VTDecompressionOutputCallbackRecord record = {
    decompressionOutputCallback, // the callback function
    nullptr, // argument passed to the callback (decompressionOutputRefCon)
};
```
VTDecompressionOutputCallbackRecord is a struct made up of the actual callback function and a pointer argument that is passed to it. The callback prototype is:
```
typedef void (*VTDecompressionOutputCallback)(
    void * decompressionOutputRefCon,
    void * sourceFrameRefCon,
    OSStatus status, 
    VTDecodeInfoFlags infoFlags,
    CM_NULLABLE CVImageBufferRef imageBuffer,
    CMTime presentationTimeStamp, 
    CMTime presentationDuration );
    
struct VTDecompressionOutputCallbackRecord {
    VTDecompressionOutputCallback  decompressionOutputCallback;
    void * decompressionOutputRefCon;
};

typedef struct VTDecompressionOutputCallbackRecord VTDecompressionOutputCallbackRecord;
```
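The parameters of VTDecompressionOutputCallback are:
- decompressionOutputRefCon: the value of the second field of the VTDecompressionOutputCallbackRecord struct.
- sourceFrameRefCon: the sourceFrameRefCon value passed to VTDecompressionSessionDecodeFrame.
- status: whether decoding succeeded (noErr) or failed.
- infoFlags: information about the decode operation:
  - kVTDecodeInfo_Asynchronous is set if the decode ran asynchronously.
  - kVTDecodeInfo_FrameDropped is set if the frame was dropped.
  - kVTDecodeInfo_ImageBufferModifiable is set if imageBuffer can be modified safely.
- imageBuffer: the decoded video data.
- presentationTimeStamp: the frame's PTS.
- presentationDuration: the frame's presentation duration.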
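
As a rough illustration of how a callback might act on these values, the sketch below decides whether a frame should be passed on. The constants are local stand-ins that mirror the bit positions of the kVTDecodeInfo_* flags (real code should use the SDK constants), and ShouldDeliverFrame is a hypothetical helper, not part of WebRTC.

```cpp
#include <cstdint>

// Local stand-ins mirroring the VTDecodeInfoFlags bit positions.
using DecodeInfoFlags = std::uint32_t;
constexpr DecodeInfoFlags kInfoAsynchronous = 1u << 0;
constexpr DecodeInfoFlags kInfoFrameDropped = 1u << 1;
constexpr DecodeInfoFlags kInfoImageBufferModifiable = 1u << 2;

// A frame is worth delivering only if decoding succeeded (status 0, i.e.
// noErr), the decoder did not drop it, and an image buffer was produced.
constexpr bool ShouldDeliverFrame(std::int32_t status, DecodeInfoFlags flags,
                                  bool has_image) {
  return status == 0 && (flags & kInfoFrameDropped) == 0 && has_image;
}
```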

Configuring the decoder

For decoding, the main setting is real-time decoding:

...

#if defined(WEBRTC_IOS)
  VTSessionSetProperty(_decompressionSession, kVTDecompressionPropertyKey_RealTime, kCFBooleanTrue);
#endif

...

Decoding

WebRTC decodes through its decode() function, which ultimately calls the iOS system function VTDecompressionSessionDecodeFrame.

- (NSInteger)decode:(RTCEncodedImage *)inputImage
          missingFrames:(BOOL)missingFrames
    fragmentationHeader:(RTCRtpFragmentationHeader *)fragmentationHeader
      codecSpecificInfo:(__nullable id<RTCCodecSpecificInfo>)info
           renderTimeMs:(int64_t)renderTimeMs {
       
       ...
       
       VTDecodeFrameFlags decodeFlags = kVTDecodeFrame_EnableAsynchronousDecompression;
           
        status = VTDecompressionSessionDecodeFrame(
                _decompressionSession, 
                sampleBuffer, 
                decodeFlags, 
                frameDecodeParams.release(), 
                nullptr);
        
        ...

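The third and fourth arguments of VTDecompressionSessionDecodeFrame deserve a closer look.

The third argument, decodeFlags, is set to kVTDecodeFrame_EnableAsynchronousDecompression, which enables asynchronous decoding.
The fourth argument, frameDecodeParams, is the user-defined parameter. It carries the callback that receives the decoded frame. WebRTC sets frameDecodeParams up as follows: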
```
frameDecodeParams.reset(new RTCFrameDecodeParams(
    _callback, 
    inputImage.timeStamp
));
```
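The user-defined parameter thus carries a second callback. After a frame is decoded, VideoToolbox calls the decode output callback, and that callback in turn calls the one stored in frameDecodeParams. This is a little convoluted, but the callback code makes it clear.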
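
The hand-off works because release() gives up ownership of the parameters when the frame is submitted, and the output callback takes ownership back from the raw pointer. Here is a minimal portable sketch of that pattern. FrameParams is only loosely modeled on RTCFrameDecodeParams, and FakeDecoder, OnOutput and Submit are hypothetical stand-ins for the VideoToolbox session and WebRTC's code.

```cpp
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical per-frame context: the inner callback plus a timestamp.
struct FrameParams {
  std::function<void(std::int64_t)> callback;
  std::int64_t timestamp;
};

// C-style output callback: the decoder hands the opaque pointer back.
using OutputCallback = void (*)(void* ref_con, int status);

// Stand-in for an asynchronous decoder. It only stores the pointer and
// passes it back later; it never frees it.
class FakeDecoder {
 public:
  explicit FakeDecoder(OutputCallback on_output) : on_output_(on_output) {}
  void DecodeFrame(void* ref_con) { pending_ = ref_con; }
  void Complete(int status) {
    void* ref_con = pending_;
    pending_ = nullptr;
    on_output_(ref_con, status);
  }

 private:
  OutputCallback on_output_;
  void* pending_ = nullptr;
};

// Outer callback: re-wrap the raw pointer so it is freed on every path,
// then forward to the inner callback on success.
void OnOutput(void* ref_con, int status) {
  std::unique_ptr<FrameParams> params(static_cast<FrameParams*>(ref_con));
  if (status != 0) return;  // params is still deleted here
  params->callback(params->timestamp);
}

// Submits a frame; release() transfers ownership to the in-flight request.
void Submit(FakeDecoder& decoder, std::function<void(std::int64_t)> callback,
            std::int64_t timestamp) {
  std::unique_ptr<FrameParams> params(
      new FrameParams{std::move(callback), timestamp});
  decoder.DecodeFrame(params.release());
}
```

Wrapping the pointer in a unique_ptr as the first step of the outer callback means the early return on a failed decode cannot leak the parameters.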

The decode callback

The callback's parameters were described earlier. The comments in the code walk through its logic.

// This is the callback function that VideoToolbox calls when decode is complete.
void decompressionOutputCallback(void *decoder,
                             void *params,
                             OSStatus status,
                             VTDecodeInfoFlags infoFlags,
                             CVImageBufferRef imageBuffer,
                             CMTime timestamp,
                             CMTime duration) {

// The second parameter is the user-defined parameter passed in the decode call.

// Cast it back to its real type first.
std::unique_ptr<RTCFrameDecodeParams> decodeParams(
  reinterpret_cast<RTCFrameDecodeParams *>(params));

// status holds the decode result; anything other than noErr means decoding failed.
if (status != noErr) {
    LOG(LS_ERROR) << "Failed to decode frame. Status: " << status;
    return;
}

// TODO(tkchin): Handle CVO properly.
// Re-wrap the decoded data so that standard C++ code can consume it.
RTCCVPixelBuffer *frameBuffer = [[RTCCVPixelBuffer alloc] initWithPixelBuffer:imageBuffer];
RTCVideoFrame *decodedFrame =
  [[RTCVideoFrame alloc] initWithBuffer:frameBuffer
                               rotation:RTCVideoRotation_0
                            timeStampNs:CMTimeGetSeconds(timestamp) * rtc::kNumNanosecsPerSec];

// Invoke the second callback here, the one stored in the user-defined parameter at decode time.
decodedFrame.timeStamp = decodeParams->timestamp;
decodeParams->callback(decodedFrame);
}

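The callback invoked in the function above is defined at objc_video_decoder_factory.mm:72. From there, WebRTC passes the decoded frame up layer by layer until it reaches its consumer, such as the video rendering module.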
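
The timestamp arithmetic in the callback, seconds multiplied by nanoseconds per second, can be sketched without Core Media. RationalTime is a hypothetical mirror of the two CMTime fields that matter here (value and timescale), and ToNanoseconds goes through a double, as CMTimeGetSeconds does.

```cpp
#include <cstdint>

// Hypothetical mirror of the CMTime fields used for the conversion.
struct RationalTime {
  std::int64_t value;      // tick count
  std::int32_t timescale;  // ticks per second
};

constexpr std::int64_t kNanosPerSecond = 1000000000;

// Converts ticks to seconds as a double, then scales to nanoseconds.
std::int64_t ToNanoseconds(RationalTime t) {
  const double seconds = static_cast<double>(t.value) / t.timescale;
  return static_cast<std::int64_t>(seconds * kNanosPerSecond);
}
```

With the 90 kHz clock commonly used for video, 45000 ticks correspond to half a second, or 500000000 ns.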

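Summary

This article first covered the prototypes and parameters of the functions used for WebRTC decoding on iOS. It then used WebRTC as a worked example of how these functions are called, along with some of the logic WebRTC uses to process the data.

I hope this article helps. Thanks for reading!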