Metal and Graphics Rendering, Part 5: Implementing a Chained Architecture

2021-06-15 肠粉白粥_Hoben

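Zero. Preface

The rendering commands covered so far were all single-pass. Once we need to reuse the result of an earlier render, a single pass is no longer enough, and that is where a chained structure comes in. In a chain, the output of one render pass is fed in as the input of the next, and only the final result is drawn to the screen. As an example we will again use the case from Metal and Graphics Rendering, Part 2: Rendering Transparent Images, where the goal is a transparent image.

The earlier implementation did everything in a single render pass:

That approach piles every rendering operation into one pass, so all the Objective-C-layer and Metal-layer code ends up in one place and is hard to maintain.

This time, instead of cramming the code into one pass, we implement the effect with a chain:

The chained code is more direct and concise. More importantly, whether you later want to reuse the Picture's texture or the texture of some Filter, you only need to attach one more link to that node.

Any discussion of chained rendering has to mention the well-known library GPUImage3, which lets one render result be used many times. However, that library is written in Swift, and it also has a fatal high-CPU, high-memory problem when processing video: memory blew up before a single video finished playing. Searching the issues shows that someone reported this back in 2019, but the author's only reply was "We still have a lot of work to do on the inputs and outputs to get this to be ready for regular use."

Frustrating. An open-source library in that state probably cannot meet even the basic requirements for playing effects, and our project still uses Objective-C. So the only option was to borrow the ideas of those who came before, hand-write a chained framework, and fix the open-source library's pitfalls along the way.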

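One. Basic architecture

The workflow of the chain is shown in the figure below:

The basic building blocks that implement this workflow are:

the base library MetalKit, the rendering layer Renderer, the texture producer Provider, and the texture consumer Consumer. Their relationship is shown in the figure below.

Two. Rendering principles and basic components

Before introducing the components, it is worth briefly reviewing the flow of a single render pass, that is, how one input source (a UIImage) is processed step by step until it reaches the screen: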

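--- Initialization phase ---

Configure the Device, Queue, and MTKView (initialization phase, done once)
Configure the PipelineState (maps to the functions in the .metal file, done once)
Create resources and load the MTLTexture (done once)
Set up the vertex MTLBuffer (preferably done once)

--- Render phase: the drawInMTKView callback, once per frame ---

Get a CommandBuffer from the Queue
Configure a RenderCommandEncoder from the CommandBuffer and a RenderPassDescriptor
Encode the buffers [if needed, Threadgroups can be used to group the encoded work]

--- Finish: once encoding is complete, submit the command buffer to the GPU ---

Commit to the Queue

As we can see, in a single render pass some parts are initialized only once, while others have to be created and read frequently.

In this chained structure, for one chained render (from UIImage to MTKView), the things we need to create only once are: Device, CommandQueue, CommandBuffer, Library, and Pipeline.

What is created repeatedly is the CommandEncoder. After several rounds of encoding, when the chain reaches the MTKView, the CommandBuffer holding all the encoded work for this render is committed so the GPU can execute it.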
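
As a rough illustration of "one command buffer, several encoders, one commit", the sketch below encodes two render passes into a single command buffer and commits once. It uses only standard Metal calls. The function names `MakeTarget`, `EncodeClearPass`, and `EncodeTwoPassesAndCommit` and the 64x64 textures are made up for this example and are not part of the article's framework. Each pass only clears its target, so no pipeline state is needed.

```objc
#import <Foundation/Foundation.h>
#import <Metal/Metal.h>

// Hypothetical helper: creates a small texture usable as a render target.
static id<MTLTexture> MakeTarget(id<MTLDevice> device) {
    MTLTextureDescriptor *desc =
        [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatBGRA8Unorm
                                                           width:64
                                                          height:64
                                                       mipmapped:NO];
    desc.usage = MTLTextureUsageRenderTarget | MTLTextureUsageShaderRead;
    return [device newTextureWithDescriptor:desc];
}

// Encodes one clear-only render pass into an existing command buffer.
static void EncodeClearPass(id<MTLCommandBuffer> commandBuffer, id<MTLTexture> target) {
    MTLRenderPassDescriptor *pass = [MTLRenderPassDescriptor renderPassDescriptor];
    pass.colorAttachments[0].texture = target;
    pass.colorAttachments[0].loadAction = MTLLoadActionClear;
    pass.colorAttachments[0].storeAction = MTLStoreActionStore;
    pass.colorAttachments[0].clearColor = MTLClearColorMake(0, 0, 0, 0);
    id<MTLRenderCommandEncoder> encoder =
        [commandBuffer renderCommandEncoderWithDescriptor:pass];
    [encoder endEncoding];
}

// Two encoders share one command buffer; commit is called exactly once.
static BOOL EncodeTwoPassesAndCommit(void) {
    id<MTLDevice> device = MTLCreateSystemDefaultDevice();
    if (!device) {
        return NO;
    }
    id<MTLCommandQueue> queue = [device newCommandQueue];
    id<MTLCommandBuffer> commandBuffer = [queue commandBuffer];
    EncodeClearPass(commandBuffer, MakeTarget(device));
    EncodeClearPass(commandBuffer, MakeTarget(device));
    [commandBuffer commit];
    [commandBuffer waitUntilCompleted];
    return commandBuffer.status == MTLCommandBufferStatusCompleted;
}
```

In the framework described here, each Filter plays the role of `EncodeClearPass`, and the final render view is the only place that commits.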

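1. The base library MetalKit

MetalKit here refers to the article's own HobenMetalKit class, not Apple's MetalKit framework. It manages and stores the objects that only need to be created once. Almost all of them are lazily loaded, which avoids creating objects repeatedly during rendering and wasting CPU and memory.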

- (id<MTLDevice>)device {
    if (!_device) {
        _device = MTLCreateSystemDefaultDevice();
    }
    return _device;
}

- (id<MTLCommandQueue>)commandQueue {
    if (!_commandQueue) {
        _commandQueue = [self.device newCommandQueue];
    }
    return _commandQueue;
}

- (id<MTLCommandBuffer>)commandBuffer {
    if (!_commandBuffer) {
        _commandBuffer = self.commandQueue.commandBuffer;
    }
    return _commandBuffer;
}

- (id<MTLLibrary>)library {
    if (!_library) {
        NSString *libPath = [METAL_BUNDLE pathForResource:@"alpha_video_renderer" ofType:@"metallib"];
        if (!libPath) {
            NSAssert(NO, @"[HobenMetalKit] libPath is nil!");
            [CCAlphaVideoUtils handleMetalSetupError:CCAlphaVideoMetalErrorTypeLibLoadError reason:@"libPath is nil"];
            HobenLog(@"[HobenMetalKit] libPath is nil!");
            return nil;
        }
        NSError *error;
        id <MTLLibrary> defaultLibrary = [MTL_DEVICE newLibraryWithFile:libPath error:&error];
        if (error || !defaultLibrary) {
            [CCAlphaVideoUtils handleMetalSetupError:CCAlphaVideoMetalErrorTypeLibLoadError reason:@"defaultLibrary load failed"];
            HobenLog(@"[HobenMetalKit] newLibraryWithFile error: %@", error);
            return nil;
        }
        _library = defaultLibrary;
    }
    return _library;
}

- (NSMutableDictionary<NSString *,id<MTLRenderPipelineState>> *)pipelineDict {
    if (!_pipelineDict) {
        _pipelineDict = [NSMutableDictionary dictionary];
    }
    return _pipelineDict;
}

Pipeline management is also placed in MetalKit, with a cache, again to avoid creating pipelines repeatedly during rendering.

+ (id <MTLRenderPipelineState>)pipelineStateWithVertexName:(NSString *)vertexName fragmentName:(NSString *)fragmentName {
    NSMutableDictionary *pipelineDict = [HobenMetalKit sharedInstance].pipelineDict;
    NSString *vName = vertexName ?: @"oneInputVertex";
    NSString *fName = fragmentName ?: @"passthroughFragment";
    NSString *key = [NSString stringWithFormat:@"%@_%@", vName, fName];
    id <MTLRenderPipelineState> cachedPipeline = pipelineDict[key];
    if (cachedPipeline) {
        [HobenMetalKit sharedInstance].didLoadMetalLibSuccess = YES;
        return cachedPipeline;
    }
    MTLRenderPipelineDescriptor *pipelineDesc = [MTLRenderPipelineDescriptor new];
    id <MTLLibrary> library = [self sharedLibrary];
    id <MTLFunction> vertexFunction = [library newFunctionWithName:vName];
    id <MTLFunction> fragmentFunction = [library newFunctionWithName:fName];
    if (!vertexFunction || !fragmentFunction) {
        NSAssert(NO, @"function is nil");
        return nil;
    }
    pipelineDesc.vertexFunction = vertexFunction;
    pipelineDesc.fragmentFunction = fragmentFunction;
    pipelineDesc.colorAttachments[0].pixelFormat = MTLPixelFormatBGRA8Unorm;

    NSError *pipelineError;
    // Pass &pipelineError so the failure reason is actually captured.
    id <MTLRenderPipelineState> pipelineState = [[self sharedDevice] newRenderPipelineStateWithDescriptor:pipelineDesc error:&pipelineError];
    if (pipelineError) {
        [CCAlphaVideoUtils handleMetalSetupError:CCAlphaVideoMetalErrorTypeLibLoadError reason:@"pipelinestate error"];
        HobenLog(@"[CCAlphaVideoMetalFunctionLoader] pipelinestate error: %@", pipelineError);
    }
    if (pipelineState) {
        [HobenMetalKit sharedInstance].didLoadMetalLibSuccess = YES;
        // Cache only pipelines that were created successfully.
        pipelineDict[key] = pipelineState;
    }
    return pipelineState;
}

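2. The rendering layer Renderer

The rendering layer takes the Pipeline, vertex coordinates, buffers, and input textures passed in, encodes them, and produces the output texture.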

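/**
 A single render pass.
 @param pipelineState the render pipeline
 @param inputTextures input textures; each object holds the texture and its texture coordinates
 @param imageVertices vertex coordinates; pass nil to use the default vertices
 @param vertexBuffers extra buffers for the vertex shader
 @param fragmentBuffers buffers for the fragment shader
 @param loadAction whether to load or clear the previously rendered contents; defaults to MTLLoadActionClear
 @param outputTexture the output texture; can be reused
 */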
+ (void)renderQuad:(id <MTLRenderPipelineState>)pipelineState
     inputTextures:(NSArray <HobenMetalTexture *> *)inputTextures
     imageVertices:(nullable NSArray *)imageVertices
     vertexBuffers:(nullable NSArray <id<MTLBuffer>> *)vertexBuffers
   fragmentBuffers:(nullable NSArray <id<MTLBuffer>> *)fragmentBuffers
        loadAction:(MTLLoadAction)loadAction
     outputTexture:(id <MTLTexture>)outputTexture {
    
    NSAssert(!imageVertices || imageVertices.count == 8, @"imageVertices.count must be 8");
    
    AUTO_RELEASE_BEGIN
        
    if (!pipelineState) {
        NSAssert(NO, @"pipelineState is nil");
        return;
    }
    NSArray *defaultImageVertices = @[
        @-1.0, @1.0,
        @1.0, @1.0,
        @-1.0, @-1.0,
        @1.0, @-1.0,
    ];
    NSArray *vertice = imageVertices ?: defaultImageVertices;
    float verticeCoordinates[8] = {
        [vertice[0] floatValue], [vertice[1] floatValue],
        [vertice[2] floatValue], [vertice[3] floatValue],
        [vertice[4] floatValue], [vertice[5] floatValue],
        [vertice[6] floatValue], [vertice[7] floatValue],
    };
    id <MTLBuffer> vertexBuffer = [[HobenMetalKit sharedDevice] newBufferWithBytes:verticeCoordinates length:sizeof(verticeCoordinates) options:MTLResourceStorageModeShared];
    
    MTLRenderPassDescriptor *renderPass = [MTLRenderPassDescriptor renderPassDescriptor];
    renderPass.colorAttachments[0].texture = outputTexture;
    renderPass.colorAttachments[0].clearColor = MTLClearColorMake(0, 0, 0, 0);
    renderPass.colorAttachments[0].storeAction = MTLStoreActionStore;
    renderPass.colorAttachments[0].loadAction = loadAction;
    
    id <MTLRenderCommandEncoder> renderEncoder = [MTL_COMMAND_BUFFER renderCommandEncoderWithDescriptor:renderPass];
    [renderEncoder setRenderPipelineState:pipelineState];
    [renderEncoder setVertexBuffer:vertexBuffer offset:0 atIndex:0];
    
    for (NSInteger i = 0; i < vertexBuffers.count; i++) {
        id <MTLBuffer> extraVertexBuffer = vertexBuffers[i];
        [renderEncoder setVertexBuffer:extraVertexBuffer offset:0 atIndex:1 + i];
    }
    
    for (NSInteger i = 0; i < inputTextures.count; i++) {
        HobenMetalTexture *texture = inputTextures[i];
        if (![texture isKindOfClass:[HobenMetalTexture class]]) {
            NSAssert(NO, @"texture class must be HobenMetalTexture");
            [renderEncoder setVertexBuffer:nil offset:0 atIndex:1 + i + vertexBuffers.count];
            [renderEncoder setFragmentTexture:nil atIndex:i];
            continue;
        }
        NSArray *textureCoor = texture.textureCoordinates;
        NSAssert(textureCoor.count == 8, @"textureCoor.count must be 8");
        float textureCoordinates[8] = {
            [textureCoor[0] floatValue], [textureCoor[1] floatValue],
            [textureCoor[2] floatValue], [textureCoor[3] floatValue],
            [textureCoor[4] floatValue], [textureCoor[5] floatValue],
            [textureCoor[6] floatValue], [textureCoor[7] floatValue],
        };
        id <MTLBuffer> textureBuffer = [[HobenMetalKit sharedDevice] newBufferWithBytes:textureCoordinates length:sizeof(textureCoordinates) options:MTLResourceStorageModeShared];
        [renderEncoder setVertexBuffer:textureBuffer offset:0 atIndex:1 + i + vertexBuffers.count];
        [renderEncoder setFragmentTexture:texture.texture atIndex:i];
    }
    
    for (NSInteger i = 0; i < fragmentBuffers.count; i++) {
        id <MTLBuffer> fragmentBuffer = fragmentBuffers[i];
        [renderEncoder setFragmentBuffer:fragmentBuffer offset:0 atIndex:i];
    }
    [renderEncoder drawPrimitives:MTLPrimitiveTypeTriangleStrip vertexStart:0 vertexCount:4];
    [renderEncoder endEncoding];
        
    AUTO_RELEASE_END
}

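3. The texture producer Provider

A producer hands the texture obtained from the rendering layer to its consumers so they can carry out the next step. Here is the protocol a Provider must conform to: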

@protocol HobenMetalProviderProtocol <NSObject>

- (void)transmitTexture:(id<MTLTexture>)texture
                 target:(id<HobenMetalConsumerProtocol>)target
                  index:(NSInteger)index;

@end

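Next we define MetalOutput, a texture producer that conforms to the Provider protocol. Its main job is to manage the Consumers it owns (added through addTarget) and to notify them at the right moment so they call the corresponding method.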

@interface HobenMetalOutput : NSObject <HobenMetalProviderProtocol> {
    id<MTLTexture> outputTexture;
}

#pragma mark - Public Method

- (void)addTarget:(id <HobenMetalConsumerProtocol>)target {
    NSInteger index = 0;
    if ([target respondsToSelector:@selector(nextAvailableTextureIndex)]) {
        index = [target nextAvailableTextureIndex];
    }
    [self addTarget:target atIndex:index];
}

- (void)addTarget:(id <HobenMetalConsumerProtocol>)target atIndex:(NSInteger)index {
    if (!target) {
        return;
    }
    if ([self.targets containsObject:target]) {
        return;
    }
    if ([target respondsToSelector:@selector(textureIndexUnavailable:)]) {
        [target textureIndexUnavailable:index];
    }
    [self.targets addObject:target];
    [self.targetTextureIndices addObject:@(index)];
}

- (void)transmitTextureToAllTargets:(id<MTLTexture>)texture {
    for (id <HobenMetalConsumerProtocol> target in self.targets) {
        NSInteger indexOfObject = [self.targets indexOfObject:target];
        NSInteger textureIndex = [[self.targetTextureIndices objectAtIndex:indexOfObject] integerValue];
        [self transmitTexture:texture target:target index:textureIndex];
    }
}

#pragma mark - HobenMetalProviderProtocol

- (void)transmitTexture:(id<MTLTexture>)texture target:(id<HobenMetalConsumerProtocol>)target index:(NSInteger)index {
    [target newTextureAvailable:texture index:index];
}

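In this architecture the producers are HobenMetalPicture (gets a texture from a UIImage), HobenMetalMovieReader (gets a texture from a CVPixelBufferRef), and HobenMetalFilter (gets a texture from the upstream link). Once they have a texture they process it and pass the result downstream.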

4. The texture consumer Consumer

A consumer takes the texture supplied by a Provider and does further work with it. Here is the protocol a Consumer must conform to:

@protocol HobenMetalConsumerProtocol <NSObject>

- (void)newTextureAvailable:(id <MTLTexture>)texture index:(NSInteger)index;

@optional

- (NSInteger)nextAvailableTextureIndex;

- (void)textureIndexUnavailable:(NSInteger)index;

@end

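In this architecture the consumers are HobenMetalRenderView (submits the render commands for the texture it receives) and HobenMetalFilter (encodes its own pass using the texture it receives). Their job is to encode at their own level, based on the texture supplied by the upstream Provider.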
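
The two optional protocol methods are declared above, but no implementation appears in the listing. One way a multi-input consumer could satisfy them is sketched below, under the assumption that it simply hands out the lowest free input slot. `HobenTextureIndexAllocator` is a hypothetical name introduced for this example, not a class from the original framework.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: tracks which input slots of a consumer have
// already been claimed by an upstream provider.
@interface HobenTextureIndexAllocator : NSObject
- (NSInteger)nextAvailableTextureIndex;
- (void)textureIndexUnavailable:(NSInteger)index;
@end

@implementation HobenTextureIndexAllocator {
    NSMutableIndexSet *_usedIndices;
}

- (instancetype)init {
    if (self = [super init]) {
        _usedIndices = [NSMutableIndexSet indexSet];
    }
    return self;
}

// Returns the lowest slot that no provider has claimed yet.
- (NSInteger)nextAvailableTextureIndex {
    NSUInteger index = 0;
    while ([_usedIndices containsIndex:index]) {
        index++;
    }
    return (NSInteger)index;
}

// Called from addTarget:atIndex: to mark a slot as claimed.
- (void)textureIndexUnavailable:(NSInteger)index {
    if (index >= 0) {
        [_usedIndices addIndex:(NSUInteger)index];
    }
}

@end
```

With a scheme like this, two successive addTarget: calls on the same filter would receive indices 0 and 1, which lines up with texture(0) and texture(1) in a two-input fragment shader.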

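Three. Producers and consumers

1. Resource processors

A resource processor is a tool that converts an existing resource object (UIImage, CVPixelBufferRef) into a texture. Resource processors are Providers: once converted, the texture is handed to the downstream Consumer.

HobenMetalPicture uses the texture-loading method of MTKTextureLoader and converts the CGImage into a texture in init.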

- (instancetype)initWithImage:(UIImage *)newImageSource {
    if (self = [self initWithCGImage:newImageSource.CGImage]) {
        
    }
    return self;
}

- (instancetype)initWithCGImage:(CGImageRef)newImageSource {
    if (self = [super init]) {
        [self renderCGImage:newImageSource];
    }
    return self;
}

- (void)renderCGImage:(CGImageRef)cgImage {
    MTKTextureLoader *loader = [[MTKTextureLoader alloc] initWithDevice:MTL_DEVICE];
    NSDictionary *options = @{
        MTKTextureLoaderOptionSRGB : @(NO),
    };
    self.texture = [loader newTextureWithCGImage:cgImage options:options error:nil];
}

When the developer is ready to pass the texture down the chain, calling the following method is enough:

- (void)processImage {
    [self transmitTextureToAllTargets:self.texture];
}

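HobenMetalMovieReader, on the other hand, has to define its own YUV conversion matrix and add it to the fragment shader buffers. The principle is covered in Metal and Graphics Rendering, Part 3: Alpha-Channel Video; here the earlier logic is simply extracted into something more concise and readable: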

- (BOOL)renderPixelBuffer:(CVPixelBufferRef)pixelBuffer {
    AUTO_RELEASE_BEGIN
    
    id <MTLTexture> textureY = [self textureWithPixelBuffer:pixelBuffer pixelFormat:MTLPixelFormatR8Unorm planeIndex:0];
    id <MTLTexture> textureUV = [self textureWithPixelBuffer:pixelBuffer pixelFormat:MTLPixelFormatRG8Unorm planeIndex:1];
    [self setupMatrixWithPixelBuffer:pixelBuffer];
    
    if (!textureY || !textureUV || !self.convertMatrix) {
        return NO;
    }
    CVPixelBufferLockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
    NSMutableArray *inputTextureArray = [NSMutableArray array];
    for (id <MTLTexture> texture in @[textureY, textureUV]) {
        HobenMetalTexture *inputTexture = [[HobenMetalTexture alloc] initWithTexture:texture];
        [inputTextureArray addObject:inputTexture];
    }
    
    CVPixelBufferUnlockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
    
    if (!outputTexture) {
        outputTexture = [HobenMetalTexture defaultTextureByWidth:textureY.width height:textureY.height];
    }
        
    [HobenMetalKit renderQuad:MTL_PIPELINE(@"oneInputVertex", @"movieFragment") inputTextures:inputTextureArray imageVertices:nil vertexBuffers:nil fragmentBuffers:@[_convertMatrix] outputTexture:outputTexture];
    
    [self transmitTextureToAllTargets:outputTexture];
    
    AUTO_RELEASE_END
    
    return YES;
}
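
The helper textureWithPixelBuffer:pixelFormat:planeIndex: is called above but not shown. A common way to write such a helper is with CVMetalTextureCache; the sketch below assumes that approach and a bi-planar (NV12-style), Metal-compatible pixel buffer. The function name `TextureFromPixelBufferPlane` and the caller-owned cache parameter are assumptions of this example, not the article's actual code.

```objc
#import <CoreVideo/CoreVideo.h>
#import <Metal/Metal.h>

// Sketch: wraps one plane of a bi-planar pixel buffer in an MTLTexture
// without copying pixel data. `cache` is assumed to come from
// CVMetalTextureCacheCreate and to be created once, not once per frame.
static id<MTLTexture> TextureFromPixelBufferPlane(CVMetalTextureCacheRef cache,
                                                  CVPixelBufferRef pixelBuffer,
                                                  MTLPixelFormat pixelFormat,
                                                  size_t planeIndex) {
    if (!cache || !pixelBuffer) {
        return nil;
    }
    size_t width = CVPixelBufferGetWidthOfPlane(pixelBuffer, planeIndex);
    size_t height = CVPixelBufferGetHeightOfPlane(pixelBuffer, planeIndex);

    CVMetalTextureRef textureRef = NULL;
    CVReturn status = CVMetalTextureCacheCreateTextureFromImage(
        kCFAllocatorDefault, cache, pixelBuffer, NULL,
        pixelFormat, width, height, planeIndex, &textureRef);
    if (status != kCVReturnSuccess || !textureRef) {
        return nil;
    }

    // The caller should keep the pixel buffer alive until the GPU has
    // finished reading from the returned texture.
    id<MTLTexture> texture = CVMetalTextureGetTexture(textureRef);
    CFRelease(textureRef);
    return texture;
}
```

For the Y plane the call would use MTLPixelFormatR8Unorm with plane 0, and for the interleaved UV plane MTLPixelFormatRG8Unorm with plane 1, matching the two calls in renderPixelBuffer:.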

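2. The intermediate layer Filter

In the chain diagram there is one very important intermediate layer, the Filter. It is both a producer and a consumer: it consumes the texture supplied by the upstream layer, applies its own pipeline, buffers, and coordinates to render its own pass, and supplies the resulting texture to the next layer.

A Filter supports multiple input textures. It can set up several vertex buffers and texture buffers of its own and pass them, together with its Pipeline, to the rendering layer, but it always produces exactly one output.

Because a Filter is both a producer and a consumer, it is a class that inherits from HobenMetalOutput and conforms to HobenMetalConsumerProtocol: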

@interface HobenMetalFilter : HobenMetalOutput <HobenMetalConsumerProtocol>
{
    NSMutableArray <HobenMetalTexture *> *inputTextures;
}

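Because a Filter supports multiple inputs, it has to wait until every input source is ready before rendering. During rendering, when an upstream Provider delivers a texture and all textures have arrived, processing can begin: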

- (void)newTextureAvailable:(id<MTLTexture>)texture index:(NSInteger)index {
    if (!texture) {
        return;
    }
    NSInteger numberOfInputs = MAX(_numberOfInputs, 1);
    
    HobenMetalTexture *inputTexture = [[HobenMetalTexture alloc] initWithTexture:texture];
    inputTexture.textureIndex = index;
    [inputTextures addObject:inputTexture];
    
    if (inputTextures.count < numberOfInputs) {
        return;
    }
    
    if (!outputTexture) {
        outputTexture = [HobenMetalTexture defaultTextureByWidth:texture.width height:texture.height];
    }
    
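    // Order inputs by texture index so they line up with texture(0), texture(1), ...
    [inputTextures sortUsingComparator:^NSComparisonResult(HobenMetalTexture *obj1, HobenMetalTexture *obj2) {
        if (obj1.textureIndex < obj2.textureIndex) {
            return NSOrderedAscending;
        } else if (obj1.textureIndex > obj2.textureIndex) {
            return NSOrderedDescending;
        }
        return NSOrderedSame;
    }];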
    [self renderToTextureWithVertices:nil textureCoordinates:nil];
    [inputTextures removeAllObjects];
}

- (void)renderToTextureWithVertices:(NSArray *)vertices textureCoordinates:(NSArray *)textureCoordinates {
    for (HobenMetalTexture *inputTexture in inputTextures) {
        inputTexture.textureCoordinates = textureCoordinates;
    }
    [HobenMetalKit renderQuad:MTL_PIPELINE(_vertexName, _fragmentName) inputTextures:inputTextures imageVertices:vertices outputTexture:outputTexture];
    
    [self transmitTextureToAllTargets:outputTexture];
}

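Note that creating a texture from an MTLTextureDescriptor is a CPU-heavy operation, so outputTexture is created only once. (This may be why GPUImage3 shows such high CPU usage when rendering video; it cost me a lot of time.)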
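
The helper defaultTextureByWidth:height: used for outputTexture is not shown in the article. A minimal sketch of what such a helper might do, assuming a BGRA8 texture that is rendered into and later sampled, is given below. The names `MakeOutputTexture` and `ReusableOutputTexture` are hypothetical. The second function also re-allocates when the requested size changes, which the `if (!outputTexture)` check in the listing does not handle; whether that matters depends on whether input sizes can change mid-stream.

```objc
#import <Foundation/Foundation.h>
#import <Metal/Metal.h>

// Sketch: creates a texture that can be used as a render target and
// then read by a later pass in the chain.
static id<MTLTexture> MakeOutputTexture(id<MTLDevice> device,
                                        NSUInteger width,
                                        NSUInteger height) {
    if (!device || width == 0 || height == 0) {
        return nil;
    }
    MTLTextureDescriptor *desc =
        [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatBGRA8Unorm
                                                           width:width
                                                          height:height
                                                       mipmapped:NO];
    desc.usage = MTLTextureUsageRenderTarget | MTLTextureUsageShaderRead;
    return [device newTextureWithDescriptor:desc];
}

// Returns `existing` when its size still matches; allocates otherwise.
static id<MTLTexture> ReusableOutputTexture(id<MTLTexture> existing,
                                            id<MTLDevice> device,
                                            NSUInteger width,
                                            NSUInteger height) {
    if (existing && existing.width == width && existing.height == height) {
        return existing;
    }
    return MakeOutputTexture(device, width, height);
}
```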

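renderToTextureWithVertices:textureCoordinates: is factored out so developers can supply custom vertex or texture coordinates as needed, or implement their own rendering logic. The crop operation CropFilter used here is implemented this way: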

- (void)calculateCropTextureCoordinates {
    CGFloat minX = _cropRegion.origin.x;
    CGFloat minY = _cropRegion.origin.y;
    CGFloat maxX = CGRectGetMaxX(_cropRegion);
    CGFloat maxY = CGRectGetMaxY(_cropRegion);
    
    _cropTextureCoordinates = @[
        @(minX), @(minY),
        @(maxX), @(minY),
        @(minX), @(maxY),
        @(maxX), @(maxY),
    ];
}

#pragma mark - Override

- (void)renderToTextureWithVertices:(NSArray *)vertices textureCoordinates:(NSArray *)textureCoordinates {
    [super renderToTextureWithVertices:vertices textureCoordinates:_cropTextureCoordinates];
}

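3. The output view

The output view inherits from MTKView. Its job is to display the texture supplied by the upstream layer. It is a Consumer and the final node, the one that submits the encoded commands to the GPU. This time we no longer let the system call drawInMTKView: every frame; we decide when to draw ourselves. The code is as follows: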

@interface HobenMetalRenderView : MTKView <HobenMetalConsumerProtocol>

static const NSUInteger MaxFramesInFlight = 3;

- (void)setup {
    // Set enableSetNeedsDisplay to NO and paused to YES so the developer decides when to draw
    self.enableSetNeedsDisplay = NO;
    self.paused = YES;
    self.autoResizeDrawable = YES;
    self.device = MTL_DEVICE;
    self.opaque = NO;
    _inFlightSemaphore = dispatch_semaphore_create(MaxFramesInFlight);
}

- (void)newTextureAvailable:(id<MTLTexture>)texture index:(NSInteger)index {
    self.drawableSize = CGSizeMake(texture.width, texture.height);
    self.currentTexture = texture;
    [self draw];
}

- (void)drawRect:(CGRect)rect {
    if (!self.currentTexture) {
        return;
    }
    if (!self.currentDrawable) {
        NSAssert(NO, @"drawable is nil");
        return;
    }
    dispatch_semaphore_wait(_inFlightSemaphore, DISPATCH_TIME_FOREVER);
    
    id <MTLCommandBuffer> commandBuffer = MTL_COMMAND_BUFFER;
    HobenMetalTexture *texture = [[HobenMetalTexture alloc] initWithTexture:self.currentTexture];
    [HobenMetalKit renderQuad:MTL_PASSTHROUGH_PIPELINE inputTextures:@[texture] outputTexture:self.currentDrawable.texture];
    __block dispatch_semaphore_t block_semaphore = _inFlightSemaphore;
    [commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> buffer)
     {
         dispatch_semaphore_signal(block_semaphore);
     }];
    [commandBuffer presentDrawable:self.currentDrawable];
    [commandBuffer commit];
    self.currentTexture = nil;
    [HobenMetalKit resetCommandBuffer];
}

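The MTKView's currentDrawable is the canvas for the current screen. When the render commands are committed, all of the encoded commands from this pass through the chain are submitted to the GPU, and the chain is complete.

Note that once the CommandBuffer has been committed it must be reset. On the next render, a new command buffer is created from the command queue and used until the MTKView commits again.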
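
The resetCommandBuffer method itself is not listed. Given the lazy getter shown earlier, it plausibly just clears the cached buffer so the next access creates a fresh one. The sketch below shows that idea in isolation; `CommandBufferHolder` is a hypothetical stand-in for the command-buffer part of HobenMetalKit, not the article's actual class.

```objc
#import <Foundation/Foundation.h>
#import <Metal/Metal.h>

// Hypothetical stand-in: owns the one command buffer shared by every
// encoder in a single pass through the chain.
@interface CommandBufferHolder : NSObject
- (instancetype)initWithQueue:(id<MTLCommandQueue>)queue;
- (id<MTLCommandBuffer>)commandBuffer;  // lazily created
- (void)resetCommandBuffer;             // call after commit
@end

@implementation CommandBufferHolder {
    id<MTLCommandQueue> _queue;
    id<MTLCommandBuffer> _commandBuffer;
}

- (instancetype)initWithQueue:(id<MTLCommandQueue>)queue {
    if (self = [super init]) {
        _queue = queue;
    }
    return self;
}

- (id<MTLCommandBuffer>)commandBuffer {
    if (!_commandBuffer) {
        _commandBuffer = [_queue commandBuffer];
    }
    return _commandBuffer;
}

- (void)resetCommandBuffer {
    // A committed command buffer cannot be encoded into again, so drop
    // the reference and let the getter create a new one next time.
    _commandBuffer = nil;
}

@end
```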

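Four. Inheriting and calling from the business layer

1. Defining a custom Filter

After this refactoring the business-layer logic is clearly much simpler. To define a custom Filter we only need to specify the vertex shader and fragment shader, and optionally custom vertex coordinates and texture coordinates. For example, the crop operation CropFilter reduces to the following code: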

- (instancetype)initWithCropRegin:(CGRect)newCropRegion {
    if (self = [super init]) {
        self.cropRegion = newCropRegion;
    }
    return self;
}

- (void)calculateCropTextureCoordinates {
    CGFloat minX = _cropRegion.origin.x;
    CGFloat minY = _cropRegion.origin.y;
    CGFloat maxX = CGRectGetMaxX(_cropRegion);
    CGFloat maxY = CGRectGetMaxY(_cropRegion);
    
    _cropTextureCoordinates = @[
        @(minX), @(minY),
        @(maxX), @(minY),
        @(minX), @(maxY),
        @(maxX), @(maxY),
    ];
}

#pragma mark - Override

- (void)renderToTextureWithVertices:(NSArray *)vertices textureCoordinates:(NSArray *)textureCoordinates {
    [super renderToTextureWithVertices:vertices textureCoordinates:_cropTextureCoordinates];
}

- (void)setCropRegion:(CGRect)newValue {
    NSParameterAssert(newValue.origin.x >= 0 && newValue.origin.x <= 1 &&
                      newValue.origin.y >= 0 && newValue.origin.y <= 1 &&
                      newValue.size.width >= 0 && newValue.size.width <= 1 &&
                      newValue.size.height >= 0 && newValue.size.height <= 1);

    _cropRegion = newValue;
    [self calculateCropTextureCoordinates];
}

The mix (blend) operation needs no custom vertex coordinates, so the Objective-C side is even simpler:

- (instancetype)init {
    if (self = [super initWithVertexName:@"twoInputVertex" fragmentName:@"mixFragment" numberOfInputs:2]) {
        
    }
    return self;
}

The corresponding .metal file is just the earlier mix operation:

vertex TwoInputVertexIO twoInputVertex(const device packed_float2 *position [[buffer(0)]],
                                       const device packed_float2 *texturecoord [[buffer(1)]],
                                       const device packed_float2 *texturecoord2 [[buffer(2)]],
                                       uint vid [[vertex_id]])
{
    TwoInputVertexIO outputVertices;
    
    outputVertices.position = float4(position[vid], 0, 1.0);
    outputVertices.textureCoordinate = texturecoord[vid];
    outputVertices.textureCoordinate2 = texturecoord2[vid];

    return outputVertices;
}

fragment float4 mixFragment(TwoInputVertexIO fragmentInput [[stage_in]],
                            texture2d<float> inputTexture [[texture(0)]],
                            texture2d<float> inputTexture2 [[texture(1)]])
{
    constexpr sampler quadSampler;
    float4 color1 = inputTexture.sample(quadSampler, fragmentInput.textureCoordinate);
    float4 color2 = inputTexture2.sample(quadSampler, fragmentInput.textureCoordinate2);

    return float4(color1.rgb, color2.r);
}

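2. Calling from the business layer

The business layer has to specify how the chain is wired, and that too takes only one very readable block of code: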

- (void)viewDidLoad {
    [super viewDidLoad];

    if (!_renderView) {
        _renderView = [[HobenMetalRenderView alloc] initWithFrame:CGRectMake(0, 0, self.view.frame.size.width, self.view.frame.size.height)];
    }
    if (!_cropLeftFilter) {
        _cropLeftFilter = [[HobenMetalCropFilter alloc] initWithCropRegin:CGRectMake(0, 0, .5f, 1.f)];
    }
    
    if (!_cropRightFilter) {
        _cropRightFilter = [[HobenMetalCropFilter alloc] initWithCropRegin:CGRectMake(.5f, 0, .5f, 1.f)];
    }

    if (!_mixFilter) {
        _mixFilter = [[HobenMetalMixFilter alloc] init];
    }

    if (!_picture) {
        _picture = [[HobenMetalPicture alloc] initWithImage:[UIImage imageNamed:@"crop_image"]];
    }
    
    [self.view addSubview:_renderView];
    
    [_picture addTarget:_cropLeftFilter];
    [_picture addTarget:_cropRightFilter];
    
    [_cropLeftFilter addTarget:_mixFilter];
    [_cropRightFilter addTarget:_mixFilter];
    
    [_mixFilter addTarget:_renderView];
    
    [_picture processImage];
}

And with that, the chain is complete!

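Five. Some thoughts on memory and CPU optimization (and pitfalls)

My estimate is that GPUImage3's high CPU and memory usage when processing video comes from the following points:

AutoReleasePool

Apple's official Metal documentation recommends using an autorelease pool, so our render operations need one as well.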
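
The AUTO_RELEASE_BEGIN and AUTO_RELEASE_END macros used in the listings are not defined in the article. A likely definition, assumed here, is a thin wrapper around @autoreleasepool. The sketch below shows that assumption together with a made-up per-frame loop (`ProcessFrames`) in which temporaries are released at the end of each iteration.

```objc
#import <Foundation/Foundation.h>

// Assumed definitions; the original macros are not shown in the article.
#define AUTO_RELEASE_BEGIN @autoreleasepool {
#define AUTO_RELEASE_END   }

// Example use: objects created while handling one frame are released
// when that frame's pool closes, instead of accumulating across frames.
static NSUInteger ProcessFrames(NSUInteger frameCount) {
    NSUInteger processed = 0;
    for (NSUInteger i = 0; i < frameCount; i++) {
        AUTO_RELEASE_BEGIN
        NSString *label = [NSString stringWithFormat:@"frame-%lu", (unsigned long)i];
        if (label.length > 0) {
            processed++;
        }
        AUTO_RELEASE_END
    }
    return processed;
}
```

An early return between the two macros, as in renderQuad:, is still valid because it simply leaves the @autoreleasepool block.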

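Committing the CommandBuffer too often

In GPUImage3's design, every encode operation, whether in a Provider, a Consumer, or a Filter, is followed by its own commit. In fact a single render needs only one commit after several encodes, and commit is precisely the bridge between the CPU and the GPU.

According to Apple, a Drawable is a very limited resource (there are only 3), scheduled by the system, and the official sample code Synchronizing CPU and GPU Work recommends using a semaphore to control commits. GPUImage3's frequent commits probably hurt CPU performance considerably.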

// The maximum number of frames in flight.
static const NSUInteger MaxFramesInFlight = 3;

...

/// Handles view rendering for a new frame.
- (void)drawInMTKView:(nonnull MTKView *)view
{
    // Wait to ensure only `MaxFramesInFlight` number of frames are getting processed
    // by any stage in the Metal pipeline (CPU, GPU, Metal, Drivers, etc.).
    dispatch_semaphore_wait(_inFlightSemaphore, DISPATCH_TIME_FOREVER);

...

    // Add a completion handler that signals `_inFlightSemaphore` when Metal and the GPU have fully
    // finished processing the commands that were encoded for this frame.
    // This completion indicates that the dynamic buffers that were written-to in this frame, are no
    // longer needed by Metal and the GPU; therefore, the CPU can overwrite the buffer contents
    // without corrupting any rendering operations.
    __block dispatch_semaphore_t block_semaphore = _inFlightSemaphore;
    [commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> buffer)
     {
         dispatch_semaphore_signal(block_semaphore);
     }];

    // Finalize CPU work and submit the command buffer to the GPU.
    [commandBuffer commit];
}

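Creating outputTexture from an MTLTextureDescriptor too often

Doing this for every video frame is extremely CPU-intensive. A video has a great many frames, and initializing a texture for each one is not acceptable. Because of this, CPU usage while rendering video shot up to around 50%; after the optimization it stayed at around 10%, which shows how expensive it is. There is no need to create the texture repeatedly: lazy loading is enough.

The figure below shows the peak CPU and memory usage while rendering video after the optimization: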

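Six. Summary

This chained architecture greatly improves the maintainability and readability of the rendering logic. Filter files and .metal files can be organized by rendering function, which simplifies business-layer development.

Even when a custom render operation is needed, you only have to subclass HobenMetalFilter and choose the vertex shader, fragment shader, vertex coordinates, texture coordinates, vertex buffers, and texture buffers you need.

The chain follows a producer-consumer structure: inputs are producers, outputs are consumers, and the intermediate Filter layer is both. As a result, a single CommandBuffer gathers multiple CommandEncoders, and at the end the MTKView commits the command buffer to the GPU to complete the render.

This chained architecture not only reproduces the logic of the open-source library GPUImage3 in Objective-C, it also resolves the high-memory and high-CPU problems. The process was painful, but I learned a great deal. Onward!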