Recommendation System Inference Optimization

2022-02-08  MatrixOnEarth

@(Engineering Practice)[Deep Learning, Recommendation System, Inference]

姚伟峰

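Recommendation Systems (RecSys): the "Silent Majority"

Internet companies

Compute providers

RecSys as a black box

Input and output

Given a user and the user's context (such as entry point, time, region, and the user's demographic data), the system computes the probability that the user will interact (click, purchase, connect, etc.) with each inventory item (product, article, other user, etc.). It then selects the k items most likely to lead to an interaction and recommends them to the user, driving interaction and conversion.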

RecSys IO model
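
As a minimal sketch of the output side of this black box, the snippet below takes already-computed interaction probabilities and keeps the k most likely items. The function name `recommend_top_k` and the item ids and scores are hypothetical, chosen only for illustration; how the probabilities are produced is the subject of the rest of the article.

```python
import heapq

def recommend_top_k(scores, k):
    """Return the k item ids with the highest predicted interaction probability.

    scores: dict mapping item id -> predicted probability in [0, 1].
    """
    # nlargest iterates over the dict keys and ranks them by their score.
    return heapq.nlargest(k, scores, key=scores.get)

# Hypothetical scores for one (user, context) pair.
scores = {"item_a": 0.12, "item_b": 0.87, "item_c": 0.45, "item_d": 0.66}
top2 = recommend_top_k(scores, 2)
print(top2)  # ['item_b', 'item_d']
```

Using a heap-based selection rather than a full sort is a reasonable default when k is much smaller than the number of candidates.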

KPI

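RecSys algorithms and models

Taxonomy of RecSys algorithms

In terms of algorithm design, RecSys algorithms can be roughly classified as in the figure below. Mainstream industrial deployments are currently dominated by DNN models, which are also the target workload of this article.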


RecSys algorithms

DNN RecSys model paradigm

DNN RecSys Model = Feature Engineering + Feature Interaction + Predictor DNN
Different choices of feature engineering, feature interaction, and predictor DNN produce different models and different workload characteristics.

DNN RecSys model paradigm
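
The three-stage paradigm can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the sizes, the random weights, and the function names are hypothetical, the feature engineering stage is reduced to one embedding lookup per sparse field, and the interaction stage uses pairwise dot products in the spirit of DLRM. Real models differ in each stage, which is exactly the point of the paradigm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
NUM_FIELDS, VOCAB, EMB_DIM, HIDDEN = 3, 1000, 8, 16

# Feature engineering: one embedding table per sparse field.
tables = [rng.normal(size=(VOCAB, EMB_DIM)) for _ in range(NUM_FIELDS)]

def feature_engineering(sparse_ids):
    """Map one id per field to a dense vector: (batch, NUM_FIELDS, EMB_DIM)."""
    return np.stack(
        [tables[f][sparse_ids[:, f]] for f in range(NUM_FIELDS)], axis=1
    )

def feature_interaction(emb):
    """Pairwise dot products between field embeddings, flattened per sample."""
    gram = emb @ emb.transpose(0, 2, 1)        # (batch, F, F)
    iu = np.triu_indices(NUM_FIELDS, k=1)      # upper triangle, no diagonal
    return gram[:, iu[0], iu[1]]               # (batch, F * (F - 1) / 2)

# Predictor DNN: a two-layer MLP with a sigmoid output.
W1 = rng.normal(size=(NUM_FIELDS * (NUM_FIELDS - 1) // 2, HIDDEN))
W2 = rng.normal(size=(HIDDEN, 1))

def predictor_dnn(x):
    hidden = np.maximum(x @ W1, 0.0)           # ReLU
    logit = hidden @ W2
    return 1.0 / (1.0 + np.exp(-logit[:, 0]))  # interaction probability

def predict(sparse_ids):
    return predictor_dnn(feature_interaction(feature_engineering(sparse_ids)))

ids = rng.integers(0, VOCAB, size=(4, NUM_FIELDS))  # batch of 4 samples
probs = predict(ids)
print(probs.shape)  # (4,)
```

Swapping any one of the three functions for another choice (for example, attention-based interaction instead of dot products) changes both the model and its compute and memory profile.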

Typical DNN RecSys models

WDL

DIN

DIEN

DLRM

Characteristics of DNN RecSys models

Small Tensor + Big Model

This combination leads to lower computational intensity than CNN workloads.
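
A small worked example makes the effect concrete. It assumes that "small tensor" refers to activations with a small batch dimension and "big model" to large weight matrices, and it counts only one pass over each operand (no cache reuse). The function name and the layer sizes are hypothetical.

```python
def fc_arithmetic_intensity(m, k, n, bytes_per_elem=2):
    """FLOP per byte for Y = X @ W with X: (m, k), W: (k, n), Y: (m, n).

    Assumes each operand is read or written exactly once, 2 bytes per element.
    """
    flops = 2 * m * k * n                                  # multiply + add
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved

small = fc_arithmetic_intensity(m=16, k=1024, n=1024)      # small activation
large = fc_arithmetic_intensity(m=4096, k=1024, n=1024)    # large activation
print(round(small, 1), round(large, 1))  # 15.5 455.1
```

With a small `m`, traffic is dominated by the weight matrix, so intensity is bounded by roughly `2 * m / bytes_per_elem` no matter how large the layer is.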

Tensor Operations matter

Tensor operations, namely Embedding Lookup and Tensor Manipulation, account for a non-negligible share of execution time.


time spent in Caffe2 operators in Facebook data centers
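
To show what an embedding lookup with pooling does, here is a minimal NumPy sketch of a sum-pooled lookup. The `indices` plus `offsets` layout is one common way to encode variable-length bags and is an assumption here, as are the function name and the toy table.

```python
import numpy as np

def embedding_bag_sum(table, indices, offsets):
    """Sum-pool embedding rows per bag.

    indices: flat array of row ids for all bags, concatenated.
    offsets: offsets[i] is the position in `indices` where bag i starts.
    """
    bounds = list(offsets) + [len(indices)]
    out = np.zeros((len(offsets), table.shape[1]), dtype=table.dtype)
    for b in range(len(offsets)):
        rows = indices[bounds[b]:bounds[b + 1]]
        out[b] = table[rows].sum(axis=0)   # gather rows, then reduce
    return out

table = np.arange(12, dtype=np.float32).reshape(6, 2)  # 6 rows, dim 2
indices = np.array([0, 2, 5, 1])
offsets = np.array([0, 3])     # bag 0 = rows {0, 2, 5}, bag 1 = row {1}
pooled = embedding_bag_sum(table, indices, offsets)
print(pooled.tolist())  # [[14.0, 17.0], [2.0, 3.0]]
```

Each gathered element takes part in at most one addition, so the operator performs less than one FLOP per element loaded and its cost is governed by irregular memory access rather than arithmetic.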

Workload Heterogeneity

Different mixes of Feature Engineering : Feature Interaction : DNN Predictor lead to workload heterogeneity.

RecSys models heterogeneity
RecSys models characteristics heterogeneity

RecSys workload performance optimization

Overview

optimization methods

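Of these, model optimization focuses on the performance of the model itself, while deployment optimization focuses on the model's performance in its deployment environment, especially in mixed-deployment (co-located) environments.

Model optimization

Optimization principles

Tensor Operation Sub-graph

Main optimization methods

graph fusion/stitching

Optimization principles involved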

Case Studies

FC&Attention Sub-graph

Sub-graph fusion

MatMul + BiasAdd + Activation

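"MatMul + BiasAdd + Activation" is a typical subgraph within FC subgraphs, and fusing it is a graph optimization pass that most graph optimizers (such as TF Grappler) implement. Today this is done mainly by template matching.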


MatMul fusion
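
Template matching can be sketched on a toy graph. The representation below (a dict from node name to op type and input names) and the fused op name `FusedMatMul` are hypothetical and are not the data structures or op names of any particular framework; the sketch also handles only ReLU as the activation.

```python
def fuse_matmul_biasadd_relu(graph):
    """Fuse MatMul -> BiasAdd -> Relu chains.

    graph: dict name -> (op_type, [input names]). Returns a new graph.
    """
    # Record consumers so intermediates used elsewhere are never fused away.
    consumers = {}
    for name, (_, inputs) in graph.items():
        for i in inputs:
            consumers.setdefault(i, []).append(name)

    def op_of(name):
        return graph[name][0] if name in graph else None

    fused, removed = {}, set()
    for name, (op, inputs) in graph.items():
        if op != "Relu":
            continue
        bias_add = inputs[0]
        if op_of(bias_add) != "BiasAdd" or len(consumers[bias_add]) != 1:
            continue
        matmul, bias = graph[bias_add][1]
        if op_of(matmul) != "MatMul" or len(consumers[matmul]) != 1:
            continue
        x, w = graph[matmul][1]
        # The fused node keeps the Relu's name so downstream edges stay valid.
        fused[name] = ("FusedMatMul", [x, w, bias])
        removed.update({matmul, bias_add})

    return {n: fused.get(n, v) for n, v in graph.items() if n not in removed}

graph = {
    "x": ("Placeholder", []),
    "w": ("Const", []),
    "b": ("Const", []),
    "mm": ("MatMul", ["x", "w"]),
    "ba": ("BiasAdd", ["mm", "b"]),
    "out": ("Relu", ["ba"]),
}
fused_graph = fuse_matmul_biasadd_relu(graph)
print(fused_graph["out"])  # ('FusedMatMul', ['x', 'w', 'b'])
```

The matcher succeeds only when the three ops appear with exactly these types and in exactly this order, which is what makes the approach brittle.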

One complication in RecSys is that the same "MatMul + BiasAdd + Activation" semantics often shows up in different subgraph forms. Two of them are shown below:


MatMul subgraph variant#1
MatMul subgraph variant#2

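Although both subgraphs above are still semantically "MatMul + BiasAdd + Activation", their form has changed, so a template-matching fusion pass can no longer recognize and fuse them correctly. A fusion pass that works at a higher level of abstraction is needed. In practice, the enhanced pass reduced online inference latency by about 20%.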
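
One way to raise the level of abstraction is to canonicalize the graph before matching. The sketch below is purely illustrative: the original figures are not reproduced here, so the variant it handles (an `Identity` node after the MatMul and a generic `AddV2` with swapped operands in place of `BiasAdd`) is a hypothetical example, as are the graph representation and the function name.

```python
def canonicalize(graph):
    """Normalize a graph so equivalent forms share one template.

    graph: dict name -> (op_type, [input names]). Returns a normalized copy.
    Assumes Identity nodes are never graph outputs.
    """
    def resolve(name):
        # Follow chains of Identity nodes back to the real producer.
        while name in graph and graph[name][0] == "Identity":
            name = graph[name][1][0]
        return name

    def op_of(name):
        return graph[name][0] if name in graph else None

    out = {}
    for name, (op, inputs) in graph.items():
        if op == "Identity":
            continue                            # bypassed via resolve()
        inputs = [resolve(i) for i in inputs]
        if op in ("Add", "AddV2") and len(inputs) == 2:
            a, b = inputs
            if op_of(a) == "Const" and op_of(b) == "MatMul":
                a, b = b, a                     # put the MatMul operand first
            if op_of(a) == "MatMul" and op_of(b) == "Const":
                # A real pass would also check that the constant is 1-D and
                # broadcasts along the last axis before calling it a bias.
                op, inputs = "BiasAdd", [a, b]
        out[name] = (op, inputs)
    return out

variant = {
    "x": ("Placeholder", []),
    "w": ("Const", []),
    "b": ("Const", []),
    "mm": ("MatMul", ["x", "w"]),
    "id": ("Identity", ["mm"]),
    "add": ("AddV2", ["b", "id"]),
    "out": ("Relu", ["add"]),
}
canonical = canonicalize(variant)
print(canonical["add"])  # ('BiasAdd', ['mm', 'b'])
```

After this step the graph again has the plain MatMul, BiasAdd, Relu chain, so a simple template can match it. A production pass would need many more equivalence rules than the two shown.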


MatMul fusion brings online latency improvement

Multi-Head Attention

Multi-Head Attention is the basic subgraph of attention structures, so it is well worth analyzing carefully and optimizing aggressively.


MHA fusion
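
For reference, the computation that an MHA fusion has to preserve can be written as an unfused NumPy sketch. It assumes standard scaled dot-product attention with no masking, no bias terms, and square projection matrices; the shapes and random weights are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, wq, wk, wv, wo, num_heads):
    """x: (batch, seq, d_model); wq, wk, wv, wo: (d_model, d_model)."""
    b, s, d = x.shape
    dh = d // num_heads

    def split_heads(t):                       # (b, s, d) -> (b, h, s, dh)
        return t.reshape(b, s, num_heads, dh).transpose(0, 2, 1, 3)

    q, k, v = split_heads(x @ wq), split_heads(x @ wk), split_heads(x @ wv)
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(dh)   # (b, h, s, s)
    ctx = softmax(scores) @ v                            # (b, h, s, dh)
    ctx = ctx.transpose(0, 2, 1, 3).reshape(b, s, d)     # merge heads
    return ctx @ wo

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 5, 16))
wq, wk, wv, wo = (rng.normal(size=(16, 16)) for _ in range(4))
out = multi_head_attention(x, wq, wk, wv, wo, num_heads=4)
print(out.shape)  # (2, 5, 16)
```

Written this way, the subgraph contains several reshapes, transposes, and small batched matmuls whose intermediates are all materialized, which is why it is a natural fusion target.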

Operator optimization

Increase Computation Intensity
Increase Peak Memory BW
Example
Hypothetical system parameters

Parameter               Value
L2$ peak BW (TB/s)      4
HBM2e peak BW (TB/s)    0.8
BF16 peak TFLOPS        512
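
Plugging these hypothetical parameters into the Roofline model shows why both levers matter. The sketch below uses the standard roofline bound, attainable performance = min(peak compute, intensity x bandwidth); the function name and the sample intensity of 16 FLOP/byte are assumptions made for illustration.

```python
PEAK_TFLOPS = 512.0   # BF16 peak compute of the hypothetical system
HBM_TBPS = 0.8        # HBM2e peak bandwidth
L2_TBPS = 4.0         # L2 cache peak bandwidth

def attainable_tflops(intensity, bw_tbps, peak=PEAK_TFLOPS):
    """Roofline bound. intensity is in FLOP/byte, bandwidth in TB/s."""
    return min(peak, intensity * bw_tbps)

# Ridge points: the intensity needed to become compute-bound.
ridge_hbm = PEAK_TFLOPS / HBM_TBPS   # about 640 FLOP/byte
ridge_l2 = PEAK_TFLOPS / L2_TBPS     # 128 FLOP/byte

# An operator at 16 FLOP/byte, served from HBM versus from L2.
from_hbm = attainable_tflops(16, HBM_TBPS)   # about 12.8 TFLOPS
from_l2 = attainable_tflops(16, L2_TBPS)     # 64 TFLOPS
print(round(ridge_hbm), round(ridge_l2), round(from_hbm, 1), round(from_l2, 1))
```

At 16 FLOP/byte the operator reaches only about 2.5% of peak when it streams from HBM. Raising intensity moves it to the right along the roofline, while keeping its working set in L2 raises the bandwidth roof by 5x on this hypothetical system.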


Deployment optimization

Problem statement

Mixed deployment brings deployment optimization

Prior work

Facebook

Facebook proposed DeepRecSched, which uses dry runs to search for good deployment configurations. Facebook's experiments reported 2x QPS on CPU and 5x QPS on GPU.

Facebook DeepRecSched
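
The idea of searching configurations by dry run can be illustrated with a toy hill climb over batch size. This is a simplified sketch and not DeepRecSched's actual algorithm: `dry_run` is a hypothetical stand-in for measuring a configuration on real hardware, its cost model and constants are invented, and only one knob is tuned.

```python
def dry_run(batch_size):
    """Hypothetical measurement of one configuration.

    Returns (items per second, latency in ms) from a toy cost model.
    """
    latency_ms = 2.0 + 0.05 * batch_size     # fixed overhead + per-item cost
    qps = batch_size / latency_ms * 1000.0
    return qps, latency_ms

def hill_climb_batch_size(sla_ms, start=1, max_batch=4096):
    """Double the batch size while throughput improves and the SLA holds."""
    best, best_qps = None, 0.0
    batch = start
    while batch <= max_batch:
        qps, latency = dry_run(batch)
        if latency > sla_ms or qps <= best_qps:
            break                            # SLA violated or no more gain
        best, best_qps = batch, qps
        batch *= 2
    return best, best_qps

best_batch, best_qps = hill_climb_batch_size(sla_ms=10.0)
print(best_batch)  # 128
```

Under this toy model, larger batches amortize the fixed overhead until the latency target is hit, so the search stops at the largest batch that still meets the SLA.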
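
Other

Other explorations are covered in the deployment optimization section of 《深度学习推理性能优化》 (Deep Learning Inference Performance Optimization).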

Micro-architecture exploration

There are two main directions:

References

  1. DeepRecSys: A System for Optimizing End-To-End At-scale Neural Recommendation Inference
  2. The Architectural Implications of Facebook’s DNN-based Personalized Recommendation
  3. Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications
  4. Cross-Stack Workload Characterization of Deep Recommendation Systems
  5. High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models
  6. Accelerating the Wide & Deep Model Workflow from 25 Hours to 10 Minutes Using NVIDIA GPUs
  7. Applying the Roofline Model for Deep Learning performance optimizations
  8. RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing
  9. MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions
  10. AI Matrix: A Deep Learning Benchmark for Alibaba Data Centers
  11. Deep Learning Recommendation Model for Personalization and Recommendation Systems
  12. Download Terabyte Click Logs
  13. Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures
  14. Roofline Model
  15. GPU Performance Background User Guide
  16. Matrix Multiplication Background User Guide
  17. 推理性能提升一倍,TensorFlow Feature Column性能优化实践 (Doubling Inference Performance: TensorFlow Feature Column Performance Optimization in Practice)
  18. Accelerate INT8 Inference Performance for Recommender Systems with Intel® Deep Learning Boost (Intel® DL Boost)
  19. Optimizing Recommendation System Inference Performance Based on GPU
  20. Deep Learning: It’s Not All About Recognizing Cats and Dogs
  21. 深度学习推理性能优化 (Deep Learning Inference Performance Optimization)
  22. Recommendation Systems Algorithms, Challenges, Metrics, and Business Opportunities