Machine Learning - Algorithm Theory

Types of Generalization: Can Memorization Generalize?

2021-12-29  shudaxu
Question 1: Memorization can't generalize?

As defined in the Wide & Deep paper:
Memorization can be loosely defined as learning the frequent co-occurrence of items or features and exploiting the correlation available in the historical data.
Generalization, on the other hand, is based on transitivity of correlation and explores new feature combinations that have never or rarely occurred in the past.

Question 2: How should generalization be defined?
Question 3: Why do we discuss OOD generalization?
Question 3.1: What does research on OOD cover?
Question 4: Why can't wide-side crosses generalize to query-item pairs that never appear in the samples?

One limitation of cross-product transformations is that they do not generalize to query-item feature pairs that have not appeared in the training data.
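A minimal sketch of this limitation (the vocabulary and pairs below are made up, not from the paper): a wide-side cross-product feature is essentially a lookup over exact co-occurrences enumerated from the training data, so it can never fire for a pair that was not seen.

```python
# Hypothetical illustration of a wide-side cross-product feature.
# The cross feature only "fires" (equals 1) when the exact (query, item)
# combination was enumerated from the training data; any new pair maps
# to nothing, so the wide side contributes no signal for it.

train_pairs = {("fried chicken", "burger"), ("fried chicken", "chicken wings")}

def cross_product_feature(query, item, vocab=train_pairs):
    """Return 1.0 if this exact co-occurrence was seen in training, else 0.0."""
    return 1.0 if (query, item) in vocab else 0.0

print(cross_product_feature("fried chicken", "chicken wings"))  # 1.0 -> memorized
print(cross_product_feature("fried chicken", "sushi"))          # 0.0 -> cannot generalize
```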

Question 5: Following on from the previous question, how can a DNN generalize to query-item feature pairs that have not appeared in the training data?
Question 6: How does the DNN's over-generalization problem on a high-rank sparse interaction matrix manifest itself?
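For questions 5 and 6, a rough sketch with made-up embeddings: because every query and item is mapped to a dense low-dimensional vector, any pair, seen or unseen, gets a nonzero score. That is where generalization comes from, and also why, when the true interaction matrix is high-rank and sparse, the model can produce confident scores for pairs that are essentially irrelevant (over-generalization).

```python
import numpy as np

rng = np.random.default_rng(0)
emb_dim = 4

# Hypothetical learned embeddings for queries and items.
query_emb = {"fried chicken": rng.normal(size=emb_dim), "sushi": rng.normal(size=emb_dim)}
item_emb = {"chicken wings": rng.normal(size=emb_dim), "burger": rng.normal(size=emb_dim)}

def dnn_style_score(query, item):
    """Dense embeddings give every (query, item) pair a score,
    including pairs that were never observed in training."""
    return float(query_emb[query] @ item_emb[item])

# A pair absent from training data still receives a (possibly nonsensical) score:
print(dnn_style_score("sushi", "burger"))
```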
Question 7: How does the wide side complement the DNN?

Question 7.1: How should wide-side cross features be selected?

Question 8: Summarize the pros & cons of NN models vs. statistical models with respect to generalization.

References:
[1] All inference is about generalizing from sample to population
https://statmodeling.stat.columbia.edu/2013/08/24/all-inference-is-about-generalizing-from-sample-to-population/

[2] OOD generalization
http://out-of-distribution-generalization.com/
Survey: Towards Out-Of-Distribution Generalization: A Survey

[3] Representation learning
From Section 3 of Towards Out-Of-Distribution Generalization: A Survey:
Unsupervised representation learning methods leverage human's prior knowledge to design and restrict the representation learning procedure.

[4] OOD detection
Exploring Covariate and Concept Shift for Detection and Confidence Calibration of Out-of-Distribution Data

[5] Introducing uncertainty estimation for OOD
Can You Trust Your Model’s Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift

[6] Multi-domain learning; "domain-specific" can in fact be understood as strong differences, i.e. bias, between domains.
The idea is to train an FCN on strongly biased domain-specific features and then apply it to the original model via an element-wise product (see the sketch after this entry). (Also, as I understand it, this is just a subset of multi-task learning rather than a separate paradigm.)
One Model to Serve All: Star Topology Adaptive Recommender for Multi-Domain CTR Prediction
A related discussion on injecting priors (arguably domains as well) into DNNs: https://www.zhihu.com/question/279012198
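One way to read the element-wise-product idea in [6] (the layer shapes, names, and single-layer setup below are my assumptions, not the paper's exact architecture): shared weights are modulated per domain by element-wise multiplication, so each domain carries its own bias while most parameters stay shared.

```python
import numpy as np

rng = np.random.default_rng(0)
in_dim, out_dim = 8, 4

# Shared weights learned across all domains, plus per-domain weights
# (hypothetical shapes; a rough reading of the STAR-style element-wise product).
w_shared = rng.normal(size=(in_dim, out_dim))
w_domain = {d: rng.normal(size=(in_dim, out_dim)) for d in ("domain_a", "domain_b")}

def star_layer(x, domain):
    """Effective weight = shared weight element-wise multiplied by the
    domain-specific weight, so each domain gets its own bias while the
    bulk of the parameters remains shared."""
    w_eff = w_shared * w_domain[domain]   # element-wise product
    return np.maximum(x @ w_eff, 0.0)     # ReLU fully-connected layer

x = rng.normal(size=(2, in_dim))          # a toy batch of two examples
print(star_layer(x, "domain_a").shape)    # (2, 4)
```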

[7] Symptoms of insufficient convergence:
Slow convergence, together with operations such as regularization / early stopping, leaves a bias in the final result. In practice it shows up as the model's predictions deviating noticeably (bias) from the empirical statistics of the training set along certain dimensions (e.g. male vs. female).
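A small sketch of how this bias would be measured in practice: compare the model's mean prediction with the empirical positive rate on each slice. The data, the group attribute, and the slice names below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a binary label, a group attribute, and model predictions.
labels = rng.integers(0, 2, size=1000)
groups = rng.choice(["male", "female"], size=1000)
preds = rng.uniform(0, 1, size=1000)

def per_group_bias(preds, labels, groups):
    """For each group, compare the mean prediction with the empirical positive
    rate. A large gap on some slice is the symptom of biased / under-fitted
    convergence described above."""
    out = {}
    for g in np.unique(groups):
        mask = groups == g
        out[g] = preds[mask].mean() - labels[mask].mean()
    return out

print(per_group_bias(preds, labels, groups))
```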

[8] Causes of insufficient convergence:
A DNN represents feature crosses through learned embeddings, and the DCN paper points out that DNNs are inefficient to even approximately model 2nd or 3rd-order feature crosses. Because the DNN struggles to learn the crosses sufficiently, the bias propagates to the upper layers.
Some remedies:
1. abacus likewise freezes the other variables and trains the embeddings in a separate pipeline.
2. Many few-shot learning, transfer learning, and meta-learning paradigms train the lower layers and embeddings with large amounts of data and large networks, then transfer them to a new network and fine-tune only a small number of upper-layer parameters.
3. DCN and similar models build explicit prior crosses automatically to compensate for slowly converging embeddings and biased crosses (a minimal cross-layer sketch follows this list).
4. The most raw approach: add crosses by hand.
5. The multi-domain learning above likewise injects a domain prior, with a separate small model guaranteeing those crosses.
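Item 3 refers to DCN's explicit cross layer. A minimal version of that operation, x_{l+1} = x_0 (x_l^T w) + b + x_l, is sketched below with made-up dimensions, to show how bounded-degree crosses are built explicitly instead of being left to the embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6

# One DCN cross layer: x_{l+1} = x_0 * (x_l . w) + b + x_l
# Each application raises the maximum cross degree by one, so the crosses
# are constructed explicitly rather than learned implicitly by the DNN.
w = rng.normal(size=d)
b = np.zeros(d)

def cross_layer(x0, xl):
    """Apply a single cross layer to the running state xl, keeping x0 fixed."""
    return x0 * (xl @ w) + b + xl

x0 = rng.normal(size=d)       # raw input features / concatenated embeddings
x1 = cross_layer(x0, x0)      # contains degree-2 crosses
x2 = cross_layer(x0, x1)      # contains degree-3 crosses
print(x2.shape)
```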

