Uncertainty (confidence score) and Ca

2021-01-05 shudaxu

model confidence score:
Standard methods would assess the confidence of predictions of a fully-converged batch model without regularization;
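How do we assess a model's confidence?
The predicted value is itself the likelihood probability [empirical prob] (when there is no regularization; with regularization it can be viewed as the posterior prob).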

model uncertainty
expected calibration error (ECE)

Uncertainty:

Evaluation metrics: Brier Score, NLL, ECE (https://zhuanlan.zhihu.com/p/120856234)
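The three metrics above can be sketched for binary predictions roughly as follows. This is a minimal standard-library sketch, not code from the linked article: the function names, the equal-width binning, and the choice of 10 bins are all assumptions, and the ECE here is the binary variant that compares the mean predicted probability with the observed positive rate in each bin.

```python
import math

def brier_score(probs, labels):
    """Mean squared difference between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

def nll(probs, labels, eps=1e-12):
    """Average negative log-likelihood (log loss) for binary labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)

def ece(probs, labels, n_bins=10):
    """Expected calibration error with equal-width bins (assumed binning scheme)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[idx].append((p, y))
    n = len(probs)
    err = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_p = sum(p for p, _ in bucket) / len(bucket)
        avg_y = sum(y for _, y in bucket) / len(bucket)
        # Weight each bin's |confidence - frequency| gap by its share of the data.
        err += len(bucket) / n * abs(avg_p - avg_y)
    return err
```

For example, predicting 0.5 for inputs whose labels really are a coin flip gives an ECE of zero even though the Brier score and NLL stay high, because the prediction matches the observed frequency.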

Surveys of the various methods:
https://blog.csdn.net/weixin_44864049/article/details/108232061
https://zhuanlan.zhihu.com/p/110687124
https://zhuanlan.zhihu.com/p/98756147
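Sources of uncertainty: [Decomposition of Uncertainty in Bayesian Deep Learning for Efficient and Risk-sensitive Learning]
1. Epistemic uncertainty.
(Covariate shift in the data or model (the training and serving datasets follow different distributions, so they are not strictly i.i.d.), or OOD data the model has never seen, can make the model confidently wrong. [In theory this can be improved by collecting more data, but in real business settings it cannot always be eliminated; for example, you cannot collect features for unknown random users.])
So epistemic uncertainty itself comes in two kinds. The first can be resolved by collecting data. The second cannot, and requires detecting OOD inputs to decide how to handle them.
2. Aleatoric uncertainty.
(Uncertainty inherent in the data. It cannot be removed by collecting more data; for example, identical inputs with different labels [labeling errors, or genuine randomness such as the randomness of user behavior]. In theory, the model should still report the uncertainty this causes. From another angle, this case is also captured by the ECE metric: the model can in principle handle uncertainty within the data, and if its predicted value matches the expectation over that randomness, ECE is low as well.)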

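Ad Click Prediction: a View from the Trenches (the learning algorithm itself maintains a notion of uncertainty in the per-feature counters; that is, for a fully converged model, the gradient itself already reflects the model's uncertainty about the current prediction. Does this approach have limitations, though? Can it be extended to NN models?)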
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning (MC dropout)
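As a rough illustration of the MC dropout idea (keep dropout switched on at prediction time, run several stochastic forward passes, and read the spread of the outputs as model uncertainty), here is a toy standard-library sketch. The one-hidden-layer network, its weights, the dropout rate, and every function name are hypothetical and chosen only for illustration; a real model would use a deep learning framework.

```python
import math
import random

def forward(x, w1, w2, drop_p, rng):
    """One stochastic forward pass: ReLU hidden layer with dropout kept ON."""
    hidden = []
    for row in w1:
        h = max(0.0, sum(wi * xi for wi, xi in zip(row, x)))
        # Inverted dropout: zero the unit with probability drop_p, rescale otherwise.
        keep = rng.random() >= drop_p
        hidden.append(h / (1.0 - drop_p) if keep else 0.0)
    logit = sum(wi * hi for wi, hi in zip(w2, hidden))
    return 1.0 / (1.0 + math.exp(-logit))

def mc_dropout_predict(x, w1, w2, drop_p=0.5, n_samples=200, seed=0):
    """Average n_samples stochastic passes; the variance is the uncertainty signal."""
    rng = random.Random(seed)
    preds = [forward(x, w1, w2, drop_p, rng) for _ in range(n_samples)]
    mean = sum(preds) / n_samples
    var = sum((p - mean) ** 2 for p in preds) / n_samples
    return mean, var

# Hypothetical weights: 2 inputs -> 2 hidden units -> 1 output.
w1 = [[1.0, -1.0], [0.5, 0.5]]
w2 = [1.0, -1.0]
mean, var = mc_dropout_predict([1.0, 2.0], w1, w2)
```

With `drop_p=0.0` every pass is identical and the variance collapses to zero, which is the ordinary deterministic prediction.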
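Simple and scalable predictive uncertainty estimation using deep ensembles (deep ensembles)
1. Change the task from a point estimate of y to predicting a mean and a variance. (A Gaussian distribution is assumed here, but other distributions also work: long-tailed distributions, or more complex ones [mixture density network].) **This alone already captures aleatoric uncertainty to some extent.**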

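2. This approach can estimate the variance of in-distribution data (within the distribution, this is precisely the aleatoric uncertainty), but it handles OOD data and covariate shift poorly. So an ensemble is trained, in the spirit of bagging (the paper itself trains each member on the full dataset from a different random initialization), to obtain multiple estimates, and an overall mean and variance are computed from them to improve robustness on OOD data.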
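The two steps above can be sketched as follows, assuming each ensemble member outputs a Gaussian mean and variance for one input. The Gaussian negative log-likelihood is the per-example loss for step 1. For step 2, the members are treated as a uniform mixture of Gaussians and summarized by its mean and variance. The function names and the split of the total variance into "aleatoric" and "epistemic" parts are labels added here for illustration.

```python
import math

def gaussian_nll(y, mean, var):
    """Negative log-likelihood of y under N(mean, var); var must be positive."""
    return 0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)

def combine_ensemble(means, variances):
    """Moment-match a uniform mixture of the members' Gaussian predictions."""
    m = len(means)
    mix_mean = sum(means) / m
    # Average predicted noise: roughly the aleatoric part.
    aleatoric = sum(variances) / m
    # Disagreement between members: roughly the epistemic part.
    epistemic = sum((mu - mix_mean) ** 2 for mu in means) / m
    return mix_mean, aleatoric + epistemic, aleatoric, epistemic

# Two hypothetical members that agree on the noise level but not on the mean.
mix_mean, total_var, aleatoric, epistemic = combine_ensemble([1.0, 3.0], [0.5, 0.5])
```

When the members disagree, as they tend to on inputs far from the training data, the second term grows even if every member individually reports a small variance.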


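On Calibration of Modern Neural Networks (temperature scaling)
The idea is to measure how much uncertainty the real data actually shows (p?) and then apply post-training calibration.
Approaches include: Histogram binning, Isotonic regression, Platt scaling (see the Calibration section).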
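Temperature scaling, the post-training calibration named in the paper title above, divides the logits by a single scalar T fitted on held-out data by minimizing NLL. A minimal standard-library sketch follows. The grid search over T, its range, and the function names are assumptions standing in for whatever optimizer is used in practice.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_norm for s in scaled]

def avg_nll(logit_rows, labels, temperature):
    """Average NLL of the true class on a validation set."""
    loss = 0.0
    for logits, y in zip(logit_rows, labels):
        loss -= log_softmax(logits, temperature)[y]
    return loss / len(labels)

def fit_temperature(logit_rows, labels, lo=0.05, hi=20.0, steps=400):
    """Grid-search T on a log scale (assumed search range and resolution)."""
    best_t, best_loss = 1.0, avg_nll(logit_rows, labels, 1.0)
    for i in range(steps + 1):
        t = lo * (hi / lo) ** (i / steps)
        loss = avg_nll(logit_rows, labels, t)
        if loss < best_loss:
            best_t, best_loss = t, loss
    return best_t

# Hypothetical validation set: the model is very confident but right only 3 times out of 4.
val_logits = [[4.0, 0.0]] * 4
val_labels = [0, 0, 0, 1]
T = fit_temperature(val_logits, val_labels)
```

Because all logits are divided by the same positive T, the predicted class never changes; only the confidence is softened (T > 1) or sharpened (T < 1).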
https://ai.googleblog.com/2020/01/can-you-trust-your-models-uncertainty.html (evaluates the uncertainty metrics; Deep Ensemble is the most robust)
Can You Trust Your Model’s Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift
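One conclusion here is that calibrating on in-distribution data does not improve performance on OOD or covariate-shifted data [that is, it cannot assess the uncertainty of such data].
Another conclusion is that deep ensembles currently look like the best option.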

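OOD Detection: Improving Out-of-Distribution Detection in Machine Learning Models
For out-of-distribution data, a very naive idea is to use a generative model: have the model output the likelihood that the input comes from the in-distribution data. This most direct approach is often wrong in the image domain. The article proposes a likelihood ratio to solve part of the problem.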
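The likelihood-ratio score can be sketched as the log-likelihood under a model of the in-distribution data minus the log-likelihood under a background model, so that statistics shared by both cancel out. The toy below uses smoothed unigram models over symbols purely for illustration. The models, the training strings, and the function names are all hypothetical, and the density models in the article are far richer than this.

```python
import math
from collections import Counter

def fit_unigram(sequences, alphabet, smoothing=1.0):
    """Toy density model: independent symbols with additive smoothing."""
    counts = Counter(sym for seq in sequences for sym in seq)
    total = sum(counts[a] for a in alphabet) + smoothing * len(alphabet)
    return {a: (counts[a] + smoothing) / total for a in alphabet}

def log_likelihood(model, seq):
    """Sum of per-symbol log-probabilities under a unigram model."""
    return sum(math.log(model[sym]) for sym in seq)

def likelihood_ratio(full_model, background_model, seq):
    """Higher score suggests in-distribution; lower suggests OOD."""
    return log_likelihood(full_model, seq) - log_likelihood(background_model, seq)

alphabet = "ACGT"
# Hypothetical data: in-distribution sequences are A-rich; the background set is not.
full = fit_unigram(["AAAA", "AAAT"], alphabet)
background = fit_unigram(["ACGT", "TGCA"], alphabet)
score_in = likelihood_ratio(full, background, "AAAA")
score_out = likelihood_ratio(full, background, "CGCG")
```

A threshold on this score, chosen on held-out data, would then separate inputs to trust from inputs to flag.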

Calibration

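On calibration: a model trained with logloss optimizes the same objective, NLL, that calibration targets, so models such as LR are already fairly well calibrated.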
https://zhuanlan.zhihu.com/p/90479183

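Platt scaling is simply calibration with LR (suitable when the amount of data is small):
Train the following model on the validation set:

P(y = 1 | s) = 1 / (1 + exp(A * s + B)), where s is the uncalibrated model output and A and B are the two fitted scalars (the standard Platt form).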
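A minimal sketch of fitting Platt scaling on a validation set, assuming the standard form P(y = 1 | s) = 1 / (1 + exp(A * s + B)) with s the uncalibrated score. Plain gradient descent on the log loss is used here for brevity. The learning rate, the iteration count, the starting point, and the function names are assumptions, and the toy scores and labels are made up.

```python
import math

def platt_prob(score, a, b):
    """Calibrated probability 1 / (1 + exp(a * score + b)), computed stably."""
    z = a * score + b
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

def fit_platt(scores, labels, lr=0.1, n_iter=5000):
    """Fit (a, b) by batch gradient descent on the average log loss."""
    a, b = -1.0, 0.0  # a < 0 means a higher score gives a higher probability
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = platt_prob(s, a, b)
            # d(loss)/da = (y - p) * s and d(loss)/db = (y - p) in this sign convention.
            grad_a += (y - p) * s
            grad_b += (y - p)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

# Hypothetical validation scores with noisy labels (not linearly separable).
val_scores = [-2.0, -1.0, -1.0, 0.0, 0.0, 1.0, 1.0, 2.0]
val_labels = [0, 0, 1, 0, 1, 0, 1, 1]
a, b = fit_platt(val_scores, val_labels)
calibrated = [platt_prob(s, a, b) for s in val_scores]
```

Only two parameters are fitted, which is why the method stays stable on small validation sets.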
Isotonic regression
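Isotonic regression fits a non-decreasing step function from the raw score to the observed label frequency. A common way to compute it is the pool-adjacent-violators procedure, sketched below with the standard library only. The function names and the step-function lookup used at prediction time are assumptions made for illustration.

```python
import bisect

def isotonic_fit(scores, labels):
    """Pool-adjacent-violators; returns (distinct sorted scores, fitted values)."""
    # Aggregate tied scores first so each distinct score gets one fitted value.
    sums, counts = {}, {}
    for s, y in zip(scores, labels):
        sums[s] = sums.get(s, 0.0) + y
        counts[s] = counts.get(s, 0) + 1
    xs = sorted(sums)
    blocks = []  # each block: [label_sum, weight, number_of_distinct_scores]
    for x in xs:
        blocks.append([sums[x], counts[x], 1])
        # Merge backwards while the previous block's mean exceeds the current one.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, weight, width = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += weight
            blocks[-1][2] += width
    fitted = []
    for total, weight, width in blocks:
        fitted.extend([total / weight] * width)
    return xs, fitted

def isotonic_predict(xs, fitted, score):
    """Step-function lookup: value at the largest training score <= score."""
    idx = bisect.bisect_right(xs, score) - 1
    return fitted[max(idx, 0)]

# Hypothetical validation scores and labels.
xs, fitted = isotonic_fit([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [0, 1, 0, 1, 1, 1])
```

Unlike Platt scaling, this is non-parametric, so it can fit any monotone distortion but generally needs more validation data to avoid overfitting.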

