The Facebook Field Guide to Machine Learning - 1

2018-06-29  4v3r9

from
https://research.fb.com/the-facebook-field-guide-to-machine-learning-video-series/
2018.6.28

Contents
- 1 problem definition
- 2 data
- 3 evaluation
- 4 features
- 5 models
- 6 experimentation

$1 Problem Definition

1.1 Look for

1.2 Ask yourself these questions (e.g.)

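Thinking the problem through at the "problem definition" stage is better than finishing the experiments and only then discovering that the approach was wrong.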

1.3 To conclude

$2 Data

Contents
- 1 Data recency and real time training
- -  Recency over time, e.g. 2017 vs 2017
- -  Seasonality, e.g. Black Friday versus the week before it
- 2 Training / Prediction consistency
- 3 Records and sampling

2.2 Training / Prediction consistency

For this problem, we face two challenges:

2.3 Records and sampling

$3 Evaluation

Usually a good flow is to use offline evaluation until you have a viable candidate, and then validate it with online experiments.

3.1 Baseline model

simplest possible model
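
As an illustration of what a "simplest possible model" could look like, here is a minimal sketch that assumes a classification task and uses the most frequent training label as the baseline. The helper names `fit_majority_baseline` and `accuracy` are hypothetical and do not come from the video series; other baselines (for example, predicting the mean for a regression target) would serve the same purpose.

```python
from collections import Counter


def fit_majority_baseline(train_labels):
    """Return the most frequent label in the training data."""
    most_common_label, _count = Counter(train_labels).most_common(1)[0]
    return most_common_label


def accuracy(labels, predictions):
    """Fraction of predictions that match the labels."""
    matches = sum(1 for y, p in zip(labels, predictions) if y == p)
    return matches / len(labels)


# Toy data, invented for this example.
train_labels = [0, 0, 0, 1, 0, 1]
test_labels = [0, 1, 0, 0]

baseline_label = fit_majority_baseline(train_labels)
baseline_predictions = [baseline_label] * len(test_labels)
print(accuracy(test_labels, baseline_predictions))
```

Any candidate model would then need to beat this number to justify its extra complexity.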

3.2 Best offline practice

Split the dataset into three parts: a training set, an evaluation set, and a test set.
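
A minimal sketch of such a three-way split, assuming the three parts are the training, evaluation, and test sets named later in this section. The function name `three_way_split`, the 80/10/10 proportions, and the fixed seed are assumptions made for the example; a random split is only appropriate when the order of the records does not matter.

```python
import random


def three_way_split(rows, eval_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle a copy of rows and cut it into train / evaluation / test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(len(shuffled) * test_frac)
    n_eval = int(len(shuffled) * eval_frac)
    test = shuffled[:n_test]
    evaluation = shuffled[n_test:n_test + n_eval]
    train = shuffled[n_test + n_eval:]
    return train, evaluation, test


train, evaluation, test = three_way_split(range(100))
print(len(train), len(evaluation), len(test))
```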

3.3 Evaluation

3.3.1 cross validation

The dataset needs to be shuffled randomly before splitting.
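
The shuffle-then-split step can be sketched as follows. This is a minimal illustration of k-fold cross validation on row indices; the name `k_fold_indices`, the default of five folds, and the fixed seed are assumptions for the example rather than anything prescribed by the video series.

```python
import random


def k_fold_indices(n_rows, k=5, seed=0):
    """Yield (train_idx, held_out_idx) pairs for k-fold cross validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # shuffle once, before splitting
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        held_out = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out


for train_idx, held_out_idx in k_fold_indices(10, k=5):
    print(len(train_idx), len(held_out_idx))
```

Each row is held out exactly once, so every example contributes to both training and evaluation across the k rounds.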

3.3.2 progressive evaluation
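
A minimal sketch, assuming that progressive evaluation means training on older data and evaluating on the data that comes right after it, then moving the cut-off forward. The function name `progressive_splits` and its parameters are hypothetical, and the rows are assumed to be sorted from oldest to newest.

```python
def progressive_splits(rows, initial_train_size, step):
    """Yield (train, evaluation) pairs that move forward through time.

    rows must already be sorted from oldest to newest.
    """
    start = initial_train_size
    while start < len(rows):
        train = rows[:start]                    # everything seen so far
        evaluation = rows[start:start + step]   # the next, unseen window
        yield train, evaluation
        start += step


days = list(range(10))  # stand-in for ten days of records
for train, evaluation in progressive_splits(days, initial_train_size=4, step=3):
    print(len(train), evaluation)
```

Unlike cross validation, nothing is shuffled here: the model never sees records newer than the ones it is evaluated on.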

3.3.3 Metrics

3.3.4 To conclude

In both cases, we're interested in how our models perform on the test set. If a model performs much better on the training or evaluation set than on the test set, we're most likely overfitting the training data, and the model does not generalize well to new examples.
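
The comparison described above can be sketched as a simple gap check. The names `generalization_gap` and `looks_overfit`, the tolerance of 0.05, and the example scores are all assumptions for illustration; the scores are taken to be "higher is better" metrics such as accuracy, and a sensible tolerance depends on the metric and the problem.

```python
def generalization_gap(train_score, test_score):
    """Difference between the training score and the test score."""
    return train_score - test_score


def looks_overfit(train_score, test_score, tolerance=0.05):
    """Flag a model whose training score exceeds its test score by more than tolerance."""
    return generalization_gap(train_score, test_score) > tolerance


# Invented scores: the first model shows a large gap, the second a small one.
print(looks_overfit(train_score=0.98, test_score=0.81))
print(looks_overfit(train_score=0.84, test_score=0.82))
```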

3.4 Calibration of model

$$\text{calibration} = \frac{\text{sum of labels}}{\text{sum of predictions}}$$
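
A minimal sketch of this ratio, assuming binary labels (0 or 1) and predicted probabilities. The function name `calibration` and the sample values are made up for the example.

```python
def calibration(labels, predictions):
    """Sum of observed labels divided by sum of predicted probabilities."""
    total_predicted = sum(predictions)
    if total_predicted == 0:
        raise ValueError("sum of predictions is zero; calibration is undefined")
    return sum(labels) / total_predicted


labels = [1, 0, 0, 1]
predictions = [0.5, 0.25, 0.25, 0.5]
print(calibration(labels, predictions))
```

By this definition, a value of 1 means the predictions sum to the observed number of positives, a value above 1 means the model under-predicts overall, and a value below 1 means it over-predicts.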

about calibration

3.5 Splitting dataset by category

Diving into the performance for different subsets of the evaluation data is a useful way to understand where the performance comes from and whether this is expected.
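
One way to sketch this breakdown is to group the evaluation examples by a category field and compute the metric per group. The function name `accuracy_by_category`, the `(category, label, predicted_label)` record layout, and the sample categories are assumptions for the example; any other metric could be computed per group in the same way.

```python
from collections import defaultdict


def accuracy_by_category(examples):
    """Compute accuracy separately for each category.

    examples is an iterable of (category, label, predicted_label) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, label, predicted in examples:
        total[category] += 1
        if label == predicted:
            correct[category] += 1
    return {category: correct[category] / total[category] for category in total}


# Invented evaluation records.
examples = [
    ("mobile", 1, 1),
    ("mobile", 0, 1),
    ("desktop", 1, 1),
    ("desktop", 0, 0),
]
print(accuracy_by_category(examples))
```

A large difference between groups points to where the overall number comes from and is a prompt to check whether that difference is expected.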


splitting and observing each subset separately

3.6 To conclude

evaluation summary