
11. Machine learning system design

2020-08-20  玄语梨落


Prioritizing what to work on: Spam classification example

Building a spam classifier

How should we spend our time to make the classifier's error low?
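One common way to set up the classifier (an assumption of this sketch, not something stated above) is to represent each email as a binary feature vector over a chosen vocabulary: $x_j = 1$ if word $j$ appears in the email, else $0$. The vocabulary and the regex tokenizer below are hypothetical; in practice the vocabulary would be picked from frequent words in the training set.

```python
import re

# Hypothetical vocabulary; a real one would be chosen from the training data.
VOCAB = ["buy", "deal", "discount", "andrew", "now"]

def email_features(text, vocab=VOCAB):
    """Return x where x[j] = 1 if vocab[j] occurs in the email, else 0."""
    # Naive tokenizer (assumption): lowercase alphabetic runs only.
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [1 if w in words else 0 for w in vocab]

print(email_features("Buy now! Huge DISCOUNT on watches"))
```

Every improvement idea (more data, richer features, handling misspellings) changes either this representation or the data it is computed from, which is why choosing where to spend effort matters.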

Error analysis

Recommended approach
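The approach usually recommended here is to start with a simple algorithm that can be implemented quickly, evaluate it on the cross-validation set, and then manually examine the examples it got wrong to see which kinds of error are most common. A minimal sketch of the tallying step, assuming the misclassified emails have already been labeled by hand with hypothetical categories:

```python
from collections import Counter

# Hypothetical hand-assigned categories for cross-validation emails
# that the classifier misclassified.
misclassified = [
    "pharma", "replica", "phishing", "pharma", "other",
    "phishing", "phishing", "replica", "phishing",
]

counts = Counter(misclassified)

# Most frequent error types first: these suggest where effort may pay off.
for category, n in counts.most_common():
    print(f"{category}: {n}")
```

Pairing this with a single numerical evaluation metric (for example cross-validation error) makes it quick to check whether a change aimed at the largest category actually helps.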

Error metrics for skewed classes

Skewed classes: the ratio of positive to negative examples is very close to one of the two extremes, i.e. one class is far more common than the other.
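To see why plain accuracy is misleading with skewed classes, consider a hypothetical test set in which only 5 of 1000 patients have cancer ($y = 1$). A "classifier" that ignores its input and always predicts $y = 0$ reaches 99.5% accuracy while never detecting a single case:

```python
# Hypothetical labels: 5 positives (cancer) out of 1000 examples.
y_true = [1] * 5 + [0] * 995
y_pred = [0] * len(y_true)  # always predict y = 0

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)
```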

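In the cancer example, $y = 1$ denotes the rare class (the patient has cancer).

Precision (P): of all patients for whom we predicted $y = 1$, what fraction actually has cancer?
$$P = \frac{\text{true positives}}{\text{predicted positives}} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}}$$

Recall (R): of all patients who actually have cancer, what fraction did we correctly detect as having cancer?
$$R = \frac{\text{true positives}}{\text{actual positives}} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}$$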
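A minimal sketch that computes both quantities from binary labels, with 1 as the rare (positive) class. Returning 0 when a denominator is 0 is an assumed convention, not part of the definitions above.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels where 1 is the rare class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Assumed convention: a ratio with a zero denominator is reported as 0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical example: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(precision_recall(y_true, y_pred))
```

A classifier that always predicts $y = 0$ gets recall 0 under this definition, no matter how high its accuracy is.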

Trading off precision and recall

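By changing the threshold applied to $h_\theta(x)$ (predict $y = 1$ if $h_\theta(x) \ge$ threshold), we can trade off precision against recall: raising the threshold gives higher precision but lower recall, while lowering it gives higher recall but lower precision.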
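A small sketch of the trade-off, using hypothetical model outputs $h_\theta(x)$ and labels. Sweeping the threshold upward makes precision rise and recall fall on this data:

```python
def predict(h_values, threshold):
    """Predict y = 1 when h_theta(x) >= threshold, else y = 0."""
    return [1 if h >= threshold else 0 for h in h_values]

def precision_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # 0 if nothing predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical outputs of h_theta(x) and the corresponding true labels.
h = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25]
y = [1, 1, 0, 1, 0, 1, 0, 0]

results = {}
for threshold in (0.3, 0.5, 0.7, 0.9):
    prec, rec = precision_recall(y, predict(h, threshold))
    results[threshold] = (prec, rec)
    print(f"threshold={threshold}: precision={prec:.2f}, recall={rec:.2f}")
```

Which threshold is "best" depends on the application; a single summary number such as the F1 score makes that comparison easier.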

F1 Score (F Score)

F1 Score: $F_1 = 2\frac{PR}{P + R}$ (the harmonic mean of precision and recall)
$0 \le F_1 \le 1$
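Why not simply average $P$ and $R$? A sketch with three hypothetical algorithms: the one that predicts $y = 1$ almost everywhere (very high recall, tiny precision) has the best average, while the F1 score ranks it last.

```python
def f1_score(precision, recall):
    """F1 = 2PR / (P + R); returns 0 when P + R = 0 (assumed convention)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical (precision, recall) pairs for three algorithms.
candidates = {"A": (0.5, 0.4), "B": (0.7, 0.1), "C": (0.02, 1.0)}

for name, (p, r) in candidates.items():
    print(f"{name}: average={(p + r) / 2:.3f}, F1={f1_score(p, r):.3f}")
```

Because the harmonic mean is dominated by the smaller of the two values, a high F1 requires both precision and recall to be reasonably large.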

Data for machine learning

How much data to train on?
There is a saying: "It's not who has the best algorithm that wins. It's who has the most data."

Large data rationale

Assume the features $x \in \mathbb{R}^{n+1}$ contain sufficient information to predict $y$ accurately.

Assume we use a learning algorithm with many parameters, and a training set large enough for that algorithm not to overfit.
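Under these two assumptions, the usual bias/variance reasoning explains why more data helps. Written out as a short derivation, with $J_{\text{train}}$ and $J_{\text{test}}$ the training and test errors:

```latex
\begin{aligned}
&\text{many parameters} \;\Rightarrow\; \text{low bias} \;\Rightarrow\; J_{\text{train}}(\theta) \text{ is small} \\
&\text{very large training set} \;\Rightarrow\; \text{low variance (unlikely to overfit)} \;\Rightarrow\; J_{\text{train}}(\theta) \approx J_{\text{test}}(\theta) \\
&\therefore\; J_{\text{test}}(\theta) \text{ is small}
\end{aligned}
```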
