11. Machine learning system design

2020-08-20 玄语梨落

Machine learning system design

Prioritizing what to work on: Spam classification example

Building a spam classifier

How should you spend your time to give the classifier a low error?

Collect lots of data
Develop sophisticated features based on email routing information (from the email header).
Develop sophisticated features for message body.
Develop a sophisticated algorithm to detect misspellings.

Error analysis

Recommended approach

Start with a simple algorithm that you can implement quickly. Implement it and test it on your cross-validation data.
Plot learning curves to decide whether more data, more features, etc. are likely to help.
Error analysis: Manually examine the examples (in the cross-validation set) that your algorithm made errors on. See if you can spot any systematic trend in the types of examples it gets wrong.
Numerical evaluation: Try to find a way to evaluate your algorithm's performance numerically, ideally as a single number (for example, the cross-validation error), so that you can quickly tell whether a change helps.
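The error-analysis and numerical-evaluation steps can be sketched in a few lines of Python. This is only a minimal illustration: the data in `cv_results`, the category names, and the helper functions `cv_error` and `error_breakdown` are all made up for this example, and the categories stand in for the labels you would assign by hand while reading the misclassified emails.

```python
from collections import Counter

# Hypothetical cross-validation results: (true label, predicted label, manual category).
# 1 = spam, 0 = not spam. The category is assigned by a human reading each email.
cv_results = [
    (1, 1, "pharma"),
    (1, 0, "phishing"),
    (1, 0, "phishing"),
    (0, 0, "personal"),
    (1, 0, "replica"),
    (0, 1, "newsletter"),
    (1, 0, "phishing"),
    (0, 0, "personal"),
]

def cv_error(results):
    """Fraction of misclassified examples: a single number to track."""
    wrong = sum(1 for y, pred, _ in results if y != pred)
    return wrong / len(results)

def error_breakdown(results):
    """Count misclassified examples per manually assigned category."""
    return Counter(cat for y, pred, cat in results if y != pred)

error = cv_error(cv_results)
breakdown = error_breakdown(cv_results)

print(f"cross-validation error: {error:.3f}")
for category, count in breakdown.most_common():
    print(f"{category}: {count}")
```

In this toy data most of the errors fall into one category, which is the kind of systematic trend that suggests where new features might pay off. Re-running the same script after each change gives a quick numerical check of whether the change helped.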

Error metrics for skewed classes

Skewed classes: The ratio of positive to negative examples is very close to one of the two extremes, i.e. one class is far rarer than the other.
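To see why plain accuracy can mislead when classes are skewed, here is a small sketch. The numbers are invented for illustration (5 positive examples out of 1000), and the "classifier" simply predicts the majority class for every input.

```python
# Hypothetical labels: 5 positives out of 1000 examples (0.5% positive).
y_true = [1] * 5 + [0] * 995

# A "classifier" that ignores its input and always predicts the majority class.
y_pred = [0] * len(y_true)

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)

# Number of positive examples that were actually detected.
detected = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)

print(f"accuracy: {accuracy:.3f}, positives detected: {detected}")
```

The accuracy looks excellent even though not a single positive example is found, which is why a different error metric is needed for skewed classes.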

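Precision (P): Of all patients where we predicted $y=1$, what fraction actually has cancer? $\frac{\text{True positives}}{\text{Predicted positives}} = \frac{\text{True positives}}{\text{True positives} + \text{False positives}}$
Recall (R): Of all patients that actually have cancer, what fraction did we correctly detect as having cancer? $\frac{\text{True positives}}{\text{Actual positives}} = \frac{\text{True positives}}{\text{True positives} + \text{False negatives}}$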
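The two definitions translate directly into code. The sketch below is one possible way to compute them; the function name `precision_recall` and the sample labels are assumptions for illustration, and returning 0.0 when a denominator is zero is a convention chosen here, not something the definitions prescribe.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 is the positive class."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    predicted_pos = sum(1 for p in y_pred if p == 1)
    actual_pos = sum(1 for t in y_true if t == 1)
    # Assumption: report 0.0 when the ratio is undefined (zero denominator).
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall

# Hypothetical data: 4 actual positives, 3 predicted positives, 2 of them correct.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision: {precision:.3f}, recall: {recall:.3f}")

# A classifier that always predicts 0 gets no credit under this convention.
always_zero = precision_recall(y_true, [0] * len(y_true))
print(always_zero)
```

Unlike accuracy, both numbers drop to zero for a classifier that never predicts the positive class, so they cannot be gamed by always predicting the majority class.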