
Data Mining: Naive Bayes Classifier

2022-04-10  Cache_wood

@[toc]

A probabilistic framework for solving classification problems

Conditional probability:

$$P(C|A) = \frac{P(A,C)}{P(A)}, \qquad P(A|C) = \frac{P(A,C)}{P(C)}$$

Bayes theorem:

$$P(C|A) = \frac{P(A|C)P(C)}{P(A)}$$
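As a quick numerical check, Bayes theorem can be applied directly. The probabilities below are made-up illustrative values, not from the article:

```python
# Hypothetical values for illustration only.
p_c = 0.2          # prior probability P(C)
p_a_given_c = 0.9  # likelihood P(A|C)
p_a = 0.3          # evidence P(A)

# Bayes theorem: P(C|A) = P(A|C) * P(C) / P(A)
p_c_given_a = p_a_given_c * p_c / p_a
print(p_c_given_a)  # ≈ 0.6
```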
Consider each attribute and class label as random variables

Given a record with attributes $(A_1, A_2, \dots, A_n)$, the goal is to predict the class $C$ — specifically, to find the value of $C$ that maximizes $P(C|A_1, A_2, \dots, A_n)$.

Approach:

Compute the posterior probability $P(C|A_1, A_2, \dots, A_n)$ for all values of $C$ using Bayes theorem:

$$P(C|A_1 A_2 \dots A_n) = \frac{P(A_1 A_2 \dots A_n|C)\,P(C)}{P(A_1 A_2 \dots A_n)}$$

Choose the value of $C$ that maximizes the posterior; since the denominator is the same for all classes, this is equivalent to maximizing $P(A_1 A_2 \dots A_n|C)\,P(C)$.

Naive Bayes Classifier

Assume independence among attributes $A_i$ when the class is given:

$$P(A_1 A_2 \dots A_n|C) = P(A_1|C)\,P(A_2|C)\cdots P(A_n|C)$$
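Under this independence assumption, classification reduces to multiplying the class prior by the per-attribute conditional probabilities and picking the class with the largest product. A minimal sketch on a toy categorical dataset (records and attribute names invented for illustration):

```python
from collections import Counter

# Invented toy training records: attribute dict -> class label.
records = [
    ({"refund": "yes", "status": "single"}, "no"),
    ({"refund": "no",  "status": "married"}, "no"),
    ({"refund": "no",  "status": "single"}, "yes"),
    ({"refund": "no",  "status": "single"}, "yes"),
    ({"refund": "yes", "status": "married"}, "no"),
]

def train(records):
    # class_counts[label] = N_c; cond_counts[(attr, value, label)] = N_ic
    class_counts = Counter(label for _, label in records)
    cond_counts = Counter()
    for attrs, label in records:
        for attr, value in attrs.items():
            cond_counts[(attr, value, label)] += 1
    return class_counts, cond_counts

def predict(attrs, class_counts, cond_counts):
    n = sum(class_counts.values())
    best, best_score = None, -1.0
    for label, n_c in class_counts.items():
        score = n_c / n  # prior P(C)
        for attr, value in attrs.items():
            score *= cond_counts[(attr, value, label)] / n_c  # P(A_i|C)
        if score > best_score:
            best, best_score = label, score
    return best

class_counts, cond_counts = train(records)
print(predict({"refund": "no", "status": "single"}, class_counts, cond_counts))
```

Note this simple version still suffers from the zero-probability problem discussed below; smoothing fixes that.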

How to Estimate Probabilities from Data

For continuous attributes:

Normal distribution:

$$P(A_i|c_j) = \frac{1}{\sqrt{2\pi\sigma_{ij}^2}}\,e^{-\frac{(A_i-\mu_{ij})^2}{2\sigma_{ij}^2}}$$
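The Gaussian estimate above can be computed directly, taking $\mu_{ij}$ and $\sigma_{ij}^2$ as the sample mean and variance of attribute $A_i$ over the training records of class $c_j$. The sample values below are invented for illustration:

```python
import math

def gaussian_likelihood(x, values):
    """Estimate P(A_i = x | c_j) under a normal distribution,
    with mu and sigma^2 fit from the class's training values."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / (n - 1)  # sample variance
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Invented example: incomes (in $1000s) of training records in one class.
incomes = [125, 100, 70, 120, 60, 220, 75, 90]
print(gaussian_likelihood(120, incomes))
```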

One distribution for each $(A_i, c_j)$ pair.

If one of the conditional probabilities is zero, then the entire expression becomes zero.

Probability estimation ($c$: number of classes, $p$: prior probability, $m$: parameter):

$$\text{Original: } P(A_i|C) = \frac{N_{ic}}{N_c}$$

$$\text{Laplace: } P(A_i|C) = \frac{N_{ic}+1}{N_c+c}$$

$$\text{m-estimate: } P(A_i|C) = \frac{N_{ic}+mp}{N_c+m}$$
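A small sketch of the three estimators, assuming the counts $N_{ic}$ and $N_c$ are already available (the numbers are illustrative, chosen so the attribute value never co-occurs with the class):

```python
def original(n_ic, n_c):
    return n_ic / n_c

def laplace(n_ic, n_c, c):
    # c: number of classes
    return (n_ic + 1) / (n_c + c)

def m_estimate(n_ic, n_c, m, p):
    # p: prior probability, m: parameter
    return (n_ic + m * p) / (n_c + m)

# Illustrative counts: N_ic = 0 zeroes out the whole product without smoothing.
n_ic, n_c = 0, 10
print(original(n_ic, n_c))                 # 0.0
print(laplace(n_ic, n_c, c=3))             # small but nonzero
print(m_estimate(n_ic, n_c, m=3, p=1/3))   # small but nonzero
```

Both smoothed estimates keep every conditional probability strictly positive, so a single unseen attribute value no longer forces the whole product to zero.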

Naive Bayes (Summary)

Robust to isolated noise points.

Handles missing values by ignoring the attribute during probability estimate calculations.

Robust to irrelevant attributes.

Independence assumption may not hold for some attributes.
