
Data Mining: Naive Bayes Classifier

2022-04-10  Cache_wood

@[toc]

A probabilistic framework for solving classification problems

Conditional probability:

$$P(C|A) = \frac{P(A,C)}{P(A)}, \qquad P(A|C) = \frac{P(A,C)}{P(C)}$$

Bayes theorem:

$$P(C|A) = \frac{P(A|C)P(C)}{P(A)}$$
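As a quick numerical check, Bayes theorem can be applied directly. The probabilities below are made-up illustrative values, not from the article:

```python
# Hypothetical values for illustration only.
p_c = 0.2          # prior probability P(C)
p_a_given_c = 0.9  # likelihood P(A|C)
p_a = 0.3          # evidence P(A)

# Bayes theorem: P(C|A) = P(A|C) * P(C) / P(A)
p_c_given_a = p_a_given_c * p_c / p_a
print(p_c_given_a)  # ≈ 0.6
```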
Consider each attribute and class label as random variables

Given a record with attributes $(A_1, A_2, \dots, A_n)$, the goal is to predict the class $C$ — specifically, to find the value of $C$ that maximizes $P(C|A_1, A_2, \dots, A_n)$.

Approach:

Compute the posterior probability $P(C|A_1, A_2, \dots, A_n)$ for all values of $C$ using Bayes theorem:

$$P(C|A_1 A_2 \dots A_n) = \frac{P(A_1 A_2 \dots A_n|C)\,P(C)}{P(A_1 A_2 \dots A_n)}$$

Choose the value of $C$ that maximizes the posterior; since the denominator is the same for all classes, this is equivalent to maximizing $P(A_1 A_2 \dots A_n|C)\,P(C)$.

Naive Bayes Classifier

Assume independence among attributes $A_i$ when the class is given:

$$P(A_1 A_2 \dots A_n|C) = P(A_1|C)\,P(A_2|C)\cdots P(A_n|C)$$
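Under this independence assumption, classification reduces to multiplying the class prior by the per-attribute conditional probabilities and picking the class with the largest product. A minimal sketch on a toy categorical dataset (records and attribute names invented for illustration):

```python
from collections import Counter

# Invented toy training records: attribute dict -> class label.
records = [
    ({"refund": "yes", "status": "single"}, "no"),
    ({"refund": "no",  "status": "married"}, "no"),
    ({"refund": "no",  "status": "single"}, "yes"),
    ({"refund": "no",  "status": "single"}, "yes"),
    ({"refund": "yes", "status": "married"}, "no"),
]

def train(records):
    # class_counts[label] = N_c; cond_counts[(attr, value, label)] = N_ic
    class_counts = Counter(label for _, label in records)
    cond_counts = Counter()
    for attrs, label in records:
        for attr, value in attrs.items():
            cond_counts[(attr, value, label)] += 1
    return class_counts, cond_counts

def predict(attrs, class_counts, cond_counts):
    n = sum(class_counts.values())
    best, best_score = None, -1.0
    for label, n_c in class_counts.items():
        score = n_c / n  # prior P(C)
        for attr, value in attrs.items():
            score *= cond_counts[(attr, value, label)] / n_c  # P(A_i|C)
        if score > best_score:
            best, best_score = label, score
    return best

class_counts, cond_counts = train(records)
print(predict({"refund": "no", "status": "single"}, class_counts, cond_counts))
```

Note this simple version still suffers from the zero-probability problem discussed below; smoothing fixes that.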

How to Estimate Probabilities from Data

For continuous attributes:

Normal distribution:

$$P(A_i|c_j) = \frac{1}{\sqrt{2\pi\sigma_{ij}^2}}\,e^{-\frac{(A_i-\mu_{ij})^2}{2\sigma_{ij}^2}}$$
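The Gaussian estimate above can be computed directly, taking $\mu_{ij}$ and $\sigma_{ij}^2$ as the sample mean and variance of attribute $A_i$ over the training records of class $c_j$. The sample values below are invented for illustration:

```python
import math

def gaussian_likelihood(x, values):
    """Estimate P(A_i = x | c_j) under a normal distribution,
    with mu and sigma^2 fit from the class's training values."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / (n - 1)  # sample variance
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Invented example: incomes (in $1000s) of training records in one class.
incomes = [125, 100, 70, 120, 60, 220, 75, 90]
print(gaussian_likelihood(120, incomes))
```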

One distribution for each $(A_i, c_j)$ pair.

If one of the conditional probabilities is zero, then the entire expression becomes zero.

Probability estimation ($c$: number of classes, $p$: prior probability, $m$: parameter):

$$\text{Original: } P(A_i|C) = \frac{N_{ic}}{N_c}$$

$$\text{Laplace: } P(A_i|C) = \frac{N_{ic}+1}{N_c+c}$$

$$\text{m-estimate: } P(A_i|C) = \frac{N_{ic}+mp}{N_c+m}$$
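A small sketch of the three estimators, assuming the counts $N_{ic}$ and $N_c$ are already available (the numbers are illustrative, chosen so the attribute value never co-occurs with the class):

```python
def original(n_ic, n_c):
    return n_ic / n_c

def laplace(n_ic, n_c, c):
    # c: number of classes
    return (n_ic + 1) / (n_c + c)

def m_estimate(n_ic, n_c, m, p):
    # p: prior probability, m: parameter
    return (n_ic + m * p) / (n_c + m)

# Illustrative counts: N_ic = 0 zeroes out the whole product without smoothing.
n_ic, n_c = 0, 10
print(original(n_ic, n_c))                 # 0.0
print(laplace(n_ic, n_c, c=3))             # small but nonzero
print(m_estimate(n_ic, n_c, m=3, p=1/3))   # small but nonzero
```

Both smoothed estimates keep every conditional probability strictly positive, so a single unseen attribute value no longer forces the whole product to zero.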

Naive Bayes (Summary)

Robust to isolated noise points.

Handles missing values by ignoring the attribute during probability estimate calculations.

Robust to irrelevant attributes.

Independence assumption may not hold for some attributes.
