ML Supervised Learning, Classification: Naive Bayes Classifier
2020-03-12
XinY_VV
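Naive Bayes classification
It uses Bayes' theorem to compute probabilities and conditional probabilities, and it assumes that each attribute influences the classification result independently of the others.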
![](https://img.haomeiwen.com/i14820305/999c3df999ca93d9.png)
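Why is it called "naive"?
Because naive Bayes assumes that all attributes are equally important and mutually independent given the class.
Example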
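Written out, the independence assumption turns the joint likelihood into a product of per-attribute terms. The notation below is an assumption on my part ($c$ for a class, $x_1,\dots,x_d$ for the attribute values of one sample) and may differ from the symbols used in the figure.

```latex
P(c \mid x_1,\dots,x_d)
  = \frac{P(c)\,P(x_1,\dots,x_d \mid c)}{P(x_1,\dots,x_d)}
  = \frac{P(c)}{P(x_1,\dots,x_d)} \prod_{i=1}^{d} P(x_i \mid c)

\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{d} P(x_i \mid c)
```

The denominator is the same for every class, so it can be dropped when only the most probable class is needed.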
![](https://img.haomeiwen.com/i14820305/62a12c59070d7596.png)
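The counting behind such an example can be sketched in plain Python. The table below is made up for illustration (it is not the data from the figure), and the function names `fit_naive_bayes` and `posterior` are hypothetical helpers, not part of any library.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(rows, labels):
    """Estimate P(c) and the counts behind P(x_i = v | c), without smoothing."""
    n = len(labels)
    class_counts = Counter(labels)
    priors = {c: class_counts[c] / n for c in class_counts}
    # cond_counts[(i, c)][v] = number of class-c rows whose attribute i equals v
    cond_counts = defaultdict(Counter)
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            cond_counts[(i, c)][v] += 1
    return priors, cond_counts, class_counts


def posterior(row, priors, cond_counts, class_counts):
    """Return normalized P(c | row) under the naive independence assumption."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for i, v in enumerate(row):
            score *= cond_counts[(i, c)][v] / class_counts[c]
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()} if total > 0 else scores


# Made-up toy table: (outlook, humidity) -> play?
rows = [
    ("sunny", "high"),
    ("sunny", "normal"),
    ("rainy", "high"),
    ("rainy", "normal"),
    ("overcast", "normal"),
    ("overcast", "high"),
]
labels = ["no", "yes", "no", "yes", "yes", "yes"]

priors, cond_counts, class_counts = fit_naive_bayes(rows, labels)
post = posterior(("sunny", "high"), priors, cond_counts, class_counts)
print(post)
```

For the query `("sunny", "high")` the unnormalized scores are 1/3 × 1/2 × 1 for "no" and 2/3 × 1/4 × 1/4 for "yes", so the sketch favors "no".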
Gaussian naive Bayes classification
![](https://img.haomeiwen.com/i14820305/6ea076aa667bd951.png)
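For continuous attributes, the Gaussian variant models each attribute within each class as a normal distribution. Here is a minimal NumPy sketch of that idea, assuming per-class means and variances estimated from the training data. The helper names and the toy points are hypothetical, and the small constant added to the variance is only a guard against division by zero.

```python
import numpy as np


def fit_gaussian_nb(X, y):
    """Estimate per-class priors, feature means, and feature variances."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9
    priors = np.array([(y == c).mean() for c in classes])
    return classes, means, variances, priors


def predict_gaussian_nb(X, classes, means, variances, priors):
    """Pick the class with the highest log prior plus summed Gaussian log density."""
    diff = X[:, None, :] - means[None, :, :]  # shape (n_samples, n_classes, n_features)
    log_density = -0.5 * (np.log(2 * np.pi * variances)[None] + diff**2 / variances[None])
    log_post = np.log(priors)[None, :] + log_density.sum(axis=2)
    return classes[np.argmax(log_post, axis=1)]


# Made-up, well-separated toy points
X = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
              [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]])
y = np.array([0, 0, 0, 1, 1, 1])

params = fit_gaussian_nb(X, y)
pred = predict_gaussian_nb(np.array([[1.1, 2.1], [5.1, 5.9]]), *params)
print(pred)
```

Working in log space avoids underflow when many small densities are multiplied together.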
Three kinds of naive Bayes classifiers
![](https://img.haomeiwen.com/i14820305/6b7620c43650fdae.png)
![](https://img.haomeiwen.com/i14820305/1d1555fb7e055e05.png)
![](https://img.haomeiwen.com/i14820305/ed20b6bffe01965e.png)
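Assuming the three variants meant here are the ones scikit-learn provides as `GaussianNB`, `MultinomialNB`, and `BernoulliNB`, the sketch below shows which kind of feature each one expects. The arrays are made-up toy data.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 1, 1])

# Continuous measurements -> GaussianNB
X_cont = np.array([[1.0, 2.0], [1.2, 1.8], [5.0, 6.0], [5.2, 5.8]])
pred_gauss = GaussianNB().fit(X_cont, y).predict(X_cont)

# Non-negative counts (e.g. word counts) -> MultinomialNB
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
pred_multi = MultinomialNB().fit(X_counts, y).predict(X_counts)

# Binary presence/absence indicators -> BernoulliNB
X_bin = (X_counts > 0).astype(int)
pred_bern = BernoulliNB().fit(X_bin, y).predict(X_bin)

print(pred_gauss, pred_multi, pred_bern)
```

All three share the same `fit`/`predict` interface; the difference lies in the distribution assumed for each attribute given the class.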
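Laplacian correction
If an attribute value never appears in the training set together with a given class, its estimated conditional probability is 0, and that single zero wipes out the whole product regardless of the other attributes. To avoid this, probability estimates are usually "smoothed", most commonly with the "Laplacian correction".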
![](https://img.haomeiwen.com/i14820305/52af21b95bfe9da8.png)
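The correction adds 1 to every count and adds the number of possible outcomes to the denominator. A minimal sketch, assuming that add-one form; the function names and the counts in the example are hypothetical.

```python
def laplace_prior(class_count, total_count, n_classes):
    """Smoothed class prior: (|D_c| + 1) / (|D| + N)."""
    return (class_count + 1) / (total_count + n_classes)


def laplace_conditional(joint_count, class_count, n_values):
    """Smoothed conditional: (|D_c,x_i| + 1) / (|D_c| + N_i)."""
    return (joint_count + 1) / (class_count + n_values)


# Made-up counts: an attribute with 3 possible values, one of which never
# co-occurs with a class that has 8 training samples.
unsmoothed = 0 / 8
smoothed = laplace_conditional(joint_count=0, class_count=8, n_values=3)
print(unsmoothed, smoothed)
```

The unseen value now gets a small non-zero probability (1/11 here) instead of forcing the whole product to 0.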
Example
![](https://img.haomeiwen.com/i14820305/26075c923f580d33.png)
![](https://img.haomeiwen.com/i14820305/c1c813d375543f07.png)
![](https://img.haomeiwen.com/i14820305/ad5706235988848c.png)
Code
# GaussianNB on the iris dataset
from sklearn import datasets
from sklearn import metrics
from sklearn.naive_bayes import GaussianNB

# Load the data and fit a Gaussian naive Bayes model
dataset = datasets.load_iris()
model = GaussianNB()
model.fit(dataset.data, dataset.target)

# Predict on the same data used for fitting, so these are training-set scores
expected = dataset.target
predicted = model.predict(dataset.data)

# Summarize the fit
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))
![](https://img.haomeiwen.com/i14820305/ddf2dae4ddecadac.png)
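The iris run scores the model on the same samples it was fitted on, which tends to look optimistic. A hedged variant with a held-out test set is sketched below; `test_size=0.3`, `random_state=0`, and the stratified split are arbitrary choices of mine, not part of the original example.

```python
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

dataset = datasets.load_iris()

# Hold out 30% of the samples, keeping class proportions equal in both parts
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target,
    test_size=0.3, random_state=0, stratify=dataset.target,
)

model = GaussianNB()
model.fit(X_train, y_train)

# Score only on samples the model has not seen during fitting
predicted = model.predict(X_test)
accuracy = metrics.accuracy_score(y_test, predicted)
print(metrics.classification_report(y_test, predicted))
print(metrics.confusion_matrix(y_test, predicted))
```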
Text classification
# Text classification on four 20 Newsgroups categories
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report, confusion_matrix

categories = ['alt.atheism', 'talk.religion.misc',
              'comp.graphics', 'sci.space']

# The dataset is downloaded on first use
newsgroups_train = fetch_20newsgroups(subset='train',
                                      categories=categories)
newsgroups_test = fetch_20newsgroups(subset='test',
                                     categories=categories)
X_train = newsgroups_train.data
X_test = newsgroups_test.data
y_train = newsgroups_train.target
y_test = newsgroups_test.target

# TF-IDF features feeding a multinomial naive Bayes classifier
pipe = make_pipeline(TfidfVectorizer(), MultinomialNB())
pipe.fit(X_train, y_train)

# Evaluate on the held-out test split
y_pred = pipe.predict(X_test)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
Spam filtering
An example written by someone else:
https://github.com/Surya-Murali/Email-Spam-Classifier-Using-Naive-Bayes/blob/master/SpamClassifier.py
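For a sense of the shape of such a filter without opening the link, here is a minimal sketch of my own. It is not the code from that repository, and the six messages are made-up toy data. It assumes word counts from `CountVectorizer` fed into `MultinomialNB`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up toy messages, far too few for a real filter
messages = [
    "win a free prize now",
    "free cash offer click now",
    "claim your free prize today",
    "meeting moved to monday morning",
    "please review the attached report",
    "lunch on monday with the team",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Word counts -> multinomial naive Bayes (Laplace smoothing via the default alpha=1.0)
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(messages, labels)

new_messages = ["free prize offer", "monday team meeting"]
spam_pred = spam_filter.predict(new_messages)
print(list(zip(new_messages, spam_pred)))
```

Smoothing matters here: each test message contains words never seen with the other class, which would otherwise produce zero probabilities.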
Summary
![](https://img.haomeiwen.com/i14820305/75c2d1ccbd7c1fbe.png)