
Machine Learning CS229: Naive Bayes & exercise6

2017-09-25  小太阳花儿

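This exercise builds a spam email classifier with Naive Bayes (the multinomial event model with Laplace smoothing). My solution code is below: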

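numTrainDocs = 700;   % number of training emails
numTokens = 2500;     % vocabulary size
% Each row of train-features.txt is "docID tokenID count"
M = dlmread('F:\machine\ex6DataPrepared\train-features.txt', ' ');
spmatrix = sparse(M(:,1), M(:,2), M(:,3), numTrainDocs, numTokens);
train_matrix = full(spmatrix);   % train_matrix(i,k) = count of token k in email i
y = dlmread('F:\machine\ex6DataPrepared\train-labels.txt', ' ');   % 1 = spam, 0 = non-spam
spam = find(y==1);
nonspam = find(y==0);
p_y = length(spam)/numTrainDocs;   % class prior P(y = 1)
% Total count of each token within each class
xofspam = zeros(numTokens,1);
xofnonspam = zeros(numTokens,1);
for i = 1:numTokens
    xofspam(i,1) = sum(train_matrix(spam,i));
    xofnonspam(i,1) = sum(train_matrix(nonspam,i));
end
word = sum(train_matrix,2);   % length (total word count) of each email
% Laplace-smoothed estimates of P(token k | class)
fi_y1 = (xofspam+1)./(sum(word(spam))+numTokens);
fi_y0 = (xofnonspam+1)./(sum(word(nonspam))+numTokens);
% Training ends here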

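% Testing
numTestDocs = 260;
M = dlmread('F:\machine\ex6DataPrepared\test-features.txt', ' ');
test_spmatrix = sparse(M(:,1), M(:,2), M(:,3), numTestDocs, numTokens);
test_matrix = full(test_spmatrix);
% Log posterior of each class (up to a shared constant), including the class prior
a = test_matrix*log(fi_y1) + log(p_y);
b = test_matrix*log(fi_y0) + log(1-p_y);
test_result = a > b;   % 1 = predicted spam
test_labels = dlmread('F:\machine\ex6DataPrepared\test-labels.txt', ' ');
% Number and fraction of misclassified test emails (no semicolon, so both are printed)
numdocs_wrong = sum(xor(test_result, test_labels))
fraction_wrong = numdocs_wrong/numTestDocs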
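For reference, these are the quantities the training code estimates, restated roughly in the notation of the CS229 notes. The symbols are my own labels for the code's variables: $x_k^{(i)}$ is entry `(i,k)` of `train_matrix` (count of token $k$ in email $i$), $n_i$ is `word(i)`, $m$ is `numTrainDocs`, and $|V|$ is `numTokens`. The numerators correspond to `xofspam+1` and `xofnonspam+1`, and the denominators to `sum(word(spam))+numTokens` and `sum(word(nonspam))+numTokens`.

```latex
\begin{aligned}
\phi_y &= \frac{1}{m}\sum_{i=1}^{m} 1\{y^{(i)}=1\} \\
\phi_{k|y=1} &= \frac{\sum_{i=1}^{m} 1\{y^{(i)}=1\}\, x_k^{(i)} + 1}{\sum_{i=1}^{m} 1\{y^{(i)}=1\}\, n_i + |V|} \\
\phi_{k|y=0} &= \frac{\sum_{i=1}^{m} 1\{y^{(i)}=0\}\, x_k^{(i)} + 1}{\sum_{i=1}^{m} 1\{y^{(i)}=0\}\, n_i + |V|}
\end{aligned}

\text{predict spam} \iff
\log\phi_y + \sum_{k=1}^{|V|} x_k \log\phi_{k|y=1}
\;>\;
\log(1-\phi_y) + \sum_{k=1}^{|V|} x_k \log\phi_{k|y=0}
```

Here $x_k$ is the count of token $k$ in the test email, so each sum is one row of `test_matrix*log(fi_y1)` or `test_matrix*log(fi_y0)`. The prior terms cancel only when $\phi_y = 1/2$; otherwise leaving them out shifts the decision boundary.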

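Two mistakes in how I understood the formulas cost me a whole evening of debugging. On top of that, because I am not fluent in MATLAB, the code is redundant: for something a single matrix operation or one function call could handle, I naively wrote a for loop.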
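The remark about the for loop can be made concrete. Below is a minimal sketch of the training step without the loop, using column sums over the rows of each class (MATLAB or GNU Octave). The small `train_matrix` and `y` are hypothetical toy values so the snippet runs on its own; with the real data, drop those two lines. Note that `sum(xofspam)` equals `sum(word(spam))`, since both count every word in the spam emails.

```matlab
% Hypothetical toy data (4 emails x 3 tokens); replace with the real train_matrix and y
train_matrix = [2 0 1; 0 3 0; 1 1 0; 0 0 4];
y = [1; 0; 1; 0];
numTokens = size(train_matrix, 2);

spam = (y == 1);      % logical row index instead of find()
nonspam = (y == 0);

% Column sums over each class's rows replace the for loop:
% entry k is the total count of token k in that class
xofspam = sum(train_matrix(spam, :), 1)';
xofnonspam = sum(train_matrix(nonspam, :), 1)';

% Laplace-smoothed estimates; sum(xofspam) is the total number of words in spam emails
fi_y1 = (xofspam + 1) ./ (sum(xofspam) + numTokens);
fi_y0 = (xofnonspam + 1) ./ (sum(xofnonspam) + numTokens);
p_y = mean(y == 1);   % class prior P(y = 1)
```

On the toy data this gives `fi_y1 = [0.5; 0.25; 0.25]` and `fi_y0 = [0.1; 0.4; 0.5]`; each vector sums to 1, which is a quick sanity check for the real estimates too.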
