OC实现Softmax识别手写数字
简介
Softmax回归模型是logistic回归模型在多分类问题上的推广,在多分类问题中,类标签 y 可以取两个以上的值。Softmax模型运用广泛,很多复杂精细的训练模型最后一步都会用softmax来分配概率。
公式
Softmax回归最主要的代价函数如下:
代价函数其中,1{-} 是示性函数,其取值规则为:1{值为真的表达式}= 1。
根据代价函数,分别对k个分类进行求导,就可以得到:
k分类偏导函数
其中,Xi为类别j的概率:
Xi为类别j的概率有了上面的偏导数公式以后,我们就可以用梯度下降算法更新theta值,每一次迭代需要进行如下更新:
j = 1,...,k
以上公式并没有涉及到bias优化,加上bias的公式形如:
softmax-regression-vectorequation.png
其实bias的迭代更新和theta一样,只需要找到bias的偏导数就行,我的代码中包含了对bias求导优化。
数据
训练所用数据集是MNIST,MNIST数据集的官网是Yann LeCun's website。这个数据集包含60000行的训练数据集和10000行的测试数据集。
文件 | 内容 |
---|---|
train-images-idx3-ubyte.gz | 训练集图片 - 60000 张 训练图片 |
train-labels-idx1-ubyte.gz | 训练集图片对应的数字标签 |
t10k-images-idx3-ubyte.gz | 测试集图片 - 10000 张 图片 |
t10k-labels-idx1-ubyte.gz | 测试集图片对应的数字标签 |
其中每一张图片都是28*28的,矩阵表示如下:
MNIST-Matrix.png
在数据集中,每张图片都是一个展开的向量,长度是 28x28 = 784。因此,在MNIST训练数据集中,train-images-idx3-ubyte解析后是一个形状为 [60000, 784]的张量。
mnist-train-xs.png
而train-labels-idx1-ubyte则是一个长度60000的向量,每一个值对应train-images-idx3-ubyte中代表的数字范围0~9。
测试数据与训练数据结构一样,只是数量只有10000张。
下载过后读取与解析到数据代码:
//
// MLLoadMNIST.m
// MNIST
//
// Created by Jiao Liu on 9/23/16.
// Copyright © 2016 ChangHong. All rights reserved.
//
#import "MLLoadMNIST.h"
@implementation MLLoadMNIST
int reverseInt(int input)
{
unsigned char ch1, ch2, ch3, ch4;
ch1=input&255;
ch2=(input>>8)&255;
ch3=(input>>16)&255;
ch4=(input>>24)&255;
return((int)ch1<<24)+((int)ch2<<16)+((int)ch3<<8)+ch4;
}
double **readImageData(const char *filePath)
{
FILE *file = fopen(filePath, "rb");
double **output = NULL;
if (file) {
int magic_number=0;
int number_of_images=0;
int n_rows=0;
int n_cols=0;
fread((char*)&magic_number, sizeof(magic_number), 1, file);
magic_number= reverseInt(magic_number);
fread((char*)&number_of_images, sizeof(number_of_images), 1, file);
number_of_images= reverseInt(number_of_images);
fread((char*)&n_rows, sizeof(n_rows), 1, file);
n_rows= reverseInt(n_rows);
fread((char*)&n_cols, sizeof(n_cols), 1, file);
n_cols= reverseInt(n_cols);
output = (double **)malloc(sizeof(double) * number_of_images);
for(int i=0;i<number_of_images;++i)
{
output[i] = (double *)malloc(sizeof(double) * n_rows * n_cols);
for(int r=0;r<n_rows;++r)
{
for(int c=0;c<n_cols;++c)
{
unsigned char temp=0;
fread((char*)&temp, sizeof(temp), 1, file);
output[i][(n_rows*r)+c]= (double)temp;
}
}
}
}
fclose(file);
return output;
}
int *readLabelData(const char *filePath)
{
FILE *file = fopen(filePath, "rb");
int *output = NULL;
if (file) {
int magic_number=0;
int number_of_items=0;
fread((char*)&magic_number, sizeof(magic_number), 1, file);
magic_number= reverseInt(magic_number);
fread((char*)&number_of_items, sizeof(number_of_items), 1, file);
number_of_items= reverseInt(number_of_items);
output = (int *)malloc(sizeof(int) * number_of_items);
for(int i=0;i<number_of_items;++i)
{
unsigned char temp=0;
fread((char*)&temp, sizeof(temp), 1, file);
output[i]= (int)temp;
}
}
fclose(file);
return output;
}
Softmax实现
- 首先我这里选用的梯度下降算法去逼近最值,所以需要一个迭代次数,代码中默认的是500次。输入训练图片是60000,如果每次迭代都全部用上,训练会花去很多时间,所以每次迭代默认随机取100张图片进行训练。
- 下降梯度的速率默认是0.01。
- 代码中实现两种梯度下降逼近,一种是每次迭代依次使用每张图片去更新所有分类变量,另一种是每次迭代顺序更新每个分类变量,更新每个分类时候使用所有随机图片数据。两种方法其实效率一样,但是测试时候发现第二种方法的正确率略高。
代码如下:
//
// MLSoftMax.m
// MNIST
//
// Created by Jiao Liu on 9/26/16.
// Copyright © 2016 ChangHong. All rights reserved.
//
#import "MLSoftMax.h"
@implementation MLSoftMax
- (id)initWithLoopNum:(int)loopNum dim:(int)dim type:(int)type size:(int)size descentRate:(double)rate
{
self = [super init];
if (self) {
_iterNum = loopNum == 0 ? 500 : loopNum;
_dim = dim;
_kType = type;
_randSize = size == 0 ? 100 : size;
_bias = malloc(sizeof(double) * type);
_theta = malloc(sizeof(double) * type * dim);
for (int i = 0; i < type; i++) {
_bias[i] = 0;
for (int j = 0; j < dim; j++) {
_theta[i * dim +j] = 0.0f;
}
}
_descentRate = rate == 0 ? 0.01 : rate;
}
return self;
}
- (void)dealloc
{
if (_bias != NULL) {
free(_bias);
_bias = NULL;
}
if (_theta != NULL) {
free(_theta);
_theta = NULL;
}
if (_randomX != NULL) {
free(_randomX);
_randomX = NULL;
}
if (_randomY != NULL) {
free(_randomY);
_randomY = NULL;
}
}
#pragma mark - SoftMax Main
- (void)randomPick:(int)maxSize
{
long rNum = random();
for (int i = 0; i < _randSize; i++) {
_randomX[i] = _image[(rNum+i) % maxSize];
_randomY[i] = _label[(rNum+i) % maxSize];
}
}
/*
- (double *)MaxPro:(double *)index
{
long double maxNum = index[0];
for (int i = 1; i < _kType; i++) {
maxNum = MAX(maxNum, index[i]);
}
long double sum = 0;
for (int i = 0; i < _kType; i++) {
index[i] -= maxNum;
index[i] = expl(index[i]);
sum += index[i];
}
for (int i = 0; i < _kType; i++) {
index[i] /= sum;
}
return index;
}
- (void)updateModel:(double *)index currentPos:(int)pos
{
double *delta = malloc(sizeof(double) * _kType);
for (int i = 0; i < _kType; i++) {
if (i != _randomY[pos]) {
delta[i] = 0.0 - index[i];
}
else
{
delta[i] = 1.0 - index[i];
}
_bias[i] -= _descentRate * delta[i];
for (int j = 0; j < _dim; j++) {
_theta[i * _dim +j] += _descentRate * delta[i] * _randomX[pos][j] / _randSize;
}
}
if (delta != NULL) {
free(delta);
delta = NULL;
}
}
- (void)train
{
_randomX = malloc(sizeof(double) * _randSize);
_randomY = malloc(sizeof(int) * _randSize);
double *index = malloc(sizeof(double) * _kType);
for (int i = 0; i < _iterNum; i++) {
[self randomPick:_trainNum];
for (int j = 0; j < _randSize; j++) {
// calculate wx+b
vDSP_mmulD(_theta, 1, _randomX[j], 1, index, 1, _kType, 1, _dim);
vDSP_vaddD(index, 1, _bias, 1, index, 1, _kType);
index = [self MaxPro:index];
[self updateModel:index currentPos:j];
}
}
if (index != NULL) {
free(index);
index = NULL;
}
}
*/
- (int)indicator:(int)label var:(int)x
{
if (label == x) {
return 1;
}
return 0;
}
- (double)sigmod:(int)type index:(int) index
{
double up = 0;
for (int i = 0; i < _dim; i++) {
up += _theta[type * _dim + i] * _randomX[index][i];
}
up += _bias[type];
double *down = malloc(sizeof(double) * _kType);
double maxNum = -0xfffffff;
vDSP_mmulD(_theta, 1, _randomX[index], 1, down, 1, _kType, 1, _dim);
vDSP_vaddD(down, 1, _bias, 1, down, 1, _kType);
for (int i = 0; i < _kType; i++) {
maxNum = MAX(maxNum, down[i]);
}
double sum = 0;
for (int i = 0; i < _kType; i++) {
down[i] -= maxNum;
sum += exp(down[i]);
}
if (down != NULL) {
free(down);
down = NULL;
}
return exp(up - maxNum) / sum;
}
- (double *)fderivative:(int)type
{
double *outP = malloc(sizeof(double) * _dim);
for (int i = 0; i < _dim; i++) {
outP[i] = 0;
}
double *inner = malloc(sizeof(double) * _dim);
for (int i = 0; i < _randSize; i++) {
long double sig = [self sigmod:type index:i];
int ind = [self indicator:_randomY[i] var:type];
double loss = -_descentRate * (ind - sig) / _randSize;
_bias[type] += loss * _randSize;
vDSP_vsmulD(_randomX[i], 1, &loss, inner, 1, _dim);
vDSP_vaddD(outP, 1, inner, 1, outP, 1, _dim);
}
if (inner != NULL) {
free(inner);
inner = NULL;
}
return outP;
}
- (void)train
{
_randomX = malloc(sizeof(double) * _randSize);
_randomY = malloc(sizeof(int) * _randSize);
for (int i = 0; i < _iterNum; i++) {
[self randomPick:_trainNum];
for (int j = 0; j < _kType; j++) {
double *newTheta = [self fderivative:j];
for (int m = 0; m < _dim; m++) {
_theta[j * _dim + m] = _theta[j * _dim + m] - _descentRate * newTheta[m];
}
if (newTheta != NULL) {
free(newTheta);
newTheta = NULL;
}
}
}
}
- (void)saveTrainDataToDisk
{
NSFileManager *fileManager = [NSFileManager defaultManager];
NSString *thetaPath = [[NSSearchPathForDirectoriesInDomains(NSCachesDirectory, NSUserDomainMask, YES) objectAtIndex:0] stringByAppendingString:@"/Theta.txt"];
// NSLog(@"%@",thetaPath);
NSData *data = [NSData dataWithBytes:_theta length:sizeof(double) * _dim * _kType];
[fileManager createFileAtPath:thetaPath contents:data attributes:nil];
NSString *biasPath = [[NSSearchPathForDirectoriesInDomains(NSCachesDirectory, NSUserDomainMask, YES) objectAtIndex:0] stringByAppendingString:@"/bias.txt"];
data = [NSData dataWithBytes:_bias length:sizeof(double) * _kType];
[fileManager createFileAtPath:biasPath contents:data attributes:nil];
}
- (int)predict:(double *)image
{
double maxNum = -0xffffff;
int label = -1;
double *index = malloc(sizeof(double) * _kType);
vDSP_mmulD(_theta, 1, image, 1, index, 1, _kType, 1, _dim);
vDSP_vaddD(index, 1, _bias, 1, index, 1, _kType);
for (int i = 0; i < _kType; i++) {
if (index[i] > maxNum) {
maxNum = index[i];
label = i;
}
}
return label;
}
- (int)predict:(double *)image withOldTheta:(double *)theta andBias:(double *)bias
{
double maxNum = -0xffffff;
int label = -1;
double *index = malloc(sizeof(double) * _kType);
vDSP_mmulD(theta, 1, image, 1, index, 1, _kType, 1, _dim);
vDSP_vaddD(index, 1, bias, 1, index, 1, _kType);
for (int i = 0; i < _kType; i++) {
if (index[i] > maxNum) {
maxNum = index[i];
label = i;
}
}
return label;
}
@end
最后训练结果:
Simulator Screen Shot不断改变循环次数,下降速率等参数,都会带来正确率的变化。测试结果发现默认参数能带来接近最好的识别的正确率,90%左右👏。
结语
90%的正确率并非达到最优,因为这仅仅是个简单的模型,可以加上卷积神经网络来改善效果。
关于卷积神经网络实现,我也实现了一版,但由于效率与正确率不高,后面优化后再分享😊。