Logistic Regression (LogisticRegression)
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
X, y = make_classification(n_samples=1000, n_features=4)
print(X)  # np.ndarray, shape (1000, 4)
print(y)  # np.ndarray, shape (1000,)
lr = LogisticRegression()
X_train = X[:-200]
X_test = X[-200:]
y_train = y[:-200]
y_test = y[-200:]
lr.fit(X_train, y_train)
y_train_predictions = lr.predict(X_train)
print(type(y_train_predictions))
y_test_predictions = lr.predict(X_test)
print((y_train_predictions == y_train).sum().astype(float) / y_train.shape[0])  # training accuracy
print((y_test_predictions == y_test).sum().astype(float) / y_test.shape[0])  # test accuracy
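The two lines above compute mean accuracy by hand; sklearn can report the same number directly. A minimal sketch using sklearn.metrics.accuracy_score and the estimator's own score method, reusing the lr, X_test, and y_test objects defined above:

from sklearn.metrics import accuracy_score

# Equivalent to the manual (predictions == labels) fraction computed above
print(accuracy_score(y_test, y_test_predictions))
# LogisticRegression.score also reports mean accuracy on the given data
print(lr.score(X_test, y_test))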
LogisticRegression() accepts quite a few parameters; the main ones are described below, followed by a short usage sketch after the list:
(1) penalty: the regularization term; L2 regularization guards against overfitting by penalizing the (weighted) sum of squared weights.
(2) C: a coefficient in the objective function (the inverse of the regularization strength); larger C therefore means weaker regularization.
(3) tol: tolerance for the solver's stopping criterion.
(4) solver: the optimization method used to fit the model; the default here is 'liblinear', a linear classifier (newer scikit-learn releases default to 'lbfgs'). Available options: {'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'}, default: 'liblinear'.
According to the API documentation, the trade-offs among the solvers are:
For small datasets, ‘liblinear’ is a good choice, whereas ‘sag’ and ‘saga’ are faster for large ones.
For multiclass problems, only ‘newton-cg’, ‘sag’, ‘saga’ and ‘lbfgs’ handle multinomial loss; ‘liblinear’ is limited to one-versus-rest schemes.
‘newton-cg’, ‘lbfgs’ and ‘sag’ only handle L2 penalty, whereas ‘liblinear’ and ‘saga’ handle L1 penalty.
Note that ‘sag’ and ‘saga’ fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.
(5) dual: whether to solve the dual formulation; dual=True solves the dual problem, dual=False the primal. (In scikit-learn the dual formulation is only implemented for the L2 penalty with the 'liblinear' solver.)
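To make these parameters concrete, here is a minimal sketch (the specific values are illustrative, not recommendations) that sets penalty, C, tol, and solver explicitly, and standardizes the features first since, per the note above, 'sag'/'saga' only converge quickly when features share roughly the same scale:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative settings: L2 penalty, moderate regularization (C=1.0),
# default tolerance (tol=1e-4), and the 'saga' solver, which supports
# both L1 and L2 penalties. StandardScaler runs first because 'sag'
# and 'saga' converge fast only on features of comparable scale.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty='l2', C=1.0, tol=1e-4, solver='saga'),
)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # mean accuracy on the held-out 200 samples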
Algorithm description
[Figure: 逻辑回归算法语言描述.png — a pseudocode description of the logistic regression algorithm]
If anything in the algorithm description above is mistaken, please leave a comment~
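Since the figure only survives as a filename, here is a minimal NumPy sketch of what such a description typically covers: batch gradient descent on the log loss with a sigmoid hypothesis. The learning rate, iteration count, and function names are illustrative assumptions, not taken from the original figure, and regularization is omitted for brevity:

import numpy as np

def sigmoid(z):
    # Logistic (sigmoid) function: maps raw scores to probabilities in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, learning_rate=0.1, n_iter=1000):
    # Batch gradient descent on the mean log loss (hyperparameters are
    # illustrative; no regularization term, unlike sklearn's default
    # L2-penalized objective)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)               # predicted P(y = 1 | x)
        grad_w = X.T @ (p - y) / n_samples   # gradient of the loss w.r.t. w
        grad_b = (p - y).mean()              # gradient w.r.t. the intercept
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

w, b = train_logistic_regression(X_train, y_train)
manual_predictions = (sigmoid(X_train @ w + b) >= 0.5).astype(int)
print((manual_predictions == y_train).mean())  # should be close to lr's training accuracy above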