
Machine Learning -- Supervised -- Regularized Regression (Ridge and Lasso)

2021-11-07  小贝学生信

When fitting a linear regression with a very large number of features, model complexity grows sharply, the model tends to overfit the training set, and its ability to generalize declines; a large feature set also raises the chance of multicollinearity, which makes the coefficients hard to interpret.
Regularization is a way to guard against overfitting and is often used together with linear regression; ridge regression and lasso regression are two of its most common forms.

1. A brief look at regularized regression

1.1 Ridge regression
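Briefly (using the standard formulation rather than anything specific to this post): ridge regression adds an L2 penalty on the coefficients to the ordinary least-squares objective,

$$\min_{\beta}\; \sum_{i=1}^{n}\big(y_i - x_i^{\top}\beta\big)^2 + \lambda\sum_{j=1}^{p}\beta_j^2$$

The larger the tuning parameter λ, the more strongly all coefficients are shrunk toward zero; they never become exactly zero, so every feature stays in the model.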

1.2 Lasso regression
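Briefly: lasso (least absolute shrinkage and selection operator) regression uses an L1 penalty instead,

$$\min_{\beta}\; \sum_{i=1}^{n}\big(y_i - x_i^{\top}\beta\big)^2 + \lambda\sum_{j=1}^{p}\lvert\beta_j\rvert$$

Unlike the ridge penalty, the L1 penalty can push coefficients exactly to zero, so lasso also performs feature selection.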

1.3 Elastic nets
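Briefly: the elastic net blends the two penalties through a mixing parameter α (α = 0 is pure ridge, α = 1 is pure lasso), which is exactly the alpha argument of glmnet() used below. Up to the scaling conventions of a particular implementation, the penalty term is

$$\lambda\Big[\alpha\sum_{j}\lvert\beta_j\rvert + \tfrac{1-\alpha}{2}\sum_{j}\beta_j^2\Big]$$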

2. Hands-on R code

The R package and its main function
library(glmnet)
glmnet(
  x = X,      # numeric feature matrix (factors must be dummy-encoded, e.g. via model.matrix())
  y = Y,      # response vector
  alpha = 1   # penalty mix: 0 = ridge, 1 = lasso, values in between = elastic net
)
示例数据
ames <- AmesHousing::make_ames()
dim(ames)
set.seed(123)
library(rsample)
split <- initial_split(ames, prop = 0.7, 
                       strata = "Sale_Price")
ames_train  <- training(split)
# [1] 2049   81
ames_test   <- testing(split)
# [1] 881  81

# Create training feature matrices
# we use model.matrix(...)[, -1] to discard the intercept
X <- model.matrix(Sale_Price ~ ., ames_train)[, -1]
# transform y with log transformation
Y <- log(ames_train$Sale_Price)

Parametric models such as regularized regression are sensitive to skewed response values, so transforming the response (here with log()) can often improve predictive performance.
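As a quick optional check (not in the original post), comparing the raw and log-transformed response makes the skewness visible:

# illustrative only: the raw response is right-skewed, the log-transformed one is roughly symmetric
hist(ames_train$Sale_Price, breaks = 50, main = "Sale_Price (raw)")
hist(log(ames_train$Sale_Price), breaks = 50, main = "Sale_Price (log)")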

2.1 Ridge regression

Step 1: fit an initial model and look at the coefficients under different λ values
ridge <- glmnet(x = X, y = Y,
                alpha = 0)

str(ridge$lambda)
# num [1:100] 286 260 237 216 197 ...

# the smaller λ is (column 100 is the smallest λ in the decreasing path), the weaker the shrinkage of the coefficients
coef(ridge)[c("Latitude", "Overall_QualVery_Excellent"), 100]
# Latitude Overall_QualVery_Excellent 
# 0.60703722                 0.09344684

# the larger λ is (column 1 is the largest λ in the path), the stronger the shrinkage of the coefficients
coef(ridge)[c("Latitude", "Overall_QualVery_Excellent"), 1]
# Latitude Overall_QualVery_Excellent 
# 6.115930e-36               9.233251e-37

plot(ridge, xvar = "lambda")

Step 2: use 10-fold cross-validation to pick the best λ
ridge <- cv.glmnet(x = X, y = Y,
                   alpha = 0)
plot(ridge, main = "Ridge penalty\n\n")
# the value with the minimum MSE
ridge$lambda.min
# [1] 0.1525105
ridge$cvm[ridge$lambda == ridge$lambda.min]
min(ridge$cvm)   # same value
# [1] 0.0219778

# the largest λ whose CV error is within one standard error of the minimum
ridge$lambda.1se
# [1] 0.6156877
ridge$cvm[ridge$lambda == ridge$lambda.1se]
# [1] 0.0245219
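As a small follow-up (not shown in the original post), the coefficients at either selected λ can be pulled directly from the cv.glmnet object:

# sparse coefficient vectors at the CV-selected lambdas (intercept included)
coef(ridge, s = "lambda.min")
coef(ridge, s = "lambda.1se")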

Step 3: finally, visualize the coefficient paths together with the CV-selected λ values
# `ridge` already holds the cv.glmnet() fit from Step 2; here we only need the full
# coefficient path, so refit glmnet() without cross-validation
ridge_min <- glmnet(x = X, y = Y,
                    alpha = 0)

plot(ridge_min, xvar = "lambda", main = "Ridge penalty\n\n")
abline(v = log(ridge$lambda.min), col = "red", lty = "dashed")
abline(v = log(ridge$lambda.1se), col = "blue", lty = "dashed")

2.2 Lasso regression

Step 1: fit an initial model and look at the coefficients under different λ values
lasso <- glmnet(x = X, y = Y,
                alpha = 1)

str(lasso$lambda)
# num [1:96] 0.286 0.26 0.237 0.216 0.197 ...

# the smaller λ is (column 96 is the smallest λ in the path), the weaker the shrinkage of the coefficients
coef(lasso)[c("Latitude", "Overall_QualVery_Excellent"), 96]
# Latitude Overall_QualVery_Excellent 
# 0.8126079                  0.2222406

# the larger λ is (column 1 is the largest λ in the path), the stronger the shrinkage: both coefficients are exactly zero
coef(lasso)[c("Latitude", "Overall_QualVery_Excellent"), 1]
# Latitude Overall_QualVery_Excellent 
# 0                          0

plot(lasso, xvar = "lambda")

Step 2: use 10-fold cross-validation to pick the best λ
lasso <- cv.glmnet(x = X, y = Y,
                   alpha = 1)
plot(lasso, main = "lasso penalty\n\n")
# the value with the minimum MSE
lasso$lambda.min
# [1] 0.003957686
lasso$cvm[lasso$lambda == lasso$lambda.min]
min(lasso$cvm)   # same value
# [1] 0.0229088

# the largest λ whose CV error is within one standard error of the minimum
lasso$lambda.1se
# [1] 0.0110125
lasso$cvm[lasso$lambda == lasso$lambda.1se]
# [1] 0.02566636

Step 3: finally, visualize the coefficient paths together with the CV-selected λ values
# `lasso` already holds the cv.glmnet() fit from Step 2; here we only need the full
# coefficient path, so refit glmnet() without cross-validation
lasso_min <- glmnet(x = X, y = Y,
                    alpha = 1)

plot(lasso_min, xvar = "lambda", main = "lasso penalty\n\n")
abline(v = log(lasso$lambda.min), col = "red", lty = "dashed")
abline(v = log(lasso$lambda.1se), col = "blue", lty = "dashed")

Although this lasso model does not offer significant improvement over the ridge model, we get approximately the same accuracy by using only 64 features!
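To see where that feature count comes from, a quick check (not part of the original code) of the number of non-zero coefficients at the CV-selected λ values:

# number of non-zero coefficients stored by cv.glmnet at each lambda
lasso$nzero[lasso$lambda == lasso$lambda.min]
lasso$nzero[lasso$lambda == lasso$lambda.1se]
# or count them from the coefficient vector itself (excluding the intercept)
sum(coef(lasso, s = "lambda.1se")[-1, ] != 0)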

# predict sale price on the training data (predict.cv.glmnet uses s = "lambda.1se" by default)
pred <- predict(lasso, X)
# back-transform the log predictions before computing the RMSE; RMSE() comes from the caret package
library(caret)
RMSE(exp(pred), exp(Y))
## [1] 34161.13

The elastic net blends the ridge and lasso penalties by tuning the α parameter; the caret package can be used to search for the best mix of α and λ, which is not demonstrated in detail here.
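For reference, a minimal sketch of what that tuning could look like with caret (illustrative only; the CV settings and grid size are assumptions, not taken from the original post):

library(caret)
set.seed(123)
# 10-fold CV over a grid of alpha/lambda combinations for glmnet
cv_glmnet <- train(
  x = X,
  y = Y,
  method = "glmnet",
  trControl = trainControl(method = "cv", number = 10),
  tuneLength = 10   # search across a grid of alpha and lambda values
)
cv_glmnet$bestTune   # the alpha/lambda pair with the lowest CV error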
