KNN imputation with the tidymodels package
2022-08-03
灵活胖子的进步之路
Official documentation
https://recipes.tidymodels.org/reference/step_impute_knn.html?search-input=imPute
Official description
step_impute_knn creates a specification of a recipe step that will impute missing data using nearest neighbors.
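To make the idea of "imputing with nearest neighbors" concrete, here is a minimal hand-rolled sketch in base R. It is not how recipes implements the step: the toy data frame, the choice of k = 2 and the plain Euclidean distance are all assumptions made for illustration (the recipes documentation states that step_impute_knn uses Gower's distance and averages the neighbors for numeric variables).

```r
# Toy data (made up for this sketch): y is missing in the last row
toy <- data.frame(
  x1 = c(1, 2, 3, 10, 11),
  x2 = c(1, 1, 2, 9, 10),
  y  = c(5, 6, 7, 20, NA)
)

target   <- which(is.na(toy$y))    # row to impute
complete <- which(!is.na(toy$y))   # candidate neighbors with an observed y

# Euclidean distance on x1 and x2 (a simplification; recipes uses Gower's distance)
d <- sqrt((toy$x1[complete] - toy$x1[target])^2 +
          (toy$x2[complete] - toy$x2[target])^2)

k  <- 2
nn <- complete[order(d)[1:k]]      # indices of the k closest complete rows

# For a numeric variable, impute with the mean of the neighbors
toy$y[target] <- mean(toy$y[nn])
toy
```

The missing value is replaced by the average y of the two rows whose x1 and x2 are closest to the incomplete row.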
tidymodels also provides the following other imputation methods:
[Figure: imputation methods]
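Besides step_impute_knn, the recipes package exports other imputation steps, including step_impute_mean, step_impute_median, step_impute_mode, step_impute_bag, step_impute_linear, step_impute_lower and step_impute_roll. The sketch below tries two of the simplest ones on the same biomass data. The object names (bm, rec_simple, imputed_simple), the seed and the number of artificial NAs are assumptions made up for this example.

```r
library(recipes)
data(biomass, package = "modeldata")

# Work on a copy and punch a few artificial holes into two predictors
bm <- biomass
set.seed(1)
bm$carbon[sample(nrow(bm), 20)]   <- NA
bm$nitrogen[sample(nrow(bm), 20)] <- NA

rec_simple <- recipe(
  HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
  data = bm
) %>%
  step_impute_median(carbon) %>%   # fill carbon with its training median
  step_impute_mean(nitrogen)       # fill nitrogen with its training mean

# new_data = NULL returns the processed training data
imputed_simple <- bake(prep(rec_simple, training = bm), new_data = NULL)

# Count the remaining NAs per column
colSums(is.na(imputed_simple))
```

Mean and median imputation ignore the other predictors entirely, which is the main reason to consider KNN or a model-based step instead.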
# Original tutorial on the official site: https://recipes.tidymodels.org/reference/step_impute_knn.html?search-input=imPute
# 1. KNN imputation ----------------------------------------------------------------------
library(recipes)
data(biomass, package = "modeldata")
# Build the data set: randomly choose the row positions that will be set to missing
set.seed(9039)
carb_missing <- sample(1:nrow(biomass), 100)
nitro_missing <- sample(1:nrow(biomass), 100)
# Create the missing values
biomass$carbon[carb_missing] <- NA
biomass$nitrogen[nitro_missing] <- NA
# Define the recipe (model formula) on the training data
rec <- recipe(
  HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
  data = biomass
)
# Add a KNN imputation step for all predictors, using 5 nearest neighbors
ratio_recipe <- rec %>%
  step_impute_knn(all_predictors(), neighbors = 5)
# prep() estimates the imputation step from the training data (here the full biomass data)
ratio_recipe2 <- prep(ratio_recipe, training = biomass)
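# bake() applies the trained recipe and returns the imputed data (the same biomass data here, since no separate test set was split off)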
imputed <- bake(ratio_recipe2, biomass)
# 2. Check the imputed values ----------------------------------------------------------------
# how well did it work?
# biomass now contains NAs, so take the true values from an untouched copy of the data
biomass_whole <- modeldata::biomass
summary(biomass_whole$carbon)
cbind(
  before = biomass_whole$carbon[carb_missing],
  after = imputed$carbon[carb_missing]
)
summary(biomass_whole$nitrogen)
cbind(
  before = biomass_whole$nitrogen[nitro_missing],
  after = imputed$nitrogen[nitro_missing]
)
tidy(ratio_recipe, number = 1)
tidy(ratio_recipe2, number = 1)
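The before/after tables above are easier to judge with a single number. A rough sketch, assuming the true values are kept in an untouched copy of the data: compute the RMSE between the imputed and true carbon values for a few settings of neighbors. The helper knn_rmse, the objects truth, holed and carb_idx, and the candidate values 3, 5 and 10 are all made up for this example and are not part of recipes.

```r
library(recipes)
data(biomass, package = "modeldata")

truth <- biomass   # untouched copy holding the true values
holed <- biomass   # copy that receives the artificial NAs

set.seed(9039)
carb_idx <- sample(nrow(holed), 100)
holed$carbon[carb_idx] <- NA

# Hypothetical helper: RMSE of the imputed carbon values for a given k
knn_rmse <- function(k) {
  rec_k <- recipe(
    HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
    data = holed
  ) %>%
    step_impute_knn(carbon, neighbors = k)
  imp <- bake(prep(rec_k, training = holed), new_data = NULL)
  sqrt(mean((imp$carbon[carb_idx] - truth$carbon[carb_idx])^2))
}

rmse_by_k <- sapply(c(3, 5, 10), knn_rmse)
names(rmse_by_k) <- paste0("k=", c(3, 5, 10))
rmse_by_k
```

A lower RMSE means the imputed values sit closer to the values that were deleted. Because the holes were placed at random, this only approximates how the step would behave on data that is missing for other reasons.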