KNN imputation with the tidymodels package

2022-08-03  灵活胖子的进步之路

Official documentation

https://recipes.tidymodels.org/reference/step_impute_knn.html?search-input=imPute

Official description

step_impute_knn creates a specification of a recipe step that will impute missing data using nearest neighbors.
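To make the idea of nearest-neighbor imputation concrete, here is a minimal base R sketch on a toy data frame. It is not the recipes implementation: the helper `knn_impute_numeric()` and the `toy` data are hypothetical names made up for illustration, and it handles only numeric columns. It assumes a Gower-style distance (absolute differences scaled by each column's range), which is the distance the recipes documentation describes for this step, and fills each missing value with the mean of its k nearest complete rows.

```r
# Toy data: y has one missing value, x1 and x2 are complete
toy <- data.frame(
  x1 = c(1, 2, 3, 10, 11),
  x2 = c(1, 1, 2, 9, 10),
  y  = c(5, 6, NA, 20, 22)
)

# Hypothetical helper (not part of recipes): impute one numeric column
knn_impute_numeric <- function(data, target, predictors, k = 2) {
  x <- data[, predictors, drop = FALSE]
  # Gower-style scaling for numeric columns: divide by each column's range
  # (assumes no predictor is constant, otherwise the range would be zero)
  ranges <- vapply(x, function(col) diff(range(col)), numeric(1))
  miss   <- which(is.na(data[[target]]))
  donors <- which(!is.na(data[[target]]))
  for (i in miss) {
    # Average scaled absolute difference between row i and every donor row
    d <- vapply(donors, function(j) {
      mean(abs(unlist(x[i, ]) - unlist(x[j, ])) / ranges)
    }, numeric(1))
    nearest <- donors[order(d)[seq_len(k)]]
    # Numeric target: use the mean of the k nearest donors
    data[[target]][i] <- mean(data[[target]][nearest])
  }
  data
}

toy_imputed <- knn_impute_numeric(toy, target = "y",
                                  predictors = c("x1", "x2"), k = 2)
toy_imputed
```

In this toy example the third row is closest to rows 2 and 1, so its missing `y` is filled with the mean of 6 and 5.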

tidymodels also provides the following other imputation methods:

[Figure: imputation methods]

# Original tutorial on the official site: https://recipes.tidymodels.org/reference/step_impute_knn.html?search-input=imPute


# 1. KNN imputation ----------------------------------------------------------------------

library(recipes)
data(biomass, package = "modeldata")

# The biomass data loaded above serves as the dataset

# Randomly choose the row positions that will be set to missing
set.seed(9039)
carb_missing <- sample(1:nrow(biomass), 100)
nitro_missing <- sample(1:nrow(biomass), 100)

# Create the missing values
biomass$carbon[carb_missing] <- NA
biomass$nitrogen[nitro_missing] <- NA

# Use recipe() to define the model formula on the training data (here, the full biomass data)
rec <- recipe(
  HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
  data = biomass)

# Add a KNN imputation step (5 neighbors) for all predictors
ratio_recipe <- rec %>%
  step_impute_knn(all_predictors(), neighbors = 5)

# Estimate the recipe on the training data with prep()
ratio_recipe2 <- prep(ratio_recipe, training = biomass)

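# Apply the prepped recipe with bake() to get the imputed data
# (here the same biomass data is baked, not a separate test set)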
imputed <- bake(ratio_recipe2, biomass)



# 2. Check the imputation results ----------------------------------------------------------------

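# How well did it work?
# biomass itself was overwritten with NA above, so reload an untouched copy
# into a separate environment to recover the original values for comparison
orig_env <- new.env()
data(biomass, package = "modeldata", envir = orig_env)
biomass_orig <- orig_env$biomass

summary(biomass$carbon)
cbind(
  before = biomass_orig$carbon[carb_missing],
  after = imputed$carbon[carb_missing]
)

summary(biomass$nitrogen)
cbind(
  before = biomass_orig$nitrogen[nitro_missing],
  after = imputed$nitrogen[nitro_missing]
)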

# Inspect the imputation step before prep() (untrained) ...
tidy(ratio_recipe, number = 1)
# ... and after prep() (trained)
tidy(ratio_recipe2, number = 1)
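The side-by-side comparison above can also be condensed into a single error number. The following is a rough sketch, not part of the official example: it keeps an untouched copy of the data, removes the same 100 carbon values, and reports the RMSE of the imputed values for a few choices of `neighbors`. The names `biomass_orig`, `knn_rmse`, and `rmse_by_k` are hypothetical, and the candidate values of k are arbitrary. It assumes the recipes and modeldata packages are installed.

```r
library(recipes)

# Keep an untouched copy so imputed values can be compared with the truth
data(biomass, package = "modeldata")
biomass_orig <- biomass

# Remove the same 100 carbon values as in the example above
set.seed(9039)
carb_missing <- sample(1:nrow(biomass), 100)
biomass$carbon[carb_missing] <- NA

# Hypothetical helper: RMSE of the imputed carbon values for a given k
knn_rmse <- function(k) {
  imputed <- recipe(
    HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
    data = biomass
  ) %>%
    step_impute_knn(carbon, neighbors = k) %>%
    prep(training = biomass) %>%
    bake(new_data = NULL)  # new_data = NULL returns the processed training data

  truth <- biomass_orig$carbon[carb_missing]
  sqrt(mean((imputed$carbon[carb_missing] - truth)^2))
}

# Arbitrary candidate values of k, chosen only for illustration
rmse_by_k <- sapply(c(1, 3, 5, 10), knn_rmse)
rmse_by_k
```

A lower RMSE means the imputed values sit closer to the values that were removed. Because the rows were chosen at random with one seed, the ranking of k values here is only indicative.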


