Statistical plotting | Normalization vs. standardization
2022-08-19
shwzhao
For details, see:
https://en.wikipedia.org/wiki/Feature_scaling
https://en.wikipedia.org/wiki/Normalization_(statistics)
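CSDN | Why perform feature normalization/standardization?
CSDN | Standardization and normalization are not the same thing: a thorough look at data transformation
WeChat official account | What exactly are standardization and normalization in data processing?
WeChat official account | Data standardization: z-score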
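First, look at the distributions of three variables (mpg, disp, hp) before any transformation.
library(tidyverse)  # provides dplyr, tidyr, tibble, and ggplot2

mtcars %>%
  select(mpg, disp, hp) %>%
  rownames_to_column("car") %>%  # keep the car names as a regular column
  pivot_longer(-car, names_to = "terms", values_to = "values") %>%
  ggplot() +
  geom_violin(aes(terms, values)) +  # one violin per variable, raw units
  theme_bw()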
[Figure: violin plots of disp, hp, and mpg on their original scales]
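Now look at the distributions after transformation. The three variables are on a common scale and their shapes look broadly alike. Normalization (min-max scaling) maps every value into the fixed interval [0, 1]; standardization (z-score) gives each variable mean 0 and standard deviation 1 but has no fixed range.
mtcars %>%
  select(mpg, disp, hp) %>%
  rownames_to_column("car") %>%
  pivot_longer(-car, names_to = "terms", values_to = "values") %>%
  group_by(terms) %>%  # transform each variable separately
  mutate(
    # normalization (min-max): rescale to [0, 1]
    normal_values = (values - min(values)) / (max(values) - min(values)),
    # standardization (z-score): center on the mean, divide by the sd
    standard_values = (values - mean(values)) / sd(values)
  ) %>%
  ungroup() %>%
  pivot_longer(-c(car, terms), names_to = "termss", values_to = "valuess") %>%
  filter(termss != "values") %>%  # drop the raw values, keep both transforms
  ggplot() +
  geom_violin(aes(termss, valuess)) +
  # geom_boxplot(aes(termss, valuess)) +
  facet_wrap(~terms) +
  theme_bw()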
[Figure: violin plots of normal_values and standard_values, faceted by disp, hp, and mpg]
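The difference in range can also be checked numerically. The following is a minimal sketch, assuming only base R and the built-in mtcars data; the helper names min_max and z_score are hypothetical and introduced here purely for illustration.

```r
# Hypothetical helpers (not part of the original post)
min_max <- function(x) (x - min(x)) / (max(x) - min(x))  # normalization
z_score <- function(x) (x - mean(x)) / sd(x)             # standardization

hp <- mtcars$hp
hp_norm <- min_max(hp)
hp_std <- z_score(hp)

range(hp_norm)  # the smallest value maps to 0, the largest to 1
range(hp_std)   # bounds depend on the data; there is no fixed interval
mean(hp_std)    # 0, up to floating-point error
sd(hp_std)      # 1, up to floating-point error
```

Min-max scaling is defined by the observed extremes, so a single outlier compresses all other values; the z-score is defined by the mean and standard deviation instead, which is why its range is open-ended.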
# scale() with its defaults standardizes each column (z-score)
mtcars %>%
  select(mpg, disp, hp) %>%
  scale() %>%
  head()
#>                          mpg        disp         hp
#> Mazda RX4          0.1508848 -0.57061982 -0.5350928
#> Mazda RX4 Wag      0.1508848 -0.57061982 -0.5350928
#> Datsun 710         0.4495434 -0.99018209 -0.7830405
#> Hornet 4 Drive     0.2172534  0.22009369 -0.5350928
#> Hornet Sportabout -0.2307345  1.04308123  0.4129422
#> Valiant           -0.3302874 -0.04616698 -0.6080186
# The same standardization computed by hand, reshaped back to wide format
mtcars %>%
  select(mpg, disp, hp) %>%
  rownames_to_column("car") %>%
  pivot_longer(-car, names_to = "terms", values_to = "values") %>%
  group_by(terms) %>%
  mutate(
    normal_values = (values - min(values)) / (max(values) - min(values)),
    standard_values = (values - mean(values)) / sd(values)
  ) %>%
  ungroup() %>%
  pivot_wider(id_cols = car, names_from = terms, values_from = standard_values) %>%
  head()
#> # A tibble: 6 x 4
#>   car                  mpg    disp     hp
#>   <chr>              <dbl>   <dbl>  <dbl>
#> 1 Mazda RX4          0.151 -0.571  -0.535
#> 2 Mazda RX4 Wag      0.151 -0.571  -0.535
#> 3 Datsun 710         0.450 -0.990  -0.783
#> 4 Hornet 4 Drive     0.217  0.220  -0.535
#> 5 Hornet Sportabout -0.231  1.04   -0.413
#> 6 Valiant           -0.330 -0.0462 -0.608
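The two outputs above agree to the printed precision. A short sketch, assuming only base R, that checks the agreement across all rows rather than just the first six (the variable names vars, by_scale, by_hand, and same are hypothetical):

```r
vars <- mtcars[, c("mpg", "disp", "hp")]

# Standardize with scale(), which centers and scales each column by default
by_scale <- scale(vars)

# Standardize by hand, column by column
by_hand <- sapply(vars, function(x) (x - mean(x)) / sd(x))

# Compare values only; scale() attaches row names and the
# "scaled:center" / "scaled:scale" attributes, which sapply() does not
same <- all.equal(by_scale, by_hand, check.attributes = FALSE)
same  # expected to be TRUE
```

Attributes are ignored in the comparison because scale() records the centers and scales it used, which is useful when the same transformation has to be applied to new data later.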