Concepts to Know in Bioinformatics Statistical Analysis and Data Mining

Differences between Normalization and Standardization

2022-02-27  杨康chin

Normalization:

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$

Rescales the values into the range [0, 1].
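
A minimal Python sketch of min-max normalization, assuming NumPy is available. The function name `min_max_normalize` is hypothetical, and returning zeros for a constant input (where max equals min and the formula would divide by zero) is an assumed convention, not part of the formula itself.

```python
import numpy as np

def min_max_normalize(x):
    """Rescale values to [0, 1] using x' = (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # Assumed convention: a constant input has zero range, so return zeros
        # instead of dividing by zero.
        return np.zeros_like(x)
    return (x - x_min) / (x_max - x_min)

print(min_max_normalize([2, 4, 6, 10]).tolist())  # [0.0, 0.25, 0.5, 1.0]
```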

Standardization (also called Z-score normalization):

$$z = \frac{x - \mu}{\sigma}$$

where μ is the mean and σ is the standard deviation of the feature.

After this transformation the values have mean 0 and standard deviation 1.
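
A matching minimal sketch of Z-score standardization with NumPy. `z_score_standardize` is a hypothetical name. It uses the population standard deviation (`ddof=0`, NumPy's default); that choice is an assumption, since pandas' `Series.std()` defaults to the sample standard deviation (`ddof=1`), which gives slightly different values on small samples.

```python
import numpy as np

def z_score_standardize(x):
    """Return z = (x - mean) / std, so the result has mean 0 and std 1."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    sigma = x.std()  # population standard deviation (ddof=0)
    if sigma == 0:
        # Assumed convention: every value equals the mean, so return zeros.
        return np.zeros_like(x)
    return (x - mu) / sigma

z = z_score_standardize([2, 4, 6, 10])
print(np.isclose(z.mean(), 0.0), np.isclose(z.std(), 1.0))  # True True
```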

Normalization is good to use when you know that the distribution of your data does not follow a Gaussian distribution. This can be useful in algorithms that do not assume any distribution of the data like K-Nearest Neighbors and Neural Networks.
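
To see why distance-based methods such as KNN are sensitive to feature scale, here is a toy sketch with hypothetical data: two features on very different scales and a 1-nearest-neighbour lookup by Euclidean distance. The helper names are illustrative, and `min_max_columns` assumes no feature column is constant.

```python
import numpy as np

# Hypothetical toy data: 3 samples (rows) x 2 features (columns).
X = np.array([
    [100.0, 0.10],
    [130.0, 0.90],
    [160.0, 0.15],
])

def min_max_columns(X):
    """Min-max normalize each column (feature) of X to [0, 1]."""
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min  # assumes no constant columns
    return (X - col_min) / col_range

def nearest_neighbour(X, i):
    """Index of the sample closest to sample i by Euclidean distance."""
    d = np.sqrt(((X - X[i]) ** 2).sum(axis=1))
    d[i] = np.inf  # exclude the query sample itself
    return int(np.argmin(d))

print(nearest_neighbour(X, 0))                   # 1
print(nearest_neighbour(min_max_columns(X), 0))  # 2
```

On the raw data the first feature dominates the distance, so sample 1 is the nearest neighbour of sample 0. After per-column min-max scaling both features contribute comparably, and the nearest neighbour becomes sample 2.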

Standardization, on the other hand, can be helpful when the data follow a Gaussian distribution, although this is not strictly required. Also, unlike normalization, standardization does not map values into a bounded range, so outliers are not forced into a fixed interval such as [0, 1]. Outliers still pull the mean and inflate the standard deviation, however, so standardization does not make them harmless.
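
A small sketch with hypothetical data showing how a single outlier interacts with each method: min-max output stays inside [0, 1] while z-scores are unbounded, and because both are linear rescalings, the ordinary values are squeezed together by the same relative amount in either case.

```python
import numpy as np

# Hypothetical data: four ordinary values and one outlier (100).
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

minmax = (x - x.min()) / (x.max() - x.min())
zscore = (x - x.mean()) / x.std()  # population std (ddof=0)

# Min-max output is bounded by [0, 1]; z-scores are not.
print(minmax.min(), minmax.max())  # 0.0 1.0
print(zscore.max() > 1)            # True (the outlier's z-score is about 2.0)

def spread(a):
    return a.max() - a.min()

# Both methods are linear rescalings, so the four inliers occupy the same
# small fraction (about 3%) of the total spread under either method.
inlier_share_minmax = spread(minmax[:4]) / spread(minmax)
inlier_share_z = spread(zscore[:4]) / spread(zscore)
print(np.isclose(inlier_share_minmax, inlier_share_z))  # True
```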

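In short, both are forms of feature scaling. Standardization is a natural choice when the data follow a Gaussian distribution, and normalization when they do not. Normalization suits models that make no assumptions about the data distribution, such as KNN and neural networks.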

Ref:

  1. https://www.analyticsvidhya.com/blog/2020/04/feature-scaling-machine-learning-normalization-standardization/
  2. https://towardsai.net/p/data-science/how-when-and-why-should-you-normalize-standardize-rescale-your-data-3f083def38ff
  3. https://en.wikipedia.org/wiki/Feature_scaling