2020-03-27 Scale, Standardize, o
What do These Terms Mean?
Scale generally means to change the range of the values. The shape of the distribution doesn’t change. Think about how a scale model of a building has the same proportions as the original, just smaller. That’s why we say it is drawn to scale. The range is often set at 0 to 1.
Standardize generally means changing the values so that the distribution standard deviation from the mean equals one. It outputs something very close to a normal distribution. Scaling is often implied.
Normalize can be used to mean either of the above things (and more!). I suggest you avoid the term normalize, because it has many definitions and is prone to creating confusion.
If you use any of these terms in your communication, I strongly suggest you define them.
Why Scale, Standardize, or Normalize?
Many machine learning algorithms perform better or converge faster when features are on a relatively similar scale and/or close to normally distributed. Examples of such algorithm families include:
linear and logistic regression
nearest neighbors
neural networks
support vector machines with radial bias kernel functions
principal components analysis
linear discriminant analysis
Scaling and standardizing can help features arrive in more digestible form for these algorithms.