Most Common Feature Scaling methods in Machine Learning

Definition

Feature scaling is the process of normalizing the range of feature in a dataset.
Real-world datasets often contain features that are varying in degrees of magnitude, range and units. Therefore, in order for machine learning models to interpret these features on the same scale, we need to perform scaling.
Feature scaling makes the model learns fast compared to the unscaled data. For ex, in gradient decent, to minimize the cost function, if the range of values is small then the algorithm converges much faster.

This is also known as Min-Max scaling. It scales the data to the range between 0 and 1.
This scaling is performed based on the below formula.

Where,

X is the current value to be scaled,

Min(x) is the minimum value in the list of value and max(x) is the maximum value in the list of values.

It represents the value in standard deviation from the mean.
The scaled values are distributed such that the mean of the values is 0 and the standard deviation is 1.

Where,

X is the current value to be scaled, µ is the mean of the list of value and σ is the standard deviation of the list of values.

This method centres the median value at zero and this method is robust to outliers.
It scales the data accordingly to the interquartile range (IQR = 75 Quantile – 25 Quantile).

Where,

X is the current value to be scaled.

X median is the median of the list of values and IQR is the interquartile range of the list of values.

Standardization is useful when the values of the feature are normal distributed (i.e., the value follow the bell-shaped curve).
Normalization is useful when data is not normally distributed and for algorithms that do not assumes any distribution of the data like K-Nearest Neighbors.
Robust Scaling is useful when outliers are present in the data.