Feature scaling stands for transforming variable values into a certain standard range. Feature scaling can quite important for certain machine learning algorithms, such as gradient descent, support vector machine. This post is about introducing several feature scaling techniques.

## Why Scale Features?

Machine Learning algorithms don't perform well when the input numerical attributes have different scales significantly (Ge), i.e. some features may range from 0 to 10 while another range from 1,000 to 10,000. This may cause some algorithms to assign these two variables with different importance.

For example, in the k-nearest neighbor algorithm, the classifier mainly calculates the Euclidean distance between two points. If a feature has a larger range value than other features, the distance will be dominated by this eigenvalue. Therefore each feature should be normalized, such as processing the range of values between 0 and 1.

In addition, feature scaling would make the algorithms (gradient descent) to converge faster.

Decision tree like algorithms are usually not sensitive to feature scales.

## Methods of Feature Scaling

### 1. Rescaling/Min-Max Scaling

In this approach, the data is scaled to a fixed range - usually [0,1] or [-1,1]. The cost of having this bounded range is that we will end up with smaller standard deviations, which can suppress the effect of outliers.

$x^{\prime}=\frac{x-\min (x)}{\max (x)-\min (x)}$

#### Usage

from sklearn.preprocessing import MinMaxScaler


1. enhances the stability of attributes with small variance
2. keeps the 0s in a sparse matrix

### 2. Standardization

Feature standardization makes the values of each feature in the data have zero-mean (when subtracting the mean in the numerator) and unit-variance (1).

This method is widely used in machine learning algorithms, e.g. SVM, logistic regression and neural networks.

$x^{\prime}=\frac{x-\overline{x}}{\sigma}$

$\sigma=\sqrt{\frac{\sum(x-\operatorname{mean}(x))^{2}}{n}}$

#### Usage

from sklearn.preprocessing import StandardScaler


### 3. Mean normalisation

$x^{\prime}=\frac{x-\operatorname{mean}(x)}{\max (x)-\min (x)}$

### 4. Scaling to unit length

Devided by the Euclidean length of the vector, two norm.

$x^{\prime}=\frac{x}{\|x\|}$