Introduction to Anomaly Detection

Anomaly detection (a.k.a. outlier detection) is the process of detecting unexpected observations in a given dataset.
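As a rough illustration of what "unexpected" can mean in practice, here is a minimal sketch that flags values using the modified z-score (based on the median absolute deviation). The function name `mad_outliers`, the sample readings, and the 3.5 threshold are assumptions chosen for this example; this is only one of many possible detection rules.

```python
import statistics

def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    med = statistics.median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    # 0.6745 rescales the MAD so scores are comparable to standard z-scores.
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]

# Hypothetical sensor readings with one obviously unusual value.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 25.0, 10.2]
outliers = mad_outliers(readings)
print(outliers)  # [25.0]
```

The median-based score is used here instead of the plain mean and standard deviation because a single extreme value inflates the standard deviation and can hide itself in small samples.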

Simon

Python

20 Aug 2019

Feature Engineering: Label Encoding & One-Hot Encoding

Unlike a decision tree classifier, some machine learning models cannot handle categorical data directly. If we want to include categorical features, they usually need to be transformed first, most commonly with Label Encoding or One-Hot Encoding.
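To make the difference between the two encodings concrete, here is a small sketch in plain Python. The helper names `label_encode` and `one_hot_encode` and the color data are made up for this example; in a real project a library encoder would normally be used instead.

```python
def label_encode(values):
    """Map each category to an integer code (categories sorted alphabetically)."""
    categories = sorted(set(values))
    index = {category: code for code, category in enumerate(categories)}
    return [index[v] for v in values], categories

def one_hot_encode(values):
    """Turn each value into a 0/1 vector with one position per category."""
    codes, categories = label_encode(values)
    return [[1 if code == i else 0 for i in range(len(categories))] for code in codes]

colors = ["red", "green", "blue", "green"]
codes, categories = label_encode(colors)
one_hot = one_hot_encode(colors)

print(categories)  # ['blue', 'green', 'red']
print(codes)       # [2, 1, 0, 1]
print(one_hot)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Label encoding implies an order (blue < green < red) that may not exist in the data, which is one reason one-hot encoding is often preferred for nominal categories.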

Simon

Python

22 Jul 2019

Treatments for Imbalanced Dataset

Imbalanced datasets are a common problem in classification tasks in machine learning. Take credit card fraud prediction as a simple example: the target values are either fraud (1) or not fraud (0), but the fraud cases (1) may make up less than one percent of the whole dataset.
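One simple treatment is random oversampling: duplicating minority-class rows until the classes are the same size. The sketch below assumes a toy dataset of 100 rows with a single fraud case; `random_oversample` is a hypothetical helper written for this example, not a library function.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        # All original rows that belong to this class.
        pool = [row for row, lab in zip(rows, labels) if lab == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels

# Toy data: 99 "not fraud" rows (0) and a single "fraud" row (1).
rows = [[float(i)] for i in range(100)]
labels = [1 if i == 0 else 0 for i in range(100)]

bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(labels))      # Counter({0: 99, 1: 1})
print(Counter(bal_labels))  # Counter({0: 99, 1: 99})
```

Oversampling should generally be applied to the training split only, so that duplicated rows do not leak into the evaluation data.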

Simon

Python

15 Jul 2019

Data Cleaning: Filter Records Based on Conditions

Simon

Machine Learning

20 Jun 2019

Feature Scaling in Machine Learning

Feature scaling means transforming variable values into a standard range. It can be quite important for certain machine learning algorithms, such as those trained with gradient descent and support vector machines. This post introduces several feature scaling techniques.
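Two widely used scaling techniques are min-max scaling and standardization. The following sketch shows both in plain Python; the function names and the `ages` sample are assumptions made for this example.

```python
import statistics

def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

ages = [20, 30, 40, 50, 60]
scaled = min_max_scale(ages)
standardized = standardize(ages)
print(scaled)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Min-max scaling keeps the result in a fixed range but is sensitive to extreme values, while standardization has no fixed range but centers the feature at zero.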

Simon

Machine Learning

25 May 2019

Gaussian Mixture Model

From K-means we know that:

K-means forces clusters to be spherical
In K-means clustering every point can only belong to one cluster
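A Gaussian mixture relaxes the second point by giving each observation a probability of belonging to every cluster. The sketch below computes only these soft assignments (the "responsibilities" of the E-step) for a one-dimensional mixture whose weights, means, and standard deviations are assumed to be known; it is not a full fitting procedure.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def responsibilities(x, weights, means, stds):
    """Probability that x was generated by each mixture component."""
    weighted = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(weighted)
    return [p / total for p in weighted]

# Assumed mixture: two equally weighted components centered at 0 and 4.
weights, means, stds = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]

soft = responsibilities(2.0, weights, means, stds)  # exactly between the two means
near = responsibilities(0.5, weights, means, stds)  # close to the first mean
print(soft)  # [0.5, 0.5]
print(near)  # almost all of the probability goes to the first component
```

K-means would have to assign the point at 2.0 to a single cluster, whereas the mixture model can express that it is equally likely to come from either one.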

Simon

Machine Learning

25 May 2019

Hessian Matrix

In mathematics, the Hessian matrix (or simply the Hessian) is a square matrix of second-order partial derivatives of a scalar-valued function, or scalar field. It describes the local curvature of a function of many variables. Hessian matrices are often used in optimization problems, for example within the Newton-Raphson method.
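To see the definition in action, the sketch below approximates the Hessian numerically with central finite differences. The function `hessian`, the step size `h`, and the test function `f` are assumptions chosen for this example; for f(x, y) = x^2 + 3xy + 2y^2 the exact Hessian is [[2, 3], [3, 4]] everywhere.

```python
def hessian(f, point, h=1e-4):
    """Approximate the matrix of second partial derivatives of f at point."""
    n = len(point)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                # Evaluate f after moving coordinate i by di*h and coordinate j by dj*h.
                p = list(point)
                p[i] += di * h
                p[j] += dj * h
                return f(p)
            # Central difference formula for d^2 f / (dx_i dx_j).
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H

def f(p):
    x, y = p
    return x ** 2 + 3 * x * y + 2 * y ** 2

H = hessian(f, [1.0, 2.0])
print([[round(v, 3) for v in row] for row in H])
```

The result is symmetric, as expected for a function with continuous second derivatives. Finite differences are only an approximation; analytic or automatic differentiation is usually preferable when available.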

Simon

Machine Learning

25 May 2019

K-means Clustering

K-means clustering is a type of unsupervised learning, which is used for unlabeled data (i.e., data without defined categories or groups). The goal of this algorithm is to find groups in the data, with the number of groups represented by the variable K (defined manually as an input).
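The basic loop (assign each point to its nearest centroid, then move each centroid to the mean of its points) can be sketched in a few lines. In this example the initial centroids are passed in explicitly so that the result is deterministic; in practice they are usually chosen at random. The function name and the sample points are assumptions made for illustration.

```python
def kmeans(points, initial_centroids, iterations=10):
    """Plain K-means; K is the number of initial centroids supplied."""
    centroids = list(initial_centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
    return centroids, clusters

points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, clusters = kmeans(points, initial_centroids=[points[0], points[3]])
print(clusters[0])  # the three points near (1, 1)
print(clusters[1])  # the three points near (8, 8)
```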

Simon

Machine Learning

23 May 2019

Matrix Factorization

Matrix factorization is a class of algorithms used for recommendation systems in machine learning. Matrix factorization algorithms work by decomposing a matrix, such as the user-item interaction matrix, into the product of lower-dimensional matrices. Commonly known matrix factorization techniques include SVD and PCA.
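As an illustration of the idea in a recommendation setting, the sketch below learns two small factor matrices from a handful of observed ratings using stochastic gradient descent. Note that this is a gradient-based sketch rather than SVD or PCA, and that the function names, hyperparameters, and rating triples are all assumptions made up for this example.

```python
import random

def factorize(ratings, n_users, n_items, k=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] approximates r."""
    rng = random.Random(seed)
    P = [[rng.uniform(0, 1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def rmse(ratings, P, Q):
    """Root mean squared error over the observed ratings only."""
    errors = [(r - sum(p * q for p, q in zip(P[u], Q[i]))) ** 2 for u, i, r in ratings]
    return (sum(errors) / len(errors)) ** 0.5

# Hypothetical (user, item, rating) triples; unobserved pairs are simply absent.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 1), (2, 2, 5)]

P0, Q0 = factorize(ratings, n_users=3, n_items=3, steps=0)  # untrained factors
P, Q = factorize(ratings, n_users=3, n_items=3)
print(rmse(ratings, P0, Q0), rmse(ratings, P, Q))
```

Once trained, the dot product of a user's row in `P` and an item's row in `Q` gives a predicted rating for pairs that were never observed, which is what makes the factorization useful for recommendations.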

Simon