Rewrite SQL Queries in Pandas

I work in both SQL and Python from time to time, and Pandas' syntax is quite different from SQL's. With SQL, you declare what you want in a statement that reads almost like English. In Pandas, you apply operations to the dataset and chain them to transform and reshape the data the way you want.
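As a rough illustration of that contrast, here is a minimal sketch that places a SQL statement next to one possible Pandas equivalent. The `orders` table, its columns, and its values are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical data, made up for illustration only.
orders = pd.DataFrame({
    "customer": ["a", "b", "a", "c", "b"],
    "amount": [10, 20, 30, 40, 50],
})

# SQL version (declarative):
#   SELECT customer, SUM(amount) AS total
#   FROM orders
#   WHERE amount > 10
#   GROUP BY customer
#   ORDER BY total DESC;

# Pandas version: each step is an operation chained onto the previous one.
result = (
    orders[orders["amount"] > 10]                 # WHERE amount > 10
    .groupby("customer", as_index=False)["amount"]
    .sum()                                        # GROUP BY ... SUM(amount)
    .rename(columns={"amount": "total"})          # AS total
    .sort_values("total", ascending=False)        # ORDER BY total DESC
    .reset_index(drop=True)
)

print(result)
```

The chain reads top to bottom in execution order, whereas the SQL statement describes the result and leaves the execution order to the database.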

Simon

Python

16 Dec 2020

Python’s Requests Library Notes

The requests library is the de facto standard for making HTTP requests in Python. It abstracts the complexities of making requests behind a beautiful, simple API so that you can focus on interacting with services and consuming data in your application.

Simon

SQL

03 Jan 2020

Fetch Data from PostgreSQL Databases in Python

We use pandas and psycopg2 together to connect to PostgreSQL. psycopg2 is a package that allows us to create a connection to PostgreSQL databases in Python, and we will use sqlio within pandas to interact with the database.
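The query side of this workflow can be sketched without a running PostgreSQL server. The example below swaps in an in-memory SQLite connection from the standard library purely as a stand-in; with PostgreSQL, the connection object would instead come from `psycopg2.connect(...)` using your own credentials. The `users` table and its rows are hypothetical.

```python
import sqlite3

import pandas.io.sql as sqlio

# Stand-in connection (assumption): SQLite in memory instead of PostgreSQL.
# With PostgreSQL this would be a connection created by psycopg2.connect(...).
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, for illustration only.
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])

# sqlio.read_sql_query runs the query and returns the rows as a DataFrame.
df = sqlio.read_sql_query("SELECT id, name FROM users ORDER BY id", conn)

conn.close()
print(df)
```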

Simon

Data Visualization

18 Oct 2019

Deploy Dash on Server by Gunicorn

Dash is an open-source Python dashboard package from plot.ly. It's pretty easy to use and has a lot of components for building beautiful and informative graphs and charts.

Simon

Python

21 Aug 2019

Censored Data and Survival Analysis

Censoring is a condition in which the value of a measurement or observation is only partially observed. Censored data is one kind of missing data, but it differs from the common meaning of a missing value in machine learning. We usually observe censored data in time-based datasets, where the event of interest is cut off beyond a certain time boundary. We can apply survival analysis to account for the censoring in the data.

Simon

Python

20 Aug 2019

Feature Engineering: Label Encoding & One-Hot Encoding

Unlike a Decision Tree Classifier, some machine learning models cannot deal with categorical data directly. Categorical data often requires a transformation before we can include it, such as Label Encoding or One-Hot Encoding.
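To make the two encodings concrete, here is a minimal sketch using pandas only. The `color` column and its values are hypothetical, and pandas' categorical codes are used here as one simple way to get a label encoding; other libraries offer their own encoders.

```python
import pandas as pd

# Hypothetical categorical feature, for illustration only.
colors = pd.Series(["red", "green", "blue", "green"], name="color")

# Label encoding: each category becomes one integer.
# Categories are sorted alphabetically, so blue=0, green=1, red=2.
label_encoded = colors.astype("category").cat.codes

# One-hot encoding: each category becomes its own indicator column.
one_hot = pd.get_dummies(colors, prefix="color")

print(label_encoded.tolist())
print(one_hot)
```

Label encoding keeps a single column but implies an ordering among the categories, while one-hot encoding avoids that ordering at the cost of one column per category.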

Simon

Python

22 Jul 2019

Treatments for Imbalanced Dataset

Imbalanced datasets are a common problem in classification tasks in machine learning. Take credit card fraud prediction as a simple example: the target values are either fraud (1) or not fraud (0), but the fraud (1) cases may make up less than one percent of the whole dataset.

Simon

Python

15 Jul 2019

Data Cleaning: Filter Records Based on Conditions

Simon