Rewrite SQL Queries in Pandas
From time to time, I have done various tasks in SQL and Python. However, Pandas’ syntax is quite different from SQL. With SQL, you declare what you want in a sentence that almost reads like English. In Pandas, you apply operations on the dataset, and chain them, in order to transform and reshape the data the way you want it.
Python’s Requests Library Notes
The requests library is the de facto standard for making HTTP requests in Python. It abstracts the complexities of making requests behind a beautiful, simple API so that you can focus on interacting with services and consuming data in your application.
Fetch Data from PostgreSQL Databases in Python
We use pandas and psycopg2
together to connect with PostgreSQL. psycopg2
is a package allows us to create a connection with PostgreSQL databases in Python, and we will use sqlio
within pandas
to interact with the database.
Deploy Dash on Server by Gunicorn
Dash is an open-sourced Python Dashboard package from plot.ly. It's pretty easy to use and has a lot of components to build beautiful and informative graphs and charts.
Censored Data and Survival Analysis
Censorships in data is a condition in which the value of a measurement or observation is only partially observed. Censored data is one kind of missing data, but is different from the common meaning of missing value in machine learning. We usually observe censored data in a time-based dataset. In such datasets, the event is been cut off beyond a certain time boundary. We can apply survival analysis to overcome the censorship in the data.
Feature Engineering: Label Encoding & One-Hot Encoding
Unlike Decision Tree Classifier, some machine learning models doesn't have the ability to deal with categorical data. The categorical data are often requires a certain transformation technique if we want to include them, namely Label Encoding and One-Hot Encoding.
Treatments for Imbalanced Dataset
Imbalanced datasets are a common problem in classification tasks in machine learning. Take credit card fraud prediction as a simple example: the target values are either fraud (1) or not fraud (0), but the number of fraud (1) could only be less than one percent of the whole dataset.