Take the following as an example to filter a data frame based on certain conditions:

There are 2 users and some of their corresponding bill dates and bill amounts, we want to only keep one record for each user with their billing's starting date and the average bill amount.

Dataframe

import pandas as pd

raw_data = {'User':[1,1,1,2,2,2,1], 
        'Time':['2014-01-01','2014-02-01','2014-03-01','2018-02-01','2018-01-01','2018-03-01','2014-04-01'],
        'Amount':[32,13,48,98,64,23,27]}
data = pd.DataFrame(raw_data)
data
User Time Amount
0 1 2014-01-01 32
1 1 2014-02-01 13
2 1 2014-03-01 48
3 2 2018-02-01 98
4 2 2018-01-01 64
5 2 2018-03-01 23
6 1 2014-04-01 27

Solution 1

Package: Pandas
Function: dataframe.groupby()
Docs: pandas.DataFrame.groupby

# generate new columns for grouped results
data_amount = round(data.groupby(['User'], as_index=False)['Amount'].mean(),2)
data_time = data.groupby('User', as_index=False)['Time'].min()
# merge columns to main dataframe
data_merge1 = pd.merge(data,data_time,on='User',how='left')
data_merge2 = pd.merge(data_merge1,data_amount,on='User',how='left')
data_merge2
User Time_x Amount_x Time_y Amount_y
0 1 2014-01-01 32 2014-01-01 30.00
1 1 2014-02-01 13 2014-01-01 30.00
2 1 2014-03-01 48 2014-01-01 30.00
3 2 2018-02-01 98 2018-01-01 61.67
4 2 2018-01-01 64 2018-01-01 61.67
5 2 2018-03-01 23 2018-01-01 61.67
6 1 2014-04-01 27 2014-01-01 30.00
# drop old columns and rename new columns
data_new = data_merge2.drop(columns=['Time_x','Amount_x'])
data_new = data_new.drop_duplicates(['User'])
data_new = data_new.rename(columns={'Amount_y': 'Amount_mean','Time_y':'Date'})
data_new
User Date Amount_mean
0 1 2014-01-01 30.00
3 2 2018-01-01 61.67

Alternative

Alternatively, we could also drop_duplicates first then merge new columns:

data2 = data.drop_duplicates(['User'])
data2_merge1 = pd.merge(data2,data_time,on='User',how='left')
data2_merge2 = pd.merge(data2_merge1,data_amount,on='User',how='left')
data2_merge2
User Time_x Amount_x Time_y Amount_y
0 1 2014-01-01 32 2014-01-01
1 2 2018-02-01 98 2018-01-01
data2_new = data2_merge2.drop(columns=['Time_x','Amount_x'])
data2_new = data2_new.drop_duplicates(['User'])
data2_new = data2_new.rename(columns={'Amount_y': 'Amount_mean','Time_y':'Date'})
data2_new
User Date Amount_mean
0 1 2014-01-01 30.00
1 2 2018-01-01 61.67

Solution 2

Here we sort the value by User and Time ascendingly, then drop_cuplicates. We get a dataframe ordered by User then Time.

data3 = data.sort_values(by=['User','Time'],ascending=True).drop_duplicates(['User'])
data3
User Time Amount
0 1 2014-01-01 32
4 2 2018-01-01 64

Now we only need to merge the Amount_mean to the dataframe:

data3 = pd.merge(data3,data_amount,on='User',how='left')
data3
User Time Amount_x Amount_y
0 1 2014-01-01 32 30.00
1 2 2018-01-01 64 61.67

Lastly, drop old column and rename new column:

data3_new = data3.drop(columns=['Amount_x'])
data3_new = data3_new.rename(columns={'Amount_y': 'Amount_mean','Time':'Date'})
data3_new
User Date Amount_mean
0 1 2014-01-01 30.00
1 2 2018-01-01 61.67

Source Code