Python in Practice: UCI Data Cohort Analysis

2019-10-18  许志辉Albert

The data comes from https://archive.ics.uci.edu/ml/datasets/online+retail# and records all real user transactions on a UK e-commerce site between 01/12/2010 and 09/12/2011, about 500,000 rows in total.

1. Load modules

import pandas as pd
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib as mpl

2. Load the data file

df = pd.read_csv('uci_csv')
df.head(10)

Output: the first 10 rows of the UCI data.

3. Define date helper functions

def get_month(x):
    # Truncate a timestamp to the first day of its month
    date_time = pd.Timestamp(x)
    return dt.datetime(date_time.year, date_time.month, 1)

def month_differ(x, y):
    # Number of calendar months from y to x (the day of the month is ignored)
    date_time_x = pd.Timestamp(x)
    date_time_y = pd.Timestamp(y)
    return (date_time_x.year - date_time_y.year) * 12 + (date_time_x.month - date_time_y.month)
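
As a quick sanity check, the two helpers can be tried on a pair of hand-written timestamps. This is a minimal sketch: the sample dates are made up for illustration, and the helpers are restated so the snippet runs on its own.

```python
import datetime as dt
import pandas as pd

def get_month(x):
    # Truncate a timestamp to the first day of its month
    ts = pd.Timestamp(x)
    return dt.datetime(ts.year, ts.month, 1)

def month_differ(x, y):
    # Number of calendar months from y to x (the day of the month is ignored)
    tx, ty = pd.Timestamp(x), pd.Timestamp(y)
    return (tx.year - ty.year) * 12 + (tx.month - ty.month)

# Illustrative timestamps, not taken from the dataset
first_order = get_month("2010-12-01 08:26:00")
later_order = get_month("2011-03-15 10:00:00")
gap = month_differ(later_order, first_order)
print(first_order, later_order, gap)
```

Because only the year and month enter the calculation, an order on the last day of one month and another on the first day of the next are counted as one month apart.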

4. Get each customer's earliest purchase month, CohortMonth

df['OrderMonth'] = df['InvoiceDate'].apply(get_month)
df['CohortMonth'] = df.groupby("CustomerID")["OrderMonth"].transform("min")
df.head()
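
To see what the grouped `transform` does here, the sketch below applies the same call to a tiny hand-made frame (the `toy` data is hypothetical, not from the dataset). Each row keeps its own `OrderMonth` but receives the minimum `OrderMonth` of its customer.

```python
import pandas as pd

# Hypothetical orders: customer 1 buys in Jan and Mar, customer 2 in Feb (twice) and May
toy = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 2],
    "OrderMonth": pd.to_datetime(
        ["2011-01-01", "2011-03-01", "2011-02-01", "2011-02-01", "2011-05-01"]
    ),
})

# transform returns one value per original row, so it can be assigned as a column
toy["CohortMonth"] = toy.groupby("CustomerID")["OrderMonth"].transform("min")
print(toy)
```

Unlike an aggregation, `transform` keeps the original row count, which is why the result can be stored directly next to `OrderMonth`.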

5. Work out in which month after the first purchase each order was placed

df["CohortIndex"] = df.apply(lambda x: month_differ(x.OrderMonth, x.CohortMonth), axis=1)
df.head()
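
A row-wise `apply` over roughly 500,000 rows can be slow. One possible alternative, sketched here on hypothetical toy data and assuming both columns are already datetime columns, computes the same month difference with the vectorized `.dt` accessor.

```python
import pandas as pd

# Hypothetical rows; both columns are assumed to be datetime64
toy = pd.DataFrame({
    "OrderMonth": pd.to_datetime(["2011-01-01", "2011-03-01", "2012-02-01"]),
    "CohortMonth": pd.to_datetime(["2011-01-01", "2011-01-01", "2011-11-01"]),
})

# Same formula as month_differ, applied to whole columns at once
year_gap = toy["OrderMonth"].dt.year - toy["CohortMonth"].dt.year
month_gap = toy["OrderMonth"].dt.month - toy["CohortMonth"].dt.month
toy["CohortIndex"] = year_gap * 12 + month_gap
print(toy)
```

The third row crosses a year boundary: 12 months for the year gap plus (2 - 11) for the month gap gives an index of 3.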

6. Group by: count the users in each cohort group from month 0 to month n

cohort_data = df.groupby(['CohortMonth','CohortIndex'])['CustomerID'].agg('nunique').reset_index()
cohort_data.head()
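
The reason for `nunique` rather than a plain count is that one customer may place several orders in the same month. The sketch below uses hypothetical toy rows to show that such a customer is counted only once per (CohortMonth, CohortIndex) pair.

```python
import pandas as pd

# Hypothetical rows: customer 1 has two orders in month 0 of the 2011-01 cohort
toy = pd.DataFrame({
    "CohortMonth": ["2011-01", "2011-01", "2011-01", "2011-01", "2011-02"],
    "CohortIndex": [0, 0, 0, 1, 0],
    "CustomerID": [1, 1, 2, 1, 3],
})

counts = (
    toy.groupby(["CohortMonth", "CohortIndex"])["CustomerID"]
    .agg("nunique")
    .reset_index()
)
print(counts)
```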

7. Pivot

cohort_pivot = cohort_data.pivot_table(index = 'CohortIndex',columns = 'CohortMonth',values = 'CustomerID')
cohort_pivot.columns = cohort_pivot.columns.date
cohort_pivot.fillna(' ')

8. Compute the percentages

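# Row 0 holds the size of each cohort in its first month
cohort_base = cohort_pivot.iloc[0,:]
# Divide every column by its own cohort size to get the retention rate
retention = cohort_pivot.divide(cohort_base,axis=1)
# Keep the NaN values in retention itself, since it is plotted in the next step
retention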
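
The pivot and division steps can be checked by hand on a small example. The sketch below uses made-up cohort counts (the `counts` frame is hypothetical) and follows the same calls: pivot to a CohortIndex by CohortMonth table, take row 0 as the base, then divide column by column.

```python
import pandas as pd

# Hypothetical unique-customer counts per cohort and month index
counts = pd.DataFrame({
    "CohortMonth": ["2011-01", "2011-01", "2011-01", "2011-02", "2011-02"],
    "CohortIndex": [0, 1, 2, 0, 1],
    "CustomerID": [10, 4, 2, 8, 6],
})

pivot = counts.pivot_table(index="CohortIndex", columns="CohortMonth", values="CustomerID")
base = pivot.iloc[0, :]                  # cohort sizes in month 0: 10 and 8
retention = pivot.divide(base, axis=1)   # aligns base with the columns
print(retention)
```

With `axis=1` the index of `base` is matched against the column labels, so each cohort is divided by its own month-0 size. The 2011-02 cohort has no month 2 yet, so that cell stays NaN.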

9. Plot the chart

retention.iloc[:,:5].plot()
plt.show()
