
Alipay Marketing Strategy Analysis

2020-07-24  Crystal_皓严
Fields involved: dt (log day), user_id, label, dmp_id.

I. Data Cleaning (Python)

1. Import common libraries and load the data

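import pandas as pd
import numpy as np
# Raw string, so the backslashes in the Windows path are not read as escape sequences.
# If the CSV has no header row, also pass header=None so the first record is kept as data.
data = pd.read_csv(r'E:\datalist\zf/effect_tb.csv')
data.columns = ['dt','user_id','label','dmp_id']
# The log-day column (dt) is not needed, so drop it
data = data.drop(columns = 'dt')
data.head()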
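
Whether effect_tb.csv ships with a header row is not shown here. If it does not, `read_csv` would consume the first record as column names and silently lose one row. A minimal sketch of the headerless variant, using a few hypothetical rows in place of the real file:

```python
import io
import pandas as pd

# Hypothetical sample rows standing in for effect_tb.csv (assumed to have no header row).
raw = io.StringIO("1,1000001,0,1\n1,1000002,1,3\n2,1000003,0,2\n")

cols = ['dt', 'user_id', 'label', 'dmp_id']
# header=None keeps the first record as data; names= supplies the column labels.
sample = pd.read_csv(raw, header=None, names=cols)
# Drop the log-day column, as in the step above.
sample = sample.drop(columns='dt')

print(sample.shape)
print(sample.head())
```

With the real file, the same two arguments would be passed alongside the file path.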

2. Handling duplicates

data.shape
data.nunique()
data[data.duplicated(keep = False)].sort_values(by = ['user_id'])
# drop_duplicates() returns a new DataFrame, so assign the result back
data = data.drop_duplicates()
data[data.duplicated(keep = False)]
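
One detail worth checking at this step: by default `drop_duplicates()` returns a new DataFrame and leaves the original untouched, so the result has to be assigned. A small sketch with hypothetical rows (the frame `demo` is made up for illustration):

```python
import pandas as pd

# Hypothetical rows; the first two are exact duplicates.
demo = pd.DataFrame({
    'user_id': [1, 1, 2, 3],
    'label':   [0, 0, 1, 0],
    'dmp_id':  [1, 1, 3, 2],
})

demo.drop_duplicates()                    # result discarded, demo is unchanged
still_dup = int(demo.duplicated().sum())  # duplicates are still present

deduped = demo.drop_duplicates()          # keep the returned frame
remaining_dup = int(deduped.duplicated().sum())

print(still_dup, remaining_dup, len(deduped))
```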

3. Null values and data types

# show_counts replaces null_counts, which was deprecated in pandas 1.2 and removed in 2.0
data.info(show_counts = True)
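
As a complementary check, null counts and dtypes can also be read off directly as values instead of from the printed summary. A sketch on a hypothetical frame with one missing label:

```python
import numpy as np
import pandas as pd

# Hypothetical rows; one label is missing on purpose.
demo = pd.DataFrame({
    'user_id': [1, 2, 3],
    'label':   [0, 1, np.nan],
    'dmp_id':  [1, 3, 2],
})

null_counts = demo.isna().sum()  # missing values per column
dtypes = demo.dtypes             # a NaN forces the label column to float

print(null_counts)
print(dtypes)
```

A column that shows up as float when integers were expected is often a hint that it contains missing values.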

4. Save the data

data.to_csv("./output.csv", index = False)

II. A/B Testing (MySQL)

1. Import the data


2. Hypothesis testing

-- label is 0/1, so its average per group is the click-through rate (点击率)
select dmp_id,avg(label) 点击率
from output
group by dmp_id;
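
The same per-group click-through rate can be cross-checked in pandas before moving the data into MySQL. A sketch on hypothetical data (the values below are invented, not taken from the real dataset):

```python
import pandas as pd

# Hypothetical rows: 4 exposures each for dmp_id 1 and dmp_id 3.
demo = pd.DataFrame({
    'dmp_id': [1, 1, 1, 1, 3, 3, 3, 3],
    'label':  [0, 0, 0, 1, 0, 1, 1, 0],
})

# The mean of a 0/1 label per group is that group's click-through rate.
ctr = demo.groupby('dmp_id')['label'].mean()
print(ctr)
```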
a. Null and alternative hypotheses
b. Distribution type, test type, and significance level

2.1 Number of users

-- 用户数 = number of rows (users) in each of the two groups being compared
create table user_num as
(select dmp_id,count(dmp_id) 用户数
from output where dmp_id in (1,3)
group by dmp_id);

2.2 Number of clicks

-- 点击数 = number of rows with label = 1 in each of the two groups
create table click as
(select dmp_id,count(dmp_id) 点击数
from output where dmp_id in (1,3) and label=1
group by dmp_id);

2.3 Compute the click-through rate

create table rate as 
(select a.dmp_id,b.`点击数`/a.`用户数` as 点击率
from user_num a 
join click b on a.dmp_id = b.dmp_id);

2.4 Pooled click-through rate

select sum(b.`点击数`)/sum(a.`用户数`) 总和点击率
from user_num a 
join click b on a.dmp_id = b.dmp_id;
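
Steps 2.1 through 2.4 can be reproduced in pandas as a sanity check on the SQL tables. This is a sketch on hypothetical data; the column names `users`, `clicks`, and `rate` are chosen here for illustration and do not appear in the original tables:

```python
import pandas as pd

# Hypothetical rows for the two groups being compared (dmp_id 1 and 3).
demo = pd.DataFrame({
    'dmp_id': [1, 1, 1, 1, 3, 3, 3, 3],
    'label':  [0, 0, 0, 1, 0, 1, 1, 0],
})

subset = demo[demo['dmp_id'].isin([1, 3])]
stats = subset.groupby('dmp_id').agg(
    users=('label', 'size'),   # 2.1: rows per group
    clicks=('label', 'sum'),   # 2.2: rows with label = 1
)
stats['rate'] = stats['clicks'] / stats['users']        # 2.3: per-group rate
pooled_rate = stats['clicks'].sum() / stats['users'].sum()  # 2.4: pooled rate

print(stats)
print(pooled_rate)
```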

2.5 Compute the Z statistic

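-- z = (p1 - p3) / sqrt(p * (1 - p) * (1/n1 + 1/n3)), where p is the pooled click-through rate
-- Aliases: 差 = difference in rates, 和 = sum of reciprocal group sizes, 值 = resulting z value
select (list1.`差`)/sqrt(c.`总和点击率`*(1-c.`总和点击率`)*(list2.`和`)) 值
from
(select a.`点击率`-b.`点击率` 差
from rate a ,rate b where b.dmp_id =3 and a.dmp_id =1) list1,
(select sum(`点击数`)/sum(`用户数`) 总和点击率
from user_num join click on user_num.dmp_id = click.dmp_id) c,
(select (1/d.`用户数`)+(1/e.`用户数`) 和
from user_num d ,user_num e where d.dmp_id = 3 and e.dmp_id =1) list2;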
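
The query above is the pooled two-proportion z statistic. The same arithmetic can be sketched in plain Python to double-check the SQL result. The function names and the counts passed in are hypothetical, and the one-sided lower-tail p-value is an assumption about the test direction, which the query itself does not state:

```python
import math

def two_proportion_z(clicks_a, users_a, clicks_b, users_b):
    """Pooled two-proportion z statistic for rate_a - rate_b."""
    p_a = clicks_a / users_a
    p_b = clicks_b / users_b
    pooled = (clicks_a + clicks_b) / (users_a + users_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    return (p_a - p_b) / se

def normal_cdf(z):
    """Standard normal CDF, P(Z <= z), via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

# Hypothetical counts: group a plays the role of dmp_id = 1, group b of dmp_id = 3.
z = two_proportion_z(120, 10000, 150, 10000)
p_lower = normal_cdf(z)  # assumed one-sided test: is rate_a lower than rate_b?

print(z, p_lower)
```

Plugging in the counts from the user_num and click tables should give the same value as the SQL query, up to rounding.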

Summary
