淘宝用户行为分析(mysql)

2020-05-30  本文已影响0人  妖妖棂

一数据理解

1.总体介绍

2.字段介绍

image.png image.png

二分析思路

1.提出问题

本次分析主要目标是通过对淘宝的行为数据分析,为以下问题提供解释或改进建议:
1.分析用户使用app过程中的常见电商指标,了解运营现状,查看各个环节流失率,并找到改进的点
2.研究用户在不同维度下的行为规律,了解用户行为特征,优化运营策略
3.研究用户价值,针对不同用户进行精细化运营
4.研究用户生命周期,针对不同周期用户采取不同的运营策

2.数据摸索

数据为电商类型数据,包括用户商品,商品类别,用户行为,时间5个维度.因为是电商类的数据,所以的先了解电商运营基础指标,指标如下:


image.png

了解了电商基础运营指标后,结合本数据看看能做的分析有哪几个方面:item_ID category_ID.timestamp
1.基于user_ID,.behavior_type等字段数据可以分析PV,UV . PV/UV 跳失率.总订单量等整体运营指标 。
2.基于 AARRR模型,可以分析用户生命周期,划分不同用户所处周期阶段。
3.基于time字段可以和behavior_type等字段,可以分析用户行为特征。

3.分析思路

可查看本文结构中的模型建立可视化部分

三数据清洗

1.导入数据

通过数据库工具Navicat将数据导入Mysql本地数据库。

2.列新增

表里新建1列 date 存放年-月-

UPDATE userbehavior_test SET date = from_unixtime(timestamp,'%Y-%m-%d');
UPDATE userbehavior_test set timestamp = FROM_UNIXTIME(timestamp,'%H');

把时间戳的数据分别转换成 年-月-日 和 小时 2种格式,方便统计


image.png
3.缺失值处理

因为导入数据的时候都设置NOT NULL,所以是没有缺失值,这里可以用代码验证:

SELECT 
    COUNT(us.user_ID),
    COUNT(us.item_ID),
    COUNT(us.behavior_type),
  COUNT(us.`timestamp`),
    COUNT(us.date)
FROM userbehavior as us;
查询结果.png
4.行为值检测
SELECT DISTINCT u.behavior_type FROM userbehavior as u;
查询结果.png

四.构建模型

1.总体运营指标

1.流量指标

页面访客数(pv),独立访客少(uv),人均点击数(pv/uv)


结构展示.png
SELECT
  us.date as 日期,
    COUNT(*) AS 'pv',
    COUNT(DISTINCT us.user_ID) AS 'uv',
    COUNT(*) / COUNT(DISTINCT us.user_ID) AS 'pv/uv'
FROM
userbehavior AS us 
WHERE
us.behavior_type = 'pv'
GROUP BY
us.date
image.png
可视化.png
2.AARRR漏斗转化率
1.按照页面访客计算转化率
SELECT
    us.behavior_type,
    COUNT(*)
FROM
    userbehavior AS us
GROUP BY
    us.behavior_type
image.png image.png
每日行为分析
SELECT
  us.date as 日期,
    SUM(CASE WHEN us.behavior_type='pv' THEN 1 ELSE 0 END)AS '浏览数',
  SUM(CASE WHEN us.behavior_type='cart' OR us.behavior_type='fav' THEN 1 ELSE 0 END)AS '购物车@收藏',
 
 SUM(CASE WHEN us.behavior_type='buy' THEN 1 ELSE 0 END)AS '付费数'
FROM
    userbehavior AS us
GROUP BY
us.date
查询结果.png
可视化.png
跳失率
SELECT
    COUNT(*) AS 只停留在详情页
FROM
    (
        SELECT
            a.user_ID,
            COUNT(*)
        FROM
            (
                SELECT
                    us.user_ID
                FROM
                    userbehavior AS us
                GROUP BY
                    us.user_ID,
                    us.behavior_type
            ) AS a
        GROUP BY
            a.user_ID
        HAVING
            COUNT(*) = 1
    ) AS b
image.png

(4)订单指标

(1)成交量

1.先看总体成交量

SELECT
    COUNT(us.behavior_type)
FROM
    userbehavior AS us
WHERE
    us.behavior_type = 'buy'
image.png

2.每日成交量

SELECT
  us.date,
    COUNT(us.behavior_type)
FROM
    userbehavior AS us
WHERE
    us.behavior_type = 'buy'
GROUP BY
    us.date
image.png
可视化.png

人均购买次数

SELECT
  COUNT(us.behavior_type) AS '总单量',
    COUNT(DISTINCT us.user_ID) AS  '总用户数',
    COUNT(us.behavior_type)/COUNT(DISTINCT us.user_ID)
FROM
    userbehavior AS us
WHERE
    us.behavior_type = 'buy'
image.png
再看每日
SELECT
  us.date,
    COUNT(us.behavior_type) AS '总单量',
    COUNT(DISTINCT us.user_ID) AS '总用户数',
    COUNT(us.behavior_type) / COUNT(DISTINCT us.user_ID) AS '人均购买数'
FROM
    userbehavior AS us
WHERE
    us.behavior_type = 'buy'
GROUP BY
    us.date
image.png
image.png

分析
平均一个用户购买了3次,每天的订单量与流量指标的走势也是一致的。

复购率

至少购买一次的用户
SELECT
    COUNT(DISTINCT us.user_ID)
FROM
    userbehavior AS us
WHERE
    us.behavior_type = 'buy'

购买大于一次的用户
SELECT
    COUNT(DISTINCT user_ID)
FROM
    (
        SELECT
            us.user_ID,
            COUNT(us.behavior_type) AS num
        FROM
            userbehavior AS us
        WHERE
            us.behavior_type = 'buy'
        GROUP BY
            us.user_ID
    ) AS a
WHERE
    num > 1
image.png
image.png

复购率= 购买大于1次的人数/至少成交一单的人数 = 16712/25400=65.7%

那些商品被复购次数最多?那些用户进行了更多的重复购买呢?

商品类型复购排行榜

SELECT 
    us.item_ID,
COUNT(us.behavior_type) as '购买次数'
FROM
    userbehavior as us
WHERE 
    us.behavior_type ='buy'
GROUP BY 
    us.item_ID
ORDER BY
    COUNT(us.behavior_type) DESC
LIMIT 10
查询结果.png

用户复购排行榜

SELECT 
    us.user_ID,
COUNT(us.behavior_type) as '购买次数'
FROM
    userbehavior as us
WHERE 
    us.behavior_type ='buy'
GROUP BY 
    us.user_ID
ORDER BY
    COUNT(us.behavior_type) DESC
LIMIT 10
image.png

用户行为特征

用户行为日期的特征
SELECT 
    us.date,
    COUNT(*) as '行为总数',
    SUM(CASE WHEN us.behavior_type = 'pv' THEN 1 ELSE 0 END) AS '点击次数',
    SUM(CASE WHEN us.behavior_type = 'fav' THEN 1 ELSE 0 END) AS '收藏次数',
    SUM(CASE WHEN us.behavior_type = 'cart' THEN 1 ELSE 0 END) AS '加购次数',
    SUM(CASE WHEN us.behavior_type = 'buy' THEN 1 ELSE 0 END) AS '购买次数',
    COUNT(DISTINCT us.user_ID) AS '总用户数',
    COUNT(*)/COUNT(DISTINCT us.user_ID) as '人均行为次数',
    CONCAT(ROUND(SUM(CASE WHEN us.behavior_type = 'pv' THEN 1 ELSE 0 END)/COUNT(*)*100,2),'%') AS '点击数占比' ,
  CONCAT(ROUND(SUM(CASE WHEN us.behavior_type = 'buy' THEN 1 ELSE 0 END)/COUNT(*)*100,2),'%') AS '成交数占比' 
FROM
    userbehavior as us
GROUP BY 
    us.date
ORDER BY
    us.date asc
image.png
image.png
用户行为小时的特征
SELECT 
    us.`timestamp`,
    COUNT(*) as '行为总数',
    SUM(CASE WHEN us.behavior_type = 'pv' THEN 1 ELSE 0 END) AS '点击次数',
    SUM(CASE WHEN us.behavior_type = 'fav' THEN 1 ELSE 0 END) AS '收藏次数',
    SUM(CASE WHEN us.behavior_type = 'cart' THEN 1 ELSE 0 END) AS '加购次数',
    SUM(CASE WHEN us.behavior_type = 'buy' THEN 1 ELSE 0 END) AS '购买次数',
    COUNT(DISTINCT us.user_ID) AS '总用户数',
    COUNT(*)/COUNT(DISTINCT us.user_ID) as '人均行为次数',
    CONCAT(ROUND(SUM(CASE WHEN us.behavior_type = 'pv' THEN 1 ELSE 0 END)/COUNT(*)*100,2),'%') AS '点击数占比' ,
  CONCAT(ROUND(SUM(CASE WHEN us.behavior_type = 'buy' THEN 1 ELSE 0 END)/COUNT(*)*100,2),'%') AS '成交数占比' 
FROM
    userbehavior as us
GROUP BY 
    us.`timestamp`
ORDER BY
    us.`timestamp` asc
image.png
image.png

同样也从点击率和成交率的关系对以上结论进行验证


image.png
用户商品偏好
SELECT  
    us.category_ID,
    SUM(CASE WHEN us.behavior_type = 'pv' THEN 1 ELSE 0 END) AS '点击次数',
    SUM(CASE WHEN us.behavior_type = 'buy' THEN 1 ELSE 0 END) AS '购买次数'
END)/COUNT(us.behavior_type)*100,2),'%') AS '成交数占比' 
FROM
    userbehavior as us
GROUP BY 
    us.category_ID
ORDER BY
    SUM(CASE WHEN us.behavior_type = 'pv' THEN 1 ELSE 0 END) DESC
,SUM(CASE WHEN us.behavior_type = 'buy' THEN 1 ELSE 0 END) DESC
image.png image.png
用户行为路径上的特征:
SELECT 
    COUNT(qq.user_ID) AS aaa1
FROM
    (SELECT user_ID
        ,SUM(CASE WHEN behavior_type = 'pv' THEN 1 ELSE 0 END) AS `点击`
        ,SUM(CASE WHEN behavior_type = 'fav' THEN 1 ELSE 0 END) AS `收藏`
        ,SUM(CASE WHEN behavior_type = 'cart' THEN 1 ELSE 0 END) AS `加购`
        ,SUM(CASE WHEN behavior_type = 'buy' THEN 1 ELSE 0 END) AS `购买`
    FROM
        userbehavior    AS us
GROUP BY user_ID    
) as qq
WHERE
    (qq.`点击`=0 AND qq.`收藏`=0 AND qq.`加购`=0 AND qq.`购买`>=1)

从AARRR漏斗转化模型来看,用户购买商品可分类如下几种过程:
浏览收藏购买
浏览加购购买
浏览购买
加购购买
收藏购买
直接购买
通过代码得出以下数据

image.png image.png
用户复购率特征
SELECT uo.`购买次数`,COUNT(uo.user_ID)
FROM (SELECT us.user_ID,COUNT(us.behavior_type) as '购买次数' FROM userbehavior as us
                WHERE
                    us.behavior_type = 'buy'
                GROUP BY
                    us.user_ID) as uo
GROUP BY
    uo.`购买次数`

image.png

可视化

image.png image.png

RFM用户价值

统计用户R值和F值

R统计基准日期为 2017-12-5

CREATE VIEW RF AS
SELECT 
    user_ID
    ,DATEDIFF('2017-12-5',max(us.date))AS R
    ,COUNT(us.behavior_type) AS F
FROM userbehavior as us
WHERE us.behavior_type= 'buy'
GROUP BY
    us.user_ID
查询结果.png

给RF分组,给用户打分

SELECT 
MAX(R)
,MIN(R),MAX(F),MIN(F)
    FROM
        rf
查询结果.png

将R/F按四组
R值:0-2,3-5,6-8,9-10分别对应4分·3分·2分·2分
F值:1-6,7-11,11-20,21-84分别对应1分·2分·3分··4分
创建视图 rf_rum

CREATE VIEW rf_rum AS
SELECT  user_ID,(
    CASE WHEN R BETWEEN 0 AND 2 THEN 4
    WHEN R BETWEEN 5 AND 5 THEN 3
    WHEN R BETWEEN 6 AND 8 THEN 2
    WHEN R BETWEEN 9 AND 10 THEN 2
    ELSE 0
    END) AS R_score
    ,(
    CASE WHEN F BETWEEN 1 AND 6 THEN 1
    WHEN F BETWEEN 7 AND 11 THEN 2
    WHEN F BETWEEN 11 AND 20 THEN 3
    WHEN F BETWEEN 21 AND 84 THEN 4
    ELSE 0
    END) AS F_score
FROM rf 
image.png

计算R_score和F_score的平均值,确定评分标准
按照RF的平均值打标签,然后将RF的评分跟平均分比较,比如重要价值客户两项评分比较都高于平均分:

SELECT AVG(R_score),AVG(F_score)
FROM rf_rum AS fr
image.png
给不同用户贴上价值标签
CREATE VIEW 用户标签 AS
SELECT rfa.user_ID,(CASE WHEN rfa.R_score  >2 AND rfa.F_score >1.1 THEN '重要价值'
                                            WHEN rfa.R_score  >2 AND rfa.F_score <=1.1 THEN '重要保持'
                                            WHEN rfa.R_score  <=2 AND rfa.F_score >1.1 THEN '重要发展'
                                            WHEN rfa.R_score  <=2 AND rfa.F_score <=1.1 THEN '一般价值'
                                            ELSE 0
                                            END) as  客户类型
FROM fr_rum as rfa
查询结果.png
查看不同价值用户占比
SELECT
    客户类型,
    COUNT(user_ID) AS 数量
FROM
    用户标签
GROUP BY
    客户类型
ORDER BY
    数量
查询结果.png image.png
现象描述/分析

对不同类型的用户实行不同的营销策略:

总结

通过对运营指标,用户行为,用户价值的分析,的出以下结论:

(1)、总体运营指标方面:

1、从对流量指标和订单产生效率指标分析可知,促销活动对用户各项指标的影响非常大。

2、用户行为转化率很低但跳失率却不高,这说明用户在首页进行多次点击后并未找到中意的商品,有意向,购买转化率50%,说明有过半数的加购最终也没有成交。

3、页面访客基准的转化率仅为5%和1%6,而基于独立访客转化率为100%和51%,跳失率也不高,这说明商品还是符合用户需求的,只是"道路不够通畅”

(2)、用户行为特征:

1、时间上的活跃度特征:活动日活跃平日平稳;周末活跃平日平稳;晚间活跃白天一般。

2、商品上的活跃度特征:仅购买一次的用户占479%,购买5次以内占比90%以上,说明留存率很低,结合基于独立访客的转化率来看,运营效果不好。

3、购买路径上的特征:直接购买占比5896,浏览购占比47%,用户并不会走完每一个步骤,商品品质和首页引导对销量转化非常关键。

4、复购率上的特征;总体复购率53%,可针对复购率高的商品和用户,分析其画像,提升复购,增加留存。

(3)、用户价值:

1、重要价值客户仅为5%,就是最近有消费目频次高的用户,这说明高价信用户比例少/留存低

2、重要保持客户32%,就是那些最近买过但频率不高,比例较高,需丰富运营活动,刺激消费,让更多的人成为重要价值客户。

3、重要发展娩回客户仅为4%,就是那些以前消费频率高但已经很久没有消费了,这说明平台流失率低,用户忠诚度较好。

上一篇 下一篇

猜你喜欢

热点阅读