数据思维与方法数据分析

电商平台用户行为分析(SQL)

2020-11-19  本文已影响0人  分类讨论

本文是对淘宝业务分析案例的汇总。本次分析从用户,商品以及平台三大方面展开,利用了漏斗分析、维度拆分、象限分层、RFM模型等分析方法,通过MySql进行数据的处理,最后利用PowerBI进行数据的可视化。

项目背景

移动互联网企业从粗放式到精细化运营管理过程中,需要结合市场、渠道、用户行为等数据分析,对用户开展有针对性的运营活动,提供个性化、差异化的运营策略,以实现运营业务指标。本项目利用sql对淘宝用户行为数据进行分析,通过用户行为分析业务问题,提供针对性的运营策略。

分析目标

本次分析希望通过淘宝的用户行为数据,了解淘宝基本情况。建立用户行为漏斗,探索需要优化的空间。同时对用户行为及商品展开多维度拆分,针对不同类别的用户及商品开展不同策略的精细化运营。

数据来源及描述:

分析框架

image.png

数据导入:

  1. 将数据保存到本地;
  2. 进入MySql,进入要保存的database,创建表;
  3. 导入数据
mysql> load data local infile "C:\\Users\\MaJie\\Desktop\\tbt\\UserBehavior\\UserBehavior.csv"
    -> into table user
    -> fields terminated by ","
    -> lines terminated by "\n";

备注:由于数据量较大,运行较慢,只取了原数据中的2000000条进行分析

mysql> create table userbehavior as select * from user limit 2000000;

数据清洗

  1. 查看数据缺失值:判断各列的数量是否一致
mysql> SELECT count(user_id),count(item),count(category),count(behavior),count(time) from userbehavior;
  1. 通过查看数据发现,time为时间戳格式,需要进行时间格式转化
mysql> alter table userbehavior add date varchar(20);
mysql> alter table userbehavior add hour varchar(20);
mysql> update userbehavior set date = from_unixtime(time,"%Y-%m-%d");
mysql> update userbehavior set hour = from_unixtime(time,"%H");
mysql> select * from userbehavior limit 2;
+---------+---------+----------+----------+------------+------------+------+
| user_id | item    | category | behavior | time       | date       | hour |
+---------+---------+----------+----------+------------+------------+------+
|       1 | 2268318 |  2520377 | pv       | 1511544070 | 2017-11-25 | 01   |
|       1 | 2333346 |  2520771 | pv       | 1511561733 | 2017-11-25 | 06   |
+---------+---------+----------+----------+------------+------------+------+
  1. 通过查看发现数据主要聚集在11月25日到12-03日之间,取此区间,删除其他数据
# 查看时间聚集情况
select date, count(*) from userbehavior group by date;
# 删除其余数据
DELETE FROM userbehavior
WHERE date < '2017-11-25' or date > '2017-12-03';
mysql> select date, count(*) from userbehavior group by date;
+------------+----------+
| date       | count(*) |
+------------+----------+
| 2017-11-25 |   207782 |
| 2017-11-26 |   213387 |
| 2017-11-27 |   198661 |
| 2017-11-28 |   197818 |
| 2017-11-29 |   202267 |
| 2017-11-30 |   208777 |
| 2017-12-01 |   219657 |
| 2017-12-02 |   273945 |
| 2017-12-03 |   276715 |
+------------+----------+

数据分析

1. 常规指标情况

1.1 PV,UV,用户平均访问量(PV/UV)
mysql> select
    -> sum(case when behavior = "pv" then 1 else 0 end) as PV,
    -> (select count(*) from (select user_id from userbehavior group by user_id) as g) as UV,
    -> (sum(case when behavior = "pv" then 1 else 0 end) / (select count(*) from (select user_id from userbehavior group by user_id) as g)) as 'PV/UV'
    -> from userbehavior;
+---------+-------+---------+
| PV      | UV    | PV/UV   |
+---------+-------+---------+
| 1790225 | 19544 | 91.5997 |
+---------+-------+---------+
image.png

通过统计可知,在2017-11-25 到 2017-12-03之间:
访问量PV:1790225
访问用户数UV:19544
用户平均访问量:约92

1.2 日平均访问量DAU:
mysql> select date, count(user_id) as DAU from (select date,user_id from userbehavior group by date, user_id) as g group by date;
+------------+-------+
| date       | DAU   |
+------------+-------+
| 2017-11-25 | 13927 |
| 2017-11-26 | 14215 |
| 2017-11-27 | 14024 |
| 2017-11-28 | 14037 |
| 2017-11-29 | 14247 |
| 2017-11-30 | 14502 |
| 2017-12-01 | 14685 |
| 2017-12-02 | 19169 |
| 2017-12-03 | 19166 |
+------------+-------+
image.png

日活跃用户数整体上升,在12/02与12/03开始增幅较大,可能与双十二活动推广有关。

1.3 复购率

复购率:在某时间窗口内重复消费用户(消费两次及以上的用户)在总消费用户中占比

mysql> select date,
    ->     count(user_id) as "购买总人数",
    ->     sum(case when buy_num_day > 1 then 1 else 0 end) as "单日复购人数",
    ->     (sum(case when buy_num_day > 1 then 1 else 0 end) / count(user_id)) as "复购率"
    -> from(
    ->     select date, user_id, count(*) as buy_num_day
    ->     from userbehavior
    ->     where behavior = "buy"
    ->     group by date, user_id) as g
    -> group by date;
+------------+-----------------+--------------------+-----------+
| date       | 购买总人数       | 单日复购人数        | 复购率    |
+------------+-----------------+--------------------+-----------+
| 2017-11-25 |            2626 |                747 |    0.2845 |
| 2017-11-26 |            2635 |                792 |    0.3006 |
| 2017-11-27 |            2882 |                866 |    0.3005 |
| 2017-11-28 |            2741 |                849 |    0.3097 |
| 2017-11-29 |            2868 |                880 |    0.3068 |
| 2017-11-30 |            2949 |                872 |    0.2957 |
| 2017-12-01 |            2811 |                827 |    0.2942 |
| 2017-12-02 |            3396 |                952 |    0.2803 |
| 2017-12-03 |            3517 |               1001 |    0.2846 |
+------------+-----------------+--------------------+-----------+
image.png
mysql> select
    ->     sum(case when buy_num > 1 then 1 else 0 end) as "整体复购人数",
    ->     count(user_id) as "购买总人数",
    ->     sum(case when buy_num > 1 then 1 else 0 end)/count(user_id) as "复购率"
    -> from(
    ->     select *,count(behavior) as buy_num
    ->     from userbehavior
    ->     where behavior = 'buy'
    ->     group by user_id) as g;
+--------------------+-----------------+-----------+
| 整体复购人数        | 购买总人数       | 复购率    |
+--------------------+-----------------+-----------+
|               8795 |           13330 |    0.6598 |
+--------------------+-----------------+-----------+

从日复购率情况来看,日复购率在29%左右稳定波动,较为稳定。
从整体复购率来看,复购率高达约66%,表名淘宝的用户忠诚度较高

1.4 跳失率

跳失率:仅仅访问了单个页面的用户占全部访问用户的百分比,或者指从首页离开网站的用户占所有访问用户的百分比。即用户的behavior为1。
跳失率是衡量网站\APP内容质量的重要标准。

mysql> select count(user_id) as "访问一次页面的用户数"
    -> from(
    ->     select user_id
    ->     from userbehavior
    ->     group by user_id
    ->     having count(behavior) = 1) as g;
+--------------------------------+
| 访问一次页面的用户数             |
+--------------------------------+
|                              1 |
+--------------------------------+
# 查看下该用户的访问情况
select * from userbehavior where user_id = (select user_id from userbehavior group by user_id having count(behavior) = 1);
+---------+---------+----------+----------+------------+------------+------+
| user_id | item    | category | behavior | time       | date       | hour |
+---------+---------+----------+----------+------------+------------+------+
|  152435 | 4895956 |  2465336 | pv       | 1511567056 | 2017-11-25 | 07   |
+---------+---------+----------+----------+------------+------------+------+

9天时间跳失率仅为1,表明淘宝在产品与内容的运营上表现优秀,有足够的用户吸引力,确保用户可以长时间驻留选择。

2. 用户行为访问漏斗

常规转化流程大致为:

image.png 结合数据集behavior列,这里简化访问漏斗为:
访问量(pv) ==> 加购物车数(cart) ==> 购买数(buy)
select behavior, count(*) from userbehavior group by behavior; 
+----------+----------+
| behavior | count(*) |
+----------+----------+
| buy      |    40243 |
| cart     |   111015 |
| fav      |    57526 |
| pv       |  1790225 |
+----------+----------+
mysql> select behavior, count(user_id)
    -> from (select behavior, user_id from userbehavior group by behavior, user_id) as g
    -> group by behavior;
+----------+----------------+
| behavior | count(user_id) |
+----------+----------------+
| buy      |          13330 |
| cart     |          14672 |
| fav      |           7854 |
| pv       |          19463 |
+----------+----------------+
image.png

3. 用户行为分析——时间维度拆分

3.1 用户行为分析——按天拆解:

查看用户每天各行为的情况
用户数,访问量,加购量,收藏量,购买量

mysql> select aa.date, bb.用户数, aa.访问量, aa.加购量, aa.收藏量, aa.购买量
    -> from(
    ->     select date,
    ->         sum(case when behavior = 'pv' then 1 else 0 end) as '访问量',
    ->         sum(case when behavior = 'cart' then 1 else 0 end) as '加购量',
    ->         sum(case when behavior = 'fav' then 1 else 0 end) as '收藏量',
    ->         sum(case when behavior = 'buy' then 1 else 0 end) as '购买量'
    ->     from userbehavior
    ->     group by date) as aa
    -> inner join(
    ->     select date, count(*) as  '用户数'
    ->     from (select date, user_id from userbehavior group by date, user_id) as g
    ->     group by date) as bb
    -> on aa.date = bb.date;
+------------+-----------+-----------+-----------+-----------+-----------+
| date       | 用户数    | 访问量     | 加购量    | 收藏量     | 购买量     |
+------------+-----------+-----------+-----------+-----------+-----------+
| 2017-11-25 |     13927 |    186480 |     11405 |      5926 |      3971 |
| 2017-11-26 |     14215 |    191557 |     11560 |      6277 |      3993 |
| 2017-11-27 |     14024 |    177388 |     10959 |      5827 |      4487 |
| 2017-11-28 |     14037 |    177040 |     10826 |      5727 |      4225 |
| 2017-11-29 |     14247 |    181006 |     10856 |      5921 |      4484 |
| 2017-11-30 |     14502 |    187171 |     11292 |      5802 |      4512 |
| 2017-12-01 |     14685 |    196714 |     12572 |      6057 |      4314 |
| 2017-12-02 |     19169 |    245281 |     15791 |      7849 |      5024 |
| 2017-12-03 |     19166 |    247588 |     15754 |      8140 |      5233 |
+------------+-----------+-----------+-----------+-----------+-----------+
image.png

从12/02与12/03,各项数据指标明显提高,推测与淘宝双12预热活动相关。

3.2 用户行为分析——按小时拆解:

mysql> select aa.hour, bb.用户数, aa.访问量, aa.加购量, aa.收藏量, aa.购买量
    -> from(
    ->     select hour,
    ->         sum(case when behavior = 'pv' then 1 else 0 end) as '访问量',
    ->         sum(case when behavior = 'cart' then 1 else 0 end) as '加购量',
    ->         sum(case when behavior = 'fav' then 1 else 0 end) as '收藏量',
    ->         sum(case when behavior = 'buy' then 1 else 0 end) as '购买量'
    ->     from userbehavior
    ->     group by hour) as aa
    -> inner join(
    ->     select hour, count(*) as  '用户数'
    ->     from (select hour, user_id from userbehavior group by hour, user_id) as g
    ->     group by hour) as bb
    -> on aa.hour = bb.hour;
+------+-----------+-----------+-----------+-----------+-----------+
| hour | 用户数    | 访问量     | 加购量    | 收藏量     | 购买量     |
+------+-----------+-----------+-----------+-----------+-----------+
| 00   |      6338 |     60816 |      3723 |      2030 |      1167 |
| 01   |      3249 |     27836 |      1756 |       884 |       484 |
| 02   |      1932 |     15475 |       925 |       597 |       284 |
| 03   |      1385 |     11355 |       750 |       366 |       129 |
| 04   |      1223 |      9515 |       586 |       306 |       156 |
| 05   |      1617 |     10848 |       777 |       406 |       151 |
| 06   |      3442 |     22997 |      1577 |       863 |       354 |
| 07   |      6400 |     43886 |      2849 |      1382 |       735 |
| 08   |      8802 |     60974 |      3839 |      1976 |      1249 |
| 09   |     10367 |     74294 |      4602 |      2531 |      1884 |
| 10   |     11482 |     86088 |      5293 |      2932 |      2484 |
| 11   |     11681 |     84907 |      5350 |      2967 |      2512 |
| 12   |     11819 |     86032 |      5219 |      2848 |      2430 |
| 13   |     12050 |     94554 |      5541 |      3099 |      2573 |
| 14   |     11731 |     91500 |      5355 |      2802 |      2332 |
| 15   |     11978 |     95059 |      5616 |      3045 |      2474 |
| 16   |     11870 |     92677 |      5499 |      2898 |      2345 |
| 17   |     11541 |     83124 |      5344 |      2862 |      2069 |
| 18   |     11492 |     86114 |      4988 |      2607 |      1868 |
| 19   |     12424 |    108553 |      6368 |      3269 |      2326 |
| 20   |     13130 |    131941 |      7911 |      3909 |      2686 |
| 21   |     13441 |    149720 |      9340 |      4372 |      2799 |
| 22   |     12895 |    149017 |      9692 |      4718 |      2766 |
| 23   |     10187 |    112943 |      8115 |      3857 |      1986 |
+------+-----------+-----------+-----------+-----------+-----------+
image.png

从一天内不同时刻各项数据指标来看,从5点到10点,各项指标处于上升阶段,10点到18点各项指标持续稳定在较高的水平,从19点到22点到达高峰,之后开始下降。与用户的作息规律基本吻合。
在活动运营时,在10-22点大概率会获得更好的效果,尤其是在19-22点之间,处于用户最活跃的时间段,适合各种活动营销的开展。

4. 用户行为分析——商品维度拆解

4.1 商品浏览量排行榜Top10

mysql> select item, count(*) as '浏览量'
    -> from userbehavior
    -> where behavior = 'pv'
    -> group by item
    -> order by count(*) desc
    -> limit 10;
+---------+-----------+
| item    | 浏览量    |
+---------+-----------+
|  812879 |       377 |
| 3845720 |       320 |
| 2331370 |       301 |
|  138964 |       294 |
| 2032668 |       293 |
| 3708121 |       258 |
| 1535294 |       256 |
| 3031354 |       252 |
|  987143 |       246 |
| 3371523 |       238 |
+---------+-----------+

4.2 商品销量排行榜Top10

mysql> select item, count(*) as '购买量'
    -> from userbehavior
    -> where behavior = 'buy'
    -> group by item
    -> order by count(*) desc
    -> limit 10;
+---------+-----------+
| item    | 购买量    |
+---------+-----------+
| 3122135 |        35 |
| 3237415 |        17 |
| 1910706 |        16 |
| 2560262 |        16 |
| 3031354 |        15 |
| 2964774 |        15 |
| 1034594 |        14 |
| 4157341 |        14 |
|  705557 |        14 |
| 4574184 |        13 |
+---------+-----------+

4.3 浏览量Top10中购买量Top10的占比

mysql> select aa.item, aa.浏览量, bb.购买量
    -> from(
    ->     select item, count(*) as '浏览量'
    ->     from userbehavior
    ->     where behavior = 'pv'
    ->     group by item
    ->     order by count(*) desc
    ->     limit 10) as aa
    -> inner join(
    ->     select item, count(*) as '购买量'
    ->     from userbehavior
    ->     where behavior = 'buy'
    ->     group by item
    ->     order by count(*) desc
    ->     limit 10) as bb
    -> on aa.item = bb.item;
+---------+-----------+-----------+
| item    | 浏览量    | 购买量    |
+---------+-----------+-----------+
| 3031354 |       265 |        15 |
+---------+-----------+-----------+

浏览量Top10的商品,只有1件在销量Top10,商品推荐算法不太准确,导致用户浏览的商品都不是自己满意的商品,需要对商品推荐算法再优化。

4.4 四象限法分类优化——商品浏览数与销量的关系

这里根据商品的浏览量与购买量将商品划分为四个象限,实际划分标准需要根据实际业务区划分。这里根据二八法则对两个维度进行划分。

mysql> select aa.item, aa.浏览量, bb.购买量
    -> from(
    ->     select item, count(*) as '浏览量'
    ->     from userbehavior
    ->     where behavior = 'pv'
    ->     group by item
    ->     order by count(*) desc) as aa
    -> left join(
    ->     select item, count(*) as '购买量'
    ->     from userbehavior
    ->     where behavior = 'buy'
    ->     group by item
    ->     order by count(*) desc) as bb
    -> on aa.item = bb.item;
+---------+-----------+-----------+
| item    | 浏览量    | 购买量     |
+---------+-----------+-----------+
|  812879 |       377 |         2 |
| 3845720 |       324 |         1 |
| 2032668 |       301 |         3 |
|  138964 |       296 |         3 |
| 2331370 |       281 |         3 |
|  ...... |   ......  |   ......  |
+---------+-----------+-----------+
image.png

5. 基于RFM用户分层模型分析用户行为

RFM模型分析 数据集只涉及最近消费时间间隔R与消费频率F,消费金额M数据集未涉及,这里不对M分层,只通过RF进行用户行为细分。这里定义当前日期为2017/12/03。

5.1 R维度分析

# 获取时间间隔的范围区间
mysql> select datediff('2017/12/03',max(date)) as min_datediff, datediff('2017/12/03',min(date)) as max_datediff from userbehavior;
+--------------+--------------+
| min_datediff | max_datediff |
+--------------+--------------+
|            0 |            8 |
+--------------+--------------+

可知数据集的时间间隔在区间[0:8]天,这里我们定义[0:3]为'1',[4:8]为'0'。(实际工作中看情况定义)

mysql> select user_id, diff_time as R,
    ->     case when diff_time between 0 and 3 then '1'
    ->     when diff_time between 4 and 8 then '0'
    ->     end as R_num
    -> from(
    ->     select user_id, datediff('2017-12-03',max(date)) as 'diff_time'
    ->     from userbehavior
    ->     where behavior = 'buy'
    ->     group by user_id) as g;

5.2 F维度分析

select min(b_num), max(b_num)
from (select user_id, count(*) as b_num from userbehavior where behavior = 'buy' group by user_id) as gg
+------------+------------+
| min(b_num) | max(b_num) |
+------------+------------+
|          1 |         72 |
+------------+------------+

通过查看,用户购买次数最少为1次,最多为72次。这里我们将36次作为标准,定义够阿米次数在[1:14]为0,[15:72]为1。

mysql> select user_id, b_num as F,
    -> case when b_num between 1 and 14 then '1' else '0' end as F_num
    -> from(select user_id, count(*) as b_num from userbehavior where behavior = 'buy' group by user_id) as gg;

5.3 RF维度分析

select ra.user_id, concat(R_num, F_num) as RF
from(
    select user_id, diff_time as R,
        case when diff_time between 0 and 3 then '1'
        when diff_time between 4 and 8 then '0'
        end as R_num
    from(
        select user_id, datediff('2017-12-03',max(date)) as 'diff_time'
        from userbehavior
        where behavior = 'buy'
        group by user_id) as g) as ra
inner join(
    select user_id, b_num as F,
    case when b_num between 1 and 14 then '1' else '0' end as F_num
    from(select user_id, count(*) as b_num from userbehavior where behavior = 'buy' group by user_id) as gg) as fa
on ra.user_id = fa.user_id
select user_id,
    case when RF = '11' then '价值客户'
    when RF = '01' then '保持客户'
    when RF = '10' then '发展客户'
    when RF = '00' then '挽留客户' end as '用户分层'
from (
    select ra.user_id, concat(R_num, F_num) as RF
    from(
        select user_id, diff_time as R,
            case when diff_time between 0 and 3 then '1'
            when diff_time between 4 and 8 then '0'
            end as R_num
        from(
            select user_id, datediff('2017-12-03',max(date)) as 'diff_time'
            from userbehavior
            where behavior = 'buy'
            group by user_id) as g) as ra
inner join(
    select user_id, b_num as F,
    case when b_num between 1 and 14 then '1' else '0' end as F_num
    from(select user_id, count(*) as b_num from userbehavior where behavior = 'buy' group by user_id) as gg) as fa
on ra.user_id = fa.user_id) as user_rf
image.png

6. 探究淘宝商业模式满足“二八定律”还是“长尾理论”

1. 提出假设:淘宝商业模式满足“二八定律”

2. 分析过程

如果淘宝的商业模式满足“二八定律”,那么在淘宝,20%的商品将占有总销量的百分之80%

mysql> select item, count(*) as item_num
    -> from userbehavior
    -> where behavior = 'buy'
    -> group by item
    -> order by item_num desc
    -> limit 10;
+---------+----------+
| item    | item_num |
+---------+----------+
| 3122135 |       35 |
| 3237415 |       17 |
| 1910706 |       16 |
| 2560262 |       16 |
| 2964774 |       15 |
| 3031354 |       15 |
| 1034594 |       14 |
| 4157341 |       14 |
|  705557 |       14 |
| 3189426 |       13 |
+---------+----------+
image.png

根据查询结果,可以看到,销量最多的商品也只卖出35件,其余商品卖出的更少。因此以淘宝为代表的电商平台,其商品售卖主要是依靠长尾商品的累计效应,并非爆款商品的带动。

7. 总结

上一篇 下一篇

猜你喜欢

热点阅读