如何sql计算用户分层:新增、活跃、新召回、新沉默、持续沉默
背景:有一个名为record的数据表记录了从2019-01-01开始的用户购买行为的所有记录。
date, time, uid, order_id, amount
2019-01-01,118790987,2325678,50.3
要求:要求基于SQL语言通过ETL建立流失表的方法写pipeline,输出每一天的新用户new, 活跃用户active(今天有完单行为的老用户), 新召回用户resurrected(今天有完单行为、但昨天无的老用户), 新沉默用户churned(今天无完单行为的老用户), 持续沉默用户stale(今天和昨天天均无完单行为的老用户)的数目。
涉及语法:
1、Hive从0.14.0开始提供创建临时表的功能,表只对当前session有效,session退出后,表自动删除。
语法:CREATE TEMPORARY TABLE 表名 AS Select
2、 COALESCE(t.uid, y.uid) AS uid 去除非空值。
思路:
利用record表建立第一天的user_staus表,存放uid,date=第一天,status字段;
利用record表每天刷新建立today_active表,存放uid字段,提取出今天的活跃人群;不应该存放date。
利用user_staus表刷新建立yesterday_status表,存放uid,status;不应该存放date。
利用today_active表和yesterday_status表left join,更新每新的一天,date=‘today’,
第一天:
INSERT INTO user_status AS
SELECT ‘2019-01-01’ AS date, driver_id, ‘New’ AS status
FROM record
WHERE date = ‘2019-01-01’
GROUP BY 1;
从第二天开始, 每天和前一天的status比较
CREATE OR REPLACE TEMPORARY TABLE today_active AS
SELECT uid
FROM record
WHERE date = ‘<TODAY>’
GROUP BY 1;
CREATE OR REPLACE TEMPORARY TABLE yesterday_status AS
SELECT uid, status
FROM user_status
WHERE date = day_add(<TODAY>, -1)
AND status IN (‘Active’, ‘Resurrected’, ‘New’, ‘Churned’, ‘Stale’);
INSERT INTO user_status AS
SELECT
‘<TODAY>’ AS date,
COALESCE(t.uid, y.uid) AS uid,
CASE WHEN t.uid IS NOT NULL AND y.status IS NULL THEN ‘New’
WHEN t.uid IS NOT NULL AND y.status IN (‘Churned’, ‘Stale’) THEN ‘Resurrected’
WHEN t.uid IS NOT NULL AND y.status IN (‘New’, ‘Resurrected’, ‘Active’) THEN ‘Active’
WHEN t.uid IS NULL AND y.status IN (‘Churned’, ‘Stale’) THEN ‘Stale’
WHEN t.uid IS NULL AND y.status IN (‘New’, ‘Resurrected’, ‘Active’) THEN ‘Churned’
ELSE NULL END AS status
FROM today_active t FULL OUTER JOIN yesterday_status y
ON t.uid = y.uid;
这样的结果, 最后会逐渐得到一个名叫user_status的表, 里面记录每一天每一个用户的status.