Data Analysis Case Study: How an Alipay-Style Data Product Is Built

2021-06-07, 数有道

Alipay's data product

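At the end of every year, Alipay pushes each of us a review of our spending over the past year. Below is part of one such bill that I found on Weibo. The review shows clearly how we spent that year, including some figures we would never have guessed ourselves. Today, let's look at how a data product like this is actually built.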

[Figure: 2020 Alipay annual bill]

The process of building a data product

In practice, building a data product like Alipay's involves the following main parts, shown as the first (red) part of the figure below:

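First, define the goal. What is this data product for? Is it meant to increase user activity, or to serve as a marketing tool that lifts a core metric such as sales?
Once the goal is set, design the product (see the blue design diagram). This part is mainly done by the operations team, who also decide on the core metrics.
After the product design is finished, we prepare the underlying data. Once that processing is done, the engineering team reads it and pushes the result to the relevant users.
Finally, the data analysts evaluate the effect of the campaign: how much activity did it actually generate? The key question is how to tell whether a user's activity was really driven by this campaign. Event tracking is the best option. Without it, you can fall back on logical rules, for example requiring at minimum that the user's first login of the day came after the push. The exact rules have to be settled case by case, but event tracking is strongly recommended.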
[Figure: Data product / marketing tool / building the underlying SQL]
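The fallback attribution rule from the last step can be sketched in a few lines. This is a minimal Python illustration, not the article's implementation. It assumes we know the push timestamp and the user's login timestamps, and the function name `attributable_to_touch` is hypothetical.

```python
from datetime import datetime
from typing import Sequence


def attributable_to_touch(touch_time: datetime, logins: Sequence[datetime]) -> bool:
    # Hypothetical helper: keep only logins on the same calendar day as the push.
    same_day = [t for t in logins if t.date() == touch_time.date()]
    if not same_day:
        return False  # the user did not log in that day, so nothing to attribute
    # Attribute the activity only if the day's first login came after the push.
    return min(same_day) > touch_time


push = datetime(2021, 1, 5, 10, 0)  # hypothetical push time
print(attributable_to_touch(push, [datetime(2021, 1, 5, 11, 30)]))  # True
print(attributable_to_touch(push, [datetime(2021, 1, 5, 8, 0), datetime(2021, 1, 5, 12, 0)]))  # False
```

The second user had already logged in before the push, so under this rule their later activity is not credited to the campaign.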

Building the underlying data

Once the data product is defined, the next problem is preparing the underlying data. Let's work through a concrete dataset. The figure below shows our data source:

Based on this data source, I designed a mock data product:

[Figure: product design]

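The data product has three parts:

Part 1: an overview, plus the trend of spending by month.
Part 2: a breakdown by city.
Part 3: details of the user's first purchase.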

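The design above tells us exactly which metrics we need. However, when engineering reads our data, they generally expect one row per user. So in the underlying table we process each user's data into a small number of fields. user_id (Customer_ID in the SQL below) is the primary key, and all other data is stored as JSON.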
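To show why one row per user is convenient for the engineering side, here is a small Python sketch of the consumer view. The rows and the helper `metrics_for` are hypothetical, and the values are borrowed from the sample output later in the article. A single key lookup returns everything for one user, and the JSON column is decoded on read.

```python
import json

# Hypothetical rows as engineering might receive them: one row per user,
# Customer_ID as the key, metrics packed into a JSON string column.
rows = [
    {"Customer_ID": "235543323",
     "order_list": '{"sum_sale": 23545, "first_city": "Beijing"}'},
]

# Index by the primary key so one lookup returns everything for a user.
by_user = {row["Customer_ID"]: row for row in rows}


def metrics_for(customer_id: str) -> dict:
    """Decode the JSON column for one user (empty dict if the user is absent)."""
    row = by_user.get(customer_id)
    return json.loads(row["order_list"]) if row else {}


print(metrics_for("235543323")["sum_sale"])  # 23545
```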

The SQL

select
  cumu.Customer_ID,
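  -- parse_json(key1, value1, key2, value2, ...) appears to be an in-house UDF that packs
  -- key/value pairs into one JSON object; in stock Spark SQL, to_json(named_struct(...)) gives a similar result
  parse_json('sum_sale',sum_sale,'sum_order_cnt',sum_order_cnt,'dis_city_cnt',dis_city_cnt,'dis_product_cnt',dis_product_cnt,'first_order_date',first_order_date,'first_city',first_city,'first_product_id',first_product_id,'max_sales_city',max_sales_city) as order_list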
from 
  (-- cumulative spending metrics
    select 
    Customer_ID,
    sum(Sales) as sum_sale,                       -- cumulative spending
    count(distinct order_id) as sum_order_cnt,    -- cumulative number of orders
    count(distinct city) as dis_city_cnt,         -- number of distinct cities
    count(distinct Product_ID) as dis_product_cnt -- number of distinct products
  from 
    chaoshi.order
  where 
    year(order_date) = 2017
  group by 
    Customer_ID
  )cumu left join 
    (-- first purchase of the year
  select
    Customer_ID,
    Order_Date as first_order_date,
    city as first_city,
    Product_ID as first_product_id
  from 
    (
    select 
      Customer_ID,
      Product_ID,
      Order_Date,
      city,
      row_number() over(partition by Customer_ID order by Order_Date asc) as num 
    from 
      chaoshi.order
    where 
      year(order_date) = 2017
    )a 
  where 
    num = 1
  )f on cumu.Customer_ID = f.Customer_ID 
  left join 
  (-- city with the highest spending
  select
    Customer_ID,
    city as max_sales_city,  -- the city name, which order_list expects (e.g. 'Beijing')
    sum_sales_city
  from     
    (
    select
      Customer_ID,
      city,
      sum_sales_city,
      row_number() over(partition by Customer_ID order by sum_sales_city desc) as num 
    from
      (
      select 
        Customer_ID,
        city,
        sum(sales) as sum_sales_city
      from 
        chaoshi.order
      where 
        year(order_date) = 2017
      group by 
        Customer_ID,
        City
      )a 
    )b 
  where 
    num = 1
  )cy on cumu.Customer_ID = cy.Customer_ID
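The same logic can be traced in plain Python, which makes it easy to check by hand. This is a sketch under assumptions. The rows are hypothetical stand-ins for `chaoshi.order` and `build_order_list` is an illustrative name. Because all three subqueries group the same filtered rows, the left joins on Customer_ID become a single per-user pass here.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows standing in for chaoshi.order (column names follow the SQL).
orders = [
    {"Customer_ID": "c1", "Order_ID": "o1", "Order_Date": date(2017, 1, 3), "City": "Beijing", "Product_ID": "p1", "Sales": 100.0},
    {"Customer_ID": "c1", "Order_ID": "o2", "Order_Date": date(2017, 3, 9), "City": "Shanghai", "Product_ID": "p2", "Sales": 80.0},
    {"Customer_ID": "c1", "Order_ID": "o2", "Order_Date": date(2017, 3, 9), "City": "Shanghai", "Product_ID": "p3", "Sales": 30.0},
    {"Customer_ID": "c1", "Order_ID": "o0", "Order_Date": date(2016, 12, 1), "City": "Tianjin", "Product_ID": "p9", "Sales": 999.0},
]


def build_order_list(rows, year=2017):
    rows = [r for r in rows if r["Order_Date"].year == year]  # where year(order_date) = 2017
    by_user = defaultdict(list)
    for r in rows:
        by_user[r["Customer_ID"]].append(r)

    result = {}
    for cid, rs in by_user.items():
        # cy: sum(sales) per (Customer_ID, city)
        sales_by_city = defaultdict(float)
        for r in rs:
            sales_by_city[r["City"]] += r["Sales"]
        # f: earliest order, like row_number() ... order by Order_Date asc with num = 1.
        # Ties go to the first row in input order; in SQL the tie-break is arbitrary.
        first = min(rs, key=lambda r: r["Order_Date"])
        result[cid] = {
            "sum_sale": sum(r["Sales"] for r in rs),                   # cumu
            "sum_order_cnt": len({r["Order_ID"] for r in rs}),         # count(distinct order_id)
            "dis_city_cnt": len({r["City"] for r in rs}),
            "dis_product_cnt": len({r["Product_ID"] for r in rs}),
            "first_order_date": first["Order_Date"].isoformat(),
            "first_city": first["City"],
            "first_product_id": first["Product_ID"],
            "max_sales_city": max(sales_by_city, key=sales_by_city.get),
        }
    return result


print(build_order_list(orders)["c1"]["max_sales_city"])  # Shanghai
```

In this sample the first order is in Beijing, but Shanghai has the larger total (110 vs 100). This is exactly the case where returning the sales amount instead of the city name would go unnoticed.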

A closer look

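The query above gives us one row per user, containing: cumulative spending, cumulative number of orders, number of distinct cities, number of distinct products, date of the first order, city of the first order, product of the first order, and the city with the highest spending.
Finally, parse_json packs these fields into a single JSON object.
Notice, though, that one piece is still missing: the monthly spending trend. Let's handle it separately. It is special because each user maps to several rows (one per month) rather than one, so we need other functions to handle it: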
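Judging from how it is called in the SQL, parse_json takes alternating key/value arguments. The Python stand-in below reproduces that assumed behavior. The name `parse_json_like` is hypothetical, and the real UDF's exact output format (spacing, quoting, type handling) is not known from the article.

```python
import json


def parse_json_like(*args) -> str:
    """Hypothetical stand-in for the parse_json UDF: alternating
    key/value arguments become one compact JSON object string."""
    if len(args) % 2:
        raise ValueError("expected an even number of arguments (key, value, ...)")
    pairs = dict(zip(args[0::2], args[1::2]))  # keys at even positions, values at odd
    return json.dumps(pairs, ensure_ascii=False, separators=(",", ":"))


print(parse_json_like("sum_sale", 23545, "first_city", "Beijing"))
# {"sum_sale":23545,"first_city":"Beijing"}
```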

select
    Customer_ID,
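    -- collect_list does not guarantee element order; sort by order_month on the consumer side if order matters
    concat('[',concat_ws(',',collect_list(json)),']') as month_list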
from 
    (
    select
        Customer_ID,
        parse_json('order_month',order_month,'sum_sales_month',sum_sales_month) as json 
    from 
        (-- spending per month
        select 
            Customer_ID,
            month(order_date) as order_month,
            sum(sales) as sum_sales_month
        from 
            chaoshi.order
        where 
            year(order_date) = 2017
        group by 
            Customer_ID,
            month(order_date) 
        )m 
    )aa
group by 
    Customer_ID
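The monthly query can be traced the same way in Python. This is a sketch with hypothetical rows, and `build_month_list` is an illustrative name. It first sums sales per (user, month), then turns each month into a JSON object and joins them into a "[...]" string, as concat/concat_ws/collect_list do. Unlike collect_list, it sorts by month so the output order is stable.

```python
import json
from collections import defaultdict
from datetime import date

# Hypothetical rows: (Customer_ID, order_date, sales)
orders = [
    ("c1", date(2017, 1, 3), 100.0),
    ("c1", date(2017, 1, 20), 50.0),
    ("c1", date(2017, 2, 7), 80.0),
    ("c1", date(2018, 2, 1), 5.0),  # dropped by the year(order_date) = 2017 filter
]


def build_month_list(rows, year=2017):
    # Inner query m: sum(sales) grouped by Customer_ID, month(order_date)
    monthly = defaultdict(float)
    for cid, d, sales in rows:
        if d.year == year:
            monthly[(cid, d.month)] += sales
    # Outer queries: one JSON object per month, then collect into "[obj,obj,...]"
    per_user = defaultdict(list)
    for (cid, month), total in sorted(monthly.items()):  # sorted for a stable order
        obj = {"order_month": month, "sum_sales_month": total}
        per_user[cid].append(json.dumps(obj, separators=(",", ":")))
    return {cid: "[" + ",".join(objs) + "]" for cid, objs in per_user.items()}


print(build_month_list(orders)["c1"])
# [{"order_month":1,"sum_sales_month":150.0},{"order_month":2,"sum_sales_month":80.0}]
```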

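Using concat_ws, collect_list, and concat, we combine the many JSON objects into a single list, so there is still one row per user, which is easy for engineering to read. Finally, join the two queries on Customer_ID and select Customer_ID, order_list, and month_list as three columns, as shown below: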

Customer_ID	order_list	month_list
235543323	{"sum_sale":23545,"sum_order_cnt":223,"dis_city_cnt":3,"dis_product_cnt":78,"first_order_date":"2017-01-03","first_city":"Beijing","first_product_id":3535,"max_sales_city":"Beijing"}	[{"order_month":1,"sum_sales_month":234},{"order_month":2,"sum_sales_month":464},...]
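The final step joins the two result sets on Customer_ID. Here is a minimal Python sketch of that left join, assuming each query's output is held in a dict keyed by Customer_ID. `join_user_rows` is a hypothetical name. A missing month_list becomes None, mirroring a SQL NULL.

```python
def join_user_rows(order_lists: dict, month_lists: dict) -> list:
    """Left-join month_list onto order_list by Customer_ID, producing
    (Customer_ID, order_list, month_list) rows; None stands in for SQL NULL."""
    return [
        (cid, order_list, month_lists.get(cid))
        for cid, order_list in sorted(order_lists.items())
    ]


order_lists = {"235543323": '{"sum_sale":23545}'}
month_lists = {"235543323": '[{"order_month":1,"sum_sales_month":234}]'}
rows = join_user_rows(order_lists, month_lists)
print(rows[0][0])  # 235543323
```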

With that, the underlying data for the whole data product is in place.

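Next, the operations team pushes the product to the target users. After the push, we need to analyze the data product's actual effect. We won't go into how to measure campaign effectiveness here; a dedicated article will cover it later.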