Big Data Analytics

User Profiling - Mined Tags

2021-06-19  架构师老狼

RFM Customer Value Model

1 Requirements

Figure: user profile

2 What is RFM

Figure: RFM
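
RFM describes each customer along three dimensions: Recency (R), Frequency (F), and Monetary value (M). As a minimal sketch of how the three raw values could be derived per member, assume a hypothetical `Order` record holding the member id, the number of days since the order, and the amount paid. The record and field names are illustrative assumptions, not taken from the original project.

```scala
// Hypothetical order record; the field names are illustrative assumptions.
final case class Order(memberId: Long, daysAgo: Int, amount: Double)

// Raw RFM values for one member.
final case class Rfm(r: Int, f: Int, m: Double)

object RfmSketch {
  // R = days since the most recent order, F = number of orders, M = total amount spent.
  def compute(orders: Seq[Order]): Map[Long, Rfm] =
    orders.groupBy(_.memberId).map { case (id, os) =>
      id -> Rfm(os.map(_.daysAgo).min, os.size, os.map(_.amount).sum)
    }
}
```

In a Spark job the same aggregation would typically be a `groupBy` on the member id followed by `min`, `count`, and `sum`.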

4 High-dimensional space model

Figure: high-dimensional space model

5 Unifying scales through scoring
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, when}

// Recency: a smaller r (a more recent purchase) earns a higher score.
val rScore: Column = when(col("r").between(1, 3), 5)
  .when(col("r").between(4, 6), 4)
  .when(col("r").between(7, 9), 3)
  .when(col("r").between(10, 15), 2)
  .when(col("r") >= 16, 1)
  .as("r_score")

// Frequency: a larger f earns a higher score.
val fScore: Column = when(col("f") >= 200, 5)
  .when(col("f").between(150, 199), 4)
  .when(col("f").between(100, 149), 3)
  .when(col("f").between(50, 99), 2)
  .when(col("f").between(1, 49), 1)
  .as("f_score")

// Monetary: a larger m earns a higher score.
val mScore: Column = when(col("m") >= 200000, 5)
  .when(col("m").between(100000, 199999), 4)
  .when(col("m").between(50000, 99999), 3)
  .when(col("m").between(10000, 49999), 2)
  .when(col("m") <= 9999, 1)
  .as("m_score")
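
To make the bucket boundaries easier to check, here is a minimal plain-Scala sketch of the same thresholds, with no Spark dependency. The object name `RfmScoreSketch` and the helper `score` are hypothetical. Values outside every bucket return `None`, which mirrors the null that a `when` chain without `otherwise` produces.

```scala
object RfmScoreSketch {
  // Each bucket is (lower bound inclusive, upper bound inclusive, score).
  private def score(value: Int, buckets: Seq[(Int, Int, Int)]): Option[Int] =
    buckets.collectFirst { case (lo, hi, s) if value >= lo && value <= hi => s }

  private val Max = Int.MaxValue
  private val Min = Int.MinValue

  def rScore(r: Int): Option[Int] =
    score(r, Seq((1, 3, 5), (4, 6, 4), (7, 9, 3), (10, 15, 2), (16, Max, 1)))

  def fScore(f: Int): Option[Int] =
    score(f, Seq((200, Max, 5), (150, 199, 4), (100, 149, 3), (50, 99, 2), (1, 49, 1)))

  def mScore(m: Int): Option[Int] =
    score(m, Seq((200000, Max, 5), (100000, 199999, 4), (50000, 99999, 3), (10000, 49999, 2), (Min, 9999, 1)))
}
```

For example, a member with r = 5, f = 120, and m = 60000 would be scored 4, 3, and 3, so all three dimensions end up on the same 1 to 5 scale before clustering.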

6 Model training and prediction

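import org.apache.spark.ml.clustering.{KMeans, KMeansModel}
import org.apache.spark.sql.DataFrame

// Training: fit K-Means on the assembled feature vectors and persist the model.
def process(source: DataFrame): DataFrame = {
  val assembled = assembleDataFrame(source)
  val kmeans = new KMeans()
    .setK(7)
    .setSeed(10)
    .setMaxIter(10)
    .setFeaturesCol("features")
    .setPredictionCol("predict")

  val model = kmeans.fit(assembled)
  model.save(MODEL_PATH)

  // Return the clustered data rather than null (an assumption about the intended result).
  model.transform(assembled)
}

// Prediction: reload the saved model and assign each user to a cluster.
val assembled = RFMModel.assembleDataFrame(source)

val kmeans = KMeansModel.load(RFMModel.MODEL_PATH)
val predicted = kmeans.transform(assembled)

// Relate the cluster indices produced by K-Means to the rules:
// rank the clusters by the sum of their center's coordinates, highest first.
val sortedCenters: IndexedSeq[(Int, Double)] = kmeans.clusterCenters.indices
  .map(i => (i, kmeans.clusterCenters(i).toArray.sum))
  .sortBy(c => c._2).reverse

import spark.implicits._ // required for toDF; assumes a SparkSession named spark
val sortedDF = sortedCenters.toDF("index", "totalScore")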
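
The snippet above stops once the cluster centers are sorted by their total score. One plausible next step, sketched here in plain Scala, is to turn that ordering into a lookup from cluster index to rank, so that rank 0 can be matched to the highest-value rule. The name `ClusterRankSketch` and the idea that rules are matched by rank are assumptions, not something the original shows.

```scala
object ClusterRankSketch {
  // Sum each cluster center's coordinates, sort descending, and map
  // cluster index -> rank (0 = highest total score).
  def rankBySum(centers: IndexedSeq[Array[Double]]): Map[Int, Int] =
    centers.indices
      .map(i => (i, centers(i).sum))
      .sortBy { case (_, total) => -total }
      .map(_._1)
      .zipWithIndex
      .toMap
}
```

Because K-Means assigns cluster indices arbitrarily, some mapping of this kind is needed before a cluster number can be read as a value tier.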

RFE Activity

PSM Price Sensitivity Model

1 PSM formula

import org.apache.spark.sql.functions.{count, sum, when}
import org.apache.spark.sql.types.DoubleType

// Receivable amount
val receivableAmount = ('couponCodeValue + 'orderAmount).cast(DoubleType) as "receivableAmount"
// Discount amount
val discountAmount = 'couponCodeValue.cast(DoubleType) as "discountAmount"
// Amount actually paid
val practicalAmount = 'orderAmount.cast(DoubleType) as "practicalAmount"
// Whether the order was discounted
val state = when(discountAmount =!= 0.0d, 1) // =!= is a Column method (inequality)
  .when(discountAmount === 0.0d, 0)
  .as("state")

// Number of discounted orders
val discountCount = sum('state) as "discountCount"
// Total number of orders
val totalCount = count('state) as "totalCount"
// Total discount amount
val totalDiscountAmount = sum('discountAmount) as "totalDiscountAmount"
// Total receivable amount
val totalReceivableAmount = sum('receivableAmount) as "totalReceivableAmount"

// Average discount amount
val avgDiscountAmount = ('totalDiscountAmount / 'discountCount) as "avgDiscountAmount"
// Average receivable amount per order
val avgReceivableAmount = ('totalReceivableAmount / 'totalCount) as "avgReceivableAmount"
// Share of discounted orders
val discountPercent = ('discountCount / 'totalCount) as "discountPercent"
// Average discount amount as a share of the average receivable amount
val avgDiscountPercent = (avgDiscountAmount / avgReceivableAmount) as "avgDiscountPercent"
// Discount amount as a share of the receivable amount
val discountAmountPercent = ('totalDiscountAmount / 'totalReceivableAmount) as "discountAmountPercent"

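// PSM = share of discounted orders + (average discount amount / average receivable per order) + discount amount share.
// avgDiscountPercent already is that middle ratio, so it must not be divided by avgReceivableAmount again.
val psmScore = (discountPercent + avgDiscountPercent + discountAmountPercent) as "psm"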
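
As a worked example of the formula stated in the comment (share of discounted orders + average discount amount / average receivable per order + discount amount share), here is a minimal plain-Scala sketch. `PsmOrder` and `PsmSketch` are hypothetical names, and the sketch assumes the member has at least one discounted order, since the average discount is otherwise undefined.

```scala
// Hypothetical per-order record: amount actually paid and coupon value applied.
final case class PsmOrder(orderAmount: Double, couponCodeValue: Double)

object PsmSketch {
  // Assumes at least one order carries a non-zero coupon value.
  def psm(orders: Seq[PsmOrder]): Double = {
    val totalCount = orders.size.toDouble
    val discountCount = orders.count(_.couponCodeValue != 0.0).toDouble
    val totalDiscount = orders.map(_.couponCodeValue).sum
    // Receivable = amount paid + coupon value.
    val totalReceivable = orders.map(o => o.orderAmount + o.couponCodeValue).sum

    val discountPercent = discountCount / totalCount
    val avgDiscountPercent = (totalDiscount / discountCount) / (totalReceivable / totalCount)
    val discountAmountPercent = totalDiscount / totalReceivable
    discountPercent + avgDiscountPercent + discountAmountPercent
  }
}
```

With two orders, one paid 90 with a coupon of 10 and one paid 100 with no coupon, the three terms are 0.5, 0.1, and 0.05, so the score is about 0.65. A higher score suggests a member who buys more often, and more heavily, on discount.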

2 How the clustering algorithm works
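
The code in this article relies on Spark's K-Means. As a rough illustration of the idea behind it, the following plain-Scala sketch implements the basic Lloyd iteration: assign every point to its nearest center, then move each center to the mean of its points. It is a teaching sketch under simplifying assumptions (fixed iteration count, caller-supplied initial centers), not how Spark implements it, and `KMeansSketch` is a hypothetical name.

```scala
object KMeansSketch {
  type Point = Vector[Double]

  def squaredDistance(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the center nearest to p (the first one wins on ties).
  def nearest(p: Point, centers: Seq[Point]): Int =
    centers.indices.minBy(i => squaredDistance(p, centers(i)))

  // Lloyd's algorithm: alternate assignment and center update for a fixed number of iterations.
  def fit(points: Seq[Point], initial: Seq[Point], iterations: Int): Seq[Point] =
    (1 to iterations).foldLeft(initial) { (centers, _) =>
      val groups = points.groupBy(p => nearest(p, centers))
      centers.indices.map { i =>
        groups.get(i) match {
          case Some(members) => members.transpose.map(_.sum / members.size).toVector
          case None          => centers(i) // keep an empty cluster's center unchanged
        }
      }
    }
}
```

For instance, starting from centers (0, 0) and (10, 10), the points (0, 0), (0, 2), (10, 10), (10, 12) settle on centers (0, 1) and (10, 11) after the first iteration.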

3 Choosing K - the elbow method
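
The elbow method compares the within-set sum of squared errors (WSSSE) across candidate values of K and looks for the point where adding clusters stops paying off. The sketch below, in plain Scala, shows how WSSSE is computed and one simple way to pick the bend automatically. The largest-second-difference heuristic is an assumption chosen for illustration; in practice the curve is often just plotted and inspected. `ElbowSketch` is a hypothetical name.

```scala
object ElbowSketch {
  // WSSSE: each point's squared distance to its nearest center, summed over all points.
  def wssse(points: Seq[Vector[Double]], centers: Seq[Vector[Double]]): Double =
    points.map { p =>
      centers.map(c => p.zip(c).map { case (x, y) => (x - y) * (x - y) }.sum).min
    }.sum

  // Heuristic: pick the k whose drop from k-1 most exceeds the drop to k+1,
  // i.e. the largest second difference. Needs at least three (k, cost) pairs.
  def elbow(costs: Seq[(Int, Double)]): Int =
    costs.sortBy(_._1).sliding(3).collect {
      case Seq((_, prev), (k, cur), (_, next)) => (k, (prev - cur) - (cur - next))
    }.maxBy(_._2)._1
}
```

Given costs of 100, 40, 30, and 25 for K = 2 to 5, the improvement falls from 60 to 10 after K = 3, so this heuristic selects 3.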

4 Model training and iterative computation

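import org.apache.spark.ml.clustering.{KMeans, KMeansModel}
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.mllib.clustering.{KMeansModel => LibKMeansModel}
import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => OldVectors}

// Train one model per candidate K and record its WSSSE (within-set sum of squared errors).
val kArray = Array(2, 3, 4, 5, 6, 7, 8)
val wssseMap: Map[Int, Double] = kArray.map { k =>
  val kmeans = new KMeans()
    .setK(k)
    .setMaxIter(10)
    .setPredictionCol("prediction")
    .setFeaturesCol("features")
  val model: KMeansModel = kmeans.fit(vectored)

  // Compute the cost with the RDD-based MLlib model, which expects the old vector type.
  val centers: Array[OldVector] = model.clusterCenters.map(v => OldVectors.fromML(v))
  val libModel = new LibKMeansModel(centers)
  val features = vectored.rdd.map(row => OldVectors.fromML(row.getAs[Vector]("features")))

  (k, libModel.computeCost(features))
}.toMap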

Classification Model - Predicting Gender

1 Preset labels and quantified attributes

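+--------+------+-----------+------+----------+------+---------------+------+----------------+-----------+-------------+
|memberId| color|productType|gender|colorIndex| color|    productType|gender|productTypeIndex|   features|featuresIndex|
+--------+------+-----------+------+----------+------+---------------+------+----------------+-----------+-------------+
|       4|樱花粉|   智能电视|     1|      14.0|樱花粉|       智能电视|     1|            13.0|[14.0,13.0]|  [14.0,13.0]|
|       4|樱花粉|   智能电视|     1|      14.0|  蓝色| Haier/海尔冰箱|     0|             1.0| [14.0,1.0]|   [14.0,1.0]|
+--------+------+-----------+------+----------+------+---------------+------+----------------+-----------+-------------+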
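
import org.apache.spark.sql.functions.when

// Preset label: 1 when the color or the product type is in one of the hand-picked
// lists below, otherwise 0. The string literals must match the data, so they stay in Chinese.
val label = when(
    // cherry-blossom pink, white, champagne, champagne gold
    'ogColor.isin("樱花粉", "白色", "香槟色", "香槟金")
      // food processor, garment steamer, vacuum cleaner / mite remover
      .or('productType.isin("料理机", "挂烫机", "吸尘器/除螨仪")),
    1)
  .otherwise(0)
  .alias("gender")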

2 Decision tree algorithm
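
A decision tree picks, at each node, the split that leaves the child nodes as pure as possible. The classifier in the next section is configured with the Gini impurity, so here is a minimal plain-Scala sketch of that measure and of the weighted impurity of a candidate split. `GiniSketch` is a hypothetical name, and the sketch covers only the impurity calculation, not tree construction.

```scala
object GiniSketch {
  // Gini impurity of a set of class labels: 1 minus the sum of squared class proportions.
  def gini(labels: Seq[Int]): Double =
    if (labels.isEmpty) 0.0
    else 1.0 - labels.groupBy(identity).values.map { g =>
      val p = g.size.toDouble / labels.size
      p * p
    }.sum

  // Impurity after a split: the children's impurities weighted by their share of the rows.
  def splitImpurity(left: Seq[Int], right: Seq[Int]): Double = {
    val n = (left.size + right.size).toDouble
    left.size / n * gini(left) + right.size / n * gini(right)
  }
}
```

A node holding two labels of each gender has impurity 0.5. A split that separates them perfectly brings the weighted impurity to 0, which is why the tree would prefer it.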

3 Algorithm engineering and model evaluation

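import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler, VectorIndexer}

// The first three stages are not shown in the original; these definitions are
// assumptions inferred from the column names in the sample output.
val colorIndexer = new StringIndexer().setInputCol("color").setOutputCol("colorIndex")
val productTypeIndexer = new StringIndexer().setInputCol("productType").setOutputCol("productTypeIndex")
val featureAssembler = new VectorAssembler()
  .setInputCols(Array("colorIndex", "productTypeIndex"))
  .setOutputCol("features")

// Treat features with at most 3 distinct values as categorical.
val featureVectorIndexer = new VectorIndexer()
  .setInputCol("features")
  .setOutputCol("featuresIndex")
  .setMaxCategories(3)

val decisionTreeClassifier = new DecisionTreeClassifier()
  .setFeaturesCol("featuresIndex")
  .setLabelCol("gender")
  .setPredictionCol("predict")
  .setMaxDepth(5)
  .setImpurity("gini")

val pipeline = new Pipeline()
  .setStages(Array(colorIndexer, productTypeIndexer, featureAssembler, featureVectorIndexer, decisionTreeClassifier))

// Hold out 20% of the data for evaluation.
val Array(trainData, testData) = source.randomSplit(Array(0.8, 0.2))

val model: PipelineModel = pipeline.fit(trainData)
val pTrain = model.transform(trainData)
val pTest = model.transform(testData)

val accEvaluator = new MulticlassClassificationEvaluator()
  .setPredictionCol("predict")
  .setLabelCol("gender")
  .setMetricName("accuracy")

// Compare accuracy on the training and test sets to spot overfitting.
val trainAccuracy = accEvaluator.evaluate(pTrain)
val testAccuracy = accEvaluator.evaluate(pTest)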