PySpark: VectorAssembler and Interaction (Cartesian-Product Features)

2021-08-21  米斯特芳

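VectorAssembler: for each row, combines the values of several columns into a single vector.
Interaction: there is no settled translation for this one. For each row it takes the Cartesian product of the elements of the input columns (a scalar column counts as a one-element vector), multiplies the elements within each resulting tuple, and outputs a column whose entries are vectors of those products.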

from pyspark.ml.feature import Interaction, VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession\
    .builder\
    .appName("InteractionExample")\
    .getOrCreate()

df = spark.createDataFrame(
    [(1, 1, 2, 3, 8, 4, 5),
     (2, 4, 3, 8, 7, 9, 8),
     (3, 6, 1, 9, 2, 3, 6),
     (4, 10, 8, 6, 9, 4, 5),
     (5, 9, 2, 7, 10, 7, 3),
     (6, 1, 1, 4, 2, 8, 4)],
    ["id1", "id2", "id3", "id4", "id5", "id6", "id7"])

assembler1 = VectorAssembler(inputCols=["id2", "id3", "id4"], outputCol="vec1")
assembled1 = assembler1.transform(df)  # assemble ["id2", "id3", "id4"] into one column whose entries are vectors
assembler2 = VectorAssembler(inputCols=["id5", "id6", "id7"], outputCol="vec2")
assembled2 = assembler2.transform(assembled1).select("id1", "vec1", "vec2")
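# Take the Cartesian product of ["id1", "vec1", "vec2"] per row, then multiply the elements
# inside each tuple; the result is a single column whose entries are vectors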
interaction = Interaction(inputCols=["id1", "vec1", "vec2"], outputCol="interactedCol")
interacted = interaction.transform(assembled2)
interacted.show(truncate=False)
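
To see what the interaction step computes, here is a minimal plain-Python sketch of the same arithmetic applied to the first row of the DataFrame above. It does not use Spark; the helper names `assemble` and `interact` are hypothetical and exist only for illustration, and the sketch covers only plain numeric inputs like the ones in this example.

```python
from itertools import product
from math import prod  # requires Python 3.8+


def assemble(*values):
    """Illustrative stand-in for VectorAssembler: pack values into one vector."""
    return [float(v) for v in values]


def interact(*columns):
    """Illustrative stand-in for Interaction on numeric inputs.

    Scalars are treated as one-element vectors. The output holds the product
    of every combination that takes one element from each input, in order.
    """
    vectors = [c if isinstance(c, list) else [c] for c in columns]
    return [float(prod(combo)) for combo in product(*vectors)]


# First row of the sample data: (id1, id2, id3, id4, id5, id6, id7)
row = (1, 1, 2, 3, 8, 4, 5)
vec1 = assemble(*row[1:4])  # [1.0, 2.0, 3.0]
vec2 = assemble(*row[4:7])  # [8.0, 4.0, 5.0]

print(interact(row[0], vec1, vec2))
# [8.0, 4.0, 5.0, 16.0, 8.0, 10.0, 24.0, 12.0, 15.0]
```

With one scalar and two 3-element vectors, each row yields 1 × 3 × 3 = 9 products, so every entry of interactedCol should be a 9-element vector.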