PySpark: adding a UUID column

2023-12-19  FireJohnny
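import uuid

import pyspark.sql.functions as f
from pyspark.sql.types import StringType

# Method 1: use a UDF (the lambda runs once per row, so each row gets its own UUID)
# asNondeterministic() (Spark 2.3+) tells the optimizer the result differs between calls
uuid_udf = f.udf(lambda: uuid.uuid4().hex, StringType()).asNondeterministic()
df_with_uuid = df.withColumn('uuid', uuid_udf())

# Method 2: use lit (uuid4() runs once on the driver, so every row gets the same value)
df_with_uuid = df.withColumn('uuid', f.lit(uuid.uuid4().hex))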

Code source: https://elegantdata.blogspot.com/2021/03/add-uuid-column-to-spark-dataframe.html?lr=1
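Both methods use uuid.uuid4().hex, which is already a str: 32 lowercase hex digits with no hyphens, the format seen in the result tables below. str(uuid.uuid4()) instead gives the canonical 36-character hyphenated form. A quick check with Python's standard uuid module (no Spark needed):

```python
import uuid

u = uuid.uuid4()
hex_form = u.hex      # 32 lowercase hex digits, no hyphens
canonical = str(u)    # 36 characters: 8-4-4-4-12 groups joined by hyphens

print(type(hex_form).__name__, len(hex_form))   # str 32
print(len(canonical), canonical.count("-"))     # 36 4
print(canonical.replace("-", "") == hex_form)   # True
```

Either form works as the UDF's return value: the hex form is shorter, while the hyphenated form is the standard RFC 4122 text representation.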

How the two methods differ

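Of the two methods above, only the first is correct. In method 2, uuid.uuid4().hex is evaluated once in Python on the driver, and f.lit() turns that single string into a constant column, so every row gets the same UUID. The UDF in method 1 is called for each row, so each row gets its own UUID.
Result of method 1: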

Name  Age  City      uuid
John  25   New York  8a8d84e99b6f49aea...
Emma  28   London    dff0676453494d7cb...
Mike  30   Paris     db93842d82e34a11a...
John  27   London    cd3e3cac967a471a8...

Result of method 2:

Name  Age  City      uuid2
John  25   New York  98426e22f58442f59...
Emma  28   London    98426e22f58442f59...
Mike  30   Paris     98426e22f58442f59...
John  27   London    98426e22f58442f59...
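The difference between the two results comes down to when the UUID is generated: once before the column is built, or once per row. The sketch below mimics this in plain Python without Spark, assuming rows can be modeled as dicts; add_constant_column (standing in for f.lit) and add_per_row_column (standing in for the UDF) are hypothetical helpers, not PySpark APIs.

```python
import uuid

# Sample rows modeled as dicts (same data as the tables above)
rows = [
    {"Name": "John", "Age": 25, "City": "New York"},
    {"Name": "Emma", "Age": 28, "City": "London"},
    {"Name": "Mike", "Age": 30, "City": "Paris"},
    {"Name": "John", "Age": 27, "City": "London"},
]

def add_constant_column(rows, name, value):
    # Hypothetical stand-in for f.lit(value): value was computed once, before this call
    return [{**row, name: value} for row in rows]

def add_per_row_column(rows, name, fn):
    # Hypothetical stand-in for a UDF: fn is called again for every row
    return [{**row, name: fn()} for row in rows]

lit_style = add_constant_column(rows, "uuid2", uuid.uuid4().hex)
udf_style = add_per_row_column(rows, "uuid", lambda: uuid.uuid4().hex)

print(len({r["uuid2"] for r in lit_style}))  # 1: every row shares one UUID
print(len({r["uuid"] for r in udf_style}))   # 4: one UUID per row
```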