How Spark's rdd.distinct() Is Implemented

2019-11-05 邵红晓

Look at the source code:

def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T] = withScope {
  map(x => (x, null)).reduceByKey((x, y) => x, numPartitions).map(_._1)
}
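
The one-liner above does three things: it turns each element into a key with a dummy `null` value, uses `reduceByKey` to collapse all pairs that share a key into one (this is the step that shuffles data across `numPartitions` partitions), and then drops the dummy value. The sketch below replays those three steps on a plain Scala collection so it can be run without a cluster. It is only an illustration: `DistinctSketch` and `distinctViaReduceByKey` are hypothetical names, and `groupBy` stands in for Spark's shuffle.

```scala
// Local simulation with plain Scala collections; no Spark dependency.
object DistinctSketch {
  // Hypothetical helper that mirrors the three steps of the distinct() body.
  def distinctViaReduceByKey[T](data: Seq[T]): Seq[T] = {
    // Step 1: map(x => (x, null)) makes each element a key with a dummy value.
    val paired = data.map(x => (x, null))

    // Step 2: reduceByKey((x, y) => x) keeps a single value per key.
    // groupBy stands in for the shuffle that brings equal keys together.
    val reduced = paired.groupBy(_._1).toSeq.map { case (key, group) =>
      (key, group.map(_._2).reduce((x, y) => x))
    }

    // Step 3: map(_._1) drops the dummy value and keeps the unique keys.
    reduced.map(_._1)
  }

  def main(args: Array[String]): Unit = {
    val result = distinctViaReduceByKey(Seq(1, 2, 2, 3, 3, 3))
    // Order after grouping is not guaranteed, so sort before printing.
    println(result.sorted.mkString(", ")) // prints "1, 2, 3"
  }
}
```

As in Spark, the result order is not guaranteed, which is why the usage example sorts before printing. Deduplication relies on the elements' `equals` and `hashCode`, just as keys do in `reduceByKey`.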
