The difference between GROUP BY and DISTINCT for deduplication

2018-08-23  scottzcw

sql1: select count(distinct sellno) from xxx;

sql2: select count(sellno) from (select sellno from xxx group by sellno) t;
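Both statements are meant to return the same number: the count of distinct non-NULL sellno values. As a quick sanity check, the sketch below runs the two queries against a tiny in-memory SQLite table. SQLite stands in for Hive here only to check the SQL semantics, and the sample rows are made up for illustration; it says nothing about how Hive executes either query.

```python
import sqlite3

# In-memory stand-in for the table xxx from the post.
# The rows are hypothetical sample data, including one NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xxx (sellno TEXT)")
rows = [("a",), ("b",), ("a",), ("c",), ("b",), (None,)]
conn.executemany("INSERT INTO xxx (sellno) VALUES (?)", rows)

sql1 = "select count(distinct sellno) from xxx"
sql2 = "select count(sellno) from (select sellno from xxx group by sellno) t"

count_distinct = conn.execute(sql1).fetchone()[0]
count_group_by = conn.execute(sql2).fetchone()[0]

# Both forms skip NULL: count(distinct ...) ignores it, and count(sellno)
# does not count the NULL group that group by produces.
print(count_distinct, count_group_by)
```

The two forms agree on the result, so the comparison that follows is purely about how the work is distributed.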

Execution of sql1:

Stage-Stage-1: Map: 396 Reduce: 1 Cumulative CPU: 7915.67 sec HDFS Read: 119072894175 HDFS Write: 10 SUCCESS

Total MapReduce CPU Time Spent: 0 days 2 hours 11 minutes 55 seconds 670 msec

Execution of sql2:

Stage-Stage-1: Map: 396 Reduce: 457 Cumulative CPU: 10056.7 sec HDFS Read: 119074266583 HDFS Write: 53469 SUCCESS

Stage-Stage-2: Map: 177 Reduce: 1 Cumulative CPU: 280.22 sec HDFS Read: 472596 HDFS Write: 10 SUCCESS

Total MapReduce CPU Time Spent: 0 days 2 hours 52 minutes 16 seconds 920 msec

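Summary: count(distinct) shuffles all of the data to a single reducer (Reduce: 1 in sql1), whereas group by spreads the deduplication across many reducers on multiple machines (Reduce: 457 in stage 1 of sql2) and leaves only a small final count for the single reducer in stage 2. The group by form uses more cumulative CPU (2 h 52 min versus 2 h 11 min), but because the work runs in parallel instead of piling up on one reducer, it is generally the more efficient choice on large tables.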
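The difference in reducer load can be illustrated with a rough model. This is a minimal sketch and not Hive's actual execution: the function names are hypothetical, the data is synthetic, and it ignores map-side aggregation, which in practice reduces how many rows reach the reducers in both plans.

```python
from collections import defaultdict


def count_distinct_single_reducer(values):
    """Model of sql1: every row is shuffled to one reducer, which dedups them all."""
    load = {0: len(values)}  # reducer 0 receives everything
    return len(set(values)), load


def count_distinct_group_by(values, num_reducers):
    """Model of sql2: stage 1 hash-partitions rows, stage 2 counts the results."""
    partitions = defaultdict(list)
    for v in values:
        # The same key always lands on the same reducer.
        partitions[hash(v) % num_reducers].append(v)
    # Stage 1: each reducer deduplicates only its own partition.
    distinct_per_reducer = {r: set(vs) for r, vs in partitions.items()}
    # Stage 2: one reducer counts the already-deduplicated rows.
    total = sum(len(s) for s in distinct_per_reducer.values())
    load = {r: len(vs) for r, vs in partitions.items()}
    return total, load


# Synthetic data: 10,000 rows holding 1,000 distinct integer keys.
values = [i % 1000 for i in range(10000)]

total1, load1 = count_distinct_single_reducer(values)
total2, load2 = count_distinct_group_by(values, num_reducers=4)

print("single reducer:", total1, "max rows on one reducer:", max(load1.values()))
print("group by      :", total2, "max rows on one reducer:", max(load2.values()))
```

Both plans produce the same count, but in the second one no single reducer has to handle the whole input, which is the effect visible in the Reduce: 1 versus Reduce: 457 figures in the logs.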
