I had never paid much attention to the difference between ORDER BY and SORT BY, so today I took a look.
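First, in strict mode, using ORDER BY on its own raises an error; a LIMIT clause must be added. Hive reports: In strict mode, if ORDER BY is specified, LIMIT must also be specified. Either add a LIMIT, or set the mode to nonstrict:

    set hive.mapred.mode=nonstrict;  -- alternative to adding LIMIT: set the parameter to nonstrict
    select * from A where d = '2018-10-22' order by checkin_time limit 100;

SORT BY is not affected by the hive.mapred.mode setting. It only sorts rows within each reducer, so the overall result is globally ordered only when there is a single reducer:

    select * from A where d = '2018-10-22' sort by checkin_time;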
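To make the difference concrete, here is a minimal Python sketch that simulates the two behaviours. It is not Hive code: the function names are made up for illustration, and the round-robin assignment of rows to reducers is an assumption standing in for however Hive actually spreads rows when no DISTRIBUTE BY is given.

```python
from typing import List


def order_by(rows: List[int]) -> List[int]:
    # ORDER BY: all rows pass through one reducer, so the output is globally sorted.
    return sorted(rows)


def sort_by(rows: List[int], num_reducers: int) -> List[List[int]]:
    # SORT BY: rows are spread over several reducers (round-robin here, an
    # assumption for illustration) and each reducer sorts only its own share.
    buckets: List[List[int]] = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        buckets[i % num_reducers].append(row)
    return [sorted(bucket) for bucket in buckets]


checkin_times = [9, 3, 7, 1, 8, 2]
print(order_by(checkin_times))    # [1, 2, 3, 7, 8, 9]
print(sort_by(checkin_times, 2))  # [[7, 8, 9], [1, 2, 3]]
```

Each reducer's output is sorted, but concatenating the two lists gives 7, 8, 9, 1, 2, 3, which is not a global order. That is also why ORDER BY gets expensive on large inputs: everything funnels through a single reducer.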
DISTRIBUTE BY distributes rows according to the specified key; rows with the same key are sent to the same reducer.
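    select * from A where d = '2018-10-22' distribute by clientname sort by checkin_time;

CLUSTER BY is shorthand for DISTRIBUTE BY and SORT BY on the same column, so the statement below is equivalent to distribute by checkin_time sort by checkin_time (not to the statement above, which distributes and sorts on different columns). CLUSTER BY can only sort in ascending order; it does not accept ASC or DESC.

    select * from A where d = '2018-10-22' cluster by checkin_time;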
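The routing can be sketched in Python as well. This is only a simulation under stated assumptions: `distribute_sort` and `cluster_by` are hypothetical helper names, rows are simplified to (clientname, checkin_time) tuples, and `zlib.crc32` stands in for Hive's own hash function, which is different.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (clientname, checkin_time), a simplified row of table A


def distribute_sort(rows: List[Row], num_reducers: int,
                    dist_col: int, sort_col: int) -> Dict[int, List[Row]]:
    # DISTRIBUTE BY: hash the key so that equal keys always reach the same reducer.
    # crc32 is a stand-in (assumption); Hive uses its own hashing.
    reducers: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        key = str(row[dist_col]).encode("utf-8")
        reducers[zlib.crc32(key) % num_reducers].append(row)
    # SORT BY: each reducer sorts its own rows, ascending.
    return {r: sorted(bucket, key=lambda row: row[sort_col])
            for r, bucket in reducers.items()}


def cluster_by(rows: List[Row], num_reducers: int, col: int) -> Dict[int, List[Row]]:
    # CLUSTER BY col behaves like DISTRIBUTE BY col SORT BY col (ascending only).
    return distribute_sort(rows, num_reducers, col, col)


rows = [("web", 5), ("app", 2), ("web", 1), ("app", 9), ("api", 4)]
# distribute by clientname sort by checkin_time
for reducer_id, bucket in sorted(distribute_sort(rows, 3, 0, 1).items()):
    print(reducer_id, bucket)
```

Which reducer a given clientname lands on depends on the hash, but all rows for that clientname end up together, and within each reducer they come out ordered by checkin_time.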