
How to use groupBy in Spark: analyzing the groupBy function

Date: 2023-05-04 12:26:56  Author: 1838

Create a DataFrame

In spark-shell, spark.implicits._ is imported automatically. That import is what provides toDF on a local Seq.

scala> val df = Seq(
     |   ("01", "Jack", "08012345566", "28", "SALES", "1000", 1),
     |   ("02", "Tom", "08009097878", "25", "MARKET", "2000", 1),
     |   // row 03 is not recoverable from the source
     |   ("04", "yjdy", "07099661234", "30", "LOGISTICS", "3000", 0),
     |   // rows 05 to 08 are not recoverable from the source
     |   ("09", "Allen", "08099206680", "20", "MARKET", "2500", 1),
     |   ("10", "Caesar", "09011020806", "32", "SALES", "1000", 1)
     | ).toDF("id", "name", "cellphone", "age", "department", "expense", "gender")
df: org.apache.spark.sql.DataFrame = [id: string, name: string ... 5 more fields]

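scala> df.show()
+---+------+-----------+---+----------+-------+------+
| id|  name|  cellphone|age|department|expense|gender|
+---+------+-----------+---+----------+-------+------+
| 01|  Jack|08012345566| 28|     SALES|   1000|     1|
| 02|   Tom|08009097878| 25|    MARKET|   2000|     1|
...
| 04|  yjdy|07099661234| 30| LOGISTICS|   3000|     0|
...
| 09| Allen|08099206680| 20|    MARKET|   2500|     1|
| 10|Caesar|09011020806| 32|     SALES|   1000|     1|
+---+------+-----------+---+----------+-------+------+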

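Count the rows in each gender group:

scala> df.groupBy("gender").count().show()
+------+-----+
|gender|count|
+------+-----+
|     1|    9|
|     0|    1|
+------+-----+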
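The computation behind groupBy("gender").count can be modeled with plain Scala collections. The sketch below illustrates the semantics and is not Spark code. The object name GenderCount is hypothetical, and the sketch uses only the five rows that can be recovered from the source (ids 01, 02, 04, 09, 10). Because of that, it yields 4 and 1 rather than the article's 9 and 1.

```scala
// Plain-Scala model of df.groupBy("gender").count, using only the
// five recoverable rows as (id, gender) pairs.
object GenderCount {
  val rows: Seq[(String, Int)] =
    Seq(("01", 1), ("02", 1), ("04", 0), ("09", 1), ("10", 1))

  // groupBy builds Map[gender -> rows in that group];
  // the size of each group is its count.
  val counts: Map[Int, Int] =
    rows.groupBy { case (_, gender) => gender }.map { case (g, rs) => g -> rs.size }

  def main(args: Array[String]): Unit =
    // Sort for stable printing; neither a Scala Map nor a Spark groupBy
    // result guarantees any particular row order.
    counts.toSeq.sortBy(_._1).foreach { case (g, n) => println(s"gender=$g count=$n") }
}
```

Spark does not guarantee the order of rows in a groupBy result either. If you need a fixed order, add an explicit orderBy before show().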

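Group by department and aggregate the maximum and minimum expense, the total expense, and the average age. The aggregate functions come from org.apache.spark.sql.functions, which spark-shell does not import by default:

scala> import org.apache.spark.sql.functions._
import org.apache.spark.sql.functions._

scala> df.groupBy("department").agg(max("expense"), min("expense"), sum("expense"), mean("age")).show()
+----------+------------+------------+------------+--------+
|department|max(expense)|min(expense)|sum(expense)|avg(age)|
+----------+------------+------------+------------+--------+
...
| LOGISTICS|        3000|        3000|      3000.0|    30.0|
|MANAGEMENT|        2500|        2500|      2500.0|    19.0|
+----------+------------+------------+------------+--------+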
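In this DataFrame, expense and age are built from string literals. As a result, max and min on them compare text rather than numbers. sum and mean still work because Spark implicitly casts the strings to double, which is why the sums print as 3000.0. The plain-Scala sketch below shows the same lexicographic effect using two hypothetical expense values; the object name StringVsNumericMax is also illustrative. In Spark, the usual remedy is to cast before aggregating, for example max($"expense".cast("int")).

```scala
// Hypothetical expense values stored as strings, like the expense column above.
object StringVsNumericMax {
  val expenses: Seq[String] = Seq("2000", "500")

  // String ordering compares character by character: '5' > '2', so "500" wins.
  val lexicographicMax: String = expenses.max

  // Converting to Int first gives the numeric maximum.
  val numericMax: Int = expenses.map(_.toInt).max

  def main(args: Array[String]): Unit =
    println(s"string max = $lexicographicMax, numeric max = $numericMax")
}
```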

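Keep only the rows whose cellphone contains "080":

scala> df.filter($"cellphone".contains("080")).show()
+---+------+-----------+---+----------+-------+------+
| id|  name|  cellphone|age|department|expense|gender|
+---+------+-----------+---+----------+-------+------+
| 01|  Jack|08012345566| 28|     SALES|   1000|     1|
| 02|   Tom|08009097878| 25|    MARKET|   2000|     1|
...
| 09| Allen|08099206680| 20|    MARKET|   2500|     1|
| 10|Caesar|09011020806| 32|     SALES|   1000|     1|
+---+------+-----------+---+----------+-------+------+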
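Column.contains is a substring test, not a prefix test. The plain-Scala sketch below applies the same predicate to the five recoverable rows. The object name CellphoneFilter is hypothetical. Row 04 (07099661234) is dropped. Row 10 (09011020806) is kept even though it does not start with "080", because "080" occurs near its end. If a prefix match is what you want, Spark's Column also provides startsWith("080").

```scala
// Plain-Scala model of df.filter($"cellphone".contains("080")),
// applied to the five recoverable rows as (id, cellphone) pairs.
object CellphoneFilter {
  val rows: Seq[(String, String)] = Seq(
    ("01", "08012345566"),
    ("02", "08009097878"),
    ("04", "07099661234"),
    ("09", "08099206680"),
    ("10", "09011020806")
  )

  // contains matches "080" anywhere in the string, so "09011020806" is kept.
  val kept: Seq[String] =
    rows.collect { case (id, phone) if phone.contains("080") => id }

  def main(args: Array[String]): Unit = println(kept.mkString(", "))
}
```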

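Combine the filter with a grouped aggregation to get the total expense per department, counting only rows whose cellphone contains "080":

scala> df.filter($"cellphone".contains("080")).groupBy($"department").agg(sum($"expense")).show()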
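The source gives no output for this query. The plain-Scala sketch below models the filter-then-group pipeline on the five recoverable rows only, so its totals are not the article's results. The object name FilteredDepartmentSum is hypothetical. The sketch parses expense to Double inside the sum, mirroring Spark's implicit string-to-double cast.

```scala
// Plain-Scala model of
//   df.filter($"cellphone".contains("080")).groupBy($"department").agg(sum($"expense"))
// over the five recoverable rows as (cellphone, department, expense) triples.
object FilteredDepartmentSum {
  val rows: Seq[(String, String, String)] = Seq(
    ("08012345566", "SALES", "1000"),
    ("08009097878", "MARKET", "2000"),
    ("07099661234", "LOGISTICS", "3000"),
    ("08099206680", "MARKET", "2500"),
    ("09011020806", "SALES", "1000")
  )

  // Filter first (LOGISTICS drops out), then group by department
  // and sum the string expenses after converting them to Double.
  val totals: Map[String, Double] =
    rows
      .filter { case (phone, _, _) => phone.contains("080") }
      .groupBy { case (_, dept, _) => dept }
      .map { case (dept, rs) => dept -> rs.map { case (_, _, exp) => exp.toDouble }.sum }

  def main(args: Array[String]): Unit =
    totals.toSeq.sortBy(_._1).foreach { case (d, s) => println(s"$d $s") }
}
```

Filtering before grouping means excluded rows never reach the aggregation. That is also the cheaper order in Spark, since fewer rows have to be shuffled for the groupBy.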
