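Implementing GroupBy and Per-Group Top-N in Java
Introduction
Before Java 8 introduced lambdas and the Stream API, implementing something like SQL's GROUP BY aggregation in Java code was fairly difficult. Java's support for functional programming was weak at that point, whereas Scala takes functional programming to the extreme: a group-by aggregation in Scala takes only a couple of lines: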
val birds = List("Golden Eagle", "Gyrfalcon", "American Robin", "Mountain BlueBird", "Mountain-Hawk Eagle")
val groupByFirstLetter = birds.groupBy(_.charAt(0))

Output:

Map(M -> List(Mountain BlueBird, Mountain-Hawk Eagle), G -> List(Golden Eagle, Gyrfalcon), A -> List(American Robin))
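For comparison, the same grouping can be written with the Java 8 Stream API. The following is only a minimal sketch, assuming Java 8 or later; the class name BirdGroupingDemo and the helper groupByFirstLetter are illustrative names and are not part of the original source.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class BirdGroupingDemo {

    // Groups names by their first character, similar to Scala's groupBy(_.charAt(0)).
    // Assumes every name is non-empty; charAt(0) throws on an empty string.
    static Map<Character, List<String>> groupByFirstLetter(List<String> names) {
        return names.stream()
                .collect(Collectors.groupingBy(name -> name.charAt(0)));
    }

    public static void main(String[] args) {
        List<String> birds = Arrays.asList(
                "Golden Eagle", "Gyrfalcon", "American Robin",
                "Mountain BlueBird", "Mountain-Hawk Eagle");
        // The iteration order of the resulting map is not guaranteed.
        System.out.println(groupByFirstLetter(birds));
    }
}
```

Within each group the elements keep their original encounter order, because the stream is sequential.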
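Java also has third-party libraries that help, such as Guava's Function and Functional Java. On the whole, though, performing GroupBy, OrderBy, Limit and similar Top-N operations on in-memory Java collections is still cumbersome. This article implements a simple grouping facility that supports custom keys and custom aggregate functions. With just a few small classes it can group first, then sort within each group, and finally take the top N of each group, which is awkward to express even in SQL.
The source code can be downloaded here.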
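Implementation
Suppose we have a Person class like this:

package me.lin;

class Person {
    private String name;
    private int age;
    private double salary;

    public Person(String name, int age, double salary) {
        super();
        this.name = name;
        this.age = age;
        this.salary = salary;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    public double getSalary() {
        return salary;
    }

    public void setSalary(double salary) {
        this.salary = salary;
    }

    // Composite key used later when grouping by a method's return value.
    public String getNameAndAge() {
        return this.getName() + "-" + this.getAge();
    }

    @Override
    public String toString() {
        return "Person [name=" + name + ", age=" + age + ", salary=" + salary + "]";
    }
}

Given a List of Person objects, we want to group by age and compute statistics for each group, such as taking the first element or the elements with the highest salary. The implementation follows.
Aggregation
Define an aggregation interface that performs an aggregate operation on the elements of each group, analogous to count(*) and sum() in MySQL: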
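package me.lin;

import java.util.List;

/**
 * Aggregate operation.
 *
 * Created by Brandon on 2016/7/21.
 */
public interface Aggregator<T> {

    /**
     * Aggregate operation applied to each group.
     *
     * @param key    key identifying the group
     * @param values elements belonging to the group
     * @return the aggregated value
     */
    Object aggregate(Object key, List<T> values);
}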
We implement a few aggregate operations here; more complex ones can be defined in the same way.
CountAggregator:

package me.lin;

import java.util.List;

/**
 * Counting aggregate operation.
 *
 * Created by Brandon on 2016/7/21.
 */
public class CountAggregator<T> implements Aggregator<T> {

    @Override
    public Object aggregate(Object key, List<T> values) {
        return values.size();
    }
}
FirstAggregator:

package me.lin;

import java.util.List;

/**
 * Takes the first element of the group.
 *
 * Created by Brandon on 2016/7/21.
 */
public class FirstAggregator<T> implements Aggregator<T> {

    @Override
    public Object aggregate(Object key, List<T> values) {
        if (values.size() >= 1) {
            return values.get(0);
        } else {
            return null;
        }
    }
}
TopNAggregator:

package me.lin;

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/**
 * Takes the top N elements of each group.
 *
 * Created by Brandon on 2016/7/21.
 */
public class TopNAggregator<T> implements Aggregator<T> {

    private Comparator<T> comparator;
    private int limit;

    public TopNAggregator(Comparator<T> comparator, int limit) {
        this.limit = limit;
        this.comparator = comparator;
    }

    @Override
    public Object aggregate(Object key, List<T> values) {
        if (values == null || values.size() == 0) {
            return null;
        }
        // Sort a copy so the caller's list is left untouched.
        List<T> copy = new ArrayList<>(values);
        Collections.sort(copy, comparator);
        int size = values.size();
        int toIndex = Math.min(limit, size);
        return copy.subList(0, toIndex);
    }
}
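As a further illustration of a user-defined aggregator, here is a sketch of a sum aggregator in the spirit of SQL's sum(). It assumes an interface with the same aggregate(key, values) shape as the one above, repeated inside the block so that it compiles on its own. SumAggregatorDemo, SumAggregator and the extractor parameter are hypothetical names that do not appear in the original source, and ToDoubleFunction requires Java 8.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

class SumAggregatorDemo {

    // Same shape as the article's Aggregator interface, repeated so this sketch is self-contained.
    interface Aggregator<T> {
        Object aggregate(Object key, List<T> values);
    }

    // Hypothetical aggregator: sums a numeric value extracted from each element of the group.
    static class SumAggregator<T> implements Aggregator<T> {
        private final ToDoubleFunction<T> extractor;

        public SumAggregator(ToDoubleFunction<T> extractor) {
            this.extractor = extractor;
        }

        @Override
        public Object aggregate(Object key, List<T> values) {
            double sum = 0;
            for (T value : values) {
                sum += extractor.applyAsDouble(value);
            }
            return sum; // boxed to Double
        }
    }

    public static void main(String[] args) {
        // Salaries of one group; with Person objects the extractor could be Person::getSalary.
        List<Double> salaries = Arrays.asList(5000.0, 500000.0, 1400000.0);
        Aggregator<Double> sum = new SumAggregator<>(Double::doubleValue);
        System.out.println(sum.aggregate(10, salaries));
    }
}
```

Like the count and first aggregators, this one ignores the key argument; an aggregator that needs the group key (for example, to label its result) can read it from the first parameter.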
Grouping
Next comes the grouping itself. To keep things simple, it is implemented as a utility class:

package me.lin;

import java.lang.reflect.Field;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Utility class for grouping a Collection.
 */
public class GroupUtils {

    /**
     * Group and aggregate by a property.
     *
     * @param listToDeal    data to group, comparable to the source table in SQL
     * @param clazz         element type of the data to group
     * @param groupBy       name of the property to group by
     * @param aggregatorMap aggregators; each key is an aggregator name and is used as the key
     *                      of the aggregated value in the returned map
     * @param <T>           element type
     * @return for each group key, a map from aggregator name to aggregated value
     * @throws NoSuchFieldException
     * @throws SecurityException
     * @throws IllegalArgumentException
     * @throws IllegalAccessException
     */
    public static <T> Map<Object, Map<String, Object>> groupByProperty(
            Collection<T> listToDeal, Class<T> clazz, String groupBy,
            Map<String, Aggregator<T>> aggregatorMap)
            throws NoSuchFieldException, SecurityException, IllegalArgumentException, IllegalAccessException {
        Map<Object, Collection<T>> groupResult = new HashMap<Object, Collection<T>>();
        // Resolve the field once instead of once per element.
        Field field = clazz.getDeclaredField(groupBy);
        field.setAccessible(true);
        for (T ele : listToDeal) {
            Object key = field.get(ele);
            if (!groupResult.containsKey(key)) {
                groupResult.put(key, new ArrayList<T>());
            }
            groupResult.get(key).add(ele);
        }
        return invokeAggregators(groupResult, aggregatorMap);
    }

    /**
     * Group and aggregate by the return value of a no-argument method.
     */
    public static <T> Map<Object, Map<String, Object>> groupByMethod(
            Collection<T> listToDeal, Class<T> clazz, String groupByMethodName,
            Map<String, Aggregator<T>> aggregatorMap)
            throws NoSuchMethodException, SecurityException, IllegalAccessException,
            IllegalArgumentException, InvocationTargetException {
        Map<Object, Collection<T>> groupResult = new HashMap<Object, Collection<T>>();
        // Resolve the method once instead of once per element.
        Method groupByMethod = clazz.getDeclaredMethod(groupByMethodName);
        groupByMethod.setAccessible(true);
        for (T ele : listToDeal) {
            Object key = groupByMethod.invoke(ele);
            if (!groupResult.containsKey(key)) {
                groupResult.put(key, new ArrayList<T>());
            }
            groupResult.get(key).add(ele);
        }
        return invokeAggregators(groupResult, aggregatorMap);
    }

    private static <T> Map<Object, Map<String, Object>> invokeAggregators(
            Map<Object, Collection<T>> groupResult, Map<String, Aggregator<T>> aggregatorMap) {
        Map<Object, Map<String, Object>> aggResults = new HashMap<>();
        for (Object key : groupResult.keySet()) {
            Collection<T> group = groupResult.get(key);
            Map<String, Object> aggValues = doInvokeAggregators(key, group, aggregatorMap);
            if (aggValues != null && aggValues.size() > 0) {
                aggResults.put(key, aggValues);
            }
        }
        return aggResults;
    }

    private static <T> Map<String, Object> doInvokeAggregators(
            Object key, Collection<T> group, Map<String, Aggregator<T>> aggregatorMap) {
        Map<String, Object> aggResults = new HashMap<String, Object>();
        if (group != null && group.size() > 0) {
            // Invoke every aggregate function for the current key.
            for (String aggKey : aggregatorMap.keySet()) {
                Aggregator<T> aggregator = aggregatorMap.get(aggKey);
                Object aggResult = aggregator.aggregate(key,
                        Collections.unmodifiableList(new ArrayList<T>(group)));
                aggResults.put(aggKey, aggResult);
            }
        }
        return aggResults;
    }
}
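In the code above, the grouping key can be taken either from a property of the element or from the return value of one of its methods. By writing more elaborate key methods and aggregate functions yourself, you can build quite powerful grouping logic.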
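The reflective lookups by property or method name are not checked by the compiler, so a misspelled name only fails at runtime. As an alternative design (not the approach used in this article), the key can be supplied as a java.util.function.Function. The sketch below assumes Java 8; FunctionGroupingDemo, groupBy and keyExtractor are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class FunctionGroupingDemo {

    // Hypothetical helper: groups elements by a key computed with keyExtractor.
    // A LinkedHashMap keeps groups in the order in which their keys first appear.
    static <T, K> Map<K, List<T>> groupBy(Iterable<T> items,
                                          Function<? super T, ? extends K> keyExtractor) {
        Map<K, List<T>> groups = new LinkedHashMap<>();
        for (T item : items) {
            K key = keyExtractor.apply(item);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(item);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Brandon", "Braney", "Jack", "Robin", "Tony");
        // Group by name length; the method reference replaces a lookup by method name.
        Map<Integer, List<String>> byLength = groupBy(names, String::length);
        System.out.println(byLength);
        // prints {7=[Brandon], 6=[Braney], 4=[Jack, Tony], 5=[Robin]}
    }
}
```

With the Person class, the two grouping modes of GroupUtils would correspond to key extractors such as Person::getAge and Person::getNameAndAge.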
Testing
Grouping by property
First, a test that groups by a property:

package me.lin;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupByPropertyTest {

    public static void main(String[] args)
            throws NoSuchFieldException, SecurityException, IllegalArgumentException, IllegalAccessException {
        List<Person> persons = new ArrayList<>();
        persons.add(new Person("Brandon", 15, 5000));
        persons.add(new Person("Braney", 15, 15000));
        persons.add(new Person("Jack", 10, 5000));
        persons.add(new Person("Robin", 10, 500000));
        persons.add(new Person("Tony", 10, 1400000));

        Map<String, Aggregator<Person>> aggregatorMap = new HashMap<>();
        aggregatorMap.put("count", new CountAggregator<Person>());
        aggregatorMap.put("first", new FirstAggregator<Person>());

        // Orders by salary, highest first.
        Comparator<Person> comparator = new Comparator<Person>() {
            @Override
            public int compare(final Person o1, final Person o2) {
                double diff = o1.getSalary() - o2.getSalary();
                if (diff == 0) {
                    return 0;
                }
                return diff > 0 ? -1 : 1;
            }
        };
        aggregatorMap.put("top2", new TopNAggregator<Person>(comparator, 2));

        Map<Object, Map<String, Object>> aggResults =
                GroupUtils.groupByProperty(persons, Person.class, "age", aggregatorMap);
        for (Object key : aggResults.keySet()) {
            System.out.println("Key:" + key);
            Map<String, Object> results = aggResults.get(key);
            for (String aggKey : results.keySet()) {
                // Prints the literal label "aggkey", not the aggregator name.
                System.out.println(" aggkey->" + results.get(aggKey));
            }
        }
    }
}

Output:

Key:10
 aggkey->3
 aggkey->Person [name=Jack, age=10, salary=5000.0]
 aggkey->[Person [name=Tony, age=10, salary=1400000.0], Person [name=Robin, age=10, salary=500000.0]]
Key:15
 aggkey->2
 aggkey->Person [name=Brandon, age=15, salary=5000.0]
 aggkey->[Person [name=Braney, age=15, salary=15000.0], Person [name=Brandon, age=15, salary=5000.0]]
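The hand-written comparator in this test orders people by salary, highest first. On Java 8 or later the same ordering could be sketched with Comparator.comparingDouble, as below. The nested Person class is a cut-down stand-in for the article's class, and SalaryComparatorDemo, BY_SALARY_DESC and namesBySalaryDesc are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SalaryComparatorDemo {

    // Minimal stand-in for the article's Person class (only what this sketch needs).
    static class Person {
        final String name;
        final double salary;

        Person(String name, double salary) {
            this.name = name;
            this.salary = salary;
        }

        double getSalary() {
            return salary;
        }
    }

    // Highest salary first; same intent as the anonymous Comparator in the test.
    static final Comparator<Person> BY_SALARY_DESC =
            Comparator.comparingDouble(Person::getSalary).reversed();

    // Returns the names ordered by descending salary, leaving the input list untouched.
    static List<String> namesBySalaryDesc(List<Person> people) {
        List<Person> copy = new ArrayList<>(people);
        copy.sort(BY_SALARY_DESC);
        List<String> names = new ArrayList<>();
        for (Person p : copy) {
            names.add(p.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person("Jack", 5000),
                new Person("Robin", 500000),
                new Person("Tony", 1400000));
        System.out.println(namesBySalaryDesc(people)); // prints [Tony, Robin, Jack]
    }
}
```

Such a comparator can be passed to TopNAggregator unchanged, since it is still a java.util.Comparator.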
Grouping by method return value
Next, a test that groups by the return value of a method:

package me.lin;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupByMethodTest {

    public static void main(String[] args) throws Exception {
        List<Person> persons = new ArrayList<>();
        persons.add(new Person("Brandon", 15, 5000));
        persons.add(new Person("Brandon", 15, 15000));
        persons.add(new Person("Jack", 10, 5000));
        persons.add(new Person("Robin", 10, 500000));
        persons.add(new Person("Tony", 10, 1400000));

        Map<String, Aggregator<Person>> aggregatorMap = new HashMap<>();
        aggregatorMap.put("count", new CountAggregator<Person>());
        aggregatorMap.put("first", new FirstAggregator<Person>());

        // Orders by salary, highest first.
        Comparator<Person> comparator = new Comparator<Person>() {
            @Override
            public int compare(final Person o1, final Person o2) {
                double diff = o1.getSalary() - o2.getSalary();
                if (diff == 0) {
                    return 0;
                }
                return diff > 0 ? -1 : 1;
            }
        };
        aggregatorMap.put("top2", new TopNAggregator<Person>(comparator, 2));

        Map<Object, Map<String, Object>> aggResults =
                GroupUtils.groupByMethod(persons, Person.class, "getNameAndAge", aggregatorMap);
        for (Object key : aggResults.keySet()) {
            System.out.println("Key:" + key);
            Map<String, Object> results = aggResults.get(key);
            for (String aggKey : results.keySet()) {
                System.out.println(" " + aggKey + "->" + results.get(aggKey));
            }
        }
    }
}

Test output:

Key:Robin-10
 count->1
 first->Person [name=Robin, age=10, salary=500000.0]
 top2->[Person [name=Robin, age=10, salary=500000.0]]
Key:Jack-10
 count->1
 first->Person [name=Jack, age=10, salary=5000.0]
 top2->[Person [name=Jack, age=10, salary=5000.0]]
Key:Tony-10
 count->1
 first->Person [name=Tony, age=10, salary=1400000.0]
 top2->[Person [name=Tony, age=10, salary=1400000.0]]
Key:Brandon-15
 count->2
 first->Person [name=Brandon, age=15, salary=5000.0]
 top2->[Person [name=Brandon, age=15, salary=15000.0], Person [name=Brandon, age=15, salary=5000.0]]
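For readers on Java 8 or later, the group, sort and limit pipeline exercised by these tests can also be sketched directly with the Stream API, without the utility classes. This is an alternative to the article's implementation, not a description of it. The nested Person class is a reduced stand-in for the article's class, and StreamTopNDemo and topNBySalaryPerAge are hypothetical names.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class StreamTopNDemo {

    // Reduced stand-in for the article's Person class.
    static class Person {
        private final String name;
        private final int age;
        private final double salary;

        Person(String name, int age, double salary) {
            this.name = name;
            this.age = age;
            this.salary = salary;
        }

        String getName() { return name; }
        int getAge() { return age; }
        double getSalary() { return salary; }
    }

    // Groups by age, then keeps the n highest-paid people of each group.
    static Map<Integer, List<Person>> topNBySalaryPerAge(List<Person> people, int n) {
        Comparator<Person> bySalaryDesc =
                Comparator.comparingDouble(Person::getSalary).reversed();
        return people.stream().collect(Collectors.groupingBy(
                Person::getAge,
                Collectors.collectingAndThen(
                        Collectors.toList(),
                        group -> group.stream()
                                .sorted(bySalaryDesc)
                                .limit(n)
                                .collect(Collectors.toList()))));
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Brandon", 15, 5000),
                new Person("Braney", 15, 15000),
                new Person("Jack", 10, 5000),
                new Person("Robin", 10, 500000),
                new Person("Tony", 10, 1400000));
        // Print the names of the top 2 earners in each age group.
        topNBySalaryPerAge(persons, 2).forEach((age, top) ->
                System.out.println(age + " -> " + top.stream()
                        .map(Person::getName)
                        .collect(Collectors.toList())));
    }
}
```

The trade-off is that each stream pipeline computes one aggregate per group, whereas the aggregatorMap approach returns several named aggregates for every group in a single call.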
That is a simple implementation of GroupBy. If you spot any problems, corrections are welcome.