PostgreSQL 多个数组聚合为一维数组加速(array_a
标签
PostgreSQL , array_agg , arragg
多个数组聚合为一维数组,求PC。业务背景见:
《PostgreSQL APP海量FEED LOG实时质量统计CASE(含percentile_disc)》
由于PostgreSQL内置的聚合函数array_agg支持的数组聚合实际上是将多个数组聚合为多维数组。并不是一维数组。
例如:
postgres=# select array_agg(arr) from (values(array[1,2,3]), (array[4,5,6])) t(arr); array_agg ------------------- {{1,2,3},{4,5,6}} (1row)
而实际上我们要的是一维数组的结果
{1,2,3,4,5,6}
此时需要自定义一个聚合函数
createaggregatearragg (anyarray) (sfunc = array_cat, stype=anyarray,PARALLEL=safe);
效果如下
postgres=# select arragg(arr) from (values(array[1,2,3]), (array[4,5,6])) t(arr); arragg --------------- {1,2,3,4,5,6} (1row)
但是这个新加的聚合用到了array_cat,大量的memcpy导致性能并不好。
聚合100万个元素.
postgres=# explain (analyze,verbose,timing,costs,buffers) select array_agg(array[1,2,3,4,5,6,7,8,9,10]) from generate_series(1,100000); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ Aggregate (cost=12.50..12.51rows=1width=32) (actual time=113.134..113.134rows=1loops=1) Output: array_agg('{1,2,3,4,5,6,7,8,9,10}'::integer[]) ->FunctionScanonpg_catalog.generate_series(cost=0.00..10.00rows=1000width=0)(actual time=53.585..66.200rows=100000loops=1)Output:generate_seriesFunctionCall:generate_series(1,100000)Planningtime: 0.064msExecutiontime: 143.075ms(7rows)
2、arragg(use array_cat),耗时108.15秒
postgres=# explain (analyze,verbose,timing,costs,buffers) select arragg(array[1,2,3,4,5,6,7,8,9,10]) from generate_series(1,100000); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ Aggregate (cost=12.50..12.51rows=1width=32) (actual time=108081.186..108081.186rows=1loops=1) Output: arragg('{1,2,3,4,5,6,7,8,9,10}'::integer[]) ->FunctionScanonpg_catalog.generate_series(cost=0.00..10.00rows=1000width=0)(actual time=11.121..81.467rows=100000loops=1)Output:generate_seriesFunctionCall:generate_series(1,100000)Planningtime: 0.148msExecutiontime: 108154.846ms