PostgreSQL 10.0 preview 性能增強 - (多維分析)更快,更省內存hashed aggregation with grouping sets
標簽
PostgreSQL , 10.0 , hashed aggregation with grouping sets
背景
grouping sets 是多維分析語法,PostgreSQL 從9.5開始支持這種語法,常被用於OLAP係統,數據透視等應用場景。
《PostgreSQL 9.5 new feature - Support GROUPING SETS, CUBE and ROLLUP.》
由於多維分析的一個QUERY涉及多個GROUP,所以如果使用hash agg的話,需要多個HASH table,並行計算. 9.5, 9.6的時候,還不支持一個QUERY使用多個HASH TABLE並行計算。
10.0 擴展了聚合NODE,支持hashAggregate並行開多個hashtable,以及MixedAggregate策略用於sort grouping時哈希表的數據倒騰。
使用時對用戶完全透明,同時優化器在使用hash agg, multi hashtable,時,會盡量的減少重複SORT。
總而言之,grouping set多維分析會更快(即使包含排序),更省內存。
Support hashed aggregation with grouping sets.
This extends the Aggregate node with two new features:
HashAggregate can now run multiple hashtables concurrently,
and a new strategy MixedAggregate populates hashtables while doing sorted grouping.
The planner will now attempt to save as many sorts as possible when
planning grouping sets queries, while not exceeding work_mem for the
estimated combined sizes of all hashtables used. No SQL-level changes
are required. There should be no user-visible impact other than the
new EXPLAIN output and possible changes to result ordering when ORDER
BY was not used (which affected a few regression tests). The
enable_hashagg option is respected.
Author: Andrew Gierth
Reviewers: Mark Dilger, Andres Freund
Discussion: https://postgr.es/m/87vatszyhj.fsf@news-spur.riddles.org.uk
例子
+explain (costs off) select a, b, grouping(a,b), sum(v), count(*), max(v)
+ from gstest1 group by grouping sets ((a),(b)) order by 3,1,2;
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------
+ Sort
+ Sort Key: (GROUPING("*VALUES*".column1, "*VALUES*".column2)), "*VALUES*".column1, "*VALUES*".column2
+ -> HashAggregate
+ Hash Key: "*VALUES*".column1
+ Hash Key: "*VALUES*".column2
+ -> Values Scan on "*VALUES*"
+(6 rows)
這個patch的討論,詳見郵件組,本文末尾URL。
PostgreSQL社區的作風非常嚴謹,一個patch可能在郵件組中討論幾個月甚至幾年,根據大家的意見反複的修正,patch合並到master已經非常成熟,所以PostgreSQL的穩定性也是遠近聞名的。
參考
最後更新:2017-04-01 17:04:39