PostgreSQL 10.0 preview 功能增強 - 自由定義統計信息維度
標簽
PostgreSQL , 10.0 , 統計信息 , 自由定義 , 自由分組統計信息
背景
統計信息是數據庫優化器用來計算SQL plan tree中每個NODE成本的依據,目前PostgreSQL的統計信息是開源數據庫中最完備的。
包括繼承表、以及全局的統計信息,每個字段的NULL值個數或百分比,平均字段寬度,唯一值個數或百分比,高頻詞,高頻詞的百分比,柱狀圖,線性相關性,數組字段的高頻元素,高頻元素的百分比,數組元素的柱狀圖。
View "pg_catalog.pg_stats"
Column | Type | Modifiers
------------------------+----------+-----------
schemaname | name |
tablename | name |
attname | name |
inherited | boolean |
null_frac | real |
avg_width | integer |
n_distinct | real |
most_common_vals | anyarray |
most_common_freqs | real[] |
histogram_bounds | anyarray |
correlation | real |
most_common_elems | anyarray |
most_common_elem_freqs | real[] |
elem_count_histogram | real[] |
實際上統計信息已經非常完備了,10.0提出了一個更大膽的統計信息,自定義統計信息。
比如我們將多個字段作為一個分組,生成分組的統計信息。有什麼用呢?
例如select count(distinct a,b) from tbl, select count(*),a,b from tbl group by a,b;
以上SQL,如果有a,b分組的統計信息,有多少唯一值,線性相關性如何。優化器就可以評估group by需要多少內存,結果集大概多大。
是的,現在PostgreSQL 10.0允許你定義字段分組的統計信息了。是不是很開森呢?
你甚至可以用它來做多個字段組合的TOP N的評估呢。
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
新增的係統表
pg_statistic_ext
新增的語法
CREATE STATISTICS
這個patch的討論,詳見郵件組,本文末尾URL。
PostgreSQL社區的作風非常嚴謹,一個patch可能在郵件組中討論幾個月甚至幾年,根據大家的意見反複的修正,patch合並到master已經非常成熟,所以PostgreSQL的穩定性也是遠近聞名的。
參考
最後更新:2017-04-01 17:04:39