817
阿裏雲
技術社區[雲棲]
PostgreSQL 聚合函數講解 - 7 窗口反聚合
聚合函數的最後一個分類, Hypothetical-Set Aggregate Functions.
這類聚合函數還有對應的窗口函數, 首先來看一下對應窗口函數的用法.
rank() |
bigint |
rank of the current row with gaps; same as row_number of its first peer |
dense_rank() |
bigint |
rank of the current row without gaps; this function counts peer groups |
percent_rank() |
double precision |
relative rank of the current row: (rank - 1) / (total rows - 1) |
cume_dist() |
double precision |
relative rank of the current row: (number of rows preceding or peer with current row) / (total rows) |
the four ranking functions are defined so that they give the same answer for any two peer rows.
rank 返回值在分組內的等級, 如果值有重複的話, 跳級處理.
dense_rank 返回值在分組內的等級, 如果值有重複的話, 不跳級處理.
percent_rank 返回 (rank - 1) / (total rows - 1), rank指當前rank值, rows指當前組的記錄數
cume_dist 返回(number of rows preceding or peer with current row) / (total rows), 即截至當前記錄等級一共有多少行除以本組的總行數.
看一個例子比較明白.
postgres=# select *,row_number() over(partition by info order by id),rank() over(partition by info order by id),dense_rank() over(partition by info order by id),percent_rank() over(partition by info order by id),cume_dist() over(partition by info order by id) from test;
id | info | row_number | rank | dense_rank | percent_rank | cume_dist
------+-------+------------+------+------------+--------------+-------------------
1 | test1 | 1 | 1 | 1 | 0 | 1
2 | test2 | 1 | 1 | 1 | 0 | 0.444444444444444
2 | test2 | 2 | 1 | 1 | 0 | 0.444444444444444
2 | test2 | 3 | 1 | 1 | 0 | 0.444444444444444
2 | test2 | 4 | 1 | 1 | 0 | 0.444444444444444
3 | test2 | 5 | 5 | 2 | 0.5 | 0.555555555555556
4 | test2 | 6 | 6 | 3 | 0.625 | 0.666666666666667
5 | test2 | 7 | 7 | 4 | 0.75 | 0.777777777777778
6 | test2 | 8 | 8 | 5 | 0.875 | 0.888888888888889
7 | test2 | 9 | 9 | 6 | 1 | 1
8 | test3 | 1 | 1 | 1 | 0 | 0.5
100 | test3 | 2 | 2 | 2 | 1 | 1
1000 | test4 | 1 | 1 | 1 | 0 | 1
(13 rows)
算法 :
以info='test2'這個組為例 :
2 | test2 | 1 | 1 | 1 | 0 | 0.444444444444444
2 | test2 | 2 | 1 | 1 | 0 | 0.444444444444444
2 | test2 | 3 | 1 | 1 | 0 | 0.444444444444444
2 | test2 | 4 | 1 | 1 | 0 | 0.444444444444444
id=2 的 rank和dense_rank都是1.
percent_rank 返回 (rank - 1) / (total rows - 1), rank指當前rank值, rows指當前組的記錄數
cume_dist 返回(number of rows preceding or peer with current row) / (total rows), 截至當前記錄等級一共有多少行除以本組的總行數.
所以
percent_rank = (1-1)/(9-1)=0
cume_dist = (4)/(9) = 0.444444444444444
3 | test2 | 5 | 5 | 2 | 0.5 | 0.555555555555556
rank = 5, 跳級
dense_rank = 2, 不跳級
percent_rank = (5-1)/(9-1)=0.5
cume_dist = (5)/(9) = 0.555555555555556
接下來回到正題, 我們看看這些窗口函數的另一種用法, 聚合用法.
Each of the aggregates listed in Table 9-52 is associated with a window function of the same name defined in Section 9.21. In each case, the aggregate result is the value that the associated window function would have returned for the "hypothetical" row constructed from args, if such a row had been added to the sorted group of rows computed from the sorted_args.
Table 9-52. Hypothetical-Set Aggregate Functions
Function |
Direct Argument Type(s) |
Aggregated Argument Type(s) |
Return Type |
Description |
rank(args) WITHIN GROUP (ORDER BY sorted_args) |
VARIADIC "any" |
VARIADIC "any" |
bigint |
rank of the hypothetical row, with gaps for duplicate rows |
dense_rank(args) WITHIN GROUP (ORDER BY sorted_args) |
VARIADIC "any" |
VARIADIC "any" |
bigint |
rank of the hypothetical row, without gaps |
percent_rank(args) WITHIN GROUP (ORDER BYsorted_args) |
VARIADIC "any" |
VARIADIC "any" |
double precision |
relative rank of the hypothetical row, ranging from 0 to 1 |
cume_dist(args) WITHIN GROUP (ORDER BY sorted_args) |
VARIADIC "any" |
VARIADIC "any" |
double precision |
relative rank of the hypothetical row, ranging from 1/N to 1 |
For each of these hypothetical-set aggregates, the list of direct arguments given in args must match the number and types of the aggregated arguments given in sorted_args. Unlike most built-in aggregates, these aggregates are not strict, that is they do not drop input rows containing nulls. Null values sort according to the rule specified in the ORDER BY clause.
這些用法比較奇特, 其實是要返回給定參數在集合中的位置.
例如 :
如果我們給一個參數值是2.2, 應該排在以上數據中的第三行.
例子 :
postgres=# select * from test order by info,id;
id | info
------+-------
1 | test1
2 | test2
2 | test2
2 | test2
2 | test2
3 | test2
4 | test2
5 | test2
6 | test2
7 | test2
8 | test3
100 | test3
1000 | test4
(13 rows)
postgres=# select info,rank(4.9) within group (order by id),dense_rank(4.9) within group (order by id) from test group by info;
info | rank | dense_rank
-------+------+------------
test1 | 2 | 2
test2 | 7 | 4
test3 | 1 | 1
test4 | 1 | 1
(4 rows)
4.9在test1這個分組, 排名第2, 並且這個分組隻有1個值, 所以沒有gap.
重點關注test2這個組, 這個組有9個值, 其中有4個重複值2, 所以4.9在這裏排名需要考慮gap.
rank 返回7, 即4.9在這裏考慮GAP排名第7
dense_rank 返回4, 即4.9在這裏不考慮GAP排名第4.
又如 :
postgres=# select info,rank(5) within group (order by id),dense_rank(5) within group (order by id) from test group by info;
info | rank | dense_rank
-------+------+------------
test1 | 2 | 2
test2 | 7 | 4
test3 | 1 | 1
test4 | 1 | 1
(4 rows)
postgres=# select info,rank(5.1) within group (order by id),dense_rank(5.1) within group (order by id) from test group by info;
info | rank | dense_rank
-------+------+------------
test1 | 2 | 2
test2 | 8 | 5
test3 | 1 | 1
test4 | 1 | 1
(4 rows)
最後要看計算0~1代表位置的聚合函數percent_rank和cume_dist.
算法
percent_rank 返回 (rank - 1) / (total rows - 1), rank指當前rank值, rows指當前組的記錄數
cume_dist 返回(number of rows preceding or peer with current row) / (total rows), 截至當前記錄等級一共有多少行除以本組的總行數.
例子1 :
postgres=# select info,rank(4.9) within group (order by id),dense_rank(4.9) within group (order by id),percent_rank(4.9) within group (order by id),cume_dist(4.9) within group (order by id) from test group by info;
info | rank | dense_rank | percent_rank | cume_dist
-------+------+------------+-------------------+-------------------
test1 | 2 | 2 | 1 | 1
test2 | 7 | 4 | 0.666666666666667 | 0.7
test3 | 1 | 1 | 0 | 0.333333333333333
test4 | 1 | 1 | 0 | 0.5
(4 rows)
同樣以test2為分組, 講解算法. 把4.9插入到這個分組後. 數據應該變成 :
2 | test2
2 | test2
2 | test2
2 | test2
3 | test2
4 | test2
4.9 | test2 # 計算位置
5 | test2
6 | test2
7 | test2
一共10行.
percent_rank 返回 (rank - 1) / (total rows - 1), rank指當前rank值, rows指當前組的記錄數
cume_dist 返回(number of rows preceding or peer with current row) / (total rows), 截至當前記錄等級一共有多少行除以本組的總行數.
所以4.9對應的percent_rank 和 cume_dist 分別為 :
percent_rank = (rank - 1) / (total rows - 1) = (7-1)/(10-1) = 0.666666666666667
cume_dist = (7)/10 = 0.7
例子2 :
postgres=# select info,rank(5) within group (order by id),dense_rank(5) within group (order by id),percent_rank(5) within group (order by id),cume_dist(5) within group (order by id) from test group by info;
info | rank | dense_rank | percent_rank | cume_dist
-------+------+------------+-------------------+-------------------
test1 | 2 | 2 | 1 | 1
test2 | 7 | 4 | 0.666666666666667 | 0.8
test3 | 1 | 1 | 0 | 0.333333333333333
test4 | 1 | 1 | 0 | 0.5
(4 rows)
插入計算值5後, 數據變成
2 | test2
2 | test2
2 | test2
2 | test2
3 | test2
4 | test2
5 | test2 # 計算位置, 即參數值
5 | test2
6 | test2
7 | test2
依舊10行. 但是截至當前記錄等級一共有多少行? 注意是8了.
percent_rank = (rank - 1) / (total rows - 1) = (7-1)/(10-1) = 0.666666666666667
cume_dist = (8)/10 = 0.8
例子3 :
postgres=# select info,rank(5.1) within group (order by id),dense_rank(5.1) within group (order by id),percent_rank(5.1) within group (order by id),cume_dist(5.1) within group (order by id) from test group by info;
info | rank | dense_rank | percent_rank | cume_dist
-------+------+------------+-------------------+-------------------
test1 | 2 | 2 | 1 | 1
test2 | 8 | 5 | 0.777777777777778 | 0.8
test3 | 1 | 1 | 0 | 0.333333333333333
test4 | 1 | 1 | 0 | 0.5
(4 rows)
插入計算值5.1後, 數據變成 :
2 | test2
2 | test2
2 | test2
2 | test2
3 | test2
4 | test2
5 | test2
5.1 | test2 # 計算位置, 即參數值
6 | test2
7 | test2
結果自己驗證吧.
例子4 :
postgres=# select info,rank(5) within group (order by id desc),dense_rank(5) within group (order by id desc),percent_rank(5) within group (order by id desc),cume_dist(5) within group (order by id desc) from test group by info;
info | rank | dense_rank | percent_rank | cume_dist
-------+------+------------+-------------------+-----------
test1 | 1 | 1 | 0 | 0.5
test2 | 3 | 3 | 0.222222222222222 | 0.4
test3 | 3 | 3 | 1 | 1
test4 | 2 | 2 | 1 | 1
(4 rows)
插入計算值5後, 數據變成 :
7 | test2
6 | test2
5 | test2 # 注意, 這才是計算位置, 即插入位置.
5 | test2
4 | test2
3 | test2
2 | test2
2 | test2
2 | test2
2 | test2
結果自己驗證吧.
[參考]
最後更新:2017-04-01 13:38:50