閱讀773 返回首頁    go 阿裏雲 go 技術社區[雲棲]


一天學會PostgreSQL應用開發與管理 - 4 高級SQL用法

背景

本章大綱

1. 聚集函數

常用聚合函數

統計類的聚合函數

分組排序聚合

Hypothetical-Set聚合函數

多維分析

2. 子查詢

3. union\union all\except\intersect

4. 自連接

5. 內連接

優化器如何強製連接順序?

6. 外連接

左外連接

右外連接

全外連接

7. 窗口查詢

第二章:高級SQL用法

1. 聚集函數

https://www.postgresql.org/docs/9.6/static/functions-aggregate.html

常用聚合函數

Function Argument Type(s) Return Type Description
array_agg(expression) any array of the argument type input values, including nulls, concatenated into an array
avg(expression) smallint, int, bigint, real, double precision, numeric, or interval numeric for any integer-type argument, double precision for a floating-point argument, otherwise the same as the argument data type the average (arithmetic mean) of all input values
bit_and(expression) smallint, int, bigint, or bit same as argument data type the bitwise AND of all non-null input values, or null if none
bit_or(expression) smallint, int, bigint, or bit same as argument data type the bitwise OR of all non-null input values, or null if none
bool_and(expression) bool bool true if all input values are true, otherwise false
bool_or(expression) bool bool true if at least one input value is true, otherwise false
count(*) - bigint number of input rows
count(expression) any bigint number of input rows for which the value of expression is not null
every(expression) bool bool equivalent to bool_and
json_agg(expression) any json aggregates values as a JSON array
json_object_agg(name,value) (any, any) json aggregates name/value pairs as a JSON object
max(expression) any array, numeric, string, or date/time type same as argument type maximum value of expression across all input values
min(expression) any array, numeric, string, or date/time type same as argument type minimum value of expression across all input values
string_agg(expression,delimiter) (text, text) or (bytea, bytea) same as argument types input values concatenated into a string, separated by delimiter
sum(expression) smallint, int, bigint, real, double precision, numeric, interval, or money bigint for smallint or int arguments, numeric for bigint arguments, otherwise the same as the argument data type sum of expression across all input values
xmlagg(expression) xml xml concatenation of XML values (see alsoSection 9.14.1.7)

上圖中所有聚合函數, 當沒有行輸入時, 除了count返回0, 其他都返回null.

使用sum, array_agg時, 當沒有行輸入, 返回NULL可能有點別扭, 可以使用coalesce來替代NULL, 如coalesce(sum(x), 0)

coalesce(array_agg(x), '{}'::int[])

例子 :

聚合後得到數組, null將計入數組元素

postgres=# select array_agg(id) from (values(null),(1),(2)) as t(id);    
 array_agg      
------------    
 {NULL,1,2}    
(1 row)    

算平均值時不計算null

postgres=# select avg(id) from (values(null),(1),(2)) as t(id);    
        avg             
--------------------    
 1.5000000000000000    
(1 row)    

算bit與|或 時也不計算NULL

postgres=# select bit_and(id) from (values(null),(1),(2)) as t(id);    
 bit_and     
---------    
       0    
(1 row)    
postgres=# select bit_or(id) from (values(null),(1),(2)) as t(id);    
 bit_or     
--------    
      3    
(1 row)    

算布爾邏輯時也不計算NULL

postgres=# select bool_and(id) from (values(null),(true),(false)) as t(id);    
 bool_and     
----------    
 f    
(1 row)    

every是bool_and的別名, 實際上是SQL標準中定義的.

postgres=# select every(id) from (values(null),(true),(false)) as t(id);    
 every     
-------    
 f    
(1 row)    

SQL標準中還定義了any和some為bool_or的別名, 但是因為any和some還可以被解釋為子查詢, 所以在PostgreSQL中any和some的布爾邏輯聚合不可用.

postgres=# select any(id) from (values(null),(true),(false)) as t(id);    
ERROR:  syntax error at or near "any"    
LINE 1: select any(id) from (values(null),(true),(false)) as t(id);    
               ^    
postgres=# select some(id) from (values(null),(true),(false)) as t(id);    
ERROR:  syntax error at or near "some"    
LINE 1: select some(id) from (values(null),(true),(false)) as t(id);    
               ^    

bool_or的例子

postgres=# select bool_or(id) from (values(null),(true),(false)) as t(id);    
 bool_or     
---------    
 t    
(1 row)    

計算非空的表達式個數, count帶表達式時, 不計算null

postgres=# select count(id) from (values(null),(1),(2)) as t(id);    
 count     
-------    
     2    
(1 row)    

計算表達式(含空值)的個數, count()計算null, 注意count()是一個獨立的聚合函數. 請和count(express)區分開來.

postgres=# select count(*) from (values(null),(1),(2)) as t(id);    
 count     
-------    
     3    
(1 row)    
postgres=# select count(*) from (values(null),(null),(1),(2)) as t(id);    
 count     
-------    
     4    
(1 row)    

聚合後得到json, 不帶key的json聚合

postgres=# select json_agg(id) from (values(null),(true),(false)) as t(id);    
      json_agg           
---------------------    
 [null, true, false]    
(1 row)    

聚合後得到json, 帶key的json聚合, 注意key不能為null, 否則報錯.

postgres=# select json_object_agg(c1,c2) from (values('a',null),('b',true),('c',false)) as t(c1,c2);    
             json_object_agg                 
-----------------------------------------    
 { "a" : null, "b" : true, "c" : false }    
(1 row)    
postgres=# select json_object_agg(c1,c2) from (values(null,null),('b',true),('c',false)) as t(c1,c2);    
ERROR:  22023: field name must not be null    
LOCATION:  json_object_agg_transfn, json.c:1959    

計算最大最小值, max, min都不計算null

postgres=# select max(id) from (values(null),(1),(2)) as t(id);    
 max     
-----    
   2    
(1 row)    
postgres=# select min(id) from (values(null),(1),(2)) as t(id);    
 min     
-----    
   1    
(1 row)    

聚合後得到字符串, 字符串聚合

postgres=# select string_agg(c1,'***') from (values('a',null),('b',true),('c',false)) as t(c1,c2);    
 string_agg     
------------    
 a***b***c    
(1 row)    
postgres=# select string_agg(id,'***') from (values(null),('digoal'),('zhou')) as t(id);    
  string_agg       
---------------    
 digoal***zhou    
(1 row)    

計算總和, sum不計算null, 當所有行都是null時, 即沒有任何行輸入, 返回null.

postgres=# select sum(id) from (values(null),(1),(2)) as t(id);    
 sum     
-----    
   3    
(1 row)    
postgres=# select sum(id::int) from (values(null),(null),(null)) as t(id);    
 sum     
-----    
        
(1 row)    
(1 row)    

聚合後得到xml

postgres=# select xmlagg(id::xml) from (values(null),('<foo>digoal</foo>'),('<bar/>')) as t(id);    
         xmlagg              
-------------------------    
 <foo>digoal</foo><bar/>    
(1 row)    

某些聚合函數得到的結果可能和行的輸入順序有關, 例如array_agg, json_agg, json_object_agg, string_agg, and xmlagg, 以及某些自定義聚合函數. 如何來實現呢?

支持聚合函數中使用order by的PostgreSQL版本可以用如下語法 :

postgres=# select string_agg(id,'***' order by id) from (values(null),('digoal'),('zhou')) as t(id);    
  string_agg       
---------------    
 digoal***zhou    
(1 row)    
postgres=# select string_agg(id,'***' order by id desc) from (values(null),('digoal'),('zhou')) as t(id);    
  string_agg       
---------------    
 zhou***digoal    
(1 row)    

統計類的聚合函數

Function Argument Type Return Type Description
corr(Y, X) double precision double precision correlation coefficient
covar_pop(Y, X) double precision double precision population covariance
covar_samp(Y, X) double precision double precision sample covariance
regr_avgx(Y, X) double precision double precision average of the independent variable (sum(X)/N)
regr_avgy(Y, X) double precision double precision average of the dependent variable (sum(Y)/N)
regr_count(Y, X) double precision bigint number of input rows in which both expressions are nonnull
regr_intercept(Y, X) double precision double precision y-intercept of the least-squares-fit linear equation determined by the (X, Y) pairs
regr_r2(Y, X) double precision double precision square of the correlation coefficient
regr_slope(Y, X) double precision double precision slope of the least-squares-fit linear equation determined by the (X, Y) pairs
regr_sxx(Y, X) double precision double precision sum(X^2) - sum(X)^2/N ("sum of squares" of the independent variable)
regr_sxy(Y, X) double precision double precision sum(X*Y) - sum(X) * sum(Y)/N ("sum of products" of independent times dependent variable)
regr_syy(Y, X) double precision double precision sum(Y^2) - sum(Y)^2/N ("sum of squares" of the dependent variable)
stddev(expression) smallint, int, bigint, real, double precision, or numeric double precision for floating-point arguments, otherwise numeric historical alias for stddev_samp
stddev_pop(expression) smallint, int, bigint, real, double precision, or numeric double precision for floating-point arguments, otherwise numeric population standard deviation of the input values
stddev_samp(expression) smallint, int, bigint, real, double precision, or numeric double precision for floating-point arguments, otherwise numeric sample standard deviation of the input values
variance(expression) smallint, int, bigint, real, double precision, or numeric double precision for floating-point arguments, otherwise numeric historical alias for var_samp
var_pop(expression) smallint, int, bigint, real, double precision, or numeric double precision for floating-point arguments, otherwise numeric population variance of the input values (square of the population standard deviation)
var_samp(expression) smallint, int, bigint, real, double precision, or numeric double precision for floating-point arguments, otherwise numeric sample variance of the input values (square of the sample standard deviation)

相關性統計 :

corr, regr_r2

總體|樣本 方差, 標準方差 :

variance, var_pop, var_samp

stddev, stddev_pop, stddev_samp

總體協方差, 樣本協方差 :

covar_pop, covar_samp

線性回歸 :

regr_avgx, regr_avgy, regr_count, regr_intercept(截距), regr_r2(相關度corr的平方), regr_slope(斜率), regr_sxx, regr_sxy, regr_syy.

分組排序聚合

Function Direct Argument Type(s) Aggregated Argument Type(s) Return Type Description
mode() WITHIN GROUP (ORDER BYsort_expression) - any sortable type same as sort expression returns the most frequent input value (arbitrarily choosing the first one if there are multiple equally-frequent results)
percentile_cont(fraction) WITHIN GROUP (ORDER BY sort_expression) double precision double precisionor interval same as sort expression continuous percentile: returns a value corresponding to the specified fraction in the ordering, interpolating between adjacent input items if needed
percentile_cont(fractions) WITHIN GROUP (ORDER BY sort_expression) double precision[] double precisionor interval array of sort expression's type multiple continuous percentile: returns an array of results matching the shape of the fractionsparameter, with each non-null element replaced by the value corresponding to that percentile
percentile_disc(fraction) WITHIN GROUP (ORDER BY sort_expression) double precision any sortable type same as sort expression discrete percentile: returns the first input value whose position in the ordering equals or exceeds the specified fraction
percentile_disc(fractions) WITHIN GROUP (ORDER BY sort_expression) double precision[] any sortable type array of sort expression's type multiple discrete percentile: returns an array of results matching the shape of the fractionsparameter, with each non-null element replaced by the input value corresponding to that percentile

mode比較好理解, 就是取分組中出現頻率最高的值或表達式, 如果最高頻率的值有多個, 則隨機取一個.

postgres=# create table test(id int, info text);    
CREATE TABLE    
postgres=# insert into test values (1,'test1');    
INSERT 0 1    
postgres=# insert into test values (1,'test1');    
INSERT 0 1    
postgres=# insert into test values (1,'test2');    
INSERT 0 1    
postgres=# insert into test values (1,'test3');    
INSERT 0 1    
postgres=# insert into test values (2,'test1');    
INSERT 0 1    
postgres=# insert into test values (2,'test1');    
INSERT 0 1    
postgres=# insert into test values (2,'test1');    
INSERT 0 1    
postgres=# insert into test values (3,'test4');    
INSERT 0 1    
postgres=# insert into test values (3,'test4');    
INSERT 0 1    
postgres=# insert into test values (3,'test4');    
INSERT 0 1    
postgres=# insert into test values (3,'test4');    
INSERT 0 1    
postgres=# insert into test values (3,'test4');    
INSERT 0 1    
postgres=# select * from test;    
 id | info      
----+-------    
  1 | test1    
  1 | test1    
  1 | test2    
  1 | test3    
  2 | test1    
  2 | test1    
  2 | test1    
  3 | test4    
  3 | test4    
  3 | test4    
  3 | test4    
  3 | test4    
(12 rows)    

取出所有數據中, 出現頻率最高的info, 有可能是test1也有可能是test4, 因為他們的出現頻率一致.

mode的返回結果數據類型和order by後麵的表達式一致.

postgres=# select mode() within group (order by info) from test;    
 mode      
-------    
 test1    
(1 row)    

如果按INFO來分組的話, 取出出現頻率最高的info, 實際上這個操作是沒有任何意義的, 返回值就是所有記錄的info的唯一值.

postgres=# select mode() within group (order by info) from test group by info;    
 mode      
-------    
 test1    
 test2    
 test3    
 test4    
(4 rows)    

按id來分組, 取出組內出現頻率最高的info值, 這個是有意義的.

postgres=# select mode() within group (order by info) from test group by id;    
 mode      
-------    
 test1    
 test1    
 test4    
(3 rows)    

id=1 , 出現頻率最高的info是test1. 出現2次.

如下 :

postgres=# select id,info,count(*) from test group by id,info;    
 id | info  | count     
----+-------+-------    
  1 | test1 |     2    
  1 | test3 |     1    
  3 | test4 |     5    
  1 | test2 |     1    
  2 | test1 |     3    
(5 rows)    

如果要返回mode()並返回頻率次數. 可以使用row_number()窗口來實現. 如下.

postgres=# select id,info,cnt from (select id,info,cnt,row_number() over(partition by id order by cnt desc) as rn from (select id,info,count(*) cnt from test group by id,info) t) t where t.rn=1;    
 id | info  | cnt     
----+-------+-----    
  1 | test1 |   2    
  2 | test1 |   3    
  3 | test4 |   5    
(3 rows)    

其他, mode的返回結果數據類型和order by後麵的表達式一致.

postgres=# select mode() within group (order by id) from test;    
 mode     
------    
    3    
(1 row)    
postgres=# select mode() within group (order by id+1) from test;    
 mode     
------    
    4    
(1 row)    

另外還有4個函數是和數據分布有關的, 需要指定從0到1的分布位置. 返回排序後, 在指定分布位置的值或表達式的值.

src/backend/utils/adt/orderedsetaggs.c    
    
    
        if (percentile < 0 || percentile > 1 || isnan(percentile))    
                ereport(ERROR,    
                                (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),    
                                 errmsg("percentile value %g is not between 0 and 1",    
                                                percentile)));    

同時還需要注意區分連續分布和離散分布.

postgres=# create table test(id int, info text);    
CREATE TABLE    
postgres=# insert into test values (1,'test1');    
INSERT 0 1    
postgres=# insert into test values (2,'test2');    
INSERT 0 1    
postgres=# insert into test values (3,'test2');    
INSERT 0 1    
postgres=# insert into test values (4,'test2');    
INSERT 0 1    
postgres=# insert into test values (5,'test2');    
INSERT 0 1    
postgres=# insert into test values (6,'test2');    
INSERT 0 1    
postgres=# insert into test values (7,'test2');    
INSERT 0 1    
postgres=# insert into test values (8,'test3');    
INSERT 0 1    
postgres=# insert into test values (100,'test3');    
INSERT 0 1    
postgres=# insert into test values (1000,'test4');    
INSERT 0 1    
postgres=# select * from test;    
  id  | info      
------+-------    
    1 | test1    
    2 | test2    
    3 | test2    
    4 | test2    
    5 | test2    
    6 | test2    
    7 | test2    
    8 | test3    
  100 | test3    
 1000 | test4    
(10 rows)    

取連續分布的中位數可以用percentile_cont(0.5)來獲得.

postgres=# select percentile_cont(0.5) within group (order by id) from test;    
 percentile_cont     
-----------------    
             5.5    
(1 row)    

這個5.5是怎麼計算來的呢? 參考本文末尾 :

  If (CRN = FRN = RN) then the result is    
    (value of expression from row at RN)    
  Otherwise the result is    
    (CRN - RN) * (value of expression for row at FRN) +    
    (RN - FRN) * (value of expression for row at CRN)    

解釋 :

N = 當前分組的行數 = 10

RN = (1+傳入參數*(N-1)) = (1+0.5*(10-1)) = 5.5

CRN = ceiling(RN) = 6

FRN = floor(RN) = 5

value of expression for row at FRN : 當前分組內第FRN行的值 = 5

value of expression for row at CRN : 當前分組內第CRN行的值 = 6

所以最終中位數值 :

    (CRN - RN) * (value of expression for row at FRN) +    
    (RN - FRN) * (value of expression for row at CRN) =     
(6-5.5)*(5) + (5.5 - 5)*(6) = 5.5;    

使用info分組 :

postgres=# select percentile_cont(0.5) within group (order by id),info from test group by info;    
 percentile_cont | info      
-----------------+-------    
               1 | test1    
             4.5 | test2    
              54 | test3    
            1000 | test4    
(4 rows)    

驗證這個值 4.5 | test2 :

    2 | test2    
    3 | test2    
    4 | test2    
    5 | test2    
    6 | test2    
    7 | test2    

N = 當前分組的行數 = 6

RN = (1+傳入參數*(N-1)) = (1+0.5*(6-1)) = 3.5

CRN = ceiling(RN) = 4

FRN = floor(RN) = 3

value of expression for row at FRN : 當前分組內第FRN行的值 = 4

value of expression for row at CRN : 當前分組內第CRN行的值 = 5

所以最終中位數值 :

    (CRN - RN) * (value of expression for row at FRN) +    
    (RN - FRN) * (value of expression for row at CRN) =     
(4-3.5)*(4) + (3.5 - 3)*(5) = 4.5;    

當輸入參數為數組時, 返回值也是數組, 如下 :

postgres=# select percentile_cont(array[0.5, 1]) within group (order by id) from test;    
 percentile_cont     
-----------------    
 {5.5,1000}    
(1 row)    

接下來看一下稀疏分布 :

返回行號大於等於指定百分比的值或表達式值.

例如 :

postgres=# select id from test;    
  id      
------    
    1    
    2    
    3    
    4    
    5    
    6    
    7    
    8    
  100    
 1000    
(10 rows)    

當前組一共10行, 取位置在0.5的.即行號>=0.5*10的第一行的值或表達式的值.

postgres=# select percentile_disc(0.5) within group (order by id) from test;    
 percentile_disc     
-----------------    
               5    
(1 row)    
postgres=# select percentile_disc(0.5) within group (order by id^2) from test;    
 percentile_disc     
-----------------    
              25    
(1 row)    

輸入0.11, 表示行號返回>=1.1的第一行的值.

postgres=# select percentile_disc(0.11) within group (order by id) from test;    
 percentile_disc     
-----------------    
               2    
(1 row)    

再看個例子

postgres=# select id,info,count(*) over (partition by info) from test;    
  id  | info  | count     
------+-------+-------    
    1 | test1 |     1    
    2 | test2 |     6    
    3 | test2 |     6    
    4 | test2 |     6    
    5 | test2 |     6    
    6 | test2 |     6    
    7 | test2 |     6    
    8 | test3 |     2    
  100 | test3 |     2    
 1000 | test4 |     1    
(10 rows)    

取分組的數據, 主要看test2 這個組一共有6行, 0.3*6=1.8, 所以它需要取第二行的數據.

postgres=# select info,percentile_disc(0.3) within group (order by id) from test group by info;    
 info  | percentile_disc     
-------+-----------------    
 test1 |               1    
 test2 |               3    
 test3 |               8    
 test4 |            1000    
(4 rows)    

注意

最終計算的是表達式的分布數, 而不是計算列值的分布數後再計算表達式.

驗證如下 :

或者你可以看代碼 :

postgres=# select percentile_cont(0.5) within group (order by id^2),info from test group by info;    
 percentile_cont | info      
-----------------+-------    
               1 | test1    
            20.5 | test2    
            5032 | test3    
         1000000 | test4    
(4 rows)    
    
postgres=# select percentile_cont(0.5) within group (order by id),info from test group by info;    
 percentile_cont | info      
-----------------+-------    
               1 | test1    
             4.5 | test2    
              54 | test3    
            1000 | test4    
(4 rows)    
    
postgres=# select 4.5^2;    
      ?column?           
---------------------    
 20.2500000000000000    
(1 row)    
    
postgres=# select 54^2;    
 ?column?     
----------    
     2916    
(1 row)    

Hypothetical-Set聚合函數

函數 返回類型 描述
rank() bigint rank of the current row with gaps; same as row_number of its first peer
dense_rank() bigint rank of the current row without gaps; this function counts peer groups
percent_rank() double precision relative rank of the current row: (rank - 1) / (total rows - 1)
cume_dist() double precision relative rank of the current row: (number of rows preceding or peer with current row) / (total rows)

the four ranking functions are defined so that they give the same answer for any two peer rows.

rank 返回值在分組內的等級, 如果值有重複的話, 跳級處理.

dense_rank 返回值在分組內的等級, 如果值有重複的話, 不跳級處理.

percent_rank 返回 (rank - 1) / (total rows - 1), rank指當前rank值, rows指當前組的記錄數

cume_dist 返回(number of rows preceding or peer with current row) / (total rows), 即截至當前記錄等級一共有多少行除以本組的總行數.

看一個例子比較明白.

postgres=# select *,row_number() over(partition by info order by id),rank() over(partition by info order by id),dense_rank() over(partition by info order by id),percent_rank() over(partition by info order by id),cume_dist() over(partition by info order by id) from test;    
  id  | info  | row_number | rank | dense_rank | percent_rank |     cume_dist         
------+-------+------------+------+------------+--------------+-------------------    
    1 | test1 |          1 |    1 |          1 |            0 |                 1    
    2 | test2 |          1 |    1 |          1 |            0 | 0.444444444444444    
    2 | test2 |          2 |    1 |          1 |            0 | 0.444444444444444    
    2 | test2 |          3 |    1 |          1 |            0 | 0.444444444444444    
    2 | test2 |          4 |    1 |          1 |            0 | 0.444444444444444    
    3 | test2 |          5 |    5 |          2 |          0.5 | 0.555555555555556    
    4 | test2 |          6 |    6 |          3 |        0.625 | 0.666666666666667    
    5 | test2 |          7 |    7 |          4 |         0.75 | 0.777777777777778    
    6 | test2 |          8 |    8 |          5 |        0.875 | 0.888888888888889    
    7 | test2 |          9 |    9 |          6 |            1 |                 1    
    8 | test3 |          1 |    1 |          1 |            0 |               0.5    
  100 | test3 |          2 |    2 |          2 |            1 |                 1    
 1000 | test4 |          1 |    1 |          1 |            0 |                 1    
(13 rows)    

算法 :

以info='test2'這個組為例 :

    2 | test2 |          1 |    1 |          1 |            0 | 0.444444444444444    
    2 | test2 |          2 |    1 |          1 |            0 | 0.444444444444444    
    2 | test2 |          3 |    1 |          1 |            0 | 0.444444444444444    
    2 | test2 |          4 |    1 |          1 |            0 | 0.444444444444444    

id=2 的 rank和dense_rank都是1.

percent_rank 返回 (rank - 1) / (total rows - 1), rank指當前rank值, rows指當前組的記錄數

cume_dist 返回(number of rows preceding or peer with current row) / (total rows), 截至當前記錄等級一共有多少行除以本組的總行數.

所以

percent_rank = (1-1)/(9-1)=0    
cume_dist = (4)/(9) = 0.444444444444444    
    3 | test2 |          5 |    5 |          2 |          0.5 | 0.555555555555556    

rank = 5, 跳級

dense_rank = 2, 不跳級

percent_rank = (5-1)/(9-1)=0.5    
cume_dist = (5)/(9) = 0.555555555555556    

這些窗口函數的另一種用法, 聚合用法.

Function Direct Argument Type(s) Aggregated Argument Type(s) Return Type Description
rank(args) WITHIN GROUP (ORDER BY sorted_args) VARIADIC "any" VARIADIC "any" bigint rank of the hypothetical row, with gaps for duplicate rows
dense_rank(args) WITHIN GROUP (ORDER BY sorted_args) VARIADIC "any" VARIADIC "any" bigint rank of the hypothetical row, without gaps
percent_rank(args) WITHIN GROUP (ORDER BYsorted_args) VARIADIC "any" VARIADIC "any" double precision relative rank of the hypothetical row, ranging from 0 to 1
cume_dist(args) WITHIN GROUP (ORDER BY sorted_args) VARIADIC "any" VARIADIC "any" double precision relative rank of the hypothetical row, ranging from 1/N to 1

這些用法比較奇特, 其實是要返回給定參數在集合中的位置.

例如 :

1    
2    
3    
4    
5    

如果我們給一個參數值是2.2, 應該排在以上數據中的第三行.

例子 :

postgres=# select * from test order by info,id;    
  id  | info      
------+-------    
    1 | test1    
    2 | test2    
    2 | test2    
    2 | test2    
    2 | test2    
    3 | test2    
    4 | test2    
    5 | test2    
    6 | test2    
    7 | test2    
    8 | test3    
  100 | test3    
 1000 | test4    
(13 rows)    
    
postgres=# select info,rank(4.9) within group (order by id),dense_rank(4.9) within group (order by id) from test group by info;    
 info  | rank | dense_rank     
-------+------+------------    
 test1 |    2 |          2    
 test2 |    7 |          4    
 test3 |    1 |          1    
 test4 |    1 |          1    
(4 rows)    

4.9在test1這個分組, 排名第2, 並且這個分組隻有1個值, 所以沒有gap.

重點關注test2這個組, 這個組有9個值, 其中有4個重複值2, 所以4.9在這裏排名需要考慮gap.

rank 返回7, 即4.9在這裏考慮GAP排名第7

dense_rank 返回4, 即4.9在這裏不考慮GAP排名第4.

又如 :

postgres=# select info,rank(5) within group (order by id),dense_rank(5) within group (order by id) from test group by info;    
 info  | rank | dense_rank     
-------+------+------------    
 test1 |    2 |          2    
 test2 |    7 |          4    
 test3 |    1 |          1    
 test4 |    1 |          1    
(4 rows)    
postgres=# select info,rank(5.1) within group (order by id),dense_rank(5.1) within group (order by id) from test group by info;    
 info  | rank | dense_rank     
-------+------+------------    
 test1 |    2 |          2    
 test2 |    8 |          5    
 test3 |    1 |          1    
 test4 |    1 |          1    
(4 rows)    

最後要看計算0~1代表位置的聚合函數percent_rank和cume_dist.

算法

percent_rank 返回 (rank - 1) / (total rows - 1), rank指當前rank值, rows指當前組的記錄數

cume_dist 返回(number of rows preceding or peer with current row) / (total rows), 截至當前記錄等級一共有多少行除以本組的總行數.

例子1 :

postgres=# select info,rank(4.9) within group (order by id),dense_rank(4.9) within group (order by id),percent_rank(4.9) within group (order by id),cume_dist(4.9) within group (order by id) from test group by info;    
 info  | rank | dense_rank |   percent_rank    |     cume_dist         
-------+------+------------+-------------------+-------------------    
 test1 |    2 |          2 |                 1 |                 1    
 test2 |    7 |          4 | 0.666666666666667 |               0.7    
 test3 |    1 |          1 |                 0 | 0.333333333333333    
 test4 |    1 |          1 |                 0 |               0.5    
(4 rows)    

同樣以test2為分組, 講解算法. 把4.9插入到這個分組後. 數據應該變成 :

    2 | test2    
    2 | test2    
    2 | test2    
    2 | test2    
    3 | test2    
    4 | test2    
    4.9 | test2  # 計算位置    
    5 | test2    
    6 | test2    
    7 | test2    

一共10行.

percent_rank 返回 (rank - 1) / (total rows - 1), rank指當前rank值, rows指當前組的記錄數

cume_dist 返回(number of rows preceding or peer with current row) / (total rows), 截至當前記錄等級一共有多少行除以本組的總行數.

所以4.9對應的percent_rank 和 cume_dist 分別為 :

percent_rank = (rank - 1) / (total rows - 1) = (7-1)/(10-1) = 0.666666666666667     
cume_dist = (7)/10 = 0.7    

例子2 :

postgres=# select info,rank(5) within group (order by id),dense_rank(5) within group (order by id),percent_rank(5) within group (order by id),cume_dist(5) within group (order by id) from test group by info;    
 info  | rank | dense_rank |   percent_rank    |     cume_dist         
-------+------+------------+-------------------+-------------------    
 test1 |    2 |          2 |                 1 |                 1    
 test2 |    7 |          4 | 0.666666666666667 |               0.8    
 test3 |    1 |          1 |                 0 | 0.333333333333333    
 test4 |    1 |          1 |                 0 |               0.5    
(4 rows)    

插入計算值5後, 數據變成

    2 | test2    
    2 | test2    
    2 | test2    
    2 | test2    
    3 | test2    
    4 | test2    
    5 | test2  # 計算位置, 即參數值    
    5 | test2    
    6 | test2    
    7 | test2    

依舊10行. 但是截至當前記錄等級一共有多少行? 注意是8了.

percent_rank = (rank - 1) / (total rows - 1) = (7-1)/(10-1) = 0.666666666666667     
cume_dist = (8)/10 = 0.8    

例子3 :

postgres=# select info,rank(5.1) within group (order by id),dense_rank(5.1) within group (order by id),percent_rank(5.1) within group (order by id),cume_dist(5.1) within group (order by id) from test group by info;    
 info  | rank | dense_rank |   percent_rank    |     cume_dist         
-------+------+------------+-------------------+-------------------    
 test1 |    2 |          2 |                 1 |                 1    
 test2 |    8 |          5 | 0.777777777777778 |               0.8    
 test3 |    1 |          1 |                 0 | 0.333333333333333    
 test4 |    1 |          1 |                 0 |               0.5    
(4 rows)    

插入計算值5.1後, 數據變成 :

    2 | test2    
    2 | test2    
    2 | test2    
    2 | test2    
    3 | test2    
    4 | test2    
    5 | test2    
    5.1 | test2  # 計算位置, 即參數值    
    6 | test2    
    7 | test2    

例子4 :

postgres=# select info,rank(5) within group (order by id desc),dense_rank(5) within group (order by id desc),percent_rank(5) within group (order by id desc),cume_dist(5) within group (order by id desc) from test group by info;    
 info  | rank | dense_rank |   percent_rank    | cume_dist     
-------+------+------------+-------------------+-----------    
 test1 |    1 |          1 |                 0 |       0.5    
 test2 |    3 |          3 | 0.222222222222222 |       0.4    
 test3 |    3 |          3 |                 1 |         1    
 test4 |    2 |          2 |                 1 |         1    
(4 rows)    

插入計算值5後, 數據變成 :

    7 | test2    
    6 | test2    
    5 | test2  # 注意, 這才是計算位置, 即插入位置.    
    5 | test2    
    4 | test2    
    3 | test2    
    2 | test2    
    2 | test2    
    2 | test2    
    2 | test2    

多維分析

=> SELECT * FROM items_sold;    
 brand | size | sales    
-------+------+-------    
 Foo   | L    |  10    
 Foo   | M    |  20    
 Bar   | M    |  15    
 Bar   | L    |  5    
(4 rows)    
    
=> SELECT brand, size, sum(sales) FROM items_sold GROUP BY GROUPING SETS ((brand), (size), ());    
 brand | size | sum    
-------+------+-----    
 Foo   |      |  30    
 Bar   |      |  20    
       | L    |  15    
       | M    |  35    
       |      |  50    
(5 rows)    

其中GROUP BY GROUPING SETS ((brand), (size), ());

相當於以下三個group by的union all(未在分組的列以NULL代替。):

group by brand    
group by size    
group by ()    

分組集合除了可以用GROUPING SETS來指定,另外還提供了兩個特殊的寫法rollup和cube.

ROLLUP ( e1, e2, e3, ... )    

代表遞減分組,一般用於異構結構的分組如國家,省份,城市,鄉鎮這樣的結構查詢。

逐級分組匯總結果,它相當於如下寫法:

GROUPING SETS (    
    ( e1, e2, e3, ... ),    
    ...    
    ( e1, e2 )    
    ( e1 )    
    ( )  -- 注意包含全集    
)    

還有一種寫法是CUBE

CUBE ( a, b, c )    

cube是任意組合,相當於:

GROUPING SETS (    
    ( a, b, c ),    
    ( a, b    ),    
    ( a,    c ),    
    ( a       ),    
    (    b, c ),    
    (    b    ),    
    (       c ),    
    (         )    -- 注意包含全集    
)    

在cube和rollup中使用括號可以將多個表達式作為單個表達式來處理:

ROLLUP ( a, (b,c), d )    

遞減,相當於

GROUPING SETS (    
    ( a, b, c, d ),    
    ( a, b, c    ),    
    ( a          ),    
    (            )    
)    
CUBE ( (a,b), (c,d) )    

相當於:

GROUPING SETS (    
    ( a, b, c, d ),    
    ( a, b       ),    
    (       c, d ),    
    (            )    
)    

同時cube,rollup,grouping sets還可以混合使用:

GROUP BY a, CUBE(b,c), GROUPING SETS ((d), (e))    

相當於:

GROUP BY GROUPING SETS (    
  (a,b,c,d), (a,b,c,e),    
  (a,b,d),   (a,b,e),    
  (a,c,d),   (a,c,e),    
  (a,d),     (a,e)    
)    

既然分組聚合相當於多個group by 的union all,那麼我們如何區分當前輸出的記錄是哪個分組group by的輸出呢?

用grouping(cols)可以表示未參與聚合的表達式的比特位,並轉換為INT輸出。

例如:

=> SELECT * FROM items_sold;    
 make  | model | sales    
-------+-------+-------    
 Foo   | GT    |  10    
 Foo   | Tour  |  20    
 Bar   | City  |  15    
 Bar   | Sport |  5    
(4 rows)    

grouping()中必須包含group by後麵的任意或所有列。

=> SELECT make, model, GROUPING(make,model), sum(sales) FROM items_sold GROUP BY ROLLUP(make,model);    
 make  | model | grouping | sum    
-------+-------+----------+-----    
 Foo   | GT    |        0 | 10    
 Foo   | Tour  |        0 | 20    
 Bar   | City  |        0 | 15    
 Bar   | Sport |        0 | 5    
 Foo   |       |        1 | 30    
 Bar   |       |        1 | 20    
       |       |        3 | 50    
(7 rows)    

grouping()中必須包含group by後麵的任意或所有列,不能包含未參與聚合的列。

grouping()中的每個表達式用1個比特位表示。

postgres=# create table tbl(c1 int,c2 int,c3 int, c4 int, c5 int);    
CREATE TABLE    
postgres=# select c1,count(*),grouping(c1,c2,c3) from tbl group by cube(c1,c2,c3);    
 c1 | count | grouping     
----+-------+----------    
    |     0 |        7   -- b'111'::int = 7    
(1 row)    
    
postgres=# select c1,count(*),grouping(c1,c2) from tbl group by cube(c1,c2,c3);    
 c1 | count | grouping     
----+-------+----------    
    |     0 |        3  b'11'::int = 3    
(1 row)    
    
postgres=# select c1,count(*),grouping(c2) from tbl group by cube(c1,c2,c3);    
 c1 | count | grouping     
----+-------+----------    
    |     0 |        1  b'1'::int = 1    
(1 row)    
    
postgres=# insert into tbl values (1,2,3,4,5);    
INSERT 0 1    
postgres=# insert into tbl values (1,2,3,4,6);    
INSERT 0 1    
postgres=# insert into tbl values (2,3,4,5,6);    
INSERT 0 1    
postgres=# select c1,count(*),grouping(c2) from tbl group by cube(c1,c2,c3);    
 c1 | count | grouping     
----+-------+----------    
  1 |     2 |        0    
  1 |     2 |        0    
  1 |     2 |        1    
  2 |     1 |        0    
  2 |     1 |        0    
  2 |     1 |        1    
    |     3 |        1    
  1 |     2 |        1    
    |     2 |        1    
  2 |     1 |        1    
    |     1 |        1    
    |     2 |        0    
    |     2 |        0    
    |     1 |        0    
    |     1 |        0    
(15 rows)    

grouping中包含未參與聚合的列將報錯:

postgres=# select c1,count(*),grouping(c4) from tbl group by cube(c1,c2,c3);    
ERROR:  arguments to GROUPING must be grouping expressions of the associated query level    
LINE 1: select c1,count(*),grouping(c4) from tbl group by cube(c1,c2...    
                                    ^    
postgres=# select c1,count(*),grouping(c1,c2,c3,c4) from tbl group by cube(c1,c2,c3);    
ERROR:  arguments to GROUPING must be grouping expressions of the associated query level    
LINE 1: select c1,count(*),grouping(c1,c2,c3,c4) from tbl group by c...    
                                             ^    
postgres=# select c1,count(*),grouping(c1,c2,c3) from tbl group by cube(c1,c2,c3);    
 c1 | count | grouping     
----+-------+----------    
  1 |     2 |        0    
  1 |     2 |        1    
  1 |     2 |        3    
  2 |     1 |        0    
  2 |     1 |        1    
  2 |     1 |        3    
    |     3 |        7    
  1 |     2 |        2    
    |     2 |        6    
  2 |     1 |        2    
    |     1 |        6    
    |     2 |        4    
    |     2 |        5    
    |     1 |        4    
    |     1 |        5    
(15 rows)    

2. 子查詢

select子查詢隻能返回一列

postgres=# select (select * from (values (1,2),(2,3)) as t(c1,c2)) , relname, relkind from pg_class;    
ERROR:  subquery must return only one column    
LINE 1: select (select * from (values (1,2),(2,3)) as t(c1,c2)) , re...    
               ^    

select子查詢隻能返回一條記錄

postgres=# select (select * from (values (1),(2)) as t(c1)) , relname, relkind from pg_class;    
ERROR:  more than one row returned by a subquery used as an expression    

子查詢可以用在select子句也可以用在源中,還可以用在with, update from語句

postgres=# select (select * from (values (1),(2)) as t(c1) limit 1) , relname, relkind from pg_class;    
 c1 |                    relname                    | relkind     
----+-----------------------------------------------+---------    
  1 | pg_type                                       | r    
  1 | pg_toast_187550                               | t    
  1 | new_type                                      | c    
  1 | pg_toast_187550_index                         | i    
  1 | test                                          | r    
  1 | pg_toast_187556                               | t    
    
postgres=# select t.relname from (select * from pg_class limit 1) t , pg_class where t.relname=pg_class.relname;    
 relname     
---------    
 pg_type    
(1 row)    

3. union\union all\except\intersect

兩條或者多條查詢結果的合並、去重合並、相交、求差。

union all(不去重)

postgres=# select * from (values (1),(1)) t(id) union all select * from (values (2),(2)) t(id);  
 id   
----  
  1  
  1  
  2  
  2  
(4 rows)  

union(去重)

postgres=# select * from (values (1),(1)) t(id) union select * from (values (2),(2)) t(id);  
 id   
----  
  1  
  2  
(2 rows)  

except(去重)

postgres=# select * from (values (1),(1),(2)) t(id) except select * from (values (2),(2)) t(id);  
 id   
----  
  1  
(1 row)  

intersect(去重)

postgres=# select * from (values (1),(1),(2)) t(id) intersect select * from (values (2),(2)) t(id);  
 id   
----  
  2  
(1 row)  
  

4. 自連接

當某個表或者子句的A字段要和B字段進行關聯時,可以使用自關聯。

postgres=# create table tbl8(c1 int, c2 int, info text);  
CREATE TABLE  
postgres=# insert into tbl8 select generate_series(1,100), generate_series(2,101),md5(random()::text) ;  
INSERT 0 100  
  
postgres=# select t1.* from tbl8 t1, tbl8 t2 where t1.c1=t2.c2 and t2.c2<10 limit 10;  
 c1 | c2 |               info                 
----+----+----------------------------------  
  2 |  3 | b54fbfd843c2343330ceaa1758882ee4  
  3 |  4 | df999680d87435f2d2d8d5604aca2f1e  
  4 |  5 | 497ad2cfa8acd1f7062bb1bd7aa7a646  
  5 |  6 | ef55bcdc85dfa8d54978e4c2085ec55a  
  6 |  7 | 45f60117d6d577389707b22823a513e5  
  7 |  8 | ebde16ccaeced9a400a1608d591a2bf0  
  8 |  9 | fdeb1505dd02aca33abed20531592302  
  9 | 10 | 010d07f3b9b05b2c36b46440b1dd92aa  
(8 rows)  

5. 內連接

內連接,僅僅輸出符合連接條件的記錄。

digoal=# create table tbl_join_1(id int primary key,info text);  
CREATE TABLE  
digoal=# create table tbl_join_2(id int primary key,info text);  
CREATE TABLE  
digoal=# create table tbl_join_3(id int primary key,info text);  
CREATE TABLE  
digoal=# create table tbl_join_4(id int primary key,info text);  
CREATE TABLE  
digoal=# create table tbl_join_5(id int primary key,info text);  
CREATE TABLE  
digoal=# create table tbl_join_6(id int primary key,info text);  
CREATE TABLE  
digoal=# create table tbl_join_7(id int primary key,info text);  
CREATE TABLE  
digoal=# create table tbl_join_8(id int primary key,info text);  
CREATE TABLE  
digoal=# create table tbl_join_9(id int primary key,info text);  
CREATE TABLE  
  
digoal=# insert into tbl_join_1 select generate_series(1,10),md5(random()::text);  
digoal=# insert into tbl_join_2 select generate_series(1,100),md5(random()::text);  
digoal=# insert into tbl_join_3 select generate_series(1,1000),md5(random()::text);  
digoal=# insert into tbl_join_4 select generate_series(1,10000),md5(random()::text);  
digoal=# insert into tbl_join_5 select generate_series(1,100000),md5(random()::text);  
digoal=# insert into tbl_join_6 select generate_series(1,1000000),md5(random()::text);  
digoal=# insert into tbl_join_7 select generate_series(1,2000000),md5(random()::text);  
digoal=# insert into tbl_join_8 select generate_series(1,3000000),md5(random()::text);  
digoal=# insert into tbl_join_9 select generate_series(1,4000000),md5(random()::text);  

非顯示JOIN

postgres=# select t1.* from tbl_join_1 t1, tbl_join_2 t2 where t1.id=t2.id and t2.id=1;  
 id |               info                 
----+----------------------------------  
  1 | 5b6049ec7b94d17c0bd92dc4da436311  
(1 row)  

顯示JOIN

postgres=# select t1.* from tbl_join_1 t1 join tbl_join_2 t2 on(t1.id=t2.id and t2.id=1);  
 id |               info                 
----+----------------------------------  
  1 | 5b6049ec7b94d17c0bd92dc4da436311  
(1 row)  

優化器如何強製連接順序?

當join_collapse_limit=1時, 顯示JOIN的關聯按SQL寫法進行關聯. 例如 :

以下寫法非顯示關聯,無法強製JOIN順序

digoal=# set join_collapse_limit=1;  
  
digoal=# explain select t1.info, t5.info from tbl_join_1 t1,  
tbl_join_2 t2,  
tbl_join_3 t3,  
tbl_join_4 t4,  
tbl_join_5 t5,  
tbl_join_6 t6,  
tbl_join_7 t7,  
tbl_join_8 t8,  
tbl_join_9 t9  
where   
t1.id=t2.id and  
t2.id=t3.id and  
t3.id=t4.id and  
t4.id=t5.id and  
t5.id=t6.id and  
t6.id=t7.id and  
t7.id=t8.id and  
t8.id=t9.id and  
t9.id=10000;  
  
                                                              QUERY PLAN                                                                
--------------------------------------------------------------------------------------------------------------------------------------  
 Nested Loop  (cost=2.72..25.99 rows=1 width=65)  
   ->  Nested Loop  (cost=2.29..22.93 rows=1 width=69)  
         ->  Nested Loop  (cost=1.86..19.87 rows=1 width=69)  
               ->  Nested Loop  (cost=1.43..16.82 rows=1 width=69)  
                     ->  Nested Loop  (cost=1.00..13.76 rows=1 width=69)  
                           ->  Nested Loop  (cost=0.71..10.84 rows=1 width=36)  
                                 ->  Nested Loop  (cost=0.43..7.93 rows=1 width=36)  
                                       ->  Nested Loop  (cost=0.15..5.03 rows=1 width=36)  
                                             ->  Index Scan using tbl_join_1_pkey on tbl_join_1 t1  (cost=0.15..2.77 rows=1 width=36)  
                                                   Index Cond: (id = 10000)  
                                             ->  Seq Scan on tbl_join_2 t2  (cost=0.00..2.25 rows=1 width=4)  
                                                   Filter: (id = 10000)  
                                       ->  Index Only Scan using tbl_join_3_pkey on tbl_join_3 t3  (cost=0.28..2.89 rows=1 width=4)  
                                             Index Cond: (id = 10000)  
                                 ->  Index Only Scan using tbl_join_4_pkey on tbl_join_4 t4  (cost=0.29..2.90 rows=1 width=4)  
                                       Index Cond: (id = 10000)  
                           ->  Index Scan using tbl_join_5_pkey on tbl_join_5 t5  (cost=0.29..2.91 rows=1 width=37)  
                                 Index Cond: (id = 10000)  
                     ->  Index Only Scan using tbl_join_6_pkey on tbl_join_6 t6  (cost=0.42..3.04 rows=1 width=4)  
                           Index Cond: (id = 10000)  
               ->  Index Only Scan using tbl_join_7_pkey on tbl_join_7 t7  (cost=0.43..3.04 rows=1 width=4)  
                     Index Cond: (id = 10000)  
         ->  Index Only Scan using tbl_join_8_pkey on tbl_join_8 t8  (cost=0.43..3.05 rows=1 width=4)  
               Index Cond: (id = 10000)  
   ->  Index Only Scan using tbl_join_9_pkey on tbl_join_9 t9  (cost=0.43..3.05 rows=1 width=4)  
         Index Cond: (id = 10000)  
(26 rows)  
  
Time: 8.398 ms  

以下寫法顯示關聯,可以強製JOIN順序

digoal=# set join_collapse_limit=1;  
  
digoal=# explain select t1.info, t5.info from   
tbl_join_1 t1 join tbl_join_2 t2 on (t1.id=t2.id)  
join tbl_join_3 t3 on (t2.id=t3.id)  
join tbl_join_4 t4 on (t3.id=t4.id)  
join tbl_join_5 t5 on (t4.id=t5.id)  
join tbl_join_6 t6 on (t5.id=t6.id)  
join tbl_join_7 t7 on (t6.id=t7.id)  
join tbl_join_8 t8 on (t7.id=t8.id)  
join tbl_join_9 t9 on (t8.id=t9.id)  
where t9.id=10000;  
  
                                                              QUERY PLAN                                                                
--------------------------------------------------------------------------------------------------------------------------------------  
 Nested Loop  (cost=2.72..25.99 rows=1 width=65)  
   ->  Nested Loop  (cost=2.29..22.93 rows=1 width=69)  
         ->  Nested Loop  (cost=1.86..19.87 rows=1 width=69)  
               ->  Nested Loop  (cost=1.43..16.82 rows=1 width=69)  
                     ->  Nested Loop  (cost=1.00..13.76 rows=1 width=69)  
                           ->  Nested Loop  (cost=0.71..10.84 rows=1 width=36)  
                                 ->  Nested Loop  (cost=0.43..7.93 rows=1 width=36)  
                                       ->  Nested Loop  (cost=0.15..5.03 rows=1 width=36)  
                                             ->  Index Scan using tbl_join_1_pkey on tbl_join_1 t1  (cost=0.15..2.77 rows=1 width=36)  
                                                   Index Cond: (id = 10000)  
                                             ->  Seq Scan on tbl_join_2 t2  (cost=0.00..2.25 rows=1 width=4)  
                                                   Filter: (id = 10000)  
                                       ->  Index Only Scan using tbl_join_3_pkey on tbl_join_3 t3  (cost=0.28..2.89 rows=1 width=4)  
                                             Index Cond: (id = 10000)  
                                 ->  Index Only Scan using tbl_join_4_pkey on tbl_join_4 t4  (cost=0.29..2.90 rows=1 width=4)  
                                       Index Cond: (id = 10000)  
                           ->  Index Scan using tbl_join_5_pkey on tbl_join_5 t5  (cost=0.29..2.91 rows=1 width=37)  
                                 Index Cond: (id = 10000)  
                     ->  Index Only Scan using tbl_join_6_pkey on tbl_join_6 t6  (cost=0.42..3.04 rows=1 width=4)  
                           Index Cond: (id = 10000)  
               ->  Index Only Scan using tbl_join_7_pkey on tbl_join_7 t7  (cost=0.43..3.04 rows=1 width=4)  
                     Index Cond: (id = 10000)  
         ->  Index Only Scan using tbl_join_8_pkey on tbl_join_8 t8  (cost=0.43..3.05 rows=1 width=4)  
               Index Cond: (id = 10000)  
   ->  Index Only Scan using tbl_join_9_pkey on tbl_join_9 t9  (cost=0.43..3.05 rows=1 width=4)  
         Index Cond: (id = 10000)  
(26 rows)  
  
Time: 0.829 ms  

設置join_collapse_limit=1後, 按照SQL寫法進行關聯. 執行計劃的時間也縮短了.

6. 外連接

左外連接

左邊的表,即使沒有匹配,也輸出

postgres=# create table tab1(id int, info text, crt_time timestamp);  
CREATE TABLE  
Time: 26.056 ms  
postgres=# create table tab2(id int, info text, crt_time timestamp);  
CREATE TABLE  
Time: 36.215 ms  
postgres=# insert into tab1 values (1,'test',now());  
INSERT 0 1  
Time: 0.520 ms  
postgres=# insert into tab1 values (2,'test',now());  
INSERT 0 1  
Time: 0.297 ms  
postgres=# insert into tab2 values (2,'test',now());  
INSERT 0 1  
Time: 11.325 ms  
postgres=# insert into tab2 values (3,'test',now());  
INSERT 0 1  
Time: 0.352 ms  
postgres=# select * from tab1;  
 id | info |          crt_time            
----+------+----------------------------  
  1 | test | 2017-04-11 17:48:29.37083  
  2 | test | 2017-04-11 17:48:32.742795  
(2 rows)  
  
Time: 0.506 ms  
postgres=# select * from tab2;  
 id | info |          crt_time            
----+------+----------------------------  
  2 | test | 2017-04-11 17:48:39.722821  
  3 | test | 2017-04-11 17:48:41.901834  
(2 rows)  
  
Time: 0.335 ms  

右表沒有被匹配時,輸出NULL

postgres=# select * from tab1 left join tab2 on (tab1.id=tab2.id);  
 id | info |          crt_time          | id | info |          crt_time            
----+------+----------------------------+----+------+----------------------------  
  1 | test | 2017-04-11 17:48:29.37083  |    |      |   
  2 | test | 2017-04-11 17:48:32.742795 |  2 | test | 2017-04-11 17:48:39.722821  
(2 rows)  

過濾在A表但是不在B表的記錄。

postgres=# select tab1.* from tab1 left join tab2 on (tab1.id=tab2.id) where tab2.* is null;  
 id | info |         crt_time            
----+------+---------------------------  
  1 | test | 2017-04-11 17:48:29.37083  
(1 row)  
  
postgres=# select * from tab1 where id in (select id from tab1 except select id from tab2);  
 id | info |         crt_time            
----+------+---------------------------  
  1 | test | 2017-04-11 17:48:29.37083  
(1 row)  

通過外部where filter,可以過濾連接=false的記錄

postgres=# select * from tab1 left join tab2 on (tab1.id=tab2.id) where tab1.id=1;  
 id | info |         crt_time          | id | info | crt_time   
----+------+---------------------------+----+------+----------  
  1 | test | 2017-04-11 17:48:29.37083 |    |      |   
(1 row)  
  
postgres=# explain select * from tab1 left join tab2 on (tab1.id=tab2.id) where tab1.id=1;  
                            QUERY PLAN                              
------------------------------------------------------------------  
 Nested Loop Left Join  (cost=0.00..48.81 rows=36 width=88)  
   Join Filter: (tab1.id = tab2.id)  
   ->  Seq Scan on tab1  (cost=0.00..24.12 rows=6 width=44)  
         Filter: (id = 1)  
   ->  Materialize  (cost=0.00..24.16 rows=6 width=44)  
         ->  Seq Scan on tab2  (cost=0.00..24.12 rows=6 width=44)  
               Filter: (id = 1)  
(7 rows)  

join內的條件,不會過濾未連接的記錄

postgres=# select * from tab1 left join tab2 on (tab1.id=tab2.id and tab1.id=1);  
 id | info |          crt_time          | id | info | crt_time   
----+------+----------------------------+----+------+----------  
  1 | test | 2017-04-11 17:48:29.37083  |    |      |   
  2 | test | 2017-04-11 17:48:32.742795 |    |      |   
(2 rows)  
  
postgres=# explain select * from tab1 left join tab2 on (tab1.id=tab2.id and tab1.id=1);  
                             QUERY PLAN                                
---------------------------------------------------------------------  
 Hash Left Join  (cost=35.42..298.96 rows=1130 width=88)  
   Hash Cond: (tab1.id = tab2.id)  
   Join Filter: (tab1.id = 1)  
   ->  Seq Scan on tab1  (cost=0.00..21.30 rows=1130 width=44)  
   ->  Hash  (cost=21.30..21.30 rows=1130 width=44)  
         ->  Seq Scan on tab2  (cost=0.00..21.30 rows=1130 width=44)  
(6 rows)  

右外連接

左表沒有被匹配時,輸出NULL

postgres=# select * from tab1 right join tab2 on (tab1.id=tab2.id);  
 id | info |          crt_time          | id | info |          crt_time            
----+------+----------------------------+----+------+----------------------------  
  2 | test | 2017-04-11 17:48:32.742795 |  2 | test | 2017-04-11 17:48:39.722821  
    |      |                            |  3 | test | 2017-04-11 17:48:41.901834  
(2 rows)  

全外連接

沒有被匹配時,輸出NULL

postgres=# select * from tab1 full join tab2 on (tab1.id=tab2.id);  
 id | info |          crt_time          | id | info |          crt_time            
----+------+----------------------------+----+------+----------------------------  
  1 | test | 2017-04-11 17:48:29.37083  |    |      |   
  2 | test | 2017-04-11 17:48:32.742795 |  2 | test | 2017-04-11 17:48:39.722821  
    |      |                            |  3 | test | 2017-04-11 17:48:41.901834  
(3 rows)  

7. 窗口查詢

https://www.postgresql.org/docs/9.6/static/functions-window.html

常用窗口函數

Function Return Type Description
row_number() bigint number of the current row within its partition, counting from 1
rank() bigint rank of the current row with gaps; same as row_number of its first peer
dense_rank() bigint rank of the current row without gaps; this function counts peer groups
percent_rank() double precision relativ

最後更新:2017-04-12 23:25:18

  上一篇:go DB2數據庫運行維護
  下一篇:go 10 種機器學習算法的要點(附 Python 和 R 代碼)(轉載)