一天学会PostgreSQL应用开发与管理 - 4 高级SQL用法
背景
本章大纲
1. 聚集函数
常用聚合函数
统计类的聚合函数
分组排序聚合
Hypothetical-Set聚合函数
多维分析
2. 子查询
3. union\union all\except\intersect
4. 自连接
5. 内连接
优化器如何强制连接顺序?
6. 外连接
左外连接
右外连接
全外连接
7. 窗口查询
第二章:高级SQL用法
1. 聚集函数
https://www.postgresql.org/docs/9.6/static/functions-aggregate.html
常用聚合函数
Function | Argument Type(s) | Return Type | Description |
---|---|---|---|
array_agg(expression) | any | array of the argument type | input values, including nulls, concatenated into an array |
avg(expression) | smallint, int, bigint, real, double precision, numeric, or interval | numeric for any integer-type argument, double precision for a floating-point argument, otherwise the same as the argument data type | the average (arithmetic mean) of all input values |
bit_and(expression) | smallint, int, bigint, or bit | same as argument data type | the bitwise AND of all non-null input values, or null if none |
bit_or(expression) | smallint, int, bigint, or bit | same as argument data type | the bitwise OR of all non-null input values, or null if none |
bool_and(expression) | bool | bool | true if all input values are true, otherwise false |
bool_or(expression) | bool | bool | true if at least one input value is true, otherwise false |
count(*) | - | bigint | number of input rows |
count(expression) | any | bigint | number of input rows for which the value of expression is not null |
every(expression) | bool | bool | equivalent to bool_and |
json_agg(expression) | any | json | aggregates values as a JSON array |
json_object_agg(name,value) | (any, any) | json | aggregates name/value pairs as a JSON object |
max(expression) | any array, numeric, string, or date/time type | same as argument type | maximum value of expression across all input values |
min(expression) | any array, numeric, string, or date/time type | same as argument type | minimum value of expression across all input values |
string_agg(expression,delimiter) | (text, text) or (bytea, bytea) | same as argument types | input values concatenated into a string, separated by delimiter |
sum(expression) | smallint, int, bigint, real, double precision, numeric, interval, or money | bigint for smallint or int arguments, numeric for bigint arguments, otherwise the same as the argument data type | sum of expression across all input values |
xmlagg(expression) | xml | xml | concatenation of XML values (see alsoSection 9.14.1.7) |
上图中所有聚合函数, 当没有行输入时, 除了count返回0, 其他都返回null.
使用sum, array_agg时, 当没有行输入, 返回NULL可能有点别扭, 可以使用coalesce来替代NULL, 如coalesce(sum(x), 0)
coalesce(array_agg(x), '{}'::int[])
例子 :
聚合后得到数组, null将计入数组元素
postgres=# select array_agg(id) from (values(null),(1),(2)) as t(id);
array_agg
------------
{NULL,1,2}
(1 row)
算平均值时不计算null
postgres=# select avg(id) from (values(null),(1),(2)) as t(id);
avg
--------------------
1.5000000000000000
(1 row)
算bit与|或 时也不计算NULL
postgres=# select bit_and(id) from (values(null),(1),(2)) as t(id);
bit_and
---------
0
(1 row)
postgres=# select bit_or(id) from (values(null),(1),(2)) as t(id);
bit_or
--------
3
(1 row)
算布尔逻辑时也不计算NULL
postgres=# select bool_and(id) from (values(null),(true),(false)) as t(id);
bool_and
----------
f
(1 row)
every是bool_and的别名, 实际上是SQL标准中定义的.
postgres=# select every(id) from (values(null),(true),(false)) as t(id);
every
-------
f
(1 row)
SQL标准中还定义了any和some为bool_or的别名, 但是因为any和some还可以被解释为子查询, 所以在PostgreSQL中any和some的布尔逻辑聚合不可用.
postgres=# select any(id) from (values(null),(true),(false)) as t(id);
ERROR: syntax error at or near "any"
LINE 1: select any(id) from (values(null),(true),(false)) as t(id);
^
postgres=# select some(id) from (values(null),(true),(false)) as t(id);
ERROR: syntax error at or near "some"
LINE 1: select some(id) from (values(null),(true),(false)) as t(id);
^
bool_or的例子
postgres=# select bool_or(id) from (values(null),(true),(false)) as t(id);
bool_or
---------
t
(1 row)
计算非空的表达式个数, count带表达式时, 不计算null
postgres=# select count(id) from (values(null),(1),(2)) as t(id);
count
-------
2
(1 row)
计算表达式(含空值)的个数, count()计算null, 注意count()是一个独立的聚合函数. 请和count(express)区分开来.
postgres=# select count(*) from (values(null),(1),(2)) as t(id);
count
-------
3
(1 row)
postgres=# select count(*) from (values(null),(null),(1),(2)) as t(id);
count
-------
4
(1 row)
聚合后得到json, 不带key的json聚合
postgres=# select json_agg(id) from (values(null),(true),(false)) as t(id);
json_agg
---------------------
[null, true, false]
(1 row)
聚合后得到json, 带key的json聚合, 注意key不能为null, 否则报错.
postgres=# select json_object_agg(c1,c2) from (values('a',null),('b',true),('c',false)) as t(c1,c2);
json_object_agg
-----------------------------------------
{ "a" : null, "b" : true, "c" : false }
(1 row)
postgres=# select json_object_agg(c1,c2) from (values(null,null),('b',true),('c',false)) as t(c1,c2);
ERROR: 22023: field name must not be null
LOCATION: json_object_agg_transfn, json.c:1959
计算最大最小值, max, min都不计算null
postgres=# select max(id) from (values(null),(1),(2)) as t(id);
max
-----
2
(1 row)
postgres=# select min(id) from (values(null),(1),(2)) as t(id);
min
-----
1
(1 row)
聚合后得到字符串, 字符串聚合
postgres=# select string_agg(c1,'***') from (values('a',null),('b',true),('c',false)) as t(c1,c2);
string_agg
------------
a***b***c
(1 row)
postgres=# select string_agg(id,'***') from (values(null),('digoal'),('zhou')) as t(id);
string_agg
---------------
digoal***zhou
(1 row)
计算总和, sum不计算null, 当所有行都是null时, 即没有任何行输入, 返回null.
postgres=# select sum(id) from (values(null),(1),(2)) as t(id);
sum
-----
3
(1 row)
postgres=# select sum(id::int) from (values(null),(null),(null)) as t(id);
sum
-----
(1 row)
(1 row)
聚合后得到xml
postgres=# select xmlagg(id::xml) from (values(null),('<foo>digoal</foo>'),('<bar/>')) as t(id);
xmlagg
-------------------------
<foo>digoal</foo><bar/>
(1 row)
某些聚合函数得到的结果可能和行的输入顺序有关, 例如array_agg, json_agg, json_object_agg, string_agg, and xmlagg, 以及某些自定义聚合函数. 如何来实现呢?
支持聚合函数中使用order by的PostgreSQL版本可以用如下语法 :
postgres=# select string_agg(id,'***' order by id) from (values(null),('digoal'),('zhou')) as t(id);
string_agg
---------------
digoal***zhou
(1 row)
postgres=# select string_agg(id,'***' order by id desc) from (values(null),('digoal'),('zhou')) as t(id);
string_agg
---------------
zhou***digoal
(1 row)
统计类的聚合函数
Function | Argument Type | Return Type | Description |
---|---|---|---|
corr(Y, X) | double precision | double precision | correlation coefficient |
covar_pop(Y, X) | double precision | double precision | population covariance |
covar_samp(Y, X) | double precision | double precision | sample covariance |
regr_avgx(Y, X) | double precision | double precision | average of the independent variable (sum(X)/N) |
regr_avgy(Y, X) | double precision | double precision | average of the dependent variable (sum(Y)/N) |
regr_count(Y, X) | double precision | bigint | number of input rows in which both expressions are nonnull |
regr_intercept(Y, X) | double precision | double precision | y-intercept of the least-squares-fit linear equation determined by the (X, Y) pairs |
regr_r2(Y, X) | double precision | double precision | square of the correlation coefficient |
regr_slope(Y, X) | double precision | double precision | slope of the least-squares-fit linear equation determined by the (X, Y) pairs |
regr_sxx(Y, X) | double precision | double precision | sum(X^2) - sum(X)^2/N ("sum of squares" of the independent variable) |
regr_sxy(Y, X) | double precision | double precision | sum(X*Y) - sum(X) * sum(Y)/N ("sum of products" of independent times dependent variable) |
regr_syy(Y, X) | double precision | double precision | sum(Y^2) - sum(Y)^2/N ("sum of squares" of the dependent variable) |
stddev(expression) | smallint, int, bigint, real, double precision, or numeric | double precision for floating-point arguments, otherwise numeric | historical alias for stddev_samp |
stddev_pop(expression) | smallint, int, bigint, real, double precision, or numeric | double precision for floating-point arguments, otherwise numeric | population standard deviation of the input values |
stddev_samp(expression) | smallint, int, bigint, real, double precision, or numeric | double precision for floating-point arguments, otherwise numeric | sample standard deviation of the input values |
variance(expression) | smallint, int, bigint, real, double precision, or numeric | double precision for floating-point arguments, otherwise numeric | historical alias for var_samp |
var_pop(expression) | smallint, int, bigint, real, double precision, or numeric | double precision for floating-point arguments, otherwise numeric | population variance of the input values (square of the population standard deviation) |
var_samp(expression) | smallint, int, bigint, real, double precision, or numeric | double precision for floating-point arguments, otherwise numeric | sample variance of the input values (square of the sample standard deviation) |
相关性统计 :
corr, regr_r2
总体|样本 方差, 标准方差 :
variance, var_pop, var_samp
stddev, stddev_pop, stddev_samp
总体协方差, 样本协方差 :
covar_pop, covar_samp
线性回归 :
regr_avgx, regr_avgy, regr_count, regr_intercept(截距), regr_r2(相关度corr的平方), regr_slope(斜率), regr_sxx, regr_sxy, regr_syy.
分组排序聚合
Function | Direct Argument Type(s) | Aggregated Argument Type(s) | Return Type | Description |
---|---|---|---|---|
mode() WITHIN GROUP (ORDER BYsort_expression) | - | any sortable type | same as sort expression | returns the most frequent input value (arbitrarily choosing the first one if there are multiple equally-frequent results) |
percentile_cont(fraction) WITHIN GROUP (ORDER BY sort_expression) | double precision | double precisionor interval | same as sort expression | continuous percentile: returns a value corresponding to the specified fraction in the ordering, interpolating between adjacent input items if needed |
percentile_cont(fractions) WITHIN GROUP (ORDER BY sort_expression) | double precision[] | double precisionor interval | array of sort expression's type | multiple continuous percentile: returns an array of results matching the shape of the fractionsparameter, with each non-null element replaced by the value corresponding to that percentile |
percentile_disc(fraction) WITHIN GROUP (ORDER BY sort_expression) | double precision | any sortable type | same as sort expression | discrete percentile: returns the first input value whose position in the ordering equals or exceeds the specified fraction |
percentile_disc(fractions) WITHIN GROUP (ORDER BY sort_expression) | double precision[] | any sortable type | array of sort expression's type | multiple discrete percentile: returns an array of results matching the shape of the fractionsparameter, with each non-null element replaced by the input value corresponding to that percentile |
mode比较好理解, 就是取分组中出现频率最高的值或表达式, 如果最高频率的值有多个, 则随机取一个.
postgres=# create table test(id int, info text);
CREATE TABLE
postgres=# insert into test values (1,'test1');
INSERT 0 1
postgres=# insert into test values (1,'test1');
INSERT 0 1
postgres=# insert into test values (1,'test2');
INSERT 0 1
postgres=# insert into test values (1,'test3');
INSERT 0 1
postgres=# insert into test values (2,'test1');
INSERT 0 1
postgres=# insert into test values (2,'test1');
INSERT 0 1
postgres=# insert into test values (2,'test1');
INSERT 0 1
postgres=# insert into test values (3,'test4');
INSERT 0 1
postgres=# insert into test values (3,'test4');
INSERT 0 1
postgres=# insert into test values (3,'test4');
INSERT 0 1
postgres=# insert into test values (3,'test4');
INSERT 0 1
postgres=# insert into test values (3,'test4');
INSERT 0 1
postgres=# select * from test;
id | info
----+-------
1 | test1
1 | test1
1 | test2
1 | test3
2 | test1
2 | test1
2 | test1
3 | test4
3 | test4
3 | test4
3 | test4
3 | test4
(12 rows)
取出所有数据中, 出现频率最高的info, 有可能是test1也有可能是test4, 因为他们的出现频率一致.
mode的返回结果数据类型和order by后面的表达式一致.
postgres=# select mode() within group (order by info) from test;
mode
-------
test1
(1 row)
如果按INFO来分组的话, 取出出现频率最高的info, 实际上这个操作是没有任何意义的, 返回值就是所有记录的info的唯一值.
postgres=# select mode() within group (order by info) from test group by info;
mode
-------
test1
test2
test3
test4
(4 rows)
按id来分组, 取出组内出现频率最高的info值, 这个是有意义的.
postgres=# select mode() within group (order by info) from test group by id;
mode
-------
test1
test1
test4
(3 rows)
id=1 , 出现频率最高的info是test1. 出现2次.
如下 :
postgres=# select id,info,count(*) from test group by id,info;
id | info | count
----+-------+-------
1 | test1 | 2
1 | test3 | 1
3 | test4 | 5
1 | test2 | 1
2 | test1 | 3
(5 rows)
如果要返回mode()并返回频率次数. 可以使用row_number()窗口来实现. 如下.
postgres=# select id,info,cnt from (select id,info,cnt,row_number() over(partition by id order by cnt desc) as rn from (select id,info,count(*) cnt from test group by id,info) t) t where t.rn=1;
id | info | cnt
----+-------+-----
1 | test1 | 2
2 | test1 | 3
3 | test4 | 5
(3 rows)
其他, mode的返回结果数据类型和order by后面的表达式一致.
postgres=# select mode() within group (order by id) from test;
mode
------
3
(1 row)
postgres=# select mode() within group (order by id+1) from test;
mode
------
4
(1 row)
另外还有4个函数是和数据分布有关的, 需要指定从0到1的分布位置. 返回排序后, 在指定分布位置的值或表达式的值.
src/backend/utils/adt/orderedsetaggs.c
if (percentile < 0 || percentile > 1 || isnan(percentile))
ereport(ERROR,
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
errmsg("percentile value %g is not between 0 and 1",
percentile)));
同时还需要注意区分连续分布和离散分布.
postgres=# create table test(id int, info text);
CREATE TABLE
postgres=# insert into test values (1,'test1');
INSERT 0 1
postgres=# insert into test values (2,'test2');
INSERT 0 1
postgres=# insert into test values (3,'test2');
INSERT 0 1
postgres=# insert into test values (4,'test2');
INSERT 0 1
postgres=# insert into test values (5,'test2');
INSERT 0 1
postgres=# insert into test values (6,'test2');
INSERT 0 1
postgres=# insert into test values (7,'test2');
INSERT 0 1
postgres=# insert into test values (8,'test3');
INSERT 0 1
postgres=# insert into test values (100,'test3');
INSERT 0 1
postgres=# insert into test values (1000,'test4');
INSERT 0 1
postgres=# select * from test;
id | info
------+-------
1 | test1
2 | test2
3 | test2
4 | test2
5 | test2
6 | test2
7 | test2
8 | test3
100 | test3
1000 | test4
(10 rows)
取连续分布的中位数可以用percentile_cont(0.5)来获得.
postgres=# select percentile_cont(0.5) within group (order by id) from test;
percentile_cont
-----------------
5.5
(1 row)
这个5.5是怎么计算来的呢? 参考本文末尾 :
If (CRN = FRN = RN) then the result is
(value of expression from row at RN)
Otherwise the result is
(CRN - RN) * (value of expression for row at FRN) +
(RN - FRN) * (value of expression for row at CRN)
解释 :
N = 当前分组的行数 = 10
RN = (1+传入参数*(N-1)) = (1+0.5*(10-1)) = 5.5
CRN = ceiling(RN) = 6
FRN = floor(RN) = 5
value of expression for row at FRN : 当前分组内第FRN行的值 = 5
value of expression for row at CRN : 当前分组内第CRN行的值 = 6
所以最终中位数值 :
(CRN - RN) * (value of expression for row at FRN) +
(RN - FRN) * (value of expression for row at CRN) =
(6-5.5)*(5) + (5.5 - 5)*(6) = 5.5;
使用info分组 :
postgres=# select percentile_cont(0.5) within group (order by id),info from test group by info;
percentile_cont | info
-----------------+-------
1 | test1
4.5 | test2
54 | test3
1000 | test4
(4 rows)
验证这个值 4.5 | test2 :
2 | test2
3 | test2
4 | test2
5 | test2
6 | test2
7 | test2
N = 当前分组的行数 = 6
RN = (1+传入参数*(N-1)) = (1+0.5*(6-1)) = 3.5
CRN = ceiling(RN) = 4
FRN = floor(RN) = 3
value of expression for row at FRN : 当前分组内第FRN行的值 = 4
value of expression for row at CRN : 当前分组内第CRN行的值 = 5
所以最终中位数值 :
(CRN - RN) * (value of expression for row at FRN) +
(RN - FRN) * (value of expression for row at CRN) =
(4-3.5)*(4) + (3.5 - 3)*(5) = 4.5;
当输入参数为数组时, 返回值也是数组, 如下 :
postgres=# select percentile_cont(array[0.5, 1]) within group (order by id) from test;
percentile_cont
-----------------
{5.5,1000}
(1 row)
接下来看一下稀疏分布 :
返回行号大于等于指定百分比的值或表达式值.
例如 :
postgres=# select id from test;
id
------
1
2
3
4
5
6
7
8
100
1000
(10 rows)
当前组一共10行, 取位置在0.5的.即行号>=0.5*10的第一行的值或表达式的值.
postgres=# select percentile_disc(0.5) within group (order by id) from test;
percentile_disc
-----------------
5
(1 row)
postgres=# select percentile_disc(0.5) within group (order by id^2) from test;
percentile_disc
-----------------
25
(1 row)
输入0.11, 表示行号返回>=1.1的第一行的值.
postgres=# select percentile_disc(0.11) within group (order by id) from test;
percentile_disc
-----------------
2
(1 row)
再看个例子
postgres=# select id,info,count(*) over (partition by info) from test;
id | info | count
------+-------+-------
1 | test1 | 1
2 | test2 | 6
3 | test2 | 6
4 | test2 | 6
5 | test2 | 6
6 | test2 | 6
7 | test2 | 6
8 | test3 | 2
100 | test3 | 2
1000 | test4 | 1
(10 rows)
取分组的数据, 主要看test2 这个组一共有6行, 0.3*6=1.8, 所以它需要取第二行的数据.
postgres=# select info,percentile_disc(0.3) within group (order by id) from test group by info;
info | percentile_disc
-------+-----------------
test1 | 1
test2 | 3
test3 | 8
test4 | 1000
(4 rows)
注意
最终计算的是表达式的分布数, 而不是计算列值的分布数后再计算表达式.
验证如下 :
或者你可以看代码 :
postgres=# select percentile_cont(0.5) within group (order by id^2),info from test group by info;
percentile_cont | info
-----------------+-------
1 | test1
20.5 | test2
5032 | test3
1000000 | test4
(4 rows)
postgres=# select percentile_cont(0.5) within group (order by id),info from test group by info;
percentile_cont | info
-----------------+-------
1 | test1
4.5 | test2
54 | test3
1000 | test4
(4 rows)
postgres=# select 4.5^2;
?column?
---------------------
20.2500000000000000
(1 row)
postgres=# select 54^2;
?column?
----------
2916
(1 row)
Hypothetical-Set聚合函数
函数 | 返回类型 | 描述 |
---|---|---|
rank() | bigint | rank of the current row with gaps; same as row_number of its first peer |
dense_rank() | bigint | rank of the current row without gaps; this function counts peer groups |
percent_rank() | double precision | relative rank of the current row: (rank - 1) / (total rows - 1) |
cume_dist() | double precision | relative rank of the current row: (number of rows preceding or peer with current row) / (total rows) |
the four ranking functions are defined so that they give the same answer for any two peer rows.
rank 返回值在分组内的等级, 如果值有重复的话, 跳级处理.
dense_rank 返回值在分组内的等级, 如果值有重复的话, 不跳级处理.
percent_rank 返回 (rank - 1) / (total rows - 1), rank指当前rank值, rows指当前组的记录数
cume_dist 返回(number of rows preceding or peer with current row) / (total rows), 即截至当前记录等级一共有多少行除以本组的总行数.
看一个例子比较明白.
postgres=# select *,row_number() over(partition by info order by id),rank() over(partition by info order by id),dense_rank() over(partition by info order by id),percent_rank() over(partition by info order by id),cume_dist() over(partition by info order by id) from test;
id | info | row_number | rank | dense_rank | percent_rank | cume_dist
------+-------+------------+------+------------+--------------+-------------------
1 | test1 | 1 | 1 | 1 | 0 | 1
2 | test2 | 1 | 1 | 1 | 0 | 0.444444444444444
2 | test2 | 2 | 1 | 1 | 0 | 0.444444444444444
2 | test2 | 3 | 1 | 1 | 0 | 0.444444444444444
2 | test2 | 4 | 1 | 1 | 0 | 0.444444444444444
3 | test2 | 5 | 5 | 2 | 0.5 | 0.555555555555556
4 | test2 | 6 | 6 | 3 | 0.625 | 0.666666666666667
5 | test2 | 7 | 7 | 4 | 0.75 | 0.777777777777778
6 | test2 | 8 | 8 | 5 | 0.875 | 0.888888888888889
7 | test2 | 9 | 9 | 6 | 1 | 1
8 | test3 | 1 | 1 | 1 | 0 | 0.5
100 | test3 | 2 | 2 | 2 | 1 | 1
1000 | test4 | 1 | 1 | 1 | 0 | 1
(13 rows)
算法 :
以info='test2'这个组为例 :
2 | test2 | 1 | 1 | 1 | 0 | 0.444444444444444
2 | test2 | 2 | 1 | 1 | 0 | 0.444444444444444
2 | test2 | 3 | 1 | 1 | 0 | 0.444444444444444
2 | test2 | 4 | 1 | 1 | 0 | 0.444444444444444
id=2 的 rank和dense_rank都是1.
percent_rank 返回 (rank - 1) / (total rows - 1), rank指当前rank值, rows指当前组的记录数
cume_dist 返回(number of rows preceding or peer with current row) / (total rows), 截至当前记录等级一共有多少行除以本组的总行数.
所以
percent_rank = (1-1)/(9-1)=0
cume_dist = (4)/(9) = 0.444444444444444
3 | test2 | 5 | 5 | 2 | 0.5 | 0.555555555555556
rank = 5, 跳级
dense_rank = 2, 不跳级
percent_rank = (5-1)/(9-1)=0.5
cume_dist = (5)/(9) = 0.555555555555556
这些窗口函数的另一种用法, 聚合用法.
Function | Direct Argument Type(s) | Aggregated Argument Type(s) | Return Type | Description |
---|---|---|---|---|
rank(args) WITHIN GROUP (ORDER BY sorted_args) | VARIADIC "any" | VARIADIC "any" | bigint | rank of the hypothetical row, with gaps for duplicate rows |
dense_rank(args) WITHIN GROUP (ORDER BY sorted_args) | VARIADIC "any" | VARIADIC "any" | bigint | rank of the hypothetical row, without gaps |
percent_rank(args) WITHIN GROUP (ORDER BYsorted_args) | VARIADIC "any" | VARIADIC "any" | double precision | relative rank of the hypothetical row, ranging from 0 to 1 |
cume_dist(args) WITHIN GROUP (ORDER BY sorted_args) | VARIADIC "any" | VARIADIC "any" | double precision | relative rank of the hypothetical row, ranging from 1/N to 1 |
这些用法比较奇特, 其实是要返回给定参数在集合中的位置.
例如 :
1
2
3
4
5
如果我们给一个参数值是2.2, 应该排在以上数据中的第三行.
例子 :
postgres=# select * from test order by info,id;
id | info
------+-------
1 | test1
2 | test2
2 | test2
2 | test2
2 | test2
3 | test2
4 | test2
5 | test2
6 | test2
7 | test2
8 | test3
100 | test3
1000 | test4
(13 rows)
postgres=# select info,rank(4.9) within group (order by id),dense_rank(4.9) within group (order by id) from test group by info;
info | rank | dense_rank
-------+------+------------
test1 | 2 | 2
test2 | 7 | 4
test3 | 1 | 1
test4 | 1 | 1
(4 rows)
4.9在test1这个分组, 排名第2, 并且这个分组只有1个值, 所以没有gap.
重点关注test2这个组, 这个组有9个值, 其中有4个重复值2, 所以4.9在这里排名需要考虑gap.
rank 返回7, 即4.9在这里考虑GAP排名第7
dense_rank 返回4, 即4.9在这里不考虑GAP排名第4.
又如 :
postgres=# select info,rank(5) within group (order by id),dense_rank(5) within group (order by id) from test group by info;
info | rank | dense_rank
-------+------+------------
test1 | 2 | 2
test2 | 7 | 4
test3 | 1 | 1
test4 | 1 | 1
(4 rows)
postgres=# select info,rank(5.1) within group (order by id),dense_rank(5.1) within group (order by id) from test group by info;
info | rank | dense_rank
-------+------+------------
test1 | 2 | 2
test2 | 8 | 5
test3 | 1 | 1
test4 | 1 | 1
(4 rows)
最后要看计算0~1代表位置的聚合函数percent_rank和cume_dist.
算法
percent_rank 返回 (rank - 1) / (total rows - 1), rank指当前rank值, rows指当前组的记录数
cume_dist 返回(number of rows preceding or peer with current row) / (total rows), 截至当前记录等级一共有多少行除以本组的总行数.
例子1 :
postgres=# select info,rank(4.9) within group (order by id),dense_rank(4.9) within group (order by id),percent_rank(4.9) within group (order by id),cume_dist(4.9) within group (order by id) from test group by info;
info | rank | dense_rank | percent_rank | cume_dist
-------+------+------------+-------------------+-------------------
test1 | 2 | 2 | 1 | 1
test2 | 7 | 4 | 0.666666666666667 | 0.7
test3 | 1 | 1 | 0 | 0.333333333333333
test4 | 1 | 1 | 0 | 0.5
(4 rows)
同样以test2为分组, 讲解算法. 把4.9插入到这个分组后. 数据应该变成 :
2 | test2
2 | test2
2 | test2
2 | test2
3 | test2
4 | test2
4.9 | test2 # 计算位置
5 | test2
6 | test2
7 | test2
一共10行.
percent_rank 返回 (rank - 1) / (total rows - 1), rank指当前rank值, rows指当前组的记录数
cume_dist 返回(number of rows preceding or peer with current row) / (total rows), 截至当前记录等级一共有多少行除以本组的总行数.
所以4.9对应的percent_rank 和 cume_dist 分别为 :
percent_rank = (rank - 1) / (total rows - 1) = (7-1)/(10-1) = 0.666666666666667
cume_dist = (7)/10 = 0.7
例子2 :
postgres=# select info,rank(5) within group (order by id),dense_rank(5) within group (order by id),percent_rank(5) within group (order by id),cume_dist(5) within group (order by id) from test group by info;
info | rank | dense_rank | percent_rank | cume_dist
-------+------+------------+-------------------+-------------------
test1 | 2 | 2 | 1 | 1
test2 | 7 | 4 | 0.666666666666667 | 0.8
test3 | 1 | 1 | 0 | 0.333333333333333
test4 | 1 | 1 | 0 | 0.5
(4 rows)
插入计算值5后, 数据变成
2 | test2
2 | test2
2 | test2
2 | test2
3 | test2
4 | test2
5 | test2 # 计算位置, 即参数值
5 | test2
6 | test2
7 | test2
依旧10行. 但是截至当前记录等级一共有多少行? 注意是8了.
percent_rank = (rank - 1) / (total rows - 1) = (7-1)/(10-1) = 0.666666666666667
cume_dist = (8)/10 = 0.8
例子3 :
postgres=# select info,rank(5.1) within group (order by id),dense_rank(5.1) within group (order by id),percent_rank(5.1) within group (order by id),cume_dist(5.1) within group (order by id) from test group by info;
info | rank | dense_rank | percent_rank | cume_dist
-------+------+------------+-------------------+-------------------
test1 | 2 | 2 | 1 | 1
test2 | 8 | 5 | 0.777777777777778 | 0.8
test3 | 1 | 1 | 0 | 0.333333333333333
test4 | 1 | 1 | 0 | 0.5
(4 rows)
插入计算值5.1后, 数据变成 :
2 | test2
2 | test2
2 | test2
2 | test2
3 | test2
4 | test2
5 | test2
5.1 | test2 # 计算位置, 即参数值
6 | test2
7 | test2
例子4 :
postgres=# select info,rank(5) within group (order by id desc),dense_rank(5) within group (order by id desc),percent_rank(5) within group (order by id desc),cume_dist(5) within group (order by id desc) from test group by info;
info | rank | dense_rank | percent_rank | cume_dist
-------+------+------------+-------------------+-----------
test1 | 1 | 1 | 0 | 0.5
test2 | 3 | 3 | 0.222222222222222 | 0.4
test3 | 3 | 3 | 1 | 1
test4 | 2 | 2 | 1 | 1
(4 rows)
插入计算值5后, 数据变成 :
7 | test2
6 | test2
5 | test2 # 注意, 这才是计算位置, 即插入位置.
5 | test2
4 | test2
3 | test2
2 | test2
2 | test2
2 | test2
2 | test2
多维分析
=> SELECT * FROM items_sold;
brand | size | sales
-------+------+-------
Foo | L | 10
Foo | M | 20
Bar | M | 15
Bar | L | 5
(4 rows)
=> SELECT brand, size, sum(sales) FROM items_sold GROUP BY GROUPING SETS ((brand), (size), ());
brand | size | sum
-------+------+-----
Foo | | 30
Bar | | 20
| L | 15
| M | 35
| | 50
(5 rows)
其中GROUP BY GROUPING SETS ((brand), (size), ());
相当于以下三个group by的union all(未在分组的列以NULL代替。):
group by brand
group by size
group by ()
分组集合除了可以用GROUPING SETS来指定,另外还提供了两个特殊的写法rollup和cube.
ROLLUP ( e1, e2, e3, ... )
代表递减分组,一般用于异构结构的分组如国家,省份,城市,乡镇这样的结构查询。
逐级分组汇总结果,它相当于如下写法:
GROUPING SETS (
( e1, e2, e3, ... ),
...
( e1, e2 )
( e1 )
( ) -- 注意包含全集
)
还有一种写法是CUBE
CUBE ( a, b, c )
cube是任意组合,相当于:
GROUPING SETS (
( a, b, c ),
( a, b ),
( a, c ),
( a ),
( b, c ),
( b ),
( c ),
( ) -- 注意包含全集
)
在cube和rollup中使用括号可以将多个表达式作为单个表达式来处理:
ROLLUP ( a, (b,c), d )
递减,相当于
GROUPING SETS (
( a, b, c, d ),
( a, b, c ),
( a ),
( )
)
CUBE ( (a,b), (c,d) )
相当于:
GROUPING SETS (
( a, b, c, d ),
( a, b ),
( c, d ),
( )
)
同时cube,rollup,grouping sets还可以混合使用:
GROUP BY a, CUBE(b,c), GROUPING SETS ((d), (e))
相当于:
GROUP BY GROUPING SETS (
(a,b,c,d), (a,b,c,e),
(a,b,d), (a,b,e),
(a,c,d), (a,c,e),
(a,d), (a,e)
)
既然分组聚合相当于多个group by 的union all,那么我们如何区分当前输出的记录是哪个分组group by的输出呢?
用grouping(cols)可以表示未参与聚合的表达式的比特位,并转换为INT输出。
例如:
=> SELECT * FROM items_sold;
make | model | sales
-------+-------+-------
Foo | GT | 10
Foo | Tour | 20
Bar | City | 15
Bar | Sport | 5
(4 rows)
grouping()中必须包含group by后面的任意或所有列。
=> SELECT make, model, GROUPING(make,model), sum(sales) FROM items_sold GROUP BY ROLLUP(make,model);
make | model | grouping | sum
-------+-------+----------+-----
Foo | GT | 0 | 10
Foo | Tour | 0 | 20
Bar | City | 0 | 15
Bar | Sport | 0 | 5
Foo | | 1 | 30
Bar | | 1 | 20
| | 3 | 50
(7 rows)
grouping()中必须包含group by后面的任意或所有列,不能包含未参与聚合的列。
grouping()中的每个表达式用1个比特位表示。
postgres=# create table tbl(c1 int,c2 int,c3 int, c4 int, c5 int);
CREATE TABLE
postgres=# select c1,count(*),grouping(c1,c2,c3) from tbl group by cube(c1,c2,c3);
c1 | count | grouping
----+-------+----------
| 0 | 7 -- b'111'::int = 7
(1 row)
postgres=# select c1,count(*),grouping(c1,c2) from tbl group by cube(c1,c2,c3);
c1 | count | grouping
----+-------+----------
| 0 | 3 b'11'::int = 3
(1 row)
postgres=# select c1,count(*),grouping(c2) from tbl group by cube(c1,c2,c3);
c1 | count | grouping
----+-------+----------
| 0 | 1 b'1'::int = 1
(1 row)
postgres=# insert into tbl values (1,2,3,4,5);
INSERT 0 1
postgres=# insert into tbl values (1,2,3,4,6);
INSERT 0 1
postgres=# insert into tbl values (2,3,4,5,6);
INSERT 0 1
postgres=# select c1,count(*),grouping(c2) from tbl group by cube(c1,c2,c3);
c1 | count | grouping
----+-------+----------
1 | 2 | 0
1 | 2 | 0
1 | 2 | 1
2 | 1 | 0
2 | 1 | 0
2 | 1 | 1
| 3 | 1
1 | 2 | 1
| 2 | 1
2 | 1 | 1
| 1 | 1
| 2 | 0
| 2 | 0
| 1 | 0
| 1 | 0
(15 rows)
grouping中包含未参与聚合的列将报错:
postgres=# select c1,count(*),grouping(c4) from tbl group by cube(c1,c2,c3);
ERROR: arguments to GROUPING must be grouping expressions of the associated query level
LINE 1: select c1,count(*),grouping(c4) from tbl group by cube(c1,c2...
^
postgres=# select c1,count(*),grouping(c1,c2,c3,c4) from tbl group by cube(c1,c2,c3);
ERROR: arguments to GROUPING must be grouping expressions of the associated query level
LINE 1: select c1,count(*),grouping(c1,c2,c3,c4) from tbl group by c...
^
postgres=# select c1,count(*),grouping(c1,c2,c3) from tbl group by cube(c1,c2,c3);
c1 | count | grouping
----+-------+----------
1 | 2 | 0
1 | 2 | 1
1 | 2 | 3
2 | 1 | 0
2 | 1 | 1
2 | 1 | 3
| 3 | 7
1 | 2 | 2
| 2 | 6
2 | 1 | 2
| 1 | 6
| 2 | 4
| 2 | 5
| 1 | 4
| 1 | 5
(15 rows)
2. 子查询
select子查询只能返回一列
postgres=# select (select * from (values (1,2),(2,3)) as t(c1,c2)) , relname, relkind from pg_class;
ERROR: subquery must return only one column
LINE 1: select (select * from (values (1,2),(2,3)) as t(c1,c2)) , re...
^
select子查询只能返回一条记录
postgres=# select (select * from (values (1),(2)) as t(c1)) , relname, relkind from pg_class;
ERROR: more than one row returned by a subquery used as an expression
子查询可以用在select子句也可以用在源中,还可以用在with, update from语句
postgres=# select (select * from (values (1),(2)) as t(c1) limit 1) , relname, relkind from pg_class;
c1 | relname | relkind
----+-----------------------------------------------+---------
1 | pg_type | r
1 | pg_toast_187550 | t
1 | new_type | c
1 | pg_toast_187550_index | i
1 | test | r
1 | pg_toast_187556 | t
postgres=# select t.relname from (select * from pg_class limit 1) t , pg_class where t.relname=pg_class.relname;
relname
---------
pg_type
(1 row)
3. union\union all\except\intersect
两条或者多条查询结果的合并、去重合并、相交、求差。
union all(不去重)
postgres=# select * from (values (1),(1)) t(id) union all select * from (values (2),(2)) t(id);
id
----
1
1
2
2
(4 rows)
union(去重)
postgres=# select * from (values (1),(1)) t(id) union select * from (values (2),(2)) t(id);
id
----
1
2
(2 rows)
except(去重)
postgres=# select * from (values (1),(1),(2)) t(id) except select * from (values (2),(2)) t(id);
id
----
1
(1 row)
intersect(去重)
postgres=# select * from (values (1),(1),(2)) t(id) intersect select * from (values (2),(2)) t(id);
id
----
2
(1 row)
4. 自连接
当某个表或者子句的A字段要和B字段进行关联时,可以使用自关联。
postgres=# create table tbl8(c1 int, c2 int, info text);
CREATE TABLE
postgres=# insert into tbl8 select generate_series(1,100), generate_series(2,101),md5(random()::text) ;
INSERT 0 100
postgres=# select t1.* from tbl8 t1, tbl8 t2 where t1.c1=t2.c2 and t2.c2<10 limit 10;
c1 | c2 | info
----+----+----------------------------------
2 | 3 | b54fbfd843c2343330ceaa1758882ee4
3 | 4 | df999680d87435f2d2d8d5604aca2f1e
4 | 5 | 497ad2cfa8acd1f7062bb1bd7aa7a646
5 | 6 | ef55bcdc85dfa8d54978e4c2085ec55a
6 | 7 | 45f60117d6d577389707b22823a513e5
7 | 8 | ebde16ccaeced9a400a1608d591a2bf0
8 | 9 | fdeb1505dd02aca33abed20531592302
9 | 10 | 010d07f3b9b05b2c36b46440b1dd92aa
(8 rows)
5. 内连接
内连接,仅仅输出符合连接条件的记录。
digoal=# create table tbl_join_1(id int primary key,info text);
CREATE TABLE
digoal=# create table tbl_join_2(id int primary key,info text);
CREATE TABLE
digoal=# create table tbl_join_3(id int primary key,info text);
CREATE TABLE
digoal=# create table tbl_join_4(id int primary key,info text);
CREATE TABLE
digoal=# create table tbl_join_5(id int primary key,info text);
CREATE TABLE
digoal=# create table tbl_join_6(id int primary key,info text);
CREATE TABLE
digoal=# create table tbl_join_7(id int primary key,info text);
CREATE TABLE
digoal=# create table tbl_join_8(id int primary key,info text);
CREATE TABLE
digoal=# create table tbl_join_9(id int primary key,info text);
CREATE TABLE
digoal=# insert into tbl_join_1 select generate_series(1,10),md5(random()::text);
digoal=# insert into tbl_join_2 select generate_series(1,100),md5(random()::text);
digoal=# insert into tbl_join_3 select generate_series(1,1000),md5(random()::text);
digoal=# insert into tbl_join_4 select generate_series(1,10000),md5(random()::text);
digoal=# insert into tbl_join_5 select generate_series(1,100000),md5(random()::text);
digoal=# insert into tbl_join_6 select generate_series(1,1000000),md5(random()::text);
digoal=# insert into tbl_join_7 select generate_series(1,2000000),md5(random()::text);
digoal=# insert into tbl_join_8 select generate_series(1,3000000),md5(random()::text);
digoal=# insert into tbl_join_9 select generate_series(1,4000000),md5(random()::text);
非显示JOIN
postgres=# select t1.* from tbl_join_1 t1, tbl_join_2 t2 where t1.id=t2.id and t2.id=1;
id | info
----+----------------------------------
1 | 5b6049ec7b94d17c0bd92dc4da436311
(1 row)
显示JOIN
postgres=# select t1.* from tbl_join_1 t1 join tbl_join_2 t2 on(t1.id=t2.id and t2.id=1);
id | info
----+----------------------------------
1 | 5b6049ec7b94d17c0bd92dc4da436311
(1 row)
优化器如何强制连接顺序?
当join_collapse_limit=1时, 显示JOIN的关联按SQL写法进行关联. 例如 :
以下写法非显示关联,无法强制JOIN顺序
digoal=# set join_collapse_limit=1;
digoal=# explain select t1.info, t5.info from tbl_join_1 t1,
tbl_join_2 t2,
tbl_join_3 t3,
tbl_join_4 t4,
tbl_join_5 t5,
tbl_join_6 t6,
tbl_join_7 t7,
tbl_join_8 t8,
tbl_join_9 t9
where
t1.id=t2.id and
t2.id=t3.id and
t3.id=t4.id and
t4.id=t5.id and
t5.id=t6.id and
t6.id=t7.id and
t7.id=t8.id and
t8.id=t9.id and
t9.id=10000;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=2.72..25.99 rows=1 width=65)
-> Nested Loop (cost=2.29..22.93 rows=1 width=69)
-> Nested Loop (cost=1.86..19.87 rows=1 width=69)
-> Nested Loop (cost=1.43..16.82 rows=1 width=69)
-> Nested Loop (cost=1.00..13.76 rows=1 width=69)
-> Nested Loop (cost=0.71..10.84 rows=1 width=36)
-> Nested Loop (cost=0.43..7.93 rows=1 width=36)
-> Nested Loop (cost=0.15..5.03 rows=1 width=36)
-> Index Scan using tbl_join_1_pkey on tbl_join_1 t1 (cost=0.15..2.77 rows=1 width=36)
Index Cond: (id = 10000)
-> Seq Scan on tbl_join_2 t2 (cost=0.00..2.25 rows=1 width=4)
Filter: (id = 10000)
-> Index Only Scan using tbl_join_3_pkey on tbl_join_3 t3 (cost=0.28..2.89 rows=1 width=4)
Index Cond: (id = 10000)
-> Index Only Scan using tbl_join_4_pkey on tbl_join_4 t4 (cost=0.29..2.90 rows=1 width=4)
Index Cond: (id = 10000)
-> Index Scan using tbl_join_5_pkey on tbl_join_5 t5 (cost=0.29..2.91 rows=1 width=37)
Index Cond: (id = 10000)
-> Index Only Scan using tbl_join_6_pkey on tbl_join_6 t6 (cost=0.42..3.04 rows=1 width=4)
Index Cond: (id = 10000)
-> Index Only Scan using tbl_join_7_pkey on tbl_join_7 t7 (cost=0.43..3.04 rows=1 width=4)
Index Cond: (id = 10000)
-> Index Only Scan using tbl_join_8_pkey on tbl_join_8 t8 (cost=0.43..3.05 rows=1 width=4)
Index Cond: (id = 10000)
-> Index Only Scan using tbl_join_9_pkey on tbl_join_9 t9 (cost=0.43..3.05 rows=1 width=4)
Index Cond: (id = 10000)
(26 rows)
Time: 8.398 ms
以下写法显示关联,可以强制JOIN顺序
digoal=# set join_collapse_limit=1;
digoal=# explain select t1.info, t5.info from
tbl_join_1 t1 join tbl_join_2 t2 on (t1.id=t2.id)
join tbl_join_3 t3 on (t2.id=t3.id)
join tbl_join_4 t4 on (t3.id=t4.id)
join tbl_join_5 t5 on (t4.id=t5.id)
join tbl_join_6 t6 on (t5.id=t6.id)
join tbl_join_7 t7 on (t6.id=t7.id)
join tbl_join_8 t8 on (t7.id=t8.id)
join tbl_join_9 t9 on (t8.id=t9.id)
where t9.id=10000;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=2.72..25.99 rows=1 width=65)
-> Nested Loop (cost=2.29..22.93 rows=1 width=69)
-> Nested Loop (cost=1.86..19.87 rows=1 width=69)
-> Nested Loop (cost=1.43..16.82 rows=1 width=69)
-> Nested Loop (cost=1.00..13.76 rows=1 width=69)
-> Nested Loop (cost=0.71..10.84 rows=1 width=36)
-> Nested Loop (cost=0.43..7.93 rows=1 width=36)
-> Nested Loop (cost=0.15..5.03 rows=1 width=36)
-> Index Scan using tbl_join_1_pkey on tbl_join_1 t1 (cost=0.15..2.77 rows=1 width=36)
Index Cond: (id = 10000)
-> Seq Scan on tbl_join_2 t2 (cost=0.00..2.25 rows=1 width=4)
Filter: (id = 10000)
-> Index Only Scan using tbl_join_3_pkey on tbl_join_3 t3 (cost=0.28..2.89 rows=1 width=4)
Index Cond: (id = 10000)
-> Index Only Scan using tbl_join_4_pkey on tbl_join_4 t4 (cost=0.29..2.90 rows=1 width=4)
Index Cond: (id = 10000)
-> Index Scan using tbl_join_5_pkey on tbl_join_5 t5 (cost=0.29..2.91 rows=1 width=37)
Index Cond: (id = 10000)
-> Index Only Scan using tbl_join_6_pkey on tbl_join_6 t6 (cost=0.42..3.04 rows=1 width=4)
Index Cond: (id = 10000)
-> Index Only Scan using tbl_join_7_pkey on tbl_join_7 t7 (cost=0.43..3.04 rows=1 width=4)
Index Cond: (id = 10000)
-> Index Only Scan using tbl_join_8_pkey on tbl_join_8 t8 (cost=0.43..3.05 rows=1 width=4)
Index Cond: (id = 10000)
-> Index Only Scan using tbl_join_9_pkey on tbl_join_9 t9 (cost=0.43..3.05 rows=1 width=4)
Index Cond: (id = 10000)
(26 rows)
Time: 0.829 ms
设置join_collapse_limit=1后, 按照SQL写法进行关联. 执行计划的时间也缩短了.
6. 外连接
左外连接
左边的表,即使没有匹配,也输出
postgres=# create table tab1(id int, info text, crt_time timestamp);
CREATE TABLE
Time: 26.056 ms
postgres=# create table tab2(id int, info text, crt_time timestamp);
CREATE TABLE
Time: 36.215 ms
postgres=# insert into tab1 values (1,'test',now());
INSERT 0 1
Time: 0.520 ms
postgres=# insert into tab1 values (2,'test',now());
INSERT 0 1
Time: 0.297 ms
postgres=# insert into tab2 values (2,'test',now());
INSERT 0 1
Time: 11.325 ms
postgres=# insert into tab2 values (3,'test',now());
INSERT 0 1
Time: 0.352 ms
postgres=# select * from tab1;
id | info | crt_time
----+------+----------------------------
1 | test | 2017-04-11 17:48:29.37083
2 | test | 2017-04-11 17:48:32.742795
(2 rows)
Time: 0.506 ms
postgres=# select * from tab2;
id | info | crt_time
----+------+----------------------------
2 | test | 2017-04-11 17:48:39.722821
3 | test | 2017-04-11 17:48:41.901834
(2 rows)
Time: 0.335 ms
右表没有被匹配时,输出NULL
postgres=# select * from tab1 left join tab2 on (tab1.id=tab2.id);
id | info | crt_time | id | info | crt_time
----+------+----------------------------+----+------+----------------------------
1 | test | 2017-04-11 17:48:29.37083 | | |
2 | test | 2017-04-11 17:48:32.742795 | 2 | test | 2017-04-11 17:48:39.722821
(2 rows)
过滤在A表但是不在B表的记录。
postgres=# select tab1.* from tab1 left join tab2 on (tab1.id=tab2.id) where tab2.* is null;
id | info | crt_time
----+------+---------------------------
1 | test | 2017-04-11 17:48:29.37083
(1 row)
postgres=# select * from tab1 where id in (select id from tab1 except select id from tab2);
id | info | crt_time
----+------+---------------------------
1 | test | 2017-04-11 17:48:29.37083
(1 row)
通过外部where filter,可以过滤连接=false的记录
postgres=# select * from tab1 left join tab2 on (tab1.id=tab2.id) where tab1.id=1;
id | info | crt_time | id | info | crt_time
----+------+---------------------------+----+------+----------
1 | test | 2017-04-11 17:48:29.37083 | | |
(1 row)
postgres=# explain select * from tab1 left join tab2 on (tab1.id=tab2.id) where tab1.id=1;
QUERY PLAN
------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..48.81 rows=36 width=88)
Join Filter: (tab1.id = tab2.id)
-> Seq Scan on tab1 (cost=0.00..24.12 rows=6 width=44)
Filter: (id = 1)
-> Materialize (cost=0.00..24.16 rows=6 width=44)
-> Seq Scan on tab2 (cost=0.00..24.12 rows=6 width=44)
Filter: (id = 1)
(7 rows)
join内的条件,不会过滤未连接的记录
postgres=# select * from tab1 left join tab2 on (tab1.id=tab2.id and tab1.id=1);
id | info | crt_time | id | info | crt_time
----+------+----------------------------+----+------+----------
1 | test | 2017-04-11 17:48:29.37083 | | |
2 | test | 2017-04-11 17:48:32.742795 | | |
(2 rows)
postgres=# explain select * from tab1 left join tab2 on (tab1.id=tab2.id and tab1.id=1);
QUERY PLAN
---------------------------------------------------------------------
Hash Left Join (cost=35.42..298.96 rows=1130 width=88)
Hash Cond: (tab1.id = tab2.id)
Join Filter: (tab1.id = 1)
-> Seq Scan on tab1 (cost=0.00..21.30 rows=1130 width=44)
-> Hash (cost=21.30..21.30 rows=1130 width=44)
-> Seq Scan on tab2 (cost=0.00..21.30 rows=1130 width=44)
(6 rows)
右外连接
左表没有被匹配时,输出NULL
postgres=# select * from tab1 right join tab2 on (tab1.id=tab2.id);
id | info | crt_time | id | info | crt_time
----+------+----------------------------+----+------+----------------------------
2 | test | 2017-04-11 17:48:32.742795 | 2 | test | 2017-04-11 17:48:39.722821
| | | 3 | test | 2017-04-11 17:48:41.901834
(2 rows)
全外连接
没有被匹配时,输出NULL
postgres=# select * from tab1 full join tab2 on (tab1.id=tab2.id);
id | info | crt_time | id | info | crt_time
----+------+----------------------------+----+------+----------------------------
1 | test | 2017-04-11 17:48:29.37083 | | |
2 | test | 2017-04-11 17:48:32.742795 | 2 | test | 2017-04-11 17:48:39.722821
| | | 3 | test | 2017-04-11 17:48:41.901834
(3 rows)
7. 窗口查询
https://www.postgresql.org/docs/9.6/static/functions-window.html
常用窗口函数
Function | Return Type | Description |
---|---|---|
row_number() | bigint | number of the current row within its partition, counting from 1 |
rank() | bigint | rank of the current row with gaps; same as row_number of its first peer |
dense_rank() | bigint | rank of the current row without gaps; this function counts peer groups |
percent_rank() | double precision | relativ 最后更新:2017-04-12 23:25:18 上一篇: |