直观理解:MySQL常用窗口函数

2022-05-17  本文已影响0人  老羊_肖恩

简介

  窗口函数(window functions),也叫分析函数和OLAP函数,MySQL在8.0之后开始支持窗口函数。窗口函数可以用来对数据进行实时分析处理,和group by有类似之处,其区别在于窗口会对每个分组之后的数据按行进行分别操作,而group by一般对分组之后的函数使用聚合函数汇总,做不到对不同的group中的行数据进行分别操作。这就简单介绍几种常见的MySQL窗口函数。下表中列出了几种常见的窗口函数,并对其基本功能进行了描述。接下来我们会以一段示例,来展示MySQL中窗口函数的用途和效果。

序号   函数        描述
1 ROW_NUMBER() 当前行在其分区中的行号
2 RANK() 当前行在其分区内的序号,有间隙,不连续
3 DENSE_RANK() 当前行在其分区内的序号,无间隙,连续
4 NTILE() 当前行在其分区内的分段编号
5 SUM() 其分区内当前行号范围内的指定列的累计和
6 AVG() 其分区内当前行号范围内的指定列的均值
7 MAX() 其分区内当前行号范围内的指定列的最大值
8 MIN() 其分区内当前行号范围内的指定列的最小值
9 COUNT() 其分区内当前行号范围内的指定列的累计数量
10 CUME_DIST() 其分区内指定行号范围内的指定列的累积分布值
11 PERCENT_RANK() 其分区内指定行号范围内的指定列的百分比秩值
12 LAG() 其分区内当前行之前指定行的指定列的值
13 LEAD() 其分区内当前行之后指定行的指定列的值
14 FIRST_VALUE() 其分区内指定列的第一行的值
15 LAST_VALUE() 其分区内指定列的最后一行的值
16 NTH_VALUE() 其分区内指定列的第N行的值

数据准备

  假设我们存在一张如下的员工工资表,接下来我们将以这张表对窗口函数的使用方法进行简单的演示。

DROP TABLE employee;
CREATE TABLE IF NOT EXISTS employee
(
    emp_no       varchar(32)   comment '员工id',
    emp_name     varchar(32)   comment '员工姓名',
    dept_name    varchar(32)   comment '部门名', 
    hire_date    date          comment '入职日期',
    salary       double        comment '薪资'
)ENGINE=InnoDB DEFAULT CHARSET=utf8;
insert into employee
values ('1001', '刘备', '蜀国', '2021-11-01', 1000);
insert into employee
values ('1002', '关羽', '蜀国', '2021-11-02', 5000);
insert into employee
values ('1003', '张飞', '蜀国', '2021-11-04', 7000);
insert into employee
values ('1004', '赵云', '蜀国', '2021-11-03', 7000);
insert into employee
values ('1006', '黄忠', '蜀国', '2021-11-06', 4000);
insert into employee
values ('1007', '孙权', '吴国', '2021-11-07', 4000);
insert into employee
values ('1008', '周瑜', '吴国', '2021-11-05', 8000);
insert into employee
values ('1009', '黄盖', '吴国', '2021-11-02', 6000);
insert into employee
values ('1010', '曹操', '魏国', '2021-11-01', 2000);
insert into employee
values ('1011', '许褚', '魏国', '2021-11-07', 3000);
insert into employee
values ('1012', '典韦', '魏国', '2021-11-05', 5000);
insert into employee
values ('1013', '张辽', '魏国', '2021-11-04', 6000);

语法及示例

  窗口函数的语法如下,所有的窗口函数均遵循以下语法:

function (expression) 
OVER (
  [ PARTITION BY expr_list ]
  [ ORDER BY order_list 
    [ frame_clause ] 
  ] 
)

其中frame_clause语法如下。

ROWS
{ UNBOUNDED PRECEDING | unsigned_value PRECEDING | CURRENT ROW } |
{BETWEEN
{ UNBOUNDED PRECEDING | unsigned_value { PRECEDING | FOLLOWING } |
CURRENT ROW}
AND
{ UNBOUNDED FOLLOWING | unsigned_value { PRECEDING | FOLLOWING } |
CURRENT ROW }}

接下来我们将展示一些场景的窗口函数的用法和效果。

1.ROW_NUMBER()、RANK()、DENSE_RANK()、NTILE()

示例:对所有员工按照薪资降序排序,并给出对应的row_number、rank和dense_rank的排名

SELECT 
    emp_no,
    emp_name,
    dept_name,
    salary,
    ROW_NUMBER() over (ORDER BY salary DESC) as rowNumber,
    RANK() OVER(ORDER BY salary DESC) as rankNumber,
    DENSE_RANK() OVER(ORDER BY salary DESC) as denseNumber
FROM employee e;

示例:对每个部门的员工按照薪资降序排序,并给出对应的row_number、rank和dense_rank的排名

SELECT 
    emp_no,
    emp_name,
    dept_name,
    salary,
    ROW_NUMBER() over (PARTITION BY dept_name ORDER BY salary DESC) as rowNumber,
    RANK() OVER(PARTITION BY dept_name ORDER BY salary DESC) as rankNumber,
    DENSE_RANK() OVER(PARTITION BY dept_name ORDER BY salary DESC) as denseNumber
FROM employee e;

示例:对所有员工按照薪资降序排序,并给出对应的row_number、rank和dense_rank的排名,最终结果按照员工号进行排序输出。

SELECT 
    emp_no,
    emp_name,
    dept_name,
    salary,
    ROW_NUMBER() over (ORDER  BY salary desc) as rowNumber,
    RANK() OVER(ORDER BY salary desc) as rankNumber,
    DENSE_RANK() OVER(ORDER BY salary desc) as denseNumber
FROM employee e
ORDER  BY emp_no 

示例:找出每个部门工资最高的人。

-- 找出每个部门工资最高的人。
SELECT 
    emp_no,
    emp_name,
    dept_name,
    salary  
FROM ( 
    SELECT 
        emp_no,
        emp_name,
        dept_name,
        salary,
        DENSE_RANK() OVER (PARTITION BY dept_name ORDER BY salary DESC) AS dr
    FROM employee e
) r
WHERE r.dr <= 1

示例:a.将所有员工按照工资递增的顺序分成4组。b.根据员工入职日期升序分成7组。

SELECT 
    emp_no,
    emp_name,
    dept_name,
    salary,
    hire_date,
    NTILE (4) OVER(ORDER BY salary) AS 4_tile,
    NTILE (7) OVER(ORDER BY hire_date) AS 7_tile
FROM employee e;

2.聚合窗口函数


示例:获取每个部门,按工资从低到高得累计和。

SELECT 
    emp_no,
    emp_name,
    dept_name,
    salary,
    SUM(salary) OVER(PARTITION BY dept_name ORDER BY salary) AS total
FROM employee e;

示例:获取每个部门得工资累计和。

SELECT 
    emp_no,
    emp_name,
    dept_name,
    salary,
    SUM(salary) OVER(PARTITION BY dept_name) AS total
FROM employee e;

示例:获取整个公司的薪资按照薪资递增的累计和。

SELECT 
    emp_no,
    emp_name,
    dept_name,
    salary,
    SUM(salary) OVER(ORDER BY salary) AS total
FROM employee e;

示例:获取整个公司的薪资按照薪资递增的前两行和后一行范围内的薪资和。

SELECT 
    emp_no,
    emp_name,
    dept_name,
    salary,
    SUM(salary) OVER(ORDER BY salary 
    ROWS BETWEEN 2 PRECEDING AND 1 FOLLOWING) AS total
FROM employee e;

示例:每个部门的平均工资。

SELECT 
    emp_no,
    emp_name,
    dept_name,
    salary,
    AVG(salary) OVER(PARTITION BY dept_name) AS avg
FROM employee e;

示例:获取整个公司的薪资按照薪资递增的前两行和后一行范围内的薪资平均值。

SELECT 
    emp_no,
    emp_name,
    dept_name,
    salary,
    AVG(salary) OVER(ORDER BY salary 
    ROWS BETWEEN 2 PRECEDING AND 1 FOLLOWING) AS avg
FROM employee e;

3. CUME_DIST()、PERCENT_RANK()

示例:根据薪资排序,获取CUME_DIST()和PERCENT_RANK()

SELECT 
    emp_no,
    emp_name,
    dept_name,
    salary,
    CUME_DIST() OVER(ORDER BY salary ) AS cd1,
    PERCENT_RANK() OVER(ORDER BY salary) AS cd2
FROM employee e;

4.LAG()、LEAD()

示例:a. 获取每个人入职前一行的数据,默认值为"2021-01-01";b. 获取每个人入职前两行的数据,不设置默认值;c. 获取每个人入职后一行的数据,默认值为"2022-01-01";d. 获取每个人入职后两行的数据,不设置默认值;

SELECT 
    emp_no,
    emp_name,
    dept_name,
    hire_date,
    LAG (hire_date, 1, '2021-01-01') OVER(ORDER BY hire_date) AS next_1_time,
    LAG (hire_date, 2) OVER(ORDER BY hire_date) AS next_2_time,
    LEAD (hire_date, 1, '2022-01-01') OVER(ORDER BY hire_date) AS last_1_time,
    LEAD (hire_date, 2) OVER(ORDER BY hire_date) AS last_2_time
FROM employee e;

5.FIRST_VALUE ()、LAST_VALUE() 、NTH_VALUE()

示例:a. 按照入职日期顺序排序,找出当前每个部门最先入职的人的薪资。b. 按照入职日期顺序排序,找出当前每个部门最后入职的人的薪资。

SELECT 
    emp_no,
    emp_name,
    dept_name,
    salary,
    hire_date,
    FIRST_VALUE (salary) OVER(PARTITION BY dept_name ORDER BY hire_date) AS first,
    LAST_VALUE (salary) OVER(PARTITION BY dept_name ORDER BY hire_date) AS last
FROM employee e;

示例:a.获取截至当前工资第二高的人的工资。b.获取第二个入职的人的工资。

SELECT 
    emp_no,
    emp_name,
    dept_name,
    salary,
    hire_date,
    NTH_VALUE (salary, 2) OVER(ORDER BY salary) AS 2_th,
    NTH_VALUE (salary, 2) OVER(ORDER BY hire_date) AS 3_th
FROM employee e;

上一篇 下一篇

猜你喜欢

热点阅读