DatistEQ之开窗口函数

2020-11-24 本文已影响0人了无_数据科学

1、开窗口函数语法

数据专家支持开窗口函数，具体语法如下，可参考 sqlite.org。

window-defn：

frame-spec：

2、Window Functions

row_number()

The number of the row within the current partition. Rows are numbered starting from 1 in the order defined by the ORDER BY clause in the window definition, or in arbitrary order otherwise.

rank()

The row_number() of the first peer in each group - the rank of the current row with gaps. If there is no ORDER BY clause, then all rows are considered peers and this function always returns 1.

dense_rank()

The number of the current row's peer group within its partition - the rank of the current row without gaps. Partitions are numbered starting from 1 in the order defined by the ORDER BY clause in the window definition. If there is no ORDER BY clause, then all rows are considered peers and this function always returns 1.

percent_rank()

Despite the name, this function always returns a value between 0.0 and 1.0 equal to (rank - 1)/(partition-rows - 1), where rank is the value returned by built-in window function rank() and partition-rows is the total number of rows in the partition. If the partition contains only one row, this function returns 0.0.

cume_dist()

The cumulative distribution. Calculated as row-number/partition-rows, where row-number is the value returned by row_number() for the last peer in the group and partition-rows the number of rows in the partition.

ntile(N)

Argument N is handled as an integer. This function divides the partition into N groups as evenly as possible and assigns an integer between 1 and N to each group, in the order defined by the ORDER BY clause, or in arbitrary order otherwise. If necessary, larger groups occur first. This function returns the integer value assigned to the group that the current row is a part of.

lag(expr)
lag(expr, offset)
lag(expr, offset, default)

The first form of the lag() function returns the result of evaluating expression expr against the previous row in the partition. Or, if there is no previous row (because the current row is the first), NULL.

If the offset argument is provided, then it must be a non-negative integer. In this case the value returned is the result of evaluating expr against the row offset rows before the current row within the partition. If offset is 0, then expr is evaluated against the current row. If there is no row offset rows before the current row, NULL is returned.

If default is also provided, then it is returned instead of NULL if the row identified by offset does not exist.

lead(expr)
lead(expr, offset)
lead(expr, offset, default)

The first form of the lead() function returns the result of evaluating expression expr against the next row in the partition. Or, if there is no next row (because the current row is the last), NULL.

If the offset argument is provided, then it must be a non-negative integer. In this case the value returned is the result of evaluating expr against the row offset rows after the current row within the partition. If offset is 0, then expr is evaluated against the current row. If there is no row offset rows after the current row, NULL is returned.

If default is also provided, then it is returned instead of NULL if the row identified by offset does not exist.

first_value(expr)

This built-in window function calculates the window frame for each row in the same way as an aggregate window function. It returns the value of expr evaluated against the first row in the window frame for each row.

last_value(expr)

This built-in window function calculates the window frame for each row in the same way as an aggregate window function. It returns the value of expr evaluated against the last row in the window frame for each row.

nth_value(expr, N)

This built-in window function calculates the window frame for each row in the same way as an aggregate window function. It returns the value of expr evaluated against the row N of the window frame. Rows are numbered within the window frame starting from 1 in the order defined by the ORDER BY clause if one is present, or in arbitrary order otherwise. If there is no Nth row in the partition, then NULL is returned.

3、Aggregate Functions

avg(X)

The avg() function returns the average value of all non-NULL X within a group. String and BLOB values that do not look like numbers are interpreted as 0. The result of avg() is always a floating point value as long as at there is at least one non-NULL input even if all inputs are integers. The result of avg() is NULL if and only if there are no non-NULL inputs.

count(X)
count()*

The count(X) function returns a count of the number of times that X is not NULL in a group. The count(*) function (with no arguments) returns the total number of rows in the group.

group_concat(X)
group_concat(X,Y)

The group_concat() function returns a string which is the concatenation of all non-NULL values of X. If parameter Y is present then it is used as the separator between instances of X. A comma (",") is used as the separator if Y is omitted. The order of the concatenated elements is arbitrary.

max(X)

The max() aggregate function returns the maximum value of all values in the group. The maximum value is the value that would be returned last in an ORDER BY on the same column. Aggregate max() returns NULL if and only if there are no non-NULL values in the group.

min(X)

The min() aggregate function returns the minimum non-NULL value of all values in the group. The minimum value is the first non-NULL value that would appear in an ORDER BY of the column. Aggregate min() returns NULL if and only if there are no non-NULL values in the group.

sum(X)
total(X)

The sum() and total() aggregate functions return sum of all non-NULL values in the group. If there are no non-NULL input rows then sum() returns NULL but total() returns 0.0. NULL is not normally a helpful result for the sum of no rows but the SQL standard requires it and most other SQL database engines implement sum() that way so SQLite does it in the same way in order to be compatible. The non-standard total() function is provided as a convenient way to work around this design problem in the SQL language.

The result of total() is always a floating point value. The result of sum() is an integer value if all non-NULL inputs are integers. If any input to sum() is neither an integer or a NULL then sum() returns a floating point value which might be an approximation to the true sum.

Sum() will throw an "integer overflow" exception if all inputs are integers or NULL and an integer overflow occurs at any point during the computation. Total() never throws an integer overflow.

4、应用案例：

1、查询所有人员的信息，并查询所属城市的人员数以及同年龄的人员数：

CityNum列：
count(Name) over(partition by City)

AgeNum列：
count(Name) over(partition by Age)

image.png

2、查询所有人员的信息，并查询所属城市的人员数，每个城市的人按照年龄排序语句：

AgeId列：
row_number() over(partition by City order by Age)

image.png

3、查询所有人员的信息，并查询销售冠军、了了、上一名、后一名语句：

销售冠军列：
first_value(Name) over()

了了列：
last_value(Name) over()

排在前面列：
lag(Name) over()

排在前面2列：
lead(Name ,2) over()

image.png

4、查询所有人员的信息，并查询区域销售占比语句：

区域销售占比列：
Salary *1.0 / sum(Salary) over( partition by City )

image.png

DatistEQ之开窗口函数

1、开窗口函数语法

2、Window Functions

3、Aggregate Functions

4、应用案例：

猜你喜欢

热点阅读