常用SQL片段

2021-11-15  本文已影响0人  AI贺贺

数据分析这项技能,不管是什么岗位,有总是加分项,一切都是为了数据,能从数据中发现机会那就更有优势了。

以下记录一些有用的SQL片段:

创建临时表

从子查询 => 创建临时表查询

当子查询太多的时候,使用这种方式可以让SQL更清晰

SELECT name
       , salary
FROM People
WHERE name in (SELECT DISTINCT name 
               FROM population 
               WHERE country = "Canada"
                     AND city = "Toronto")
      AND salary >= (SELECT AVG(salary)
                     FROM salaries
                     WHERE gender = "Female")
                     

-- 临时表查询
with toronto_ppl as (
   SELECT DISTINCT name
   FROM population
   WHERE country = "Canada"
         AND city = "Toronto"
)
, avg_female_salary as (
   SELECT AVG(salary) as avgSalary
   FROM salaries
   WHERE gender = "Female"
)
SELECT name
       , salary
FROM People
WHERE name in (SELECT DISTINCT FROM toronto_ppl)
      AND salary >= (SELECT avgSalary FROM avg_female_salary)

临时函数

同写代码一样,一些重复的输入输出,提取为一个临时函数,让代码变得更清晰。

SELECT name
       , CASE WHEN tenure < 1 THEN "analyst"
              WHEN tenure BETWEEN 1 and 3 THEN "associate"
              WHEN tenure BETWEEN 3 and 5 THEN "senior"
              WHEN tenure > 5 THEN "vp"
              ELSE "n/a"
         END AS seniority 
FROM employees

-- 临时函数
CREATE TEMPORARY FUNCTION get_seniority(tenure INT64) AS (
   CASE WHEN tenure < 1 THEN "analyst"
        WHEN tenure BETWEEN 1 and 3 THEN "associate"
        WHEN tenure BETWEEN 3 and 5 THEN "senior"
        WHEN tenure > 5 THEN "vp"
        ELSE "n/a"
   END
);
SELECT name
       , get_seniority(tenure) as seniority
FROM employees

增加列,制作宽表

根据条件增加case when
SELECT name
       ,case when age < 18 then '未成年' end as desc1
             ,case when city in  ('伤害') then '一线城市' end as desc2
             ,..
FROM employees

排序 row_number rank dense_rank

SELECT Name
       , GPA
       , ROW_NUMBER() OVER (ORDER BY GPA desc)
       , RANK() OVER (ORDER BY GPA desc)
       , DENSE_RANK() OVER (ORDER BY GPA desc)
FROM student_grades
image-20211115081901264

计算增量

在需要比较不同周期的值时需要

# Comparing each month's sales to last month
SELECT month
       , sales
       , sales - LAG(sales, 1) OVER (ORDER BY month)
FROM monthly_sales

# Comparing each month's sales to the same month last year
SELECT month
       , sales
       , sales - LAG(sales, 12) OVER (ORDER BY month)
FROM monthly_sales

计算累加曲线

SELECT Month
       , Revenue
       , SUM(Revenue) OVER (ORDER BY Month) AS Cumulative
FROM monthly_revenue
image-20211115081750800

https://towardsdatascience.com/ten-advanced-sql-concepts-you-should-know-for-data-science-interviews-4d7015ec74b0

上一篇 下一篇

猜你喜欢

热点阅读