Hive从入门到精通6:Hive中的查询操作
本节使用的测试数据:
hive (default)> select * from dept_in;
dept_in.deptno dept_in.deptname dept_in.address
10 ACCOUNTING NEW YORK
20 RESEARCH DALLAS
30 SALES CHICAGO
40 OPERATIONS BOSTON
hive (default)> select * from emp_in;
emp_in.empno emp_in.ename emp_in.job emp_in.mgr emp_in.hiredate emp_in.sal emp_in.comm emp_in.deptno
7369 SMITH CLERK 7902 1980/12/17 800 NULL 20
7499 ALLEN SALESMAN 7698 1981/2/20 1600 300 30
7521 WARD SALESMAN 7698 1981/2/22 1250 500 30
7566 JONES MANAGER 7839 1981/4/2 2975 NULL 20
7654 MARTIN SALESMAN 7698 1981/9/28 1250 1400 30
7698 BLAKE MANAGER 7839 1981/5/1 2850 NULL 30
7782 CLARK MANAGER 7839 1981/6/9 2450 NULL 10
7788 SCOTT ANALYST 7566 1987/4/19 3000 NULL 20
7839 KING PRESIDENT NULL 1981/11/17 5000 NULL 10
7844 TURNER SALESMAN 7698 1981/9/8 1500 0 30
7876 ADAMS CLERK 7788 1987/5/23 1100 NULL 20
7900 JAMES CLERK 7698 1981/12/3 950 NULL 30
7902 FORD ANALYST 7566 1981/12/3 3000 NULL 20
7934 MILLER CLERK 7782 1982/1/23 1300 NULL 10
1.简单查询
hive (default)> select ename,job from emp_in;
ename job
SMITH CLERK
ALLEN SALESMAN
WARD SALESMAN
JONES MANAGER
MARTIN SALESMAN
BLAKE MANAGER
CLARK MANAGER
SCOTT ANALYST
KING PRESIDENT
TURNER SALESMAN
ADAMS CLERK
JAMES CLERK
FORD ANALYST
MILLER CLERK</pre>
2.过滤
hive (default)> select ename,sal from emp_in where sal > 2000;
ename sal
JONES 2975
BLAKE 2850
CLARK 2450
SCOTT 3000
KING 5000
FORD 3000
hive (default)> select ename,deptno from emp_in where deptno != 20;
ename deptno
ALLEN 30
WARD 30
MARTIN 30
BLAKE 30
CLARK 10
KING 10
TURNER 30
JAMES 30
MILLER 10</pre>
3.排序
hive (default)> select ename,sal from emp_in order by sal;
ename sal
SMITH 800
JAMES 950
ADAMS 1100
WARD 1250
MARTIN 1250
MILLER 1300
TURNER 1500
ALLEN 1600
CLARK 2450
BLAKE 2850
JONES 2975
SCOTT 3000
FORD 3000
KING 5000
hive (default)> select ename,sal from emp_in sort by sal desc;
ename sal
KING 5000
FORD 3000
SCOTT 3000
JONES 2975
BLAKE 2850
CLARK 2450
ALLEN 1600
TURNER 1500
MILLER 1300
MARTIN 1250
WARD 1250
ADAMS 1100
JAMES 950
SMITH 800</pre>
4.多表查询
4.1隐式JOIN操作
Hive-0.13开始支持隐式join操作,from后面可以跟多个表,以逗号分隔:
hive (default)> select e.ename,d.deptname,d.address from dept_in d,emp_in e where d.deptno = e.deptno;
e.ename d.deptname d.address
SMITH RESEARCH DALLAS
ALLEN SALES CHICAGO
WARD SALES CHICAGO
JONES RESEARCH DALLAS
MARTIN SALES CHICAGO
BLAKE SALES CHICAGO
CLARK ACCOUNTING NEW YORK
SCOTT RESEARCH DALLAS
KING ACCOUNTING NEW YORK
TURNER SALES CHICAGO
ADAMS RESEARCH DALLAS
JAMES SALES CHICAGO
FORD RESEARCH DALLAS
MILLER ACCOUNTING NEW YORK</pre>
4.2显式JOIN操作
hive (default)> select e.ename,d.deptname,d.address from dept_in d join emp_in e on d.deptno = e.deptno;
e.ename d.deptname d.address
SMITH RESEARCH DALLAS
ALLEN SALES CHICAGO
WARD SALES CHICAGO
JONES RESEARCH DALLAS
MARTIN SALES CHICAGO
BLAKE SALES CHICAGO
CLARK ACCOUNTING NEW YORK
SCOTT RESEARCH DALLAS
KING ACCOUNTING NEW YORK
TURNER SALES CHICAGO
ADAMS RESEARCH DALLAS
JAMES SALES CHICAGO
FORD RESEARCH DALLAS
MILLER ACCOUNTING NEW YORK
更多详细内容请参考Hive帮助文档-Joins。
5.子查询
5.1From子句中的子查询
例1:简单子查询
hive (default)> select tmp.ename from (select ename,deptno from emp_in ) tmp where tmp.deptno = 10;
tmp.ename
CLARK
KING
MILLER
例2:包含Union All的子查询
hive (default)> select tmp.ename, tmp.deptno from (
select ename, deptno from emp_in where deptno = 10
union all
select ename, deptno from emp_in where deptno = 30
) tmp order by tmp.ename;
tmp.ename tmp.deptno
ALLEN 30
BLAKE 30
CLARK 10
JAMES 30
KING 10
MARTIN 30
MILLER 10
TURNER 30
WARD 30
5.2 where子句中的子查询
例1:使用IN或NOT IN约束
hive (default)> select ename,deptno from emp_in where deptno in (select deptno from dept_in where deptno != 30);
ename deptno
SMITH 20
JONES 20
CLARK 10
SCOTT 20
KING 10
ADAMS 20
FORD 20
MILLER 10
例2:使用EXISTS或NOT EXISTS约束
hive (default)> select e.ename,e.job,e.deptno from emp_in e where exists (select d.deptname from dept_in d where e.deptno = d.deptno );
e.ename e.job e.deptno
SMITH CLERK 20
ALLEN SALESMAN 30
WARD SALESMAN 30
JONES MANAGER 20
MARTIN SALESMAN 30
BLAKE MANAGER 30
CLARK MANAGER 10
SCOTT ANALYST 20
KING PRESIDENT 10
TURNER SALESMAN 30
ADAMS CLERK 20
JAMES CLERK 30
FORD ANALYST 20
MILLER CLERK 10</pre>
关于where子句中的子查询,有下面几点需要注意:
- 这些子查询只能用于表达式的右边
- IN或NOT IN子查询可以只选择单个列
- EXISTS或NOT EXISTS必须有一个或多个相关约束
- where子句的子查询不会引用父查询的结果
- where子句的子查询的where子句中可以引用父查询的结果
更多详细内容请参考Hive帮助文档-SubQueries。