HBase Shell Basics (Part 2)
I. Introduction
An earlier post, "HBase Shell Basics (Part 1)", walked through the shell commands in HBase's General group and DDL group. This post continues with the Namespace and DML commands.
II. Shell API Overview
1. Group Of Namespace
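A namespace is a logical grouping of tables, similar to a database (a group of related tables) in a relational database system. A namespace consists of the following parts:
- Tables
- RegionServer groups
- Permissions
- Quotas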
1.1 create_namespace
Creates a namespace. Syntax:
hbase> create_namespace 'ns1'
hbase> create_namespace 'ns1', {'PROPERTY_NAME'=>'PROPERTY_VALUE'}
Example:
hbase(main):011:0> create_namespace 'ns_test'
0 row(s) in 0.0260 seconds
A new subdirectory for the namespace appears under the HBase data directory in HDFS:
$ hdfs dfs -ls /apps/hbase/data/data/
Found 4 items
drwxr-xr-x - hbase hdfs 0 2018-05-21 10:41 /apps/hbase/data/data/default
drwxr-xr-x - hbase hdfs 0 2018-05-21 10:39 /apps/hbase/data/data/hbase
drwx------ - hbase hdfs 0 2018-05-28 22:09 /apps/hbase/data/data/ns_test
1.2 describe_namespace
Shows the description of a namespace. Syntax:
hbase> describe_namespace 'ns1'
Example:
hbase(main):013:0> describe_namespace 'ns_test'
DESCRIPTION
{NAME => 'ns_test'}
1 row(s) in 0.0050 seconds
1.3 alter_namespace
Alters an existing namespace. Syntax:
To add/modify a property:
hbase> alter_namespace 'ns1', {METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'}
To delete a property:
hbase> alter_namespace 'ns1', {METHOD => 'unset', NAME=>'PROPERTY_NAME'}
Example (the property name below is typed exactly as in the recorded session):
hbase(main):015:0> alter_namespace 'ns_test', {METHOD => 'set', 'PROERTY_NAME' => 'PROPERTY_VALUE'}
0 row(s) in 0.0430 seconds
hbase(main):016:0> describe_namespace 'ns_test'
DESCRIPTION
{NAME => 'ns_test', PROERTY_NAME => 'PROPERTY_VALUE'}
1 row(s) in 0.0100 seconds
hbase(main):018:0> alter_namespace 'ns_test', {METHOD => 'unset', NAME=>'PROERTY_NAME'}
0 row(s) in 0.0130 seconds
hbase(main):019:0>
hbase(main):020:0* describe_namespace 'ns_test'
DESCRIPTION
{NAME => 'ns_test'}
1 row(s) in 0.0090 seconds
1.4 drop_namespace
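Drops an empty namespace. Note: before a namespace can be dropped, every table in it must be dropped first.
Example:
- Dropping a non-empty namespace fails with an error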
hbase(main):027:0> drop_namespace 'ns_test'
ERROR: org.apache.hadoop.hbase.constraint.ConstraintException: Only empty namespaces can be removed. Namespace ns_test has 1 tables
at org.apache.hadoop.hbase.master.TableNamespaceManager.remove(TableNamespaceManager.java:211)
at org.apache.hadoop.hbase.master.HMaster.deleteNamespace(HMaster.java:2987)
at org.apache.hadoop.hbase.master.MasterRpcServices.deleteNamespace(MasterRpcServices.java:521)
at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:60010)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
Here is some help for this command:
Drop the named namespace. The namespace must be empty.
- Drop the table first, then drop the namespace
hbase(main):030:0> disable 'ns_test:t_test'
0 row(s) in 2.2920 seconds
hbase(main):031:0> drop 'ns_test:t_test'
0 row(s) in 1.2620 seconds
hbase(main):032:0> drop_namespace 'ns_test'
0 row(s) in 0.0390 seconds
1.5 list_namespace
Lists all namespaces.
Example:
hbase(main):023:0> list_namespace
NAMESPACE
default
hbase
ns_test
3 row(s) in 0.0420 seconds
1.6 list_namespace_tables
Lists all tables in the given namespace. Syntax:
hbase> list_namespace_tables 'ns1'
Example:
hbase(main):025:0> create 'ns_test:t_test','f'
0 row(s) in 1.4900 seconds
=> Hbase::Table - ns_test:t_test
hbase(main):026:0> list_namespace_tables 'ns_test'
TABLE
t_test
1 row(s) in 0.0060 seconds
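The transcripts above address a table as 'ns_test:t_test', while list_namespace_tables prints only 't_test'. The following plain-Ruby sketch illustrates that naming convention. The helper name split_table_name is hypothetical and is not part of the HBase shell; it only models the rule that a name without a "ns:" prefix belongs to the 'default' namespace.

```ruby
# Split a table name as typed in the shell into [namespace, qualifier].
# Hypothetical helper for illustration; not an HBase shell command.
# A name without a "ns:" prefix is treated as living in 'default'.
def split_table_name(name)
  namespace, qualifier = name.split(':', 2)
  qualifier.nil? ? ['default', namespace] : [namespace, qualifier]
end

p split_table_name('ns_test:t_test')
p split_table_name('t_test')
```

This is also why the HDFS listing earlier shows a 'default' directory next to 'ns_test': tables created without a namespace prefix end up there.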
2. Group Of DML
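Next, let's look at the HBase commands most commonly used to create, read, update, and delete data.
2.1 append
Appends a value to the specified cell. Syntax: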
Appends a cell 'value' at specified table/row/column coordinates.
hbase> append 't1', 'r1', 'c1', 'value', ATTRIBUTES=>{'mykey'=>'myvalue'}
hbase> append 't1', 'r1', 'c1', 'value', {VISIBILITY=>'PRIVATE|SECRET'}
The same commands also can be run on a table reference. Suppose you had a reference
t to table 't1', the corresponding command would be:
hbase> t.append 'r1', 'c1', 'value', ATTRIBUTES=>{'mykey'=>'myvalue'}
hbase> t.append 'r1', 'c1', 'value', {VISIBILITY=>'PRIVATE|SECRET'}
Example:
hbase(main):004:0> put 't_test','r1','f1:c1','abc'
0 row(s) in 0.0900 seconds
hbase(main):005:0>
hbase(main):006:0* get 't_test','r1'
COLUMN CELL
f1:c1 timestamp=1528019315459, value=abc
1 row(s) in 0.0340 seconds
hbase(main):007:0> append 't_test','r1','f1:c1','_def'
0 row(s) in 0.0470 seconds
hbase(main):008:0> get 't_test','r1'
COLUMN CELL
f1:c1 timestamp=1528019378977, value=abc_def
1 row(s) in 0.0130 seconds
2.2 count
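Counts the rows in a table. By default the current count is printed every 1000 rows (INTERVAL). Scan caching is enabled for count by default, with a default cache size of 10 rows (CACHE); if your rows are small, you may want to increase this value. Syntax: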
hbase> count 'ns1:t1'
hbase> count 't1'
hbase> count 't1', INTERVAL => 100000
hbase> count 't1', CACHE => 1000
hbase> count 't1', INTERVAL => 10, CACHE => 1000
Example:
hbase(main):009:0> count 't_test'
2 row(s) in 0.0440 seconds
=> 2
hbase(main):010:0> count 't_test',CACHE => 2
2 row(s) in 0.0210 seconds
=> 2
hbase(main):011:0> count 't_test',CACHE => 2,INTERVAL => 1
Current count: 1, row: 1news_news_200
Current count: 2, row: r1
2 row(s) in 0.0190 seconds
=> 2
hbase(main):012:0> count 't_test',CACHE => 2,INTERVAL => 2
Current count: 2, row: r1
2 row(s) in 0.0100 seconds
=> 2
Counting rows this way can take a long time on a large table, however; in that case it is better to run 'hbase org.apache.hadoop.hbase.mapreduce.RowCounter tableName' instead.
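The INTERVAL behaviour seen in the count transcripts can be sketched in plain Ruby as below. This is a simplified model, not the shell's actual implementation: count_rows is a hypothetical helper, and the rows array stands in for the row keys a real scan would return.

```ruby
# Count rows and collect a progress line every `interval` rows.
# `rows` stands in for the row keys returned by a scan (assumption).
def count_rows(rows, interval: 1000)
  progress = []
  count = 0
  rows.each do |row|
    count += 1
    # Report only when the running count is a multiple of the interval.
    progress << "Current count: #{count}, row: #{row}" if (count % interval).zero?
  end
  [count, progress]
end

rows = ['1news_news_200', 'r1']
total, lines = count_rows(rows, interval: 1)
puts lines
puts "#{total} row(s)"
```

With interval: 2 only the second row is reported, and with the default of 1000 a two-row table produces no progress lines at all, which matches the three runs shown in the example.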
2.3 delete
Deletes the data in the specified cell. Syntax:
hbase> delete 'ns1:t1', 'r1', 'c1', ts1
hbase> delete 't1', 'r1', 'c1', ts1
hbase> delete 't1', 'r1', 'c1', ts1, {VISIBILITY=>'PRIVATE|SECRET'}
Example:
hbase(main):025:0> get 't_test','r1',{COLUMN =>'f1:c2',VERSIONS => 2}
COLUMN CELL
f1:c2 timestamp=1528021313768, value=456
f1:c2 timestamp=1528021113843, value=123
2 row(s) in 0.0130 seconds
hbase(main):026:0> delete 't_test','r1','f1:c2'
0 row(s) in 0.0140 seconds
hbase(main):027:0> get 't_test','r1',{COLUMN =>'f1:c2',VERSIONS => 2}
COLUMN CELL
0 row(s) in 0.0100 seconds
As shown, the operation removed every version of the cell.
You can also pass a timestamp to the delete operation:
hbase(main):033:0> delete 't_test','r1','f1:c2',1528021809100
0 row(s) in 0.0120 seconds
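The effect of the optional timestamp can be modelled with a small Ruby sketch. It assumes the behaviour observed in the session above, where a delete masks every version at or below its timestamp, and a delete without a timestamp masks all of them. Other HBase versions may treat delete differently, so take this as an illustration of the idea rather than a statement about every release; visible_after_delete is a hypothetical helper.

```ruby
# Versions of one cell, keyed by timestamp (values from the example above).
versions = { 1528021313768 => '456', 1528021113843 => '123' }

# Return the versions still visible after a delete issued at `ts`.
# Assumption: the delete masks every version with timestamp <= ts;
# with no timestamp, every existing version is masked.
def visible_after_delete(versions, ts = nil)
  return {} if ts.nil?
  versions.select { |version_ts, _value| version_ts > ts }
end

p visible_after_delete(versions)
p visible_after_delete(versions, 1528021113843)
```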
2.4 deleteall
Deletes all the cells of a row (across multiple columns); a column and a timestamp can optionally be given as well. Syntax:
hbase> deleteall 'ns1:t1', 'r1'
hbase> deleteall 't1', 'r1'
hbase> deleteall 't1', 'r1', 'c1'
hbase> deleteall 't1', 'r1', 'c1', ts1
hbase> deleteall 't1', 'r1', 'c1', ts1, {VISIBILITY=>'PRIVATE|SECRET'}
Example:
hbase(main):040:0> get 't_test','r1'
COLUMN CELL
f1:c1 timestamp=1528021101363, value=abc_def
f1:c2 timestamp=1528022493254, value=456
f1:c3 timestamp=1528022504415, value=111
3 row(s) in 0.0130 seconds
hbase(main):041:0> deleteall 't_test','r1'
0 row(s) in 0.0120 seconds
hbase(main):042:0> get 't_test','r1'
COLUMN CELL
0 row(s) in 0.0040 seconds
2.5 get
Retrieves an entire row, several columns of a row, a single cell, or a specific version of a cell. Syntax:
# Get an entire row
hbase> get 'ns1:t1', 'r1'
hbase> get 't1', 'r1'
# Get data within a timestamp range (start inclusive, end exclusive)
hbase> get 't1', 'r1', {TIMERANGE => [ts1, ts2]}
# Get a single column
hbase> get 't1', 'r1', 'c1'
hbase> get 't1', 'r1', {COLUMN => 'c1'}
# Get several columns
hbase> get 't1', 'r1', ['c1', 'c2']
hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
# Get a specific version (timestamp) of a given column
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
# Get a given column within a time range, up to a given number of versions
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
# Filter the returned cells with a value filter
hbase> get 't1', 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
Example:
hbase(main):004:0> get 't_test','r1'
COLUMN CELL
f1:c1 timestamp=1528023147256, value=123
f1:c2 timestamp=1528022954485, value=456
f1:c3 timestamp=1528023156719, value=100
3 row(s) in 0.0330 seconds
hbase(main):005:0> get 't_test','r1',{TIMERANGE => [1528022954485,1528023147256]}
COLUMN CELL
f1:c2 timestamp=1528022954485, value=456
1 row(s) in 0.0100 seconds
hbase(main):009:0> get 't_test','r1',{FILTER => "ValueFilter(=, 'binary:123')"}
COLUMN CELL
f1:c1 timestamp=1528023147256, value=123
1 row(s) in 0.0070 seconds
hbase(main):011:0> get 't_test','r1',['f1:c1','f1:c2']
COLUMN CELL
f1:c1 timestamp=1528023147256, value=123
f1:c2 timestamp=1528022954485, value=456
2 row(s) in 0.0180 seconds
hbase(main):012:0> get 't_test','r1','f1:c1','f1:c2'
COLUMN CELL
f1:c1 timestamp=1528023147256, value=123
f1:c2 timestamp=1528022954485, value=456
2 row(s) in 0.0080 seconds
hbase(main):013:0> get 't_test','r1',COLUMNS=>['f1:c1','f1:c2']
COLUMN CELL
f1:c1 timestamp=1528023147256, value=123
f1:c2 timestamp=1528022954485, value=456
2 row(s) in 0.0080 seconds
hbase(main):020:0> get 't_test','r1',{COLUMN => 'f1:c1', TIMESTAMP =>1528023147256}
COLUMN CELL
f1:c1 timestamp=1528023147256, value=123
1 row(s) in 0.0050 seconds
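The half-open TIMERANGE semantics explain why the TIMERANGE example above returned only f1:c2: its timestamp equals the lower bound (included), while f1:c1 sits exactly on the upper bound (excluded). A minimal Ruby sketch of that check follows; in_time_range is a hypothetical helper, and the timestamps are copied from the example.

```ruby
# Latest timestamp of each column in row r1 (from the example above).
cells = {
  'f1:c1' => 1528023147256,
  'f1:c2' => 1528022954485,
  'f1:c3' => 1528023156719
}

# Keep the columns whose timestamp falls in [min_ts, max_ts):
# the lower bound is inclusive, the upper bound is exclusive.
def in_time_range(cells, min_ts, max_ts)
  cells.select { |_column, ts| ts >= min_ts && ts < max_ts }.keys
end

p in_time_range(cells, 1528022954485, 1528023147256) # => ["f1:c2"]
```

To include f1:c1 as well, the upper bound would have to be at least one millisecond larger than its timestamp.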
Chinese text stored in HBase is displayed as escaped bytes by default:
hbase(main):010:0> get 't_test','r1','f1:c4'
COLUMN CELL
f1:c4 timestamp=1528024997298, value=\xE4\xB8\xAD\xE6\x96\x87
1 row(s) in 0.0420 seconds
# To display the value as readable text, append a converter to the column name:
hbase(main):011:0> get 't_test','r1','f1:c4:toString'
COLUMN CELL
f1:c4 timestamp=1528024997298, value=中文
1 row(s) in 0.0130 seconds
hbase(main):012:0> get 't_test','r1','f1:c4:c(org.apache.hadoop.hbase.util.Bytes).toString'
COLUMN CELL
f1:c4 timestamp=1528024997298, value=中文
1 row(s) in 0.0060 seconds
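The escaped value \xE4\xB8\xAD\xE6\x96\x87 is simply the UTF-8 encoding of the two stored characters. The plain-Ruby sketch below, which runs outside the HBase shell, shows the round trip between the raw bytes and the readable string; the variable names are illustrative only.

```ruby
# The six raw bytes exactly as the shell prints them for f1:c4.
raw = "\xE4\xB8\xAD\xE6\x96\x87".b

# Reinterpret the same bytes as UTF-8 text; no bytes are changed.
text = raw.dup.force_encoding(Encoding::UTF_8)
puts text
puts text.length   # number of characters
puts raw.bytesize  # number of bytes

# Going the other way: render every byte of the text as a \xNN escape.
escaped = text.bytes.map { |b| format('\\x%02X', b) }.join
puts escaped
```

The stored bytes are identical in both views; the :toString converter only changes how the shell renders them.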
2.6 get_counter
Returns the value of a counter. Syntax:
hbase> get_counter 'ns1:t1', 'r1', 'c1'
hbase> get_counter 't1', 'r1', 'c1'
Example:
hbase(main):018:0> incr 't_test','r1','f1:incr',1
COUNTER VALUE = 1
0 row(s) in 0.0380 seconds
hbase(main):019:0> get_counter 't_test','r1','f1:incr'
COUNTER VALUE = 1
hbase(main):020:0> incr 't_test','r1','f1:incr',1
COUNTER VALUE = 2
0 row(s) in 0.0130 seconds
hbase(main):021:0> get_counter 't_test','r1','f1:incr'
COUNTER VALUE = 2
2.7 get_splits
Shows a table's split points (its pre-split rule) and the resulting number of regions. Syntax:
hbase> get_splits 't1'
hbase> get_splits 'ns1:t1'
Example:
hbase(main):028:0> create 't1', 'f1', SPLITS => ['10', '20', '30', '40']
0 row(s) in 1.2420 seconds
hbase(main):029:0> get_splits 't1'
Total number of splits = 5
=> ["10", "20", "30", "40"]
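Four split keys yield five regions because each key is a boundary between two adjacent key ranges. The Ruby sketch below models this; regions_for and region_index are hypothetical helpers, the split keys are assumed to be sorted, and an empty string stands for an open start or end key.

```ruby
# Turn sorted split keys into [start_key, end_key) pairs.
# An empty string marks the open start of the first region
# and the open end of the last one.
def regions_for(split_keys)
  boundaries = [''] + split_keys + ['']
  boundaries.each_cons(2).to_a
end

# Index of the region a row key falls into: the number of split
# keys that are less than or equal to the row key (byte order).
def region_index(split_keys, row_key)
  split_keys.count { |key| row_key >= key }
end

splits = %w[10 20 30 40]
regions = regions_for(splits)
puts "Total number of regions = #{regions.size}"
regions.each { |start_key, end_key| puts "[#{start_key.inspect}, #{end_key.inspect})" }
puts region_index(splits, '15')
```

Note that a row key such as 'r1' sorts after '40' and therefore lands in the last region, so split keys are only useful when they match the actual distribution of row keys.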
2.8 incr
Increments the value of a counter. Syntax:
hbase> incr 'ns1:t1', 'r1', 'c1'
hbase> incr 't1', 'r1', 'c1'
hbase> incr 't1', 'r1', 'c1', 1
hbase> incr 't1', 'r1', 'c1', 10
Example:
hbase(main):034:0> incr 't_test','r1','f1:incr',2
COUNTER VALUE = 4
0 row(s) in 0.0160 seconds
hbase(main):035:0> get_counter 't_test','r1','f1:incr'
COUNTER VALUE = 4
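A counter cell holds an 8-byte big-endian signed integer, which is why f1:incr shows up as \x00\x00\x00\x00\x00\x00\x00\x04 in the raw scan output in section 2.10 while get_counter prints 4. The Ruby sketch below reproduces that encoding outside HBase. The helper names are hypothetical, and shell_escape is a simplification that escapes every byte, whereas the shell prints printable ASCII bytes as-is.

```ruby
# Encode an integer the way a counter cell stores it:
# 8 bytes, signed, big-endian ('q>' in Ruby's pack notation).
def encode_counter(value)
  [value].pack('q>')
end

# Decode the 8 stored bytes back into an integer.
def decode_counter(bytes)
  bytes.unpack('q>').first
end

# Simplified rendering: show every byte as a \xNN escape.
def shell_escape(bytes)
  bytes.bytes.map { |b| format('\\x%02X', b) }.join
end

stored = encode_counter(4)
puts shell_escape(stored)
puts decode_counter(stored)
```

This also shows why a counter column should be read with get_counter rather than get: a plain get returns the raw bytes, not the number.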
2.9 put
Writes a new value into a table. Syntax:
hbase> put 'ns1:t1', 'r1', 'c1', 'value'
hbase> put 't1', 'r1', 'c1', 'value'
Example:
hbase(main):037:0> put 't_test','r1','f1:c5','put'
0 row(s) in 0.0100 seconds
hbase(main):038:0> get 't_test','r1','f1:c5'
COLUMN CELL
f1:c5 timestamp=1528026610237, value=put
1 row(s) in 0.0050 seconds
2.10 scan
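Scans the data in a table. You can restrict the row range and the columns scanned, and scans are often combined with filters. Syntax:
# Full table scan
hbase> scan 't1'
# Specify the columns to scan, the number of rows to return, and the start row
hbase> scan 'ns1:t1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
# Restrict the scan to a timestamp range
hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}
# Filter rows by row key prefix
hbase> scan 't1', {ROWPREFIXFILTER => 'row2'}
Example: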
hbase(main):040:0> scan 't_test'
ROW COLUMN+CELL
1news_news_200 column=f1:v, timestamp=1527733074596, value=a,b,c
r1 column=f1:c1, timestamp=1528023147256, value=123
r1 column=f1:c2, timestamp=1528022954485, value=456
r1 column=f1:c3, timestamp=1528023156719, value=100
r1 column=f1:c4, timestamp=1528024997298, value=\xE4\xB8\xAD\xE6\x96\x87
r1 column=f1:c5, timestamp=1528026610237, value=put
r1 column=f1:incr, timestamp=1528026485801, value=\x00\x00\x00\x00\x00\x00\x00\x04
2 row(s) in 0.0220 seconds
hbase(main):041:0> scan 't_test',{COLUMNS => ['f1:c1', 'f1:c2']}
ROW COLUMN+CELL
r1 column=f1:c1, timestamp=1528023147256, value=123
r1 column=f1:c2, timestamp=1528022954485, value=456
1 row(s) in 0.0130 seconds
hbase(main):042:0> scan 't_test',LIMIT=>1
ROW COLUMN+CELL
1news_news_200 column=f1:v, timestamp=1527733074596, value=a,b,c
1 row(s) in 0.0080 seconds
hbase(main):043:0> scan 't_test',{STARTROW => 'r1'}
ROW COLUMN+CELL
r1 column=f1:c1, timestamp=1528023147256, value=123
r1 column=f1:c2, timestamp=1528022954485, value=456
r1 column=f1:c3, timestamp=1528023156719, value=100
r1 column=f1:c4, timestamp=1528024997298, value=\xE4\xB8\xAD\xE6\x96\x87
r1 column=f1:c5, timestamp=1528026610237, value=put
r1 column=f1:incr, timestamp=1528026485801, value=\x00\x00\x00\x00\x00\x00\x00\x04
1 row(s) in 0.0140 seconds
hbase(main):044:0> scan 't_test',{STARTROW => 'r1',COLUMNS => ['f1:c1', 'f1:c2']}
ROW COLUMN+CELL
r1 column=f1:c1, timestamp=1528023147256, value=123
r1 column=f1:c2, timestamp=1528022954485, value=456
1 row(s) in 0.0120 seconds
hbase(main):049:0> scan 't_test',{ROWPREFIXFILTER => '1news'}
ROW COLUMN+CELL
1news_news_200 column=f1:v, timestamp=1527733074596, value=a,b,c
1 row(s) in 0.0070 seconds
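The scan results above follow from the fact that rows are returned in byte-wise order of their row keys, so '1news_news_200' comes before 'r1'. The Ruby sketch below models STARTROW, ROWPREFIXFILTER, and LIMIT over a sorted list of row keys. It is a simplified model with a hypothetical scan helper, not the shell's implementation, and it ignores columns and cell values.

```ruby
# Model a scan over row keys only.
#   startrow: first row key to include (inclusive)
#   prefix:   keep only row keys starting with this string
#   limit:    maximum number of rows to return
def scan(row_keys, startrow: nil, prefix: nil, limit: nil)
  result = row_keys.sort # rows come back in byte-wise key order
  result = result.select { |key| key >= startrow } if startrow
  result = result.select { |key| key.start_with?(prefix) } if prefix
  result = result.first(limit) if limit
  result
end

row_keys = ['r1', '1news_news_200']
p scan(row_keys)
p scan(row_keys, limit: 1)
p scan(row_keys, startrow: 'r1')
p scan(row_keys, prefix: '1news')
```

Because the order is byte-wise rather than numeric, a key like '10' would sort before '9'; zero-padding numeric row keys is a common way to avoid surprises.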
2.11 truncate
Empties the table and also discards its pre-split regions, keeping only the table schema. Syntax:
hbase> truncate 't1'
Example:
hbase(main):069:0> scan 't1'
ROW COLUMN+CELL
r1 column=f1:c1, timestamp=1528027849019, value=10
1 row(s) in 0.0260 seconds
hbase(main):070:0> get_splits 't1'
Total number of splits = 5
=> ["10", "20", "30", "40"]
hbase(main):071:0> truncate 't1'
Truncating 't1' table (it may take a while):
- Disabling table...
- Truncating table...
0 row(s) in 3.3320 seconds
hbase(main):072:0> scan 't1'
ROW COLUMN+CELL
0 row(s) in 0.1140 seconds
hbase(main):073:0> get_splits 't1'
Total number of splits = 1
=> []
2.12 truncate_preserve
Empties the table but keeps its pre-split regions. Syntax:
hbase> truncate_preserve 't1'
Example:
hbase(main):063:0> scan 't1'
ROW COLUMN+CELL
r1 column=f1:c1, timestamp=1528027790054, value=10
1 row(s) in 0.0540 seconds
hbase(main):064:0> get_splits 't1'
Total number of splits = 5
=> ["10", "20", "30", "40"]
hbase(main):065:0> truncate_preserve 't1'
Truncating 't1' table (it may take a while):
- Disabling table...
- Truncating table...
0 row(s) in 3.3450 seconds
hbase(main):066:0> scan 't1'
ROW COLUMN+CELL
0 row(s) in 0.5620 seconds
hbase(main):067:0> get_splits 't1'
Total number of splits = 5
=> ["10", "20", "30", "40"]