周蓬勃 HBase Knowledge Notes

HBase Shell Basics (Part 2)

2018-06-03  步闲

I. Introduction

An earlier post, "HBase Shell Basics (Part 1)", walked through the shell commands in HBase's General group and DDL group with examples. This post continues with the shell commands for namespaces and DML.

II. Shell API Overview

1. Group Of Namespace

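A namespace is a logical grouping of tables, analogous to a database in a relational database system. The namespace group consists of the following commands: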

1.1 create_namespace

Creates a namespace. Syntax:

  hbase> create_namespace 'ns1'
  hbase> create_namespace 'ns1', {'PROPERTY_NAME'=>'PROPERTY_VALUE'}

Example:

hbase(main):011:0> create_namespace 'ns_test'
0 row(s) in 0.0260 seconds

A new subdirectory for the namespace appears under the HBase data directory in HDFS:

$ hdfs dfs -ls /apps/hbase/data/data/
Found 4 items
drwxr-xr-x   - hbase hdfs          0 2018-05-21 10:41 /apps/hbase/data/data/default
drwxr-xr-x   - hbase hdfs          0 2018-05-21 10:39 /apps/hbase/data/data/hbase
drwx------   - hbase hdfs          0 2018-05-28 22:09 /apps/hbase/data/data/ns_test

1.2 describe_namespace

Shows the description of a namespace. Syntax:

  hbase> describe_namespace 'ns1'

Example:

hbase(main):013:0> describe_namespace 'ns_test'
DESCRIPTION                                                                                  
{NAME => 'ns_test'}                                                                          
1 row(s) in 0.0050 seconds

1.3 alter_namespace

Modifies an existing namespace. Syntax:

To add/modify a property:

  hbase> alter_namespace 'ns1', {METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'}

To delete a property:

  hbase> alter_namespace 'ns1', {METHOD => 'unset', NAME=>'PROPERTY_NAME'}

Example:

hbase(main):015:0> alter_namespace 'ns_test', {METHOD => 'set', 'PROERTY_NAME' => 'PROPERTY_VALUE'}
0 row(s) in 0.0430 seconds

hbase(main):016:0> describe_namespace 'ns_test'
DESCRIPTION                                                                                  
{NAME => 'ns_test', PROERTY_NAME => 'PROPERTY_VALUE'}                                        
1 row(s) in 0.0100 seconds

hbase(main):018:0> alter_namespace 'ns_test', {METHOD => 'unset', NAME=>'PROERTY_NAME'}
0 row(s) in 0.0130 seconds

hbase(main):019:0> 
hbase(main):020:0* describe_namespace 'ns_test'
DESCRIPTION                                                                                  
{NAME => 'ns_test'}                                                                          
1 row(s) in 0.0090 seconds

1.4 drop_namespace

Drops an empty namespace. Note: every table in the namespace must be dropped before the namespace itself can be dropped.

Example:

  1. Dropping a non-empty namespace fails with an error:
hbase(main):027:0> drop_namespace 'ns_test'

ERROR: org.apache.hadoop.hbase.constraint.ConstraintException: Only empty namespaces can be removed. Namespace ns_test has 1 tables
    at org.apache.hadoop.hbase.master.TableNamespaceManager.remove(TableNamespaceManager.java:211)
    at org.apache.hadoop.hbase.master.HMaster.deleteNamespace(HMaster.java:2987)
    at org.apache.hadoop.hbase.master.MasterRpcServices.deleteNamespace(MasterRpcServices.java:521)
    at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:60010)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)

Here is some help for this command:
Drop the named namespace. The namespace must be empty.

  2. Drop the table first, then drop the namespace:

hbase(main):030:0> disable 'ns_test:t_test'
0 row(s) in 2.2920 seconds

hbase(main):031:0> drop 'ns_test:t_test'
0 row(s) in 1.2620 seconds

hbase(main):032:0> drop_namespace 'ns_test'
0 row(s) in 0.0390 seconds

1.5 list_namespace

Lists all namespaces.

Example:

hbase(main):023:0> list_namespace
NAMESPACE                                                                                        
default                                                                                          
hbase                                                                                            
ns_test                                                                                                                                                                                
3 row(s) in 0.0420 seconds

1.6 list_namespace_tables

Lists all tables in the given namespace. Syntax:

  hbase> list_namespace_tables 'ns1'

Example:

hbase(main):025:0> create 'ns_test:t_test','f'
0 row(s) in 1.4900 seconds

=> Hbase::Table - ns_test:t_test
hbase(main):026:0> list_namespace_tables 'ns_test'
TABLE                                                                                            
t_test                                                                                           
1 row(s) in 0.0060 seconds
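
The naming convention used above ('ns_test:t_test' for a table inside a namespace, a bare name for a table in the 'default' namespace) can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, and the data root is copied from the hdfs listing earlier in this post, so it will differ on clusters with another hbase.rootdir. The per-table directory layout is an assumption based on the namespace directories shown in that listing.

```python
# Data root as seen in the hdfs listing in this post; cluster-specific.
HBASE_DATA_ROOT = "/apps/hbase/data/data"


def split_table_name(qualified_name):
    """Split 'ns:table' into (namespace, table).

    A bare table name is treated as living in the 'default' namespace.
    """
    namespace, sep, table = qualified_name.partition(":")
    if not sep:
        return "default", qualified_name
    return namespace, table


def table_dir(qualified_name, data_root=HBASE_DATA_ROOT):
    """Assumed HDFS directory of a table: <data_root>/<namespace>/<table>."""
    namespace, table = split_table_name(qualified_name)
    return f"{data_root}/{namespace}/{table}"


if __name__ == "__main__":
    print(split_table_name("ns_test:t_test"))
    print(table_dir("ns_test:t_test"))
    print(table_dir("t_test"))
```

Under these assumptions, dropping the last table of a namespace leaves an empty namespace directory, which matches the rule that only empty namespaces can be dropped.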

2. Group Of DML

Next, let's look at the most commonly used HBase commands for inserting, deleting, updating, and querying data.

2.1 append

Appends a value to the specified cell. Syntax:

Appends a cell 'value' at specified table/row/column coordinates.

  hbase> append 't1', 'r1', 'c1', 'value', ATTRIBUTES=>{'mykey'=>'myvalue'}
  hbase> append 't1', 'r1', 'c1', 'value', {VISIBILITY=>'PRIVATE|SECRET'}

The same commands also can be run on a table reference. Suppose you had a reference
t to table 't1', the corresponding command would be:

  hbase> t.append 'r1', 'c1', 'value', ATTRIBUTES=>{'mykey'=>'myvalue'}
  hbase> t.append 'r1', 'c1', 'value', {VISIBILITY=>'PRIVATE|SECRET'}

Example:

hbase(main):004:0> put 't_test','r1','f1:c1','abc'
0 row(s) in 0.0900 seconds

hbase(main):005:0> 
hbase(main):006:0* get 't_test','r1'
COLUMN                                           CELL                                                                                                                                          
 f1:c1                                           timestamp=1528019315459, value=abc                                                                                                            
1 row(s) in 0.0340 seconds

hbase(main):007:0> append 't_test','r1','f1:c1','_def'
0 row(s) in 0.0470 seconds

hbase(main):008:0> get 't_test','r1'
COLUMN                                           CELL                                                                                                                                          
 f1:c1                                           timestamp=1528019378977, value=abc_def                                                                                                        
1 row(s) in 0.0130 seconds
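
The difference between put (overwrite) and append (concatenate) seen in this transcript can be modeled roughly as follows. This is a toy sketch, not HBase client code: a row is a plain Python dict from column name to bytes, timestamps and versions are ignored, and the function names are made up for the illustration. Treating a missing cell as empty on append is an assumption of the model.

```python
def put(row, column, value):
    """Overwrite the cell with the new value (versions are ignored here)."""
    row[column] = value
    return row[column]


def append(row, column, value):
    """Concatenate value onto the current cell; a missing cell counts as empty."""
    row[column] = row.get(column, b"") + value
    return row[column]


if __name__ == "__main__":
    r1 = {}
    put(r1, "f1:c1", b"abc")
    append(r1, "f1:c1", b"_def")
    print(r1["f1:c1"])  # b'abc_def'
```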

2.2 count

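Counts the rows in a table. By default the current count is printed every 1000 rows. Scan caching is enabled for count scans by default, with a default cache size of 10 rows; if the rows in your table are small, you may want to increase this value. Syntax: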

 hbase> count 'ns1:t1'
 hbase> count 't1'
 hbase> count 't1', INTERVAL => 100000
 hbase> count 't1', CACHE => 1000
 hbase> count 't1', INTERVAL => 10, CACHE => 1000

Example:

hbase(main):009:0> count 't_test'
2 row(s) in 0.0440 seconds

=> 2
hbase(main):010:0> count 't_test',CACHE => 2
2 row(s) in 0.0210 seconds

=> 2
hbase(main):011:0> count 't_test',CACHE => 2,INTERVAL => 1
Current count: 1, row: 1news_news_200                                                                                                                                                          
Current count: 2, row: r1                                                                                                                                                                      
2 row(s) in 0.0190 seconds

=> 2
hbase(main):012:0> count 't_test',CACHE => 2,INTERVAL => 2
Current count: 2, row: r1                                                                                                                                                                      
2 row(s) in 0.0100 seconds

=> 2

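Counting rows with this command can take a long time on a large table, however. In that case, consider running the MapReduce job 'hbase org.apache.hadoop.hbase.mapreduce.RowCounter tableName' instead.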
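
The effect of INTERVAL in the count transcript above can be reproduced with a small Python model. It is a sketch under assumptions: the function name is hypothetical, the progress rule "print when the running count is a multiple of INTERVAL" is inferred from the output shown, and CACHE is not modeled at all because it only affects how many rows are fetched per round trip.

```python
def count_rows(row_keys, interval=1000):
    """Return (total, progress_lines) for a table with the given row keys.

    Rows are visited in sorted row-key order, and a progress line is
    recorded whenever the running count is a multiple of `interval`.
    """
    progress = []
    total = 0
    for key in sorted(row_keys):
        total += 1
        if total % interval == 0:
            progress.append(f"Current count: {total}, row: {key}")
    return total, progress


if __name__ == "__main__":
    keys = ["r1", "1news_news_200"]
    for interval in (1, 2, 1000):
        total, lines = count_rows(keys, interval=interval)
        print(interval, total, lines)
```

With the two row keys from the transcript, an interval of 1 yields two progress lines, an interval of 2 yields one, and the default of 1000 yields none.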

2.3 delete

Deletes the data in the specified cell. Syntax:

  hbase> delete 'ns1:t1', 'r1', 'c1', ts1
  hbase> delete 't1', 'r1', 'c1', ts1
  hbase> delete 't1', 'r1', 'c1', ts1, {VISIBILITY=>'PRIVATE|SECRET'}

Example:

hbase(main):025:0> get 't_test','r1',{COLUMN =>'f1:c2',VERSIONS => 2}
COLUMN                                           CELL                                                                                                                                          
 f1:c2                                           timestamp=1528021313768, value=456                                                                                                            
 f1:c2                                           timestamp=1528021113843, value=123                                                                                                            
2 row(s) in 0.0130 seconds

hbase(main):026:0> delete 't_test','r1','f1:c2'
0 row(s) in 0.0140 seconds

hbase(main):027:0> get 't_test','r1',{COLUMN =>'f1:c2',VERSIONS => 2}
COLUMN                                           CELL                                                                                                                                          
0 row(s) in 0.0100 seconds

As shown, in the HBase version used here this operation deleted every version of the cell.

You can also pass a timestamp to the delete operation:

hbase(main):033:0> delete 't_test','r1','f1:c2',1528021809100
0 row(s) in 0.0120 seconds

2.4 deleteall

Deletes all cells in a given row; a column and a timestamp can optionally be specified as well. Syntax:

  hbase> deleteall 'ns1:t1', 'r1'
  hbase> deleteall 't1', 'r1'
  hbase> deleteall 't1', 'r1', 'c1'
  hbase> deleteall 't1', 'r1', 'c1', ts1
  hbase> deleteall 't1', 'r1', 'c1', ts1, {VISIBILITY=>'PRIVATE|SECRET'}

Example:

hbase(main):040:0> get 't_test','r1'
COLUMN                                           CELL                                                                                                                                          
 f1:c1                                           timestamp=1528021101363, value=abc_def                                                                                                        
 f1:c2                                           timestamp=1528022493254, value=456                                                                                                            
 f1:c3                                           timestamp=1528022504415, value=111                                                                                                            
3 row(s) in 0.0130 seconds

hbase(main):041:0> deleteall 't_test','r1'
0 row(s) in 0.0120 seconds

hbase(main):042:0> get 't_test','r1'
COLUMN                                           CELL                                                                                                                                          
0 row(s) in 0.0040 seconds

2.5 get

Gets an entire row, several columns of a row, a single cell, or specific versions of a cell. Syntax:

  # Get an entire row
  hbase> get 'ns1:t1', 'r1'
  hbase> get 't1', 'r1'
  # Get data within a timestamp range (start inclusive, end exclusive)
  hbase> get 't1', 'r1', {TIMERANGE => [ts1, ts2]}
  # Get a single column
  hbase> get 't1', 'r1', 'c1'
  hbase> get 't1', 'r1', {COLUMN => 'c1'}
  # Get multiple columns
  hbase> get 't1', 'r1', ['c1', 'c2']
  hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
  # Get the version of a cell at a specific timestamp
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
  # Get a cell with a time range (or timestamp) and a maximum number of versions
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
  # Filter the returned cells with a value filter
  hbase> get 't1', 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}

Example:

hbase(main):004:0> get 't_test','r1'
COLUMN                                           CELL                                                                                                                                          
 f1:c1                                           timestamp=1528023147256, value=123                                                                                                            
 f1:c2                                           timestamp=1528022954485, value=456                                                                                                            
 f1:c3                                           timestamp=1528023156719, value=100                                                                                                            
3 row(s) in 0.0330 seconds

hbase(main):005:0> get 't_test','r1',{TIMERANGE => [1528022954485,1528023147256]}
COLUMN                                           CELL                                                                                                                                          
 f1:c2                                           timestamp=1528022954485, value=456                                                                                                            
1 row(s) in 0.0100 seconds

hbase(main):009:0> get 't_test','r1',{FILTER => "ValueFilter(=, 'binary:123')"}
COLUMN                                           CELL                                                                                                                                          
 f1:c1                                           timestamp=1528023147256, value=123                                                                                                            
1 row(s) in 0.0070 seconds

hbase(main):011:0> get 't_test','r1',['f1:c1','f1:c2']
COLUMN                                           CELL                                                                                                                                          
 f1:c1                                           timestamp=1528023147256, value=123                                                                                                            
 f1:c2                                           timestamp=1528022954485, value=456                                                                                                            
2 row(s) in 0.0180 seconds

hbase(main):012:0> get 't_test','r1','f1:c1','f1:c2'
COLUMN                                           CELL                                                                                                                                          
 f1:c1                                           timestamp=1528023147256, value=123                                                                                                            
 f1:c2                                           timestamp=1528022954485, value=456                                                                                                            
2 row(s) in 0.0080 seconds

hbase(main):013:0> get 't_test','r1',COLUMNS=>['f1:c1','f1:c2']
COLUMN                                           CELL                                                                                                                                          
 f1:c1                                           timestamp=1528023147256, value=123                                                                                                            
 f1:c2                                           timestamp=1528022954485, value=456                                                                                                            
2 row(s) in 0.0080 seconds

hbase(main):020:0> get 't_test','r1',{COLUMN => 'f1:c1', TIMESTAMP =>1528023147256}
COLUMN                                           CELL                                                                                                                                          
 f1:c1                                           timestamp=1528023147256, value=123                                                                                                            
1 row(s) in 0.0050 seconds
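
The TIMERANGE example above returns only f1:c2 because the range is half-open: the start timestamp is included and the end timestamp is excluded. A minimal Python sketch of that rule, using the three timestamps from the transcript, looks like this. The function names are hypothetical and only the latest version of each column is modeled.

```python
# Latest timestamp of each column of row r1, copied from the transcript.
CELLS = {
    "f1:c1": 1528023147256,
    "f1:c2": 1528022954485,
    "f1:c3": 1528023156719,
}


def in_time_range(ts, min_ts, max_ts):
    """Half-open check: min_ts is inclusive, max_ts is exclusive."""
    return min_ts <= ts < max_ts


def get_with_timerange(cells, min_ts, max_ts):
    """Return the columns whose timestamp falls inside [min_ts, max_ts)."""
    return sorted(col for col, ts in cells.items()
                  if in_time_range(ts, min_ts, max_ts))


if __name__ == "__main__":
    # Same bounds as in the shell example: f1:c1 sits exactly on the
    # upper bound and is therefore excluded.
    print(get_with_timerange(CELLS, 1528022954485, 1528023147256))
```

To include f1:c1 as well, the upper bound would have to be at least one millisecond larger than its timestamp.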

By default, Chinese text stored in HBase is displayed as escaped UTF-8 bytes:

hbase(main):010:0> get 't_test','r1','f1:c4'
COLUMN                                                    CELL                                                                                                                                                                  
 f1:c4                                                    timestamp=1528024997298, value=\xE4\xB8\xAD\xE6\x96\x87                                                                                                               
1 row(s) in 0.0420 seconds

# To display the bytes as readable text, add a converter to the column:
hbase(main):011:0> get 't_test','r1','f1:c4:toString'
COLUMN                                                    CELL                                                                                                                                                                  
 f1:c4                                                    timestamp=1528024997298, value=中文                                                                                                                                 
1 row(s) in 0.0130 seconds

hbase(main):012:0> get 't_test','r1','f1:c4:c(org.apache.hadoop.hbase.util.Bytes).toString'
COLUMN                                                    CELL                                                                                                                                                                  
 f1:c4                                                    timestamp=1528024997298, value=中文                                                                                                                                 
1 row(s) in 0.0060 seconds
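
The escaped form shown above is simply the UTF-8 encoding of the text, with every non-printable byte written as \xHH. The following Python sketch approximates that display rule and shows the round trip back to text. The function name is hypothetical and the escaping rule (printable ASCII except the backslash is kept, everything else is escaped) is my reading of the shell output, not a copy of HBase's implementation.

```python
def to_string_binary(data: bytes) -> str:
    """Render bytes the way the shell output above displays them."""
    out = []
    for b in data:
        if 0x20 <= b <= 0x7E and b != 0x5C:  # printable ASCII, not a backslash
            out.append(chr(b))
        else:
            out.append(f"\\x{b:02X}")
    return "".join(out)


if __name__ == "__main__":
    raw = "中文".encode("utf-8")
    print(to_string_binary(raw))   # escaped form, as in the first get
    print(raw.decode("utf-8"))     # readable form, as with :toString
```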

2.6 get_counter

Returns the value of a counter cell. Syntax:

  hbase> get_counter 'ns1:t1', 'r1', 'c1'
  hbase> get_counter 't1', 'r1', 'c1'

Example:

hbase(main):018:0> incr 't_test','r1','f1:incr',1 
COUNTER VALUE = 1
0 row(s) in 0.0380 seconds

hbase(main):019:0> get_counter 't_test','r1','f1:incr'
COUNTER VALUE = 1

hbase(main):020:0> incr 't_test','r1','f1:incr',1 
COUNTER VALUE = 2
0 row(s) in 0.0130 seconds

hbase(main):021:0> get_counter 't_test','r1','f1:incr'
COUNTER VALUE = 2

2.7 get_splits

Shows the split points of a table and the resulting number of regions. Syntax:

  hbase> get_splits 't1'
  hbase> get_splits 'ns1:t1'

Example:

hbase(main):028:0> create 't1', 'f1', SPLITS => ['10', '20', '30', '40']
0 row(s) in 1.2420 seconds

hbase(main):029:0> get_splits 't1'
Total number of splits = 5

=> ["10", "20", "30", "40"]
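
Four split points give five regions because each split point is the boundary between two neighbouring key ranges. The Python sketch below makes that explicit and also shows which region a row key would fall into. The function names are hypothetical, an empty byte string stands for "unbounded", and each region is assumed to cover [start, end), so a key equal to a split point belongs to the region that starts there.

```python
from bisect import bisect_right


def region_ranges(split_points):
    """Return the (start, end) key range of every region; b'' means unbounded."""
    bounds = [b""] + list(split_points) + [b""]
    return list(zip(bounds[:-1], bounds[1:]))


def region_index(split_points, row_key):
    """Index of the region whose [start, end) range contains row_key."""
    return bisect_right(split_points, row_key)


if __name__ == "__main__":
    splits = [b"10", b"20", b"30", b"40"]
    print(len(region_ranges(splits)))    # 5
    print(region_index(splits, b"10"))   # 1
    print(region_index(splits, b"r1"))   # 4
```

A table with no split points has a single region covering the whole key space, which is the "Total number of splits = 1" case.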

2.8 incr

Increments the value of a counter cell. Syntax:

  hbase> incr 'ns1:t1', 'r1', 'c1'
  hbase> incr 't1', 'r1', 'c1'
  hbase> incr 't1', 'r1', 'c1', 1
  hbase> incr 't1', 'r1', 'c1', 10

Example:

hbase(main):034:0> incr 't_test','r1','f1:incr',2
COUNTER VALUE = 4
0 row(s) in 0.0160 seconds

hbase(main):035:0> get_counter 't_test','r1','f1:incr'
COUNTER VALUE = 4
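
A counter cell holds an 8-byte big-endian signed integer, which is why a plain scan of f1:incr shows \x00\x00\x00\x00\x00\x00\x00\x04 rather than 4. The Python sketch below encodes and decodes such a value and replays the three incr calls from this post (1, 1, then 2). The function names are hypothetical, and the model ignores atomicity, which is the main reason to use a counter on a real cluster.

```python
import struct


def encode_counter(value):
    """Encode an integer as an 8-byte big-endian signed long."""
    return struct.pack(">q", value)


def decode_counter(raw):
    """Decode an 8-byte big-endian signed long back into an integer."""
    return struct.unpack(">q", raw)[0]


def incr(raw, amount=1):
    """Return the new encoded counter; a missing cell (None) starts at 0."""
    current = decode_counter(raw) if raw else 0
    return encode_counter(current + amount)


if __name__ == "__main__":
    cell = None
    for step in (1, 1, 2):
        cell = incr(cell, step)
    print(cell, decode_counter(cell))
```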

2.9 put

Writes a value into a table cell. Syntax:

  hbase> put 'ns1:t1', 'r1', 'c1', 'value'
  hbase> put 't1', 'r1', 'c1', 'value'

Example:

hbase(main):037:0> put 't_test','r1','f1:c5','put'
0 row(s) in 0.0100 seconds

hbase(main):038:0> get 't_test','r1','f1:c5'
COLUMN                                                    CELL                                                                                                                                                                  
 f1:c5                                                    timestamp=1528026610237, value=put                                                                                                                                    
1 row(s) in 0.0050 seconds

2.10 scan

Scans the data in a table. The row range and the columns to scan can be specified, and scans are often combined with filters. Syntax:

  # Full table scan
  hbase> scan 't1'
  # Specify the columns to scan, the maximum number of rows, and the start row
  hbase> scan 'ns1:t1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
  hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
  # Specify a timestamp range
  hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}
  # Filter by row key prefix
  hbase> scan 't1', {ROWPREFIXFILTER => 'row2'}

Example:

hbase(main):040:0> scan 't_test'
ROW                                                       COLUMN+CELL                                                                                                                                                           
 1news_news_200                                           column=f1:v, timestamp=1527733074596, value=a,b,c                                                                                                                     
 r1                                                       column=f1:c1, timestamp=1528023147256, value=123                                                                                                                      
 r1                                                       column=f1:c2, timestamp=1528022954485, value=456                                                                                                                      
 r1                                                       column=f1:c3, timestamp=1528023156719, value=100                                                                                                                      
 r1                                                       column=f1:c4, timestamp=1528024997298, value=\xE4\xB8\xAD\xE6\x96\x87                                                                                                 
 r1                                                       column=f1:c5, timestamp=1528026610237, value=put                                                                                                                      
 r1                                                       column=f1:incr, timestamp=1528026485801, value=\x00\x00\x00\x00\x00\x00\x00\x04                                                                                       
2 row(s) in 0.0220 seconds

hbase(main):041:0> scan 't_test',{COLUMNS => ['f1:c1', 'f1:c2']}
ROW                                                       COLUMN+CELL                                                                                                                                                           
 r1                                                       column=f1:c1, timestamp=1528023147256, value=123                                                                                                                      
 r1                                                       column=f1:c2, timestamp=1528022954485, value=456                                                                                                                      
1 row(s) in 0.0130 seconds

hbase(main):042:0> scan 't_test',LIMIT=>1
ROW                                                       COLUMN+CELL                                                                                                                                                           
 1news_news_200                                           column=f1:v, timestamp=1527733074596, value=a,b,c                                                                                                                     
1 row(s) in 0.0080 seconds

hbase(main):043:0> scan 't_test',{STARTROW => 'r1'}
ROW                                                       COLUMN+CELL                                                                                                                                                           
 r1                                                       column=f1:c1, timestamp=1528023147256, value=123                                                                                                                      
 r1                                                       column=f1:c2, timestamp=1528022954485, value=456                                                                                                                      
 r1                                                       column=f1:c3, timestamp=1528023156719, value=100                                                                                                                      
 r1                                                       column=f1:c4, timestamp=1528024997298, value=\xE4\xB8\xAD\xE6\x96\x87                                                                                                 
 r1                                                       column=f1:c5, timestamp=1528026610237, value=put                                                                                                                      
 r1                                                       column=f1:incr, timestamp=1528026485801, value=\x00\x00\x00\x00\x00\x00\x00\x04                                                                                       
1 row(s) in 0.0140 seconds

hbase(main):044:0> scan 't_test',{STARTROW => 'r1',COLUMNS => ['f1:c1', 'f1:c2']}
ROW                                                       COLUMN+CELL                                                                                                                                                           
 r1                                                       column=f1:c1, timestamp=1528023147256, value=123                                                                                                                      
 r1                                                       column=f1:c2, timestamp=1528022954485, value=456                                                                                                                      
1 row(s) in 0.0120 seconds

hbase(main):049:0> scan 't_test',{ROWPREFIXFILTER => '1news'}
ROW                                                       COLUMN+CELL                                                                                                                                                           
 1news_news_200                                           column=f1:v, timestamp=1527733074596, value=a,b,c                                                                                                                     
1 row(s) in 0.0070 seconds
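
How STARTROW, ROWPREFIXFILTER, and LIMIT select rows in the transcript above can be summarized with a small Python model. It is a sketch only: the function name and keyword arguments are hypothetical, only row keys are modeled (COLUMNS and TIMERANGE are left out), and rows are assumed to come back in sorted row-key order with an inclusive start row.

```python
def scan(row_keys, startrow=None, prefix=None, limit=None):
    """Return the row keys a scan would visit, in sorted order."""
    keys = sorted(row_keys)
    if startrow is not None:
        keys = [k for k in keys if k >= startrow]      # start row is inclusive
    if prefix is not None:
        keys = [k for k in keys if k.startswith(prefix)]
    if limit is not None:
        keys = keys[:limit]                            # at most `limit` rows
    return keys


if __name__ == "__main__":
    rows = ["r1", "1news_news_200"]
    print(scan(rows))
    print(scan(rows, limit=1))
    print(scan(rows, startrow="r1"))
    print(scan(rows, prefix="1news"))
```

Note that LIMIT counts rows, not cells, which is why the STARTROW example prints six cells but reports one row.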

2.11 truncate

Empties a table and discards its pre-split regions, keeping only the table schema. Syntax:

  hbase> truncate 't1'

Example:

hbase(main):069:0> scan 't1'
ROW                                                       COLUMN+CELL                                                                                                                                                           
 r1                                                       column=f1:c1, timestamp=1528027849019, value=10                                                                                                                       
1 row(s) in 0.0260 seconds

hbase(main):070:0> get_splits 't1'
Total number of splits = 5

=> ["10", "20", "30", "40"]
hbase(main):071:0> truncate 't1'
Truncating 't1' table (it may take a while):
 - Disabling table...
 - Truncating table...
0 row(s) in 3.3320 seconds

hbase(main):072:0> scan 't1'
ROW                                                       COLUMN+CELL                                                                                                                                                           
0 row(s) in 0.1140 seconds

hbase(main):073:0> get_splits 't1'
Total number of splits = 1

=> []

2.12 truncate_preserve

Empties a table but preserves its pre-split regions. Syntax:

  hbase> truncate_preserve 't1'

Example:

hbase(main):063:0> scan 't1'
ROW                                                       COLUMN+CELL                                                                                                                                                           
 r1                                                       column=f1:c1, timestamp=1528027790054, value=10                                                                                                                       
1 row(s) in 0.0540 seconds

hbase(main):064:0> get_splits 't1'
Total number of splits = 5

=> ["10", "20", "30", "40"]
hbase(main):065:0> truncate_preserve 't1'
Truncating 't1' table (it may take a while):
 - Disabling table...
 - Truncating table...
0 row(s) in 3.3450 seconds

hbase(main):066:0> scan 't1'
ROW                                                       COLUMN+CELL                                                                                                                                                           
0 row(s) in 0.5620 seconds

hbase(main):067:0> get_splits 't1'
Total number of splits = 5

=> ["10", "20", "30", "40"]
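
The difference between the two truncate commands comes down to whether the split points survive. A toy Python model of the two transcripts is given below. The class and method names are hypothetical, and the model only tracks row data and split points, not the disable/recreate steps the shell performs.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    splits: list = field(default_factory=list)
    rows: dict = field(default_factory=dict)

    def region_count(self):
        """N split points produce N + 1 regions."""
        return len(self.splits) + 1

    def truncate(self):
        """Drop all data and all split points; one region remains."""
        self.rows.clear()
        self.splits = []

    def truncate_preserve(self):
        """Drop all data but keep the split points."""
        self.rows.clear()


if __name__ == "__main__":
    t1 = Table(splits=["10", "20", "30", "40"], rows={"r1": {"f1:c1": "10"}})
    t1.truncate_preserve()
    print(t1.region_count(), t1.rows)  # 5 {}
    t1.truncate()
    print(t1.region_count(), t1.rows)  # 1 {}
```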
