A First Look at Using HBase with Phoenix
2019-03-17 | 白面葫芦娃92
HBase shell command help:
[root@hadoop001 bin]# hbase shell
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
19/03/16 22:48:32 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.0-cdh5.12.0, rUnknown, Thu Jun 29 04:38:21 PDT 2017
hbase(main):001:0> help
HBase Shell, version 1.2.0-cdh5.12.0, rUnknown, Thu Jun 29 04:38:21 PDT 2017
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.
COMMAND GROUPS:
Group name: general
Commands: status, table_help, version, whoami
Group name: ddl
Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, locate_region, show_filters
Group name: namespace
Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables
Group name: dml
Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve
Group name: tools
Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, close_region, compact, compact_mob, compact_rs, flush, major_compact, major_compact_mob, merge_region, move, normalize, normalizer_enabled, normalizer_switch, split, trace, unassign, wal_roll, zk_dump
Group name: replication
Commands: add_peer, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, get_peer_config, list_peer_configs, list_peers, list_replicated_tables, remove_peer, remove_peer_tableCFs, set_peer_tableCFs, show_peer_tableCFs, update_peer_config
Group name: snapshots
Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, list_snapshots, restore_snapshot, snapshot
Group name: configuration
Commands: update_all_config, update_config
Group name: quotas
Commands: list_quotas, set_quota
Group name: security
Commands: grant, list_security_capabilities, revoke, user_permission
Group name: procedures
Commands: abort_procedure, list_procedures
Group name: visibility labels
Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility
Group name: rsgroup
Commands: add_rsgroup, balance_rsgroup, get_rsgroup, get_server_rsgroup, get_table_rsgroup, list_rsgroups, move_servers_rsgroup, move_tables_rsgroup, remove_rsgroup
SHELL USAGE:
Quote all names in HBase Shell such as table and column names. Commas delimit
command parameters. Type <RETURN> after entering a command to run it.
Dictionaries of configuration used in the creation and alteration of tables are
Ruby Hashes. They look like this:
{'key1' => 'value1', 'key2' => 'value2', ...}
and are opened and closed with curley-braces. Key/values are delimited by the
'=>' character combination. Usually keys are predefined constants such as
NAME, VERSIONS, COMPRESSION, etc. Constants do not need to be quoted. Type
'Object.constants' to see a (messy) list of all constants in the environment.
If you are using binary keys or values and need to enter them in the shell, use
double-quote'd hexadecimal representation. For example:
hbase> get 't1', "key\x03\x3f\xcd"
hbase> get 't1', "key\003\023\011"
hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"
The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
For more on the HBase Shell, see http://hbase.apache.org/book.html
Help for the HBase create and put commands:
hbase(main):002:0> create
ERROR: wrong number of arguments (0 for 1)
Creates a table. Pass a table name, and a set of column family
specifications (at least one), and, optionally, table configuration.
Column specification can be a simple string (name), or a dictionary
(dictionaries are described below in main help output), necessarily
including NAME attribute.
Examples:
Create a table with namespace=ns1 and table qualifier=t1
hbase> create 'ns1:t1', {NAME => 'f1', VERSIONS => 5}
Create a table with namespace=default and table qualifier=t1
hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
hbase> # The above in shorthand would be the following:
hbase> create 't1', 'f1', 'f2', 'f3'
hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
hbase> create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}}
hbase> create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 1000000, MOB_COMPACT_PARTITION_POLICY => 'weekly'}
Table configuration options can be put at the end.
Examples:
hbase> create 'ns1:t1', 'f1', SPLITS => ['10', '20', '30', '40']
hbase> create 't1', 'f1', SPLITS => ['10', '20', '30', '40']
hbase> create 't1', 'f1', SPLITS_FILE => 'splits.txt', OWNER => 'johndoe'
hbase> create 't1', {NAME => 'f1', VERSIONS => 5}, METADATA => { 'mykey' => 'myvalue' }
hbase> # Optionally pre-split the table into NUMREGIONS, using
hbase> # SPLITALGO ("HexStringSplit", "UniformSplit" or classname)
hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit', REGION_REPLICATION => 2, CONFIGURATION => {'hbase.hregion.scan.loadColumnFamiliesOnDemand' => 'true'}}
hbase> create 't1', {NAME => 'f1', DFS_REPLICATION => 1}
You can also keep around a reference to the created table:
hbase> t1 = create 't1', 'f1'
Which gives you a reference to the table named 't1', on which you can then
call methods.
hbase(main):006:0> put
ERROR: wrong number of arguments (0 for 4)
Put a cell 'value' at specified table/row/column and optionally
timestamp coordinates. To put a cell value into table 'ns1:t1' or 't1'
at row 'r1' under column 'c1' marked with the time 'ts1', do:
hbase> put 'ns1:t1', 'r1', 'c1', 'value'
hbase> put 't1', 'r1', 'c1', 'value'
hbase> put 't1', 'r1', 'c1', 'value', ts1
hbase> put 't1', 'r1', 'c1', 'value', {ATTRIBUTES=>{'mykey'=>'myvalue'}}
hbase> put 't1', 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}
hbase> put 't1', 'r1', 'c1', 'value', ts1, {VISIBILITY=>'PRIVATE|SECRET'}
The same commands also can be run on a table reference. Suppose you had a reference
t to table 't1', the corresponding command would be:
hbase> t.put 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}
hbase(main):004:0> create 't_hbase1','info'
0 row(s) in 1.5790 seconds
=> Hbase::Table - t_hbase1
hbase(main):005:0> desc 't_hbase1'
Table t_hbase1 is ENABLED
t_hbase1
COLUMN FAMILIES DESCRIPTION
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.1260 seconds
hbase(main):007:0> put 't_hbase1','1','info:id','1'
0 row(s) in 0.1320 seconds
hbase(main):008:0> put 't_hbase1','1','info:salary','1'
0 row(s) in 0.0170 seconds
hbase(main):009:0> put 't_hbase1','1','info:start_date','2019-03-18'
0 row(s) in 0.0230 seconds
hbase(main):010:0> put 't_hbase1','1','info:end_date','2019-03-18'
0 row(s) in 0.0190 seconds
hbase(main):011:0> scan 't_hbase1'
ROW          COLUMN+CELL
 1           column=info:end_date, timestamp=1552748561766, value=2019-03-18
 1           column=info:id, timestamp=1552748507732, value=1
 1           column=info:salary, timestamp=1552748515347, value=1
 1           column=info:start_date, timestamp=1552748545143, value=2019-03-18
1 row(s) in 0.0640 seconds
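These shell commands have a direct equivalent in the HBase Java client. A minimal sketch, assuming the HBase 1.2 client library and the cluster's hbase-site.xml are on the classpath (the class name and sample cells are illustrative, mirroring the puts above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBasePutExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("t_hbase1"))) {
            Put put = new Put(Bytes.toBytes("1"));        // row key '1'
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("id"), Bytes.toBytes("1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("salary"), Bytes.toBytes("1"));
            table.put(put);                               // equivalent of the shell puts above
        }
    }
}

Next, start sqlline and map the existing HBase table to a Phoenix table of the same name; quoting the identifiers preserves their lower-case spelling: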
[root@hadoop001 bin]# ./sqlline.py 192.168.137.2:2181
0: jdbc:phoenix:192.168.137.2:2181> create table "t_hbase1"( "ROW" varchar primary key, "info"."start_date" varchar , "info"."end_date" varchar , "info"."id" varchar , "info"."salary" varchar);
0: jdbc:phoenix:192.168.137.2:2181> select * from "t_hbase1";
+------+-------------+-------------+-----+---------+
| ROW | start_date | end_date | id | salary |
+------+-------------+-------------+-----+---------+
| 1 | 2019-03-18 | 2019-03-18 | 1 | 1 |
+------+-------------+-------------+-----+---------+
1 row selected (0.091 seconds)
0: jdbc:phoenix:192.168.137.2:2181> UPSERT INTO "t_hbase1" VALUES('2','2019-04-18','2019-04-18','2','2');
1 row affected (0.052 seconds)
0: jdbc:phoenix:192.168.137.2:2181> select * from "t_hbase1";
+------+-------------+-------------+-----+---------+
| ROW | start_date | end_date | id | salary |
+------+-------------+-------------+-----+---------+
| 1 | 2019-03-18 | 2019-03-18 | 1 | 1 |
| 2 | 2019-04-18 | 2019-04-18 | 2 | 2 |
+------+-------------+-------------+-----+---------+
2 rows selected (0.059 seconds)
hbase(main):012:0> scan 't_hbase1'
ROW          COLUMN+CELL
 1           column=info:_0, timestamp=1552748561766, value=
 1           column=info:end_date, timestamp=1552748561766, value=2019-03-18
 1           column=info:id, timestamp=1552748507732, value=1
 1           column=info:salary, timestamp=1552748515347, value=1
 1           column=info:start_date, timestamp=1552748545143, value=2019-03-18
 2           column=info:_0, timestamp=1552748985363, value=x
 2           column=info:end_date, timestamp=1552748985363, value=2019-04-18
 2           column=info:id, timestamp=1552748985363, value=2
 2           column=info:salary, timestamp=1552748985363, value=2
 2           column=info:start_date, timestamp=1552748985363, value=2019-04-18
2 row(s) in 0.0300 seconds
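Note the new info:_0 cells: Phoenix maintains an empty marker cell (the _0 qualifier) in the first column family of every row it writes, so that rows whose non-key columns are all NULL still have at least one physical cell. The mapped table can also be read programmatically through the Phoenix thick JDBC driver; a minimal sketch, assuming the phoenix-client jar is on the classpath (the class name is illustrative):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixQueryExample {
    public static void main(String[] args) throws Exception {
        // Thick-driver URL format: jdbc:phoenix:<zookeeper quorum>:<port>
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:192.168.137.2:2181");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM \"t_hbase1\"")) {
            while (rs.next()) {
                System.out.println(rs.getString("ROW") + "\t" + rs.getString("id")
                        + "\t" + rs.getString("salary"));
            }
        }
    }
}

Now insert a third row from the HBase shell and compare both views: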
hbase(main):002:0> put 't_hbase1','3','info:id','3'
0 row(s) in 0.0990 seconds
hbase(main):002:0> put 't_hbase1','3','info:salary','3'
0 row(s) in 0.0990 seconds
hbase(main):003:0> put 't_hbase1','3','info:start_date','2019-05-18'
0 row(s) in 0.0130 seconds
hbase(main):004:0> put 't_hbase1','3','info:end_date','2019-05-18'
0 row(s) in 0.0190 seconds
hbase(main):005:0> scan 't_hbase1'
ROW          COLUMN+CELL
 1           column=info:_0, timestamp=1552748561766, value=
 1           column=info:end_date, timestamp=1552748561766, value=2019-03-18
 1           column=info:id, timestamp=1552748507732, value=1
 1           column=info:salary, timestamp=1552748515347, value=1
 1           column=info:start_date, timestamp=1552748545143, value=2019-03-18
 2           column=info:_0, timestamp=1552748985363, value=x
 2           column=info:end_date, timestamp=1552748985363, value=2019-04-18
 2           column=info:id, timestamp=1552748985363, value=2
 2           column=info:salary, timestamp=1552748985363, value=2
 2           column=info:start_date, timestamp=1552748985363, value=2019-04-18
 3           column=info:end_date, timestamp=1552749225788, value=2019-05-18
 3           column=info:id, timestamp=1552749074811, value=3
 3           column=info:salary, timestamp=1552749201553, value=3
 3           column=info:start_date, timestamp=1552749215234, value=2019-05-18
3 row(s) in 0.0410 seconds
0: jdbc:phoenix:192.168.137.2:2181> select * from "t_hbase1";
+------+-------------+-------------+-----+---------+
| ROW | start_date | end_date | id | salary |
+------+-------------+-------------+-----+---------+
| 1 | 2019-03-18 | 2019-03-18 | 1 | 1 |
| 2 | 2019-04-18 | 2019-04-18 | 2 | 2 |
| 3 | 2019-05-18 | 2019-05-18 | 3 | 3 |
+------+-------------+-------------+-----+---------+
3 rows selected (0.061 seconds)
0: jdbc:phoenix:192.168.137.2:2181> delete from "t_hbase1" where "id"='3';
1 row affected (0.028 seconds)
0: jdbc:phoenix:192.168.137.2:2181> select * from "t_hbase1";
+------+-------------+-------------+-----+---------+
| ROW | start_date | end_date | id | salary |
+------+-------------+-------------+-----+---------+
| 1 | 2019-03-18 | 2019-03-18 | 1 | 1 |
| 2 | 2019-04-18 | 2019-04-18 | 2 | 2 |
+------+-------------+-------------+-----+---------+
2 rows selected (0.048 seconds)
hbase(main):001:0> scan 't_hbase1'
ROW          COLUMN+CELL
 1           column=info:_0, timestamp=1552748561766, value=
 1           column=info:end_date, timestamp=1552748561766, value=2019-03-18
 1           column=info:id, timestamp=1552748507732, value=1
 1           column=info:salary, timestamp=1552748515347, value=1
 1           column=info:start_date, timestamp=1552748545143, value=2019-03-18
 2           column=info:_0, timestamp=1552748985363, value=x
 2           column=info:end_date, timestamp=1552748985363, value=2019-04-18
 2           column=info:id, timestamp=1552748985363, value=2
 2           column=info:salary, timestamp=1552748985363, value=2
 2           column=info:start_date, timestamp=1552748985363, value=2019-04-18
2 row(s) in 0.3110 seconds
As the scan shows, the DELETE issued through Phoenix removed the underlying HBase row as well. Next, create a secondary index:
0: jdbc:phoenix:localhost:2181:/hbase> CREATE INDEX THBASE1_INDEX_ID ON "t_hbase1"("info"."id");
0: jdbc:phoenix:localhost:2181:/hbase> select * from THBASE1_INDEX_ID;
+----------+-------+
| info:id | :ROW |
+----------+-------+
| 1 | 1 |
| 2 | 2 |
+----------+-------+
2 rows selected (0.072 seconds)
[root@hadoop001 ~]# hadoop fs -ls /hbase/data/default
Found 4 items
drwxr-xr-x - hbase hbase 0 2019-03-10 16:19 /hbase/data/default/SYSTEM.MUTEX
drwxr-xr-x - hbase hbase 0 2019-03-16 23:51 /hbase/data/default/THBASE1_INDEX_ID
drwxr-xr-x - hbase hbase 0 2019-03-10 16:19 /hbase/data/default/USER
drwxr-xr-x - hbase hbase 0 2019-03-16 22:59 /hbase/data/default/t_hbase1
With the index table in place, insert data directly through the HBase shell:
hbase(main):002:0> put 't_hbase1','3','info:id','3'
0 row(s) in 0.0990 seconds
hbase(main):003:0> put 't_hbase1','3','info:salary','3'
0 row(s) in 0.0210 seconds
hbase(main):004:0> put 't_hbase1','3','info:start_date','2019-09-18'
0 row(s) in 0.0180 seconds
hbase(main):005:0> put 't_hbase1','3','info:end_date','2019-09-18'
0 row(s) in 0.0210 seconds
// The data visible in Phoenix is updated:
0: jdbc:phoenix:localhost:2181:/hbase> select * from "t_hbase1";
+------+-------------+-------------+-----+---------+
| ROW | start_date | end_date | id | salary |
+------+-------------+-------------+-----+---------+
| 1 | 2019-03-18 | 2019-03-18 | 1 | 1 |
| 2 | 2019-04-18 | 2019-04-18 | 2 | 2 |
| 3 | 2019-09-18 | 2019-09-18 | 3 | 3 |
+------+-------------+-------------+-----+---------+
3 rows selected (0.096 seconds)
0: jdbc:phoenix:localhost:2181:/hbase> select * from "t_hbase1" where "id"='3';
+------+-------------+-------------+-----+---------+
| ROW | start_date | end_date | id | salary |
+------+-------------+-------------+-----+---------+
| 3 | 2019-09-18 | 2019-09-18 | 3 | 3 |
+------+-------------+-------------+-----+---------+
1 row selected (0.084 seconds)
// But the index table is not updated:
0: jdbc:phoenix:localhost:2181:/hbase> select count(*) from "t_hbase1";
+-----------+
| COUNT(1) |
+-----------+
| 2 |
+-----------+
1 row selected (0.029 seconds)
0: jdbc:phoenix:localhost:2181:/hbase> select * from THBASE1_INDEX_ID;
+----------+-------+
| info:id | :ROW |
+----------+-------+
| 1 | 1 |
| 2 | 2 |
+----------+-------+
2 rows selected (0.025 seconds)
// Now try inserting a row directly through Phoenix:
0: jdbc:phoenix:localhost:2181:/hbase> UPSERT INTO "t_hbase1" VALUES('4','2019-10-18','2019-10-18','4','4');
1 row affected (0.17 seconds)
// Both the data and the index are updated:
0: jdbc:phoenix:localhost:2181:/hbase> select * from THBASE1_INDEX_ID;
+----------+-------+
| info:id | :ROW |
+----------+-------+
| 1 | 1 |
| 2 | 2 |
| 4 | 4 |
+----------+-------+
3 rows selected (0.024 seconds)
0: jdbc:phoenix:localhost:2181:/hbase> select count(*) from "t_hbase1";
+-----------+
| COUNT(1) |
+-----------+
| 3 |
+-----------+
1 row selected (0.029 seconds)
0: jdbc:phoenix:localhost:2181:/hbase> select * from "t_hbase1";
+------+-------------+-------------+-----+---------+
| ROW | start_date | end_date | id | salary |
+------+-------------+-------------+-----+---------+
| 1 | 2019-03-18 | 2019-03-18 | 1 | 1 |
| 2 | 2019-04-18 | 2019-04-18 | 2 | 2 |
| 3 | 2019-09-18 | 2019-09-18 | 3 | 3 |
| 4 | 2019-10-18 | 2019-10-18 | 4 | 4 |
+------+-------------+-------------+-----+---------+
4 rows selected (0.086 seconds)
The data and the index in HBase are updated as well:
hbase(main):009:0> scan 't_hbase1'
ROW          COLUMN+CELL
 1           column=info:_0, timestamp=1552748561766, value=
 1           column=info:end_date, timestamp=1552748561766, value=2019-03-18
 1           column=info:id, timestamp=1552748507732, value=1
 1           column=info:salary, timestamp=1552748515347, value=1
 1           column=info:start_date, timestamp=1552748545143, value=2019-03-18
 2           column=info:_0, timestamp=1552748985363, value=x
 2           column=info:end_date, timestamp=1552748985363, value=2019-04-18
 2           column=info:id, timestamp=1552748985363, value=2
 2           column=info:salary, timestamp=1552748985363, value=2
 2           column=info:start_date, timestamp=1552748985363, value=2019-04-18
 3           column=info:end_date, timestamp=1552751714137, value=2019-09-18
 3           column=info:id, timestamp=1552751696636, value=3
 3           column=info:salary, timestamp=1552751700685, value=3
 3           column=info:start_date, timestamp=1552751709072, value=2019-09-18
 4           column=info:_0, timestamp=1552751877072, value=x
 4           column=info:end_date, timestamp=1552751877072, value=2019-10-18
 4           column=info:id, timestamp=1552751877072, value=4
 4           column=info:salary, timestamp=1552751877072, value=4
 4           column=info:start_date, timestamp=1552751877072, value=2019-10-18
4 row(s) in 0.0510 seconds
hbase(main):008:0> scan 'THBASE1_INDEX_ID'
ROW COLUMN+CELL
1\x001 column=0:_0, timestamp=1552751474487, value=x
2\x002 column=0:_0, timestamp=1552751474487, value=x
4\x004 column=0:_0, timestamp=1552751877072, value=_0
3 row(s) in 0.0220 seconds
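In the index table, each row key is the indexed value concatenated with the original row key, separated by a \x00 byte (e.g. 2\x002); this is how Phoenix turns a filter on "id" into a plain row-key scan. Whether a query actually goes through the index can be checked with EXPLAIN; a minimal JDBC sketch (the query selects only the indexed column, so the plan should reference THBASE1_INDEX_ID):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ExplainIndexExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181:/hbase");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "EXPLAIN SELECT \"id\" FROM \"t_hbase1\" WHERE \"id\" = '2'")) {
            while (rs.next()) {
                System.out.println(rs.getString(1)); // expect a range scan over THBASE1_INDEX_ID
            }
        }
    }
}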
A solution to this problem is discussed at the link below:
https://blog.csdn.net/u012551524/article/details/79575632
The results above show that once a secondary index has been created on t_hbase1, data inserted through the HBase shell is not synchronized into the index table.
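One way to repair an index that has gone stale because of such out-of-band writes is to rebuild it from the data table; a minimal JDBC sketch (ALTER INDEX ... REBUILD re-populates the whole index, which can be expensive on large tables):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RebuildIndexExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181:/hbase");
             Statement stmt = conn.createStatement()) {
            // re-populates THBASE1_INDEX_ID from the current contents of t_hbase1
            stmt.execute("ALTER INDEX THBASE1_INDEX_ID ON \"t_hbase1\" REBUILD");
        }
    }
}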
When data is inserted through the Phoenix shell, the secondary index is synchronized automatically; the maintenance is performed by a server-side coprocessor.
In short, if the secondary indexes that Phoenix created are to stay consistent, all writes to the source table should go through a Phoenix client. If no index needs to be maintained, it does not matter which route the data takes.
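For completeness, a minimal sketch of such a Phoenix-client write over JDBC. Note the explicit commit(): Phoenix connections do not auto-commit by default, and it is the committed batch that triggers the coprocessor's index maintenance (row '5' here is a hypothetical new row):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class PhoenixUpsertExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181:/hbase")) {
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPSERT INTO \"t_hbase1\" VALUES (?, ?, ?, ?, ?)")) {
                ps.setString(1, "5");           // ROW
                ps.setString(2, "2019-11-18");  // start_date
                ps.setString(3, "2019-11-18");  // end_date
                ps.setString(4, "5");           // id
                ps.setString(5, "5");           // salary
                ps.executeUpdate();
            }
            conn.commit(); // flushes the mutation and updates THBASE1_INDEX_ID via the coprocessor
        }
    }
}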
Reference: https://blog.csdn.net/u013850277/article/details/80978858