HBase表设计 - 二级索引Secondary Indexin
2020-12-22 本文已影响0人
诺之林
本文的主线 概念 => 主键 => 配置 => 索引
概念
-
Secondary indexes are an orthogonal way to access data from its primary access path.
-
In HBase, you have a single index that is lexicographically sorted on the primary row key. Access to records in any way other than through the primary row requires scanning over potentially all the rows in the table to test them against your filter.
-
With secondary indexing, the columns or expressions you index form an alternate row key to allow point lookups and range scans along this new axis.
主键
- 单一主键
CREATE TABLE IF NOT EXISTS t_gps (
row_key VARCHAR NOT NULL PRIMARY KEY,
id BIGINT,
city VARCHAR,
created_ts BIGINT
);
UPSERT INTO t_gps VALUES('3001-1608605947',3001,'beijing',1608605947);
UPSERT INTO t_gps VALUES('3001-1608615947',3001,'shanghai',1608615947);
EXPLAIN SELECT * FROM t_gps WHERE row_key > '3001-1608605947';
+--------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+--------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER T_GPS ['3001-1608605947'] - [*] | null | null | null |
+--------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
1 row selected (0.018 seconds)
- 复合主键
DROP TABLE t_gps;
CREATE TABLE IF NOT EXISTS t_gps (
id BIGINT NOT NULL,
city VARCHAR,
created_ts BIGINT NOT NULL,
CONSTRAINT pk PRIMARY KEY (id,created_ts)
);
UPSERT INTO t_gps VALUES(3001,'beijing',1608605947);
UPSERT INTO t_gps VALUES(3001,'shanghai',1608615947);
EXPLAIN SELECT * FROM t_gps WHERE id = 3001 AND created_ts > 1608605947;
+-----------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+-----------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER T_GPS [3001,1608605948] - [3001,*] | null | null | null |
+-----------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
1 row selected (0.017 seconds)
配置
vim ~/hbase-2.0.0/conf/hbase-site.xml
<configuration>
<property>
<name>hbase.regionserver.wal.codec</name>
<value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
</configuration>
~/hbase-2.0.0/bin/stop-hbase.sh
~/hbase-2.0.0/bin/start-hbase.sh
索引
EXPLAIN SELECT * FROM t_gps WHERE created_ts > 1608605947;
+-----------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+-----------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T_GPS | null | null | null |
| SERVER FILTER BY CREATED_TS > 1608605947 | null | null | null |
+-----------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.017 seconds)
CREATE INDEX IF NOT EXISTS ts_idx ON t_gps (created_ts DESC);
EXPLAIN SELECT * FROM t_gps WHERE created_ts > 1608605947;
+-----------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+-----------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T_GPS | null | null | null |
| SERVER FILTER BY CREATED_TS > 1608605947 | null | null | null |
+-----------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.018 seconds)
DROP INDEX ts_idx ON t_gps;
CREATE INDEX IF NOT EXISTS ts_idx ON t_gps (created_ts DESC) INCLUDE (id, city);
EXPLAIN SELECT * FROM t_gps WHERE created_ts > 1608605947;
+---------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+---------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER TS_IDX [*] - [~1608605947] | null | null | null |
+---------------------------------------------------------------------------------------+-----------------+----------------+--------------+
1 row selected (0.023 seconds)