
Getting Started with HBase in Practice

2022-01-19  肥兔子爱豆畜子

Overview

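This post installs a standalone HBase instance. In standalone mode the underlying storage is the local file system, so there is no need to set up HDFS. HBase ships hbase-client for working with the database, but here we use Apache Phoenix instead, which lets us read and write HBase through SQL. After setting up HBase and installing the Phoenix plugin, we build a CRUD example against HBase using Spring JDBC and the Phoenix client.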

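Phoenix has a client part and a server part. It effectively adds a SQL translation layer on top of HBase and supports JDBC: the client sends SQL through Phoenix to its server side, which runs as an HBase plugin, and the SQL is turned into HBase operations for HBase to execute.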

About HBase

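HBase is a widely used storage engine in the big-data ecosystem. It is suited to storing massive datasets, for example user-behavior data, the backing store for other big-data platforms, and data for reports and dashboards.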

Environment Setup

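A quick rant first: setting up a beginner HBase environment was a real ordeal, and I nearly gave up.
hbase-2.3.7 + phoenix-hbase-2.3-5.1.2 simply would not work. HBase itself was fine and I could log in and operate through the shell, but Phoenix was not: it hung while sqlline.py was connecting, after which HBase reported either Region in transition or ConnectionLoss for /hbase/hbaseid, and the only way out was to delete the data directory and restart.
In the end I followed a setup that others online had reported working, and the combination hbase-2.2.4 + phoenix-hbase-2.0-5.0.0 finally succeeded.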

Set the JAVA_HOME environment variable in conf/hbase-env.sh:

export JAVA_HOME=/usr/java/jdk1.8.0_131/

Edit hbase-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<!-- Directory where HBase stores its data -->
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/hbase-2.2.4/hbase</value>
  </property>
  <!-- Path for ZooKeeper data files -->
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/hbase-2.2.4/zookeeper</value>
  </property>
  <property>
    <name>hbase.master.ipc.address</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>hbase.regionserver.ipc.address</name>
    <value>0.0.0.0</value>
  </property>

 <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
</configuration>
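
Before starting HBase it can help to double-check which settings the file actually contains. The following is a minimal sketch using only the JDK's XML parser; HBaseSiteReader is a hypothetical helper written for this post, not part of HBase, and the sample XML in main is a shortened copy of the configuration above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not part of HBase): collects the <property> name/value
// pairs from hbase-site.xml content so the settings can be inspected.
class HBaseSiteReader {

    static Map<String, String> parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> props = new LinkedHashMap<>(); // keeps file order
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element p = (Element) nodes.item(i);
            String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
            String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
            props.put(name, value);
        }
        return props;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<configuration>"
                + "<property><name>hbase.rootdir</name><value>file:///home/hbase-2.2.4/hbase</value></property>"
                + "<property><name>hbase.unsafe.stream.capability.enforce</name><value>false</value></property>"
                + "</configuration>";
        parse(xml).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

In real use the string would be read from conf/hbase-site.xml; the sketch assumes every property element has both a name and a value child.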

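At this point you can try starting HBase with ./bin/start-hbase.sh.
Run hbase shell to enter the command line, then try list to show tables, create 'test', 'cf', describe 'test', and a few put, scan and get commands: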

hbase(main):001:0> list
TABLE                                                                                                                                                   
0 row(s)
Took 1.1635 seconds                                                                                                                                     
=> []


hbase(main):013:0* create 'test', 'cf'
Created table test
Took 0.7584 seconds                                                                                                                                     
=> Hbase::Table - test
hbase(main):014:0> list
TABLE                                                                                                                                                   
test                                                                                                                                                    
1 row(s)
Took 0.0299 seconds                                                                                                                                     
=> ["test"]


hbase(main):019:0* describe 'test'
Table test is ENABLED                                                                                                                                   
test                                                                                                                                                    
COLUMN FAMILIES DESCRIPTION                                                                                                                             
{NAME => 'cf', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION =>
 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                   

1 row(s)
Quota is disabled
Took 0.3620 seconds 


hbase(main):020:0> put 'test', 'row1', 'cf:a', 'value1'
Took 0.1434 seconds                                                                                                                                     
hbase(main):021:0> put 'test', 'row2', 'cf:b', 'value2'
Took 0.0289 seconds                                                                                                                                     
hbase(main):022:0> put 'test', 'row3', 'cf:c', 'value3'
Took 0.0142 seconds                                                                                                                                     
hbase(main):023:0> scan 'test'
ROW                                     COLUMN+CELL                                                                                                     
 row1                                   column=cf:a, timestamp=2022-01-18T14:16:36.606, value=value1                                                    
 row2                                   column=cf:b, timestamp=2022-01-18T14:16:49.123, value=value2                                                    
 row3                                   column=cf:c, timestamp=2022-01-18T14:16:59.043, value=value3                                                    
3 row(s)
Took 0.0911 seconds

hbase(main):025:0* get 'test', 'row1'
COLUMN                                  CELL                                                                                                            
 cf:a                                   timestamp=2022-01-18T14:16:36.606, value=value1                                                                 
1 row(s)
Took 0.0510 seconds
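
For readers new to the data model behind the shell session above, here is a toy in-memory model of what put, get and scan do. It is only a sketch: ToyTable and its methods are hypothetical names, not the HBase client API, and cell versions and timestamps are left out. The point it illustrates is that rows are kept sorted by rowkey and each row holds a sparse set of family:qualifier cells.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory model of an HBase table (NOT the HBase client API):
// rows are sorted by rowkey, and each row maps "family:qualifier" to a value.
class ToyTable {
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    // Roughly what `put 'test', 'row1', 'cf:a', 'value1'` does
    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Roughly what `get 'test', 'row1'` does: all cells of one row (empty if absent)
    Map<String, String> get(String rowKey) {
        return rows.getOrDefault(rowKey, new TreeMap<>());
    }

    // Roughly what `scan 'test'` does: every row, in rowkey order
    Map<String, TreeMap<String, String>> scan() {
        return rows;
    }

    public static void main(String[] args) {
        ToyTable test = new ToyTable();
        test.put("row2", "cf:b", "value2");
        test.put("row1", "cf:a", "value1");
        test.put("row3", "cf:c", "value3");
        // Rows come back sorted by rowkey regardless of insertion order
        test.scan().forEach((row, cells) -> System.out.println(row + " " + cells));
    }
}
```

Note that each row only stores the columns that were actually written, which is why row1 has only cf:a and row2 only cf:b in the scan output.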

Disabling a table, enabling it, and dropping it (a table must be disabled before it can be dropped):

disable 'test'
enable 'test'
drop 'test'

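Next, install Phoenix:
1. Copy the jars from the Phoenix distribution into HBase's lib directory, then restart HBase so that it picks them up.
2. Copy hbase-site.xml into Phoenix's bin directory; the local Phoenix client used below needs it.
3. Add the environment variables:
vim /etc/profile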

# For Phoenix
export PHOENIX_HOME=/usr/phoenix-hbase-2.3-5.1.2-bin
export PHOENIX_CLASSPATH=$PHOENIX_HOME
export PATH=$PHOENIX_HOME/bin:$PATH

Run source /etc/profile to apply the changes.
Verify the installation with Phoenix's bundled sqlline.py localhost:2181:

[root@VM_0_11_centos bin]# ./sqlline.py localhost:2181
Setting property: [incremental, false]
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect jdbc:phoenix:localhost:2181 none none org.apache.phoenix.jdbc.PhoenixDriver
Connecting to jdbc:phoenix:localhost:2181
22/01/18 17:14:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connected to: Phoenix (version 5.0)
Driver: PhoenixEmbeddedDriver (version 5.0)
Autocommit status: true
Transaction isolation: TRANSACTION_READ_COMMITTED
Building list of tables and columns for tab-completion (set fastconnect to true to skip)...
133/133 (100%) Done
Done
sqlline version 1.2.0
0: jdbc:phoenix:localhost:2181> !table
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  |  TABLE_TYPE   | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE  | IMMU |
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+------+
|            | SYSTEM       | CATALOG     | SYSTEM TABLE  |          |            |                            |                 |              | fals |
|            | SYSTEM       | FUNCTION    | SYSTEM TABLE  |          |            |                            |                 |              | fals |
|            | SYSTEM       | LOG         | SYSTEM TABLE  |          |            |                            |                 |              | true |
|            | SYSTEM       | SEQUENCE    | SYSTEM TABLE  |          |            |                            |                 |              | fals |
|            | SYSTEM       | STATS       | SYSTEM TABLE  |          |            |                            |                 |              | fals |
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+------+

Try a table operation:

0: jdbc:phoenix:localhost:2181> create table if not exists "staff"(
. . . . . . . . . . . . . . . > id varchar primary key,
. . . . . . . . . . . . . . . > name varchar,
. . . . . . . . . . . . . . . > age varchar);
No rows affected (1.28 seconds)

0: jdbc:phoenix:localhost:2181> !table
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  |  TABLE_TYPE   | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE  | IMMU |
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+------+
|            | SYSTEM       | CATALOG     | SYSTEM TABLE  |          |            |                            |                 |              | fals |
|            | SYSTEM       | FUNCTION    | SYSTEM TABLE  |          |            |                            |                 |              | fals |
|            | SYSTEM       | LOG         | SYSTEM TABLE  |          |            |                            |                 |              | true |
|            | SYSTEM       | SEQUENCE    | SYSTEM TABLE  |          |            |                            |                 |              | fals |
|            | SYSTEM       | STATS       | SYSTEM TABLE  |          |            |                            |                 |              | fals |
|            |              | staff       | TABLE         |          |            |                            |                 |              | fals |
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+------+

Spring Boot Integration

The project uses the org.apache.phoenix:phoenix-core:5.0.0-HBase-2.0 dependency. Its slf4j binding conflicts with Spring Boot's, so it is excluded:

plugins {
    id 'org.springframework.boot' version '2.1.13.RELEASE'
    id 'io.spring.dependency-management' version '1.0.9.RELEASE'
    id 'java'
}


version = '0.0.1-SNAPSHOT'
sourceCompatibility = '1.8'
 
repositories {
    mavenLocal()
    maven { url 'http://maven.aliyun.com/nexus/content/groups/public/' }
    //mavenCentral()
}

dependencies {

    implementation 'org.springframework.boot:spring-boot-starter-web'
    implementation 'org.springframework.boot:spring-boot-starter-jdbc'
    testImplementation 'org.springframework.boot:spring-boot-starter-test'
    implementation 'org.projectlombok:lombok:1.18.22'
    annotationProcessor 'org.projectlombok:lombok:1.18.22'
    implementation 'com.alibaba:fastjson:1.2.73'

    // Exclude Phoenix's slf4j binding, which conflicts with Spring Boot's
    implementation('org.apache.phoenix:phoenix-core:5.0.0-HBase-2.0') {
        exclude group: 'org.slf4j'
    }

}

Phoenix supports JDBC, so here we use Spring JDBC, that is JdbcTemplate, to perform CRUD operations on HBase through Phoenix.

Data source configuration:

server.port=8080
spring.application.name=hbase-test

spring.datasource.driver-class-name=org.apache.phoenix.jdbc.PhoenixDriver
spring.datasource.name=phoenixDataSource
spring.datasource.url=jdbc:phoenix:122.xx.xxx.187:2181
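
The spring.datasource.url above follows the same pattern as the address passed to sqlline.py earlier: jdbc:phoenix: followed by the ZooKeeper host and port. As a small sketch of how that URL is put together and used with plain JDBC, the hypothetical PhoenixUrls helper below (not part of Phoenix or Spring) builds the string; the open method is included only to show the shape of a raw JDBC connection and assumes the Phoenix client jar is on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper, not part of Phoenix or Spring.
class PhoenixUrls {

    // Builds "jdbc:phoenix:<zookeeper host>:<zookeeper port>"
    static String build(String zkHost, int zkPort) {
        if (zkHost == null || zkHost.trim().isEmpty()) {
            throw new IllegalArgumentException("ZooKeeper host must not be empty");
        }
        if (zkPort < 1 || zkPort > 65535) {
            throw new IllegalArgumentException("Port out of range: " + zkPort);
        }
        return "jdbc:phoenix:" + zkHost.trim() + ":" + zkPort;
    }

    // Opens a plain JDBC connection. This only succeeds when the Phoenix
    // client jar is on the classpath and the cluster is reachable.
    static Connection open(String zkHost, int zkPort) throws SQLException {
        return DriverManager.getConnection(build(zkHost, zkPort));
    }

    public static void main(String[] args) {
        System.out.println(build("localhost", 2181)); // prints jdbc:phoenix:localhost:2181
    }
}
```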

Demo code:

Create the custemuser table when the application starts:

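import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.ApplicationArguments;
import org.springframework.boot.ApplicationRunner;
import org.springframework.dao.DataAccessException;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;

@Slf4j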
@Component
public class SystemInitRunner implements ApplicationRunner{
    
    @Autowired
    private JdbcTemplate jdbcTemplate;

    @Override
    public void run(ApplicationArguments args) throws Exception {
        
        log.info("Application starting...");
        
        initHBaseTables();
    }
    
    public void initHBaseTables() {
        
        StringBuilder builder = new StringBuilder();
        builder.append("CREATE TABLE IF NOT EXISTS \"custemuser\" (")
                .append("\"uid\" VARCHAR primary key,")
                .append("\"basic\".\"name\" VARCHAR,")
                .append("\"basic\".\"mobile\" VARCHAR)");
        String sql = builder.toString();
        
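        log.info("Executing HBase CREATE TABLE statement: {}", sql);

        try {
            jdbcTemplate.execute(sql);
            log.info("HBase table custemuser created");
        } catch (DataAccessException e) {
            log.error("Failed to create HBase table custemuser: {}", e.getMessage());
            // Wrap the exception itself: getCause() may be null and would drop the stack trace
            throw new RuntimeException(e);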
        }
        
    }

}

Insert and query endpoints for the custemuser table:

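import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import com.alibaba.fastjson.JSON;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.PreparedStatementSetter;
import org.springframework.jdbc.core.RowMapper;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;

@Slf4j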
@RestController
@RequestMapping("/hbase")
public class HBaseTestController {
    
    @Autowired
    private JdbcTemplate jdbcTemplate;
    
    @RequestMapping(value = "/addUser", method = RequestMethod.POST)
    public void addUser(@RequestBody CustemUser user) {
        
        String sql = "upsert into \"custemuser\"  values(?,?,?)";
        
        int ret = jdbcTemplate.update(sql,  new PreparedStatementSetter() {

            @Override
            public void setValues(PreparedStatement ps) throws SQLException {
                ps.setString(1, user.getUid());
                ps.setString(2, user.getName());
                ps.setString(3, user.getMobile());
            }});
        
        log.info("Upsert into HBase table custemuser finished, database returned {}", ret);
    }
    
    @RequestMapping(value = "/getUserByMobile", method = RequestMethod.GET)
    public CustemUser getUserByMobile(String mobile) {
        
        String sql = "select * from \"custemuser\" where \"basic\".\"mobile\" = ?";
        
        CustemUser user= jdbcTemplate.queryForObject(sql, 
                            new Object[] {mobile}, 
                            new RowMapper<CustemUser>() {

                                @Override
                                public CustemUser mapRow(ResultSet rs, int rowNum) throws SQLException {
                                    CustemUser u = new CustemUser();
                                    u.setUid(rs.getString(1));
                                    u.setName(rs.getString(2));
                                    u.setMobile(rs.getString(3));
                                    return u;
                                }});
        
        log.info("HBase user query result: {}", JSON.toJSONString(user));
        
        return user;
    }
    
}

The DTO:

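import lombok.Getter;
import lombok.NoArgsConstructor;
import lombok.Setter;
import lombok.ToString;

@Setter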
@Getter
@NoArgsConstructor
@ToString
public class CustemUser {
    private String uid;
    private String name;
    private String mobile;
}

Testing with Postman:

POST http://localhost:8080/hbase/addUser

Request body:

{  
  "uid":"1001",
  "name":"肥兔子爱豆畜子",
  "mobile":"137xxxx8612"
}

GET http://localhost:8080/hbase/getUserByMobile?mobile=137xxxx8612

Response:

{
    "uid": "1001",
    "name": "肥兔子爱豆畜子",
    "mobile": "137xxxx8612"
}

Phoenix SQL Syntax

Let's look at the custem_user records directly in the database using hbase shell:

hbase(main):013:0> scan "custem_user"
ROW                                     COLUMN+CELL                                                                                                     
 1001                                   column=0:\x00\x00\x00\x00, timestamp=1642576351561, value=x                                                     
 1001                                   column=0:\x80\x0B, timestamp=1642576351561, value=\xE8\x82\xA5\xE5\x85\x94\xE5\xAD\x90\xE7\x88\xB1\xE8\xB1\x86\x
                                        E7\x95\x9C\xE5\xAD\x90                                                                                          
 1001                                   column=0:\x80\x0C, timestamp=1642576351561, value=137xxxx8612                                                   
1 row(s)
Took 0.0814 seconds 
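
The \xNN sequences in the scan output are how hbase shell prints bytes that are not printable ASCII; the name value is simply the UTF-8 encoding of the Chinese name. The sketch below turns such a string back into readable text. ShellBytes is a hypothetical helper written for this post, and it assumes that everything outside a \xNN escape is plain ASCII.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: converts hbase-shell style escaped text back to a string.
class ShellBytes {

    // Treats each \xNN as one raw byte and every other character as a literal
    // ASCII byte, then decodes the resulting bytes as UTF-8.
    static String decode(String escaped) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < escaped.length()) {
            if (escaped.startsWith("\\x", i) && i + 4 <= escaped.length()) {
                // Two hex digits follow the \x marker
                out.write(Integer.parseInt(escaped.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                out.write(escaped.charAt(i)); // assumed to be ASCII
                i++;
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The name value from the scan output above
        String name = "\\xE8\\x82\\xA5\\xE5\\x85\\x94\\xE5\\xAD\\x90\\xE7\\x88\\xB1"
                + "\\xE8\\xB1\\x86\\xE7\\x95\\x9C\\xE5\\xAD\\x90";
        System.out.println(decode(name));
    }
}
```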

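As the scan shows, the rowkey is the primary key of our table (uid), while the table's columns were all grouped under a column family named 0. That is because no column family was specified when the table was created, so Phoenix fell back to its default column family, 0. Changing the CREATE TABLE statement fixes this: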

CREATE TABLE IF NOT EXISTS "custem_user" (
                "uid" VARCHAR primary key,
                "basic"."name" VARCHAR,
                "basic"."mobile" VARCHAR)

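This places name and mobile under the basic column family.
During development it is usually worth testing the SQL in a tool such as DBeaver before writing it into Java code: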

CREATE TABLE IF NOT EXISTS "test" (
"uid" VARCHAR primary key,
"basic"."name" VARCHAR,
"basic"."mobile" VARCHAR
);
UPSERT INTO "test" values('123','liny','13789388372');
UPSERT INTO "test" values('456','douchuzi','13429586338');

SELECT * FROM "test" WHERE "basic"."mobile" = '13789388372'; 
SELECT * FROM "test" WHERE "mobile" = '13429586338'; 

Both of the queries above work.
The following, however, does not:

SELECT * FROM "test" WHERE mobile = '13429586338'; 

It fails with: SQL Error [504] [42703]: ERROR 504 (42703): Undefined column. columnName=test.MOBILE

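In Phoenix SQL, table and column names keep their case only when they are wrapped in double quotes; unquoted identifiers are converted to upper case. We created the mobile column under the basic column family of the test table with quotes, but the mobile in the WHERE clause above is unquoted, and the error message shows that Phoenix went looking for the column test.MOBILE instead.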
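
The rule can be modelled in a few lines. This is a simplified sketch of the behavior described above, not Phoenix's actual parser; PhoenixIdentifiers and normalize are hypothetical names.

```java
import java.util.Locale;

// Simplified model of identifier handling (NOT Phoenix's real parser):
// quoted identifiers keep their case, unquoted ones are upper-cased.
class PhoenixIdentifiers {

    static String normalize(String identifier) {
        boolean quoted = identifier.length() >= 2
                && identifier.startsWith("\"")
                && identifier.endsWith("\"");
        if (quoted) {
            // Strip the surrounding quotes and keep the case as written
            return identifier.substring(1, identifier.length() - 1);
        }
        return identifier.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(normalize("mobile"));     // prints MOBILE
        System.out.println(normalize("\"mobile\"")); // prints mobile
    }
}
```

Under this model the unquoted mobile becomes MOBILE, which does not match the column that was created as "mobile", hence the Undefined column error.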

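Problems encountered in practice:

Pitfall 1:

On application startup the error HADOOP_HOME AND HADOOP.HOME.DIR ARE UNSET appears. The fix is to download the winutils binaries for the relevant Hadoop version from steveloughran/winutils: Windows binaries for Hadoop versions (github.com) and set the environment variable. The dependency list shows Hadoop 3.0, so point the HADOOP_HOME environment variable at the 3.0 directory from that repository and restart the IDE.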
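
The error message names two settings, the hadoop.home.dir system property and the HADOOP_HOME environment variable. A quick way to see which of them, if any, the JVM can see is sketched below. HadoopHomeCheck is a hypothetical helper, and the lookup order (system property first, then environment variable) is my assumption about how the Hadoop client checks them.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

// Hypothetical diagnostic helper, not part of Hadoop.
class HadoopHomeCheck {

    // Assumed lookup order: system property hadoop.home.dir first,
    // then environment variable HADOOP_HOME.
    static Optional<String> resolve(Properties sysProps, Map<String, String> env) {
        String fromProp = sysProps.getProperty("hadoop.home.dir");
        if (fromProp != null && !fromProp.isEmpty()) {
            return Optional.of(fromProp);
        }
        String fromEnv = env.get("HADOOP_HOME");
        if (fromEnv != null && !fromEnv.isEmpty()) {
            return Optional.of(fromEnv);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<String> home = resolve(System.getProperties(), System.getenv());
        System.out.println(home.map(h -> "Hadoop home: " + h)
                .orElse("Neither hadoop.home.dir nor HADOOP_HOME is set"));
    }
}
```

If the environment variable was set while the IDE was already open, the running JVM will not see it, which is consistent with the need to restart the IDE.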

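Pitfall 2:

Once the application is running, creating a table through Phoenix fails with: Can not resolve VM_0_11_centos, please check your network java.net.UnknownHostException: VM_0_11_centos

The log shows that hbase-client failed to connect. VM_0_11_centos is the hostname of the remote server where my HBase runs. After some reading, it turns out that an HBase Region Server registers its hostname, not its IP address, in ZooKeeper on startup. So add the line 122.xx.xxx.187 VM_0_11_centos to the hosts file on the client machine, then flush the local Windows DNS cache: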
ipconfig /displaydns
ipconfig /flushdns
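
To confirm from the JVM's point of view that the hosts entry took effect, a small check like the following can be run on the client machine. HostCheck is a hypothetical helper; it relies only on the JDK's InetAddress.getByName, which throws UnknownHostException when a name cannot be resolved.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic helper: reports whether this JVM can resolve a hostname.
class HostCheck {

    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the Region Server hostname from the error above
        String host = args.length > 0 ? args[0] : "VM_0_11_centos";
        System.out.println(host + " resolves: " + resolves(host));
    }
}
```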

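Next Steps

How HBase works, including its architecture and cluster setup.
The LSM-Tree data structure used for the underlying storage, and the read and write paths.
The performance characteristics and use cases that follow from the storage structure and architecture: massive data storage, high-performance random writes, and fairly high-performance random reads.
Failure handling for cluster services, cluster tooling, the surrounding ecosystem, performance tuning, and best practices.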

References:

Getting-started setup and Phoenix integration:
SpringBoot - 使用Phoenix操作HBase教程2(使用JdbcTemplate) (hangge.com), series

Basic concepts:
我终于看懂了HBase,太不容易了... - 知乎 (zhihu.com)

入门HBase,看这一篇就够了 - 简书 (jianshu.com)

Hbase--读取数据快还是写数据快 - 简书 (jianshu.com)

Architecture and applications:
云数据库HBase,云时代的大会数据存储 - 阿里云 (aliyun.com)

分库分表技术演进暨最佳实践 - 简书 (jianshu.com)

HBase实战 | 从MySQL到HBase:数据存储方案转型的演进-阿里云开发者社区 (aliyun.com)

基于HBase快速构建 海量订单存储系统-阿里云开发者社区 (aliyun.com)

Reference book:
《HBase实战》
