Notes on Using HBase 1.4.9

2019-04-04  九号自行车司机

Download and Installation

Download

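  1. Open the official download page, https://www.apache.org/dyn/closer.lua/hbase/, and follow the topmost mirror link recommended on that page.
  2. In the directory listing that appears, open the stable folder (highlighted in the original post's screenshot) and download the file whose name ends in bin.tar.gz.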

Installation

  1. Move the downloaded file to the target directory. On Linux it usually goes under /usr/local.
  2. Extract the archive:
tar xzvf hbase-1.4.9-bin.tar.gz
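  3. Install a JDK and set the JAVA_HOME environment variable. The original post showed the HBase/JDK compatibility table as an image here. This article uses HBase 1.4.9, for which JDK 7 or JDK 8 is the best match.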

Configuration

Standalone Deployment

Change into the directory created by extracting the archive.

  1. Edit conf/hbase-site.xml so that it contains the following:
<configuration>
  <property>
      <name>hbase.rootdir</name>
      <value>file:///data/hbase</value>
  </property>
  <property>
      <name>hbase.zookeeper.property.dataDir</name>
      <value>/data/zookeeper</value>
  </property>
  <property>
      <name>hbase.unsafe.stream.capability.enforce</name>
      <value>false</value>
      <description>
        Controls whether HBase will check for stream capabilities (hflush/hsync).

        Disable this if you intend to run on LocalFileSystem, denoted by a rootdir
        with the 'file://' scheme, but be mindful of the NOTE below.

        WARNING: Setting this to false blinds you to potential data loss and
        inconsistent system state in the event of process and/or node failures. If
        HBase is complaining of an inability to use hsync or hflush it's most
        likely not a false positive.
      </description>
  </property>
</configuration>
  2. Edit conf/hbase-env.sh:
# Set JAVA_HOME
export JAVA_HOME=/usr/local/jdk1.8.0_201
# Setting a PID directory is recommended. The default is /tmp, where PID files are easily lost.
export HBASE_PID_DIR=/var/hadoop/pids

Pseudo-Distributed Deployment

Starting from the standalone configuration:

  1. Switch HBase to distributed mode:
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
  2. Point hbase.rootdir at HDFS:
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:9000/hbase</value>
</property>
  3. Remove the hbase.unsafe.stream.capability.enforce setting, or set it to true:
<property>
  <name>hbase.unsafe.stream.capability.enforce</name>
  <value>true</value>
</property>

Fully Distributed Deployment

Starting from the pseudo-distributed configuration:

  1. Edit conf/regionservers and list the hostname of every RegionServer, one per line:
centos
  2. Edit conf/hbase-site.xml and list the hostnames of all servers running ZooKeeper (comma-separated if there is more than one):
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>centos</value>
</property>

Startup

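In a test environment, it is best to disable the server's firewall before starting HBase with the command below; otherwise blocked ports can cause a range of connection problems.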

bin/start-hbase.sh

To stop HBase, run:

bin/stop-hbase.sh

Verification

Run:

jps -lv | grep hbase

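The console output (a screenshot in the original post) should include a running process named HMaster, which indicates that the installation succeeded.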

First Steps with HBase

  1. Connect to HBase:
bin/hbase shell
  2. Create a table:
create 'test', 'cf'

This creates a table named test with one column family named cf.

  3. List the table:
list 'test'

The console lists the test table just created, confirming that it was created successfully.

  4. Show the table's details:
describe 'test'
  5. Put data into the table:
put 'test', 'row1', 'cf:a', 'value1'
put 'test', 'row2', 'cf:b', 'value2'
put 'test', 'row3', 'cf:c', 'value3'

This inserts three rows, each with a single cell.

  6. View the table's data:
scan 'test'

scan returns every row in the table. To fetch a single row, use get:

get 'test', 'row1'
  7. Disable or enable a table. To disable it:
disable 'test'

Once a table is disabled, it can be dropped:

drop 'test'

If you do not want to drop the table, you can re-enable it instead:

enable 'test'
  8. Exit the HBase Shell:
exit

Data Model

Table

A table consists of multiple rows.

Row

A row consists of a row key and one or more column values. The rows of a table are sorted lexicographically by row key.
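Lexicographic ordering has a consequence that is easy to miss: keys are compared byte by byte, not numerically. The following is a minimal sketch in plain Java with no HBase dependency; RowKeyOrder, compareBytes and sortKeys are illustrative names, not part of any HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RowKeyOrder {

    // Compares two byte arrays as unsigned bytes, left to right. A shorter
    // array that is a prefix of a longer one sorts first.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Returns a copy of the keys in byte-lexicographic order.
    static List<String> sortKeys(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort((x, y) -> compareBytes(
                x.getBytes(StandardCharsets.UTF_8),
                y.getBytes(StandardCharsets.UTF_8)));
        return sorted;
    }

    public static void main(String[] args) {
        // Unpadded numeric suffixes: prints [row1, row10, row2]
        System.out.println(sortKeys(Arrays.asList("row1", "row2", "row10")));
        // Zero-padded suffixes: prints [row01, row02, row10]
        System.out.println(sortKeys(Arrays.asList("row01", "row02", "row10")));
    }
}
```

If row keys embed numbers and scan order matters, zero-padding them to a fixed width keeps numeric order and lexicographic order aligned.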

Column

A column consists of a column family and a column qualifier, separated by a colon (:), for example cf:a.
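To make the family:qualifier convention concrete, here is a small sketch that splits a column name such as cf:a into its two parts. ColumnNameSketch is a hypothetical helper written for this article. Splitting at the first colon (so that a qualifier may itself contain colons) is an assumption of the sketch.

```java
final class ColumnNameSketch {

    // Splits "family:qualifier" at the first colon and returns {family, qualifier}.
    // A name without a colon is treated as a family with an empty qualifier.
    static String[] parse(String column) {
        int colon = column.indexOf(':');
        if (colon < 0) {
            return new String[] {column, ""};
        }
        return new String[] {column.substring(0, colon), column.substring(colon + 1)};
    }

    public static void main(String[] args) {
        String[] parts = parse("cf:a");
        // prints family=cf, qualifier=a
        System.out.println("family=" + parts[0] + ", qualifier=" + parts[1]);
    }
}
```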

Cell

Together, the row key, column family, and column qualifier uniquely identify a cell. A cell holds a value and a timestamp, which represents the version of that value.

Timestamp

The timestamp represents the version of a value. By default, a write is stamped with the RegionServer's current time, but you can also supply your own timestamp when writing.
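One way to picture the model described above is as nested sorted maps: row key, then column, then timestamp, with the newest version first. The sketch below is an in-memory analogy in plain Java, not how HBase stores data on disk. CellModelSketch and its methods are illustrative names.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

final class CellModelSketch {

    // row key -> "family:qualifier" -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    // Writes a value with an explicit timestamp, which acts as its version.
    void put(String rowKey, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<String, NavigableMap<Long, String>>())
            .computeIfAbsent(column, k -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Writes a value stamped with the current time, like a write without a timestamp.
    void put(String rowKey, String column, String value) {
        put(rowKey, column, System.currentTimeMillis(), value);
    }

    // Returns the newest version of a cell, or null if the cell does not exist.
    String get(String rowKey, String column) {
        NavigableMap<String, NavigableMap<Long, String>> row = rows.get(rowKey);
        if (row == null) {
            return null;
        }
        NavigableMap<Long, String> versions = row.get(column);
        if (versions == null || versions.isEmpty()) {
            return null;
        }
        return versions.firstEntry().getValue();
    }

    public static void main(String[] args) {
        CellModelSketch table = new CellModelSketch();
        table.put("row1", "cf:a", 100L, "value1-old");
        table.put("row1", "cf:a", 200L, "value1");
        // The higher timestamp wins: prints value1
        System.out.println(table.get("row1", "cf:a"));
    }
}
```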

Rules of Thumb for Table Design

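  1. Aim for regions between 10 and 50 GB in size.
  2. Keep cells no larger than 10 MB. If a value exceeds that, store the data in HDFS and keep only a pointer to it in HBase.
  3. Use 1 to 3 column families per table. Where possible, keep a table to a single column family.
  4. For a table with 1 or 2 column families, around 50 to 100 regions is a good number. Keep in mind that a region is in effect a contiguous segment of a column family.
  5. Keep column family names as short as possible, ideally a single character.
  6. Avoid monotonically increasing row keys. With such keys, all reads and writes of recent data concentrate on a single region while older regions sit underused.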
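The last rule is often addressed by "salting": prefixing each row key with a small bucket number derived from the key itself. This technique is not described in the original post. The sketch below is one common approach under stated assumptions: SaltedRowKey is a hypothetical helper, and the bucket count of 8 is an arbitrary illustrative value.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class SaltedRowKey {

    // Hypothetical bucket count, chosen only for illustration.
    static final int BUCKETS = 8;

    // Prefixes the key with a bucket computed from a hash of the key, so that
    // consecutive keys tend to land in different key ranges while the salt
    // can still be recomputed from the original key at read time.
    static String salt(String rowKey) {
        int hash = Arrays.hashCode(rowKey.getBytes(StandardCharsets.UTF_8));
        int bucket = Math.floorMod(hash, BUCKETS);
        return String.format("%02d-%s", bucket, rowKey);
    }

    public static void main(String[] args) {
        // Monotonically increasing keys, as in the rule above.
        for (int seq = 1000; seq < 1005; seq++) {
            System.out.println(salt("event-" + seq));
        }
    }
}
```

The trade-off is that a range scan over the original key order now needs one scan per bucket, so salting suits write-heavy tables better than tables that rely on ordered scans.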

Java Client Setup

Add the hbase-shaded-client dependency to your Maven project:

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-shaded-client</artifactId>
    <version>1.4.9</version>
</dependency>

Code Examples

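  1. Create a table:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.io.compress.Compression.Algorithm;

public class HbaseClient {

    public static void main(String[] args) throws IOException {
        Configuration config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "192.168.41.129");
        // try-with-resources closes admin and connection even if createTable fails,
        // and avoids a NullPointerException when the connection was never opened
        try (Connection connection = ConnectionFactory.createConnection(config);
             Admin admin = connection.getAdmin()) {
            // Table "test" with a single, uncompressed column family "cf"
            HTableDescriptor table = new HTableDescriptor(TableName.valueOf("test"));
            table.addFamily(new HColumnDescriptor("cf").setCompressionType(Algorithm.NONE));
            System.out.print("Creating table. ");
            admin.createTable(table);
            System.out.println(" Done.");
        }
    }

}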
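  2. Put data into the table:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HbaseClient {

    public static void main(String[] args) throws IOException {
        Configuration config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "192.168.41.129");
        // try-with-resources closes the table and connection even if the put fails
        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("test"))) {
            // Bytes.toBytes always encodes as UTF-8, unlike String.getBytes(),
            // which depends on the platform's default charset
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("a"), Bytes.toBytes("value1"));
            table.put(put);
        }
    }

}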
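  3. View the table's data:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HbaseClient {

    public static void main(String[] args) throws IOException {
        Configuration config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "192.168.41.129");
        // Return only column cf:a, and only rows whose key starts with "row"
        Scan scan = new Scan();
        scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("a"));
        scan.setRowPrefixFilter(Bytes.toBytes("row"));
        // try-with-resources closes the scanner, table and connection in reverse order
        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("test"));
             ResultScanner rs = table.getScanner(scan)) {
            for (Result r : rs) {
                System.out.println("row:" + Bytes.toString(r.getRow())
                        + " value:" + Bytes.toString(r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("a"))));
            }
        }
    }

}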

Notes

Key Points

TTL

alter 'test', NAME => 'cf', TTL=> 100

This sets the time to live (TTL) to 100 seconds. TTL is configured per column family.
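The rule behind TTL can be stated in one line: a cell is considered expired once its age, measured from its timestamp, exceeds the TTL of its column family. The sketch below only illustrates that arithmetic. TtlSketch and isExpired are hypothetical names, and the exact boundary behavior inside HBase is not something this sketch claims to reproduce.

```java
final class TtlSketch {

    // Returns true when the cell's age exceeds the column family's TTL.
    // Timestamps are in milliseconds, the TTL is in seconds as in the shell command.
    static boolean isExpired(long cellTimestampMs, long ttlSeconds, long nowMs) {
        return nowMs - cellTimestampMs > ttlSeconds * 1000L;
    }

    public static void main(String[] args) {
        long writtenAt = 1_554_336_000_000L; // an arbitrary example timestamp
        // 50 seconds after the write, with TTL 100: prints false
        System.out.println(isExpired(writtenAt, 100, writtenAt + 50_000L));
        // 150 seconds after the write, with TTL 100: prints true
        System.out.println(isExpired(writtenAt, 100, writtenAt + 150_000L));
    }
}
```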

Data Block Encoding

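Used well, data block encoding can save a significant amount of storage space, but the extra encoding and decoding work can reduce read and write performance. HBase offers four data block encodings: Prefix, Diff, Fast Diff, and Prefix Tree. Which one to choose depends on your workload.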

alter 'test', NAME => 'cf', DATA_BLOCK_ENCODING => 'FAST_DIFF'

This sets the data block encoding to Fast Diff.
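To give a feel for where the space saving and the extra CPU work come from, here is a simplified sketch of the idea behind prefix-style encoding: because keys are sorted, each key can be stored as the length of the prefix it shares with the previous key plus the remaining suffix. This works on strings for readability and is not HBase's actual on-disk format. PrefixEncodingSketch and the "length|suffix" notation are inventions of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PrefixEncodingSketch {

    // Encodes each key as "<shared prefix length>|<suffix>" relative to the previous key.
    static List<String> encode(List<String> sortedKeys) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (String key : sortedKeys) {
            int shared = 0;
            int max = Math.min(prev.length(), key.length());
            while (shared < max && prev.charAt(shared) == key.charAt(shared)) {
                shared++;
            }
            out.add(shared + "|" + key.substring(shared));
            prev = key;
        }
        return out;
    }

    // Rebuilds the original keys. Each key depends on the one decoded before it,
    // which is the read-side cost of this kind of encoding.
    static List<String> decode(List<String> encoded) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (String entry : encoded) {
            int bar = entry.indexOf('|');
            int shared = Integer.parseInt(entry.substring(0, bar));
            String key = prev.substring(0, shared) + entry.substring(bar + 1);
            out.add(key);
            prev = key;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("row1/cf:a", "row1/cf:b", "row2/cf:a");
        List<String> encoded = encode(keys);
        // prints [0|row1/cf:a, 8|b, 3|2/cf:a]
        System.out.println(encoded);
        // prints [row1/cf:a, row1/cf:b, row2/cf:a]
        System.out.println(decode(encoded));
    }
}
```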

Configuring the Web UI

<property>
  <name>hbase.master.info.port</name>
  <value>16010</value>
</property>

<property>
  <name>hbase.regionserver.info.port</name>
  <value>16030</value>
</property>

Note that when HBase is deployed in standalone mode, HBase chooses these two ports at random.

Problems

2019-04-23 11:34:26,632 WARN  [ProcedureExecutor-1] master.SplitLogManager: Returning success without actually splitting and deleting all the log files in path hdfs://localhost:9000/hbase/WALs/centos,54477,1555989722027-splitting: [FileStatus{path=hdfs://localhost:9000/hbase/WALs/centos,54477,1555989722027-splitting/centos%2C54477%2C1555989722027.meta.1555989759561.meta; isDirectory=false; length=1084; replication=3; blocksize=134217728; modification_time=1555989769891; access_time=1555989759573; owner=root; group=supergroup; permission=rw-r--r--; isSymlink=false}, FileStatus{path=hdfs://localhost:9000/hbase/WALs/centos,54477,1555989722027-splitting/centos%2C54477%2C1555989722027.meta.1555989860335.meta; isDirectory=false; length=91; replication=3; blocksize=134217728; modification_time=1555989922609; access_time=1555989860342; owner=root; group=supergroup; permission=rw-r--r--; isSymlink=false}]
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.fs.PathIsNotEmptyDirectoryException): `/hbase/WALs/centos,54477,1555989722027-splitting is non empty': Directory is not empty
    at org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:84)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3690)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:953)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:623)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)

    at org.apache.hadoop.ipc.Client.call(Client.java:1476)
    at org.apache.hadoop.ipc.Client.call(Client.java:1413)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy16.delete(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:545)
    at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy17.delete(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:307)
    at com.sun.proxy.$Proxy18.delete(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:307)
    at com.sun.proxy.$Proxy18.delete(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:2044)
    at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:707)
    at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:703)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:714)
    at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:296)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:433)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:406)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:323)
    at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.splitLogs(ServerCrashProcedure.java:440)
    at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:253)
    at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:75)
    at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:139)
    at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:506)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1167)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:955)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:908)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$400(ProcedureExecutor.java:77)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.run(ProcedureExecutor.java:482)

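Fix: in HDFS, delete the directory named in the error, or the entire WALs directory. Deleting WALs discards any edits that have not yet been flushed, so this is only appropriate on a test cluster.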

bin/hadoop fs -ls /hbase/WALs
bin/hadoop fs -rm -r /hbase/WALs

org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Wed Mar 27 17:31:57 CST 2019, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=76610: Call to localhost/127.0.0.1:38364 failed on connection exception: java.net.ConnectException: Connection refused: no further information row 'test,row,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=localhost,38364,1553670561949, seqNum=0

    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:329)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:242)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219)
    at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:275)
    at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:436)
    at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:310)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1341)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1230)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:356)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219)
    at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:275)
    at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:436)
    at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:310)
    at com.hychong.coreutil.HbaseClient.main(HbaseClient.java:51)
Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=76610: Call to localhost/127.0.0.1:38364 failed on connection exception: java.net.ConnectException: Connection refused: no further information row 'test,row,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=localhost,38364,1553670561949, seqNum=0
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:178)
    at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.net.ConnectException: Call to localhost/127.0.0.1:38364 failed on connection exception: java.net.ConnectException: Connection refused: no further information
    at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:165)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:389)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:94)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:409)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:405)
    at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:103)
    at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:118)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callMethod(AbstractRpcClient.java:422)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:327)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$200(AbstractRpcClient.java:94)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:571)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:37059)
    at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:405)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:274)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:388)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:362)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:142)
    ... 4 more
Caused by: java.net.ConnectException: Connection refused: no further information
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.hadoop.hbase.ipc.BlockingRpcConnection.setupConnection(BlockingRpcConnection.java:256)
    at org.apache.hadoop.hbase.ipc.BlockingRpcConnection.setupIOstreams(BlockingRpcConnection.java:437)
    at org.apache.hadoop.hbase.ipc.BlockingRpcConnection.writeRequest(BlockingRpcConnection.java:540)
    at org.apache.hadoop.hbase.ipc.BlockingRpcConnection.tracedWriteRequest(BlockingRpcConnection.java:520)
    at org.apache.hadoop.hbase.ipc.BlockingRpcConnection.access$200(BlockingRpcConnection.java:85)
    at org.apache.hadoop.hbase.ipc.BlockingRpcConnection$4.run(BlockingRpcConnection.java:724)
    at org.apache.hadoop.hbase.ipc.HBaseRpcControllerImpl.notifyOnCancel(HBaseRpcControllerImpl.java:240)
    at org.apache.hadoop.hbase.ipc.BlockingRpcConnection.sendRequest(BlockingRpcConnection.java:699)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callMethod(AbstractRpcClient.java:420)
    ... 15 more

Fix: add the following entry to the hosts file on both the HBase client machine and the HBase server:

192.168.41.129  centos
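The stack trace above shows the client being directed to hostname=localhost for hbase:meta, that is, to a name that resolves to the client's own machine. Before running the HBase client, it can help to confirm what a hostname resolves to on the client. The sketch below uses only the JDK. HostCheck is a hypothetical helper, and centos is the hostname used in this article.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class HostCheck {

    // Returns the IP address the name resolves to, or null when it cannot be resolved.
    static String resolve(String hostname) {
        try {
            return InetAddress.getByName(hostname).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Pass the server's hostname as the first argument; defaults to "centos".
        String hostname = args.length > 0 ? args[0] : "centos";
        String address = resolve(hostname);
        if (address == null) {
            System.out.println(hostname + " does not resolve; check the hosts file");
        } else {
            System.out.println(hostname + " resolves to " + address);
        }
    }
}
```

With the hosts entry in place, running this on the client should report the server's address rather than a loopback address.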
