Notes on Using HBase 1.4.9
Download and Installation
Download
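- Open the official download page, https://www.apache.org/dyn/closer.lua/hbase/, and follow the topmost mirror link it recommends. The mirror shows a directory listing:
[Figure: directory listing on the mirror, with the stable folder highlighted in red]
- Open the stable folder highlighted in the figure and download the file whose name ends in bin.tar.gz.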
Installation
- Move the downloaded file to the target directory. On Linux it usually goes under /usr/local.
- Extract the archive:
tar xzvf hbase-1.4.9-bin.tar.gz
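- Install a JDK and set the JAVA_HOME environment variable. The compatibility between HBase and JDK versions is shown in the figure:
[Figure: HBase and JDK version compatibility table]
This article uses HBase 1.4.9, so the Java version should preferably be JDK 7 or JDK 8.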
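Configuration
Standalone deployment
Change into the directory created by extracting the archive.
- Edit conf/hbase-site.xml so that it contains the following: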
<configuration>
<property>
<name>hbase.rootdir</name>
<value>file:///data/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/data/zookeeper</value>
</property>
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
<description>
Controls whether HBase will check for stream capabilities (hflush/hsync).
Disable this if you intend to run on LocalFileSystem, denoted by a rootdir
with the 'file://' scheme, but be mindful of the NOTE below.
WARNING: Setting this to false blinds you to potential data loss and
inconsistent system state in the event of process and/or node failures. If
HBase is complaining of an inability to use hsync or hflush it's most
likely not a false positive.
</description>
</property>
</configuration>
- Edit conf/hbase-env.sh:
# Set JAVA_HOME
export JAVA_HOME=/usr/local/jdk1.8.0_201
# Setting a PID directory is recommended; if it is not set, /tmp is used by default and the PID files are easily lost
export HBASE_PID_DIR=/var/hadoop/pids
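- Here hbase.rootdir points to the local directory /data/hbase. This is fine in a test environment but should be avoided in production.
- The directories named in the configuration above (/data/hbase and /data/zookeeper) do not need to be created in advance; HBase creates them automatically on startup.
Pseudo-distributed deployment
Building on the standalone deployment:
- Switch HBase to distributed mode: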
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
- Point hbase.rootdir at HDFS:
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
</property>
- Remove the hbase.unsafe.stream.capability.enforce setting, or set it to true:
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>true</value>
</property>
Fully distributed deployment
Building on the pseudo-distributed deployment:
- Edit conf/regionservers and list the hostname of every RegionServer, one per line:
centos
- Edit conf/hbase-site.xml and list the hostnames of all servers running ZooKeeper, separated by commas:
<property>
<name>hbase.zookeeper.quorum</name>
<value>centos</value>
</property>
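Startup
In a test environment it is best to turn off the server firewall before starting HBase with the command below; otherwise blocked ports can cause a variety of connection problems.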
bin/start-hbase.sh
To stop HBase, use:
bin/stop-hbase.sh
Verification
Run:
jps -lv | grep hbase
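The console output looks like this:
[Figure: jps output listing the HMaster process]
A running process named HMaster indicates that the installation succeeded.
First steps with HBase
- Connect to HBase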
bin/hbase shell
- Create a table
create 'test', 'cf'
This creates a table named test with a column family named cf.
- List the table
list 'test'
The console shows the test table that was just created, confirming that the creation succeeded.
- Describe the table
describe 'test'
- Put data into the table
put 'test', 'row1', 'cf:a', 'value1'
put 'test', 'row2', 'cf:b', 'value2'
put 'test', 'row3', 'cf:c', 'value3'
This writes three cells, one in each of three rows.
- Read the table data
scan 'test'
The command above reads the whole table. To fetch a single row, use:
get 'test', 'row1'
- Disable/enable a table
Disable the table:
disable 'test'
Once a table is disabled, it can be dropped:
drop 'test'
If you do not want to drop the table, you can enable it again instead:
enable 'test'
- Exit the HBase Shell
exit
Data model
Table
A table consists of one or more rows.
Row
A row consists of a row key and one or more column values. The rows of a table are sorted by row key in lexicographic (byte-wise) order.
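Lexicographic ordering has a practical consequence for row keys that contain numbers. The sketch below uses only the JDK to illustrate it, assuming keys are compared byte by byte as unsigned values. The class and method names are made up for this illustration and are not part of the HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RowKeyOrder {
    // Compare two keys byte by byte, treating each byte as unsigned.
    static final Comparator<byte[]> BYTE_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length; // a shorter key sorts before its extensions
    };

    // Return the keys in the order a byte-wise sorted table would hold them.
    static List<String> sorted(String... keys) {
        List<byte[]> raw = new ArrayList<>();
        for (String k : keys) {
            raw.add(k.getBytes(StandardCharsets.UTF_8));
        }
        raw.sort(BYTE_ORDER);
        List<String> out = new ArrayList<>();
        for (byte[] r : raw) {
            out.add(new String(r, StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        // Unpadded numbers: prints [row1, row10, row2], because '1' < '2'.
        System.out.println(sorted("row1", "row2", "row10"));
        // Zero-padded numbers: prints [row01, row02, row10].
        System.out.println(sorted("row01", "row02", "row10"));
    }
}
```

Zero-padding numeric parts of a row key is one simple way to keep the stored order aligned with the numeric order.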
Column
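A column consists of a column family and a column qualifier, separated by a : (colon):
- Column Family
A column family physically groups a set of columns and their values. Each column family has its own storage properties, such as whether its values should be cached in memory, which compression to use, and how row keys are encoded. Column families are fixed when the table is created, and every row in a table has the same column families.
- Column Qualifier
A column qualifier works together with the column family to address a specific piece of data. Qualifiers do not have to be fixed when the table is created, and different rows in the same table can have different qualifiers.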
Cell
The combination of row key, column family, and column qualifier uniquely identifies a cell. A cell consists of a value and a timestamp, which represents the version of the value.
Timestamp
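The timestamp represents the version of a value. When data is written, the current time of the RegionServer is used as the timestamp by default, but a timestamp can also be supplied explicitly with the write.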
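One way to picture the model described above is as nested sorted maps: row key, then column, then timestamp. The following is a minimal in-memory sketch under that assumption; it is a mental model only, not how HBase stores data, and the class and method names are hypothetical.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

class CellVersionsSketch {
    // row key -> "family:qualifier" -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    // Store a value under an explicit timestamp, which acts as its version.
    public void put(String row, String family, String qualifier, long timestamp, String value) {
        NavigableMap<String, NavigableMap<Long, String>> columns =
                rows.computeIfAbsent(row, r -> new TreeMap<String, NavigableMap<Long, String>>());
        NavigableMap<Long, String> versions =
                columns.computeIfAbsent(family + ":" + qualifier,
                        c -> new TreeMap<Long, String>(Comparator.reverseOrder()));
        versions.put(timestamp, value);
    }

    // Return the value with the highest timestamp, or null if the cell does not exist.
    public String getLatest(String row, String family, String qualifier) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, String> versions = columns.get(family + ":" + qualifier);
        if (versions == null || versions.isEmpty()) {
            return null;
        }
        return versions.firstEntry().getValue();
    }

    public static void main(String[] args) {
        CellVersionsSketch table = new CellVersionsSketch();
        table.put("row1", "cf", "a", 100L, "old");
        table.put("row1", "cf", "a", 200L, "new");
        // The newest version wins: prints "new".
        System.out.println(table.getLatest("row1", "cf", "a"));
    }
}
```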
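Rules of thumb for table design
- Aim for region sizes between 10 and 50 GB.
- Keep cells no larger than 10 MB. For anything larger, store the data in HDFS and keep only a pointer to it in HBase.
- Give each table 1 to 3 column families; where possible, use just one.
- For a table with 1 or 2 column families, around 50-100 regions is a good number. Keep in mind that a region is a contiguous segment of a column family.
- Keep column family names as short as possible, ideally a single letter.
- A monotonically increasing row key can send all reads and writes to a single region while older regions sit underused, so avoid monotonically increasing row keys.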
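A common way to avoid the hotspot described in the last rule is to salt the row key with a prefix derived from the key itself. The sketch below shows the idea with the JDK only. The bucket count, the hash, and the key format are assumptions chosen for illustration, not values recommended by HBase.

```java
import java.nio.charset.StandardCharsets;

class SaltedRowKey {
    // Number of salt buckets; a hypothetical value chosen for illustration.
    static final int BUCKETS = 8;

    // Prefix the key with a bucket computed from the key itself, so the same
    // logical key always maps to the same salted key and can be read back.
    static String salt(String key) {
        int hash = 0;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            hash = 31 * hash + (b & 0xff);
        }
        int bucket = Math.floorMod(hash, BUCKETS);
        return bucket + "-" + key;
    }

    public static void main(String[] args) {
        // Consecutive timestamps used as keys would otherwise sort next to each other.
        for (long ts = 1555989722027L; ts < 1555989722032L; ts++) {
            System.out.println(salt(Long.toString(ts)));
        }
    }
}
```

The trade-off is that a range scan over the original keys now has to query every bucket and merge the results.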
Java client configuration
Add the hbase-shaded-client dependency to your Maven project:
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-shaded-client</artifactId>
<version>1.4.9</version>
</dependency>
- The client version should preferably match the server version.
Code examples
- Create a table
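import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.io.compress.Compression.Algorithm;

public class HbaseClient {
    public static void main(String[] args) {
        Configuration config = HBaseConfiguration.create();
        // ZooKeeper quorum of the HBase cluster
        config.set("hbase.zookeeper.quorum", "192.168.41.129");
        // try-with-resources closes admin and connection even when the connection
        // attempt fails, where a bare finally block would hit a NullPointerException
        try (Connection connection = ConnectionFactory.createConnection(config);
             Admin admin = connection.getAdmin()) {
            // Table "test" with one uncompressed column family "cf"
            HTableDescriptor table = new HTableDescriptor(TableName.valueOf("test"));
            table.addFamily(new HColumnDescriptor("cf").setCompressionType(Algorithm.NONE));
            System.out.print("Creating table. ");
            admin.createTable(table);
            System.out.println(" Done.");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}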
- Put data into the table
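import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HbaseClient {
    public static void main(String[] args) {
        Configuration config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "192.168.41.129");
        // try-with-resources closes the table first, then the connection
        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("test"))) {
            // Row "row1", column cf:a, value "value1"; Bytes.toBytes always encodes as UTF-8
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("a"), Bytes.toBytes("value1"));
            table.put(put);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}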
- Read table data
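import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HbaseClient {
    public static void main(String[] args) {
        byte[] family = Bytes.toBytes("cf");
        byte[] qualifier = Bytes.toBytes("a");
        Configuration config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "192.168.41.129");
        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("test"))) {
            // Scan only column cf:a of rows whose key starts with "row"
            Scan scan = new Scan();
            scan.addColumn(family, qualifier);
            scan.setRowPrefixFilter(Bytes.toBytes("row"));
            // The scanner holds server-side resources, so close it as well
            try (ResultScanner rs = table.getScanner(scan)) {
                for (Result r : rs) {
                    System.out.println("row:" + Bytes.toString(r.getRow())
                            + ", value:" + Bytes.toString(r.getValue(family, qualifier)));
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}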
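Note
- Connection is a heavyweight, thread-safe object, so a single instance per application is enough. Table, Admin, and RegionLocator are lightweight objects, so close them as soon as you are done and obtain new ones when needed.
Key points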
TTL
alter 'test', NAME => 'cf', TTL=> 100
This sets the time to live (TTL) to 100 seconds. TTL is configured per column family.
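The TTL is given in seconds, while cell timestamps are in milliseconds. As a rough sketch of the resulting expiry rule, assuming a cell counts as expired once its age exceeds the TTL (the class and method names here are hypothetical, and this is a simplified model rather than HBase's own code):

```java
class TtlSketch {
    // A cell is treated as expired once its age in milliseconds exceeds the TTL in seconds.
    static boolean isExpired(long cellTimestampMs, long ttlSeconds, long nowMs) {
        return nowMs - cellTimestampMs > ttlSeconds * 1000L;
    }

    public static void main(String[] args) {
        long writtenAt = System.currentTimeMillis();
        // Checked 99 seconds after the write with a 100 second TTL: prints false.
        System.out.println(isExpired(writtenAt, 100, writtenAt + 99_000L));
        // Checked 101 seconds after the write: prints true.
        System.out.println(isExpired(writtenAt, 100, writtenAt + 101_000L));
    }
}
```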
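Data block encoding
Used well, data block encoding can save a significant amount of storage space, but the extra encoding and decoding work can reduce read and write performance. HBase offers four data block encodings: Prefix, Diff, Fast Diff, and Prefix Tree. Which one to choose depends on your specific needs.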
alter 'test', NAME => 'cf', DATA_BLOCK_ENCODING => 'FAST_DIFF'
Here the data block encoding is set to Fast Diff.
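To give a feel for why these encodings save space, here is a toy version of the idea behind Prefix encoding: because neighboring keys in a sorted block tend to share a long prefix, each key can be stored as the length of the prefix it shares with the previous key plus the remaining suffix. This sketch works on plain strings with a made-up "length|suffix" format; it is not HBase's actual on-disk layout, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PrefixEncodingSketch {
    // Encode each key as "<shared prefix length>|<suffix>" relative to the previous key.
    static List<String> encode(List<String> sortedKeys) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (String key : sortedKeys) {
            int shared = 0;
            int max = Math.min(prev.length(), key.length());
            while (shared < max && prev.charAt(shared) == key.charAt(shared)) {
                shared++;
            }
            out.add(shared + "|" + key.substring(shared));
            prev = key;
        }
        return out;
    }

    // Rebuild the original keys by reusing the prefix of the previously decoded key.
    static List<String> decode(List<String> encoded) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (String entry : encoded) {
            int bar = entry.indexOf('|');
            int shared = Integer.parseInt(entry.substring(0, bar));
            String key = prev.substring(0, shared) + entry.substring(bar + 1);
            out.add(key);
            prev = key;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("row1/cf:a", "row1/cf:b", "row2/cf:a");
        // prints [0|row1/cf:a, 8|b, 3|2/cf:a]
        System.out.println(encode(keys));
    }
}
```

Decoding has to walk the keys in order, which hints at the read-time cost mentioned above.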
Configuring the web UI
<property>
<name>hbase.master.info.port</name>
<value>16010</value>
</property>
<property>
<name>hbase.regionserver.info.port</name>
<value>16030</value>
</property>
Note that when HBase is deployed in standalone mode, HBase chooses these two ports at random.
Problems
- Directory is not empty
2019-04-23 11:34:26,632 WARN [ProcedureExecutor-1] master.SplitLogManager: Returning success without actually splitting and deleting all the log files in path hdfs://localhost:9000/hbase/WALs/centos,54477,1555989722027-splitting: [FileStatus{path=hdfs://localhost:9000/hbase/WALs/centos,54477,1555989722027-splitting/centos%2C54477%2C1555989722027.meta.1555989759561.meta; isDirectory=false; length=1084; replication=3; blocksize=134217728; modification_time=1555989769891; access_time=1555989759573; owner=root; group=supergroup; permission=rw-r--r--; isSymlink=false}, FileStatus{path=hdfs://localhost:9000/hbase/WALs/centos,54477,1555989722027-splitting/centos%2C54477%2C1555989722027.meta.1555989860335.meta; isDirectory=false; length=91; replication=3; blocksize=134217728; modification_time=1555989922609; access_time=1555989860342; owner=root; group=supergroup; permission=rw-r--r--; isSymlink=false}]
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.fs.PathIsNotEmptyDirectoryException): `/hbase/WALs/centos,54477,1555989722027-splitting is non empty': Directory is not empty
at org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:84)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3690)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:953)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:623)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)
at org.apache.hadoop.ipc.Client.call(Client.java:1476)
at org.apache.hadoop.ipc.Client.call(Client.java:1413)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy16.delete(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:545)
at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy17.delete(Unknown Source)
at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:307)
at com.sun.proxy.$Proxy18.delete(Unknown Source)
at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:307)
at com.sun.proxy.$Proxy18.delete(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:2044)
at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:707)
at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:703)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:714)
at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:296)
at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:433)
at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:406)
at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:323)
at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.splitLogs(ServerCrashProcedure.java:440)
at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:253)
at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:75)
at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:139)
at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:506)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1167)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:955)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:908)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$400(ProcedureExecutor.java:77)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.run(ProcedureExecutor.java:482)
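Fix: using the Hadoop file system shell, delete the directory named in the error, or the entire WALs directory. Deleting WALs discards any edits that have not yet been flushed to disk, so this is only appropriate where that loss is acceptable, such as a test environment.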
bin/hadoop fs -ls /hbase/WALs
bin/hadoop fs -rm -r /hbase/WALs
- When a Java client connects to HBase remotely, the following error may occur:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Wed Mar 27 17:31:57 CST 2019, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=76610: Call to localhost/127.0.0.1:38364 failed on connection exception: java.net.ConnectException: Connection refused: no further information row 'test,row,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=localhost,38364,1553670561949, seqNum=0
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:329)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:242)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:275)
at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:436)
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:310)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1341)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1230)
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:356)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:275)
at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:436)
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:310)
at com.hychong.coreutil.HbaseClient.main(HbaseClient.java:51)
Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=76610: Call to localhost/127.0.0.1:38364 failed on connection exception: java.net.ConnectException: Connection refused: no further information row 'test,row,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=localhost,38364,1553670561949, seqNum=0
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:178)
at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.net.ConnectException: Call to localhost/127.0.0.1:38364 failed on connection exception: java.net.ConnectException: Connection refused: no further information
at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:165)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:389)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:94)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:409)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:405)
at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:103)
at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:118)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callMethod(AbstractRpcClient.java:422)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:327)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$200(AbstractRpcClient.java:94)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:571)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:37059)
at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:405)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:274)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:388)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:362)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:142)
... 4 more
Caused by: java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.hbase.ipc.BlockingRpcConnection.setupConnection(BlockingRpcConnection.java:256)
at org.apache.hadoop.hbase.ipc.BlockingRpcConnection.setupIOstreams(BlockingRpcConnection.java:437)
at org.apache.hadoop.hbase.ipc.BlockingRpcConnection.writeRequest(BlockingRpcConnection.java:540)
at org.apache.hadoop.hbase.ipc.BlockingRpcConnection.tracedWriteRequest(BlockingRpcConnection.java:520)
at org.apache.hadoop.hbase.ipc.BlockingRpcConnection.access$200(BlockingRpcConnection.java:85)
at org.apache.hadoop.hbase.ipc.BlockingRpcConnection$4.run(BlockingRpcConnection.java:724)
at org.apache.hadoop.hbase.ipc.HBaseRpcControllerImpl.notifyOnCancel(HBaseRpcControllerImpl.java:240)
at org.apache.hadoop.hbase.ipc.BlockingRpcConnection.sendRequest(BlockingRpcConnection.java:699)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callMethod(AbstractRpcClient.java:420)
... 15 more
Fix: add the following entry to the hosts file on both the HBase client host and the HBase server host:
192.168.41.129 centos
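- The left side is the IP address of the server running HBase; the right side is its hostname.
- After changing the hosts file on the server, remove the related data files (with the configuration in this article, the /data/hbase and /data/zookeeper directories) and then restart HBase.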
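The trace above shows the client calling localhost/127.0.0.1, which suggests the server was registered under a name that resolves to the loopback address. A small check like the following, run on either host, can confirm whether a hostname resolves to loopback before and after editing the hosts file. It is a diagnostic sketch using only the JDK; the class and method names are hypothetical.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.UnknownHostException;

class HostCheck {
    // True if the hostname or IP literal resolves to a loopback address such as 127.0.0.1.
    static boolean resolvesToLoopback(String host) {
        try {
            return InetAddress.getByName(host).isLoopbackAddress();
        } catch (UnknownHostException e) {
            // The name could not be resolved at all
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Pass the HBase server's hostname as the first argument, for example "centos".
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(host + " resolves to loopback: " + resolvesToLoopback(host));
    }
}
```

If the server's hostname still reports loopback after the hosts file change, remote clients are likely to keep being directed to 127.0.0.1.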