04. Hadoop: Restoring NameNode metadata from the SecondaryNameNode
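Main content of this section:
All fsimage, edits, and seen_txid files on the NameNode are deleted, which leaves the NameNode unable to start and therefore unable to serve the data held on the DataNodes. The metadata is then restored from the SecondaryNameNode, so that the NameNode can manage the DataNodes normally again.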
1. System environment:
OS: CentOS Linux release 7.5.1804 (Core)
CPU: 2 cores
Memory: 1 GB
Run-as user: root
JDK version: 1.8.0_252
Hadoop version: cdh5.16.2
2. Cluster node roles:
172.26.37.245 node1.hadoop.com namenode
172.26.37.246 node2.hadoop.com datanode
172.26.37.247 node3.hadoop.com datanode
172.26.37.248 node4.hadoop.com secondarynamenode
Method 1: copy the SecondaryNameNode data into the NameNode's metadata directory
1. Clear the NameNode metadata
On the NameNode, empty /data/hdfs/name:
# rm -rf /data/hdfs/name/*
Restart the service and observe that it fails to start:
# service hadoop-hdfs-namenode restart
# service hadoop-hdfs-namenode status
shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
chdir: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
# netstat -ant|grep 50070
Check the log:
# vi /var/log/hadoop-hdfs/hadoop-hdfs-namenode-chefserver.log
2020-06-16 02:45:55,444 WARN org.apache.hadoop.hdfs.server.common.Storage: Storage directory /data/hdfs/name does not exist
2020-06-16 02:45:55,449 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data/hdfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
The log reports that the /data/hdfs/name directory does not exist.
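Step 1 removes the metadata irreversibly, so on anything other than a throwaway test cluster it may be safer to archive the directory before emptying it. The following is a minimal sketch and not part of the original procedure; the function name backup_then_clear and the backup location are hypothetical, and the backup directory is assumed to lie outside the metadata directory.

```shell
# Hypothetical helper: archive a metadata directory, then empty it.
# Usage: backup_then_clear <metadata_dir> <backup_dir>
# Assumption: <backup_dir> is NOT inside <metadata_dir>.
backup_then_clear() {
    local src="$1" dest="$2" archive
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    archive="$dest/name-$(date +%Y%m%d-%H%M%S).tar.gz"
    # Archive the directory contents; stop if the archive cannot be written.
    tar -czf "$archive" -C "$src" . || return 1
    # Remove everything inside src (dotfiles included) but keep src itself.
    find "$src" -mindepth 1 -delete || return 1
    # Print the archive path so the caller knows where the backup went.
    echo "$archive"
}
```

On the NameNode this could be called as `backup_then_clear /data/hdfs/name /root/name-backup` in place of the plain `rm -rf`.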
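2. Restore the data from the SecondaryNameNode
On the SecondaryNameNode, run the copy from inside the checkpoint directory (the one that contains current/):
# scp -r ./* root@172.26.37.245:/data/hdfs/name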
3. On the NameNode, fix the ownership and check that the metadata has been restored
# cd /data/hdfs/
# chown -R hdfs:hdfs ./name
# cd name/
# ll
total 8
drwxr-xr-x 2 hdfs hdfs 4096 Jun 16 02:50 current
-rw-r--r-- 1 hdfs hdfs 22 Jun 16 02:50 in_use.lock
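The listing above only shows the top-level entries. A quick sanity check that current/ actually holds a VERSION file and at least one fsimage_* file could look like the sketch below. The function name check_name_dir is hypothetical, and the check is illustrative rather than exhaustive (it does not validate checksums or edit logs).

```shell
# Hypothetical helper: confirm a restored metadata directory looks usable.
# Usage: check_name_dir <metadata_dir>
check_name_dir() {
    local dir="$1" f
    [ -f "$dir/current/VERSION" ] || {
        echo "missing $dir/current/VERSION" >&2
        return 1
    }
    for f in "$dir"/current/fsimage_*; do
        # Skip the .md5 checksum companions; only the image itself counts.
        case "$f" in
            *.md5) continue ;;
        esac
        # If the glob matched nothing, $f is the literal pattern and -f fails.
        if [ -f "$f" ]; then
            echo "found image: $f"
            return 0
        fi
    done
    echo "no fsimage_* file under $dir/current" >&2
    return 1
}
```

For this walkthrough the call would be `check_name_dir /data/hdfs/name`, run before starting the service in the next step.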
4. Start the NameNode service and verify the data:
# service hadoop-hdfs-namenode start
starting namenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-namenode-chefserver.out
Started Hadoop namenode:[ OK ]
# sudo -u hdfs hadoop fs -ls /
Found 3 items
drwxr-xr-x - hdfs supergroup 0 2020-06-14 23:31 /secondary
drwxr-xr-x - hdfs supergroup 0 2020-06-14 23:32 /secondaryt
drwxrwxrwt - hdfs supergroup 0 2020-06-14 01:48 /tmp
The NameNode web UI shows the DataNodes normally again.
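The same check can be made from the command line with `hdfs dfsadmin -report`. The sketch below extracts the live DataNode count from that report. It assumes the report contains a line of the form `Live datanodes (N):`, which is what Hadoop 2.x releases print as far as I know; live_datanodes is a hypothetical helper name.

```shell
# Hypothetical helper: read `hdfs dfsadmin -report` text on stdin and print
# the N from the first "Live datanodes (N):" line. Prints nothing if no such
# line is present (assumed report format, see the note above).
live_datanodes() {
    sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*$/\1/p' | head -n 1
}
```

Usage on the NameNode: `sudo -u hdfs hdfs dfsadmin -report | live_datanodes`. The result can then be compared with the number of DataNodes in the role plan.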
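Method 2: -importCheckpoint
Start the NameNode daemon with the -importCheckpoint option, which loads the SecondaryNameNode's checkpoint data and saves it into the NameNode's metadata directory.
1. Clear the NameNode metadata
Same as in Method 1; omitted.
2. Declare fs.checkpoint.dir in hdfs-site.xml on the NameNode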
# vi /etc/hadoop/conf/hdfs-site.xml
Add the following:
<property>
<name>fs.checkpoint.dir</name>
<value>file:///home/hadoop/tmp</value>
</property>
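The NameNode reads the checkpoint data from this directory and restores it into its original metadata directory. (In Hadoop 2.x, fs.checkpoint.dir is the deprecated name of dfs.namenode.checkpoint.dir; both are accepted.)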
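To confirm which checkpoint directory has actually been configured, the value can be read back out of hdfs-site.xml. The sketch below is a naive line-based lookup, not a real XML parser: it assumes each <name> and <value> element fits on a single line, as in the snippet above. get_hdfs_property is a hypothetical helper; on a live node, `hdfs getconf -confKey fs.checkpoint.dir` is likely the more robust route.

```shell
# Hypothetical helper: print the <value> of a property in a Hadoop *-site.xml.
# Usage: get_hdfs_property <xml_file> <property_name>
# Assumption: <name>...</name> and <value>...</value> each sit on one line.
get_hdfs_property() {
    local file="$1" key="$2"
    awk -v key="$key" '
        # Remember whether the most recent <name> element matched the key.
        /<name>/ {
            name = $0
            sub(/^.*<name>[ \t]*/, "", name)
            sub(/[ \t]*<\/name>.*$/, "", name)
            hit = (name == key)
        }
        # Print the first <value> that follows a matching <name>, then stop.
        hit && /<value>/ {
            value = $0
            sub(/^.*<value>[ \t]*/, "", value)
            sub(/[ \t]*<\/value>.*$/, "", value)
            print value
            exit
        }
    ' "$file"
}
```

Example call: `get_hdfs_property /etc/hadoop/conf/hdfs-site.xml fs.checkpoint.dir`.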
3. Copy the checkpoint data from the SecondaryNameNode to the NameNode
# scp -r /home/hadoop/tmp/dfs/namesecondary root@172.26.37.245:/home/hadoop/tmp
4. Import the checkpoint on the NameNode
# chown -R hdfs:hdfs /home/hadoop/tmp
# sudo -u hdfs hdfs namenode -importCheckpoint
20/06/16 03:06:38 INFO blockmanagement.CacheReplicationMonitor: Rescanning after 30002 milliseconds
20/06/16 03:06:38 INFO blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
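According to the Hadoop documentation, -importCheckpoint expects the metadata directory (dfs.namenode.name.dir) to be empty and fails if it already contains a valid image. A pre-flight check along those lines, to run before the import command in step 4, might look like the sketch below. ready_for_import is a hypothetical name, and the check only looks at directory layout, not at the image contents.

```shell
# Hypothetical pre-flight check before `hdfs namenode -importCheckpoint`.
# Usage: ready_for_import <name_dir> <checkpoint_dir>
ready_for_import() {
    local name_dir="$1" ckpt_dir="$2"
    [ -d "$name_dir" ] || { echo "name dir missing: $name_dir" >&2; return 1; }
    # Any entry at all in the name dir counts as "not empty".
    if [ -n "$(ls -A "$name_dir")" ]; then
        echo "name dir is not empty: $name_dir" >&2
        return 1
    fi
    # The checkpoint data must sit in current/ directly under the checkpoint dir.
    [ -d "$ckpt_dir/current" ] || {
        echo "no current/ under $ckpt_dir" >&2
        return 1
    }
    echo "ready"
}
```

With the paths used in this walkthrough the call would be `ready_for_import /data/hdfs/name /home/hadoop/tmp`. The check expects current/ directly under the checkpoint directory; with the scp command in step 3, that is the layout produced when /home/hadoop/tmp does not already exist on the NameNode.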
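5. Start the NameNode service and verify the data:
The import command in step 4 keeps a NameNode running in the foreground. Once the import has finished, stop that process (for example with Ctrl+C), then start the NameNode service: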
# service hadoop-hdfs-namenode start
starting namenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-namenode-chefserver.out
Started Hadoop namenode:[ OK ]
# sudo -u hdfs hadoop fs -ls /
Found 3 items
drwxr-xr-x - hdfs supergroup 0 2020-06-14 23:31 /secondary
drwxr-xr-x - hdfs supergroup 0 2020-06-14 23:32 /secondaryt
drwxrwxrwt - hdfs supergroup 0 2020-06-14 01:48 /tmp
The NameNode web UI shows the DataNodes normally again.