Big Data: Hadoop Troubleshooting Roundup, Ultimate Edition (continuously updated)
1. Software environment
Software: RHEL6, jdk-8u45, hadoop-2.8.1.tar.gz, ssh
IP address | Role | Hostname
xx.xx.xx.xx | NN | hadoop1
xx.xx.xx.xx | DN | hadoop2
xx.xx.xx.xx | DN | hadoop3
xx.xx.xx.xx | DN | hadoop4
xx.xx.xx.xx | DN | hadoop5
This write-up covers a pseudo-distributed deployment only, so just the host hadoop1 is needed.
2. SSH key trust problem at startup
Starting HDFS:
[hadoop@hadoop01 hadoop]$ ./sbin/start-dfs.sh
Starting namenodes on [hadoop01]
The authenticity of host 'hadoop01 (172.16.18.133)' can't be established.
RSA key fingerprint is 8f:e7:6c:ca:6e:40:78:b8:df:6a:b4:ca:52:c7:01:4b.
Are you sure you want to continue connecting (yes/no)? yes
hadoop01: Warning: Permanently added 'hadoop01' (RSA) to the list of known hosts.
hadoop01: chown: changing ownership of `/opt/software/hadoop-2.8.1/logs': Operation not permitted
hadoop01: starting namenode, logging to /opt/software/hadoop-2.8.1/logs/hadoop-hadoop-namenode-hadoop01.out
hadoop01: /opt/software/hadoop-2.8.1/sbin/hadoop-daemon.sh: line 159:
/opt/software/hadoop-2.8.1/logs/hadoop-hadoop-namenode-hadoop01.out: Permission denied
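If startup prompts interactively for a password and you do not enter one, it fails with a permission error. This happens because SSH trust has not been configured.
Even for a pseudo-distributed setup on a single machine, passwordless SSH login must be configured.
For a non-root user, the public key file (authorized_keys) must have mode 600 (root is exempt).
Configure passwordless SSH login for the hadoop user: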
[hadoop@hadoop01 .ssh]$ cat id_rsa.pub > authorized_keys
[hadoop@hadoop01 .ssh]$ chmod 600 authorized_keys
[hadoop@hadoop01 hadoop]$ ssh hadoop01 date
[hadoop@hadoop01 .ssh]$
[hadoop@hadoop01 hadoop]$ ./sbin/start-dfs.sh
Starting namenodes on [hadoop01]
hadoop01: starting namenode, logging to /opt/software/hadoop-2.8.1/logs/hadoop-hadoop-namenode-hadoop01.out
hadoop01: starting datanode, logging to /opt/software/hadoop-2.8.1/logs/hadoop-hadoop-datanode-hadoop01.out
Starting secondary namenodes [hadoop01]
hadoop01: starting secondarynamenode, logging to /opt/software/hadoop-2.8.1/logs/hadoop-hadoop-secondarynamenode-hadoop01.out
[hadoop@hadoop01 hadoop]$ jps
1761 Jps
1622 SecondaryNameNode
1388 DataNode
1276 NameNode
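The permission requirement above can be checked before starting HDFS. The following is a minimal sketch, not part of the original post: the function name `check_ssh_perms` is hypothetical, it assumes GNU `stat -c` (as shipped on RHEL), and the 700 mode on the `.ssh` directory is a common convention assumed here rather than something the text states.

```shell
# Hypothetical helper (not from the original post): report whether the
# files used for passwordless SSH login have the expected modes.
# Assumes GNU coreutils `stat -c`, as found on RHEL.
check_ssh_perms() {
    local ssh_dir="${1:-$HOME/.ssh}"
    local keys="$ssh_dir/authorized_keys"
    local rc=0

    if [ ! -f "$keys" ]; then
        echo "missing: $keys"
        return 1
    fi
    # 700 on the directory is a common convention, assumed here.
    if [ "$(stat -c '%a' "$ssh_dir")" != "700" ]; then
        echo "fix with: chmod 700 $ssh_dir"
        rc=1
    fi
    # 600 on authorized_keys is the requirement stated in the text.
    if [ "$(stat -c '%a' "$keys")" != "600" ]; then
        echo "fix with: chmod 600 $keys"
        rc=1
    fi
    if [ "$rc" -eq 0 ]; then
        echo "ok: $ssh_dir"
    fi
    return "$rc"
}
```

Run as the hadoop user, `check_ssh_perms` with no argument inspects `$HOME/.ssh`. It only prints suggested `chmod` commands and changes nothing itself.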
3. The process information unavailable problem
There are two cases: 1. the process does not exist, yet jps reports process information unavailable
2. the process exists, and jps reports process information unavailable
The first case:
[hadoop@hadoop01 sbin]$ jps
3108 DataNode
4315 Jps
4156 SecondaryNameNode
2990 NameNode
[hadoop@hadoop01 hsperfdata_hadoop]$ ls
5295 5415 5640
[hadoop@hadoop01 hsperfdata_hadoop]$ ll
total 96
-rw------- 1 hadoop hadoop 32768 Apr 27 09:35 5295
-rw------- 1 hadoop hadoop 32768 Apr 27 09:35 5415
-rw------- 1 hadoop hadoop 32768 Apr 27 09:35 5640
[hadoop@hadoop01 hsperfdata_hadoop]$ pwd
/tmp/hsperfdata_hadoop
/tmp/hsperfdata_hadoop holds one file per process ID that jps displays. Suppose jps now reports errors:
[hadoop@hadoop01 tmp]$ jps
3330 SecondaryNameNode -- process information unavailable
3108 DataNode -- process information unavailable
3525 Jps
2990 NameNode -- process information unavailable
Check whether the abnormal process still exists:
[hadoop@hadoop01 tmp]$ ps -ef |grep 3330
hadoop 3845 2776 0 09:29 pts/6 00:00:00 grep 3330
Only the grep command itself matches, so process 3330 no longer exists.
If the process no longer exists, delete its file under /tmp/hsperfdata_xxx and simply restart the process.
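Finding which files are stale can be scripted instead of comparing `ls` and `ps` output by hand. This is a rough sketch under stated assumptions: the function name `list_stale_hsperf` is hypothetical, it relies on the Linux `/proc` filesystem to test whether a PID exists, and it assumes the default `/tmp/hsperfdata_<user>` location described above.

```shell
# Hypothetical helper (not from the original post): print the hsperfdata
# files whose process ID no longer exists.
# Assumes Linux, where /proc/<pid> exists for every live process.
list_stale_hsperf() {
    local dir="${1:-/tmp/hsperfdata_$(id -un)}"
    local f pid
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        pid=$(basename "$f")
        # Skip anything that is not a purely numeric file name.
        case "$pid" in
            ''|*[!0-9]*) continue ;;
        esac
        # No /proc entry means the process is gone and the file is stale.
        [ -d "/proc/$pid" ] || echo "$f"
    done
}
```

The function only lists candidates. Review the output, delete the listed files, and then restart the affected daemon.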
The second case: jps reads the files under hsperfdata_<current user>.
[root@hadoop01 ~]# jps
7153 -- process information unavailable
8133 -- process information unavailable
7495 -- process information unavailable
8489 Jps
[root@hadoop01 ~]# ps -ef |grep 7153    # check whether the abnormal process exists
hadoop 7153 1 2 09:47 ? 00:00:17 /usr/java/jdk1.8.0_45/bin/java -Dproc_namenode -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/opt/software/hadoop-2.8.1/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/software/hadoop-2.8.1 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/opt/software/hadoop-2.8.1/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/opt/software/hadoop-2.8.1/logs -Dhadoop.log.file=hadoop-hadoop-namenode-hadoop01.log -Dhadoop.home.dir=/opt/software/hadoop-2.8.1 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/opt/software/hadoop-2.8.1/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.namenode.NameNode
root 8505 2752 0 09:58 pts/6 00:00:00 grep 7153
If the process exists but the current user sees process information unavailable, run ps -ef |grep <pid> and look at which user the process runs as. If it is not the current user, switch to that user.
[hadoop@hadoop01 hadoop]$ jps    # switch to the hadoop user and check the processes
7153 NameNode
8516 Jps
8133 DataNode
7495 SecondaryNameNode
After switching users, all processes show up normally. In this case jps was simply run by the wrong user, not the user that runs the Hadoop processes. Nothing needs to be fixed; the services are running normally.
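Summary: how to handle the process information unavailable error:
1. Check whether the process exists (if it does not, delete its file under /tmp/hsperfdata_xxx and restart the process).
2. If the process exists, check which user it runs as; if that is not the current user, switch to that user and run jps again.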
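The two checks in the summary can be combined into one small triage function. This is only a sketch: the name `diagnose_jps_pid` and its output labels are hypothetical, and it assumes Linux `/proc` plus GNU `stat -c` to read the owner of a process.

```shell
# Hypothetical helper (not from the original post): classify a PID that
# jps reports as "process information unavailable".
# Assumes Linux /proc and GNU coreutils `stat -c`.
diagnose_jps_pid() {
    local pid="$1"
    local owner_uid

    # Case 1: the process is gone, so its hsperfdata file is stale.
    if [ ! -d "/proc/$pid" ]; then
        echo "gone"
        return 0
    fi
    # Case 2: the process exists; compare its owner with the current user.
    owner_uid=$(stat -c '%u' "/proc/$pid")
    if [ "$owner_uid" = "$(id -u)" ]; then
        echo "same-user"
    else
        # Prints the owning user name so you know which user to switch to.
        echo "other-user:$(stat -c '%U' "/proc/$pid")"
    fi
}
```

A result of `gone` corresponds to step 1 (delete the stale file and restart the process). A result of `other-user:<name>` corresponds to step 2 (switch to that user and run jps again).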