I have set up a multi-node Hadoop cluster. The NameNode and Secondary NameNode run on the same machine, and the cluster has only one DataNode. All nodes are configured on Amazon EC2 instances.
These are the configuration files on the master node:

masters
54.68.218.192 (public IP of the master node)

slaves
54.68.169.62 (public IP of the slave node)
core-site.xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
mapred-site.xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
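If it is relevant: my understanding is that the NameNode binds its RPC port to whatever host fs.default.name resolves to, so with hdfs://localhost:9000 it would listen on 127.0.0.1 only. On the master I can check the actual bound address with netstat, assuming it is installed:

# show listening TCP sockets on the NameNode port
sudo netstat -tlpn | grep 9000

If this shows 127.0.0.1:9000 rather than the private IP or 0.0.0.0, the DataNode would have no way to reach the NameNode from another machine.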
And these are the configuration files on the DataNode:

core-site.xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://54.68.218.192:10001</value>
</property>
mapred-site.xml
<property>
  <name>mapred.job.tracker</name>
  <value>54.68.218.192:10002</value>
</property>
hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
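To confirm which NameNode address each machine actually ends up with after all configuration files are merged, I believe the standard hdfs getconf tool can be run on both the master and the slave:

hdfs getconf -confKey fs.default.name

Given the files above, I would expect this to print hdfs://localhost:9000 on the master and hdfs://54.68.218.192:10001 on the slave.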
Running jps on the NameNode gives the following:
5696 NameNode
6504 Jps
5905 SecondaryNameNode
6040 ResourceManager
and jps on the DataNode:
2883 DataNode
3496 Jps
3381 NodeManager
This looks right to me.
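To double-check the DataNode registration from the command line as well, I believe this report command can be run on the master:

hdfs dfsadmin -report

It should list every DataNode the NameNode currently knows about, along with its capacity.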
Now, when I try to run the put command:
hadoop fs -put count_inputfile /test/input/
it gives me the following error:
put: File /count_inputfile._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
The logs on the DataNode say the following:
hadoop-datanode log:

INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 54.68.218.192/54.68.218.192:10001. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
yarn-nodemanager log:
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
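The 0.0.0.0:8031 in that log suggests to me that the NodeManager is falling back to the default ResourceManager address instead of using the master's. If I understand the YARN configuration correctly, pointing it at the master would take something like this in yarn-site.xml on the slave (the IP is my master's public IP; the resource-tracker port 8031 is then derived from it automatically):

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>54.68.218.192</value>
</property>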
The NameNode web UI (port 50070) shows 0 live nodes and 0 dead nodes, and DFS Used is 100%.
I have also disabled IPv6.
On some websites I found that I should also edit the /etc/hosts files. I have edited them too, and they look like this:
127.0.0.1 localhost
172.31.25.151 ip-172-31-25-151.us-west-2.compute.internal
172.31.25.152 ip-172-31-25-152.us-west-2.compute.internal
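To rule out a plain network problem between the two instances (EC2 security groups, for example), I assume I can also probe the NameNode port from the DataNode with netcat:

# -z: scan without sending data, -v: verbose output
nc -zv 54.68.218.192 10001

If this cannot connect, the retry messages in the DataNode log would be explained by the network rather than by Hadoop itself.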
Why am I still getting this error?