I am using the following Python code to upload a file from my local system to a remote HDFS cluster with pyhdfs:
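from pyhdfs import HdfsClient

# NameNode address; Hadoop runs on the default WebHDFS port 50070
client = HdfsClient(hosts='1.1.1.1', user_name='root')
client.mkdirs('/jarvis')  # this call succeeds
client.copy_from_local('/my/local/file', '/hdfs/path')  # this call raises ConnectionError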
I am using Python 3.5. Hadoop runs on the default port 50070, and 1.1.1.1 is the address of my remote Hadoop cluster.
Creating the directory "jarvis" works fine, but copying the file does not. I get the following error:
Traceback (most recent call last):
  File "test_hdfs_upload.py", line 14, in <module>
    client.copy_from_local('/tmp/data.json','/test.json')
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyhdfs.py", line 753, in copy_from_local
    self.create(dest, f, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyhdfs.py", line 426, in create
    metadata_response.headers['location'], data=data, **self._requests_kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/api.py", line 99, in put
    return request('put', url, data=data, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/sessions.py", line 383, in request
    resp = self.send(prep, **send_kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/sessions.py", line 486, in send
    r = adapter.send(request, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/adapters.py", line 378, in send
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='ip-1-1-1-1', port=50075): Max retries exceeded with url: /webhdfs/v1/test.json?op=CREATE&user.name=root&namenoderpcaddress=ip-1-1-1-1:9000&overwrite=false (Caused by : [Errno 8] nodename nor servname provided, or not known)
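First, check whether WebHDFS is enabled on your HDFS cluster. The PyHDFS library uses WebHDFS, so it has to be enabled in the HDFS configuration. To enable it, set the dfs.webhdfs.enabled property to true in hdfs-site.xml.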
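To check the setting without guessing, the following minimal sketch (standard library only) parses the text of an hdfs-site.xml file and reports the dfs.webhdfs.enabled value. The function name webhdfs_setting and the SAMPLE document are illustrative assumptions. A None result only means the property is not set explicitly in that file, so the cluster falls back to its built-in default.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def webhdfs_setting(hdfs_site_xml: str) -> Optional[bool]:
    """Return the dfs.webhdfs.enabled value set in hdfs-site.xml text.

    Returns None when the property is not set explicitly in this file.
    """
    root = ET.fromstring(hdfs_site_xml)  # the <configuration> element
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == "dfs.webhdfs.enabled":
            # Hadoop treats the value as a boolean string
            return prop.findtext("value", "").strip().lower() == "true"
    return None


# Hypothetical hdfs-site.xml content with WebHDFS explicitly enabled.
SAMPLE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>"""

print(webhdfs_setting(SAMPLE))  # → True
```

On the cluster you would pass the contents of the real file, for example webhdfs_setting(open(path).read()), where path is wherever your installation keeps hdfs-site.xml.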
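Also, when you call copy_from_local() through PyHDFS, the NameNode picks a DataNode from the cluster and redirects the client to it. The redirect may contain only that DataNode's hostname (in your traceback, ip-1-1-1-1 on port 50075) rather than its IP address. The client then tries to open an HTTP connection to that hostname to write the data, and this is where your upload fails: your machine cannot resolve the hostname, hence "[Errno 8] nodename nor servname provided, or not known". To make the hostname resolvable, add the appropriate hostname mappings to the /etc/hosts file on your client machine. For example, if your HDFS cluster has one NameNode and two DataNodes, /etc/hosts needs one line per node, each mapping that node's IP address to its hostname.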
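Each /etc/hosts entry is just an IP address followed by whitespace and a hostname. The sketch below formats such lines for a cluster of one NameNode and two DataNodes. All addresses and hostnames in it are hypothetical placeholders. Replace them with the real hostnames your NameNode reports, such as the host that appears in the redirect URL of the error message.

```python
from typing import List, Tuple


def hosts_lines(nodes: List[Tuple[str, str]]) -> str:
    """Format (ip, hostname) pairs as /etc/hosts lines: "<ip><TAB><hostname>"."""
    # str.format instead of f-strings so this also runs on Python 3.5
    return "".join("{}\t{}\n".format(ip, hostname) for ip, hostname in nodes)


# Hypothetical cluster: placeholder addresses and hostnames only.
cluster = [
    ("10.0.0.10", "namenode-host"),
    ("10.0.0.11", "datanode1-host"),
    ("10.0.0.12", "datanode2-host"),
]

print(hosts_lines(cluster), end="")
```

The printed lines are meant to be appended to /etc/hosts, which normally requires root privileges on the client machine.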
Updating /etc/hosts this way enables hostname resolution from your host to the HDFS cluster, so WebHDFS API calls can be made through PyHDFS.
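After editing /etc/hosts, you can confirm that the redirect target now resolves before retrying the upload. The sketch below extracts the host and port from a WebHDFS redirect URL and asks the system resolver about it with socket.getaddrinfo, which consults /etc/hosts as well as DNS on typical setups. The function names are hypothetical. The http:// scheme is an assumption based on the HTTPConnectionPool in the error, and the query string is shortened.

```python
import socket
from typing import Optional, Tuple
from urllib.parse import urlsplit


def redirect_target(location: str) -> Tuple[Optional[str], Optional[int]]:
    """Extract (hostname, port) from a WebHDFS redirect URL."""
    parts = urlsplit(location)
    return parts.hostname, parts.port


def resolves(host: str, port: int) -> bool:
    """Return True if this machine can resolve host (via /etc/hosts, DNS, ...)."""
    try:
        socket.getaddrinfo(host, port)
    except socket.gaierror:
        # The same resolver failure that produced "[Errno 8]" in the traceback
        return False
    return True


if __name__ == "__main__":
    # Host and port taken from the ConnectionError message in the question.
    loc = "http://ip-1-1-1-1:50075/webhdfs/v1/test.json?op=CREATE&user.name=root"
    host, port = redirect_target(loc)
    print(host, port, resolves(host, port))
```

If resolves() still returns False for the DataNode hostname, the /etc/hosts entry is missing or misspelled, and copy_from_local() will keep failing with the same error.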