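I want to write a CSV file to HDFS
The CSV file arrives zipped in the response to an HTTP request. The response body is converted into a ZipFile object. How do I correctly extract the CSV from that ZipFile, and how do I write it to HDFS?
Here is what I have tried so far: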
import os
from hdfs import InsecureClient
from hdfs.util import HdfsError
from http_wrapper import HttpWrapper
from io import BytesIO
from zipfile import ZipFile

unite_legale_data = HttpWrapper.get_zip_data(args.url_unite_legale)
unite_legale_name = unite_legale_data['content_name']
unite_legale_content = unite_legale_data['content']
log("INFO", "start writing to HDFS")
cli_hdfs = InsecureClient('http://' + os.environ['HDFS_IP'] + ':' + str(os.environ['HDFS_PORT']), user="hdfs")
with cli_hdfs.write(args.unit_legale_output_path, encoding='utf-8', overwrite=True) as writer:
    with unite_legale_content.open(unite_legale_name) as file:
        writer.write(file.read())
My HttpWrapper class looks like this:
from io import BytesIO
from urllib.request import urlopen
from zipfile import ZipFile

class HttpWrapper:
    @staticmethod
    def get_zip_data(url):
        print("get zip data from {}".format(url))
        content = urlopen(url)
        zipped_content = ZipFile(BytesIO(content.read()))
        content_name = zipped_content.namelist()[0]
        print("got data for file named {}".format(content_name))
        return {"content_name": content_name,
                "content": zipped_content}
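On the "how do I correctly extract the CSV" part: taking namelist()[0] only works if the CSV happens to be the first entry in the archive. A minimal sketch of picking members by extension instead, using only the standard library. The helper name find_csv_members and the sample file names are made up for illustration and are not part of the original code:

```python
from io import BytesIO
from zipfile import ZipFile


def find_csv_members(zipped):
    """Return the names of archive members that look like CSV files."""
    return [
        info.filename
        for info in zipped.infolist()
        if not info.is_dir() and info.filename.lower().endswith(".csv")
    ]


# Build a small in-memory archive (hypothetical file names) to try it out.
buffer = BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("README.txt", "not the data")
    archive.writestr("export/StockUniteLegale.csv", "siren,nic\n123,1\n")

sample_zip = ZipFile(BytesIO(buffer.getvalue()))
csv_names = find_csv_members(sample_zip)
```

In get_zip_data, content_name could then be taken from this list, with an explicit error if the list is empty or has more than one entry.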
This produces the following error:
AttributeError: 'bytes' object has no attribute 'encode'
on this line:
writer.write(file.read())
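The likely cause: ZipFile.open() always returns a binary stream, so file.read() yields bytes, while a writer opened with encoding='utf-8' expects str chunks and, as far as I understand the hdfs (HdfsCLI) library, calls .encode() on each of them. Two ways out seem plausible: drop the encoding argument and write the raw bytes, or keep it and pass file.read().decode('utf-8'). Below is a rough sketch of the first option that also streams in chunks rather than reading the whole member into memory. It assumes client is an hdfs.InsecureClient; the function names and the sample data are hypothetical, and the demo at the bottom writes to an in-memory buffer instead of a real cluster:

```python
import shutil
from io import BytesIO
from zipfile import ZipFile


def copy_zip_member(zipped, member_name, writer, chunk_size=64 * 1024):
    """Stream one member of an open ZipFile into a binary file-like writer."""
    # ZipFile.open() yields bytes, so the writer must accept bytes.
    with zipped.open(member_name) as member:
        shutil.copyfileobj(member, writer, chunk_size)


def write_member_to_hdfs(client, zipped, member_name, hdfs_path):
    """Assumption: `client` behaves like hdfs.InsecureClient (HdfsCLI)."""
    # No `encoding` argument here, so the writer receives bytes unchanged.
    with client.write(hdfs_path, overwrite=True) as writer:
        copy_zip_member(zipped, member_name, writer)


# Local demo with an in-memory archive and an in-memory sink (no HDFS needed).
archive_bytes = BytesIO()
with ZipFile(archive_bytes, "w") as archive:
    archive.writestr("data.csv", "siren,name\n1,été\n")

demo_zip = ZipFile(BytesIO(archive_bytes.getvalue()))
sink = BytesIO()
copy_zip_member(demo_zip, demo_zip.namelist()[0], sink)
```

With the code from the question this would be called as write_member_to_hdfs(cli_hdfs, unite_legale_content, unite_legale_name, args.unit_legale_output_path). Writing bytes also avoids guessing the CSV's character set, which matters if the file is not actually UTF-8.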