MapReduce in Python: os.environ["map_input_file"] does not work in map.py

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import sys


def mapper():
    filepath = os.environ["map_input_file"]
    filename = os.path.split(filepath)[-1]  # get the input file name
    for line in sys.stdin:
        if line.strip() == "":
            continue
        fields = line.rstrip("\n").split("\t")
        sno = fields[0]  # get student ID
        if filename == 'worksheet1':
            # get student ID and name, mark 0
            name = fields[1]
            print('\t'.join((sno, '0', name)))
        elif filename == 'worksheet2':
            # get student ID, course number, grade, mark 1
            courseno = fields[1]
            grade = fields[2]
            print('\t'.join((sno, '1', courseno, grade)))


if __name__ == '__main__':
    mapper()
```

```
Traceback (most recent call last):
  File "map.py", line 30, in <module>
    mapper()
  File "map.py", line 11, in mapper
    filepath = os.environ['map_input_file']
  File "/usr/lib64/python2.7/UserDict.py", line 23, in __getitem__
    raise KeyError(key)
KeyError: 'map_input_file'
```

2 answers

Answer 1 · anonymous user · edited 2024-05-02 22:25:39

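The map_input_file environment variable has not been set. In addition, you are piping the data files into the script, so they reach it as sys.stdin, but your code for working out which of those files is currently being read is completely wrong. I suggest just using the fileinput module.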
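A minimal sketch of the fileinput approach, assuming Python 3 and that the worksheet files are passed as command-line arguments instead of being piped in (e.g. `python map.py worksheet1 worksheet2`). `fileinput` reads the named files one after another, and the stream's `filename()` method reports which file the current line came from, so no environment variable is needed. The function name `map_lines` is hypothetical. Note that this only helps for local runs: under Hadoop Streaming the mapper still receives its input on stdin.

```python
import fileinput
import os
import sys


def map_lines(paths=None):
    """Yield tagged records; paths=None means "use sys.argv[1:]"."""
    # fileinput reads each named file in turn; stream.filename() returns
    # the path of the file the current line was read from.
    with fileinput.input(files=paths) as stream:
        for line in stream:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            fields = line.split("\t")
            sno = fields[0]  # student ID
            filename = os.path.basename(stream.filename())
            if filename == "worksheet1":
                # student ID, name -> tag 0
                yield "\t".join((sno, "0", fields[1]))
            elif filename == "worksheet2":
                # student ID, course number, grade -> tag 1
                yield "\t".join((sno, "1", fields[1], fields[2]))


if __name__ == "__main__":
    if len(sys.argv) > 1:
        for record in map_lines(sys.argv[1:]):
            print(record)
    else:
        # Without file names there is nothing to tag lines with.
        print("usage: python map.py worksheet1 worksheet2", file=sys.stderr)
```

The file names are required here on purpose: if the data arrives on stdin, `filename()` only reports `<stdin>`, which is exactly the information the original mapper is missing.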

Answer 2 · anonymous user · edited 2024-05-02 22:25:39

You can't test this program locally as it stands.

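When you run it with Hadoop Streaming, os.environ['map_input_file'] is set by the Hadoop framework, which is how you get the file name. When you run it locally, however, nobody sets it for you.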
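Hadoop Streaming exports job configuration properties to the mapper's environment with the dots in their names replaced by underscores. Newer Hadoop releases call the input-file property `mapreduce.map.input.file` (the older `map.input.file` is deprecated), so depending on the version the variable may arrive as `mapreduce_map_input_file` rather than `map_input_file`. A minimal sketch, using a hypothetical helper `input_file_name`, that checks both names and returns `None` instead of raising `KeyError` when neither is set:

```python
import os


def input_file_name(environ=None):
    """Return the base name of the current map input file, or None.

    Hypothetical helper: tries the newer Hadoop property name first,
    then the older one used in the question.
    """
    env = os.environ if environ is None else environ
    for key in ("mapreduce_map_input_file", "map_input_file"):
        path = env.get(key)
        if path:
            # The value is typically a full path or URI ending in the file
            # name (e.g. hdfs://.../worksheet1); keep the last component.
            return path.rstrip("/").rsplit("/", 1)[-1]
    return None
```

In the mapper, `filename = input_file_name()` would replace the two lines that read the environment variable. For a rough local check you can also set the variable yourself, one file at a time, e.g. `map_input_file=worksheet1 python map.py < worksheet1`.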

So don't test it on your local machine; just run it on Hadoop.

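By the way, telling files apart by checking the number of fields, e.g. len(line.split(",")), is bad practice, because you won't always be lucky enough that different files have a different len(). If you are processing files generated by someone else, you will be in trouble when they change the file format in the future (for example, by adding more fields).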
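One way to follow this advice is to choose the record layout by input file name and read only the leading columns you need, so a column appended later does not change how a line is interpreted. A minimal sketch, assuming the tab-separated worksheet layouts from the question; the `SCHEMAS` table and `tag_record` function are hypothetical:

```python
# Hypothetical layout table: file name -> (tag, number of leading columns used).
SCHEMAS = {
    "worksheet1": ("0", 2),  # student ID, name
    "worksheet2": ("1", 3),  # student ID, course number, grade
}


def tag_record(filename, line):
    """Return 'sno<TAB>tag<TAB>...' for one line of a known input file."""
    tag, width = SCHEMAS[filename]  # KeyError for an unexpected file
    fields = line.rstrip("\n").split("\t")
    if len(fields) < width:
        raise ValueError("%s: expected at least %d fields, got %d"
                         % (filename, width, len(fields)))
    # Extra trailing columns are ignored instead of changing the meaning.
    return "\t".join([fields[0], tag] + fields[1:width])
```

Failing loudly on a short line, instead of guessing from the field count, makes a format change show up as an error rather than as silently mis-tagged records.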
