Controlling Python output to cons

import sys

def mapper():
    '''
    From Mapper1 : we need only UserID , (MovieID , rating) as output.
    '''
    #* First mapper
    # Read input line
    for line in sys.stdin:
        # Strip whitespace and delimiter - ','
        print line
        data = line.strip().split(',')
        if len(data) == 4:
            # Using array to print out values
            # Direct printing , makes python interpret
            # values with comma in between as tuples
            # tempout = []
            userid , movieid , rating , timestamp = data
            # tempout.append(userid)
            # tempout.append((movieid , float(rating)))
            # print tempout
            # print "{0},({1},{2})".format(userid , movieid , rating)

1 answer

User

#1 · Posted on 2024-10-05 10:43:48

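The input line is a string, and splitting it only produces more strings: after the split, y is still a string, not a tuple.

If you want the original values of the tuple as numbers, you need to parse them.

ast.literal_eval can help here.

For example: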

In [1]: line = """671,(4973,4.5)"""

In [2]: data = line.strip().split(',', 1)

In [3]: data
Out[3]: ['671', '(4973,4.5)']

In [4]: x , y = data

In [5]: type(y)
Out[5]: str

In [6]: import ast

In [7]: y = ast.literal_eval(y)

In [8]: y
Out[8]: (4973, 4.5)

In [9]: type(y)
Out[9]: tuple

In [10]: type(y[0])
Out[10]: int
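
The session above can be wrapped into a small helper for the reducer side. This is only a sketch, assuming each line has the form userid,(movieid,rating); the function name parse_mapper_line and the choice to return None for malformed lines are illustrative, not part of the original code.

```python
import ast


def parse_mapper_line(line):
    """Parse one 'userid,(movieid,rating)' line into typed values.

    Returns (userid, movieid, rating), or None if the line is malformed.
    The name and the None-on-error policy are hypothetical choices.
    """
    # Split only on the first comma so the tuple text stays in one piece.
    parts = line.strip().split(',', 1)
    if len(parts) != 2:
        return None
    userid, raw_pair = parts

    try:
        # literal_eval only evaluates Python literals, never arbitrary code.
        pair = ast.literal_eval(raw_pair)
    except (ValueError, SyntaxError):
        return None

    if not isinstance(pair, tuple) or len(pair) != 2:
        return None
    movieid, rating = pair
    if not isinstance(rating, (int, float)):
        return None
    return userid, movieid, float(rating)
```

Note that userid stays a string here, as in the session above; call int(userid) yourself if you need it as a number.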

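Also, if you switch to PySpark, you will have much better control over variable and object types, whereas Hadoop streaming passes everything around as strings.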
