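# Python 3: the built-in zip replaces itertools.izip from Python 2.
# Assumes an existing SparkContext named sc (for example, the one the pyspark shell provides).
text = "I'm working on language model and want to count the number pairs of two consequent words.\
I found an examples of such problem on language model and want to count the number pairs"
i = iter(text.split())
# zip(i, i) draws two items at a time from the same iterator, so the pairs do not overlap.
rdd = sc.parallelize([" ".join(x) for x in zip(i, i)])
# Map each pair to (pair, 1), then sum the ones per pair.
print(rdd.map(lambda x: (x, 1)).reduceByKey(lambda x, y: x + y).collect())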
Maybe this will help. You can find other ways of splitting here: Is there a way to split a string by every nth separator in Python?
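The pairing above is non-overlapping: words 1 and 2 form a pair, then words 3 and 4, and so on. If every word should instead be paired with its successor, one possible variant is sketched below in plain Python 3, without Spark, so the two behaviours can be compared locally. The function name `count_pairs` and its `overlapping` flag are hypothetical names chosen for illustration, not part of any library.

```python
from collections import Counter

def count_pairs(text, overlapping=True):
    """Count two-word pairs in text (hypothetical helper for illustration)."""
    words = text.split()
    if overlapping:
        # Pair every word with the word that follows it.
        pairs = zip(words, words[1:])
    else:
        # Same trick as in the Spark snippet: consume two words per step.
        it = iter(words)
        pairs = zip(it, it)
    return Counter(" ".join(p) for p in pairs)

sample = "a b a b a"
print(count_pairs(sample))                     # overlapping pairs
print(count_pairs(sample, overlapping=False))  # non-overlapping pairs
```

In Spark the overlapping list could be passed to sc.parallelize in the same way as before; only the way the pairs are built changes.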