Apache Spark and Python lambda

file = spark.textFile("hdfs://...")
counts = file.flatMap(lambda line: line.split(" ")) \
             .map(lambda word: (word, 1)) \
             .reduceByKey(lambda a, b: a + b)
counts.saveAsTextFile("hdfs://...")

1 answer

User

#1 · Posted 2024-09-30 10:30:15

See the inline comments:

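# `spark` is assumed to be a SparkContext here
file = spark.textFile("hdfs://...")  # open the file as an RDD of lines
counts = (
    file.flatMap(lambda line: line.split(" "))  # split each line on spaces and flatten the results into one RDD of words
    .map(lambda word: (word, 1))  # for each word, create the tuple (word, 1)
    .reduceByKey(lambda a, b: a + b)  # group the tuples by key (first element) and sum the second elements
)  # parentheses replace the backslashes, since a comment cannot follow a line-continuation backslash
counts.saveAsTextFile("hdfs://...")  # write the (word, count) pairs out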

A more detailed explanation of reduceByKey can be found here
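
To see what each step does without a cluster, the same pipeline can be imitated in plain Python. This is only a rough sketch: `flat_map` and `reduce_by_key` are hypothetical helpers written for illustration, not Spark APIs, and the two input lines are made-up sample data. Spark applies the same lambdas, but distributed across partitions.

```python
from collections import defaultdict
from functools import reduce
from itertools import chain


def flat_map(func, items):
    """Apply func to every item and flatten the resulting lists into one list."""
    return list(chain.from_iterable(func(item) for item in items))


def reduce_by_key(func, pairs):
    """Group (key, value) pairs by key, then fold each group's values with func."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}


# Made-up sample input standing in for the lines of the HDFS file.
lines = ["to be or", "not to be"]

words = flat_map(lambda line: line.split(" "), lines)  # one flat list of words
pairs = [(word, 1) for word in words]                  # the map step: (word, 1)
counts = reduce_by_key(lambda a, b: a + b, pairs)      # sum the 1s per word

print(counts)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

The lambda passed to `reduce_by_key` only ever sees two values for the same key at a time, which is why `a + b` is enough to produce a total per word.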
